(Quirky) Roadmap for New Grad Data Scientists

41 minute read

New Grad Timeline

Since last November, I've regularly gotten questions about how to get data science jobs. I’ve hesitated to give advice because no advice applies to everyone. As more people ask, however, I want to write a post for those in my shoes:

(PhD) students applying to machine learning or product data scientist roles at tech companies through university recruiting.

What’s special about university recruiting

July to October is high time to start: Some say companies have more headcount at the beginning of each year — that’s just not true for university recruiting. Most new grads get hired in the Fall (August to December), far fewer the following Spring. Apply early, even if you have to get ready along the way!

Every year around August, you’ll start to see openings with “New Grad”, “University Grad”, “Masters/PhD Graduate”, etc. in job titles. Take advantage of them if you’re a current student. University recruiting is special in several regards:

  1. Companies don’t expect you to start right away. If you’re 3+ months away from graduating, you likely won’t be considered for “normal” positions. Only for new grad roles can you apply more than half a year before you start. (Data points: Recruiters at Waymo and Amex reached out to me in June 2021. When I told them I would graduate in May 2022, they said they’d only consider candidates starting soon-ish.)
  2. Most companies don’t have a new grad track for DS. Usually, only big tech companies (e.g., Meta, Google, Uber, Microsoft, Robinhood, TikTok, DoorDash, Adobe, Pinterest, Zoom, PayPal, NVIDIA), established startups (e.g., Figma, Quora, Faire, Nuro), some hedge funds (e.g., Two Sigma, which hires both DS and quants), and consulting companies (BCG GAMMA, McKinsey) do. A bit to my surprise, Apple, Amazon, and Airbnb don’t really hire new grad DS and Lyft seems to fill new grad roles exclusively with return interns. I don’t remember if Stripe did or not. By contrast, far more companies hire new grad software engineers or machine learning engineers.
  3. The hiring bar is different. You’re expected to only have school and internship experience; your resume and performance are compared to other students. So for students, university recruiting may be the path of least resistance.

Average-case scenario for Spring grads

Say Bob graduates in Spring 2025 (mid-May at most universities) and wants to apply through university recruiting. We can reverse-engineer an “ideal” timeline:

Caveats

If you graduate in December 202X, your timeline is similar to that of those graduating in Summer (202X + 1), except that you miss the last Spring cycle (January to March (202X + 1)). Some caveats:

  1. You don’t have to apply through university recruiting. If you’re graduating in $<$ 2 months or have already graduated, you can apply to regular entry-level jobs. You’ll have far more options but may face fiercer competition (other candidates might have worked for 1-2 years).
  2. Better late than never: Bob’s timeline is an average-case scenario (best case: you got a return offer and don’t need to interview at all). What if you start preparing or applying “too late”, like I did for my internship? For various reasons, I only started applying to Summer 2021 internships in late February 2021 and got an offer at the eleventh hour in mid-May. It makes your odds worse, no doubt, but better late than never (as that RedHook song goes, “These bad decisions haunt me, but they make good stories”).
  3. Everyone’s background differs: Steps 2-5 in the timeline are pretty set, but how long does it take to build an attractive resume for internship apps (step 1)? It depends on what you already know and have done. If you’re well-versed in scientific computing (stats, ML, programming) and experimental design and have DS projects you’re proud of and ready to show, you can just polish your resume and apply. If you’ve never trained models, run experiments, or analyzed data, it may take 1-2 years to learn and do stuff. Most cases are somewhere in between (e.g., most STEM students know stats quite well but don’t use SQL on a regular basis).

Make no mistake, data science is a hard field to go from zero to hero in. I’ve had my fair share of naysayers saying, “Every CS/stats kid wants a DS job now. Why would you get one?” To that I wanna respond with a Tool song, “I want what I want.” If you love modeling or analytics (or both), I’ll show you how to learn things bit by bit.

Track #1: Machine Learning

TL;DR

If you want a career in ML nowadays, there are more opportunities in non-tech industries (e.g., banks, insurance, healthcare, consulting, telecommunication, etc.). To work as an ML person at a tech company, you likely need to master state-of-the-art models (Research Scientist, Applied Scientist, or Machine Learning Scientist), software engineering (Machine Learning Engineer or Software Engineer, Machine Learning), or both — the days when data scientists could simply hand an out-of-the-box model in a Jupyter notebook to an ML engineer to productionize may be gone in tech.

More and more candidates have exactly the same ML skills (e.g., LightGBM, PyTorch, transformers) on paper — I guess what sets you apart is your judgment and hard-won practical skills (e.g., feature engineering, model tuning and evaluation). Harsh as it may sound, companies pay you for your ability to navigate difficult choices and bring in revenue, not just the ability to import libraries.

500 names of ML data scientists 🦄

Master’s degrees, bootcamps, and online courses prepare you to be an ML data scientist, which is stereotypically what a DS is. That may have been true in tech years ago and may still be true in traditional industries (e.g., pharma, banks, telecommunications, etc.). Yet, at major tech companies today, ML data scientists are unicorns. I’ve heard that this change might have begun with Meta’s “Data Scientist, Analytics” track, which fully separates the responsibilities of data scientists (A/B testing + SQL) and machine learning engineers (model training + deployment). Other tech companies have followed suit and hire data scientists to do pure product analytics.

ML data scientists are rebranded into different roles at big tech companies, often with higher expectations in research or engineering. Companies like Uber, Amazon, and Microsoft have “Applied Scientist” roles that typically target Ph.D. grads who have strong ML skills but may come from non-ML fields (e.g., econ, EE, IEOR, cognitive science). Depending on the company and the team, applied scientists may or may not write production code. Research-oriented ML roles that don’t regularly touch production code are called “Research Scientist” or “Machine Learning Scientist”. These roles typically target Ph.D. grads from relevant fields with decent publications. Prestigious AI labs (e.g., DeepMind, FAIR, OpenAI, Google Brain) may have expectations on a par with faculty hires. At companies like Lyft and sometimes Amazon, the line between research and applied scientists is somewhat blurred.

More often than not, if you train models, you’re also expected to deploy them. These roles are called “Machine Learning Engineer” (e.g., Twitter, Adobe, Stripe) or “Software Engineer, Machine Learning” (e.g., Google, Waymo, Quora). ML engineers on different teams at different companies can do wildly different things, from DL research and MLOps (e.g., model training, deployment, and serving) to ML infrastructure. In all cases, you’re essentially an ML data scientist and a software engineer meshed into one; only the proportion of each component differs.

Some major tech companies still offer the “Data Scientist, Machine Learning” title to new grads. You’re generally not expected to deploy models or do theoretical research (the Machine Learning section in DoorDash’s engineering blog has examples of what their ML DS do). When I applied in Fall 2021, I only saw 3: Robinhood, DoorDash, and Figma. There are of course more I missed. For instance, Airbnb has a well-known “Data Scientist, Algorithms” track but didn’t hire new grads in my year.

Resources/preparation for ML DS

Regardless of the title, an ML practitioner can’t escape any of the following, IMO:

  1. ML/DL foundations: You should know how learning algorithms work in general (loss, optimization criteria, optimization routine) and have intuitions about common algorithms in classic ML (regression-, instance-, and tree-based models) and DL (multilayer perceptrons, CNN, RNN, transformers…). It’s good if you understand their mathematical foundations. At a bare minimum, you should be able to use existing frameworks and libraries to train, tune, and test models (see the minimal training sketch at the end of this list).

    If I were to start over and could only pick two resources, I’d first read The Hundred-Page Machine Learning Book and then work my way through Machine Learning with PyTorch and Scikit-Learn.

    People learn differently, but in general, you don’t wanna spend 3 months just reading or, conversely, only call Scikit-Learn functions without understanding them. Every time I learn a new family of models (e.g., all sorts of boosting, graph embeddings, transformers), I’d read a book chapter or a seminal paper on it, watch several YouTube videos explaining the intuitions, and apply it to a toy problem I care about (the end product is a notebook, an app, or a presentation). If I get stuck, I’d Google my way out (which often leads to Kaggle kernels, Medium blogs, or Analytics Vidhya articles) or ask questions on StackOverflow. Getting stuck is no pleasant feeling, but I never forget Prof. Aaron Fisher’s words,

    “Learning is pain. If you don’t feel the pain, you’re not learning."

  2. ML system design: You need to know how to apply ML to solving business problems (e.g., how Twitter shows feed, how Google ranks results, how Uber matches riders to drivers, how Stripe detects fraudulent transactions). Unlike software engineering system design, ML system design focuses more on the end-to-end model training process and less on the tech stack.

    • General approach: Introduction to Machine Learning Interviews Book covers all the ground in ML interviews (there’s a Discord for discussing answers). Rules of Machine Learning: Best Practices for ML Engineering and Design a Machine Learning System are less comprehensive but great for a quick grasp.

    • Specific systems: Grokking the Machine Learning Interview 👉 This Educative course covers 6 common types of ML systems, including self-driving, ads, search ranking, recommendation, and feed. I think it’s great for interview preparation and for guiding personal projects.

    • Cool tricks: Machine Learning Design Patterns 👉 I got this gem coincidentally (I was too early for a movie and spotted it in a bookstore) but it has guided me through many tricky situations. Examples: How do you deal with high-cardinality categorical data (e.g., airport names)? What to do if one observation can have multiple labels (e.g., hashtags of Tweets)? How can we combine tabular, text, and image data to make predictions?…

    • Invest in feature engineering + model tuning: Friends and acquaintances often ask, “I know LightGBM and PyTorch, just like any other candidate. Why do I keep failing take-home assignments?” I think nowadays, far too many candidates have identical ML skills on paper — the devil often lies in feature engineering, model tuning, and other judgment calls.

      How do you handle missing data and outliers (e.g., what counts as an outlier, to impute or not to impute, when & how to impute…)? What loss function should you use and what metric should you optimize for? Can you engineer meaningful, non-redundant features from things like timestamps, coordinates, ID’s, etc.? When aggregating features (e.g., by time, region, user, etc.), can you decide on the right level of granularity? When using models that can’t intrinsically capture feature interactions, can you properly cross features? Do you know how to use techniques such as feature hashing to prevent “out-of-vocabulary” problems when new data might have unseen categories (e.g., new ID’s)? (The feature-hashing sketch at the end of this list shows the idea.) These are a few random examples off the top of my head. I learned those practical tricks by doing projects (on which I probably spent hundreds of hours, if not $>$ 1,000, over 2+ years), Googling, asking DS/MLE/RS/AS friends, studying Kaggle kernels, and reading Medium blogs and books like Machine Learning Design Patterns.

      Hyperparameter tuning is another beast that takes a ton of patience and experience to conquer. For simple models with few hyperparameters to tune, grid search or random search may be enough (the training sketch at the end of this list uses random search). However, when the model has many moving parts or each epoch takes a long time (e.g., neural nets), you may want to consider Bayesian optimization, genetic algorithms, or other non-exhaustive, adaptive methods. A friend recommended PyCaret, which automatically selects and tunes models for you. But as a beginner, you should tune models “by hand” to build intuition and judgment.

  3. Data structures & algorithms: As mentioned, ML roles have engineering components and therefore often require SWE-style coding interviews. The bar may differ slightly by job title (often but not always: MLE & SWE > AS > RS $\approx$ DS).

    • Foundations: It doesn’t matter how you learn it, but you need basic data structures and algorithms knowledge before you grind coding questions. 👉 Resources (you don’t need them all): CS Dojo, CS 61A @Berkeley, Problem Solving with Python, and Practical Algorithms and Data Structures

    • LeetCode: I initially refrained from applying to MLE jobs because I didn’t want to grind LC questions. Later, I realized several roles I started interviewing for (AS @Uber, SWE @Google, quant @Two Sigma) would require LC-style coding interviews. So I signed up for 九章算法 (a Chinese algorithms course) in October 2021, about halfway through my interview cycle, and was really burnt out. If you want to reduce stress and not limit your choices, I strongly recommend that you seriously invest in coding interviews the summer before you apply to full-time jobs. In hindsight, it wasn’t that scary because, at the PhD level at least, coding interviews for MLE/AS/RS/DS are not meant to be hard. Most of my friends did about 200-300 questions when they interviewed.

      It doesn’t matter if you grind 200 or 1,000 questions — the end goal is that you can quickly recognize patterns in new problems you’re given, work collaboratively with interviewers to gradually solve them, and clearly communicate your thoughts and justify your approach along the way.

      For Mandarin speakers, taking 九章算法 is the most efficient way to learn patterns that apply to almost all algorithm questions. The lectures and course support are amazing. The English counterpart, Grokking the Coding Interview, is in a text format and doesn’t have teaching assistants.

      VS Code users can use the LeetCode plugin to grind LC locally.

    • Write clean code: As a student, you may not realize the importance of writing clean code. I’ve seen many Jupyter notebooks (including my own before 2019) with unreadable spaghetti code. That won’t fly in MLE/SWE interviews, where coding style is an essential part of the rubric — after all, companies don’t want to bring in someone who might mess up their code base. And I’ve heard of engineers and data scientists getting poor performance reviews because of bad styling habits. So it’s a huge plus, if not a necessity, to pay attention to cleanliness. Modularize your code so that each function preferably does one simple thing (or two), use descriptive variable names, follow the PEP 8 style guide, be “Pythonic” (e.g., use list or dictionary comprehensions instead of C-style for-loops, learn object-oriented programming), use comments when necessary (that doesn’t mean writing essays), etc. (A tiny before/after sketch follows this list.) I highly recommend Python Tricks to Python folks.

  4. Interpretability, fairness, and privacy: Great, you’ve learned the end-to-end model training pipeline, but do you know how your black-box model makes predictions (interpretability: Interpretable Machine Learning)? Do you know whether it treats different demographics “fairly” (fairness: Fairness and Machine Learning, The Ethical Algorithm, Corbett-Davies & Goel, 2018)? How do you build reasonably good models when certain identifiable information is taken out (privacy: CS 860 @Waterloo)? As a responsible ML practitioner, it’s good to know these things. If you’re interviewing with companies (e.g., fintech, banks, insurance, Apple) where these issues are super prominent or under heavy regulation, I’m guessing (I don’t know) you may get related interview questions.
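
To make point 1 (and the tuning discussion above) concrete, here’s a minimal train-tune-evaluate sketch with scikit-learn. The data is synthetic and the search space is purely illustrative; treat it as a skeleton, not a recipe for any real problem.

```python
# A minimal train/tune/evaluate sketch (synthetic data, illustrative ranges).
import numpy as np
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Random search over a few hyperparameters (cheaper than grid search when
# the space is large; Bayesian optimization is the next step up).
search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_distributions={
        "n_estimators": randint(50, 300),
        "learning_rate": uniform(0.01, 0.2),  # uniform on [0.01, 0.21]
        "max_depth": randint(2, 6),
    },
    n_iter=20,
    scoring="roc_auc",
    cv=5,
    random_state=42,
)
search.fit(X_train, y_train)

# Evaluate the tuned model on data it has never seen.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(f"best CV AUC: {search.best_score_:.3f}, test AUC: {test_auc:.3f}")
```

The same skeleton carries over to LightGBM or XGBoost; when the search space gets large or each fit is slow, tools like Optuna offer the adaptive search mentioned above.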
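
And here’s the feature-hashing idea mentioned under “Invest in feature engineering + model tuning”: hash a high-cardinality categorical (airport codes here, stand-ins I made up) into a fixed-size space so that categories never seen during training still get a representation. A sketch only; `n_features=32` is arbitrary.

```python
# Feature hashing for high-cardinality categoricals: no out-of-vocabulary errors.
from sklearn.feature_extraction import FeatureHasher

hasher = FeatureHasher(n_features=32, input_type="string")

train_airports = [["SFO"], ["JFK"], ["SFO"], ["ORD"]]
X_train = hasher.transform(train_airports)  # sparse matrix, shape (4, 32)

# A brand-new airport hashes into the same 32 buckets -- no vocabulary update
# needed, at the cost of occasional collisions (controlled via n_features).
X_new = hasher.transform([["HND"]])
print(X_train.shape, X_new.shape)
```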
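
Finally, the “write clean code” point in a nutshell, as a tiny, hypothetical before/after refactor:

```python
# Before: cryptic names, C-style loop, two jobs crammed into one function.
def f(d):
    r = []
    for i in range(len(d)):
        if d[i] is not None:
            r.append(d[i] * 2)
    return r

# After: descriptive names, comprehensions, one simple job per function.
def drop_missing(values):
    """Remove None entries."""
    return [v for v in values if v is not None]

def double(values):
    """Multiply each value by two."""
    return [v * 2 for v in values]

print(double(drop_missing([1, None, 3])))  # [2, 6]
```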

Between the day I first imported a Scikit-Learn model (October 2019) and the day I got an offer as an ML data scientist (November 2021), 2 years had passed. In those two years, I did dozens of toy projects, deployed 4-5 personal projects, and worked on a researchy DL project during my internship. There’s a reason why people joke about ML being “modern fortune-telling” (or, as the Chinese joke goes, “alchemy”): From data validation + wrangling and feature engineering to model selection, tuning, evaluation, and deployment, should anything go wrong, the predictions become nonsense. ML is a craft to patiently cultivate (this may be why you see many product DS YouTube channels but few ML DS channels — not only does ML have fewer spots, but it’s also harder to master quickly).

Track #2: Product Analytics

TL;DR

At big tech companies and established startups, if the job title “Data Scientist” is not suffixed by “Machine Learning”, “Algorithms”, “Engineering”, or the like, it most likely refers to product analysts. The analytics track is by far the most common way for new grads (or anyone, for that matter) to land a DS job. In this quirky roadmap, let me offer some quirky advice: All new grads looking for DS jobs, regardless of the track, should learn product knowledge, because 1) you can’t count on getting an ML offer given how rare those roles are, and 2) product sense goes a long way in ML interviews as well (a model is only useful insofar as it solves a business problem, and you need product sense to understand what that problem is).

Ubiquity and range of product DS

While writing this post, I searched “Data Scientist + New Grad” on LinkedIn and below is the first result shown to me, from Wealthfront:

This fits a typical product DS role that requires “A/B testing + SQL + something”. In this case, that “something” is automating their internal analysis tools. In other cases, “something” could be a bit of machine learning (e.g., Quora), causal inference (e.g., Netflix, which hires DS interns but not new grads), and so on.

Resources/preparation for product DS

Between the day I first heard “product sense” (August 2021) and when I got my first product DS offer (October 2021), it took me about 2 months. At its core, product analytics hinges on the scientific method, which is ingrained into PhD training in empirical disciplines. With regard to the timeline — you may not need a whole year before applying to product DS internships if you’re an experimental scientist.

Many people (Emma, 课代表立正) have talked about product interviews. So instead of repeating what they said, I’ll mention what I may or may not have done differently.

  1. A/B testing: I did 4 things all product DS candidates did (in theory) — reading Kohavi’s book and Lean Analytics, watching Emma’s videos, and taking the Udacity course. You should do these, too. Moreover:

    • Find a dataset to analyze: Unlike ML, there aren’t many public datasets for A/B testing. In their NeurIPS paper, Liu et al. (2021) summarized a dozen online controlled experiment datasets with different levels of granularity. Read this paper, pick a dataset, and analyze the experiment. Chances are, this process will expose gaps in your knowledge (How do you load, join, and aggregate the data? Which assumptions do you check? What test do you perform?) that merely reading a book or watching videos won’t.

    • Implement power analysis and $t$-tests: While reading the Kohavi book, I didn’t understand the sample size formula, $\frac{16\sigma^2}{\Delta^2}$, so I Googled how to derive it and implemented different variations. For $t$-tests, I similarly implemented different variations (comparing 2 means vs. 2 proportions, pooled vs. unpooled variances) from scratch using NumPy and SciPy (see the sketch after this list).

      I know myself — I’m awful at memorizing random stuff (e.g., I don’t recall my own plate number) and have to make sense of something to remember it. IMO, it doesn’t matter how you choose to learn, but you need to be honest about whether it’s working or not and do whatever works for yourself.

    • Consider complications: If you’re interviewing with companies that ask notoriously difficult A/B testing questions (e.g., Quora, Netflix, Uber, DoorDash, Google), you want to think deeper. For instance, when talking about A/B tests, we often think about independent groups, but some experiments use matched pairs (e.g., the same user saw different variants). What are the pros and cons of matched pairs vs. independent samples? And how do you analyze the results differently ($t$-tests, bootstrapping…)? Another tricky problem is computing the average click-through rate (CTR) per user in each variant. Do you add all clicks together and divide by total impressions? Or do you compute each user’s CTR and average across users? Quora DS Tom Xing called the former “ratio-of-averages” and the latter “average-of-ratios” (a tiny numeric comparison follows this list). What are their pros and cons? Complications may also arise from the unit of randomization/analysis. Since an average Netflix user watches 0-1 movie per week, randomizing by user would mean month-long experiments. To compare algorithms A vs. B, Netflix often uses “interleaving”, showing items recommended by the two algorithms in an alternating fashion ($A_1, B_1, A_2, B_2, \ldots, A_n, B_n$) and comparing the average metric of each algorithm. When A/B testing fails, you may need other causal inference methods. To prepare, you can read these companies’ engineering blogs.

  2. Stats: Most product DS stats questions are packaged within the “A/B testing combo” (sample size calculations, $t$-tests, etc.). Additional questions may cover regression (linear, logistic), sampling (e.g., stratified vs. random sampling), and resampling (permutation tests, bootstrapping). Try implementing the most commonly used (frequentist) statistical tests yourself in Python (SciPy) or R.

    • Build intuitions: If your stats knowledge is rusty, read through OpenIntro, which covers the axioms of probability, distributions, $t$-tests, OLS, etc. StatQuest has an amazing statistics fundamentals series. I watched it at the beginning of the pandemic (didn’t want to watch shows I used to watch with someone else…), which got me hooked on statistics.

    • Learn the nitty-gritty: I’ve used stats daily for maybe 7 years now (since junior year in college?) but only became curious about why and how statistical methods come about in the past two years. I watched Prof. Simon DeDeo derive the NYC cab waiting time distribution from the principle of maximum entropy and was hoping to see a book that derives all common distributions and tests from first principles. I’ve yet to find that book (if you know one such book, I’d be so grateful for your recommendation!).

      Among stats books out there, some good ones are Introduction to Probability for Data Science, Mathematical Statistics with Resampling and R, and, of course, The Elements of Statistical Learning.

  3. Probability: Most probability questions have to do with probability distributions, combinatorics (permutations and combinations), and conditional probabilities (including Bayes’ theorem). Meta has one onsite round dedicated to this kind of question. Other companies may have one or two buried in their stats round or technical phone interviews.

    • Know your distributions: Gaussian and binomial are by far the most common distributions you’ll be asked about, and sometimes heavy-tailed distributions. The geometric distribution is also common, especially in churn analysis and customer lifetime value (LTV) calculations. Almost every probability theory or stats book has one chapter or two talking about distributions. I somehow particularly like OpenIntro.
    • Know your combinatorics: In the case of discrete probability, you can derive most answers from 2 rules: the rule of product (if it takes $n$ steps to do a thing and there are $m_i$ ways to do step $i$, you have $\prod_{i=1}^{n}{m_i}$ ways to complete the whole thing) and the rule of sum (if there are $n$ mutually exclusive things and $m_i$ ways to do thing $i$, there are $\sum_{i=1}^{n} m_i$ ways to do one of the $n$ things). I highly enjoy this tutorial from Meta recruiters.
    • Know your conditionals: As someone using Bayesian models to study causal cognition, I find it intuitive to think about conditional probabilities in terms of directed acyclic graphs (DAG’s). Consider a simple DAG: $A \rightarrow B$ (COVID causes coughing). We can calculate conditional probabilities forward (if someone has COVID, how likely they are to cough, $P(B|A)$) or backwards (if someone coughs, how likely they are to have COVID, $P(A|B)$). If calculating backwards, we need to enumerate why else someone might cough (e.g., cold, lung cancer) and use Bayes’ theorem, $P(A|B) = \frac{P(B|A)\cdot P(A)}{P(B)}$. The annoying part is the marginal probability $P(B)$, which is $\sum_{i=1}^{N}P(B|H_i)\cdot P(H_i)$ ($H_i$ is one hypothetical cause of coughing among $N$).
    • Practice beforehand: Probability questions are often confusing and unintuitive. Even if you know how to slowly derive the answers, that still may not cut it in time-limited interviews. It’s reassuring to have worked on similar problems before the interview. The “quant green book” covers most probability questions under the sun, but many are far too hard for DS interviews. The “DS red book” may be a less overwhelming choice. When I was interviewing, I just Googled “probability questions for data science interviews” and practiced some questions. It also helps to sanity-check your derivations with brute-force enumeration or simulation (see the sketch after this list).
  4. SQL: Even though SQL is a core skill of product DS, I somehow only did SQL interviews with one company (Meta). Someone else may offer better suggestions.

    • For absolute beginners: Select Star SQL is the best beginner tutorial, created by a former Quora data scientist. Among all the good things, I particularly appreciate how it fosters query-processing intuitions. For instance, when you FROM t1 JOIN t2 ON key1 = key2, it’s as if, for each row in t1, the machine compares its key1 with every key2 in t2 and assigns TRUE for matches and FALSE for mismatches. After doing this for all rows in t1, TRUE rows are kept and FALSE rows dropped. With this intuition in mind, we know FROM t1 JOIN t2 ON TRUE is a valid query because all the ON operation needs are Boolean values, however created. When working on mind-boggling problems later, I’d manually transform data on a piece of paper, relying on such visual intuitions.

      This tutorial didn’t explain aggregations like GROUP BY visually or the query processing order, which you can find in this LinkedIn course.

    • Learn window functions: If you’re focusing on the analytics track, you’ll run into window functions sooner or later. I attempted to get away without learning them but realized that for many problems, if you don’t use them, you must compensate with a sh*t ton of JOIN’s, CTE’s and, God forbid, nested queries… I recommend the window function course on LearnSQL; I tried a bunch of other ones that just didn’t click. Here are my notes (and a minimal window-function sketch after this list).

    • Practice out loud: I knew how to answer algorithm questions in front of a human but didn’t know how to do that with SQL questions. After watching Nate’s YouTube videos, I realized there’s not much of a difference — just like with algorithm questions, you clarify the goal of the problem, confirm the input/output (especially edge and corner cases) with the interviewer, think silently/frantically for a minute or two, then verbally walk the interviewer through your step-by-step approach, and finally code it up after the interviewer green-lights your idea. When you practice SQL questions on LeetCode or StrataScratch (created by Nate?), treat it as an interview and explain your approach out loud before writing any queries. It’s also a good idea to time your solutions to create realistic pressure (easy: 5 min; medium: 10 min; hard: 15 min).

  5. Product sense: There’s so much said about product sense already and I wrote a post about choosing metrics myself. To some degree, I think product sense (at least metric investigation) is about recovering finer-grained information from highly aggregated statistics using knowledge about the business and the users. A quick example is that, when Meta aggregates the number of posts per user across all users and all time, of course you lose information about different user segments and time periods. Then when this aggregate metric goes wrong (dropped by 20%!), can you tell why? Of course not, because the aggregation wiped out more granular details. But if you know something about your users (e.g., new users tend to post less), the context (e.g., a major event had passed), etc., then you can make an educated guess about how to “disaggregate” the data (e.g., GROUP BY user_age) and look again. If I were to design product interviews, I’d follow up the candidate’s guesswork with a SQL question (“Okay, you wanna look at the trend across time. How to write the query?")

    • Learn the “classics”: If you’re just getting started, read Lean Analytics and watch Emma’s product case videos before you do anything else. In most cases, these are already enough. I skimmed Cracking the PM Interview and Decode & Conquer — there’s good stuff here and there, like competitor analysis and behavior questions, but I don’t remember anything else.

    • Picture yourself as the investor: Everyone says picture yourself as the user, which is great advice: You should use the product made by the company you’re interviewing with if you can. But what if you’re interviewing with Waymo, SpaceX, or a B2B company? You can picture yourself as the investor. If a company is public, I’d read its earnings calls to see the business’ strategy and focus at the moment, how various products work together to achieve its goals, which key metrics the company chooses to report, and what concerns investors have raised. Below is a screenshot from DoorDash’s 2021 Q4 financial report, which is publicly available on their investor relations page. As you can see, one of their main growth strategies is to find more ways to be useful in people’s lives (e.g., delivering groceries and stationery). You can also see that while DashPass helps retain customers and encourage ordering, the company needs to strike a delicate balance between order frequency and profit per active user.

    • Mock interviews: Had it not been for mock interviews, I literally wouldn’t have gotten a job — my resume was initially passed over by DoorDash; I joked to a mock interview friend that I’d order from somewhere else now, was referred to the same job, and got an interview the next day. “Concrete benefits” aside, it’s super useful to hear how different people approach open-ended cases and to verbalize your thoughts socially and interactively.

      A perennial question is how many mock interviews one should do. It doesn’t matter; some do dozens while others do three — it’s the end result you strive for. Ask yourself: given an arbitrary product, can you quickly ask the right questions to figure out how it works, who it serves, when it succeeds, and how it might fail? Given example results, can you gauge how well it’s doing? When goals clash, can you seek a good trade-off?… If these thoughts and ideas come naturally to you, then doing more only gives you diminishing returns. On the other hand, if people often say you need to think more thoroughly and organize your thoughts more clearly, then you’d probably benefit from practicing more with people. Do what you need.
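
Below are a few sketches for the items above, starting with the A/B-testing pieces: Kohavi’s rule-of-thumb sample size and a two-sample $t$-test written by hand, cross-checked against SciPy. All numbers are made up; this is a sketch of the exercise, not a template for real experiments.

```python
# Kohavi's rule-of-thumb sample size and a Welch t-test, by hand vs. SciPy.
import numpy as np
from scipy import stats

def sample_size_per_group(sigma, delta):
    """n ~= 16 * sigma^2 / delta^2 per variant
    (roughly 80% power at alpha = 0.05, two-sided)."""
    return int(np.ceil(16 * sigma**2 / delta**2))

print(sample_size_per_group(sigma=1.0, delta=0.1))  # 1600 users per variant

rng = np.random.default_rng(0)
control = rng.normal(loc=10.0, scale=2.0, size=1600)
treatment = rng.normal(loc=10.2, scale=2.0, size=1600)

# Welch's t-statistic by hand (unpooled variances)...
mean_diff = treatment.mean() - control.mean()
se = np.sqrt(treatment.var(ddof=1) / len(treatment)
             + control.var(ddof=1) / len(control))
t_by_hand = mean_diff / se

# ...and the same test via SciPy, to check the hand-rolled version.
scipy_t, scipy_p = stats.ttest_ind(treatment, control, equal_var=False)
print(f"by hand: t = {t_by_hand:.3f}; scipy: t = {scipy_t:.3f}, p = {scipy_p:.4f}")
```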
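
Next, the “ratio-of-averages” vs. “average-of-ratios” distinction for per-user CTR, with made-up numbers so you can see how far the two drift apart when one heavy user dominates:

```python
# Two reasonable-sounding "average CTRs" that disagree.
import numpy as np

# (clicks, impressions) for three users; the third is a heavy user.
clicks = np.array([1, 2, 90])
impressions = np.array([10, 10, 1000])

ratio_of_averages = clicks.sum() / impressions.sum()   # heavy users dominate
average_of_ratios = (clicks / impressions).mean()      # every user counts equally
print(f"{ratio_of_averages:.3f} vs {average_of_ratios:.3f}")  # 0.091 vs 0.130
```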
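
For probability questions, here’s the kind of brute-force sanity check I mean: a rule-of-product count verified by enumeration, and Bayes’ theorem on the COVID → cough DAG (with invented rates) confirmed by simulation.

```python
# Sanity-check probability answers by enumeration and simulation.
import numpy as np

# Rule of product: 3 appetizers x 4 mains = 12 meals; verify by enumeration.
meals = [(a, m) for a in range(3) for m in range(4)]
assert len(meals) == 3 * 4

# Bayes' theorem on A -> B (COVID -> cough), with made-up rates.
p_covid = 0.01                # P(A)
p_cough_given_covid = 0.8     # P(B|A)
p_cough_given_other = 0.05    # P(B|not A): all other causes lumped together
p_cough = p_cough_given_covid * p_covid + p_cough_given_other * (1 - p_covid)
print(f"analytic  P(A|B) = {p_cough_given_covid * p_covid / p_cough:.3f}")

# Cross-check by simulation.
rng = np.random.default_rng(0)
n = 1_000_000
covid = rng.random(n) < p_covid
cough = np.where(covid,
                 rng.random(n) < p_cough_given_covid,
                 rng.random(n) < p_cough_given_other)
print(f"simulated P(A|B) ~ {covid[cough].mean():.3f}")
```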
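
Lastly, a minimal window-function sketch: rank each user’s orders by amount without extra JOIN’s or nested queries. I run the query through Python’s built-in sqlite3 just to keep it self-contained; it assumes a reasonably recent SQLite (3.25+ for window functions), and the table and column names are made up.

```python
# RANK() OVER a partition, executed against an in-memory SQLite database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 20.0), (1, 35.0), (2, 12.0), (2, 50.0), (2, 8.0);
""")

query = """
    SELECT user_id,
           amount,
           RANK() OVER (PARTITION BY user_id ORDER BY amount DESC) AS amount_rank
    FROM orders;
"""
for row in conn.execute(query):
    print(row)  # (user_id, amount, rank within that user's orders)
```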

Interview Preparation

The “Squid Game”

Right around when I was interviewing, the Netflix show Squid Game went viral. I joked to friends that DS interviews are just like the game: Candidates play 6-ish rounds of high-stakes games (e.g., SQL, product, stats, ML, coding, behavior) and those not “eliminated” by the end get the offer. I doubt anyone can accurately tell you the bare minimum of preparation it takes to get a DS job, because we were all doing as much as we possibly could to “be safe” (unless you’re so skilled you can afford to be bold 🤑, which I’m not).

Take all advice with many grains of salt.

A Squid Game lantern I drew last October, a week before my first offer.

Knowing the basics $\gg$ memorizing answers

“If I am allowed to give only one suggestion to a candidate, it will be ‘know the basics very well’.” — Xinfeng Zhou, The “Quant Green Book”

I wholeheartedly agree with Xinfeng Zhou’s views on interviews. I think good interview preparation consolidates your broad knowledge base accumulated over time and brings out the logical, insightful, and methodical person that you are, rather than helping you memorize answers to sample interview questions you don’t understand or pretend to be someone you’re not. Sure, even if you know the basics well and normally think logically, you may still flunk interviews if you’re nervous or unprepared. However, if you’re interviewing for a product role but can’t think of ways to analyze a product, or interviewing for an ML role but can’t choose between linear regression and neural nets, no amount of interview skills will help.

You might be interviewing with certain companies famous for reusing product analytics questions, and some people do get offers just by memorizing answers. Sure, practice those questions, but don’t skip the basics. What if you end up having to interview with other companies? What if the questions change? Even if you get the same questions you prepared for, any lack of understanding would still show:

“Unless you truly understand the underlying concepts and can analyze the problems yourself, you will fail to elaborate on the solutions and will be ill-equipped to answer many other problems.”

Apply early and get ready gradually

A subset of books I used to prepare for DS interviews. In case you’re curious: Each folder has all the files of each company I interviewed with.

If I devoured all the books above before I applied, I’d never be ready. For me and many friends, upcoming interviews are the only force to push things forward. During my last week of internship, a good friend interning at Meta said, “I’m leaving next week so I can only refer you now!" I was planning on taking some days off the week after but instead sent out my resume and started my interview cycle unprepared.

Back in August 2021, I had no idea what product analytics was. I’d forgotten the basic SQL I’d learned. I checked Glassdoor and saw that even Meta recruiters ask simple SQL and product questions, which is highly unusual for recruiters. I spent a few days going over Select Star SQL. I also asked a product DS friend how to approach product questions and got some generic advice: “Think about the goal. Think about what data to pull. Think about when the data says you achieved the goal or not.” These might seem crude now but were enough to get me through that 15-min call.

Right after getting Meta’s phone interview, I heard from Quora. I was glad both were product roles at social media companies, so I didn’t have to prepare wildly different stuff. I searched “product interviews” on YouTube and watched a bunch in a row: user growth by Meta’s Alex Schultz, metrics by Quora’s Adam D’Angelo, and, of course, videos from Emma’s channel. I got Kohavi’s book recommended by Emma and dug out the copy of Lean Analytics a friend gave me. I read both in two days because I was so intrigued: As a kid, I loved detective novels to the core; product analytics seemed just like detective work — something went wrong with your product, so you peel the onion to find the culprit. I thought of checking whether Emma offered mock interviews and was happy that she did. Otherwise, my interview outcomes would’ve been very different (e.g., I thought it was a great idea to say I knew her frameworks and was told that’d be a surefire way to bomb an interview), and perhaps my life, too.

I passed both interviews in early September and was moved forward to onsite. Around that time, I started to hear from other places (e.g., Two Sigma, Figma, Uber, Google, Robinhood, and some I don’t remember). Looking back, all of them are highly competitive, but I never thought about my odds. I was kinda “myopic”, thinking I’d study just enough to pass 45-minute phone interviews, and then I’d worry about pulling off 4-hour onsite interviews. Only much later did I learn that many students would check applicant counts and feel discouraged by the seeming impossibility of being chosen. My advice: Don’t. Again, if you’re shaky on the basics, you won’t get the job going “up against” just one other person; on the flip side, you’ll get the offer if you truly understand how things work and can give crystal clear explanations, even if 2,000 applied.

In mid-September, I got a few more onsites. Another good piece of advice I got from Emma is that you don’t have to schedule interviews right away if you’re not ready. Recruiters get paid only when you accept an offer (usually $\propto$ your compensation), so they’re motivated to help you get ready. I was worried about headcounts at Quora and Figma since there might not be as many openings as there would be at larger companies. I asked both recruiters explicitly and we figured out a timeline together:

“Before I provide my availability, I wonder what would be an ideal date range on your end. Is it preferable that I interview in late September, or is early to mid-October still a good time? Asking since I wish to be as prepared as possible but don’t want to miss out on this opportunity (which I really appreciate) because of headcount-related issues.”

I got myself about a month’s time to prepare for the onsites. Companies usually do onsite prep calls and share materials telling candidates exactly how many rounds there will be and what each round entails. Piece by piece, I prepared for each round at each place and felt ready a day or two before each onsite (4-5 rounds).

Organize your time and learning

As interviews piled up and the content diversified, I was overwhelmed by stress (couldn’t sleep or eat well; constantly irritated…). So I made a risky move, asking my PhD advisor for a month off from research,

“What do you think about me postponing experiments till November to focus on interviews? If I get an offer, I can squarely focus on my dissertation from then on. If I don’t, I’ll still come back to research right away. It’s risky, but I think the expected utility is higher than if I spend several months doing full-time research and interviewing unprepared.”

to which she generously agreed. My October schedule alternated between two modes:

  1. Review a topic (70%, $>$ 3 days from onsite): For each type of interview, I created a Notion page and spent 3-7 days filling it with content. Several were created back in September (e.g., A/B testing, SQL, product) and most in October (e.g., behavior, Pandas, ML design, probability + stats, applied data). Some of the best ones were later turned into blog posts and YouTube videos.

    Reminds me of that Nirvana song Lithium (‘I’m not gonna crack…’)

    For procedural knowledge like SQL and Pandas, I usually Google around for a good problem set (e.g., Pandas: pandas_exercise, SQL: StrataScratch) to work through. Afterwards, I’d reflect on how I like to approach an arbitrary problem during interviews, extract patterns (e.g., when to use window functions), and analyze problems that I not only got wrong but still found challenging.

    A screenshot of my SQL notes. Advice was given by Nathanael Rosidi.

    For probability and stats, I flipped through several textbooks (I was teaching stats) to create an interview “cheat-sheet”, including properties of common distributions, formulas and assumptions of common tests (e.g., $t$, $F$, $\chi^2$), regression (OLS, $l_1$, $l_2$), and major concepts (central limit theorem, the law of large numbers, $p$-values, Bayesian HDI vs. frequentist confidence intervals…).

    For the “analytics combo” (A/B testing + product cases + metrics), I re-read Lean Analytics, skimmed Cracking the PM Interview and Decode & Conquer, read Medium posts by Emma and friends, and went over prep materials from recruiters (Quora and Meta both provided great articles to read and videos to watch).

    For applied data and ML system design, I went through my old ML notes (warning: I wrote them a while ago and might have said wrong things) and quickly went over Grokking the Machine Learning Interview again. I wanted to read Chip Huyen’s ML interview book, but at that point, I chose to sleep more instead…

    Once I was done with one topic, I moved on to the next. People have different preferences; I like to focus on one small thing at a time and really nail it.

  2. Get ready for onsite (30%, $\leq$ 3 days from onsite): For each company, I also created a Notion page. Three days before an onsite, I read articles about special problems faced by the company (e.g., software pricing, conversion funnels, churn analysis, ETA predictions, feed ranking, content moderation and recommendation), browsed the company’s engineering blogs, watched YouTube talks by the company’s data scientists, leadership, and investors, read earnings calls (public companies) or TechCrunch articles (public + private), and thought of insightful/interesting/endearing questions to ask my interviewers.

    This was also the time I did mock interviews. Partly because I had packed my schedule with too much stuff and partly because I was pretty sure of what I was doing, I only did 3 mock interviews as the candidate (for DoorDash, Quora, and Meta). In hindsight, I should’ve done more behavior mock interviews — I was caught off guard in the Figma behavior round when asked how I’d deal with common situations, which could’ve been prevented by practicing with people.

Optimize your timeline

As shown in the screenshot above, I withdrew from Google, Two Sigma, and Uber because there was no way I could finish. I completed the first 4 in early November and scheduled the rest from late November to early December. I did that because I wanted to leave more time for each company. That probably wasn’t necessary: Even though each company has different business models and its own concerns, stats is stats and metrics are metrics — the basics are the same everywhere, so you may only need 3-5 days for each additional place to prepare for company-specific questions.

Knowing what I know now, below is what I should have done:

  • Finish phone interviews in early to mid-September: Phone interviews are not meant to be hard — they serve as filters to weed out people who really don’t know the basics (e.g., don’t know $t$-tests). Don’t drag out this step.
  • Finish most onsite interviews in mid- to late October: Offers commonly expire in 2 weeks (some companies are happy to extend them for you, but hot startups have limited spots and no shortage of strong candidates). If you space onsite interviews too far apart, you won’t have competing offers for negotiation. It’s also a jarring experience waiting to hear from a place you wanna go while sitting on a soon-to-expire offer. To avoid that, the best strategy may be to study intensively for a month and finish all your interviews in two weeks.
  • Keep applying: Different companies open university recruiting at different times. In my year, Quora held an open house as early as July and Faire sent out interview invites as late as December. As noted earlier, new grad DS positions are hard to come by, so keep applying whenever you see an opening. That way, even if you fail some interviews, you may still have things in the pipeline.

Of course, it’s impossible to seize all opportunities. Some companies send out decisions in two days while others take a few weeks. Some companies have long interview cycles (famously Google and Two Sigma) while others short. But when scheduling interviews, do take a moment to think about how things would play out (I didn’t😬).

What makes you stand out

Two candidates could’ve gone through exactly the same preparation yet come away with different outcomes. If I knew why, I’d switch careers to being a fortune-teller. That said, I do have some hunches about what makes a candidate stand out, based on my own actual/mock interview experience and talking to people.

  1. Knowledge and honesty: For things like stats, probability, SQL, ML, and algorithms, either you know it or you don’t — it’s impossible to fake expertise. It’s OK to admit you don’t know something (which I did on multiple occasions), but it’s frowned upon if you use buzzwords to cover what you don’t understand. For instance, if you propose using deep learning to solve a business problem (e.g., friend recommendation), can you justify why it’s better than regression- or tree-based methods? What architecture and loss function should you choose?… People may tell you differently, but I just think that one cannot be a good data science candidate if they cannot be a competent data scientist.

  2. Logic and clarity: Do you have that friend who just won’t get to the point, jumping from one story to another, forgetting how it all started? Don’t be that friend in interviews. Your interviewer is a busy data scientist taking an hour out of their day to interview you and dying to go back to work. Make it easy.

    This comic probably contains all the ‘interview skills’ you need.

    Logical thinking in interviews is the same as logical thinking in everyday life. As a guitar player, I want to illustrate this point using a common headache I face: “My guitar is buzzing. Why the heck? How can I make it stop?"

    Diagram of the simplest system (‘just the guitar and the amp’).

    1. Clarify the problem: What’s the frequency of the buzz? 60Hz or 120Hz?

      You don’t ask clarification questions for the sake of asking clarification questions. You ask because you wanna figure out which beast you’re dealing with. You can only ask useful questions if you know the domain. In this case, I ask the frequency question because I know a 120Hz buzz is usually caused by grounding issues and a 60Hz buzz by shielding or cable problems.

    2. Confirm information received: Okay, the buzz is 60Hz. What now? — “Thanks for the information! Given the buzz is 60Hz, we can rule out grounding issues and focus on cable problems."

      Interviews are kinda like those “Choose Your Own Adventure” games — the problem space branches into different possibilities depending on the information you receive. This is another reason why simply memorizing sample interview answers won’t do, and why companies feel safe reusing questions.

    3. Lay out areas to investigate: You want to thoroughly consider the problem for a few minutes (preferably silently) and then give the interviewer a “TL;DR” — “The problem may come from 4 places: Two instrumental cables (guitar $\rightarrow$ amp, amp $\rightarrow$ speaker), the speaker, the amp, or the guitar itself. Are there areas I’m missing, or do you want me to investigate each?"

      I learned this trick from Emma’s videos. Giving a high-level summary at the beginning of each “chapter” in an interview not only makes your answer easy to follow but also allows the interviewer to correct your oversights and lead you towards a promising direction. Ex-Quora DS William Chen called this technique “signposting” in his Quora answer (one of the best interview tips I read, along with this DoorDash blog).

    4. Propose tests and fixes: Finally, you want to walk through each hypothesis, in a descending order of priority — “First, let’s check the cables, because they’re the most common culprit and the easiest to fix.” “We can first check if the cables are loose. If the buzz persists after we push them all the way in, then we can try new cables." “If we still hear the buzz, we can check if it’s the amp, the speaker, or the guitar. We can try, for instance, a guitar that works elsewhere and see if the buzz disappears. If so, the guitar is the root cause. If not, the guitar may or may not be broken — leave it on and change each of the other components with one that for sure works till the buzz goes away."

      Every hypothesis should be backed up by some distinct patterns in your data (“If $X$ is true, then we’d see $Y$"). Refrain from throwing out random guesses you cannot tie back to (hypothetical) data. Don’t propose a “Carl Sagan’s dragon” (an invisible, immaterial dragon🐲 that breathes heatless and also invisible fire…) that you can never empirically verify.

    Try this for a problem that you often run into in real life (e.g., some say skincare is a multi-variate causal inference problem that requires this sort of methodical reasoning😳). If you can solve it logically, you won’t have communication problems in interviews; if you can’t, practice until it becomes a habit.

  3. Likeability and authenticity: Think about people you enjoy working with. What do they have in common? I find it pleasant to work with those who listen closely when I speak and respond thoughtfully and respectfully. Most people I like are polite, creative, chill, and optimistic. It makes me nervous to work with someone who gets defensive upon disagreement or constructive feedback, doesn’t let me finish, or complains in the face of challenges. Then, answer honestly: Would you like to work with yourself? If the answer is “meh”, then work on yourself before working on interviews — be the person you wanna work with.

    I have this unscientific theory that emotional intelligence = problem solving. In the last 4 years, I’ve taught hundreds of Berkeley undergrads and noticed that the smartest students are also the most thoughtful. This may be no coincidence: It’s more computationally expensive to simulate how your actions make people feel than to be careless. I guess the good news is, just as practice helps you solve technical problems, it also helps you become more considerate.

    As for authenticity, whether it makes you a better candidate or not, I don’t know, but it’s less mentally taxing to be who you are. In the “Q&A” part of an onsite interview, I asked the interviewer for tips since I’d heard the hiring manager in the next round was famously sharp and scarily smart. It turned out the interviewer had been nervous for the same reason when they were a new grad. I might have wasted a chance to “show my company research”, but that was honestly what I wanted to ask, and the interviewer’s answer made me much more relaxed.

Have Fun

Fun is all you can control

Before my very first onsite, I confided in a friend, “I’ve wanted to be a data scientist for two years now and it may just come true. It feels so heavy." I didn’t sleep at all that night and didn’t get the job later (not causally related…).

Because of the validation I got from my internship host and knowing what I’d learned over the years, I didn’t doubt my ability to do data science. But the rejection made me worried about my arrangement with my advisor (regardless of the outcome, I had to focus on research in November) and about whether I could find a job in time.

Then I watched a Hilary Hahn documentary where she talked about stage fright (spoiler: the violin goddess has never had stage fright):

If you’re afraid you’re not gonna do well enough, you’ll just replicate what you think people want you to do. Replicating is never the way to convince people you know what you’re doing. You just have to be very well prepared, as prepared as you can be, but you’re never completely ready — you just do your best. I think the best thing is just to look forward to it, expecting it to be a lot of fun and realizing the audience wants to have fun — they wanna enjoy the performance. If you have a satisfying experience being on stage, then I think that really helps with the unity of the performance.

Her words shifted my perspective. If I couldn’t get a job in November, so what? I’d make time for job search while doing my dissertation. The only thing I could control was to give my interviewers thoughtful answers, and hopefully, an “entertaining” performance. I got offers after all the later interviews.

No fun without friends

In the end, I want to thank the many DS friends who made this journey much more fun than it could’ve been. Yishu — thank you for saying you already thought of me as a data scientist long before I even landed an internship, and for your friendship in good times and bad. Emma — I remember all your advice from when I was interviewing and absolutely nothing from our chats afterwards, but musing about data science with you was a great enjoyment. My mock interview buddies — I found interview prep hard but never dull because of you, and may our friendships continue as our careers grow.


Fun fact: During my pandemic job search, I’d often take a walk after dinner and listen to The River by Bruce Springsteen. The lyrics couldn’t be more fitting:

I got a job working construction
For the Johnstown Company
But lately there ain’t been much work
On account of the economy
Now all them things that seemed so important
Well, mister, they vanished right into the air

Is a dream a lie if it don’t come true
Or is it something worse
That sends me down to the river
Though I know the river is dry

— Written during the early 1980s recession