How to Ace a Data Science Interview

As I mentioned in my first post, I have just finished an extensive tech job search, which featured eight on-sites, along with countless phone screens and informal chats. I was interviewing for a combination of data science and software engineering (machine learning) positions, and I got a pretty good sense of what those interviews are like. In this post, I give an overview of what you should expect in a data science interview, and some suggestions for how to prepare.

An interview is not a pop quiz. You should know what to expect going in, and you can take the time to prepare for it. During the interview phase of the process, your recruiter is on your side and can usually tell you what types of interviews you’ll have. Even if the recruiter is reluctant to share that, common practices in the industry are a good guide to what you’re likely to see.

In this post, I’ll go over the types of data science interviews I’ve encountered, and offer my advice on how to prepare for them. Data science roles generally fall into two broad areas of focus: statistics and machine learning. I only applied to the latter category, so that’s the type of position discussed in this post. My experience is also limited to tech companies, so I can’t offer guidance for data science in finance, biotech, etc.

Here are the types of interviews (or parts of interviews) I’ve come across.

Always:

Coding (usually whiteboard)
Applied machine learning
Your background

Often:

Culture fit
Machine learning theory
Dataset analysis
Stats

You will encounter a similar set of interviews for a machine learning software engineering position, though more of the questions will fall in the coding category.

Coding (usually whiteboard)

This is the same type of interview you’d have for any software engineering position, though the expectations may be less stringent. There are lots of websites and books that will tell you how to prepare. Practice your coding skills if they’re rusty. Don’t forget to practice coding away from the computer (e.g. on paper), a skill that has almost certainly gone rusty. Review the data structures you may never have used outside of school: binary search trees, linked lists, heaps. Be comfortable with recursion. Know how to reason about algorithm running times. You can generally use any “real” language you want in an interview (Matlab doesn’t count, unfortunately); Python’s succinct syntax makes it a great language for coding interviews.
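To make the data structures and recursion advice concrete, here is a minimal Python sketch of the kind of code a whiteboard question might ask for: a binary search tree with recursive insertion and an in-order traversal. The `Node`, `insert` and `in_order` names are my own illustrative choices, not taken from any specific interview.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A binary search tree node: smaller keys go left, larger keys go right."""
    key: int
    left: Optional[Node] = None
    right: Optional[Node] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Recursively insert key and return the (possibly new) root; O(h) for tree height h."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    # Equal keys fall through unchanged, so duplicates are ignored.
    return root


def in_order(root: Optional[Node], out: Optional[list[int]] = None) -> list[int]:
    """Collect keys in sorted order: left subtree, then node, then right subtree."""
    if out is None:
        out = []
    if root is not None:
        in_order(root.left, out)
        out.append(root.key)
        in_order(root.right, out)
    return out


root = None
for k in [5, 2, 8, 1, 9, 2]:
    root = insert(root, k)
print(in_order(root))  # [1, 2, 5, 8, 9]
```

Being able to say that insertion costs O(h), which degrades to O(n) when the keys arrive already sorted (the tree turns into a linked list), is the kind of running-time reasoning interviewers tend to probe.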

Prep tips:

If you get nervous in interviews, try doing some practice problems under time pressure.
If you don’t have much software engineering experience, see if you can get a friend to look over your practice code and provide feedback.

During the interview:

Make sure you understand exactly what problem you’re trying to solve. Ask the interviewer questions if anything is unclear or underspecified.
Make sure you explain your plan to the interviewer before you start writing any code, so that they can help you avoid spending time going down less-than-ideal paths.
If you can’t think of a good way to do something, it often helps to start by talking through a dumb way to do it.
Mention what invalid inputs you’d want to check for (e.g. checking an input variable’s type). Don’t bother writing the code to do so unless the interviewer asks. In all my interviews, nobody has ever asked.
Before declaring that your code is finished, think about variable initialization, end conditions, and boundary cases (e.g. empty inputs). If it seems helpful, run through an example. You’ll score points by catching your bugs yourself, rather than having the interviewer point them out.
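As an illustration of that last point, here is a Python sketch of binary search, a classic whiteboard problem where the end condition and empty inputs are easy to get wrong. The assertions at the end are the kind of example walk-through you might do out loud before saying you’re done; the function name is my own choice.

```python
from __future__ import annotations


def binary_search(items: list[int], target: int) -> int:
    """Return an index of target in the sorted list items, or -1 if it is absent."""
    lo, hi = 0, len(items) - 1  # an empty list gives hi = -1, so the loop never runs
    while lo <= hi:  # '<=' so that a one-element range is still checked
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1  # target can only be to the right of mid
        else:
            hi = mid - 1  # target can only be to the left of mid
    return -1


# Boundary cases: empty input, single element, first/last element, missing value.
assert binary_search([], 3) == -1
assert binary_search([3], 3) == 0
assert binary_search([1, 3, 5, 7], 1) == 0
assert binary_search([1, 3, 5, 7], 7) == 3
assert binary_search([1, 3, 5, 7], 4) == -1
```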

Applied machine learning

All the applied machine learning interviews I’ve had focused on supervised learning. The interviewer will present you with a prediction problem, and ask you to explain how you would set up an algorithm to make that prediction. The problem selected is often relevant to the company you’re interviewing at (e.g. figuring out which product to recommend to a user, which users are going to stop using the site, which ad to display, etc.), but can also be a toy example (e.g. recommending board games to a friend). This type of interview doesn’t depend on much background knowledge, other than having a general understanding of machine learning concepts (see below). However, it definitely helps to prepare by brainstorming the types of problems a particular company might ask you to solve. Even if you miss the mark, the brainstorming session will help with the culture fit interview (also see below).

When answering this type of question, I’ve found it helpful to start by laying out the setup of the problem. What are the inputs? What are the labels you’re trying to predict? What machine learning algorithms could you run on the data? Sometimes the setup will be obvious from the question, but sometimes you’ll need to figure out how to define the problem. In the latter case, you’ll generally have a discussion with the interviewer about some plausible definitions (e.g., what does it mean for a user to “stop using the site”?).
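When the label has to be defined, it helps to state the definition precisely. Below is a minimal Python sketch of one hypothetical definition of “stopped using the site”: a user who was active on or before a cutoff date, but not at all in the following 30 days. The cutoff, the 30-day window and the `churn_labels` function are assumptions for illustration; in a real interview, the window is exactly the kind of thing to discuss with the interviewer.

```python
from __future__ import annotations

from datetime import date, timedelta


def churn_labels(activity: dict[str, list[date]], cutoff: date,
                 window_days: int = 30) -> dict[str, int]:
    """Label 1 = 'stopped using the site', 0 = still active (hypothetical definition).

    Only users with some activity on or before `cutoff` get a label; a user
    churned if they have no activity in the `window_days` days after the cutoff.
    """
    window_end = cutoff + timedelta(days=window_days)
    labels = {}
    for user, days in activity.items():
        if not any(d <= cutoff for d in days):
            continue  # not yet a user at prediction time, so excluded
        came_back = any(cutoff < d <= window_end for d in days)
        labels[user] = 0 if came_back else 1
    return labels


activity = {
    "ana": [date(2014, 8, 1), date(2014, 9, 10)],  # returns within the window
    "ben": [date(2014, 7, 15)],                     # never returns
    "cy": [date(2014, 9, 5)],                       # first seen after the cutoff
}
print(churn_labels(activity, cutoff=date(2014, 9, 1)))  # {'ana': 0, 'ben': 1}
```

One design choice worth mentioning out loud: features for these users should be computed only from activity on or before the cutoff, so the model never sees information from the period it is trying to predict.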

The main component of your answer will be feature engineering. There is nothing magical about brainstorming features. Think about what might be predictive of the variable you are trying to predict, and what information you would actually have available. I’ve found it helpful to give context around what I’m trying to capture, and to what extent the features I’m proposing reflect that information.

For the sake of concreteness, here’s an example. Suppose Amazon is trying to figure out what books to recommend to you. (Note: I did not interview at Amazon, and have no idea what they actually ask in their interviews.) To predict what books you’re likely to buy, Amazon can look for books that are similar to your past Amazon purchases. But maybe some purchases were mistakes, and you vowed to never buy a book like that again. Well, Amazon knows how you’ve interacted with your Kindle books. If there’s a book you started but never finished, it might be a positive signal for general areas you’re interested in, but a negative signal for the particular author. Or maybe some categories of books deserve different treatment. For example, if a year ago you were buying books targeted at one-year-olds, Amazon could deduce that nowadays you’re looking for books for two-year-olds. It’s easy to see how you can spend a while exploring the space between what you’d like to know and what you can actually find out.
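The Kindle reasoning above can be turned into features fairly directly. This Python sketch assumes a hypothetical reading history in which each record has an `author`, a `category` and a `finished` flag; the feature names and the `completion_rate` and `features_for` helpers are mine, not Amazon’s.

```python
from __future__ import annotations


def completion_rate(reads: list[dict], key: str, value: str) -> float | None:
    """Fraction of started books with reads[i][key] == value that were finished."""
    matching = [r for r in reads if r[key] == value]
    if not matching:
        return None  # no history: keep this distinct from a rate of 0
    return sum(r["finished"] for r in matching) / len(matching)


def features_for(book: dict, reads: list[dict]) -> dict:
    """Hypothetical features for a (user, candidate book) pair."""
    return {
        # Starting books in a category at all suggests interest in the general area...
        "category_books_started": sum(r["category"] == book["category"] for r in reads),
        # ...while abandoning an author's books may be a negative signal for that author.
        "author_completion_rate": completion_rate(reads, "author", book["author"]),
        "category_completion_rate": completion_rate(reads, "category", book["category"]),
    }


reads = [
    {"author": "A", "category": "sci-fi", "finished": True},
    {"author": "B", "category": "sci-fi", "finished": False},
    {"author": "B", "category": "sci-fi", "finished": False},
    {"author": "C", "category": "history", "finished": True},
]
print(features_for({"author": "B", "category": "sci-fi"}, reads))
```

Returning `None` when there is no history, rather than 0, keeps “never read this author” distinct from “always abandons this author”, which is the kind of distinction worth spelling out to the interviewer.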

Your background

You should be prepared to give a high-level summary of your career, as well as to do a deep-dive into a project you’ve worked on. The project doesn’t have to be directly related to the position you’re interviewing for (though it can’t hurt), but it needs to be the kind of work you can have an in-depth technical discussion about.

To prepare:

Review any papers/presentations that came out of your projects to refresh your mind on the technical details.
Practice explaining your project to a friend in order to make sure you are telling a coherent story. Keep in mind that you’ll probably be talking to someone who’s smart but doesn’t have expertise in your particular field.
Be prepared to answer questions as to why you chose the approach that you did, and about your individual contribution to the project.

Culture fit

Here are some culture fit questions your interviewers are likely to be interested in. These questions might come up as part of other interviews, and will likely be asked indirectly. It helps to keep what the interviewer is looking for in the back of your mind.

Are you specifically interested in the product/company/space you’d be working in? It helps to prepare by thinking about the problems the company is trying to solve, and how you and the team you’d be part of could make a difference.
Do you care about impact? Even in a research-oriented corporate environment, I wouldn’t recommend saying that you don’t care about company metrics, and that you’d love to just play with data and write papers.
Will you work well with other people? I know it’s a cliché, but most work is collaborative, and companies are trying to assess this as best they can. Avoid bad-mouthing former colleagues, and show appreciation for their contributions to your projects.
Are you willing to get your hands dirty? If there’s annoying work that needs to be done (e.g. cleaning up messy data), will you take care of it?
Are you someone the team will be happy to have around on a personal level? Even though you might be stressed, try to be friendly, positive, enthusiastic and genuine throughout the interview process.

You may also get broad questions about what kinds of work you enjoy and what motivates you. It’s useful to have an answer ready, but there may not be a “right” answer the interviewer is looking for.

Machine learning theory

This type of interview will test your understanding of basic machine learning concepts, generally with a focus on supervised learning. You should understand:

The general setup for a supervised learning system
Why you want to split data into training and test sets
The idea that models that aren’t powerful enough can’t capture the right generalizations about the data, and ways to address this (e.g. a different model, or a projection into a higher-dimensional space)
The idea that models that are too powerful suffer from overfitting, and ways to address this (e.g. regularization)
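The “projection into a higher-dimensional space” idea can be shown with the classic XOR example. In the Python sketch below, no straight line in the (x1, x2) plane separates the two classes, but after adding the product x1*x2 as a third feature, a linear rule gets all four points right. The weights are picked by hand for illustration rather than learned.

```python
# XOR labels: no line w1*x1 + w2*x2 + b = 0 separates the 1s from the 0s.
xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def expand(x1: int, x2: int) -> tuple:
    """Project (x1, x2) into 3-D by adding the interaction term x1 * x2."""
    return (x1, x2, x1 * x2)


# A hand-picked linear rule in the expanded space: z = x1 + x2 - 2*x1*x2 - 0.5.
w, b = (1.0, 1.0, -2.0), -0.5


def predict(x1: int, x2: int) -> int:
    """Linear classifier on the expanded features: 1 if z > 0, else 0."""
    z = sum(wi * fi for wi, fi in zip(w, expand(x1, x2))) + b
    return int(z > 0)


print([predict(*x) == y for x, y in xor])  # [True, True, True, True]
```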

You don’t need to know a lot of machine learning algorithms, but you definitely need to understand logistic regression, which seems to be what most companies are using. I also had some in-depth discussions of SVMs, but that may just be because I brought them up.
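Since logistic regression comes up so often, it is worth being able to write its training loop from memory. This is a minimal one-feature sketch in plain Python, assuming batch gradient descent on the mean log loss with an optional L2 penalty on the weight (the regularization mentioned above); the learning rate, step count and toy data are arbitrary choices for illustration.

```python
from __future__ import annotations

import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs: list[float], ys: list[int], l2: float = 0.0,
                 lr: float = 0.5, steps: int = 2000) -> tuple[float, float]:
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent.

    Minimizes mean log loss + (l2 / 2) * w**2; the bias b is not penalized.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean log loss: mean((p - y) * x) for w, mean(p - y) for b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n + l2 * w
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs, ys = [-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1]
w_plain, _ = fit_logistic(xs, ys)            # separable data: w keeps growing
w_reg, b_reg = fit_logistic(xs, ys, l2=1.0)  # the L2 penalty keeps w small
print(round(w_plain, 2), round(w_reg, 2))
```

On this perfectly separable toy data, the unregularized weight keeps growing the longer you train, while the L2 penalty settles it at a small value: a compact way to talk about overfitting and regularization in the same breath.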

Dataset analysis

In this type of interview, you will be given a data set, and asked to write a script to pull out features for some prediction task. You may be asked to then plug the features into a machine learning algorithm. This interview essentially adds an implementation component to the applied machine learning interview (see above). Of course, your features may now be inspired by what you see in the data. Do the distributions for each feature you’re considering differ between the labels you’re trying to predict?
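A first pass at the “do the distributions differ?” question can be as simple as comparing per-label means. This Python sketch uses only the standard library’s `csv` module; the CSV columns (`sessions_last_30d`, `support_tickets`, `churned`), their values and the `per_label_means` helper are made up for illustration.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical data; in an interview this would more likely be read from a file
# opened with open(path, newline="").
RAW = """user_id,sessions_last_30d,support_tickets,churned
1,12,0,0
2,1,2,1
3,9,1,0
4,0,3,1
5,15,0,0
6,2,1,1
"""


def per_label_means(text: str, label: str, features: list[str]) -> dict:
    """Mean of each numeric feature, grouped by the value of the label column."""
    groups = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(text)):
        for f in features:
            groups[row[label]][f].append(float(row[f]))
    return {lab: {f: mean(vals) for f, vals in feats.items()}
            for lab, feats in sorted(groups.items())}


print(per_label_means(RAW, "churned", ["sessions_last_30d", "support_tickets"]))
```

In this made-up sample, churned users have far fewer recent sessions and more support tickets, so both columns look worth feeding to a model; with real data you would also look at the full distributions, not just the means.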

I found these interviews hardest to prepare for, because the recruiter often wouldn’t tell me what format the data would be in, or what exactly I’d need to do with it. (For example, do I need to review Python’s csv module? Should I look over the syntax for training a model in scikit-learn?) I also had one recruiter tell me I’d be analyzing “big data”, which was a bit intimidating (am I going to be working with distributed databases or something?) until I discovered at the interview that the “big data” set had all of 11,000 examples. I encourage you to push for as much info as possible about what you’ll actually be doing.

If you plan to use Python, working through the scikit-learn tutorial is a good way to prepare.

Stats

I have a decent intuitive understanding of statistics, but very little formal knowledge. Most of the time, this sufficed, though I’m sure knowing more wouldn’t have hurt. You should understand how to set up an A/B test, including random sampling, confounding variables, summary statistics (e.g. mean), and measuring statistical significance.
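Here is a minimal Python sketch of the two ends of a simple A/B test: randomly assigning users to variants, and checking whether a difference in conversion rates is statistically significant with a two-sided, two-proportion z-test. The function names and the conversion numbers in the example are hypothetical; the sketch relies on `statistics.NormalDist`, available in Python 3.8 and later.

```python
from __future__ import annotations

import math
import random
from statistics import NormalDist


def assign_variants(user_ids, seed: int = 0) -> dict:
    """Randomly assign each user to 'A' (control) or 'B' (treatment).

    Random assignment spreads confounding variables (device, country, ...)
    evenly across both groups on average.
    """
    rng = random.Random(seed)
    return {user: rng.choice("AB") for user in user_ids}


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int,
                          n_b: int) -> tuple[float, float]:
    """Two-sided z-test of H0: conversion rate of A == conversion rate of B.

    Returns (z, p_value), using the pooled conversion rate under H0.
    """
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


groups = assign_variants(range(10))
z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

In this example, a lift from 10% to 12.5% conversion with 2,000 users per variant gives a z of about 2.5 and a p-value of about 0.012, below the conventional 0.05 threshold.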

Preparation Checklist & Resources

Here is a summary list of tips for preparing for data science interviews, along with a few helpful resources.

Coding (usually whiteboard)
- Get comfortable with basic algorithms, data structures and figuring out algorithm complexity.
- Practice writing code away from the computer in your programming language of choice.
- Resources:
  - Pretty exhaustive list of what you might encounter in an interview
  - Many interview prep books, e.g. Cracking the Coding Interview
Applied machine learning
- Think about the machine learning problems that are relevant for each company you’re interviewing at. Use these problems as practice questions.
Your background
- Think through how to summarize your experience.
- Prepare to give an in-depth technical explanation of a project you’ve worked on. Try it out on a friend.
Culture fit
- Think about the problems each company is trying to solve, and how you and the team you’d be part of could make a difference.
- Be prepared to answer broad questions about what kind of work you enjoy and what motivates you.
Machine learning theory
- Understand machine learning concepts on an intuitive level, focusing especially on supervised learning.
- Learn the math behind logistic regression.
- Resources:
  - The Shape of Data blog provides a nice intuitive overview.
  - A Few Useful Things to Know about Machine Learning
  - To really go in depth, check out Andrew Ng’s Stanford machine learning course on Coursera or OpenClassroom.
Dataset analysis
- Get comfortable with a set of technical tools for working with data.
- Resources:
  - If you plan to use Python, work through the scikit-learn tutorial (you could skip section 2.4).
Stats
- Get familiar with how to set up an A/B test.
- Resources:
  - Quora answer about how to prepare for interview questions about A/B testing
  - How not to run an A/B test
  - Sample size calculator, which you can use to get some intuition about sample sizes required based on the sensitivity (i.e. minimal detectable effect) and statistical significance you’re looking for
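To build the intuition the sample size calculator is meant to give, here is a Python sketch of the common normal-approximation formula for comparing two proportions. It returns the approximate number of users needed in each variant; online calculators may use slightly different approximations, so expect small differences. The function name, baseline rate and minimal detectable effect in the example are hypothetical.

```python
from __future__ import annotations

import math
from statistics import NormalDist


def sample_size_per_variant(baseline: float, mde: float, alpha: float = 0.05,
                            power: float = 0.8) -> int:
    """Approximate users needed in EACH variant to detect an absolute lift of `mde`.

    Normal-approximation formula for a two-sided test comparing two proportions.
    """
    p1, p2 = baseline, baseline + mde
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / mde ** 2)


print(sample_size_per_variant(baseline=0.10, mde=0.02))
```

With a 10% baseline, detecting an absolute lift of 2 percentage points at 5% significance and 80% power needs about 3,840 users per variant; halving the detectable effect roughly quadruples that.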

Are there things I missed? Other resources you’d recommend? Please comment!

20 thoughts on “How to Ace a Data Science Interview”

Sean says:

on October 3, 2014 at 11:38 am

Curious as to where you landed…

- alyaabbott says:
  
  on October 8, 2014 at 8:33 pm
  
  I’ll be joining the data science team at Elance-oDesk.
  
eyalkazin says:

on October 6, 2014 at 4:08 am

Excellent review Alya!
Thanks for sharing your insights.

alyaabbott says:

on November 3, 2014 at 8:27 pm

A data scientist friend suggested some additional things to keep in mind:

– I’d definitely be familiar with MapReduce concepts. It’s fine if interviewees haven’t used MapReduce themselves, but they should know what Mappers and Reducers are, and how they work. (even if companies rarely use raw MR now anyways)

– I agree supervised learning questions are more common, but I see basic unsupervised questions come up now and then. (So make sure to know k-means at least.)

– This is probably more common in “stats-y” data science interviews (as opposed to the “machine learning” data science interviews I think you’re focusing on), but for those types, I’d be prepared to talk about what metrics I’d choose to measure the success of a product.

crm says:

on January 13, 2015 at 8:37 pm

A comprehensive coverage of Machine Learning Interview questions at one place.
http://www.allaboutcomputerscience.com/2014/12/machine-learning-interviews-refresher.html

softinx says:

on September 25, 2015 at 2:07 am

Nice tips

riddhi says:

on September 25, 2015 at 2:45 am

Thanks for the article..

printable Calendar 2016 says:

on November 26, 2015 at 11:46 pm

This is my first visit here, and I am really happy to read everything in one place.

Hao says:

on December 12, 2015 at 4:39 pm

Thanks Alyaa, your tips are extremely helpful in my preparation process.

Neer Kumar says:

on March 18, 2016 at 6:49 pm

Thanks. This was so useful. I also recently went through many job interviews and finally managed to get an offer for a job as a data scientist! I am so happy about it and your blog definitely helped me a lot!
Many of your sections describe exactly the questions I got. In particular, I always got the applied machine learning and the A/B test questions. One slight difference I noticed is that companies have recently started skipping the phone interview and sending you a take-home test instead. If you pass that, you go to the on-site. I’d say 80% of companies used that approach. For that, it helped me a lot to prepare using the book “A Collection of Data Science Take-Home Challenges”, as well as practicing my R skills extensively.
Personally, I don’t like take-home tests, because they tend to be much easier for people with work experience and harder for people just out of grad school, but anyway.

- alyaabbott says:
  
  on March 19, 2016 at 8:37 pm
  
  Thank you! It’s great to hear that the post was helpful, and I’m sure other readers will appreciate the update.
  
Destiny says:

on May 3, 2016 at 2:27 pm

Hi Alya. Thank you very much for your post. I recently graduated with a data science degree, and I already feel less alone, because I am experiencing some of the issues you had to deal with during your job search.

tsien2015 says:

on August 22, 2016 at 9:02 pm

Thanks for this great post! It really helps a lot!

Beh3d says:

on August 7, 2017 at 6:58 pm

Thanks for the post. Very useful and productive!

ajayram198 says:

on November 8, 2019 at 3:08 am

Nice tips. Other favourite ML theory topics among interviewers are the curse of dimensionality, the bias-variance trade-off, anomaly detection, and the difference between correlation and causation.
