Data Scientist Interview Questions – Explain what precision and recall are.
After a predictive model has been built, the most important question is: how good is it? Does it predict well?
Evaluating the model is one of the most important tasks in a data science project: it tells you how good your predictions are. For classification problems we very often look at two metrics called precision and recall. To define them precisely, let's first quickly introduce the confusion matrix.
The confusion matrix for binary classification is made of four simple counts:
- True Negative (TN): the case was actually negative and was predicted negative
- True Positive (TP): the case was actually positive and was predicted positive
- False Negative (FN): the case was actually positive but was predicted negative
- False Positive (FP): the case was actually negative but was predicted positive
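To make the four counts concrete, here is a minimal sketch that tallies them by hand for a small set of hypothetical labels (the label values are made up for illustration):

```python
# Hypothetical actual classes and model predictions for eight cases.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Count each cell of the confusion matrix by comparing pairs.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # predicted positive, actually positive
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # predicted negative, actually negative
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # predicted positive, actually negative
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # predicted negative, actually positive

print(tn, fp, fn, tp)  # 3 1 1 3
```

In practice you would let a library such as scikit-learn build this table for you, but counting the four cells yourself once makes the definitions stick.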
Once you understand the confusion matrix, calculating precision and recall is easy.
Precision – the ratio of correctly predicted positive observations to all predicted positive observations; in other words, what percent of the positive predictions were correct?
Precision = TP / (TP + FP)
Recall – also called sensitivity, the ratio of correctly predicted positive observations to all actual positive observations; in other words, what percent of the positive cases did you catch?
Recall = TP / (TP + FN)
Two more useful metrics are derived from the confusion matrix: Accuracy – the ratio of correctly predicted observations to the total observations – and the F1 score, the harmonic mean of precision and recall. Although it is not as intuitive as accuracy, the F1 score is usually more useful, especially if you have an uneven class distribution.
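The two derived metrics can be sketched directly from the four counts. The counts below are hypothetical, chosen only so the arithmetic is easy to follow:

```python
# Hypothetical confusion-matrix counts.
tp, tn, fp, fn = 3, 3, 1, 1

# Accuracy: correct predictions over all predictions.
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 6/8 = 0.75

# F1: harmonic mean of precision and recall.
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, f1)  # 0.75 0.75
```

Here accuracy and F1 happen to agree because the classes are balanced; with a heavily skewed class distribution the two can diverge sharply, which is exactly when F1 is the more informative number.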
Example Python Code to get Precision and Recall:
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support as score

# Load the iris dataset.
data = datasets.load_iris()
X = data['data']
y = data['target']

# Hold out 30% of the data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit a logistic regression classifier and predict on the test set.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
preds = model.predict(X_test)

# Per-class precision, recall, F-score and support.
precision, recall, fscore, support = score(y_test, preds)
print('precision:', precision)
print('recall:', recall)
Recommended reading list:
|Data Science Interviews Exposed
Data Science Interviews Exposed offers data science career advice and REAL interview questions to help you get the six-figure salary jobs! A data science job is extremely rewarding. It empowers you to make real impact in the world! Besides, it offers competitive salaries and develops your creative as well as quantitative skills. No wonder the data science job is rated as one of the sexiest jobs of the 21st century.
|The Data Science Handbook: Advice and Insights from 25 Amazing Data Scientists
The Data Science Handbook contains interviews with 25 of the world's best data scientists. We sat down with them and had in-depth conversations about their careers, personal stories, perspectives on data science, and life advice. In The Data Science Handbook, you will find war stories from DJ Patil, US Chief Data Officer and one of the founders of the field. You'll learn from industry veterans such as Kevin Novak and Riley Newman, who head the data science teams at Uber and Airbnb respectively.
|Getting a Big Data Job For Dummies
Hone your analytic talents and become part of the next big thing
Getting a Big Data Job For Dummies is the ultimate guide to landing a position in one of the fastest-growing fields in the modern economy. Learn exactly what "big data" means, why it's so important across all industries, and how you can obtain one of the most sought-after skill sets of the decade. This book walks you through the process of identifying your ideal big data job, shaping the perfect resume, and nailing the interview, all in one easy-to-read guide.
|A collection of Data Science Interview Questions Solved in Python and Spark: Hands-on Big Data and Machine Learning (A Collection of Programming Interview Questions) (Volume 6)
|Developing Analytic Talent: Becoming a Data Scientist
Harvard Business Review calls it the sexiest tech job of the 21st century. Data scientists are in demand, and this unique book shows you exactly what employers want and the skill set that separates the quality data scientist from other talented IT professionals. Data science involves extracting, creating, and processing data to turn it into business value.
|Practical Statistics for Data Scientists: 50 Essential Concepts
Statistical methods are a key part of data science, yet very few data scientists have any formal statistics training. Courses and books on basic statistics rarely cover the topic from a data science perspective. This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not.
|Python Data Science Handbook: Essential Tools for Working with Data
For many researchers, Python is a first-class tool mainly because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all—IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools.
|Doing Data Science: Straight Talk from the Frontline
Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know.