Assessment

The following question set is adapted from the pre-interview assessment form used by [Algoritma](https://algorit.ma) when hiring teaching members of the team.

Fundamentals

Assume a simple linear regression (ordinary least squares) is trained on a dataset with one predictor. This model is likely to exhibit:
- A high bias and high variance
- A low bias and low variance
- A high bias but low variance
- A low bias but high variance
Which of the following is the most fitting definition of a p-value?
- Probability of obtaining a result or value more extreme than was observed
- Probability of a null hypothesis to evaluate to False
- Probability of the alternate hypothesis to be correct
- Probability of a variable being insignificant to the true parameters of a model
Which is the correct formula for calculating a model's sensitivity?
- True Positives / (True Positives + False Negatives)
- True Positives / (True Positives + False Positives)
- True Positives / Total Predictions
- True Negatives / Total Predictions

Refer to threemodels.png in the same directory if the image is not rendered for the following question.

You want a model that identifies hateful tweets on Twitter and you're presented with three candidates (Model A, Model B, and Model C). You are asked to pick the model with the highest precision. Which of the following models has the highest precision?
- Model A
- Model B
- Model C
We want to be confident that our model can perform reasonably in real-world environments and is not overfitted to the dataset it was trained on. Which strategy greatly diminishes the possibility of overfitting?
- Gradient optimization
- Grid Search
- Train-Test Splitting
One difference between a supervised learning task and an unsupervised learning task is the presence of a target variable. Which of the following best describes a target variable?
- A target variable is also an independent variable
- A target variable is an isolated variable taken in a separate data collection process
- A target variable is dependent on the independent variables

Practical Hands-On

Download analytics.csv, which is exported as-is from the company's Google Analytics dashboard. Values in the Language column are formatted to capture both the client (browser) language and the keyboard language, but for this exercise we're only interested in the former. A value of en-id should hence be stored as en, and a value of id-jp should similarly be stored as id. Fill missing values with missing. This should result in en, id, th and missing as the valid values in the Language column. Which language has, on average, the highest Pages / Session count?
- en or English
- id or Indonesian
- th or Thai
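
The cleaning step described above can be sketched without any third-party dependency. This is a minimal illustration only: the rows in `sample_csv` are made-up stand-ins (the real analytics.csv is not reproduced here), and the helper name `clean_language` is hypothetical. It assumes missing values appear as empty cells.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical stand-in rows; replace with open("analytics.csv") for the real file.
sample_csv = io.StringIO(
    "Language,Pages / Session\n"
    "en-id,2.0\n"
    "id-jp,3.0\n"
    ",1.0\n"
    "th,4.0\n"
    "en-us,6.0\n"
)


def clean_language(raw):
    """Keep the part before the first hyphen; label empty values as 'missing'."""
    raw = (raw or "").strip()
    return raw.split("-")[0] if raw else "missing"


# Group Pages / Session values by the cleaned language code.
pages_by_language = defaultdict(list)
for row in csv.DictReader(sample_csv):
    language = clean_language(row["Language"])
    pages_by_language[language].append(float(row["Pages / Session"]))

# Average Pages / Session per language.
mean_pages = {lang: mean(values) for lang, values in pages_by_language.items()}
print(mean_pages)
```

The same logic carries over to a data-frame library if you prefer one; the essential steps are splitting on the hyphen, filling the gaps, then averaging per group.
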
Using any tool of your choice, run a closed-form simple linear regression to predict Goal Conversion Rate (target) using the values of Pages / Session (predictor). Call this model_A. What is the multiple R-squared from your simple linear regression, model_A, rounded to 3 decimal places? You can retrieve this value through sklearn.metrics.r2_score or summary(model)$r.squared
- 0.786
- 0.826
- 0.866
Let beta0 be the intercept and beta1 be your slope. What is the value of beta0?
- -25.188
- 8.65
- 0
- 0.00268
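
For readers unsure what "closed-form" means in the two questions above, here is a minimal sketch of simple OLS computed directly from sums, returning the intercept, slope, and R-squared. The function name `fit_simple_ols` and the toy values are hypothetical and are not taken from analytics.csv, so the printed numbers say nothing about the quiz answers.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Closed-form OLS with one predictor: returns (beta0, beta1, r_squared)."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: covariance of x and y divided by the variance of x.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    beta0 = y_bar - beta1 * x_bar
    # R-squared: 1 minus residual sum of squares over total sum of squares.
    ss_res = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return beta0, beta1, 1 - ss_res / ss_tot


# Hypothetical toy data lying exactly on y = 1 + 2x.
pages_per_session = [1.0, 2.0, 3.0, 4.0]
goal_conversion_rate = [3.0, 5.0, 7.0, 9.0]

beta0, beta1, r2 = fit_simple_ols(pages_per_session, goal_conversion_rate)
print(beta0, beta1, r2)
```

On the real data, the values should agree with what sklearn or R's `lm` report, up to floating-point rounding.
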
Add Language as an additional predictor to the earlier linear regression model. Call this model_B. Did the multiple R-squared of your model improve as a result? Compare the adjusted R-squared of the two models, model_A and model_B.
- model_A has a higher multiple R2 and adjusted R2 value
- model_B has a higher multiple R2 and adjusted R2 value
- model_A has a higher multiple R2 but lower adjusted R2 value
- model_B has a higher multiple R2 but lower adjusted R2 value
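
As a reminder of how the two metrics in the question above relate, the sketch below applies the standard adjusted R-squared formula, 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The function name `adjusted_r2` and all numbers are hypothetical and chosen only to show the penalty for extra predictors. It also assumes Language is dummy-encoded, so a column with four levels adds three predictors.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: penalises R-squared for the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical values: the same R-squared with 1 predictor versus 4 predictors
# (one numeric column plus three dummy columns for a four-level category).
n_obs = 100
one_predictor = adjusted_r2(0.80, n_obs, n_predictors=1)
four_predictors = adjusted_r2(0.80, n_obs, n_predictors=4)
print(one_predictor, four_predictors)
```

With R-squared held equal, the model with more predictors gets the lower adjusted value, so an added predictor has to raise R-squared by enough to offset the penalty.
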
