Commit

Updated the lab
NovaVolunteer committed Oct 19, 2023
1 parent 6600aae commit 7505c01
Showing 2 changed files with 19 additions and 27 deletions.
46 changes: 19 additions & 27 deletions 08_DT_Class/DT_Class_Lab.ipynb
@@ -8,14 +8,13 @@
 "\n",
 "Your boss is the head of the studio and wants to know if they can gain a competitive advantage by predicting new movies that might get high imdb scores (movie rating). \n",
 "\n",
-"You would like to be able to explain the model to the mere mortals but need a fairly robust and flexible approach so you've chosen to use decision trees to get started. \n",
+"You would like to be able to explain the model to mere mortals but need a fairly robust and flexible approach so you've chosen to use decision trees to get started. \n",
 "\n",
 "In doing so, similar to great data scientists of the past you remembered the excellent education provided to you at UVA in a undergrad data science course and have outline 20ish steps that will need to be undertaken to complete this task. As always, you will need to make sure to #comment your work heavily. \n",
 "\n",
 " Footnotes: \n",
 "-\tYou can add or combine steps if needed\n",
-"-\tAlso, remember to try several methods during evaluation and always be \n",
-"mindful of how the model will be used in practice.\n",
+"-\tAlso, remember to try several methods during evaluation and always be mindful of how the model will be used in practice.\n",
 "- Make sure all your variables are the correct type (factor, character,numeric, etc.)"
 ]
 },
@@ -57,126 +56,119 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#4 Check that categorical variables don't need to be combined or collapsed."
+"#4 Guess what, you don't need to scale the data, because DTs don't require this to be done, they make local greedy decisions...keeps getting easier, go to the next step."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#5 Guess what, you don't need to scale the data, because DTs don't require this to be done, they make local greedy decisions...keeps getting easier, go to the next step."
+"#5 Determine the baserate or prevalence for the classifier, what does this number mean?"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#6 Determine the baserate or prevalence for the classifier, what does this number mean?"
+"#6 Split your data into test, tune, and train. (80/10/10)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#7 Split your data into test, tune, and train. (80/10/10)"
+"#7 Create the kfold object for cross validation."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#8 Create the kfold object for cross validation."
+"#8 Create the scoring metric you will use to evaluate your model and the max depth hyperparameter "
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#9 Create the scoring metric you will use to evaluate your model and the max depth hyperparameter "
+"#9 Build the classifier object "
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#10 Build the classifier object "
+"#10 Use the kfold object and the scoring metric to find the best hyperparameter value for max depth via the grid search method."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#11 Use the kfold object and the scoring metric to find the best hyperparameter value for max depth via the grid search method."
+"#11 Fit the model to the training data."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#12 Fit the model to the training data."
+"#12 What is the best depth value?"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#13 What is the best depth value?"
+"#13 Print out the model"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#14 Print out the model"
+"#14 View the results, comment on how the model performed "
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#15 View the results, comment on how the model performed "
+"#15 Which variables appear to be contributing the most (variable importance) "
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#16 Which variables appear to be contributing the most (variable importance) "
+"#16 Use the predict method on the test data and print out the results."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#17 Use the predict method on the test data and print out the results."
+"#17 How does the model perform on the test data?"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#18 How does the model perform on the test data?"
+"#18 Print out the confusion matrix for the test data, what does it tell you about the model?"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#19 Print out the confusion matrix for the test data, what does it tell you about the model?"
+"#19 What are the top 3 movies based on the test set? Which variables are most important in predicting the top 3 movies?"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#20 What are the top 3 movies based on the test set? Which variables are most important in predicting the top 3 movies?"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"#21 Summarize what you learned along the way and make recommendations on how this could be used moving forward, being careful not to over promise."
+"#20 Summarize what you learned along the way and make recommendations on how this could be used moving forward, being careful not to over promise."
 ]
 }
 ],
Binary file modified 08_DT_Class/Decision_Trees_3.24.22.pptx
Binary file not shown.
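
For readers following the lab text in the notebook diff, here is a minimal sketch of two of its steps: determining the base rate (prevalence) and splitting the data 80/10/10 into train, tune, and test. It uses only the Python standard library. The toy scores, the 7.0 cutoff for a "high" IMDb score, and the helper names `base_rate` and `split_80_10_10` are assumptions made for illustration; none of them come from the lab or its dataset.

```python
import random


def base_rate(labels):
    """Return the share of positive (1) labels, i.e. the prevalence."""
    return sum(labels) / len(labels)


def split_80_10_10(rows, seed=0):
    """Shuffle a copy of rows and return (train, tune, test) at 80/10/10."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = len(shuffled) * 8 // 10      # integer arithmetic avoids float rounding
    n_tune = len(shuffled) // 10
    train = shuffled[:n_train]
    tune = shuffled[n_train:n_train + n_tune]
    test = shuffled[n_train + n_tune:]     # whatever remains goes to test
    return train, tune, test


# Hypothetical IMDb scores, invented for this sketch.
scores = [5.9, 7.4, 6.1, 8.0, 6.8, 7.1, 5.2, 6.5, 7.9, 6.0,
          6.3, 7.2, 4.8, 6.9, 8.3, 5.5, 6.7, 7.0, 6.2, 5.8]

CUTOFF = 7.0  # assumed threshold for a "high" score
labels = [1 if s >= CUTOFF else 0 for s in scores]

rate = base_rate(labels)
train, tune, test = split_80_10_10(list(zip(scores, labels)))

print(f"base rate: {rate:.2f}")
print(f"train/tune/test sizes: {len(train)}/{len(tune)}/{len(test)}")
```

With these toy values the base rate is 0.35, so a model that always predicts "not high" would already be right 65% of the time. A decision tree is only useful if it beats that figure on the held-out data.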

0 comments on commit 7505c01
