<?xml version="1.0"?>
<presentation>
<info company="Data Science" logo="ga-logo.png"/>
<title title= "Naive Search Methods and Less Naive Methods" date="October 21st, 2014" people="Jarret Petrillo, jarret.petrillo@gmail.com"/>
<section name="1,125,899,907,000,000"/>
<imageslide title="Compare CV across all subsets" image='allsubsets.png'/>
<imageslide title="Could we have found the best model quicker?" image='modelsubsets1.png' />
<imageslide title="Derivative models that start with good features do well" image='modelsubsets2.png'/>
<imageslide title="Derivative models that start with good features do well" image='modelsubsets3.png'/>
<codeslide title="Pseudo Code: Naive Model Search Algorithm">
<codeblock>
<l focus="1">1. Start with a list of features</l>
<l focus="1">2. Use itertools to enumerate every feature subset (2^n of them!)</l>
<l focus="1">3. For each subset fit a linear regression model</l>
<l focus="1">4. Calculate cross-validated MSE with a test set</l>
<l focus="1">5. Choose the model with the lowest mean squared error</l>
</codeblock>
</codeslide>
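<!--
A minimal Python sketch of the naive search in the slide above, offered as one possible reading rather than the presenter's actual code. Everything named here is an assumption made for illustration: the helpers fit_ols, cv_mse and all_subsets_search, the 5-fold split, and the synthetic data (4 features, of which only columns 0 and 2 drive y). The empty subset is skipped, so 2^n - 1 models are scored.

```python
import random
from itertools import combinations

def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in rows]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def cv_mse(X, y, cols, folds=5):
    """Cross-validated mean squared error using only the columns in cols."""
    n = len(y)
    sq_errors = []
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        beta = fit_ols([[X[i][c] for c in cols] for i in train],
                       [y[i] for i in train])
        for i in range(f, n, folds):  # rows held out for this fold
            pred = beta[0] + sum(b * X[i][c] for b, c in zip(beta[1:], cols))
            sq_errors.append((y[i] - pred) ** 2)
    return sum(sq_errors) / n

def all_subsets_search(X, y):
    """Score every non-empty feature subset; returns {subset: cv_mse}."""
    n_features = len(X[0])
    scores = {}
    for size in range(1, n_features + 1):
        for cols in combinations(range(n_features), size):
            scores[cols] = cv_mse(X, y, cols)
    return scores

# Synthetic data (an assumption, not from the talk): y depends on columns 0 and 2.
random.seed(0)
n_features = 4
X = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(60)]
y = [3 * r[0] - 2 * r[2] + random.gauss(0, 0.1) for r in X]

scores = all_subsets_search(X, y)
best = min(scores, key=scores.get)
print(len(scores), "models scored; best subset:", best)
```

The cost is the point of the slide: the number of models doubles with every added feature, so this only stays practical for small n.
-->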
<codeslide title="Pseudo Code: Iterative Search Algorithm">
<codeblock>
<l focus="1">1. Start with a list of features (n)</l>
<l focus="1">2. Run n simple linear regression models</l>
<l focus="1">3. Calculate cross-validated MSE for each model</l>
<l focus="1">4. Save the best feature</l>
<l indent="1">It will be in every subsequent model!</l>
<l focus="1">5. Consider only two-feature models that contain the best feature (n-1 models)</l>
<l focus="1">6. For each new model fit a linear regression model</l>
<l focus="1">7. Calculate cross-validated MSE</l>
<l focus="1">8. Save the best two features</l>
<l focus="1">9. Consider only three-feature models that contain the best two!</l>
<l focus="1">Repeat!</l>
<l focus="1">Stop when adding any remaining feature makes the MSE worse</l>
</codeblock>
</codeslide>
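<!--
The iterative search above is a greedy forward selection, and its control flow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the presenter's code: forward_select and toy_mse are hypothetical names, and toy_mse is a made-up stand-in for cross-validated MSE (features 0 and 2 lower the error, every extra feature adds a small penalty) so the example runs without data.

```python
def forward_select(n_features, score):
    """Greedy forward selection: add the single best feature each round and
    stop as soon as no remaining feature improves the score."""
    selected, best_score, evaluated = [], float("inf"), 0
    while len(selected) < n_features:
        candidates = [c for c in range(n_features) if c not in selected]
        # Score each model that extends the current feature set by one feature.
        trials = {c: score(tuple(selected + [c])) for c in candidates}
        evaluated += len(trials)
        winner = min(trials, key=trials.get)
        if trials[winner] >= best_score:
            break  # every added feature makes the score worse (or no better)
        selected.append(winner)
        best_score = trials[winner]
    return selected, best_score, evaluated

def toy_mse(cols):
    """Hypothetical stand-in for cross-validated MSE (not real data)."""
    return 10.0 - 4.0 * (0 in cols) - 3.0 * (2 in cols) + 0.5 * len(cols)

selected, best_score, evaluated = forward_select(4, toy_mse)
print(selected, best_score, evaluated)  # [0, 2] 4.0 9
```

With 4 features the greedy search scores 9 models where the exhaustive search scores 15; in the worst case it scores n(n+1)/2 models instead of 2^n. The trade-off is that it can miss a subset whose features only help in combination.
-->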
<imageslide title="Performs almost as well as naive method!" image='modelsearch.png'/>
<section name="Appendix"/>
<data type="line" title= "Example Data Slide">
<p>Description of data description of data description of data description of data description of data description of data description of data description of data</p>
<chart title="Revenue">1,3,4,7,8,11,8,4,3,5</chart>
</data>
<slide title= "Example Standard Slide">
<p>“Abstraction: the process of determining the important characteristics and ignoring other details”</p>
<p>“Plan to throw away the first version”</p>
<p>“There is surely nothing quite so useless as doing with great efficiency what should not be done at all”</p>
</slide>
</presentation>