diff --git a/Notebooks/02_data_wrangling.ipynb b/Notebooks/02_data_wrangling.ipynb index a52eb6c24..93632d82d 100644 --- a/Notebooks/02_data_wrangling.ipynb +++ b/Notebooks/02_data_wrangling.ipynb @@ -1,2813 +1,3222 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 2 Data wrangling" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2.1 Contents\n", - "* [2 Data wrangling](#2_Data_wrangling)\n", - " * [2.1 Contents](#2.1_Contents)\n", - " * [2.2 Introduction](#2.2_Introduction)\n", - " * [2.2.1 Recap Of Data Science Problem](#2.2.1_Recap_Of_Data_Science_Problem)\n", - " * [2.2.2 Introduction To Notebook](#2.2.2_Introduction_To_Notebook)\n", - " * [2.3 Imports](#2.3_Imports)\n", - " * [2.4 Objectives](#2.4_Objectives)\n", - " * [2.5 Load The Ski Resort Data](#2.5_Load_The_Ski_Resort_Data)\n", - " * [2.6 Explore The Data](#2.6_Explore_The_Data)\n", - " * [2.6.1 Find Your Resort Of Interest](#2.6.1_Find_Your_Resort_Of_Interest)\n", - " * [2.6.2 Number Of Missing Values By Column](#2.6.2_Number_Of_Missing_Values_By_Column)\n", - " * [2.6.3 Categorical Features](#2.6.3_Categorical_Features)\n", - " * [2.6.3.1 Unique Resort Names](#2.6.3.1_Unique_Resort_Names)\n", - " * [2.6.3.2 Region And State](#2.6.3.2_Region_And_State)\n", - " * [2.6.3.3 Number of distinct regions and states](#2.6.3.3_Number_of_distinct_regions_and_states)\n", - " * [2.6.3.4 Distribution Of Resorts By Region And State](#2.6.3.4_Distribution_Of_Resorts_By_Region_And_State)\n", - " * [2.6.3.5 Distribution Of Ticket Price By State](#2.6.3.5_Distribution_Of_Ticket_Price_By_State)\n", - " * [2.6.3.5.1 Average weekend and weekday price by state](#2.6.3.5.1_Average_weekend_and_weekday_price_by_state)\n", - " * [2.6.3.5.2 Distribution of weekday and weekend price by state](#2.6.3.5.2_Distribution_of_weekday_and_weekend_price_by_state)\n", - " * [2.6.4 Numeric Features](#2.6.4_Numeric_Features)\n", - " * [2.6.4.1 Numeric data 
summary](#2.6.4.1_Numeric_data_summary)\n", - " * [2.6.4.2 Distributions Of Feature Values](#2.6.4.2_Distributions_Of_Feature_Values)\n", - " * [2.6.4.2.1 SkiableTerrain_ac](#2.6.4.2.1_SkiableTerrain_ac)\n", - " * [2.6.4.2.2 Snow Making_ac](#2.6.4.2.2_Snow_Making_ac)\n", - " * [2.6.4.2.3 fastEight](#2.6.4.2.3_fastEight)\n", - " * [2.6.4.2.4 fastSixes and Trams](#2.6.4.2.4_fastSixes_and_Trams)\n", - " * [2.7 Derive State-wide Summary Statistics For Our Market Segment](#2.7_Derive_State-wide_Summary_Statistics_For_Our_Market_Segment)\n", - " * [2.8 Drop Rows With No Price Data](#2.8_Drop_Rows_With_No_Price_Data)\n", - " * [2.9 Review distributions](#2.9_Review_distributions)\n", - " * [2.10 Population data](#2.10_Population_data)\n", - " * [2.11 Target Feature](#2.11_Target_Feature)\n", - " * [2.11.1 Number Of Missing Values By Row - Resort](#2.11.1_Number_Of_Missing_Values_By_Row_-_Resort)\n", - " * [2.12 Save data](#2.12_Save_data)\n", - " * [2.13 Summary](#2.13_Summary)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2.2 Introduction" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This step focuses on collecting your data, organizing it, and making sure it's well defined. Paying attention to these tasks will pay off greatly later on. Some data cleaning can be done at this stage, but it's important not to be overzealous in your cleaning before you've explored the data to better understand it." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 2.2.1 Recap Of Data Science Problem" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The purpose of this data science project is to come up with a pricing model for ski resort tickets in our market segment. Big Mountain suspects it may not be maximizing its returns, relative to its position in the market. 
It also does not have a strong sense of what facilities matter most to visitors, particularly which ones they're most likely to pay more for. This project aims to build a predictive model for ticket price based on a number of facilities, or properties, boasted by resorts (*at the resorts).* \n", - "This model will be used to provide guidance for Big Mountain's pricing and future facility investment plans." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 2.2.2 Introduction To Notebook" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Notebooks grow organically as we explore our data. If you used paper notebooks, you could discover a mistake and cross out or revise some earlier work. Later work may give you a reason to revisit earlier work and explore it further. The great thing about Jupyter notebooks is that you can edit, add, and move cells around without needing to cross out figures or scrawl in the margin. However, this means you can lose track of your changes easily. If you worked in a regulated environment, the company may have a a policy of always dating entries and clearly crossing out any mistakes, with your initials and the date.\n", - "\n", - "**Best practice here is to commit your changes using a version control system such as Git.** Try to get into the habit of adding and committing your files to the Git repository you're working in after you save them. You're are working in a Git repository, right? If you make a significant change, save the notebook and commit it to Git. In fact, if you're about to make a significant change, it's a good idea to commit before as well. Then if the change is a mess, you've got the previous version to go back to.\n", - "\n", - "**Another best practice with notebooks is to try to keep them organized with helpful headings and comments.** Not only can a good structure, but associated headings help you keep track of what you've done and your current focus. 
Anyone reading your notebook will have a much easier time following the flow of work. Remember, that 'anyone' will most likely be you. Be kind to future you!\n", - "\n", - "In this notebook, note how we try to use well structured, helpful headings that frequently are self-explanatory, and we make a brief note after any results to highlight key takeaways. This is an immense help to anyone reading your notebook and it will greatly help you when you come to summarise your findings. **Top tip: jot down key findings in a final summary at the end of the notebook as they arise. You can tidy this up later.** This is a great way to ensure important results don't get lost in the middle of your notebooks." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this, and subsequent notebooks, there are coding tasks marked with `#Code task n#` with code to complete. The `___` will guide you to where you need to insert code." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2.3 Imports" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Placing your imports all together at the start of your notebook means you only need to consult one place to check your notebook's dependencies. By all means import something 'in situ' later on when you're experimenting, but if the imported dependency ends up being kept, you should subsequently move the import statement here with the rest." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 1#\n", - "#Import pandas, matplotlib.pyplot, and seaborn in the correct lines below\n", - "import ___ as pd\n", - "import ___ as plt\n", - "import ___ as sns\n", - "import os\n", - "\n", - "from library.sb_utils import save_file\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2.4 Objectives" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "There are some fundamental questions to resolve in this notebook before you move on.\n", - "\n", - "* Do you think you may have the data you need to tackle the desired question?\n", - " * Have you identified the required target value?\n", - " * Do you have potentially useful features?\n", - "* Do you have any fundamental issues with the data?" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2.5 Load The Ski Resort Data" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "# the supplied CSV data file is the raw_data directory\n", - "ski_data = pd.read_csv('../raw_data/ski_resort_data.csv')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Good first steps in auditing the data are the info method and displaying the first few records with head." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 2#\n", - "#Call the info method on ski_data to see a summary of the data\n", - "ski_data.___" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "`AdultWeekday` is the price of an adult weekday ticket. `AdultWeekend` is the price of an adult weekend ticket. The other columns are potential features." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This immediately raises the question of what quantity will you want to model? 
You know you want to model the ticket price, but you realise there are two kinds of ticket price!" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "#Code task 3#\n", - "#Call the head method on ski_data to print the first several rows of the data\n", - "ski_data.___" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The output above suggests you've made a good start getting the ski resort data organized. You have plausible column headings. You can already see you have a missing value in the `fastEight` column" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2.6 Explore The Data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 2.6.1 Find Your Resort Of Interest" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Your resort of interest is called Big Mountain Resort. Check it's in the data:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 4#\n", - "#Filter the ski_data dataframe to display just the row for our resort with the name 'Big Mountain Resort'\n", - "#Hint: you will find that the transpose of the row will give a nicer output. DataFrame's do have a\n", - "#transpose method, but you can access this conveniently with the `T` property.\n", - "ski_data[ski_data.Name == ___].___" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "It's good that your resort doesn't appear to have any missing values." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 2.6.2 Number Of Missing Values By Column" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Count the number of missing values in each column and sort them." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 5#\n", - "#Count (using `.sum()`) the number of missing values (`.isnull()`) in each column of \n", - "#ski_data as well as the percentages (using `.mean()` instead of `.sum()`).\n", - "#Order them (increasing or decreasing) using sort_values\n", - "#Call `pd.concat` to present these in a single table (DataFrame) with the helpful column names 'count' and '%'\n", - "missing = ___([ski_data.___.___, 100 * ski_data.___.___], axis=1)\n", - "missing.columns=[___, ___]\n", - "missing.___(by=___)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "`fastEight` has the most missing values, at just over 50%. Unfortunately, you see you're also missing quite a few of your desired target quantity, the ticket price, which is missing 15-16% of values. `AdultWeekday` is missing in a few more records than `AdultWeekend`. What overlap is there in these missing values? This is a question you'll want to investigate. You should also point out that `isnull()` is not the only indicator of missing data. Sometimes 'missingness' can be encoded, perhaps by a -1 or 999. Such values are typically chosen because they are \"obviously\" not genuine values. If you were capturing data on people's heights and weights but missing someone's height, you could certainly encode that as a 0 because no one has a height of zero (in any units). Yet such entries would not be revealed by `isnull()`. Here, you need a data dictionary and/or to spot such values as part of looking for outliers. Someone with a height of zero should definitely show up as an outlier!" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 2.6.3 Categorical Features" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "So far you've examined only the numeric features. Now you inspect categorical ones such as resort name and state. 
These are discrete entities. 'Alaska' is a name. Although names can be sorted alphabetically, it makes no sense to take the average of 'Alaska' and 'Arizona'. Similarly, 'Alaska' is before 'Arizona' only lexicographically; it is neither 'less than' nor 'greater than' 'Arizona'. As such, they tend to require different handling than strictly numeric quantities. Note, a feature _can_ be numeric but also categorical. For example, instead of giving the number of `fastEight` lifts, a feature might be `has_fastEights` and have the value 0 or 1 to denote absence or presence of such a lift. In such a case it would not make sense to take an average of this or perform other mathematical calculations on it. Although you digress a little to make a point, month numbers are also, strictly speaking, categorical features. Yes, when a month is represented by its number (1 for January, 2 for Februrary etc.) it provides a convenient way to graph trends over a year. And, arguably, there is some logical interpretation of the average of 1 and 3 (January and March) being 2 (February). However, clearly December of one years precedes January of the next and yet 12 as a number is not less than 1. The numeric quantities in the section above are truly numeric; they are the number of feet in the drop, or acres or years open or the amount of snowfall etc." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 6#\n", - "#Use ski_data's `select_dtypes` method to select columns of dtype 'object'\n", - "ski_data.___(___)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You saw earlier on that these three columns had no missing values. But are there any other issues with these columns? Sensible questions to ask here include:\n", - "\n", - "* Is `Name` (or at least a combination of Name/Region/State) unique?\n", - "* Is `Region` always the same as `state`?" 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 2.6.3.1 Unique Resort Names" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 7#\n", - "#Use pandas' Series method `value_counts` to find any duplicated resort names\n", - "ski_data['Name'].___.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You have a duplicated resort name: Crystal Mountain." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Q: 1** Is this resort duplicated if you take into account Region and/or state as well?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 8#\n", - "#Concatenate the string columns 'Name' and 'Region' and count the values again (as above)\n", - "(ski_data[___] + ', ' + ski_data[___]).___.head()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 9#\n", - "#Concatenate 'Name' and 'state' and count the values again (as above)\n", - "(ski_data[___] + ', ' + ski_data[___]).___.head()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "**NB** because you know `value_counts()` sorts descending, you can use the `head()` method and know the rest of the counts must be 1." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**A: 1** Your answer here" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
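<div>\n",
- "<table border=\"1\" class=\"dataframe\">\n",
- "  <thead>\n",
- "    <tr style=\"text-align: right;\">\n",
- "      <th></th>\n",
- "      <th>Name</th>\n",
- "      <th>Region</th>\n",
- "      <th>state</th>\n",
- "      <th>summit_elev</th>\n",
- "      <th>vertical_drop</th>\n",
- "      <th>base_elev</th>\n",
- "      <th>trams</th>\n",
- "      <th>fastEight</th>\n",
- "      <th>fastSixes</th>\n",
- "      <th>fastQuads</th>\n",
- "      <th>...</th>\n",
- "      <th>LongestRun_mi</th>\n",
- "      <th>SkiableTerrain_ac</th>\n",
- "      <th>Snow Making_ac</th>\n",
- "      <th>daysOpenLastYear</th>\n",
- "      <th>yearsOpen</th>\n",
- "      <th>averageSnowfall</th>\n",
- "      <th>AdultWeekday</th>\n",
- "      <th>AdultWeekend</th>\n",
- "      <th>projectedDaysOpen</th>\n",
- "      <th>NightSkiing_ac</th>\n",
- "    </tr>\n",
- "  </thead>\n",
- "  <tbody>\n",
- "    <tr>\n",
- "      <th>104</th>\n",
- "      <td>Crystal Mountain</td>\n",
- "      <td>Michigan</td>\n",
- "      <td>Michigan</td>\n",
- "      <td>1132</td>\n",
- "      <td>375</td>\n",
- "      <td>757</td>\n",
- "      <td>0</td>\n",
- "      <td>0.0</td>\n",
- "      <td>0</td>\n",
- "      <td>1</td>\n",
- "      <td>...</td>\n",
- "      <td>0.3</td>\n",
- "      <td>102.0</td>\n",
- "      <td>96.0</td>\n",
- "      <td>120.0</td>\n",
- "      <td>63.0</td>\n",
- "      <td>132.0</td>\n",
- "      <td>54.0</td>\n",
- "      <td>64.0</td>\n",
- "      <td>135.0</td>\n",
- "      <td>56.0</td>\n",
- "    </tr>\n",
- "    <tr>\n",
- "      <th>295</th>\n",
- "      <td>Crystal Mountain</td>\n",
- "      <td>Washington</td>\n",
- "      <td>Washington</td>\n",
- "      <td>7012</td>\n",
- "      <td>3100</td>\n",
- "      <td>4400</td>\n",
- "      <td>1</td>\n",
- "      <td>NaN</td>\n",
- "      <td>2</td>\n",
- "      <td>2</td>\n",
- "      <td>...</td>\n",
- "      <td>2.5</td>\n",
- "      <td>2600.0</td>\n",
- "      <td>10.0</td>\n",
- "      <td>NaN</td>\n",
- "      <td>57.0</td>\n",
- "      <td>486.0</td>\n",
- "      <td>99.0</td>\n",
- "      <td>99.0</td>\n",
- "      <td>NaN</td>\n",
- "      <td>NaN</td>\n",
- "    </tr>\n",
- "  </tbody>\n",
- "</table>\n",
- "<p>2 rows × 27 columns</p>\n",
- "</div>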
" + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "yW8MqdcgTNrU" + }, + "source": [ + "# 2 Data wrangling" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CRMs1zpCTNrV" + }, + "source": [ + "## 2.1 Contents\n", + "* [2 Data wrangling](#2_Data_wrangling)\n", + " * [2.1 Contents](#2.1_Contents)\n", + " * [2.2 Introduction](#2.2_Introduction)\n", + " * [2.2.1 Recap Of Data Science Problem](#2.2.1_Recap_Of_Data_Science_Problem)\n", + " * [2.2.2 Introduction To Notebook](#2.2.2_Introduction_To_Notebook)\n", + " * [2.3 Imports](#2.3_Imports)\n", + " * [2.4 Objectives](#2.4_Objectives)\n", + " * [2.5 Load The Ski Resort Data](#2.5_Load_The_Ski_Resort_Data)\n", + " * [2.6 Explore The Data](#2.6_Explore_The_Data)\n", + " * [2.6.1 Find Your Resort Of Interest](#2.6.1_Find_Your_Resort_Of_Interest)\n", + " * [2.6.2 Number Of Missing Values By Column](#2.6.2_Number_Of_Missing_Values_By_Column)\n", + " * [2.6.3 Categorical Features](#2.6.3_Categorical_Features)\n", + " * [2.6.3.1 Unique Resort Names](#2.6.3.1_Unique_Resort_Names)\n", + " * [2.6.3.2 Region And State](#2.6.3.2_Region_And_State)\n", + " * [2.6.3.3 Number of distinct regions and states](#2.6.3.3_Number_of_distinct_regions_and_states)\n", + " * [2.6.3.4 Distribution Of Resorts By Region And State](#2.6.3.4_Distribution_Of_Resorts_By_Region_And_State)\n", + " * [2.6.3.5 Distribution Of Ticket Price By State](#2.6.3.5_Distribution_Of_Ticket_Price_By_State)\n", + " * [2.6.3.5.1 Average weekend and weekday price by state](#2.6.3.5.1_Average_weekend_and_weekday_price_by_state)\n", + " * [2.6.3.5.2 Distribution of weekday and weekend price by state](#2.6.3.5.2_Distribution_of_weekday_and_weekend_price_by_state)\n", + " * [2.6.4 Numeric Features](#2.6.4_Numeric_Features)\n", + " * [2.6.4.1 Numeric data summary](#2.6.4.1_Numeric_data_summary)\n", + " * [2.6.4.2 Distributions Of Feature Values](#2.6.4.2_Distributions_Of_Feature_Values)\n", + " * [2.6.4.2.1 
SkiableTerrain_ac](#2.6.4.2.1_SkiableTerrain_ac)\n", + " * [2.6.4.2.2 Snow Making_ac](#2.6.4.2.2_Snow_Making_ac)\n", + " * [2.6.4.2.3 fastEight](#2.6.4.2.3_fastEight)\n", + " * [2.6.4.2.4 fastSixes and Trams](#2.6.4.2.4_fastSixes_and_Trams)\n", + " * [2.7 Derive State-wide Summary Statistics For Our Market Segment](#2.7_Derive_State-wide_Summary_Statistics_For_Our_Market_Segment)\n", + " * [2.8 Drop Rows With No Price Data](#2.8_Drop_Rows_With_No_Price_Data)\n", + " * [2.9 Review distributions](#2.9_Review_distributions)\n", + " * [2.10 Population data](#2.10_Population_data)\n", + " * [2.11 Target Feature](#2.11_Target_Feature)\n", + " * [2.11.1 Number Of Missing Values By Row - Resort](#2.11.1_Number_Of_Missing_Values_By_Row_-_Resort)\n", + " * [2.12 Save data](#2.12_Save_data)\n", + " * [2.13 Summary](#2.13_Summary)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OvD_0DroTNrW" + }, + "source": [ + "## 2.2 Introduction" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "F76Xr_mSTNrW" + }, + "source": [ + "This step focuses on collecting your data, organizing it, and making sure it's well defined. Paying attention to these tasks will pay off greatly later on. Some data cleaning can be done at this stage, but it's important not to be overzealous in your cleaning before you've explored the data to better understand it." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VNzIumXfTNrW" + }, + "source": [ + "### 2.2.1 Recap Of Data Science Problem" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TbVp-JNDTNrW" + }, + "source": [ + "The purpose of this data science project is to come up with a pricing model for ski resort tickets in our market segment. Big Mountain suspects it may not be maximizing its returns, relative to its position in the market. It also does not have a strong sense of what facilities matter most to visitors, particularly which ones they're most likely to pay more for. 
This project aims to build a predictive model for ticket price based on a number of facilities, or properties, boasted by resorts (*at the resorts).*\n", + "This model will be used to provide guidance for Big Mountain's pricing and future facility investment plans." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MFnOENRuTNrX" + }, + "source": [ + "### 2.2.2 Introduction To Notebook" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XWXuohx2TNrX" + }, + "source": [ + "Notebooks grow organically as we explore our data. If you used paper notebooks, you could discover a mistake and cross out or revise some earlier work. Later work may give you a reason to revisit earlier work and explore it further. The great thing about Jupyter notebooks is that you can edit, add, and move cells around without needing to cross out figures or scrawl in the margin. However, this means you can lose track of your changes easily. If you worked in a regulated environment, the company may have a a policy of always dating entries and clearly crossing out any mistakes, with your initials and the date.\n", + "\n", + "**Best practice here is to commit your changes using a version control system such as Git.** Try to get into the habit of adding and committing your files to the Git repository you're working in after you save them. You're are working in a Git repository, right? If you make a significant change, save the notebook and commit it to Git. In fact, if you're about to make a significant change, it's a good idea to commit before as well. Then if the change is a mess, you've got the previous version to go back to.\n", + "\n", + "**Another best practice with notebooks is to try to keep them organized with helpful headings and comments.** Not only can a good structure, but associated headings help you keep track of what you've done and your current focus. Anyone reading your notebook will have a much easier time following the flow of work. 
Remember, that 'anyone' will most likely be you. Be kind to future you!\n", + "\n", + "In this notebook, note how we try to use well structured, helpful headings that frequently are self-explanatory, and we make a brief note after any results to highlight key takeaways. This is an immense help to anyone reading your notebook and it will greatly help you when you come to summarise your findings. **Top tip: jot down key findings in a final summary at the end of the notebook as they arise. You can tidy this up later.** This is a great way to ensure important results don't get lost in the middle of your notebooks." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "t0HANEb0TNrX" + }, + "source": [ + "In this, and subsequent notebooks, there are coding tasks marked with `#Code task n#` with code to complete. The `___` will guide you to where you need to insert code." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zwBgG09yTNrX" + }, + "source": [ + "## 2.3 Imports" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6Fq2Zn5QTNrX" + }, + "source": [ + "Placing your imports all together at the start of your notebook means you only need to consult one place to check your notebook's dependencies. By all means import something 'in situ' later on when you're experimenting, but if the imported dependency ends up being kept, you should subsequently move the import statement here with the rest." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "wmP-dUztTNrX" + }, + "outputs": [], + "source": [ + "#Code task 1#\n", + "#Import pandas, matplotlib.pyplot, and seaborn in the correct lines below\n", + "import ___ as pd\n", + "import ___ as plt\n", + "import ___ as sns\n", + "import os\n", + "\n", + "from library.sb_utils import save_file\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UQGE01sCTNrY" + }, + "source": [ + "## 2.4 Objectives" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "my0jzNgITNrY" + }, + "source": [ + "There are some fundamental questions to resolve in this notebook before you move on.\n", + "\n", + "* Do you think you may have the data you need to tackle the desired question?\n", + " * Have you identified the required target value?\n", + " * Do you have potentially useful features?\n", + "* Do you have any fundamental issues with the data?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AJ_yDDTOTNrY" + }, + "source": [ + "## 2.5 Load The Ski Resort Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "xisHZeYQTNrY" + }, + "outputs": [], + "source": [ + "# the supplied CSV data file is the raw_data directory\n", + "ski_data = pd.read_csv('../raw_data/ski_resort_data.csv')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qsrYRwzgTNrZ" + }, + "source": [ + "Good first steps in auditing the data are the info method and displaying the first few records with head." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "FKJF5TsPTNrZ" + }, + "outputs": [], + "source": [ + "#Code task 2#\n", + "#Call the info method on ski_data to see a summary of the data\n", + "ski_data.___" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UdmiwzsfTNrZ" + }, + "source": [ + "`AdultWeekday` is the price of an adult weekday ticket. 
`AdultWeekend` is the price of an adult weekend ticket. The other columns are potential features." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1qYz_90dTNrZ" + }, + "source": [ + "This immediately raises the question of what quantity will you want to model? You know you want to model the ticket price, but you realise there are two kinds of ticket price!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "id": "dIV0jMHPTNrZ" + }, + "outputs": [], + "source": [ + "#Code task 3#\n", + "#Call the head method on ski_data to print the first several rows of the data\n", + "ski_data.___" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1jVulon8TNrZ" + }, + "source": [ + "The output above suggests you've made a good start getting the ski resort data organized. You have plausible column headings. You can already see you have a missing value in the `fastEight` column" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XvlkYub_TNrZ" + }, + "source": [ + "## 2.6 Explore The Data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YY9MZE8STNrZ" + }, + "source": [ + "### 2.6.1 Find Your Resort Of Interest" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9Oa0yq6wTNra" + }, + "source": [ + "Your resort of interest is called Big Mountain Resort. Check it's in the data:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "X3Mxy8QYTNra" + }, + "outputs": [], + "source": [ + "#Code task 4#\n", + "#Filter the ski_data dataframe to display just the row for our resort with the name 'Big Mountain Resort'\n", + "#Hint: you will find that the transpose of the row will give a nicer output. 
DataFrame's do have a\n", + "#transpose method, but you can access this conveniently with the `T` property.\n", + "ski_data[ski_data.Name == ___].___" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "W0BMZnjnTNra" + }, + "source": [ + "It's good that your resort doesn't appear to have any missing values." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lclEAmviTNra" + }, + "source": [ + "### 2.6.2 Number Of Missing Values By Column" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Z1RYZRtlTNra" + }, + "source": [ + "Count the number of missing values in each column and sort them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "V49AlHZoTNra" + }, + "outputs": [], + "source": [ + "#Code task 5#\n", + "#Count (using `.sum()`) the number of missing values (`.isnull()`) in each column of\n", + "#ski_data as well as the percentages (using `.mean()` instead of `.sum()`).\n", + "#Order them (increasing or decreasing) using sort_values\n", + "#Call `pd.concat` to present these in a single table (DataFrame) with the helpful column names 'count' and '%'\n", + "missing = ___([ski_data.___.___, 100 * ski_data.___.___], axis=1)\n", + "missing.columns=[___, ___]\n", + "missing.___(by=___)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Id9MvWO4TNra" + }, + "source": [ + "`fastEight` has the most missing values, at just over 50%. Unfortunately, you see you're also missing quite a few of your desired target quantity, the ticket price, which is missing 15-16% of values. `AdultWeekday` is missing in a few more records than `AdultWeekend`. What overlap is there in these missing values? This is a question you'll want to investigate. You should also point out that `isnull()` is not the only indicator of missing data. Sometimes 'missingness' can be encoded, perhaps by a -1 or 999. Such values are typically chosen because they are \"obviously\" not genuine values. 
If you were capturing data on people's heights and weights but missing someone's height, you could certainly encode that as a 0 because no one has a height of zero (in any units). Yet such entries would not be revealed by `isnull()`. Here, you need a data dictionary and/or to spot such values as part of looking for outliers. Someone with a height of zero should definitely show up as an outlier!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qzmZ5vf3TNra" + }, + "source": [ + "### 2.6.3 Categorical Features" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sWxiJWPkTNrb" + }, + "source": [ + "So far you've examined only the numeric features. Now you inspect categorical ones such as resort name and state. These are discrete entities. 'Alaska' is a name. Although names can be sorted alphabetically, it makes no sense to take the average of 'Alaska' and 'Arizona'. Similarly, 'Alaska' is before 'Arizona' only lexicographically; it is neither 'less than' nor 'greater than' 'Arizona'. As such, they tend to require different handling than strictly numeric quantities. Note, a feature _can_ be numeric but also categorical. For example, instead of giving the number of `fastEight` lifts, a feature might be `has_fastEights` and have the value 0 or 1 to denote absence or presence of such a lift. In such a case it would not make sense to take an average of this or perform other mathematical calculations on it. Although you digress a little to make a point, month numbers are also, strictly speaking, categorical features. Yes, when a month is represented by its number (1 for January, 2 for Februrary etc.) it provides a convenient way to graph trends over a year. And, arguably, there is some logical interpretation of the average of 1 and 3 (January and March) being 2 (February). However, clearly December of one years precedes January of the next and yet 12 as a number is not less than 1. 
The numeric quantities in the section above are truly numeric; they are the number of feet in the drop, or acres or years open or the amount of snowfall etc." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7H0myqwCTNrb" + }, + "outputs": [], + "source": [ + "#Code task 6#\n", + "#Use ski_data's `select_dtypes` method to select columns of dtype 'object'\n", + "ski_data.___(___)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VCXm808NTNrb" + }, + "source": [ + "You saw earlier on that these three columns had no missing values. But are there any other issues with these columns? Sensible questions to ask here include:\n", + "\n", + "* Is `Name` (or at least a combination of Name/Region/State) unique?\n", + "* Is `Region` always the same as `state`?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rMouUH1aTNrb" + }, + "source": [ + "#### 2.6.3.1 Unique Resort Names" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IrZxYtdjTNrb" + }, + "outputs": [], + "source": [ + "#Code task 7#\n", + "#Use pandas' Series method `value_counts` to find any duplicated resort names\n", + "ski_data['Name'].___.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZiobtpQ7TNrb" + }, + "source": [ + "You have a duplicated resort name: Crystal Mountain." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QFIRzh1UTNrb" + }, + "source": [ + "**Q: 1** Is this resort duplicated if you take into account Region and/or state as well?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IdcW8a8dTNrc" + }, + "outputs": [], + "source": [ + "#Code task 8#\n", + "#Concatenate the string columns 'Name' and 'Region' and count the values again (as above)\n", + "(ski_data[___] + ', ' + ski_data[___]).___.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "x8yHtL7rTNrc" + }, + "outputs": [], + "source": [ + "#Code task 9#\n", + "#Concatenate 'Name' and 'state' and count the values again (as above)\n", + "(ski_data[___] + ', ' + ski_data[___]).___.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2Aq40cySTNrc" + }, + "source": [ + "**NB** because you know `value_counts()` sorts descending, you can use the `head()` method: if the counts it shows are all 1, the rest of the counts must be 1 as well." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CmcugecHTNrc" + }, + "source": [ + "**A: 1** Your answer here" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "78boDJS2TNrc", + "outputId": "57b4c86f-354d-4afc-ae4b-0470031f44f3" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\"><th></th><th>Name</th><th>Region</th><th>state</th><th>summit_elev</th><th>vertical_drop</th><th>base_elev</th><th>trams</th><th>fastEight</th><th>fastSixes</th><th>fastQuads</th><th>...</th><th>LongestRun_mi</th><th>SkiableTerrain_ac</th><th>Snow Making_ac</th><th>daysOpenLastYear</th><th>yearsOpen</th><th>averageSnowfall</th><th>AdultWeekday</th><th>AdultWeekend</th><th>projectedDaysOpen</th><th>NightSkiing_ac</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>104</th><td>Crystal Mountain</td><td>Michigan</td><td>Michigan</td><td>1132</td><td>375</td><td>757</td><td>0</td><td>0.0</td><td>0</td><td>1</td><td>...</td><td>0.3</td><td>102.0</td><td>96.0</td><td>120.0</td><td>63.0</td><td>132.0</td><td>54.0</td><td>64.0</td><td>135.0</td><td>56.0</td></tr>\n",
+ "    <tr><th>295</th><td>Crystal Mountain</td><td>Washington</td><td>Washington</td><td>7012</td><td>3100</td><td>4400</td><td>1</td><td>NaN</td><td>2</td><td>2</td><td>...</td><td>2.5</td><td>2600.0</td><td>10.0</td><td>NaN</td><td>57.0</td><td>486.0</td><td>99.0</td><td>99.0</td><td>NaN</td><td>NaN</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "<p>2 rows × 27 columns</p>\n",
+ "</div>
" + ], + "text/plain": [ + " Name Region state summit_elev vertical_drop \\\n", + "104 Crystal Mountain Michigan Michigan 1132 375 \n", + "295 Crystal Mountain Washington Washington 7012 3100 \n", + "\n", + " base_elev trams fastEight fastSixes fastQuads ... LongestRun_mi \\\n", + "104 757 0 0.0 0 1 ... 0.3 \n", + "295 4400 1 NaN 2 2 ... 2.5 \n", + "\n", + " SkiableTerrain_ac Snow Making_ac daysOpenLastYear yearsOpen \\\n", + "104 102.0 96.0 120.0 63.0 \n", + "295 2600.0 10.0 NaN 57.0 \n", + "\n", + " averageSnowfall AdultWeekday AdultWeekend projectedDaysOpen \\\n", + "104 132.0 54.0 64.0 135.0 \n", + "295 486.0 99.0 99.0 NaN \n", + "\n", + " NightSkiing_ac \n", + "104 56.0 \n", + "295 NaN \n", + "\n", + "[2 rows x 27 columns]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ski_data[ski_data['Name'] == 'Crystal Mountain']" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nCqBbQsHTNrd" + }, + "source": [ + "So there are two Crystal Mountain resorts, but they are clearly two different resorts in two different states. This is a powerful signal that you have unique records on each row." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eHUrzGTmTNrd" + }, + "source": [ + "#### 2.6.3.2 Region And State" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rpzdEtnhTNrd" + }, + "source": [ + "What's the relationship between region and state?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "q7b1Np25TNrd" + }, + "source": [ + "You know they are the same in many cases (e.g. both the Region and the state are given as 'Michigan'). In how many cases do they differ?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9hUIrnQuTNre" + }, + "outputs": [], + "source": [ + "#Code task 10#\n", + "#Calculate the number of times Region does not equal state\n", + "(ski_data.Region ___ ski_data.state).___" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IrpzJI_5TNre" + }, + "source": [ + "You know what a state is. What is a region? You can tabulate the distinct values along with their respective frequencies using `value_counts()`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "N0dlaCC5TNre", + "outputId": "5f6d91af-abb4-4d64-c9aa-1df358fe0e12" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "New York 33\n", + "Michigan 29\n", + "Sierra Nevada 22\n", + "Colorado 22\n", + "Pennsylvania 19\n", + "Wisconsin 16\n", + "New Hampshire 16\n", + "Vermont 15\n", + "Minnesota 14\n", + "Montana 12\n", + "Idaho 12\n", + "Massachusetts 11\n", + "Washington 10\n", + "Maine 9\n", + "New Mexico 9\n", + "Wyoming 8\n", + "Utah 7\n", + "Oregon 6\n", + "Salt Lake City 6\n", + "North Carolina 6\n", + "Connecticut 5\n", + "Ohio 5\n", + "West Virginia 4\n", + "Virginia 4\n", + "Mt. Hood 4\n", + "Illinois 4\n", + "Alaska 3\n", + "Iowa 3\n", + "Missouri 2\n", + "Arizona 2\n", + "Indiana 2\n", + "South Dakota 2\n", + "New Jersey 2\n", + "Nevada 2\n", + "Rhode Island 1\n", + "Maryland 1\n", + "Tennessee 1\n", + "Northern California 1\n", + "Name: Region, dtype: int64" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ski_data['Region'].value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6MMZVyQQTNre" + }, + "source": [ + "A casual inspection by eye reveals some non-state names such as Sierra Nevada, Salt Lake City, and Northern California. Tabulate the differences between Region and state. 
On a note regarding scaling to larger data sets, you might wonder how you could spot such cases when presented with millions of rows. This is an interesting point. Imagine you have access to a database with a Region and state column in a table and there are millions of rows. You wouldn't eyeball all the rows looking for differences! Bear in mind that our first interest lies in establishing the answer to the question \"Are they always the same?\" One approach might be to ask the database to return records where they differ, but limit the output to 10 rows. If there were differences, you'd only get up to 10 results, and so you wouldn't know whether you'd located all differences, but you'd know that there were 'a nonzero number' of differences. If you got an empty result set back, then you would know that the two columns always had the same value. At the risk of digressing, some values in one column only might be NULL (missing) and different databases treat NULL differently, so be aware that on many an occasion a seemingly 'simple' question gets very interesting to answer very quickly!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "WXS6N7zYTNre" + }, + "outputs": [], + "source": [ + "#Code task 11#\n", + "#Filter the ski_data dataframe for rows where 'Region' and 'state' are different,\n", + "#group that by 'state' and perform `value_counts` on the 'Region'\n", + "(ski_data[ski_data.___ ___ ski_data.___]\n", + " .groupby(___)[___]\n", + " .value_counts())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9tNJQGgxTNre" + }, + "source": [ + "The vast majority of the differences are in California, with most Regions being called Sierra Nevada and just one referred to as Northern California." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wi6pbzZVTNre" + }, + "source": [ + "#### 2.6.3.3 Number of distinct regions and states" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ySPFicDkTNrf" + }, + "outputs": [], + "source": [ + "#Code task 12#\n", + "#Select the 'Region' and 'state' columns from ski_data and use the `nunique` method to calculate\n", + "#the number of unique values in each\n", + "ski_data[[___, ___]].___" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OZldLtq7TNrf" + }, + "source": [ + "Because a few states are split across multiple named regions, there are slightly more unique regions than states." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qLB88KkkTNrf" + }, + "source": [ + "#### 2.6.3.4 Distribution Of Resorts By Region And State" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LkOpFlVvTNrf" + }, + "source": [ + "If this is your first time using [matplotlib](https://matplotlib.org/3.2.2/index.html)'s [subplots](https://matplotlib.org/3.2.2/api/_as_gen/matplotlib.pyplot.subplots.html), you may find the online documentation useful." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9IqGsH_QTNrf" + }, + "outputs": [], + "source": [ + "#Code task 13#\n", + "#Create two subplots on 1 row and 2 columns with a figsize of (12, 8)\n", + "fig, ax = plt.subplots(___, ___, figsize=(___))\n", + "#Specify a horizontal barplot ('barh') as kind of plot (kind=)\n", + "ski_data.Region.value_counts().plot(kind=___, ax=ax[0])\n", + "#Give the plot a helpful title of 'Region'\n", + "ax[0].set_title(___)\n", + "#Label the xaxis 'Count'\n", + "ax[0].set_xlabel(___)\n", + "#Specify a horizontal barplot ('barh') as kind of plot (kind=)\n", + "ski_data.state.value_counts().plot(kind=___, ax=ax[1])\n", + "#Give the plot a helpful title of 'state'\n", + "ax[1].set_title(___)\n", + "#Label the xaxis 'Count'\n", + "ax[1].set_xlabel(___)\n", + "#Give the subplots a little \"breathing room\" with a wspace of 0.5\n", + "plt.subplots_adjust(wspace=___);\n", + "#You're encouraged to explore a few different figure sizes, orientations, and spacing here\n", + "# as the importance of easy-to-read and informative figures is frequently understated\n", + "# and you will find the ability to tweak figures invaluable later on" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dpijziQsTNrf" + }, + "source": [ + "How's your geography? Looking at the distribution of States, you see New York accounting for the largest number of resorts. Our target resort is in Montana, which comes in at 13th place. You should think carefully about how, or whether, you use this information. Does New York command a premium because of its proximity to population? Even if a resort's State were a useful predictor of ticket price, your main interest lies in Montana. Would you want a model that is skewed for accuracy by New York? Should you just filter for Montana and create a Montana-specific model? This would slash your available data volume. 
Your problem task includes the contextual insight that the data are for resorts all belonging to the same market segment. This suggests one might expect prices to be similar amongst them. You can look into this. A boxplot grouped by State is an ideal way to quickly compare prices. Another side note worth bringing up here is that, in reality, the best approach would definitely include consulting with the client or other domain expert. They might know of good reasons for treating states equivalently or differently. The data scientist is rarely the final arbiter of such a decision. But here, you'll see if you can find any supporting evidence for treating states the same or differently." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HFjXIvmkTNrg" + }, + "source": [ + "#### 2.6.3.5 Distribution Of Ticket Price By State" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5shVIdb8TNrg" + }, + "source": [ + "Our primary focus is our Big Mountain resort, in Montana. Does the state give you any clues to help decide what your primary target response feature should be (weekend or weekday ticket prices)?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4mxWiGzkTNrg" + }, + "source": [ + "##### 2.6.3.5.1 Average weekend and weekday price by state" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "71TDNysNTNrg" + }, + "outputs": [], + "source": [ + "#Code task 14#\n", + "# Calculate average weekday and weekend price by state and sort by the average of the two\n", + "# Hint: use the pattern dataframe.groupby()[].mean()\n", + "state_price_means = ski_data.___(___)[[___, ___]].mean()\n", + "state_price_means.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "v7e2O_eDTNrg", + "outputId": "bd888d63-92b6-4dfd-8ae0-8e8d0d77c626" + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# The next bit simply reorders the index by decreasing average of weekday and weekend prices\n", + "# Compare the index order you get from\n", + "# state_price_means.index\n", + "# with\n", + "# state_price_means.mean(axis=1).sort_values(ascending=False).index\n", + "# See how this expression simply sits within the reindex()\n", + "(state_price_means.reindex(index=state_price_means.mean(axis=1)\n", + " .sort_values(ascending=False)\n", + " .index)\n", + " .plot(kind='barh', figsize=(10, 10), title='Average ticket price by State'))\n", + "plt.xlabel('Price ($)');" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9whuYMlDTNrg" + }, + "source": [ + "The figure above represents a dataframe with two columns, one for the average prices of each kind of ticket. This tells you how the average ticket price varies from state to state. But can you get more insight into the difference in the distributions between states?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "x56e56BMTNrh" + }, + "source": [ + "##### 2.6.3.5.2 Distribution of weekday and weekend price by state" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hOkBuilFTNrh" + }, + "source": [ + "Next, you can transform the data into a single column for price with a new categorical column that represents the ticket type." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "jBFhADcGTNrh" + }, + "outputs": [], + "source": [ + "#Code task 15#\n", + "#Use the pd.melt function, pass in the ski_data columns 'state', 'AdultWeekday', and 'AdultWeekend' only,\n", + "#specify 'state' for `id_vars`\n", + "#gather the ticket prices from the 'AdultWeekday' and 'AdultWeekend' columns using the `value_vars` argument,\n", + "#call the resultant price column 'Price' via the `value_name` argument,\n", + "#name the weekday/weekend indicator column 'Ticket' via the `var_name` argument\n", + "ticket_prices = pd.melt(ski_data[[___, ___, ___]],\n", + " id_vars=___,\n", + " var_name=___,\n", + " value_vars=[___, ___],\n", + " value_name=___)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "GNS1wSepTNrh", + "outputId": "95e16415-2baf-4823-daac-ea323a26523f" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\"><th></th><th>state</th><th>Ticket</th><th>Price</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>0</th><td>Alaska</td><td>AdultWeekday</td><td>65.0</td></tr>\n",
+ "    <tr><th>1</th><td>Alaska</td><td>AdultWeekday</td><td>47.0</td></tr>\n",
+ "    <tr><th>2</th><td>Alaska</td><td>AdultWeekday</td><td>30.0</td></tr>\n",
+ "    <tr><th>3</th><td>Arizona</td><td>AdultWeekday</td><td>89.0</td></tr>\n",
+ "    <tr><th>4</th><td>Arizona</td><td>AdultWeekday</td><td>74.0</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>
" + ], + "text/plain": [ + " state Ticket Price\n", + "0 Alaska AdultWeekday 65.0\n", + "1 Alaska AdultWeekday 47.0\n", + "2 Alaska AdultWeekday 30.0\n", + "3 Arizona AdultWeekday 89.0\n", + "4 Arizona AdultWeekday 74.0" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ticket_prices.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qgto7eHrTNrh" + }, + "source": [ + "This is now in a format we can pass to [seaborn](https://seaborn.pydata.org/)'s [boxplot](https://seaborn.pydata.org/generated/seaborn.boxplot.html) function to create boxplots of the ticket price distributions for each ticket type for each state." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mQeOgyu1TNrh" + }, + "outputs": [], + "source": [ + "#Code task 16#\n", + "#Create a seaborn boxplot of the ticket price dataframe we created above,\n", + "#with 'state' on the x-axis, 'Price' as the y-value, and a hue that indicates 'Ticket'\n", + "#This will use boxplot's x, y, hue, and data arguments.\n", + "plt.subplots(figsize=(12, 8))\n", + "sns.boxplot(x=___, y=___, hue=___, data=ticket_prices)\n", + "plt.xticks(rotation='vertical')\n", + "plt.ylabel('Price ($)')\n", + "plt.xlabel('State');" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1QZWGbswTNri" + }, + "source": [ + "Aside from some relatively expensive ticket prices in California, Colorado, and Utah, most prices appear to lie in a broad band from around 25 to over 100 dollars. Some States show more variability than others. Montana and South Dakota, for example, both show fairly small variability as well as matching weekend and weekday ticket prices. Nevada and Utah, on the other hand, show the most range in prices. Some States, notably North Carolina and Virginia, have weekend prices far higher than weekday prices. 
You could be inspired by this exploration to consider a few potential groupings of resorts: those with low spread, those with lower averages, and those that charge a premium for weekend tickets. However, because you're told to treat all resorts as part of the same market segment, you could argue against further segmenting the resorts. Nevertheless, ways to consider using the State information in your modelling include:\n", + "\n", + "* disregard State completely\n", + "* retain all State information\n", + "* retain State in the form of Montana vs not Montana, as our target resort is in Montana\n", + "\n", + "You've also noted another effect above: some States show a marked difference between weekday and weekend ticket prices. It may make sense to allow a model to take into account not just State but also weekend vs weekday." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3jwdl3TJTNri" + }, + "source": [ + "Thus you currently have two main questions to resolve:\n", + "\n", + "* What do you do about the two types of ticket price?\n", + "* What do you do about the state information?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PtkcCq6sTNri" + }, + "source": [ + "### 2.6.4 Numeric Features" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WfTJNU5NTNri" + }, + "source": [ + "Having decided to reserve judgement on how exactly you utilize the State, turn your attention to cleaning the numeric features." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yy7EAty8TNri" + }, + "source": [ + "#### 2.6.4.1 Numeric data summary" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "yry0HlbYTNri" + }, + "outputs": [], + "source": [ + "#Code task 17#\n", + "#Call ski_data's `describe` method for a statistical summary of the numerical columns\n", + "#Hint: there are fewer summary stat columns than features, so displaying the transpose\n", + "#will be useful again\n", + "ski_data.___.___" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nhQ5Aa0qTNri" + }, + "source": [ + "Recall you're missing the ticket prices for some 16% of resorts. This is a fundamental problem that means you simply lack the required data for those resorts and will have to drop those records. But you may have a weekend price and not a weekday price, or vice versa. You want to keep any price you have." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Cgv8jEibTNrj", + "outputId": "54dbe828-1345-4564-c6cf-17f04b98c5cd" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0 82.424242\n", + "2 14.242424\n", + "1 3.333333\n", + "dtype: float64" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "missing_price = ski_data[['AdultWeekend', 'AdultWeekday']].isnull().sum(axis=1)\n", + "missing_price.value_counts()/len(missing_price) * 100" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oDWn1WBGTNrj" + }, + "source": [ + "Just over 82% of resorts have no missing ticket price, 3% are missing one value, and 14% are missing both. You will definitely want to drop the records for which you have no price information, however you will not do so just yet. There may still be useful information about the distributions of other features in that 14% of the data." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TlwdBkSPTNrj" + }, + "source": [ + "#### 2.6.4.2 Distributions Of Feature Values" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jNIPE2oLTNrj" + }, + "source": [ + "Note that, although we are still in the 'data wrangling and cleaning' phase rather than exploratory data analysis, looking at distributions of features is immensely useful in getting a feel for whether the values look sensible and whether there are any obvious outliers to investigate. Some exploratory data analysis belongs here, and data wrangling will inevitably occur later on. It's more a matter of emphasis. Here, we're interested in whether distributions look plausible or wrong. Later on, we're more interested in relationships and patterns." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "rZwEb5ZfTNrj" + }, + "outputs": [], + "source": [ + "#Code task 18#\n", + "#Call ski_data's `hist` method to plot histograms of each of the numeric features\n", + "#Try passing it an argument figsize=(15,10)\n", + "#Try calling plt.subplots_adjust() with an argument hspace=0.5 to adjust the spacing\n", + "#It's important you create legible and easy-to-read plots\n", + "ski_data.___(___)\n", + "#plt.subplots_adjust(hspace=___);\n", + "#Hint: notice how the terminating ';' \"swallows\" some messy output and leads to a tidier notebook" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "swCNX5c1TNrk" + }, + "source": [ + "Which features give possible cause for concern, and why?\n", + "\n", + "* SkiableTerrain_ac because values are clustered down the low end,\n", + "* Snow Making_ac for the same reason,\n", + "* fastEight because all but one value is 0 so it has very little variance, and half the values are missing,\n", + "* fastSixes raises an amber flag; it has more variability, but still mostly 0,\n", + "* trams also may get an amber flag for the same 
reason,\n", + "* yearsOpen because most values are low but it has a maximum of 2019, which strongly suggests someone recorded calendar year rather than number of years." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PiYnWZ_ZTNrk" + }, + "source": [ + "##### 2.6.4.2.1 SkiableTerrain_ac" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mQ24ahTCTNrk" + }, + "outputs": [], + "source": [ + "#Code task 19#\n", + "#Filter the 'SkiableTerrain_ac' column to print the values greater than 10000\n", + "ski_data.___[ski_data.___ > ___]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "haQPnHRVTNrk" + }, + "source": [ + "**Q: 2** One resort has an incredibly large skiable terrain area! Which is it?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "AdoTV36ZTNrk" + }, + "outputs": [], + "source": [ + "#Code task 20#\n", + "#Now you know there's only one, print the whole row to investigate all values, including seeing the resort name\n", + "#Hint: don't forget the transpose will be helpful here\n", + "ski_data[ski_data.___ > ___].___" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DVhhh6U0TNrk" + }, + "source": [ + "**A: 2** Your answer here" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "g3Eo7Il5TNrk" + }, + "source": [ + "But what can you do when you have one record that seems highly suspicious?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hbLhuhbiTNrl" + }, + "source": [ + "You can see if your data are correct. Search for \"silverton mountain skiable area\". If you do this, you get some [useful information](https://www.google.com/search?q=silverton+mountain+skiable+area)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "debj8lCRTNrl" + }, + "source": [ + "![Silverton Mountain information](https://github.com/JLindsey96/DataScienceGuidedCapstone/blob/master/Notebooks/images/silverton_mountain_info.png?raw=1)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mdDMB5_MTNrl" + }, + "source": [ + "You can spot check data. You see your top and base elevation values agree, but the skiable area is very different. Your suspect value is 26819, but the value you've just looked up is 1819. The last three digits agree. This sort of error could have occurred in transmission or at some editing or transcription stage. You could plausibly replace the suspect value with the one you've just obtained. Another cautionary note to make here is that although you're doing this in order to progress with your analysis, this is most definitely an issue that should have been raised and fed back to the client or data originator as a query. You should view this \"data correction\" step as a means to continue (documenting it carefully as you do in this notebook) rather than an ultimate decision as to what is correct." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3Vy4aBRtTNrl" + }, + "outputs": [], + "source": [ + "#Code task 21#\n", + "#Use the .loc accessor to print the 'SkiableTerrain_ac' value only for this resort\n", + "ski_data.___[39, 'SkiableTerrain_ac']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gGbLsiHzTNrl" + }, + "outputs": [], + "source": [ + "#Code task 22#\n", + "#Use the .loc accessor again to modify this value with the correct value of 1819\n", + "ski_data.___[39, 'SkiableTerrain_ac'] = ___" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mLECCwCrTNrl" + }, + "outputs": [], + "source": [ + "#Code task 23#\n", + "#Use the .loc accessor a final time to verify that the value has been modified\n", + "ski_data.___[39, 'SkiableTerrain_ac']" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mguQYF9STNrl" + }, + "source": [ + "**NB whilst you may become suspicious about your data quality, and you know you have missing values, you will not here dive down the rabbit hole of checking all values or web scraping to replace missing values.**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1ra2paEeTNrl" + }, + "source": [ + "What does the distribution of skiable area look like now?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pW-oHhT1TNrm", + "outputId": "dc4dcb34-9668-468f-8cd0-583df4d049e9" + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "ski_data.SkiableTerrain_ac.hist(bins=30)\n", + "plt.xlabel('SkiableTerrain_ac')\n", + "plt.ylabel('Count')\n", + "plt.title('Distribution of skiable area (acres) after replacing erroneous value');" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zCSCIMbDTNrm" + }, + "source": [ + "You now see a rather long tailed distribution. You may wonder about the now most extreme value that is above 8000, but similarly you may also wonder about the value around 7000. If you wanted to spend more time manually checking values you could, but leave this for now. The above distribution is plausible." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wK6YiCepTNrm" + }, + "source": [ + "##### 2.6.4.2.2 Snow Making_ac" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "I-bALhBJTNrm", + "outputId": "b8985868-8a34-4210-d771-14b9820a3b67" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "11 3379.0\n", + "18 1500.0\n", + "Name: Snow Making_ac, dtype: float64" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ski_data['Snow Making_ac'][ski_data['Snow Making_ac'] > 1000]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Dv32rpWGTNrm", + "outputId": "d74d0faa-ec61-453d-bbbc-a8eb99ad8aeb" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\"><th></th><th>11</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>Name</th><td>Heavenly Mountain Resort</td></tr>\n",
+ "    <tr><th>Region</th><td>Sierra Nevada</td></tr>\n",
+ "    <tr><th>state</th><td>California</td></tr>\n",
+ "    <tr><th>summit_elev</th><td>10067</td></tr>\n",
+ "    <tr><th>vertical_drop</th><td>3500</td></tr>\n",
+ "    <tr><th>base_elev</th><td>7170</td></tr>\n",
+ "    <tr><th>trams</th><td>2</td></tr>\n",
+ "    <tr><th>fastEight</th><td>0</td></tr>\n",
+ "    <tr><th>fastSixes</th><td>2</td></tr>\n",
+ "    <tr><th>fastQuads</th><td>7</td></tr>\n",
+ "    <tr><th>quad</th><td>1</td></tr>\n",
+ "    <tr><th>triple</th><td>5</td></tr>\n",
+ "    <tr><th>double</th><td>3</td></tr>\n",
+ "    <tr><th>surface</th><td>8</td></tr>\n",
+ "    <tr><th>total_chairs</th><td>28</td></tr>\n",
+ "    <tr><th>Runs</th><td>97</td></tr>\n",
+ "    <tr><th>TerrainParks</th><td>3</td></tr>\n",
+ "    <tr><th>LongestRun_mi</th><td>5.5</td></tr>\n",
+ "    <tr><th>SkiableTerrain_ac</th><td>4800</td></tr>\n",
+ "    <tr><th>Snow Making_ac</th><td>3379</td></tr>\n",
+ "    <tr><th>daysOpenLastYear</th><td>155</td></tr>\n",
+ "    <tr><th>yearsOpen</th><td>64</td></tr>\n",
+ "    <tr><th>averageSnowfall</th><td>360</td></tr>\n",
+ "    <tr><th>AdultWeekday</th><td>NaN</td></tr>\n",
+ "    <tr><th>AdultWeekend</th><td>NaN</td></tr>\n",
+ "    <tr><th>projectedDaysOpen</th><td>157</td></tr>\n",
+ "    <tr><th>NightSkiing_ac</th><td>NaN</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>
" + ], + "text/plain": [ + " 11\n", + "Name Heavenly Mountain Resort\n", + "Region Sierra Nevada\n", + "state California\n", + "summit_elev 10067\n", + "vertical_drop 3500\n", + "base_elev 7170\n", + "trams 2\n", + "fastEight 0\n", + "fastSixes 2\n", + "fastQuads 7\n", + "quad 1\n", + "triple 5\n", + "double 3\n", + "surface 8\n", + "total_chairs 28\n", + "Runs 97\n", + "TerrainParks 3\n", + "LongestRun_mi 5.5\n", + "SkiableTerrain_ac 4800\n", + "Snow Making_ac 3379\n", + "daysOpenLastYear 155\n", + "yearsOpen 64\n", + "averageSnowfall 360\n", + "AdultWeekday NaN\n", + "AdultWeekend NaN\n", + "projectedDaysOpen 157\n", + "NightSkiing_ac NaN" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ski_data[ski_data['Snow Making_ac'] > 3000].T" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "R5xk7E7HTNrn" + }, + "source": [ + "You can adopt a similar approach as for the suspect skiable area value and do some spot checking. To save time, here is a link to the website for [Heavenly Mountain Resort](https://www.skiheavenly.com/the-mountain/about-the-mountain/mountain-info.aspx). From this you can glean that you have values for skiable terrain that agree. Furthermore, you can read that snowmaking covers 60% of the trails." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XRWd1-btTNrn" + }, + "source": [ + "What, then, is your rough guess for the area covered by snowmaking?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lrMU2cq4TNrn", + "outputId": "f7bb07b0-1c60-405e-acb0-151363980fe6" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "2880.0" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + ".6 * 4800" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xelkMGxrTNrn" + }, + "source": [ + "This is less than the value of 3379 in your data, so you may have a judgement call to make. However, notice something else. You have no ticket pricing information at all for this resort. Any further effort spent worrying about values for this resort will be wasted. You'll simply be dropping the entire row!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BtnIe-skTNrn" + }, + "source": [ + "##### 2.6.4.2.3 fastEight" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DtresAlyTNrn" + }, + "source": [ + "Look at the different fastEight values more closely:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "jNXA8HhiTNrn", + "outputId": "3d09eb90-22bd-461d-fb56-976ba35df3af" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0.0 163\n", + "1.0 1\n", + "Name: fastEight, dtype: int64" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ski_data.fastEight.value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OjGO8QxDTNro" + }, + "source": [ + "Drop the fastEight column in its entirety; about half the values are missing and all but one of the others are zero. There is essentially no information in this column." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PD_sXpbtTNro" + }, + "outputs": [], + "source": [ + "#Code task 24#\n", + "#Drop the 'fastEight' column from ski_data.
Use inplace=True\n", + "ski_data.drop(columns=___, inplace=___)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MM0yrCEqTNro" + }, + "source": [ + "What about yearsOpen? How many resorts have purportedly been open for more than 100 years?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "GHUbEcfBTNro" + }, + "outputs": [], + "source": [ + "#Code task 25#\n", + "#Filter the 'yearsOpen' column for values greater than 100\n", + "ski_data.___[ski_data.___ > ___]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2ia8cEHOTNro" + }, + "source": [ + "Okay, one seems to have been open for 104 years. But beyond that, one is down as having been open for 2019 years. This is wrong! What shall you do about this?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GgwcPRJITNrp" + }, + "source": [ + "What does the distribution of yearsOpen look like if you exclude just the obviously wrong one?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "qgB5BNWsTNrp" + }, + "outputs": [], + "source": [ + "#Code task 26#\n", + "#Call the hist method on 'yearsOpen' after filtering for values under 1000\n", + "#Pass the argument bins=30 to hist(), but feel free to explore other values\n", + "ski_data.___[ski_data.___ < ___].hist(___)\n", + "plt.xlabel('Years open')\n", + "plt.ylabel('Count')\n", + "plt.title('Distribution of years open excluding 2019');" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ynZBUVW3TNrp" + }, + "source": [ + "The above distribution of years seems entirely plausible, including the 104 year value. You can certainly state that no resort will have been open for 2019 years! It likely means the resort opened in 2019. It could also mean the resort is due to open in 2019. You don't know when these data were gathered!" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AdppyA_3TNrp" + }, + "source": [ + "Let's review the summary statistics for the years under 1000." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "nVyd-8w6TNrp", + "outputId": "35ee0ba2-f55f-4ef0-9bcc-0773a14b4294" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "count 328.000000\n", + "mean 57.695122\n", + "std 16.841182\n", + "min 6.000000\n", + "25% 50.000000\n", + "50% 58.000000\n", + "75% 68.250000\n", + "max 104.000000\n", + "Name: yearsOpen, dtype: float64" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ski_data.yearsOpen[ski_data.yearsOpen < 1000].describe()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "de5kdbwkTNrp" + }, + "source": [ + "The smallest number of years open otherwise is 6. You can't be sure whether this resort in question has been open zero years or one year and even whether the numbers are projections or actual. In any case, you would be adding a new youngest resort so it feels best to simply drop this row." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EaDzE9VNTNrq" + }, + "outputs": [], + "source": [ + "ski_data = ski_data[ski_data.yearsOpen < 1000]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SZ6y5M9JTNrq" + }, + "source": [ + "##### 2.6.4.2.4 fastSixes and Trams" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hPaZqHh2TNrq" + }, + "source": [ + "The other features you had mild concern over, you will not investigate further. Perhaps take some care when using these features." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QMMST7PUTNrq" + }, + "source": [ + "## 2.7 Derive State-wide Summary Statistics For Our Market Segment" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Z8PkS6bWTNrq" + }, + "source": [ + "You have, by this point, removed one row, but it was for a resort that may not have opened yet, or was perhaps in its first season. Using your business knowledge, you know that state-wide supply and demand of certain skiing resources may well factor into pricing strategies. Does a resort dominate the available night skiing in a state? Or does it account for a large proportion of the total skiable terrain or days open?\n", + "\n", + "If you want to add any features to your data that capture the state-wide market size, you should do this now, before dropping any more rows. In the next section, you'll drop rows with missing price information. Although you don't know what those resorts charge for their tickets, you do know the resorts exist and have been open for at least six years. Thus, you'll now calculate some state-wide summary statistics for later use." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jdLv7VpBTNrq" + }, + "source": [ + "Many features in your data pertain to chairlifts, that is, to getting people around each resort. These aren't relevant here, nor are the features relating to altitudes. Features that you may be interested in are:\n", + "\n", + "* TerrainParks\n", + "* SkiableTerrain_ac\n", + "* daysOpenLastYear\n", + "* NightSkiing_ac\n", + "\n", + "When you think about it, these are features it makes sense to sum: the total number of terrain parks, the total skiable area, the total number of days open, and the total area available for night skiing. You might consider the total number of ski runs, but understand that the skiable area is more informative than just a number of runs."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xEXIlXfcTNrr" + }, + "source": [ + "A fairly new groupby behaviour is [named aggregation](https://pandas-docs.github.io/pandas-docs-travis/whatsnew/v0.25.0.html). This allows us to clearly perform the aggregations you want whilst also creating informative output column names." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "5muKYVSATNrr" + }, + "outputs": [], + "source": [ + "#Code task 27#\n", + "#Add named aggregations for the sum of 'daysOpenLastYear', 'TerrainParks', and 'NightSkiing_ac'\n", + "#call them 'state_total_days_open', 'state_total_terrain_parks', and 'state_total_nightskiing_ac',\n", + "#respectively\n", + "#Finally, add a call to the reset_index() method (we recommend you experiment with and without this to see\n", + "#what it does)\n", + "state_summary = ski_data.groupby('state').agg(\n", + " resorts_per_state=pd.NamedAgg(column='Name', aggfunc='size'), #could pick any column here\n", + " state_total_skiable_area_ac=pd.NamedAgg(column='SkiableTerrain_ac', aggfunc='sum'),\n", + " state_total_days_open=pd.NamedAgg(column=__, aggfunc='sum'),\n", + " ___=pd.NamedAgg(column=___, aggfunc=___),\n", + " ___=pd.NamedAgg(column=___, aggfunc=___)\n", + ").___\n", + "state_summary.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9pI-IqLMTNrr" + }, + "source": [ + "## 2.8 Drop Rows With No Price Data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-NlXdERmTNrr" + }, + "source": [ + "You know there are two columns that refer to price: 'AdultWeekend' and 'AdultWeekday'. You can calculate the number of price values missing per row. This will obviously have to be either 0, 1, or 2, where 0 denotes no price values are missing and 2 denotes that both are missing." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Tjy2GyVLTNrr", + "outputId": "3702c49e-bf8f-4a45-91bf-1880e1e921d2" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0 82.317073\n", + "2 14.329268\n", + "1 3.353659\n", + "dtype: float64" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "missing_price = ski_data[['AdultWeekend', 'AdultWeekday']].isnull().sum(axis=1)\n", + "missing_price.value_counts()/len(missing_price) * 100" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "w0rkdN-pTNrr" + }, + "source": [ + "About 14% of the rows have no price data. As the price is your target, these rows are of no use. Time to lose them." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Ocm7TbzDTNrs" + }, + "outputs": [], + "source": [ + "#Code task 28#\n", + "#Use `missing_price` to remove rows from ski_data where both price values are missing\n", + "ski_data = ski_data[___ != 2]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3j-ylXeuTNrs" + }, + "source": [ + "## 2.9 Review distributions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "uFNrshckTNrs", + "outputId": "adc97012-e084-44cc-ab5c-e50fd299e880" + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "ski_data.hist(figsize=(15, 10))\n", + "plt.subplots_adjust(hspace=0.5);" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "G00BFm1UTNrs" + }, + "source": [ + "These distributions are much better. There are clearly some skewed distributions, so keep an eye on `fastQuads`, `fastSixes`, and perhaps `trams`. These lack much variance away from 0 and may have a small number of relatively extreme values. Models failing to rate a feature as important when domain knowledge tells you it should be is an issue to look out for, as is a model being overly influenced by some extreme values. If you build a good machine learning pipeline, hopefully it will be robust to such issues, but you may also wish to consider nonlinear transformations of features." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wgkJdG-FTNrt" + }, + "source": [ + "## 2.10 Population data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wvMtXIAITNrt" + }, + "source": [ + "Population and area data for the US states can be obtained from [wikipedia](https://simple.wikipedia.org/wiki/List_of_U.S._states). Listen, you should have a healthy concern about using data you \"found on the Internet\". Make sure it comes from a reputable source. This table of data is useful because it allows you to easily pull and incorporate an external data set. It also allows you to proceed with an analysis that includes state sizes and populations for your 'first cut' model. Be explicit about your source (we documented it here in this workflow) and ensure it is open to inspection. All steps are subject to review, and it may be that a client has a specific source of data they trust that you should use to rerun the analysis." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "qcblDsC8TNrt" + }, + "outputs": [], + "source": [ + "#Code task 29#\n", + "#Use pandas' `read_html` method to read the table from the URL below\n", + "states_url = 'https://simple.wikipedia.org/w/index.php?title=List_of_U.S._states&oldid=7168473'\n", + "usa_states = pd.___(___)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "itwPNvegTNrt", + "outputId": "51acb8ba-9199-402a-87fe-d755f941ace6" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "list" + ] + }, + "execution_count": 45, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(usa_states)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "hNtA2XRQTNrt", + "outputId": "50b91464-55f4-4aa4-d443-66689b340b07" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(usa_states)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "hjDf9_NwTNru", + "outputId": "23529399-33e8-47a9-9de0-a68b97125d75" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Name &postal abbs. [1]CitiesEstablished[upper-alpha 1]Population[upper-alpha 2][3]Total area[4]Land area[4]Water area[4]Numberof Reps.
Name &postal abbs. [1]Name &postal abbs. [1].1CapitalLargest[5]Established[upper-alpha 1]Population[upper-alpha 2][3]mi2km2mi2km2mi2km2Numberof Reps.
0AlabamaALMontgomeryBirminghamDec 14, 181949031855242013576750645131171177545977
1AlaskaAKJuneauAnchorageJan 3, 195973154566538417233375706411477953947432453841
2ArizonaAZPhoenixPhoenixFeb 14, 1912727871711399029523411359429420739610269
3ArkansasARLittle RockLittle RockJun 15, 183630178045317913773252035134771114329614
4CaliforniaCASacramentoLos AngelesSep 9, 18503951222316369542396715577940346679162050153
\n", + "
" + ], + "text/plain": [ + " Name &postal abbs. [1] Cities \\\n", + " Name &postal abbs. [1] Name &postal abbs. [1].1 Capital Largest[5] \n", + "0 Alabama AL Montgomery Birmingham \n", + "1 Alaska AK Juneau Anchorage \n", + "2 Arizona AZ Phoenix Phoenix \n", + "3 Arkansas AR Little Rock Little Rock \n", + "4 California CA Sacramento Los Angeles \n", + "\n", + " Established[upper-alpha 1] Population[upper-alpha 2][3] Total area[4] \\\n", + " Established[upper-alpha 1] Population[upper-alpha 2][3] mi2 \n", + "0 Dec 14, 1819 4903185 52420 \n", + "1 Jan 3, 1959 731545 665384 \n", + "2 Feb 14, 1912 7278717 113990 \n", + "3 Jun 15, 1836 3017804 53179 \n", + "4 Sep 9, 1850 39512223 163695 \n", + "\n", + " Land area[4] Water area[4] Numberof Reps. \n", + " km2 mi2 km2 mi2 km2 Numberof Reps. \n", + "0 135767 50645 131171 1775 4597 7 \n", + "1 1723337 570641 1477953 94743 245384 1 \n", + "2 295234 113594 294207 396 1026 9 \n", + "3 137732 52035 134771 1143 2961 4 \n", + "4 423967 155779 403466 7916 20501 53 " + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "usa_states = usa_states[0]\n", + "usa_states.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Db_ica1wTNru" + }, + "source": [ + "Note, in even the last year, the capability of `pd.read_html()` has improved. The merged cells you see in the web table are now handled much more conveniently, with 'Phoenix' now being duplicated so the subsequent columns remain aligned. But check this anyway. If you extract the established date column, you should just get dates. Recall previously you used the `.loc` accessor, because you were using labels. Now you want to refer to a column by its index position and so use `.iloc`. For a discussion on the difference use cases of `.loc` and `.iloc` refer to the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3AffA8DpTNru" + }, + "outputs": [], + "source": [ + "#Code task 30#\n", + "#Use the iloc accessor to get the pandas Series for the column at index position 4 from `usa_states`\n", + "#It should be a column of dates\n", + "established = usa_states.___[:, 4]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "w3489xu3TNru", + "outputId": "4bad1d20-b93b-4b55-ace8-8703106ad1b1" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0 Dec 14, 1819\n", + "1 Jan 3, 1959\n", + "2 Feb 14, 1912\n", + "3 Jun 15, 1836\n", + "4 Sep 9, 1850\n", + "5 Aug 1, 1876\n", + "6 Jan 9, 1788\n", + "7 Dec 7, 1787\n", + "8 Mar 3, 1845\n", + "9 Jan 2, 1788\n", + "10 Aug 21, 1959\n", + "11 Jul 3, 1890\n", + "12 Dec 3, 1818\n", + "13 Dec 11, 1816\n", + "14 Dec 28, 1846\n", + "15 Jan 29, 1861\n", + "16 Jun 1, 1792\n", + "17 Apr 30, 1812\n", + "18 Mar 15, 1820\n", + "19 Apr 28, 1788\n", + "20 Feb 6, 1788\n", + "21 Jan 26, 1837\n", + "22 May 11, 1858\n", + "23 Dec 10, 1817\n", + "24 Aug 10, 1821\n", + "25 Nov 8, 1889\n", + "26 Mar 1, 1867\n", + "27 Oct 31, 1864\n", + "28 Jun 21, 1788\n", + "29 Dec 18, 1787\n", + "30 Jan 6, 1912\n", + "31 Jul 26, 1788\n", + "32 Nov 21, 1789\n", + "33 Nov 2, 1889\n", + "34 Mar 1, 1803\n", + "35 Nov 16, 1907\n", + "36 Feb 14, 1859\n", + "37 Dec 12, 1787\n", + "38 May 29, 1790\n", + "39 May 23, 1788\n", + "40 Nov 2, 1889\n", + "41 Jun 1, 1796\n", + "42 Dec 29, 1845\n", + "43 Jan 4, 1896\n", + "44 Mar 4, 1791\n", + "45 Jun 25, 1788\n", + "46 Nov 11, 1889\n", + "47 Jun 20, 1863\n", + "48 May 29, 1848\n", + "49 Jul 10, 1890\n", + "Name: (Established[upper-alpha 1], Established[upper-alpha 1]), dtype: object" + ] + }, + "execution_count": 49, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "established" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XAF6ZiG-TNrv" + }, + "source": [ + "Extract the state name, 
population, and total area (square miles) columns." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "eVYiJLUYTNrv" + }, + "outputs": [], + "source": [ + "#Code task 31#\n", + "#Now use the iloc accessor again to extract columns 0, 5, and 6 and the dataframe's `copy()` method\n", + "#Set the names of these extracted columns to 'state', 'state_population', and 'state_area_sq_miles',\n", + "#respectively.\n", + "usa_states_sub = usa_states.___[:, [___]].copy()\n", + "usa_states_sub.columns = [___]\n", + "usa_states_sub.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WSTrL07ITNrv" + }, + "source": [ + "Do you have all the ski data states accounted for?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "RmM-xaKoTNrv" + }, + "outputs": [], + "source": [ + "#Code task 32#\n", + "#Find the states in `state_summary` that are not in `usa_states_sub`\n", + "#Hint: set(list1) - set(list2) is an easy way to get items in list1 that are not in list2\n", + "missing_states = ___(state_summary.state) - ___(usa_states_sub.state)\n", + "missing_states" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "521nVhMtTNrv" + }, + "source": [ + "No??" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qArNVHaRTNrw" + }, + "source": [ + "If you look at the table on the web, you can perhaps start to guess what the problem is. 
You can confirm your suspicion by pulling out state names that _contain_ 'Massachusetts', 'Pennsylvania', 'Rhode Island', or 'Virginia' from `usa_states_sub`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "h6Jy3tQvTNrw", + "outputId": "38ffc16c-6d22-4304-c08b-82c93b74488d" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "20 Massachusetts[upper-alpha 3]\n", + "37 Pennsylvania[upper-alpha 3]\n", + "38 Rhode Island[upper-alpha 4]\n", + "45 Virginia[upper-alpha 3]\n", + "47 West Virginia\n", + "Name: state, dtype: object" + ] + }, + "execution_count": 52, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "usa_states_sub.state[usa_states_sub.state.str.contains('Massachusetts|Pennsylvania|Rhode Island|Virginia')]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iXFAZmyPTNrw" + }, + "source": [ + "Delete square brackets and their contents and try again:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "inCZDuhTTNrw" + }, + "outputs": [], + "source": [ + "#Code task 33#\n", + "#Use pandas' Series' `replace()` method to replace anything within square brackets (including the brackets)\n", + "#with the empty string.
Do this inplace, so you need to specify the arguments:\n", + "#to_replace='\\[.*\\]' #literal square bracket followed by anything or nothing followed by literal closing bracket\n", + "#value='' #empty string as replacement\n", + "#regex=True #we used a regex in our `to_replace` argument\n", + "#inplace=True #Do this \"in place\"\n", + "usa_states_sub.state.___(to_replace=___, value=__, regex=___, inplace=___)\n", + "usa_states_sub.state[usa_states_sub.state.str.contains('Massachusetts|Pennsylvania|Rhode Island|Virginia')]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "WNh71rW1TNrx" + }, + "outputs": [], + "source": [ + "#Code task 34#\n", + "#And now verify none of our states are missing by checking that there are no states in\n", + "#state_summary that are not in usa_states_sub (as earlier using `set()`)\n", + "missing_states = ___(state_summary.state) - ___(usa_states_sub.state)\n", + "missing_states" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "br2Ng2gxTNrx" + }, + "source": [ + "Better! You have an empty set for missing states now. You can confidently add the population and state area columns to the ski resort data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "LQDbrIjgTNrx" + }, + "outputs": [], + "source": [ + "#Code task 35#\n", + "#Use 'state_summary's `merge()` method to combine our new data in 'usa_states_sub'\n", + "#specify the arguments how='left' and on='state'\n", + "state_summary = state_summary.___(usa_states_sub, ___=___, ___=___)\n", + "state_summary.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CG46I0-ZTNrx" + }, + "source": [ + "Having created this data frame of summary statistics for various states, it would seem obvious to join this with the ski resort data to augment it with this additional data. You will do this, but not now. 
In the next notebook you will be exploring the data, including the relationships between the states. For that you want a separate row for each state, as you have here, and joining the data this soon means you'd need to separate and eliminate redundancies in the state data when you wanted it." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "p4NAsVeiTNrx" + }, + "source": [ + "## 2.11 Target Feature" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Z9PKLXuITNrx" + }, + "source": [ + "Finally, what will your target be when modelling ticket price? What relationship is there between weekday and weekend prices?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "B4_3RAjtTNry" + }, + "outputs": [], + "source": [ + "#Code task 36#\n", + "#Use ski_data's `plot()` method to create a scatterplot (kind='scatter') with 'AdultWeekday' on the x-axis and\n", + "#'AdultWeekend' on the y-axis\n", + "ski_data.___(x=___, y=___, kind=___);" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tc5AkUM9TNry" + }, + "source": [ + "A couple of observations can be made. Firstly, there is a clear line where weekend and weekday prices are equal. Weekend prices being higher than weekday prices seems restricted to sub-$100 resorts. Recall from the boxplot earlier that the distribution for weekday and weekend prices in Montana seemed equal. Is this confirmed in the actual data for each resort? Big Mountain Resort is in Montana, so the relationship between these quantities in this state is particularly relevant."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "neqpTpj4TNry" + }, + "outputs": [], + "source": [ + "#Code task 37#\n", + "#Use the loc accessor on ski_data to print the 'AdultWeekend' and 'AdultWeekday' columns for Montana only\n", + "ski_data.___[ski_data.state == ___, [___, ___]]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "m0fGJ8POTNry" + }, + "source": [ + "Is there any reason to prefer weekend or weekday prices? Which is missing the least?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oAKRmA-HTNry", + "outputId": "c1f10496-0d26-4bcb-e02e-0e84bfb982da" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "AdultWeekend 4\n", + "AdultWeekday 7\n", + "dtype: int64" + ] + }, + "execution_count": 58, + "metadata": {}, + "output_type": "execute_result" + } ], - "text/plain": [ - " Name Region state summit_elev vertical_drop \\\n", - "104 Crystal Mountain Michigan Michigan 1132 375 \n", - "295 Crystal Mountain Washington Washington 7012 3100 \n", - "\n", - " base_elev trams fastEight fastSixes fastQuads ... LongestRun_mi \\\n", - "104 757 0 0.0 0 1 ... 0.3 \n", - "295 4400 1 NaN 2 2 ... 2.5 \n", - "\n", - " SkiableTerrain_ac Snow Making_ac daysOpenLastYear yearsOpen \\\n", - "104 102.0 96.0 120.0 63.0 \n", - "295 2600.0 10.0 NaN 57.0 \n", - "\n", - " averageSnowfall AdultWeekday AdultWeekend projectedDaysOpen \\\n", - "104 132.0 54.0 64.0 135.0 \n", - "295 486.0 99.0 99.0 NaN \n", - "\n", - " NightSkiing_ac \n", - "104 56.0 \n", - "295 NaN \n", - "\n", - "[2 rows x 27 columns]" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "ski_data[ski_data['Name'] == 'Crystal Mountain']" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "So there are two Crystal Mountain resorts, but they are clearly two different resorts in two different states. 
This is a powerful signal that you have unique records on each row." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 2.6.3.2 Region And State" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "What's the relationship between region and state?" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You know they are the same in many cases (e.g. both the Region and the state are given as 'Michigan'). In how many cases do they differ?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 10#\n", - "#Calculate the number of times Region does not equal state\n", - "(ski_data.Region ___ ski_data.state).___" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You know what a state is. What is a region? You can tabulate the distinct values along with their respective frequencies using `value_counts()`." - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "New York 33\n", - "Michigan 29\n", - "Sierra Nevada 22\n", - "Colorado 22\n", - "Pennsylvania 19\n", - "Wisconsin 16\n", - "New Hampshire 16\n", - "Vermont 15\n", - "Minnesota 14\n", - "Montana 12\n", - "Idaho 12\n", - "Massachusetts 11\n", - "Washington 10\n", - "Maine 9\n", - "New Mexico 9\n", - "Wyoming 8\n", - "Utah 7\n", - "Oregon 6\n", - "Salt Lake City 6\n", - "North Carolina 6\n", - "Connecticut 5\n", - "Ohio 5\n", - "West Virginia 4\n", - "Virginia 4\n", - "Mt. 
Hood 4\n", - "Illinois 4\n", - "Alaska 3\n", - "Iowa 3\n", - "Missouri 2\n", - "Arizona 2\n", - "Indiana 2\n", - "South Dakota 2\n", - "New Jersey 2\n", - "Nevada 2\n", - "Rhode Island 1\n", - "Maryland 1\n", - "Tennessee 1\n", - "Northern California 1\n", - "Name: Region, dtype: int64" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "ski_data['Region'].value_counts()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "A casual inspection by eye reveals some non-state names such as Sierra Nevada, Salt Lake City, and Northern California. Tabulate the differences between Region and state. On a note regarding scaling to larger data sets, you might wonder how you could spot such cases when presented with millions of rows. This is an interesting point. Imagine you have access to a database with a Region and state column in a table and there are millions of rows. You wouldn't eyeball all the rows looking for differences! Bear in mind that our first interest lies in establishing the answer to the question \"Are they always the same?\" One approach might be to ask the database to return records where they differ, but limit the output to 10 rows. If there were differences, you'd only get up to 10 results, and so you wouldn't know whether you'd located all differences, but you'd know that there were 'a nonzero number' of differences. If you got an empty result set back, then you would know that the two columns always had the same value. At the risk of digressing, some values in one column only might be NULL (missing) and different databases treat NULL differently, so be aware that on many an occasion a seemingly 'simple' question gets very interesting to answer very quickly!"
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 11#\n", - "#Filter the ski_data dataframe for rows where 'Region' and 'state' are different,\n", - "#group that by 'state' and perform `value_counts` on the 'Region'\n", - "(ski_data[ski_data.___ ___ ski_data.___]\n", - " .groupby(___)[___]\n", - " .value_counts())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The vast majority of the differences are in California, with most Regions being called Sierra Nevada and just one referred to as Northern California." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 2.6.3.3 Number of distinct regions and states" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 12#\n", - "#Select the 'Region' and 'state' columns from ski_data and use the `nunique` method to calculate\n", - "#the number of unique values in each\n", - "ski_data[[___, ___]].___" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Because a few states are split across multiple named regions, there are slightly more unique regions than states." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 2.6.3.4 Distribution Of Resorts By Region And State" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If this is your first time using [matplotlib](https://matplotlib.org/3.2.2/index.html)'s [subplots](https://matplotlib.org/3.2.2/api/_as_gen/matplotlib.pyplot.subplots.html), you may find the online documentation useful." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 13#\n", - "#Create two subplots on 1 row and 2 columns with a figsize of (12, 8)\n", - "fig, ax = plt.subplots(___, ___, figsize=(___))\n", - "#Specify a horizontal barplot ('barh') as kind of plot (kind=)\n", - "ski_data.Region.value_counts().plot(kind=___, ax=ax[0])\n", - "#Give the plot a helpful title of 'Region'\n", - "ax[0].set_title(___)\n", - "#Label the xaxis 'Count'\n", - "ax[0].set_xlabel(___)\n", - "#Specify a horizontal barplot ('barh') as kind of plot (kind=)\n", - "ski_data.state.value_counts().plot(kind=___, ax=ax[1])\n", - "#Give the plot a helpful title of 'state'\n", - "ax[1].set_title(___)\n", - "#Label the xaxis 'Count'\n", - "ax[1].set_xlabel(___)\n", - "#Give the subplots a little \"breathing room\" with a wspace of 0.5\n", - "plt.subplots_adjust(wspace=___);\n", - "#You're encouraged to explore a few different figure sizes, orientations, and spacing here\n", - "# as the importance of easy-to-read and informative figures is frequently understated\n", - "# and you will find the ability to tweak figures invaluable later on" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "How's your geography? Looking at the distribution of States, you see New York accounting for the largest number of resorts. Our target resort is in Montana, which comes in at 13th place. You should think carefully about how, or whether, you use this information. Does New York command a premium because of its proximity to population? Even if a resort's State were a useful predictor of ticket price, your main interest lies in Montana. Would you want a model that is skewed for accuracy by New York? Should you just filter for Montana and create a Montana-specific model? This would slash your available data volume. Your problem task includes the contextual insight that the data are for resorts all belonging to the same market segment. 
This suggests one might expect prices to be similar amongst them. You can look into this. A boxplot grouped by State is an ideal way to quickly compare prices. Another side note worth bringing up here is that, in reality, the best approach here definitely would include consulting with the client or other domain expert. They might know of good reasons for treating states equivalently or differently. The data scientist is rarely the final arbiter of such a decision. But here, you'll see if we can find any supporting evidence for treating states the same or differently." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 2.6.3.5 Distribution Of Ticket Price By State" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Our primary focus is our Big Mountain resort, in Montana. Does the state give you any clues to help decide what your primary target response feature should be (weekend or weekday ticket prices)?" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "##### 2.6.3.5.1 Average weekend and weekday price by state" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 14#\n", - "# Calculate average weekday and weekend price by state and sort by the average of the two\n", - "# Hint: use the pattern dataframe.groupby()[].mean()\n", - "state_price_means = ski_data.___(___)[[___, ___]].mean()\n", - "state_price_means.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "# The next bit simply reorders the index by increasing average of weekday and weekend prices\n", - "# Compare the index order you get from\n", - "# state_price_means.index\n", - "# with\n", - "# state_price_means.mean(axis=1).sort_values(ascending=False).index\n", - "# See how this expression simply sits within the reindex()\n", - "(state_price_means.reindex(index=state_price_means.mean(axis=1)\n", - " .sort_values(ascending=False)\n", - " .index)\n", - " .plot(kind='barh', figsize=(10, 10), title='Average ticket price by State'))\n", - "plt.xlabel('Price ($)');" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "The figure above represents a dataframe with two columns, one for the average prices of each kind of ticket. This tells you how the average ticket price varies from state to state. But can you get more insight into the difference in the distributions between states?" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "##### 2.6.3.5.2 Distribution of weekday and weekend price by state" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, you can transform the data into a single column for price with a new categorical column that represents the ticket type." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 15#\n", - "#Use the pd.melt function, pass in the ski_data columns 'state', 'AdultWeekday', and 'AdultWeekend' only,\n", - "#specify 'state' for `id_vars`\n", - "#gather the ticket prices from the 'AdultWeekday' and 'AdultWeekend' columns using the `value_vars` argument,\n", - "#call the resultant price column 'Price' via the `value_name` argument,\n", - "#name the weekday/weekend indicator column 'Ticket' via the `var_name` argument\n", - "ticket_prices = pd.melt(ski_data[[___, ___, ___]], \n", - " id_vars=___, \n", - " var_name=___, \n", - " value_vars=[___, ___], \n", - " value_name=___)" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
stateTicketPrice
0AlaskaAdultWeekday65.0
1AlaskaAdultWeekday47.0
2AlaskaAdultWeekday30.0
3ArizonaAdultWeekday89.0
4ArizonaAdultWeekday74.0
\n", - "
" + "source": [ + "ski_data[['AdultWeekend', 'AdultWeekday']].isnull().sum()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Y8Wtx5LzTNrz" + }, + "source": [ + "Weekend prices have the least missing values of the two, so drop the weekday prices and then keep just the rows that have weekend price." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "S2bTeAwbTNrz" + }, + "outputs": [], + "source": [ + "ski_data.drop(columns='AdultWeekday', inplace=True)\n", + "ski_data.dropna(subset=['AdultWeekend'], inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "CuqQp0VITNrz", + "outputId": "58b61472-bebc-4f3a-b0f4-e6cf1ffec060" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(277, 25)" + ] + }, + "execution_count": 60, + "metadata": {}, + "output_type": "execute_result" + } ], - "text/plain": [ - " state Ticket Price\n", - "0 Alaska AdultWeekday 65.0\n", - "1 Alaska AdultWeekday 47.0\n", - "2 Alaska AdultWeekday 30.0\n", - "3 Arizona AdultWeekday 89.0\n", - "4 Arizona AdultWeekday 74.0" - ] - }, - "execution_count": 20, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "ticket_prices.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This is now in a format we can pass to [seaborn](https://seaborn.pydata.org/)'s [boxplot](https://seaborn.pydata.org/generated/seaborn.boxplot.html) function to create boxplots of the ticket price distributions for each ticket type for each state." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 16#\n", - "#Create a seaborn boxplot of the ticket price dataframe we created above,\n", - "#with 'state' on the x-axis, 'Price' as the y-value, and a hue that indicates 'Ticket'\n", - "#This will use boxplot's x, y, hue, and data arguments.\n", - "plt.subplots(figsize=(12, 8))\n", - "sns.boxplot(x=___, y=___, hue=___, data=ticket_prices)\n", - "plt.xticks(rotation='vertical')\n", - "plt.ylabel('Price ($)')\n", - "plt.xlabel('State');" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Aside from some relatively expensive ticket prices in California, Colorado, and Utah, most prices appear to lie in a broad band from around 25 to over 100 dollars. Some States show more variability than others. Montana and South Dakota, for example, both show fairly small variability as well as matching weekend and weekday ticket prices. Nevada and Utah, on the other hand, show the widest range in prices. Some States, notably North Carolina and Virginia, have weekend prices far higher than weekday prices. This exploration might inspire you to consider a few potential groupings of resorts: those with low spread, those with lower averages, and those that charge a premium for weekend tickets. However, because you're told to treat all resorts as part of the same market segment, you could argue against segmenting the resorts further. Nevertheless, ways to consider using the State information in your modelling include:\n", - "\n", - "* disregard State completely\n", - "* retain all State information\n", - "* retain State in the form of Montana vs not Montana, as our target resort is in Montana\n", - "\n", - "You've also noted another effect above: some States show a marked difference between weekday and weekend ticket prices. It may make sense to allow a model to take into account not just State but also weekend vs weekday."
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Thus you currently have two main questions to resolve:\n", - "\n", - "* What do you do about the two types of ticket price?\n", - "* What do you do about the state information?" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 2.6.4 Numeric Features" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Having decided to reserve judgement on how exactly you utilize the State, turn your attention to cleaning the numeric features." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 2.6.4.1 Numeric data summary" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 17#\n", - "#Call ski_data's `describe` method for a statistical summary of the numerical columns\n", - "#Hint: there are fewer summary stat columns than features, so displaying the transpose\n", - "#will be useful again\n", - "ski_data.___.___" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Recall you're missing the ticket prices for some 16% of resorts. This is a fundamental problem that means you simply lack the required data for those resorts and will have to drop those records. But you may have a weekend price and not a weekday price, or vice versa. You want to keep any price you have."
- ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 82.424242\n", - "2 14.242424\n", - "1 3.333333\n", - "dtype: float64" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "missing_price = ski_data[['AdultWeekend', 'AdultWeekday']].isnull().sum(axis=1)\n", - "missing_price.value_counts()/len(missing_price) * 100" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Just over 82% of resorts have no missing ticket price, 3% are missing one value, and 14% are missing both. You will definitely want to drop the records for which you have no price information; however, you will not do so just yet. There may still be useful information about the distributions of other features in that 14% of the data." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 2.6.4.2 Distributions Of Feature Values" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note that, although we are still in the 'data wrangling and cleaning' phase rather than exploratory data analysis, looking at distributions of features is immensely useful in getting a feel for whether the values look sensible and whether there are any obvious outliers to investigate. Some exploratory data analysis belongs here, and data wrangling will inevitably occur later on. It's more a matter of emphasis. Here, we're interested in whether distributions look plausible or wrong. Later on, we're more interested in relationships and patterns."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 18#\n", - "#Call ski_data's `hist` method to plot histograms of each of the numeric features\n", - "#Try passing it an argument figsize=(15,10)\n", - "#Try calling plt.subplots_adjust() with an argument hspace=0.5 to adjust the spacing\n", - "#It's important you create legible and easy-to-read plots\n", - "ski_data.___(___)\n", - "#plt.subplots_adjust(hspace=___);\n", - "#Hint: notice how the terminating ';' \"swallows\" some messy output and leads to a tidier notebook" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "What features do we have possible cause for concern about and why?\n", - "\n", - "* SkiableTerrain_ac because values are clustered down the low end,\n", - "* Snow Making_ac for the same reason,\n", - "* fastEight because all but one value is 0 so it has very little variance, and half the values are missing,\n", - "* fastSixes raises an amber flag; it has more variability, but still mostly 0,\n", - "* trams also may get an amber flag for the same reason,\n", - "* yearsOpen because most values are low but it has a maximum of 2019, which strongly suggests someone recorded calendar year rather than number of years." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "##### 2.6.4.2.1 SkiableTerrain_ac" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 19#\n", - "#Filter the 'SkiableTerrain_ac' column to print the values greater than 10000\n", - "ski_data.___[ski_data.___ > ___]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Q: 2** One resort has an incredibly large skiable terrain area! Which is it?" 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 20#\n", - "#Now you know there's only one, print the whole row to investigate all values, including seeing the resort name\n", - "#Hint: don't forget the transpose will be helpful here\n", - "ski_data[ski_data.___ > ___].___" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**A: 2** Your answer here" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "But what can you do when you have one record that seems highly suspicious?" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can see if your data are correct. Search for \"silverton mountain skiable area\". If you do this, you get some [useful information](https://www.google.com/search?q=silverton+mountain+skiable+area)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Silverton Mountain information](images/silverton_mountain_info.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can spot check data. You see your top and base elevation values agree, but the skiable area is very different. Your suspect value is 26819, but the value you've just looked up is 1819. The last three digits agree. This sort of error could have occurred in transmission or at some editing or transcription stage. You could plausibly replace the suspect value with the one you've just obtained. Another cautionary note to make here is that although you're doing this in order to progress with your analysis, this is most definitely an issue that should have been raised and fed back to the client or data originator as a query. You should view this \"data correction\" step as a means to continue (documenting it carefully as you do in this notebook) rather than an ultimate decision as to what is correct."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 21#\n", - "#Use the .loc accessor to print the 'SkiableTerrain_ac' value only for this resort\n", - "ski_data.___[39, 'SkiableTerrain_ac']" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 22#\n", - "#Use the .loc accessor again to modify this value with the correct value of 1819\n", - "ski_data.___[39, 'SkiableTerrain_ac'] = ___" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 23#\n", - "#Use the .loc accessor a final time to verify that the value has been modified\n", - "ski_data.___[39, 'SkiableTerrain_ac']" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**NB whilst you may become suspicious about your data quality, and you know you have missing values, you will not here dive down the rabbit hole of checking all values or web scraping to replace missing values.**" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "What does the distribution of skiable area look like now?" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "ski_data.SkiableTerrain_ac.hist(bins=30)\n", - "plt.xlabel('SkiableTerrain_ac')\n", - "plt.ylabel('Count')\n", - "plt.title('Distribution of skiable area (acres) after replacing erroneous value');" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You now see a rather long tailed distribution. You may wonder about the now most extreme value that is above 8000, but similarly you may also wonder about the value around 7000. If you wanted to spend more time manually checking values you could, but leave this for now. The above distribution is plausible." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "##### 2.6.4.2.2 Snow Making_ac" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "11 3379.0\n", - "18 1500.0\n", - "Name: Snow Making_ac, dtype: float64" - ] - }, - "execution_count": 31, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "ski_data['Snow Making_ac'][ski_data['Snow Making_ac'] > 1000]" - ] - }, - { - "cell_type": "code", - "execution_count": 32, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
11
NameHeavenly Mountain Resort
RegionSierra Nevada
stateCalifornia
summit_elev10067
vertical_drop3500
base_elev7170
trams2
fastEight0
fastSixes2
fastQuads7
quad1
triple5
double3
surface8
total_chairs28
Runs97
TerrainParks3
LongestRun_mi5.5
SkiableTerrain_ac4800
Snow Making_ac3379
daysOpenLastYear155
yearsOpen64
averageSnowfall360
AdultWeekdayNaN
AdultWeekendNaN
projectedDaysOpen157
NightSkiing_acNaN
\n", - "
" + "source": [ + "ski_data.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uClN6iMtTNrz" + }, + "source": [ + "Perform a final quick check on the data." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zRCGGKFpTNrz" + }, + "source": [ + "### 2.11.1 Number Of Missing Values By Row - Resort" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gVRW1q23TNr0" + }, + "source": [ + "Having dropped rows missing the desired target ticket price, what degree of missingness do you have for the remaining rows?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "23sQdjQlTNr0", + "outputId": "2fd71f30-f2ab-477e-be75-32db0b3db021" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
count%
329520.0
62520.0
141520.0
86520.0
74520.0
146520.0
184416.0
108416.0
198416.0
39416.0
\n", + "
" + ], + "text/plain": [ + " count %\n", + "329 5 20.0\n", + "62 5 20.0\n", + "141 5 20.0\n", + "86 5 20.0\n", + "74 5 20.0\n", + "146 5 20.0\n", + "184 4 16.0\n", + "108 4 16.0\n", + "198 4 16.0\n", + "39 4 16.0" + ] + }, + "execution_count": 61, + "metadata": {}, + "output_type": "execute_result" + } ], - "text/plain": [ - " 11\n", - "Name Heavenly Mountain Resort\n", - "Region Sierra Nevada\n", - "state California\n", - "summit_elev 10067\n", - "vertical_drop 3500\n", - "base_elev 7170\n", - "trams 2\n", - "fastEight 0\n", - "fastSixes 2\n", - "fastQuads 7\n", - "quad 1\n", - "triple 5\n", - "double 3\n", - "surface 8\n", - "total_chairs 28\n", - "Runs 97\n", - "TerrainParks 3\n", - "LongestRun_mi 5.5\n", - "SkiableTerrain_ac 4800\n", - "Snow Making_ac 3379\n", - "daysOpenLastYear 155\n", - "yearsOpen 64\n", - "averageSnowfall 360\n", - "AdultWeekday NaN\n", - "AdultWeekend NaN\n", - "projectedDaysOpen 157\n", - "NightSkiing_ac NaN" - ] - }, - "execution_count": 32, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "ski_data[ski_data['Snow Making_ac'] > 3000].T" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can adopt a similar approach as for the suspect skiable area value and do some spot checking. To save time, here is a link to the website for [Heavenly Mountain Resort](https://www.skiheavenly.com/the-mountain/about-the-mountain/mountain-info.aspx). From this you can glean that you have values for skiable terrain that agree. Furthermore, you can read that snowmaking covers 60% of the trails." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "What, then, is your rough guess for the area covered by snowmaking?" 
- ] - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "2880.0" - ] - }, - "execution_count": 33, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - ".6 * 4800" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This is less than the value of 3379 in your data so you may have a judgement call to make. However, notice something else. You have no ticket pricing information at all for this resort. Any further effort spent worrying about values for this resort will be wasted. You'll simply be dropping the entire row!" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "##### 2.6.4.2.3 fastEight" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Look at the different fastEight values more closely:" - ] - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0.0 163\n", - "1.0 1\n", - "Name: fastEight, dtype: int64" - ] - }, - "execution_count": 34, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "ski_data.fastEight.value_counts()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Drop the fastEight column in its entirety; half the values are missing and all but one of the others are zero. There is essentially no information in this column." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 24#\n", - "#Drop the 'fastEight' column from ski_data. Use inplace=True\n", - "ski_data.drop(columns=___, inplace=___)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "What about yearsOpen? How many resorts have purportedly been open for more than 100 years?"
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 25#\n", - "#Filter the 'yearsOpen' column for values greater than 100\n", - "ski_data.___[ski_data.___ > ___]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Okay, one seems to have been open for 104 years. But beyond that, one is down as having been open for 2019 years. This is wrong! What shall you do about this?" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "What does the distribution of yearsOpen look like if you exclude just the obviously wrong one?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 26#\n", - "#Call the hist method on 'yearsOpen' after filtering for values under 1000\n", - "#Pass the argument bins=30 to hist(), but feel free to explore other values\n", - "ski_data.___[ski_data.___ < ___].hist(___)\n", - "plt.xlabel('Years open')\n", - "plt.ylabel('Count')\n", - "plt.title('Distribution of years open excluding 2019');" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above distribution of years seems entirely plausible, including the 104 year value. You can certainly state that no resort will have been open for 2019 years! It likely means the resort opened in 2019. It could also mean the resort is due to open in 2019. You don't know when these data were gathered!" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's review the summary statistics for the years under 1000." 
- ] - }, - { - "cell_type": "code", - "execution_count": 38, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "count 328.000000\n", - "mean 57.695122\n", - "std 16.841182\n", - "min 6.000000\n", - "25% 50.000000\n", - "50% 58.000000\n", - "75% 68.250000\n", - "max 104.000000\n", - "Name: yearsOpen, dtype: float64" - ] - }, - "execution_count": 38, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "ski_data.yearsOpen[ski_data.yearsOpen < 1000].describe()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The smallest number of years open otherwise is 6. You can't be sure whether this resort in question has been open zero years or one year and even whether the numbers are projections or actual. In any case, you would be adding a new youngest resort so it feels best to simply drop this row." - ] - }, - { - "cell_type": "code", - "execution_count": 39, - "metadata": {}, - "outputs": [], - "source": [ - "ski_data = ski_data[ski_data.yearsOpen < 1000]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "##### 2.6.4.2.4 fastSixes and Trams" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The other features you had mild concern over, you will not investigate further. Perhaps take some care when using these features." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2.7 Derive State-wide Summary Statistics For Our Market Segment" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You have, by this point removed one row, but it was for a resort that may not have opened yet, or perhaps in its first season. Using your business knowledge, you know that state-wide supply and demand of certain skiing resources may well factor into pricing strategies. Does a resort dominate the available night skiing in a state? 
Or does it account for a large proportion of the total skiable terrain or days open?\n", - "\n", - "If you want to add any features to your data that capture the state-wide market size, you should do this now, before dropping any more rows. In the next section, you'll drop rows with missing price information. Although you don't know what those resorts charge for their tickets, you do know the resorts exist and have been open for at least six years. Thus, you'll now calculate some state-wide summary statistics for later use." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Many features in your data pertain to chairlifts, that is, to getting people around each resort. These aren't relevant, nor are the features relating to altitudes. Features that you may be interested in are:\n", - "\n", - "* TerrainParks\n", - "* SkiableTerrain_ac\n", - "* daysOpenLastYear\n", - "* NightSkiing_ac\n", - "\n", - "When you think about it, these are features it makes sense to sum: the total number of terrain parks, the total skiable area, the total number of days open, and the total area available for night skiing. You might consider the total number of ski runs, but understand that the skiable area is more informative than just a number of runs." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "A fairly new groupby behaviour is [named aggregation](https://pandas-docs.github.io/pandas-docs-travis/whatsnew/v0.25.0.html). This lets you clearly perform the aggregations you want whilst also creating informative output column names."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 27#\n", - "#Add named aggregations for the sum of 'daysOpenLastYear', 'TerrainParks', and 'NightSkiing_ac'\n", - "#call them 'state_total_days_open', 'state_total_terrain_parks', and 'state_total_nightskiing_ac',\n", - "#respectively\n", - "#Finally, add a call to the reset_index() method (we recommend you experiment with and without this to see\n", - "#what it does)\n", - "state_summary = ski_data.groupby('state').agg(\n", - " resorts_per_state=pd.NamedAgg(column='Name', aggfunc='size'), #could pick any column here\n", - " state_total_skiable_area_ac=pd.NamedAgg(column='SkiableTerrain_ac', aggfunc='sum'),\n", - " state_total_days_open=pd.NamedAgg(column=__, aggfunc='sum'),\n", - " ___=pd.NamedAgg(column=___, aggfunc=___),\n", - " ___=pd.NamedAgg(column=___, aggfunc=___)\n", - ").___\n", - "state_summary.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2.8 Drop Rows With No Price Data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You know there are two columns that refer to price: 'AdultWeekend' and 'AdultWeekday'. You can calculate the number of price values missing per row. This will obviously have to be either 0, 1, or 2, where 0 denotes no price values are missing and 2 denotes that both are missing." - ] - }, - { - "cell_type": "code", - "execution_count": 41, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 82.317073\n", - "2 14.329268\n", - "1 3.353659\n", - "dtype: float64" - ] - }, - "execution_count": 41, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "missing_price = ski_data[['AdultWeekend', 'AdultWeekday']].isnull().sum(axis=1)\n", - "missing_price.value_counts()/len(missing_price) * 100" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "About 14% of the rows have no price data. 
As the price is your target, these rows are of no use. Time to lose them." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 28#\n", - "#Use `missing_price` to remove rows from ski_data where both price values are missing\n", - "ski_data = ski_data[___ != 2]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2.9 Review distributions" - ] - }, - { - "cell_type": "code", - "execution_count": 43, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "ski_data.hist(figsize=(15, 10))\n", - "plt.subplots_adjust(hspace=0.5);" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "These distributions are much better. There are clearly some skewed distributions, so keep an eye on `fastQuads`, `fastSixes`, and perhaps `trams`. These lack much variance away from 0 and may have a small number of relatively extreme values. Models failing to rate a feature as important when domain knowledge tells you it should be is an issue to look out for, as is a model being overly influenced by some extreme values. If you build a good machine learning pipeline, hopefully it will be robust to such issues, but you may also wish to consider nonlinear transformations of features." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2.10 Population data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Population and area data for the US states can be obtained from [wikipedia](https://simple.wikipedia.org/wiki/List_of_U.S._states). Listen, you should have a healthy concern about using data you \"found on the Internet\". Make sure it comes from a reputable source. This table of data is useful because it allows you to easily pull and incorporate an external data set. It also allows you to proceed with an analysis that includes state sizes and populations for your 'first cut' model. Be explicit about your source (we documented it here in this workflow) and ensure it is open to inspection. All steps are subject to review, and it may be that a client has a specific source of data they trust that you should use to rerun the analysis." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 29#\n", - "#Use pandas' `read_html` method to read the table from the URL below\n", - "states_url = 'https://simple.wikipedia.org/w/index.php?title=List_of_U.S._states&oldid=7168473'\n", - "usa_states = pd.___(___)" - ] - }, - { - "cell_type": "code", - "execution_count": 45, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "list" - ] - }, - "execution_count": 45, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "type(usa_states)" - ] - }, - { - "cell_type": "code", - "execution_count": 46, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "1" - ] - }, - "execution_count": 46, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "len(usa_states)" - ] - }, - { - "cell_type": "code", - "execution_count": 47, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" + "source": [ + "missing = pd.concat([ski_data.isnull().sum(axis=1), 100 * ski_data.isnull().mean(axis=1)], axis=1)\n", + "missing.columns=['count', '%']\n", + "missing.sort_values(by='count', ascending=False).head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "siL-Ka6eTNr0" + }, + "source": [ + "These seem possibly curiously quantized..." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ENwiVKCITNr0", + "outputId": "f6cb089b-314b-4381-ca0d-34107063eb70" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 0., 4., 8., 12., 16., 20.])" + ] + }, + "execution_count": 62, + "metadata": {}, + "output_type": "execute_result" + } ], - "text/plain": [ - " Name &postal abbs. [1] Cities \\\n", - " Name &postal abbs. [1] Name &postal abbs. [1].1 Capital Largest[5] \n", - "0 Alabama AL Montgomery Birmingham \n", - "1 Alaska AK Juneau Anchorage \n", - "2 Arizona AZ Phoenix Phoenix \n", - "3 Arkansas AR Little Rock Little Rock \n", - "4 California CA Sacramento Los Angeles \n", - "\n", - " Established[upper-alpha 1] Population[upper-alpha 2][3] Total area[4] \\\n", - " Established[upper-alpha 1] Population[upper-alpha 2][3] mi2 \n", - "0 Dec 14, 1819 4903185 52420 \n", - "1 Jan 3, 1959 731545 665384 \n", - "2 Feb 14, 1912 7278717 113990 \n", - "3 Jun 15, 1836 3017804 53179 \n", - "4 Sep 9, 1850 39512223 163695 \n", - "\n", - " Land area[4] Water area[4] Numberof Reps. \n", - " km2 mi2 km2 mi2 km2 Numberof Reps. 
\n", - "0 135767 50645 131171 1775 4597 7 \n", - "1 1723337 570641 1477953 94743 245384 1 \n", - "2 295234 113594 294207 396 1026 9 \n", - "3 137732 52035 134771 1143 2961 4 \n", - "4 423967 155779 403466 7916 20501 53 " - ] - }, - "execution_count": 47, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "usa_states = usa_states[0]\n", - "usa_states.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note that, even in the last year, the capability of `pd.read_html()` has improved. The merged cells you see in the web table are now handled much more conveniently, with 'Phoenix' now being duplicated so the subsequent columns remain aligned. But check this anyway. If you extract the established date column, you should just get dates. Recall that previously you used the `.loc` accessor, because you were using labels. Now you want to refer to a column by its index position and so use `.iloc`. For a discussion on the different use cases of `.loc` and `.iloc` refer to the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html)."
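To make the difference between label-based and position-based selection concrete, here is a small sketch. The frame `toy` and its index labels are assumptions made up for this example; only the two state names and dates are borrowed from the table above.

```python
import pandas as pd

# Toy frame for illustration; the index labels 'a' and 'b' are invented.
toy = pd.DataFrame(
    {
        "state": ["Alabama", "Alaska"],
        "established": ["Dec 14, 1819", "Jan 3, 1959"],
    },
    index=["a", "b"],
)

by_label = toy.loc[:, "established"]  # .loc selects by row/column labels
by_position = toy.iloc[:, 1]          # .iloc selects by integer position (second column)

# Both expressions return the same Series here.
print(by_label.equals(by_position))
```

Position-based selection is handy when column labels are awkward to type, as with the multi-level headers that `pd.read_html()` produced for the states table.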
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 30#\n", - "#Use the iloc accessor to get the pandas Series for column number 4 from `usa_states`\n", - "#It should be a column of dates\n", - "established = usa_states.___[:, 4]" - ] - }, - { - "cell_type": "code", - "execution_count": 49, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 Dec 14, 1819\n", - "1 Jan 3, 1959\n", - "2 Feb 14, 1912\n", - "3 Jun 15, 1836\n", - "4 Sep 9, 1850\n", - "5 Aug 1, 1876\n", - "6 Jan 9, 1788\n", - "7 Dec 7, 1787\n", - "8 Mar 3, 1845\n", - "9 Jan 2, 1788\n", - "10 Aug 21, 1959\n", - "11 Jul 3, 1890\n", - "12 Dec 3, 1818\n", - "13 Dec 11, 1816\n", - "14 Dec 28, 1846\n", - "15 Jan 29, 1861\n", - "16 Jun 1, 1792\n", - "17 Apr 30, 1812\n", - "18 Mar 15, 1820\n", - "19 Apr 28, 1788\n", - "20 Feb 6, 1788\n", - "21 Jan 26, 1837\n", - "22 May 11, 1858\n", - "23 Dec 10, 1817\n", - "24 Aug 10, 1821\n", - "25 Nov 8, 1889\n", - "26 Mar 1, 1867\n", - "27 Oct 31, 1864\n", - "28 Jun 21, 1788\n", - "29 Dec 18, 1787\n", - "30 Jan 6, 1912\n", - "31 Jul 26, 1788\n", - "32 Nov 21, 1789\n", - "33 Nov 2, 1889\n", - "34 Mar 1, 1803\n", - "35 Nov 16, 1907\n", - "36 Feb 14, 1859\n", - "37 Dec 12, 1787\n", - "38 May 29, 1790\n", - "39 May 23, 1788\n", - "40 Nov 2, 1889\n", - "41 Jun 1, 1796\n", - "42 Dec 29, 1845\n", - "43 Jan 4, 1896\n", - "44 Mar 4, 1791\n", - "45 Jun 25, 1788\n", - "46 Nov 11, 1889\n", - "47 Jun 20, 1863\n", - "48 May 29, 1848\n", - "49 Jul 10, 1890\n", - "Name: (Established[upper-alpha 1], Established[upper-alpha 1]), dtype: object" - ] - }, - "execution_count": 49, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "established" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Extract the state name, population, and total area (square miles) columns."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 31#\n", - "#Now use the iloc accessor again to extract columns 0, 5, and 6 and the dataframe's `copy()` method\n", - "#Set the names of these extracted columns to 'state', 'state_population', and 'state_area_sq_miles',\n", - "#respectively.\n", - "usa_states_sub = usa_states.___[:, [___]].copy()\n", - "usa_states_sub.columns = [___]\n", - "usa_states_sub.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Do you have all the ski data states accounted for?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 32#\n", - "#Find the states in `state_summary` that are not in `usa_states_sub`\n", - "#Hint: set(list1) - set(list2) is an easy way to get items in list1 that are not in list2\n", - "missing_states = ___(state_summary.state) - ___(usa_states_sub.state)\n", - "missing_states" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "No?? " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you look at the table on the web, you can perhaps start to guess what the problem is. 
You can confirm your suspicion by pulling out state names that _contain_ 'Massachusetts', 'Pennsylvania', or 'Virginia' from usa_states_sub:" - ] - }, - { - "cell_type": "code", - "execution_count": 52, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "20 Massachusetts[upper-alpha 3]\n", - "37 Pennsylvania[upper-alpha 3]\n", - "38 Rhode Island[upper-alpha 4]\n", - "45 Virginia[upper-alpha 3]\n", - "47 West Virginia\n", - "Name: state, dtype: object" - ] - }, - "execution_count": 52, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "usa_states_sub.state[usa_states_sub.state.str.contains('Massachusetts|Pennsylvania|Rhode Island|Virginia')]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Delete square brackets and their contents and try again:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 33#\n", - "#Use pandas' Series' `replace()` method to replace anything within square brackets (including the brackets)\n", - "#with the empty string. 
Do this inplace, so you need to specify the arguments:\n", - "#to_replace='\\[.*\\]' #literal square bracket followed by anything or nothing followed by literal closing bracket\n", - "#value='' #empty string as replacement\n", - "#regex=True #we used a regex in our `to_replace` argument\n", - "#inplace=True #Do this \"in place\"\n", - "usa_states_sub.state.___(to_replace=___, value=__, regex=___, inplace=___)\n", - "usa_states_sub.state[usa_states_sub.state.str.contains('Massachusetts|Pennsylvania|Rhode Island|Virginia')]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 34#\n", - "#And now verify none of our states are missing by checking that there are no states in\n", - "#state_summary that are not in usa_states_sub (as earlier using `set()`)\n", - "missing_states = ___(state_summary.state) - ___(usa_states_sub.state)\n", - "missing_states" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Better! You have an empty set for missing states now. You can confidently add the population and state area columns to the ski resort data." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 35#\n", - "#Use 'state_summary's `merge()` method to combine our new data in 'usa_states_sub'\n", - "#specify the arguments how='left' and on='state'\n", - "state_summary = state_summary.___(usa_states_sub, ___=___, ___=___)\n", - "state_summary.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Having created this data frame of summary statistics for various states, it would seem obvious to join this with the ski resort data to augment it with this additional data. You will do this, but not now. In the next notebook you will be exploring the data, including the relationships between the states. 
For that you want a separate row for each state, as you have here, and joining the data this soon means you'd need to separate and eliminate redundancies in the state data when you wanted it." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2.11 Target Feature" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally, what will your target be when modelling ticket price? What relationship is there between weekday and weekend prices?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 36#\n", - "#Use ski_data's `plot()` method to create a scatterplot (kind='scatter') with 'AdultWeekday' on the x-axis and\n", - "#'AdultWeekend' on the y-axis\n", - "ski_data.___(x=___, y=___, kind=___);" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "A couple of observations can be made. Firstly, there is a clear line where weekend and weekday prices are equal. Weekend prices being higher than weekday prices seems restricted to sub-$100 resorts. Recall from the boxplot earlier that the distribution for weekday and weekend prices in Montana seemed equal. Is this confirmed in the actual data for each resort? Big Mountain resort is in Montana, so the relationship between these quantities in this state is particularly relevant." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Code task 37#\n", - "#Use the loc accessor on ski_data to print the 'AdultWeekend' and 'AdultWeekday' columns for Montana only\n", - "ski_data.___[ski_data.state == ___, [___, ___]]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Is there any reason to prefer weekend or weekday prices? Which is missing the least?"
- ] - }, - { - "cell_type": "code", - "execution_count": 58, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "AdultWeekend 4\n", - "AdultWeekday 7\n", - "dtype: int64" - ] - }, - "execution_count": 58, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "ski_data[['AdultWeekend', 'AdultWeekday']].isnull().sum()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Weekend prices have the fewest missing values of the two, so drop the weekday prices and then keep just the rows that have a weekend price." - ] - }, - { - "cell_type": "code", - "execution_count": 59, - "metadata": {}, - "outputs": [], - "source": [ - "ski_data.drop(columns='AdultWeekday', inplace=True)\n", - "ski_data.dropna(subset=['AdultWeekend'], inplace=True)" - ] - }, - { - "cell_type": "code", - "execution_count": 60, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(277, 25)" - ] - }, - "execution_count": 60, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "ski_data.shape" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Perform a final quick check on the data." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 2.11.1 Number Of Missing Values By Row - Resort" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Having dropped rows missing the desired target ticket price, what degree of missingness do you have for the remaining rows?" - ] - }, - { - "cell_type": "code", - "execution_count": 61, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" + "source": [ + "missing['%'].unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N7_L9U2FTNr0" + }, + "source": [ + "Yes, the percentage of missing values per row appear in multiples of 4." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "qU_2tu7oTNr0", + "outputId": "1f552e29-bc0f-465e-8333-b5411fc297e8" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0.0 107\n", + "4.0 94\n", + "8.0 45\n", + "12.0 15\n", + "16.0 10\n", + "20.0 6\n", + "Name: %, dtype: int64" + ] + }, + "execution_count": 63, + "metadata": {}, + "output_type": "execute_result" + } ], - "text/plain": [ - " count %\n", - "329 5 20.0\n", - "62 5 20.0\n", - "141 5 20.0\n", - "86 5 20.0\n", - "74 5 20.0\n", - "146 5 20.0\n", - "184 4 16.0\n", - "108 4 16.0\n", - "198 4 16.0\n", - "39 4 16.0" - ] - }, - "execution_count": 61, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "missing = pd.concat([ski_data.isnull().sum(axis=1), 100 * ski_data.isnull().mean(axis=1)], axis=1)\n", - "missing.columns=['count', '%']\n", - "missing.sort_values(by='count', ascending=False).head(10)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "These seem possibly curiously quantized..." - ] - }, - { - "cell_type": "code", - "execution_count": 62, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([ 0., 4., 8., 12., 16., 20.])" - ] - }, - "execution_count": 62, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "missing['%'].unique()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Yes, the percentage of missing values per row appear in multiples of 4." 
- ] - }, - { - "cell_type": "code", - "execution_count": 63, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0.0 107\n", - "4.0 94\n", - "8.0 45\n", - "12.0 15\n", - "16.0 10\n", - "20.0 6\n", - "Name: %, dtype: int64" - ] - }, - "execution_count": 63, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "missing['%'].value_counts()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This is almost as if values have been removed artificially... Nevertheless, what you don't know is how useful the missing features are in predicting ticket price. You shouldn't just drop rows that are missing several useless features." - ] - }, - { - "cell_type": "code", - "execution_count": 64, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Int64Index: 277 entries, 0 to 329\n", - "Data columns (total 25 columns):\n", - " # Column Non-Null Count Dtype \n", - "--- ------ -------------- ----- \n", - " 0 Name 277 non-null object \n", - " 1 Region 277 non-null object \n", - " 2 state 277 non-null object \n", - " 3 summit_elev 277 non-null int64 \n", - " 4 vertical_drop 277 non-null int64 \n", - " 5 base_elev 277 non-null int64 \n", - " 6 trams 277 non-null int64 \n", - " 7 fastSixes 277 non-null int64 \n", - " 8 fastQuads 277 non-null int64 \n", - " 9 quad 277 non-null int64 \n", - " 10 triple 277 non-null int64 \n", - " 11 double 277 non-null int64 \n", - " 12 surface 277 non-null int64 \n", - " 13 total_chairs 277 non-null int64 \n", - " 14 Runs 274 non-null float64\n", - " 15 TerrainParks 233 non-null float64\n", - " 16 LongestRun_mi 272 non-null float64\n", - " 17 SkiableTerrain_ac 275 non-null float64\n", - " 18 Snow Making_ac 240 non-null float64\n", - " 19 daysOpenLastYear 233 non-null float64\n", - " 20 yearsOpen 277 non-null float64\n", - " 21 averageSnowfall 268 non-null float64\n", - " 22 AdultWeekend 277 non-null float64\n", - " 23 projectedDaysOpen 
236 non-null float64\n", - " 24 NightSkiing_ac 163 non-null float64\n", - "dtypes: float64(11), int64(11), object(3)\n", - "memory usage: 56.3+ KB\n" - ] + "source": [ + "missing['%'].value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Jr_MGWs1TNr1" + }, + "source": [ + "This is almost as if values have been removed artificially... Nevertheless, what you don't know is how useful the missing features are in predicting ticket price. You shouldn't just drop rows that are missing several useless features." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "F1n4Bp1LTNr1", + "outputId": "730079a1-0eec-4801-c663-b8d373bd6ba4" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Int64Index: 277 entries, 0 to 329\n", + "Data columns (total 25 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 Name 277 non-null object \n", + " 1 Region 277 non-null object \n", + " 2 state 277 non-null object \n", + " 3 summit_elev 277 non-null int64 \n", + " 4 vertical_drop 277 non-null int64 \n", + " 5 base_elev 277 non-null int64 \n", + " 6 trams 277 non-null int64 \n", + " 7 fastSixes 277 non-null int64 \n", + " 8 fastQuads 277 non-null int64 \n", + " 9 quad 277 non-null int64 \n", + " 10 triple 277 non-null int64 \n", + " 11 double 277 non-null int64 \n", + " 12 surface 277 non-null int64 \n", + " 13 total_chairs 277 non-null int64 \n", + " 14 Runs 274 non-null float64\n", + " 15 TerrainParks 233 non-null float64\n", + " 16 LongestRun_mi 272 non-null float64\n", + " 17 SkiableTerrain_ac 275 non-null float64\n", + " 18 Snow Making_ac 240 non-null float64\n", + " 19 daysOpenLastYear 233 non-null float64\n", + " 20 yearsOpen 277 non-null float64\n", + " 21 averageSnowfall 268 non-null float64\n", + " 22 AdultWeekend 277 non-null float64\n", + " 23 projectedDaysOpen 236 non-null float64\n", + " 24 NightSkiing_ac 163 non-null float64\n", +
"dtypes: float64(11), int64(11), object(3)\n", + "memory usage: 56.3+ KB\n" + ] + } + ], + "source": [ + "ski_data.info()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0xg7fo_NTNr1" + }, + "source": [ + "There are still some missing values, and it's good to be aware of this, but leave them as is for now." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NgOzPfBPTNr1" + }, + "source": [ + "## 2.12 Save data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "a97CMhmgTNr1", + "outputId": "ade467d1-6957-4981-907f-bf2b56f530a2" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(277, 25)" + ] + }, + "execution_count": 65, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ski_data.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hQBc-p6NTNr1" + }, + "source": [ + "Save this to your data directory, separately. Note that you were provided with the data in `raw_data` and you should save derived data in a separate location. This guards against overwriting our original data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "yioTvOBHTNr1" + }, + "outputs": [], + "source": [ + "# save the data to a new csv file\n", + "datapath = '../data'\n", + "save_file(ski_data, 'ski_data_cleaned.csv', datapath)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "OT5Y6rsnTNr2" + }, + "outputs": [], + "source": [ + "# save the state_summary separately.\n", + "datapath = '../data'\n", + "save_file(state_summary, 'state_summary.csv', datapath)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "L9N77RAWTNr2" + }, + "source": [ + "## 2.13 Summary" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "65C5KgRKTNr2" + }, + "source": [ + "**Q: 3** Write a summary statement that highlights the key processes and findings from this notebook.
This should include information such as the original number of rows in the data, whether our own resort was actually present etc. What columns, if any, have been removed? Any rows? Summarise the reasons why. Were any other issues found? What remedial actions did you take? State where you are in the project. Can you confirm what the target feature is for your desire to predict ticket price? How many rows were left in the data? Hint: this is a great opportunity to reread your notebook, check all cells have been executed in order and from a \"blank slate\" (restarting the kernel will do this), and that your workflow makes sense and follows a logical pattern. As you do this you can pull out salient information for inclusion in this summary. Thus, this section will provide an important overview of \"what\" and \"why\" without having to dive into the \"how\" or any unproductive or inconclusive steps along the way." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UgTOE1phTNr2" + }, + "source": [ + "**A: 3** Several processes were used to gather information for the result. Many pandas and matplotlib methods were used, as well as a few seaborn methods. There were, originally, 330 rows of data. By the end, there were 277 rows and 25 columns. I do not see Big Mountain Resort present in any of the output data. We dropped the weekday price column ('AdultWeekday') as well as the 'fastEight' column. Some rows were also dropped because they were missing the desired target ticket price. The percentage of missing values per row appears in multiples of 4, almost as if values had been removed artificially. Values were sorted, and no columns were dropped solely because of missing data. This summary completes the Data Wrangling section of the project, and I have also started the Exploratory Data Analysis section.
The target feature for predicting ticket price is the adult weekend ticket price ('AdultWeekend'), which had fewer missing values than the weekday price.\n", + "\n", + "Any constructive criticism is greatly appreciated as I am still very new to Data Science, but loving it so far! Thank you so much!" + ] } - ], - "source": [ - "ski_data.info()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "There are still some missing values, and it's good to be aware of this, but leave them as is for now." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2.12 Save data" - ] - }, - { - "cell_type": "code", - "execution_count": 65, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(277, 25)" - ] - }, - "execution_count": 65, - "metadata": {}, - "output_type": "execute_result" + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.4" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": true + }, + "varInspector": { + "cols": { + "lenName": 16, + "lenType": 16, + "lenVar": 40 + }, + "kernels_config": { + "python": { + "delete_cmd_postfix": "", + "delete_cmd_prefix": "del ", + "library": "var_list.py", + "varRefreshCmd": "print(var_dic_list())" + }, + "r": { + "delete_cmd_postfix": ") ", + "delete_cmd_prefix": "rm(", + "library": "var_list.r", + "varRefreshCmd": "cat(var_dic_list()) " + } + }, + "types_to_exclude": [ + "module", + "function", +
"builtin_function_or_method", + "instance", + "_Feature" + ], + "window_display": false + }, + "colab": { + "provenance": [] } - ], - "source": [ - "ski_data.shape" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Save this to your data directory, separately. Note that you were provided with the data in `raw_data` and you should save derived data in a separate location. This guards against overwriting our original data." - ] }, - { - "cell_type": "code", - "execution_count": 66, - "metadata": {}, - "outputs": [], - "source": [ - "# save the data to a new csv file\n", - "datapath = '../data'\n", - "save_file(ski_data, 'ski_data_cleaned.csv', datapath)" - ] - }, - { - "cell_type": "code", - "execution_count": 67, - "metadata": {}, - "outputs": [], - "source": [ - "# save the state_summary separately.\n", - "datapath = '../data'\n", - "save_file(state_summary, 'state_summary.csv', datapath)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2.13 Summary" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Q: 3** Write a summary statement that highlights the key processes and findings from this notebook. This should include information such as the original number of rows in the data, whether our own resort was actually present etc. What columns, if any, have been removed? Any rows? Summarise the reasons why. Were any other issues found? What remedial actions did you take? State where you are in the project. Can you confirm what the target feature is for your desire to predict ticket price? How many rows were left in the data? Hint: this is a great opportunity to reread your notebook, check all cells have been executed in order and from a \"blank slate\" (restarting the kernel will do this), and that your workflow makes sense and follows a logical pattern. As you do this you can pull out salient information for inclusion in this summary.
Thus, this section will provide an important overview of \"what\" and \"why\" without having to dive into the \"how\" or any unproductive or inconclusive steps along the way." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**A: 3** Your answer here" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.4" - }, - "toc": { - "base_numbering": 1, - "nav_menu": {}, - "number_sections": true, - "sideBar": true, - "skip_h1_title": false, - "title_cell": "Table of Contents", - "title_sidebar": "Contents", - "toc_cell": false, - "toc_position": {}, - "toc_section_display": true, - "toc_window_display": true - }, - "varInspector": { - "cols": { - "lenName": 16, - "lenType": 16, - "lenVar": 40 - }, - "kernels_config": { - "python": { - "delete_cmd_postfix": "", - "delete_cmd_prefix": "del ", - "library": "var_list.py", - "varRefreshCmd": "print(var_dic_list())" - }, - "r": { - "delete_cmd_postfix": ") ", - "delete_cmd_prefix": "rm(", - "library": "var_list.r", - "varRefreshCmd": "cat(var_dic_list()) " - } - }, - "types_to_exclude": [ - "module", - "function", - "builtin_function_or_method", - "instance", - "_Feature" - ], - "window_display": false - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/Notebooks/03_exploratory_data_analysis.ipynb b/Notebooks/03_exploratory_data_analysis.ipynb index c1746d2e4..7de859e3a 100644 --- a/Notebooks/03_exploratory_data_analysis.ipynb +++ b/Notebooks/03_exploratory_data_analysis.ipynb @@ -994,7 +994,7 @@ "outputs": [ { "data": { - "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYMAAAEGCAYAAACHGfl5AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAUVElEQVR4nO3dfZRkdX3n8feHASTQ8uAO24uDsdVVc4xsCPQaE43p0cQl4gaPhxgMKqi7k4dVIUc2B/OgxBx32bhmdV1zPCMRRFknBkzEx8RFGjYxIjOEOKOEYMgkghNRwcEmEUW/+ePeZoqme/p2T1fVdNX7dU6dqbpPv9+3bk996t6q+t1UFZKk8XbIsDsgSRo+w0CSZBhIkgwDSRKGgSQJOHTYHehi48aNNTU1taJ17rvvPo466qj+dGgdsP7xrX+cawfr761/x44dX6uq47usty7CYGpqiu3bt69ondnZWWZmZvrToXXA+se3/nGuHay/t/4kf991PU8TSZIMA0mSYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSSJdfIL5AMxdeFHOy23++LT+9wTSTp4eWQgSTIMJEmGgSQJw0CShGEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAk0ccwSPLuJHcl2dUz7VFJPpnktvbf4/rVviSpu34eGVwGnLZg2oXANVX1ROCa9rEkacj6FgZVdT1w94LJZwDvae+/B3hBv9qXJHU36M8MJqtqD0D7778ecPuSpEWkqvq38WQK+EhVPbV9/I2qOrZn/j1VtejnBkm2AFsAJicnT922bduK2p6bm2NiYoKdd+7ttPxJm45Z0fYPdvP1j6txrn+cawfr761/8+bNO6pqust6g74G8leSnFBVe5KcANy11IJVtRXYCjA9PV0zMzMramh2dpaZmRnO7XoN5LNXtv2D3Xz942qc6x/n2sH6V1v/oE8TXQ2c094/B/jQgNuXJC2in18tfT/wF8CTk9yR5JXAxcBPJbkN+Kn2sSRpyPp2mqiqXrzErOf0q01J0ur4C2RJkmEgSTIMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEkMKgyS/kuTzSXYleX+SI4bRD0lSY+BhkGQT8BpguqqeCmwAzhp0PyRJ+wzrNNGhwPclORQ4EvjykPohSQJSVYNvNDkPeBPwz8CfVtXZiyyzBdgCMDk5eeq2bdtW1Mbc3BwTExPsvHPvGvR4n5M2HbOm2+uX+frH1TjXP861g/X31r958+YdVTXdZb2Bh0GS44CrgJ8DvgH8IXBlVb1vqXWmp6dr+/btK2pndnaWmZkZpi786IF092F2X3z6mm6vX+brH1fjXP841w7W31t/ks5hMIzTRD8J/F1VfbWqvgN8EPixIfRDktQaRhj8A/D0JEcmCfAc4JYh9EOS1Bp4GFTVDcCVwE3AzrYPWwfdD0nSPocOo9GqegPwhmG0LUl6OH+BLEkyDCRJhoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgScIwkCRhGEiSMAwkSRgGkiQ6hkGSa7pMkyStT4fub2aSI4AjgY1JjgPSzjoaeHSf+yZJGpD9hgHwC8D5NC/8O9gXBvcC7+hjvyRJA7TfMKiqtwFvS/Lqqnr7gPokSRqw5Y4MAKiqtyf5MWCqd52qunw1jSY5FrgEeCpQwCuq6i9Wsy1J0oHrFAZJ3gs8AbgZ+G47uYBVhQHwNuATVXVmksNpPpeQJA1JpzAApoGnVFUdaINJjgaeBZwLUFXfBr59oNuVJK1eury+J/lD4DVVteeAG0xOBrY
CXwB+iOaD6fOq6r4Fy20BtgBMTk6eum3bthW1Mzc3x8TEBDvv3HugXX6IkzYds6bb65f5+sfVONc/zrWD9ffWv3nz5h1VNd1lva5hcC1wMvBZ4P756VX1MyvtaJJp4DPAM6rqhiRvA+6tqt9cap3p6enavn37itqZnZ1lZmaGqQs/utIu7tfui09f0+31y3z942qc6x/n2sH6e+tP0jkMup4mumh13VrUHcAdVXVD+/hK4MI13L4kaYW6fpvourVqsKr+McmXkjy5qm4FnkNzykiSNCRdv030TZpvDwEcDhwG3FdVR6+y3VcDV7TfJLodePkqtyNJWgNdjwwe2fs4yQuAp6220aq6meYbSpKkg8CqRi2tqj8Gnr3GfZEkDUnX00Qv7Hl4CM27+gP+zYEk6eDQ9dtE/7Hn/gPAbuCMNe+NJGkoun5m4Ae8kjTCul7c5sQkf5TkriRfSXJVkhP73TlJ0mB0/QD5UuBqmusabAI+3E6TJI2ArmFwfFVdWlUPtLfLgOP72C9J0gB1DYOvJXlJkg3t7SXA1/vZMUnS4HQNg1cALwL+EdgDnIm/GpakkdH1q6W/DZxTVfcAJHkU8D9pQkKStM51PTL4d/NBAFBVdwM/3J8uSZIGrWsYHJLkuPkH7ZFB16MKSdJBrusL+luATye5kmYYihcBb+pbryRJA9X1F8iXJ9lOMzhdgBdWldcgkKQR0flUT/vibwBI0gha1RDWkqTRYhhIkgwDSZJhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJDHEMEiyIclfJvnIsPogSWoM88jgPOCWIbYvSWoNJQySnAicDlwyjPYlSQ+Vqhp8o83lM/878Ejggqp6/iLLbAG2AExOTp66bdu2FbUxNzfHxMQEO+/cuwY9Xp2TNh0ztLbn6x9X41z/ONcO1t9b/+bNm3dU1XSX9QZ+UfskzwfuqqodSWaWWq6qtgJbAaanp2tmZslFFzU7O8vMzAznXvjRA+jtgdl99szQ2p6vf1yNc/3jXDtY/2rrH8ZpomcAP5NkN7ANeHaS9w2hH5Kk1sDDoKpeV1UnVtUUcBbwqap6yaD7IUnax98ZSJIG/5lBr6qaBWaH2QdJkkcGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJYsijlqox1fFqbLsvPr3PPZE0rjwykCQZBpIkw0CShGEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkMIQySPCbJtUluSfL5JOcNug+SpIcaxsVtHgBeW1U3JXkksCPJJ6vqC0PoiySJIRwZVNWeqrqpvf9N4BZg06D7IUnaJ1U1vMaTKeB64KlVde+CeVuALQCTk5Onbtu2bUXbnpubY2Jigp137l2bzh4ETtp0TOdl5+tfC12fw5X0r9/Wsv71ZpxrB+vvrX/z5s07qmq6y3pDC4MkE8B1wJuq6oP7W3Z6erq2b9++ou3Pzs4yMzPT+frC68FKroE8X/9aWI/XaF7L+tebca4drL+3/iSdw2Ao3yZKchhwFXDFckEgSeq/YXybKMDvA7dU1e8Oun1J0sMN48jgGcBLgWcnubm9PW8I/ZAktQb+1dKq+jMgg25XkrQ0f4EsSTIMJEmGgSQJw0CShGEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEliCKOWavXW4xXHDjZrfeW7fjzX7ufRt5K/w0HtZ48MJEmGgSTJMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRJDCoMkpyW5NckXk1w4jD5IkvYZeBgk2QC8A/hp4CnAi5M8ZdD9kCTtM4wjg6cBX6yq26vq28A24Iwh9EOS1EpVDbbB5EzgtKr6T+3jlwI/UlWvWrDcFmBL+/DJwK0rbGoj8LUD7O56Zv3jW/8
41w7W31v/Y6vq+C4rDeMayFlk2sMSqaq2AltX3UiyvaqmV7v+emf941v/ONcO1r/a+odxmugO4DE9j08EvjyEfkiSWsMIgxuBJyZ5XJLDgbOAq4fQD0lSa+CniarqgSSvAv4E2AC8u6o+34emVn2KaURY//ga59rB+ldV/8A/QJYkHXz8BbIkyTCQJI1AGCw3tEUa/7ud/7kkpwyjn/3Sof6ZJHuT3NzeXj+MfvZDkncnuSvJriXmj/q+X67+Ud73j0lybZJbknw+yXmLLDOy+79j/Svb/1W1bm80H0D/LfB44HDgr4CnLFjmecDHaX7f8HTghmH3e8D1zwAfGXZf+1T/s4BTgF1LzB/Zfd+x/lHe9ycAp7T3Hwn8zZj93+9S/4r2/3o/MugytMUZwOXV+AxwbJITBt3RPhnroT2q6nrg7v0sMsr7vkv9I6uq9lTVTe39bwK3AJsWLDay+79j/Suy3sNgE/Clnsd38PAnpMsy61XX2n40yV8l+XiSHxxM1w4Ko7zvuxr5fZ9kCvhh4IYFs8Zi/++nfljB/h/GcBRrqcvQFp2Gv1inutR2E834JHNJngf8MfDEvvfs4DDK+76Lkd/3SSaAq4Dzq+rehbMXWWWk9v8y9a9o/6/3I4MuQ1uM8vAXy9ZWVfdW1Vx7/2PAYUk2Dq6LQzXK+35Zo77vkxxG80J4RVV9cJFFRnr/L1f/Svf/eg+DLkNbXA28rP1mwdOBvVW1Z9Ad7ZNl60/yb5Kkvf80mn3+9YH3dDhGed8va5T3fVvX7wO3VNXvLrHYyO7/LvWvdP+v69NEtcTQFkl+sZ3/TuBjNN8q+CLwT8DLh9Xftdax/jOBX0ryAPDPwFnVftVgvUvyfppvTGxMcgfwBuAwGP19D53qH9l9DzwDeCmwM8nN7bRfA74fxmL/d6l/Rfvf4SgkSev+NJEkaQ0YBpIkw0CSZBhIkjAMJEkYButekkrylp7HFyS5aI22fVmSM9diW8u087Pt6IvX9rutJdo/P8mRw2i7bf9NSb6UZG7B9Eck+YN21M0b2mEH5uedk+S29nZOz/TdB8MPyxbWssj8Y5P8cs/jRye5sv8901IMg/XvfuCFB8MLQK8kG1aw+CuBX66qzWu0vc7a7Z4PDCQMlqjjwzSDDi70SuCeqvq3wP8C/ke7jUfR/KbgR9r13pDkuP70uG+OBR4Mg6r6clX1/Y2HlmYYrH8P0Fzz9FcWzlj4zn7+3Vo7zvl1ST6Q5G+SXJzk7CSfTbIzyRN6NvOTSf5/u9zz2/U3JHlzkhvTjBP/Cz3bvTbJ/wV2LtKfF7fb35Vk/oXt9cAzgXcmefOC5R+yvf20e0KS69OM2b4ryY8v1d7885DkjUluAH4deDRwbdvWhvZ529Wuu9Tz+s61el6q6jNL/DL2DOA97f0rgee0vyj9D8Anq+ruqroH+CRw2oI+fl+STyT5z4v0fy7JW5LclOSaJMe3009O8pm27380HzBJZpO8Ncmn2+flae30i5Jc0LPdXb1HL+20ibaNm9rnc35U3YuBJ7T77M1JptJelyHJEUkubZf/yySb2+nnJvlgW9dtSX5nkedMqzXscbm9HfC45nPA0cBu4BjgAuCidt5lwJm9y7b/zgDfoBkT/RHAncBvtfPOA97as/4naN40PJFmrJcjgC3Ab7TLPALYDjyu3e59wOMW6eejgX8Ajqf55fungBe082aB6UXWecj29tPua4Ffb6dvoBnffX/tFfCinnZ2Axvb+6fSvNDOzzt2kX6t2fOycF8ueLwLOLHn8d8CG9t9/Bs9038TuKCnling/wEvW6KdAs5u778e+D/t/c8BP9Hef2PP38Es8K72/rNor58AXDTfbk9/pxb8rR0KHN3e30jza+C0fdzVs+5Uz3ZfC1za3v+Bdj8eAZwL3E7zd34E8PfAY4b9f3BUbh4ZjIBqRiu8HHjNCla7sZox0e+neZH503b6Tpr/mPM+UFXfq6rbaP4j/gDwXJoxX26
mGTb3X7FvNMTPVtXfLdLevwdmq+qrVfUAcAXNC8tyere3VLs3Ai9P81nJSdWM776/9r5LM8DXYm4HHp/k7UlOAxaOBDlvrZ6X/Vlq1M3lRuP8EM2L6eVLbPd7wB+0998HPDPJMTTBd107/T08dP+8Hx68hsLRSY7tVgIB/luSz9EE1CZgcpl1ngm8t23vr2le9J/UzrumqvZW1beALwCP7dgPLcMwGB1vpTnHfFTPtAdo93F7euHwnnn399z/Xs/j7/HQMasWjlcy/2L06qo6ub09rqrmw+S+Jfq32AtYF73bW7Td9gXqWTRHOO9N8rJl2vtWVX13sRnVnHb5IZp3w/8FuGSJbazV87I/D466meRQmnfEd7P8aJx/Dvx0u8+76DImzWL1Pvj31TpikfXOpjk6O7WqTga+ssRyvfbX796/2++yzsdXO5gYBiOiqu4GPkATCPN205z2gOb882Gr2PTPJjmk/Rzh8cCtNAPj/VKaIXRJ8qQkR+1vIzTvlH8iycY0H6K+GLhumXUWWrTdJI8F7qqqd9GM5HjKCtv7Js2pJdJ8EH9IVV1Fc/plqevmrtXzsj9XA/PfFDoT+FQ1507+BHhukuPa8/rPbafNez3N6JS/t8R2D2m3B/DzwJ9V1V7gnvnPW2gGQet9vn6uremZNKN/7qX5+zqlnX4KzSmxhY6h2Tffac/9z7+Tf/A5X8T1NCFCkifRDL526xLLao2YqqPlLcCreh6/C/hQks8C17C6d6e30rwoTAK/WFXfSnIJzamkm9p3n18FXrC/jVTVniSvA66leef3sar60Ar7slS7M8B/TfIdms9QXrbC9rYCH0+yh+abRZcmmX+j9Lol1lmT5wWg/SD054Ej04w+eklVXUQTbO9N8kWaI4KzoAn+JL9Nc3oM4I3tm4Fe5wPvTvI7VfWrC+bdB/xgkh3AXtoXeprgeWear9nezkNH+bwnyadpPp96RTvtKvadFruR5jq8C10BfDjJduBm4K/bGr6e5M/bD40/DryjZ53fa/uxk+bo49yqur/7gY5Ww1FLpRVKchnNhcbX5ffik8xV1cQKlp+l+aB4e/96pWHzNJEkySMDSZJHBpIkDANJEoaBJAnDQJKEYSBJAv4FFZnvyL/kiA8AAAAASUVORK5CYII=\n", + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYMAAAEGCAYAAACHGfl5AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAUVElEQVR4nO3dfZRkdX3n8feHASTQ8uAO24uDsdVVc4xsCPQaE43p0cQl4gaPhxgMKqi7k4dVIUc2B/OgxBx32bhmdV1zPCMRRFknBkzEx8RFGjYxIjOEOKOEYMgkghNRwcEmEUW/+ePeZoqme/p2T1fVdNX7dU6dqbpPv9+3bk996t6q+t1UFZKk8XbIsDsgSRo+w0CSZBhIkgwDSRKGgSQJOHTYHehi48aNNTU1taJ17rvvPo466qj+dGgdsP7xrX+cawfr761/x44dX6uq47usty7CYGpqiu3bt69ondnZWWZmZvrToXXA+se3/nGuHay/t/4kf991PU8TSZIMA0mSYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSSJdfIL5AMxdeFHOy23++LT+9wTSTp4eWQgSTIMJEmGgSQJw0CShGEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAk0ccwSPLuJHcl2dUz7VFJPpnktvbf4/rVviSpu34eGVwGnLZg2oXANVX1ROCa9rEkacj6FgZVdT1w94LJZwDvae+/B3hBv9qXJHU36M8MJqtqD0D7778ecPuSpEWkqvq38WQK+EhVPbV9/I2qOrZn/j1VtejnBkm2AFsAJicnT922bduK2p6bm2NiYoKdd+7ttPxJm45Z0fYPdvP1j6txrn+cawfr761/8+bNO6pqust6g74G8leSnFBVe5KcANy11IJVtRXYCjA9PV0zMzMramh2dpaZmRnO7XoN5LNXtv2D3Xz942qc6x/n2sH6V1v/oE8TXQ2c094/B/jQgNuXJC2in18tfT/wF8CTk9yR5JXAxcBPJbkN+Kn2sSRpyPp2mqiqXrzErOf0q01J0ur4C2RJkmEgSTIMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEkMKgyS/kuTzSXYleX+SI4bRD0lSY+BhkGQT8BpguqqeCmwAzhp0PyRJ+wzrNNGhwPclORQ4EvjykPohSQJSVYNvNDkPeBPwz8CfVtXZiyyzBdgCMDk5eeq2bdtW1Mbc3BwTExPsvHPvGvR4n5M2HbOm2+uX+frH1TjXP861g/X31r958+YdVTXdZb2Bh0GS44CrgJ8DvgH8IXBlVb1vqXWmp6dr+/btK2pndnaWmZkZpi786IF092F2X3z6mm6vX+brH1fjXP841w7W31t/ks5hMIzTRD8J/F1VfbWqvgN8EPixIfRDktQaRhj8A/D0JEcmCfAc4JYh9EOS1Bp4GFTVDcCVwE3AzrYPWwfdD0nSPocOo9GqegPwhmG0LUl6OH+BLEkyDCRJhoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEgScIwkCRhGEiSMAwkSRgGkiQ6hkGSa7pMkyStT4fub2aSI4AjgY1JjgPSzjoaeHSf+yZJGpD9hgHwC8D5NC/8O9gXBvcC7+hjvyRJA7TfMKiqtwFvS/Lqqnr7gPokSRqw5Y4MAKiqtyf5MWCqd52qunw1jSY5FrgEeCpQwCuq6i9Wsy1J0oHrFAZJ3gs8AbgZ+G47uYBVhQHwNuATVXVmksNpPpeQJA1JpzAApoGnVFUdaINJjgaeBZwLUFXfBr59oNuVJK1eury+J/lD4DVVteeAG0xOBrY
CXwB+iOaD6fOq6r4Fy20BtgBMTk6eum3bthW1Mzc3x8TEBDvv3HugXX6IkzYds6bb65f5+sfVONc/zrWD9ffWv3nz5h1VNd1lva5hcC1wMvBZ4P756VX1MyvtaJJp4DPAM6rqhiRvA+6tqt9cap3p6enavn37itqZnZ1lZmaGqQs/utIu7tfui09f0+31y3z942qc6x/n2sH6e+tP0jkMup4mumh13VrUHcAdVXVD+/hK4MI13L4kaYW6fpvourVqsKr+McmXkjy5qm4FnkNzykiSNCRdv030TZpvDwEcDhwG3FdVR6+y3VcDV7TfJLodePkqtyNJWgNdjwwe2fs4yQuAp6220aq6meYbSpKkg8CqRi2tqj8Gnr3GfZEkDUnX00Qv7Hl4CM27+gP+zYEk6eDQ9dtE/7Hn/gPAbuCMNe+NJGkoun5m4Ae8kjTCul7c5sQkf5TkriRfSXJVkhP73TlJ0mB0/QD5UuBqmusabAI+3E6TJI2ArmFwfFVdWlUPtLfLgOP72C9J0gB1DYOvJXlJkg3t7SXA1/vZMUnS4HQNg1cALwL+EdgDnIm/GpakkdH1q6W/DZxTVfcAJHkU8D9pQkKStM51PTL4d/NBAFBVdwM/3J8uSZIGrWsYHJLkuPkH7ZFB16MKSdJBrusL+luATye5kmYYihcBb+pbryRJA9X1F8iXJ9lOMzhdgBdWldcgkKQR0flUT/vibwBI0gha1RDWkqTRYhhIkgwDSZJhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJDHEMEiyIclfJvnIsPogSWoM88jgPOCWIbYvSWoNJQySnAicDlwyjPYlSQ+Vqhp8o83lM/878Ejggqp6/iLLbAG2AExOTp66bdu2FbUxNzfHxMQEO+/cuwY9Xp2TNh0ztLbn6x9X41z/ONcO1t9b/+bNm3dU1XSX9QZ+UfskzwfuqqodSWaWWq6qtgJbAaanp2tmZslFFzU7O8vMzAznXvjRA+jtgdl99szQ2p6vf1yNc/3jXDtY/2rrH8ZpomcAP5NkN7ANeHaS9w2hH5Kk1sDDoKpeV1UnVtUUcBbwqap6yaD7IUnax98ZSJIG/5lBr6qaBWaH2QdJkkcGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJYsijlqox1fFqbLsvPr3PPZE0rjwykCQZBpIkw0CShGEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkjAMJEkMIQySPCbJtUluSfL5JOcNug+SpIcaxsVtHgBeW1U3JXkksCPJJ6vqC0PoiySJIRwZVNWeqrqpvf9N4BZg06D7IUnaJ1U1vMaTKeB64KlVde+CeVuALQCTk5Onbtu2bUXbnpubY2Jigp137l2bzh4ETtp0TOdl5+tfC12fw5X0r9/Wsv71ZpxrB+vvrX/z5s07qmq6y3pDC4MkE8B1wJuq6oP7W3Z6erq2b9++ou3Pzs4yMzPT+frC68FKroE8X/9aWI/XaF7L+tebca4drL+3/iSdw2Ao3yZKchhwFXDFckEgSeq/YXybKMDvA7dU1e8Oun1J0sMN48jgGcBLgWcnubm9PW8I/ZAktQb+1dKq+jMgg25XkrQ0f4EsSTIMJEmGgSQJw0CShGEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEliCKOWavXW4xXHDjZrfeW7fjzX7ufRt5K/w0HtZ48MJEmGgSTJMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRJDCoMkpyW5NckXk1w4jD5IkvYZeBgk2QC8A/hp4CnAi5M8ZdD9kCTtM4wjg6cBX6yq26vq28A24Iwh9EOS1EpVDbbB5EzgtKr6T+3jlwI/UlWvWrDcFmBL+/DJwK0rbGoj8LUD7O56Zv3jW/8
41w7W31v/Y6vq+C4rDeMayFlk2sMSqaq2AltX3UiyvaqmV7v+emf941v/ONcO1r/a+odxmugO4DE9j08EvjyEfkiSWsMIgxuBJyZ5XJLDgbOAq4fQD0lSa+CniarqgSSvAv4E2AC8u6o+34emVn2KaURY//ga59rB+ldV/8A/QJYkHXz8BbIkyTCQJI1AGCw3tEUa/7ud/7kkpwyjn/3Sof6ZJHuT3NzeXj+MfvZDkncnuSvJriXmj/q+X67+Ud73j0lybZJbknw+yXmLLDOy+79j/Svb/1W1bm80H0D/LfB44HDgr4CnLFjmecDHaX7f8HTghmH3e8D1zwAfGXZf+1T/s4BTgF1LzB/Zfd+x/lHe9ycAp7T3Hwn8zZj93+9S/4r2/3o/MugytMUZwOXV+AxwbJITBt3RPhnroT2q6nrg7v0sMsr7vkv9I6uq9lTVTe39bwK3AJsWLDay+79j/Suy3sNgE/Clnsd38PAnpMsy61XX2n40yV8l+XiSHxxM1w4Ko7zvuxr5fZ9kCvhh4IYFs8Zi/++nfljB/h/GcBRrqcvQFp2Gv1inutR2E834JHNJngf8MfDEvvfs4DDK+76Lkd/3SSaAq4Dzq+rehbMXWWWk9v8y9a9o/6/3I4MuQ1uM8vAXy9ZWVfdW1Vx7/2PAYUk2Dq6LQzXK+35Zo77vkxxG80J4RVV9cJFFRnr/L1f/Svf/eg+DLkNbXA28rP1mwdOBvVW1Z9Ad7ZNl60/yb5Kkvf80mn3+9YH3dDhGed8va5T3fVvX7wO3VNXvLrHYyO7/LvWvdP+v69NEtcTQFkl+sZ3/TuBjNN8q+CLwT8DLh9Xftdax/jOBX0ryAPDPwFnVftVgvUvyfppvTGxMcgfwBuAwGP19D53qH9l9DzwDeCmwM8nN7bRfA74fxmL/d6l/Rfvf4SgkSev+NJEkaQ0YBpIkw0CSZBhIkjAMJEkYButekkrylp7HFyS5aI22fVmSM9diW8u087Pt6IvX9rutJdo/P8mRw2i7bf9NSb6UZG7B9Eck+YN21M0b2mEH5uedk+S29nZOz/TdB8MPyxbWssj8Y5P8cs/jRye5sv8901IMg/XvfuCFB8MLQK8kG1aw+CuBX66qzWu0vc7a7Z4PDCQMlqjjwzSDDi70SuCeqvq3wP8C/ke7jUfR/KbgR9r13pDkuP70uG+OBR4Mg6r6clX1/Y2HlmYYrH8P0Fzz9FcWzlj4zn7+3Vo7zvl1ST6Q5G+SXJzk7CSfTbIzyRN6NvOTSf5/u9zz2/U3JHlzkhvTjBP/Cz3bvTbJ/wV2LtKfF7fb35Vk/oXt9cAzgXcmefOC5R+yvf20e0KS69OM2b4ryY8v1d7885DkjUluAH4deDRwbdvWhvZ529Wuu9Tz+s61el6q6jNL/DL2DOA97f0rgee0vyj9D8Anq+ruqroH+CRw2oI+fl+STyT5z4v0fy7JW5LclOSaJMe3009O8pm27380HzBJZpO8Ncmn2+flae30i5Jc0LPdXb1HL+20ibaNm9rnc35U3YuBJ7T77M1JptJelyHJEUkubZf/yySb2+nnJvlgW9dtSX5nkedMqzXscbm9HfC45nPA0cBu4BjgAuCidt5lwJm9y7b/zgDfoBkT/RHAncBvtfPOA97as/4naN40PJFmrJcjgC3Ab7TLPALYDjyu3e59wOMW6eejgX8Ajqf55fungBe082aB6UXWecj29tPua4Ffb6dvoBnffX/tFfCinnZ2Axvb+6fSvNDOzzt2kX6t2fOycF8ueLwLOLHn8d8CG9t9/Bs9038TuKCnling/wEvW6KdAs5u778e+D/t/c8BP9Hef2PP38Es8K72/rNor58AXDTfbk9/pxb8rR0KHN3e30jza+C0fdzVs+5Uz3ZfC1za3v+Bdj8eAZwL3E7zd34E8PfAY4b9f3BUbh4ZjIBqRiu8HHjNCla7sZox0e+neZH503b6Tpr/mPM+UFXfq6rbaP4j/gDwXJoxX26
mGTb3X7FvNMTPVtXfLdLevwdmq+qrVfUAcAXNC8tyere3VLs3Ai9P81nJSdWM776/9r5LM8DXYm4HHp/k7UlOAxaOBDlvrZ6X/Vlq1M3lRuP8EM2L6eVLbPd7wB+0998HPDPJMTTBd107/T08dP+8Hx68hsLRSY7tVgIB/luSz9EE1CZgcpl1ngm8t23vr2le9J/UzrumqvZW1beALwCP7dgPLcMwGB1vpTnHfFTPtAdo93F7euHwnnn399z/Xs/j7/HQMasWjlcy/2L06qo6ub09rqrmw+S+Jfq32AtYF73bW7Td9gXqWTRHOO9N8rJl2vtWVX13sRnVnHb5IZp3w/8FuGSJbazV87I/D466meRQmnfEd7P8aJx/Dvx0u8+76DImzWL1Pvj31TpikfXOpjk6O7WqTga+ssRyvfbX796/2++yzsdXO5gYBiOiqu4GPkATCPN205z2gOb882Gr2PTPJjmk/Rzh8cCtNAPj/VKaIXRJ8qQkR+1vIzTvlH8iycY0H6K+GLhumXUWWrTdJI8F7qqqd9GM5HjKCtv7Js2pJdJ8EH9IVV1Fc/plqevmrtXzsj9XA/PfFDoT+FQ1507+BHhukuPa8/rPbafNez3N6JS/t8R2D2m3B/DzwJ9V1V7gnvnPW2gGQet9vn6uremZNKN/7qX5+zqlnX4KzSmxhY6h2Tffac/9z7+Tf/A5X8T1NCFCkifRDL526xLLao2YqqPlLcCreh6/C/hQks8C17C6d6e30rwoTAK/WFXfSnIJzamkm9p3n18FXrC/jVTVniSvA66leef3sar60Ar7slS7M8B/TfIdms9QXrbC9rYCH0+yh+abRZcmmX+j9Lol1lmT5wWg/SD054Ej04w+eklVXUQTbO9N8kWaI4KzoAn+JL9Nc3oM4I3tm4Fe5wPvTvI7VfWrC+bdB/xgkh3AXtoXeprgeWear9nezkNH+bwnyadpPp96RTvtKvadFruR5jq8C10BfDjJduBm4K/bGr6e5M/bD40/DryjZ53fa/uxk+bo49yqur/7gY5Ww1FLpRVKchnNhcbX5ffik8xV1cQKlp+l+aB4e/96pWHzNJEkySMDSZJHBpIkDANJEoaBJAnDQJKEYSBJAv4FFZnvyL/kiA8AAAAASUVORK5CYII=", "text/plain": [ "
" ] @@ -1018,7 +1018,7 @@ "outputs": [ { "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXkAAAEGCAYAAACAd+UpAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAV3UlEQVR4nO3de5QkZXnH8e8DKyosIGbJBhfjgreEI17Y8RJRMqscRMFLCBoIKqjJmkQRvCTiMSLRcIIaNBxj5OAFBdFVAe8R9OgOxhu4iyCLK3LbRHQFFQRnveDikz+qBnp7p2d6Zrq2u16/n3PmbHd1Vb1Pv1P7m+q3u9+KzESSVKYdhl2AJKk5hrwkFcyQl6SCGfKSVDBDXpIKtmjYBXRasmRJLl++fM7bbd68mV122WXwBTWobTW3rV5oX81tqxfaV3Pb6oXZa163bt1PM3PPnitk5sj8rFixIudjzZo189pumNpWc9vqzWxfzW2rN7N9Nbet3szZawbW5gy56nCNJBXMkJekghnyklQwQ16SCmbIS1LBDHlJKpghL0kFM+QlqWCGvCQVbKSmNdgelp/0ub7W23jaYQ1XIknN80xekgpmyEtSwQx5SSqYIS9JBTPkJalghrwkFcyQl6SCGfKSVDBDXpIKZshLUsEMeUkqmCEvSQUz5CWpYIa8JBXMkJekghnyklQwQ16SCmbIS1LBDHlJKlijIR8Rr4yIqyNifUR8JCLu02R7kqStNRbyEbEMeAUwlpmPAHYEjmqqPUnStpoerlkE3DciFgE7Az9quD1JUofIzOZ2HnECcCrwK+ALmXnMNOusAlYBLF26dMXq1avn3M7k5CQ33n7XAqvd2v7Ldh/o/rpNTk6yePHiRtsYpLbVC+2ruW31Qvtqblu9MHvNK1euXJeZY70ebyzkI2IP4ALgr4CfAx8Hzs/MD/XaZmxsLNeuXTvntiYmJjjuos3zLXVaG087bKD76zYxMcH4+HijbQxS2+qF9tXctnqhfTW3rV6YveaImDHkmxyuORi4MTN/kpm/BS4Enthge5KkLk2G/P8BT4iInSMigKcCGxpsT5LUpbGQz8xLgfOBy4Gr6rbOaqo9SdK2FjW588x8I/DGJtuQJPXmN14lqWCGvCQVzJCXpIIZ8pJUMENekgpmyEtSwQx5SSqYIS9JBTPkJalghrwkFcyQl6SCGfKSVDBDXpIKZshLUsEMeUkqmCEvSQUz5CWpYIa8JBXMkJekghnyklQwQ16SCmbIS1LBDHlJKpghL0kFM+QlqWCGvCQVzJCXpIIZ8pJUMENekgpmyEtSwQx5SSqYIS9JBTPkJalghrwkFcyQl6SCGfKSVDBDXpIK1mjIR8T9IuL8iPheRGyIiD9rsj1J0tYWNbz/M4CLMvPIiNgJ2Lnh9iRJHRoL+YjYDTgIOA4gM+8E7myqPUnStiIzm9lxxKOBs4DvAo8C1gEnZObmrvVWAasAli5dumL16tVzbmtycpIbb79rwTXPx/7Ldp/XdpOTkyxevHjA1TSnbfVC+2puW73QvprbVi/MXvPKlSvXZeZYr8ebDPkx4JvAgZl5aUScAdyRmW/otc3Y2FiuXbt2zm1NTExw3EWbZ1+xARtPO2xe201MTDA+Pj7YYhrUtnqhfTW3rV5oX81tqxdmrzkiZgz5Jt94vQm4KTMvre+fDxzQYHuSpC6NhXxm/hj4QUQ8vF70VKqhG0nSdtL0p2uOB86rP1lzA/CihtuTJHVoNOQz8wqg51iRJKlZfuNVkgpmyEtSwQx5SSqYIS9JBTPkJalghrwkFcyQl6SCGfKSVDBDXpIKZshLUsEMeUkqmCEvSQXrK+Qj4kv9LJMkjZYZZ6GMiPtQXXx7SUTsAUT90G7AAxquTZK0QLNNNfxS4ESqQF/HPSF/B/CuBuuSJ
A3AjCGfmWcAZ0TE8Zn5zu1UkyRpQPq6aEhmvjMinggs79wmM89pqC5J0gD0FfIRcS7wYOAK4K56cQKGvCSNsH4v/zcG7JeZ2WQxkqTB6vdz8uuBP2qyEEnS4PV7Jr8E+G5EXAb8ZmphZj6rkaokSQPRb8if0mQRkqRm9PvpmkuaLkSSNHj9frrmF1SfpgHYCbgXsDkzd2uqMEnSwvV7Jr9r5/2IeA7wuEYqkiQNzLxmoczMTwJPGXAtkqQB63e45oiOuztQfW7ez8xL0ojr99M1z+y4vQXYCDx74NVIkgaq3zH5FzVdiCRp8Pq9aMjeEfGJiLglIm6OiAsiYu+mi5MkLUy/b7yeDXyaal75ZcBn6mWSpBHWb8jvmZlnZ+aW+ucDwJ4N1iVJGoB+Q/6nEfH8iNix/nk+8LMmC5MkLVy/If9i4HnAj4FNwJGAb8ZK0ojr9yOUbwaOzczbACLi/sC/U4W/JGlE9Xsm/8ipgAfIzFuBxzRTkiRpUPoN+R0iYo+pO/WZfL+vAiRJQ9JvUJ8OfD0izqeazuB5wKn9bBgROwJrgR9m5uHzqlKSNC/9fuP1nIhYSzUpWQBHZOZ3+2zjBGAD4LTEkrSd9T3kUod6v8EOVN+UBQ6jOut/1dxKkyQtVGQ2N5lkPbzzb8CuwGumG66JiFXAKoClS5euWL169ZzbmZyc5Mbb71pgtfOz/7Ld57Xd5OQkixcvHnA1zWlbvdC+mttWL7Sv5rbVC7PXvHLlynWZOdbr8cbePI2Iw4FbMnNdRIz3Wi8zzwLOAhgbG8vx8Z6r9jQxMcHpX908z0oXZuMx4/PabmJigvk812FpW73QvprbVi+0r+a21QsLr3leFw3p04HAsyJiI7AaeEpEfKjB9iRJXRoL+cx8XWbunZnLgaOAL2fm85tqT5K0rSbP5CVJQ7ZdvtCUmRPAxPZoS5J0D8/kJalghrwkFcyQl6SCGfKSVDBDXpIKZshLUsEMeUkqmCEvSQUz5CWpYIa8JBXMkJekghnyklQwQ16SCmbIS1LBDHlJKpghL0kFM+QlqWDb5cpQJVt+0uf6Wm/jaYc1XIkkbcszeUkqmCEvSQUz5CWpYIa8JBXMkJekghnyklQwQ16SCmbIS1LBDHlJKpghL0kFM+QlqWCGvCQVzJCXpIIZ8pJUMENekgpmyEtSwQx5SSqYIS9JBTPkJalgjYV8RDwwItZExIaIuDoiTmiqLUnS9Jq8kPcW4NWZeXlE7Aqsi4gvZuZ3G2xTktShsTP5zNyUmZfXt38BbACWNdWeJGlbkZnNNxKxHPgK8IjMvKPrsVXAKoClS5euWL169Zz3Pzk5yY2337XwQhu0/7Ldt7o/OTnJ4sWLh1TN3LWtXmhfzW2rF9pXc9vqhdlrXrly5brMHOv1eOMhHxGLgUuAUzPzwpnWHRsby7Vr1865jYmJCY67aPM8K9w+Np522Fb3JyYmGB8fH04x89C2eqF9NbetXmhfzW2rF2avOSJmDPlGP10TEfcCLgDOmy3gJUmD1+SnawJ4H7AhM9/eVDuSpN6aPJM/EHgB8JSIuKL+eUaD7UmSujT2EcrM/CoQTe1fkjQ7v/EqSQUz5CWpYIa8JBXMkJekghnyklQwQ16SCmbIS1LBDHlJKpghL0kFM+QlqWCGvCQVzJCXpIIZ8pJUMENekgpmyEtSwQx5SSqYIS9JBWvsylDa2vKTPrfV/Vfvv4XjupbN1cbTDptX2wvdX7/6bXcuBl1jv9rQh8Pqm5IM+vc8rOOmk2fyklQwQ16SCmbIS1LBDHlJKpghL0kFM+QlqWCGvCQVzJCXpIIZ8pJUMENekgpmyEtSwQx5SSqYIS9JBTPkJalghrwkFcyQl6SCGfKSVDBDXpIKZshLUsEaDfmIODQiromI6yLipCbbkiRtq7GQj4gdgXcBTwf2A46OiP2aak+StK0mz+QfB1yXmTdk5p3AauDZDbYnSeoSmdnMjiOOBA7NzL+p778AeHxmvrxrv
VXAqvruw4Fr5tHcEuCnCyh3GNpWc9vqhfbV3LZ6oX01t61emL3mB2Xmnr0eXDT4eu4W0yzb5i9KZp4FnLWghiLWZubYQvaxvbWt5rbVC+2ruW31Qvtqblu9sPCamxyuuQl4YMf9vYEfNdieJKlLkyH/LeChEbFPROwEHAV8usH2JEldGhuuycwtEfFy4GJgR+D9mXl1Q80taLhnSNpWc9vqhfbV3LZ6oX01t61eWOhwdlNvvEqShs9vvEpSwQx5SSpYq0O+DdMmRMQDI2JNRGyIiKsj4oR6+SkR8cOIuKL+ecawa+0UERsj4qq6trX1svtHxBcj4tr63z2GXSdARDy8ox+viIg7IuLEUevjiHh/RNwSEes7lvXs04h4XX1sXxMRTxuRet8WEd+LiO9ExCci4n718uUR8auOvj5ze9c7Q809j4Nh9/EMNX+0o96NEXFFvXzu/ZyZrfyhejP3emBfYCfgSmC/Ydc1TZ17AQfUt3cFvk81zcMpwGuGXd8MdW8ElnQteytwUn37JOAtw66zx3HxY+BBo9bHwEHAAcD62fq0PkauBO4N7FMf6zuOQL2HAIvq22/pqHd553oj1sfTHgej0Me9au56/HTg5Pn2c5vP5FsxbUJmbsrMy+vbvwA2AMuGW9W8PRv4YH37g8BzhlhLL08Frs/M/x12Id0y8yvArV2Le/Xps4HVmfmbzLwRuI7qmN9upqs3M7+QmVvqu9+k+v7LyOjRx70MvY9h5pojIoDnAR+Z7/7bHPLLgB903L+JEQ/PiFgOPAa4tF708vpl7/tHZeijQwJfiIh19dQTAEszcxNUf7yAPxxadb0dxdb/IUa5j6F3n7bh+H4x8PmO+/tExLcj4pKIePKwiuphuuOgDX38ZODmzLy2Y9mc+rnNId/XtAmjIiIWAxcAJ2bmHcC7gQcDjwY2Ub0kGyUHZuYBVLOIviwiDhp2QbOpv3T3LODj9aJR7+OZjPTxHRGvB7YA59WLNgF/nJmPAV4FfDgidhtWfV16HQcj3ce1o9n6pGXO/dzmkG/NtAkRcS+qgD8vMy8EyMybM/OuzPwd8B6G8DJxJpn5o/rfW4BPUNV3c0TsBVD/e8vwKpzW04HLM/NmGP0+rvXq05E9viPiWOBw4JisB4rrIY+f1bfXUY1vP2x4Vd5jhuNgZPsYICIWAUcAH51aNp9+bnPIt2LahHpM7X3Ahsx8e8fyvTpW+wtgffe2wxIRu0TErlO3qd5sW0/Vv8fWqx0LfGo4Ffa01VnPKPdxh159+mngqIi4d0TsAzwUuGwI9W0lIg4FXgs8KzN/2bF8z6iuIUFE7EtV7w3DqXJrMxwHI9nHHQ4GvpeZN00tmFc/b+93kgf8rvQzqD6tcj3w+mHX06PGJ1G9BPwOcEX98wzgXOCqevmngb2GXWtHzftSfergSuDqqb4F/gD4EnBt/e/9h11rR807Az8Ddu9YNlJ9TPUHaBPwW6qzyJfM1KfA6+tj+xrg6SNS73VU49hTx/KZ9bp/WR8rVwKXA88coT7ueRwMu4971Vwv/wDwd13rzrmfndZAkgrW5uEaSdIsDHlJKpghL0kFM+QlqWCGvCQVzJAfMRGREXF6x/3XRMQpA9r3ByLiyEHsa5Z2nhvVrJtrmm6rR/snRsTOw2i7bv/UiPhBREx2Lb93PbvgdRFxaT3NxdRjx9YzUV5bf9loavnGiFiy/arfviLi6/W/yztnYdTgGPKj5zfAEaP2H3vqCxh9egnwD5m5ckD761u93xOpPjffuB7P4zNM/+3alwC3ZeZDgHdQzeJIRNwfeCPw+Hq7N47oPDsD/71l5hMHuT9ty5AfPVuorun4yu4Hus/Ep84UI2K8nqzoYxHx/Yg4LSKOiYjLopoT/sEduzk4Iv6nXu/wevsdo5on/Fv1JE4v7djvmoj4MNWXSbrrObre//qImAqsk6m+AHZmRLyta/2t9jdDu3tFxFeimi97/dQkTNO1N9UPEfGmiLiU6sstD
wDW1G3tWPfb+nrbXv165qD6JTO/mfWkY106Z5w8H3hq/Y3opwFfzMxbM/M24IvAoV013jciLoqIv+1aPu3zi4gVEXFlRHyjfg7r6+XHRcR/dmz/2YgYr2+/OyLWRnXdg3/pWGdjRJwcEV8FnhsRh9T7vTwiPh7VvEzdfToREe+of48bIuKxEXFh/UrlXzt/d9NsO6fjQjNr7ELeWpB3Ad+JiLfOYZtHAX9KNWXpDcB7M/NxUV2k5Hiqs1uo5qP+c6oJm9ZExEOAFwK3Z+ZjI+LewNci4gv1+o8DHpHVVKx3i4gHUJ2JrgBuo5qx8jmZ+aaIeArV/N1rp6nz7v1FNbvldO0eAVycmadGdea48wztfRLYhWqO7ZPr2l4MrMzMn0bECmBZZj6ifux+PfpvIP0yi7tnPczqQve3U33jdbbZEBdTTaV9Tmae07XPR/d4fmcDx2fmJd1/bGfw+sy8te7zL0XEIzPzO/Vjv87MJ0X1CvNC4ODM3BwRr6WaKOtN0+zvzsw8qD4GP0X1u7sVuD4i3pH1HCzTeAl9Hhd9Pq/fa4b8CMrMOyLiHOAVwK/63OxbU2ePEXE9MBVGVwGdwyYfy2qipmsj4gbgT6jmpnlk3PMqYXeqOTHuBC7rEWSPBSYy8yd1m+dRXfzgk7PU2bm/Xu1+C3h/VBO7fTIzr6j/cPRq7y6qCeCmcwOwb0S8E/hcR790G1S/zKTXrIezzYb4KeCtmXneNOtt8/wiYnfgfpl5Sb3OuVSTt83mefUf3kVUF7vZj2oqALhnkqwn1Mu/Vr0IYSfgGz32NzWX1FXA1R3H5w1UE4P1Cvm+j4s+ntPvPUN+dP0H1dwUZ3cs20I9xFa/zN+p47HfdNz+Xcf937H177l7HoupkDk+My/ufKB+Gb+5R33TBVM/Ovc3bbt12wcBhwHn1meid8ywz19n5l3TPZCZt0XEo6iGRF5GdQGGF0+36jT359MvM5ma9fCmqGYY3J3qzPYmYLxjvb2BiY77XwOeHhEfzq55SHo8v1dN83ym3H0M1e5TP6d9gNcAj633+YGpx2pTzzeohpaO7uP5dh6D3cfnTNnT93ExzSsbdXFMfkRl5q3Ax6heuk7ZSPWSF6rx3XvNY9fPjYgdohqn35dqYqaLgb+vz5CIiIdFNfvkTC4F/jwiltQvnY8GLpllm27TthsRDwJuycz3UM3gecAc2/sF1aUWqYcXdsjMC4A31PuazqD6ZSadM04eCXy5Du2LgUMiYo+o3nA9pF425WSqs97/6t7hdM8vM38O3B4RT6pXO6Zjk43Ao+vn+kDueYN4N6ogvz0iltL7zP+bwIH1cBYRsXNEDHpK4bkcF5qFZ/Kj7XTg5R333wN8KiIuo5qxcD5nk9dQheNSqhnufh0R76Uak768foXwE2a5tF9mboqI1wFrqM68/jsz5zr1cK92x4F/jIjfApPAC+fY3lnA5yNiE9V7EWdHxNQJzet6bDOQfgGo30v5a6r3Em6ien/kFKpgOjcirqM6gz8Kqj/oEfFmquEIgDfVf+Q7nUg1VPHWzPynjuXLejy/F9Xr/5Kt/2B8DbiRaghlPdWrRTLzyoj4NtUMhzfU620jM38SEccBH6nHywH+mWo22EHp+7gYYJvFchZK/d6rhyY+m5nnD7uWJkT1efzPTr05q98vDtdIUsE8k5ekgnkmL0kFM+QlqWCGvCQVzJCXpIIZ8pJUsP8HjznZg2f7H4sAAAAASUVORK5CYII=\n", + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXkAAAEGCAYAAACAd+UpAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAV3UlEQVR4nO3de5QkZXnH8e8DKyosIGbJBhfjgreEI17Y8RJRMqscRMFLCBoIKqjJmkQRvCTiMSLRcIIaNBxj5OAFBdFVAe8R9OgOxhu4iyCLK3LbRHQFFQRnveDikz+qBnp7p2d6Zrq2u16/n3PmbHd1Vb1Pv1P7m+q3u9+KzESSVKYdhl2AJKk5hrwkFcyQl6SCGfKSVDBDXpIKtmjYBXRasmRJLl++fM7bbd68mV122WXwBTWobTW3rV5oX81tqxfaV3Pb6oXZa163bt1PM3PPnitk5sj8rFixIudjzZo189pumNpWc9vqzWxfzW2rN7N9Nbet3szZawbW5gy56nCNJBXMkJekghnyklQwQ16SCmbIS1LBDHlJKpghL0kFM+QlqWCGvCQVbKSmNdgelp/0ub7W23jaYQ1XIknN80xekgpmyEtSwQx5SSqYIS9JBTPkJalghrwkFcyQl6SCGfKSVDBDXpIKZshLUsEMeUkqmCEvSQUz5CWpYIa8JBXMkJekghnyklQwQ16SCmbIS1LBDHlJKlijIR8Rr4yIqyNifUR8JCLu02R7kqStNRbyEbEMeAUwlpmPAHYEjmqqPUnStpoerlkE3DciFgE7Az9quD1JUofIzOZ2HnECcCrwK+ALmXnMNOusAlYBLF26dMXq1avn3M7k5CQ33n7XAqvd2v7Ldh/o/rpNTk6yePHiRtsYpLbVC+2ruW31Qvtqblu9MHvNK1euXJeZY70ebyzkI2IP4ALgr4CfAx8Hzs/MD/XaZmxsLNeuXTvntiYmJjjuos3zLXVaG087bKD76zYxMcH4+HijbQxS2+qF9tXctnqhfTW3rV6YveaImDHkmxyuORi4MTN/kpm/BS4Enthge5KkLk2G/P8BT4iInSMigKcCGxpsT5LUpbGQz8xLgfOBy4Gr6rbOaqo9SdK2FjW588x8I/DGJtuQJPXmN14lqWCGvCQVzJCXpIIZ8pJUMENekgpmyEtSwQx5SSqYIS9JBTPkJalghrwkFcyQl6SCGfKSVDBDXpIKZshLUsEMeUkqmCEvSQUz5CWpYIa8JBXMkJekghnyklQwQ16SCmbIS1LBDHlJKpghL0kFM+QlqWCGvCQVzJCXpIIZ8pJUMENekgpmyEtSwQx5SSqYIS9JBTPkJalghrwkFcyQl6SCGfKSVDBDXpIK1mjIR8T9IuL8iPheRGyIiD9rsj1J0tYWNbz/M4CLMvPIiNgJ2Lnh9iRJHRoL+YjYDTgIOA4gM+8E7myqPUnStiIzm9lxxKOBs4DvAo8C1gEnZObmrvVWAasAli5dumL16tVzbmtycpIbb79rwTXPx/7Ldp/XdpOTkyxevHjA1TSnbfVC+2puW73QvprbVi/MXvPKlSvXZeZYr8ebDPkx4JvAgZl5aUScAdyRmW/otc3Y2FiuXbt2zm1NTExw3EWbZ1+xARtPO2xe201MTDA+Pj7YYhrUtnqhfTW3rV5oX81tqxdmrzkiZgz5Jt94vQm4KTMvre+fDxzQYHuSpC6NhXxm/hj4QUQ8vF70VKqhG0nSdtL0p2uOB86rP1lzA/CihtuTJHVoNOQz8wqg51iRJKlZfuNVkgpmyEtSwQx5SSqYIS9JBTPkJalghrwkFcyQl6SCGfKSVDBDXpIKZshLUsEMeUkqmCEvSQXrK+Qj4kv9LJMkjZYZZ6GMiPtQXXx7SUTsAUT90G7AAxquTZK0QLNNNfxS4ESqQF/HPSF/B/CuBuuSJA3AjCGfmWcAZ0TE8Zn5zu1UkyRpQPq6aEhmvjMinggs79wmM89pqC5J0gD0FfIRcS7
wYOAK4K56cQKGvCSNsH4v/zcG7JeZ2WQxkqTB6vdz8uuBP2qyEEnS4PV7Jr8E+G5EXAb8ZmphZj6rkaokSQPRb8if0mQRkqRm9PvpmkuaLkSSNHj9frrmF1SfpgHYCbgXsDkzd2uqMEnSwvV7Jr9r5/2IeA7wuEYqkiQNzLxmoczMTwJPGXAtkqQB63e45oiOuztQfW7ez8xL0ojr99M1z+y4vQXYCDx74NVIkgaq3zH5FzVdiCRp8Pq9aMjeEfGJiLglIm6OiAsiYu+mi5MkLUy/b7yeDXyaal75ZcBn6mWSpBHWb8jvmZlnZ+aW+ucDwJ4N1iVJGoB+Q/6nEfH8iNix/nk+8LMmC5MkLVy/If9i4HnAj4FNwJGAb8ZK0ojr9yOUbwaOzczbACLi/sC/U4W/JGlE9Xsm/8ipgAfIzFuBxzRTkiRpUPoN+R0iYo+pO/WZfL+vAiRJQ9JvUJ8OfD0izqeazuB5wKn9bBgROwJrgR9m5uHzqlKSNC/9fuP1nIhYSzUpWQBHZOZ3+2zjBGAD4LTEkrSd9T3kUod6v8EOVN+UBQ6jOut/1dxKkyQtVGQ2N5lkPbzzb8CuwGumG66JiFXAKoClS5euWL169ZzbmZyc5Mbb71pgtfOz/7Ld57Xd5OQkixcvHnA1zWlbvdC+mttWL7Sv5rbVC7PXvHLlynWZOdbr8cbePI2Iw4FbMnNdRIz3Wi8zzwLOAhgbG8vx8Z6r9jQxMcHpX908z0oXZuMx4/PabmJigvk812FpW73QvprbVi+0r+a21QsLr3leFw3p04HAsyJiI7AaeEpEfKjB9iRJXRoL+cx8XWbunZnLgaOAL2fm85tqT5K0rSbP5CVJQ7ZdvtCUmRPAxPZoS5J0D8/kJalghrwkFcyQl6SCGfKSVDBDXpIKZshLUsEMeUkqmCEvSQUz5CWpYIa8JBXMkJekghnyklQwQ16SCmbIS1LBDHlJKpghL0kFM+QlqWDb5cpQJVt+0uf6Wm/jaYc1XIkkbcszeUkqmCEvSQUz5CWpYIa8JBXMkJekghnyklQwQ16SCmbIS1LBDHlJKpghL0kFM+QlqWCGvCQVzJCXpIIZ8pJUMENekgpmyEtSwQx5SSqYIS9JBTPkJalgjYV8RDwwItZExIaIuDoiTmiqLUnS9Jq8kPcW4NWZeXlE7Aqsi4gvZuZ3G2xTktShsTP5zNyUmZfXt38BbACWNdWeJGlbkZnNNxKxHPgK8IjMvKPrsVXAKoClS5euWL169Zz3Pzk5yY2337XwQhu0/7Ldt7o/OTnJ4sWLh1TN3LWtXmhfzW2rF9pXc9vqhdlrXrly5brMHOv1eOMhHxGLgUuAUzPzwpnWHRsby7Vr1865jYmJCY67aPM8K9w+Np522Fb3JyYmGB8fH04x89C2eqF9NbetXmhfzW2rF2avOSJmDPlGP10TEfcCLgDOmy3gJUmD1+SnawJ4H7AhM9/eVDuSpN6aPJM/EHgB8JSIuKL+eUaD7UmSujT2EcrM/CoQTe1fkjQ7v/EqSQUz5CWpYIa8JBXMkJekghnyklQwQ16SCmbIS1LBDHlJKpghL0kFM+QlqWCGvCQVzJCXpIIZ8pJUMENekgpmyEtSwQx5SSqYIS9JBWvsylDa2vKTPrfV/Vfvv4XjupbN1cbTDptX2wvdX7/6bXcuBl1jv9rQh8Pqm5IM+vc8rOOmk2fyklQwQ16SCmbIS1LBDHlJKpghL0kFM+QlqWCGvCQVzJCXpIIZ8pJUMENekgpmyEtSwQx5SSqYIS9JBTPkJalghrwkFcyQl6SCGfKSVDBDXpIKZshLUsEaDfmIODQiromI6yLipCbbkiRtq7GQj4gdgXcBTwf2A46OiP2aak+StK0mz+QfB1yXmTdk5p3AauDZDbYnSeoSmdnMjiOOBA7NzL+p778AeHxmvrxrvVXAqvruw4Fr5tHcEuCnCyh3GNpWc9vqhfbV3LZ6oX01t61emL3mB2Xmnr0eXDT4eu4
W0yzb5i9KZp4FnLWghiLWZubYQvaxvbWt5rbVC+2ruW31Qvtqblu9sPCamxyuuQl4YMf9vYEfNdieJKlLkyH/LeChEbFPROwEHAV8usH2JEldGhuuycwtEfFy4GJgR+D9mXl1Q80taLhnSNpWc9vqhfbV3LZ6oX01t61eWOhwdlNvvEqShs9vvEpSwQx5SSpYq0O+DdMmRMQDI2JNRGyIiKsj4oR6+SkR8cOIuKL+ecawa+0UERsj4qq6trX1svtHxBcj4tr63z2GXSdARDy8ox+viIg7IuLEUevjiHh/RNwSEes7lvXs04h4XX1sXxMRTxuRet8WEd+LiO9ExCci4n718uUR8auOvj5ze9c7Q809j4Nh9/EMNX+0o96NEXFFvXzu/ZyZrfyhejP3emBfYCfgSmC/Ydc1TZ17AQfUt3cFvk81zcMpwGuGXd8MdW8ElnQteytwUn37JOAtw66zx3HxY+BBo9bHwEHAAcD62fq0PkauBO4N7FMf6zuOQL2HAIvq22/pqHd553oj1sfTHgej0Me9au56/HTg5Pn2c5vP5FsxbUJmbsrMy+vbvwA2AMuGW9W8PRv4YH37g8BzhlhLL08Frs/M/x12Id0y8yvArV2Le/Xps4HVmfmbzLwRuI7qmN9upqs3M7+QmVvqu9+k+v7LyOjRx70MvY9h5pojIoDnAR+Z7/7bHPLLgB903L+JEQ/PiFgOPAa4tF708vpl7/tHZeijQwJfiIh19dQTAEszcxNUf7yAPxxadb0dxdb/IUa5j6F3n7bh+H4x8PmO+/tExLcj4pKIePKwiuphuuOgDX38ZODmzLy2Y9mc+rnNId/XtAmjIiIWAxcAJ2bmHcC7gQcDjwY2Ub0kGyUHZuYBVLOIviwiDhp2QbOpv3T3LODj9aJR7+OZjPTxHRGvB7YA59WLNgF/nJmPAV4FfDgidhtWfV16HQcj3ce1o9n6pGXO/dzmkG/NtAkRcS+qgD8vMy8EyMybM/OuzPwd8B6G8DJxJpn5o/rfW4BPUNV3c0TsBVD/e8vwKpzW04HLM/NmGP0+rvXq05E9viPiWOBw4JisB4rrIY+f1bfXUY1vP2x4Vd5jhuNgZPsYICIWAUcAH51aNp9+bnPIt2LahHpM7X3Ahsx8e8fyvTpW+wtgffe2wxIRu0TErlO3qd5sW0/Vv8fWqx0LfGo4Ffa01VnPKPdxh159+mngqIi4d0TsAzwUuGwI9W0lIg4FXgs8KzN/2bF8z6iuIUFE7EtV7w3DqXJrMxwHI9nHHQ4GvpeZN00tmFc/b+93kgf8rvQzqD6tcj3w+mHX06PGJ1G9BPwOcEX98wzgXOCqevmngb2GXWtHzftSfergSuDqqb4F/gD4EnBt/e/9h11rR807Az8Ddu9YNlJ9TPUHaBPwW6qzyJfM1KfA6+tj+xrg6SNS73VU49hTx/KZ9bp/WR8rVwKXA88coT7ueRwMu4971Vwv/wDwd13rzrmfndZAkgrW5uEaSdIsDHlJKpghL0kFM+QlqWCGvCQVzJAfMRGREXF6x/3XRMQpA9r3ByLiyEHsa5Z2nhvVrJtrmm6rR/snRsTOw2i7bv/UiPhBREx2Lb93PbvgdRFxaT3NxdRjx9YzUV5bf9loavnGiFiy/arfviLi6/W/yztnYdTgGPKj5zfAEaP2H3vqCxh9egnwD5m5ckD761u93xOpPjffuB7P4zNM/+3alwC3ZeZDgHdQzeJIRNwfeCPw+Hq7N47oPDsD/71l5hMHuT9ty5AfPVuorun4yu4Hus/Ep84UI2K8nqzoYxHx/Yg4LSKOiYjLopoT/sEduzk4Iv6nXu/wevsdo5on/Fv1JE4v7djvmoj4MNWXSbrrObre//qImAqsk6m+AHZmRLyta/2t9jdDu3tFxFeimi97/dQkTNO1N9UPEfGmiLiU6sstDwDW1G3tWPfb+nrbXv165qD6JTO/mfWkY106Z5w8H3hq/Y3opwFfzMxbM/M24IvAoV0
13jciLoqIv+1aPu3zi4gVEXFlRHyjfg7r6+XHRcR/dmz/2YgYr2+/OyLWRnXdg3/pWGdjRJwcEV8FnhsRh9T7vTwiPh7VvEzdfToREe+of48bIuKxEXFh/UrlXzt/d9NsO6fjQjNr7ELeWpB3Ad+JiLfOYZtHAX9KNWXpDcB7M/NxUV2k5Hiqs1uo5qP+c6oJm9ZExEOAFwK3Z+ZjI+LewNci4gv1+o8DHpHVVKx3i4gHUJ2JrgBuo5qx8jmZ+aaIeArV/N1rp6nz7v1FNbvldO0eAVycmadGdea48wztfRLYhWqO7ZPr2l4MrMzMn0bECmBZZj6ifux+PfpvIP0yi7tnPczqQve3U33jdbbZEBdTTaV9Tmae07XPR/d4fmcDx2fmJd1/bGfw+sy8te7zL0XEIzPzO/Vjv87MJ0X1CvNC4ODM3BwRr6WaKOtN0+zvzsw8qD4GP0X1u7sVuD4i3pH1HCzTeAl9Hhd9Pq/fa4b8CMrMOyLiHOAVwK/63OxbU2ePEXE9MBVGVwGdwyYfy2qipmsj4gbgT6jmpnlk3PMqYXeqOTHuBC7rEWSPBSYy8yd1m+dRXfzgk7PU2bm/Xu1+C3h/VBO7fTIzr6j/cPRq7y6qCeCmcwOwb0S8E/hcR790G1S/zKTXrIezzYb4KeCtmXneNOtt8/wiYnfgfpl5Sb3OuVSTt83mefUf3kVUF7vZj2oqALhnkqwn1Mu/Vr0IYSfgGz32NzWX1FXA1R3H5w1UE4P1Cvm+j4s+ntPvPUN+dP0H1dwUZ3cs20I9xFa/zN+p47HfdNz+Xcf937H177l7HoupkDk+My/ufKB+Gb+5R33TBVM/Ovc3bbt12wcBhwHn1meid8ywz19n5l3TPZCZt0XEo6iGRF5GdQGGF0+36jT359MvM5ma9fCmqGYY3J3qzPYmYLxjvb2BiY77XwOeHhEfzq55SHo8v1dN83ym3H0M1e5TP6d9gNcAj633+YGpx2pTzzeohpaO7uP5dh6D3cfnTNnT93ExzSsbdXFMfkRl5q3Ax6heuk7ZSPWSF6rx3XvNY9fPjYgdohqn35dqYqaLgb+vz5CIiIdFNfvkTC4F/jwiltQvnY8GLpllm27TthsRDwJuycz3UM3gecAc2/sF1aUWqYcXdsjMC4A31PuazqD6ZSadM04eCXy5Du2LgUMiYo+o3nA9pF425WSqs97/6t7hdM8vM38O3B4RT6pXO6Zjk43Ao+vn+kDueYN4N6ogvz0iltL7zP+bwIH1cBYRsXNEDHpK4bkcF5qFZ/Kj7XTg5R333wN8KiIuo5qxcD5nk9dQheNSqhnufh0R76Uak768foXwE2a5tF9mboqI1wFrqM68/jsz5zr1cK92x4F/jIjfApPAC+fY3lnA5yNiE9V7EWdHxNQJzet6bDOQfgGo30v5a6r3Em6ien/kFKpgOjcirqM6gz8Kqj/oEfFmquEIgDfVf+Q7nUg1VPHWzPynjuXLejy/F9Xr/5Kt/2B8DbiRaghlPdWrRTLzyoj4NtUMhzfU620jM38SEccBH6nHywH+mWo22EHp+7gYYJvFchZK/d6rhyY+m5nnD7uWJkT1efzPTr05q98vDtdIUsE8k5ekgnkmL0kFM+QlqWCGvCQVzJCXpIIZ8pJUsP8HjznZg2f7H4sAAAAASUVORK5CYII=", "text/plain": [ "
" ] @@ -1158,11 +1158,11 @@ "source": [ "#Code task 1#\n", "#Create a new dataframe, `state_summary_scale` from `state_summary` whilst setting the index to 'state'\n", - "state_summary_scale = state_summary.set_index(___)\n", + "state_summary_scale = state_summary.set_index('state')\n", "#Save the state labels (using the index attribute of `state_summary_scale`) into the variable 'state_summary_index'\n", - "state_summary_index = state_summary_scale.___\n", + "state_summary_index = state_summary_scale.index\n", "#Save the column names (using the `columns` attribute) of `state_summary_scale` into the variable 'state_summary_columns'\n", - "state_summary_columns = state_summary_scale.___\n", + "state_summary_columns = state_summary_scale.tolist()\n", "state_summary_scale.head()" ] }, @@ -1199,7 +1199,7 @@ "source": [ "#Code task 2#\n", "#Create a new dataframe from `state_summary_scale` using the column names we saved in `state_summary_columns`\n", - "state_summary_scaled_df = pd.DataFrame(___, columns=___)\n", + "state_summary_scaled_df = pd.DataFrame(states, columns='state_summary_columns')\n", "state_summary_scaled_df.head()" ] }, @@ -1232,7 +1232,7 @@ "source": [ "#Code task 3#\n", "#Call `state_summary_scaled_df`'s `mean()` method\n", - "state_summary_scaled_df.___" + "state_summary_scaled_df.mean()" ] }, { @@ -1257,7 +1257,7 @@ "source": [ "#Code task 4#\n", "#Call `state_summary_scaled_df`'s `std()` method\n", - "state_summary_scaled_df.___" + "state_summary_scaled_df.std()" ] }, { @@ -1277,7 +1277,7 @@ "source": [ "#Code task 5#\n", "#Repeat the previous call to `std()` but pass in ddof=0 \n", - "state_summary_scaled_df.___(___)" + "state_summary_scaled_df.std(ddof=0)" ] }, { @@ -1330,10 +1330,10 @@ "#title to 'Cumulative variance ratio explained by PCA components for state/resort summary statistics'\n", "#Hint: remember the handy ';' at the end of the last plot call to suppress that untidy output\n", "plt.subplots(figsize=(10, 6))\n", - 
"plt.plot(state_pca.explained_variance_ratio_.___)\n", - "plt.xlabel(___)\n", - "plt.ylabel(___)\n", - "plt.title(___);" + "plt.plot(state_pca.explained_variance_ratio_.cumsum())\n", + "plt.xlabel('Component #')\n", + "plt.ylabel('Cumulative ratio variance')\n", + "plt.title('Cumulative variance ratio explained by PCA components for state/resort summary statistics');" ] }, { @@ -1365,7 +1365,7 @@ "source": [ "#Code task 7#\n", "#Call `state_pca`'s `transform()` method, passing in `state_summary_scale` as its argument\n", - "state_pca_x = state_pca.___(___)" + "state_pca_x = state_pca.transform(`state_summary_scale`)" ] }, { @@ -1411,7 +1411,7 @@ "outputs": [ { "data": { - "image/png": "\n", + "image/png": "", "text/plain": [ "
" ] @@ -1458,7 +1458,7 @@ "source": [ "#Code task 8#\n", "#Calculate the average 'AdultWeekend' ticket price by state\n", - "state_avg_price = ski_data.groupby(___)[___].___\n", + "state_avg_price = ski_data.groupby('AdultWeekend')['price'].mean()\n", "state_avg_price.head()" ] }, @@ -1469,7 +1469,7 @@ "outputs": [ { "data": { - "image/png": "\n", + "image/png": "", "text/plain": [ "
" ] @@ -1512,7 +1512,7 @@ "#Remember the first component was given by state_pca_x[:, 0],\n", "#and the second by state_pca_x[:, 1]\n", "#Call these 'PC1' and 'PC2', respectively and set the dataframe index to `state_summary_index`\n", - "pca_df = pd.DataFrame({'PC1': ___, 'PC2': ___}, index=__)\n", + "pca_df = pd.DataFrame({'PC1': [:, 0], 'PC2': [:, 1]}, index=`state_summary_index`)\n", "pca_df.head()" ] }, @@ -1644,7 +1644,7 @@ "#Code task 10#\n", "#Use pd.concat to concatenate `pca_df` and `state_avg_price` along axis 1\n", "# remember, pd.concat will align on index\n", - "pca_df = ___([___, ___], axis=___)\n", + "pca_df = pd.concat([`pca_df`, `state_avg_price`], axis=1)\n", "pca_df.head()" ] }, @@ -1901,7 +1901,7 @@ "outputs": [ { "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAnQAAAHwCAYAAAAvoPKcAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdeXxU1f3/8dcnCwQIO4hA0AguZUkIEARkSQIF1CoqoIjKokWr1aIUUKlfUbtpK4Kl/KxWsaBFUEGFalutRZBVCRIBgbIZIYhIkC1AgCTn98edjAlJYBJIJmPez8djHs4999xzPnMzMR/Ouedec84hIiIiIqErLNgBiIiIiMjZUUInIiIiEuKU0ImIiIiEOCV0IiIiIiFOCZ2IiIhIiFNCJyIiIhLilNBJSDKzkWa2tIR9t5rZBxUdk0goMbOeZva/YMdR3sws1sycmUUEULfczomZzTCz35ZH2yKghE4qMTPrYWbLzeygmX1nZsvMrPOZjnPOzXLO9StDf4vMbFQp6j9uZn8vbT9SVIE/ulm+V7qZPVxgv5nZaDNbb2ZHzCzDzN40s7hT2nnc187lpez/iwJ9Z5lZjpn9w7ev5yn7snx9DCqhrUlmtsXMDpvZJjMbXmBfXTN738wOmNksMwsvsO9FM7uhNHGfDefcEufcZRXVXyjQOZFQpoROKiUzqwO8C/wZaAA0B54AjgczLglMIKMhJajnnIsGhgITzexKX/mfgPuB0Xjfh0uBd4CfFOjTgGHAd8CI0nTqnGvrnIv29V0b2AG86du3JH+fb/81QBbw7xKaOwJcC9T1xfEnM7vCt+9nwBqgCRAL3OCLvRvQ1Dn3dmniLquz+PmISCWlhE4qq0sBnHOznXO5zrljzrkPnHNri6tsZk+b2VLfCMjppmOjzOzvZrbPN0qyysyamNnvgJ7ANN8IzDRf/T+Z2U4zO2Rmq82sp6/8SuBXwBBf/c995XXNbLqZ7TazXWb22/xRGDO72MwW+0YcM83s9dLE6NuXbmY/LlDXP0pYYJTrdl/M+83sbjPrbGZrfW1NK3DsSN+o5xTfvu1mdoWvfKeZfWtmIwrU/4mZrfGdi51m9niBffl9/9TMdgALzew9M/vFKZ9trZldX/yP/HvOuRXAF0A7M7sEuBcY6pxb6Jw77pw76huJfarAYT2BZniJ381mVu1M/ZSgF3AeMK+E/SOAuc65Iy
XE/phzbpNzLs859wmwBOjm230R8JFz7rivvKXv+zHFF3eJzOxhM5t7StmfzGyq7/3tZrbRNzK43cx+VqBesnmjmg+Z2TfA3/LLTml/m+/4DQVHC/N/p3yjj/vN7Eszu6rA/gZm9jcz+9q3/50C+64xszTfd2y5mcWf5jP+yMz+Y96I/P/M7CZfeStfWUffdjPf71Cyb3uRmT1pZp/6fr/mm1mDEvo443kqsJ1uZuN839uDZva6mUUF8tnMrIOZfebr53XAf5xIuXDO6aVXpXsBdYB9wEzgKqD+KftHAkvx/lHyIvA+ULPgvhLa/RnwD6AmEA50Aur49i0CRp1S/zagIRABjAW+AaJ8+x4H/n5K/XeAF4BaeEnBp8DPfPtmA4/4Yo4CepQhxnTgxwXq+mPAG/FxwPO+9vsB2b6YzsMb5fwWSCpwnnKA2339/BZvZOr/AdV9xx8Gon31k4E4X/zxwB7g+lP6fsX32WsANwGfFIi1ve9nWq2Yz5x/fARgQHfgKNAHuBv4KoDvzHTgDSDS18/AMn73XgZmlLCvpu+cJAfYVg1gN3Clb/te4Glf+TK8EcYxwGMBtHWh75zkfxfCfW139W3/BGjlO39JvrodC/zscoA/+H62NXxlGQXavxEvIQ4DhuCNNDYt8F05Cdzp6/ce4GvAfPvfA14H6vvOf/53rKPvO9fFd9wIvO9w9WI+Xy1gp+/7GOE7NhNo69t/J7DR9zN4H5hU4NhFwC6gna+deRT9vYgI8DwVPCfpeL/DzfBGhjcCd5/pswHVgK98P9tIYLDv/P22vP/fqVfVfQU9AL30KukFtAZmABm+P0YLgCa+fSOBT3x/ROZRIEng9AndHcByIL6YfYs4JaErps5+oL3v/eMUSOjwptGOAzUKlA3FG5EBL9n5KxBzhj5OF2M6Z07omhfYvw8YUmB7HvBAgfO0pcC+ON/xTU45PqGEOJ8FppzSd8sC+6vjTX9e4tueBDxXQlv5xx/wneONwGjfvkeAlWc4ZzWBQ3yfYL4AzC/Ddy6/neQS9g8DvsSXyATQ3ky8qdn8xCfK9x1YCzwFxACf4U3P/gX4mNP80cf7R8xw3/u+wLbT1H0HuN/3Phk4ge8fIwXKMk5zfBpwXYHvytZTzpMDzgeaAnmc8o8uX72/AL85pex/+BK+U8qHAEtOKXuBAsku3v8D1vnOX/UC5YuApwpst/F93nBOSegCOE+nJnS3Fdj+I/D8mT4b3iivP+H17Vt+up+tXnqd7UtTrlJpOec2OudGOudi8P7l3Qwvich3MXAd8IRz7kSAzb6K96/7Ob7poT+aWWRJlc1srG965qCZHcD7w9uohOoX4v1rfLdvCuYA3h+k83z7H8QbFfjUvIvw7zgXMRZjT4H3x4rZjj5NXZxzxdY3sy5m9pGZ7TWzg3gjZ6eei535b5w3rfgGcJuZheElt6+eIfZGzrn6zrnWzrmpvrJ9eEnD6dyAl/T/07c9C7jKzBqf4bhTDcRLQheXsH8E8Ipzzp2pITN7Gu97e1N+fedctnPuLudcvHPuYbyp1l8Bt+IlH0lAF/v+2sFTvYZ3HgFu8W3n93eVma30TU0eAK6m8M9nr3Mu+zTxDi8wfXjAF3vB47/Jf+OcO+p7Gw20AL5zzu0vptkLgbH5bfrabYH3u1xc3S6n1L0VL2nM96Ivrj/7vl8F7Szw/iu838Uiv6sBnKdTfVPg/VG+//053WdrBuw65Xvy1Wn6EDlrSugkJDjnNuGN1rUrULwRb3rmX2YW0Mo059xJ59wTzrk2wBV4F7jnr0Is9EfavOvlHsKbOqzvnKsHHMRLyorUx/uDchwvKanne9VxzrX19f2Nc+5O51wzvGnV58zs4lLGeARvdCTf+aceX45ewxshaeGcq4s3tWun1Dn1nMzE+6PcBzjqvGvjSuu/QIyZJZ6mzgi8P7Q7fNeIvYn3B33oaY4pqZ1iEzYza4E3gvPKmRoxsyfwLhXo55w7VEKdK/FGcP
6NNzqa6us3FW9KuzhvAslmFoOXxL7ma6s63ujrJLwR1np4yW3Bn0+JSaiZXYiXLN0HNPQdv56iP9/i7AQamFm9Evb9rsDvQz3nXE3n3OwS6i4+pW60c+4eX4zReP+gmw48Xsw1ci0KvL8Ab4oz85TPGch5CtTpPttuoLmZFWz3gjL0IRIwJXRSKfkujh7r+8OV/8d0KLCyYD3f/zx/BXxoZq0CaDfFzOLMuxD9EN7/9HN9u/cALQtUr4036rMXiDCziXjX9lGgfqxv9Ann3G7gA+AZM6tjZmG+i7mTfH3fmP958KYVXYG+A40xDe+C/0hfgjP4TJ/5HKqNNxKTbd5tQW450wG+BC4PeIYzj86V1MYW4Dlgtu+i9WrmLRy52bwL+ZvjJYzXAAm+V3u868VGQKFFG7El9eP72aTgJaHFGQYsd85tO128ZjYB79z0dc7tK6FOFN6U6xhf0Zd4iVo1vOsHtxd3nHNuL9704t+AL51zG327quFNce8FcsxbsFCaW/fUwvs+7vXFdzuF//FUIt/3/l94/0Cp7/tu9vLtfhG42ze6a2ZWy7zFNbWLaepd4FIzG+ZrI9K8BT2tffv/BKx2zo3Cu2bv+VOOv83M2phZTeDXeAtXTv39OtvzVNDpPtsKvP93jDazCDMbCJTqVjoipaWETiqrw3gXG39iZkfwErn1eAsTCnHOzcT7H/jC0/3B9jkfmIuXKG3Em1rLv5fcn4DB5q3Sm4o37fkvYDPedEk2had13vT9d5+ZfeZ7Pxzvj8YGvKRtLt9PF3b2fZ4svJGu+51zX5YyxkfxLujej3cbl9eKOb68/Bz4tZkdBibiTacG4hW8EaizuWffaGAa3oKNA8A2vBGqf+AlWmnOWwX9Tf4LmArEm1k7vNGbr/AunC/JMGDFaRK24RST7Jl3I+svChT9Hm80Zot9f9+6X51y2K+AWc65/O/TC3jTfnvxrhk93e1LXgN+TIGfvXPuMN45egPvu3EL3ncsIM65DXhJ9wq8f6jE4S3aCNQwvH94bMJbKPCAr91UvMUM03xxbcW7Hq+4GA7jJVc3411/9g2+RRxmdh1wJd40P8AvgY5mdmuBJl7FG8X/Bu9axdEl9FHm83RKWyV+Nt8lIAN92/vxrg98qyz9iAQq/0JdEZFyYd6Nde9yzvUIYgz/h3cN2QvBikHKj5ktwlsc9FKwYxEJFt1cUkTKjW/66+d4U6ZB45zTI5dE5AdNU64iUi7MrD/eFOIeKnZqWESkytGUq4iIiEiI0widiIiISIhTQiciIiIS4kJqUUSjRo1cbGxssMMQEREROaPVq1dnOudK+8SaMgmphC42NpbU1NRghyEiIiJyRmZWYY9805SriIiISIhTQiciIiIS4pTQiYiIiIS4kLqGTkREREp28uRJMjIyyM7ODnYoVUpUVBQxMTFERkYGLQYldCIiIj8QGRkZ1K5dm9jYWMws2OFUCc459u3bR0ZGBhdddFHQ4tCUq4iIyA9EdnY2DRs2VDJXgcyMhg0bBn1UNKgjdGaWDhwGcoEc51xiMOMREREJdUrmKl5lOOeVYco1xTmXGewgREREREKVplxFRESqqNw8x+LNe3llRTqLN+8lN8+ddZvHjh0jKSmJ3Nxcf9mhQ4do3rw59913n79s2rRpXHzxxZgZmZklj+vs2LGDfv360bp1a9q0aUN6evpp+58xYwaNGzcmISGBhIQEXnrpJQC++uorOnXqREJCAm3btuX5558v9viPP/6Yjh07EhERwdy5c/3l27ZtIyEhgejo6EBOQ4UL9gidAz4wMwe84Jz7a5DjERERqRK2783ilpc+4XD2SXJyHRHhRu2oSF4b1YWWjcuetLz88ssMHDiQ8PBwf9mjjz5KUlJSoXrdu3fnmmuuITk5+bTtDR8+nEceeYS+ffuSlZVFWNiZx6KGDBnCtGnTCpU1bdqU5cuXU716dbKysmjXrh0DBgygWbNmhepdcMEFzJgxg0mTJhUqb9
WqFWlpaZU2oQv2CF1351xH4CrgXjPrdWoFM7vLzFLNLHXv3r0VH6GIiARVcnIy77//fqGyZ599lp///OdBiuj7GI4ePRrUGMoqN89xy0ufsOdgNkeO53I8J48jx3PZcyibW1/65KxG6mbNmsV1113n3169ejV79uyhX79+hep16NCBMz2ffcOGDeTk5NC3b18AoqOjqVmzZpniqlatGtWrVwfg+PHj5OXlFVsvNjaW+Pj4gBLHyiSo0Trnvvb991vgbeDyYur81TmX6JxLbNy4Qp5vKyIilcjQoUOZM2dOobI5c+YwdOjQMx5bcNrvXAvlhG7p1kwOZ5/k1LTNOTiUfZKlW8t2afuJEyfYvn27P1HLy8tj7NixPP3002Vqb/PmzdSrV4+BAwfSoUMHxo8fH9DPdN68ecTHxzN48GB27tzpL9+5cyfx8fG0aNGChx56qMjoXCgLWkJnZrXMrHb+e6AfsD5Y8YiISOXhcnI4vHAhmc8/T+/sbN6dP5/jx48DkJ6eztdff83Ro0fp1q0bHTt25MYbbyQrKwvwRlh+/etf06NHD958801iY2P51a9+Rbdu3UhMTOSzzz6jf//+tGrVyn8dlXOO8ePH065dO+Li4nj99dcBWLRoEcnJyQwePJgf/ehH3HrrrTjnmDp1Kl9//TUpKSmkpKQE5ySdha/2HSEnt/hRuNxcx459R8rUbmZmJvXq1fNvP/fcc1x99dW0aNGiTO3l5OSwZMkSJk2axKpVq9i+fTszZsw47THXXnst6enprF27lh//+MeMGDHCv69FixasXbuWrVu3MnPmTPbs2VOmuCqjYI7QNQGWmtnnwKfAe865fwcxHhERqQSyFi9mS4+efD3+QfZO/TM5L75Em5xcZvbuzck9e5gzZw59+vThd7/7HR9++CGfffYZiYmJTJ482d9GVFQUS5cu5eabbwa8P+QrVqygZ8+ejBw5krlz57Jy5UomTpwIwFtvvUVaWhqff/45H374IePHj2f37t0ArFmzhmeffZYNGzawfft2li1bxujRo2nWrBkfffQRH330UcWfpLN0YcNaRIQXf6uN8HDjgoa1ytRujRo1Ct2PbcWKFUybNo3Y2FjGjRvHK6+8wsMPPxxwezExMXTo0IGWLVsSERHB9ddfz2effXbaYxo2bOifWr3zzjtZvXp1kTrNmjWjbdu2LFmyJOBYKrugLYpwzm0H2gerfxERqXyOLF9Oxv0P4ArepDUvj6tr1WL++vWk3DSEOd/tY+CNN/KPf/yD7t27A95UX7du3fyHDBkypFC7AwYMACAuLo6srCxq165N7dq1iYqK4sCBAyxdupShQ4cSHh5OkyZNSEpKYtWqVdSpU4fLL7+cmJgYABISEkhPT6dHjx7lfCbKV4+LG1E7KpKjJ3JxBQbqzKBOVCQ9Lm5Upnbr169Pbm4u2dnZREVFMWvWLP++GTNmkJqaylNPPRVwe507d2b//v3s3buXxo0bs3DhQhITvVvWTpgwgcsvv5wbbrih0DG7d++madOmACxYsIDWrVsD3lM0GjZsSI0aNdi/fz/Lli3jl7/8ZZk+Z2UUWlf8iYjID5Zzjt2PPV44mfPpU7s2K48cYe3XX5P17bd06NCBvn37kpaWRlpaGhs2bGD69On++rVqFR5hyh+xCQsL87/P387JycG5khcBFKwfHh5OTk5OmT9jZREeZrw2qgtN6kRRq3o4URFh1Koezvl1opg1qgvhYWW/UW6/fv1YunTpGetNnTqVmJgYMjIyiI+PZ9SoUQCkpqb634eHhzNp0iT69OlDXFwczjnuvPNOANatW8f5559fbLtt27alffv2TJ061T9Fu3HjRrp06UL79u1JSkpi3LhxxMXFATBx4kQWLFgAwKpVq4iJieHNN9/kZz/7GW3bti3zuahIwb5tiYiICADZ678gZ9++YvfVCgujc82aPLJzB1c1bkzXrl2599572bp1KxdffDFHjx4lIyODSy+9tEx99+
rVixdeeIERI0bw3Xff8fHHH/P000+zadOmEo+pXbs2hw8fplGjso1mBVvLxtEse6g3S7dmsmPfES5oWIseFzc6q2QO4L777mPy5Mn8+Mc/LlQ+cuRIRo4c6d8ePXo0o0ePLnJ8YmKi/95xAH379mXt2rVF6p08ebLQqGy+J598kieffLJIeUntAPz617/2v+/cuTMZGRnF1qvMNEInIiKVwsmMnd6cXwl+UrsO/zt+nCsjImnUoAEzZsxg6NChxMfH07Vr19MmX2dyww03EB8fT/v27enduzd//OMfix39Keiuu+7iqquuCslFEfnCw4ykSxszrFssSZc2PutkDrzbkaSkpJTrCmOgyK1sylv+jYWbNGlSof0Gyk43zFzZJCYmutTU1GCHISIi5SBr8WJ2jR1Hnm+1aokiI/nR2s8rxfMzK5uNGzf6rxmTilXcuTez1RX1nHqN0ImISKVQ8/LLcWca1QkLo3ZKipI5kVMooRMRkUohrEYN6t96KxYVVWIdq1aNhnfdVYFRiYQGJXQiIlJpnPfA/dTq2ROrUaPwjogILCqKpr/5NTXahcaqQ5GKpFWuIiJSaVhEBDFT/8SR5cv57uW/kf2//2GRkdTu04cGw26j2oUXBjtEkUpJI3QiIlKpmBnR3btzwfSXuHTpEi75aCHn/98jSubONedg+yKYdRNMS/T+u33RWTd77NgxkpKS/Ktcw8PDSUhIICEhwX+DZ4Cf/vSntG/f3v/M1awSFsNceeWV1KtXj2uuuSbgGN544w3atGlD27ZtueWWW/zlM2fO5JJLLuGSSy5h5syZp21j7ty5mBn5izHzV7lGR0cHHEdF0gidiIhIVeMc/OtBWPN3OHnUK8vcAulLoMMwuPqPZW765ZdfZuDAgYSHhwPe48DS0tKK1JsyZQp16tQB4Je//CXTpk0r9rFg48eP5+jRo7zwwgsB9b9lyxaefPJJli1bRv369fn2228B+O6773jiiSdITU3FzOjUqRMDBgygfv36Rdo4fPgwU6dOpUuXLv6yVq1akZaWVmkTOo3QiYiIVDVfLi6czOU7eRTWvHpWI3WzZs3iuuuuO2O9/GTOOcexY8dKXLncp08fateuHXD/L774Ivfee68/UTvvvPMA7751ffv2pUGDBtSvX5++ffvy738X/wj5Rx99lAcffJCo0yzQqWyU0ImIiFQ1K54rmszlO3nU218GJ06cYPv27cTGxvrLsrOzSUxMpGvXrrzzzjuF6t9+++2cf/75bNq0iV/84hdl6vNUmzdvZvPmzXTv3p2uXbv6k7Zdu3bRokULf72YmBh27dpV5Pg1a9awc+fOUk3xVgaachUREalq9m8/w/4vy9RsZmYm9erVK1S2Y8cOmjVrxvbt2+nduzdxcXG0atUKgL/97W/k5ubyi1/8gtdff53bb7+9TP0WlJOTw5YtW1i0aBEZGRn07NmT9evXF/u83lNHBfPy8hgzZoz/+a+hRCN0IiIiVU39lqff3+CiMjVbo0YNsrOzC5U1a9YMgJYtW5KcnMyaNWsK7Q8PD2fIkCHMmzevTH2eKiYmhuuuu47IyEguuugiLrvsMrZs2UJMTAw7d+7018vIyPDHlu/w4cOsX7+e5ORkYmNjWblyJQMGDCAUnlKlhE5ERKSq6fZziKxZ/L7ImtD152Vqtn79+uTm5vqTuv3793P8+HHAG71btmwZbdq0wTnH1q1bAe8aun/84x/86Ec/KlVfEyZM4O233y5Sfv311/PRRx/5+9y8eTMtW7akf//+fPDBB+zfv5/9+/fzwQcf0L9//0LH1q1bl8zMTNLT00lPT6dr164sWLCAxMQKeXrXWVFCJyIiUtVclOStZj01qYusCR2HQcvkMjfdr18/li5dCnjPN01MTKR9+/akpKTw8MMP+xO6ESNGEBcXR1xcHLt372bixIkApKamMmrUKH97PXv25MYbb+S///0vMTExvP/++wCsW7eO888/v0j//fv3p2HDhr
Rp04aUlBSefvppGjZsSIMGDXj00Ufp3LkznTt3ZuLEiTRo0ACAiRMnsmDBgjJ/5srAiptTrqwSExNdKAx7ioiIBENxD4g/re2LvAUQ+7+E+hd5I3ctk88qhjVr1jB58mReffXVs2rnTPr37+9P7ipSdHR0sffMK+7cm9lq51yFDO9pUYSIiEhV1TL5rBO4U3Xo0IGUlBRyc3P996IrDxWdzG3bto1BgwbRpEmTCu03UEroRERE5Jy64447gh3COZd/Y+HKStfQiYiIiIQ4JXQiIiIiIU4JnYiIiEiIU0InIiJShWWdyGLn4Z1knSi6clNChxI6ERGRKijjcAajF44m6fUkBi8YTNLrSdy/8H4yDmecVbvHjh0jKSmJ3NxcwHv0V79+/WjdujVt2rQhPT0dgC+//JIuXbpwySWXMGTIEE6cOFFim4cOHaJ58+bcd999Z+x/zJgxJCQkkJCQwKWXXlroUWQPPvggbdu2pXXr1owePbrYx4Hlmzt3Lmbmf0rEtm3bSEhIIDo6OpDTUOGU0ImIiFQxGYczGPLuEBZnLOZE3gmO5hzlRN4JFmUsYsi7Q84qqXv55ZcZOHCg/5Ylw4cPZ/z48WzcuJFPP/2U8847D4CHHnqIMWPGsGXLFurXr8/06dNLbPPRRx8lKSkpoP6nTJlCWloaaWlp/OIXv2DgwIEALF++nGXLlrF27VrWr1/PqlWrWLx4cbFtHD58mKlTp9KlSxd/mVa5ioiISKXyx1V/JOtkFnkur1B5nssj62QWk1InlbntWbNmcd111wGwYcMGcnJy6Nu3L+DdlLdmzZo451i4cCGDBw8GYMSIEbzzzjvFtrd69Wr27NlDv379Sh3L7NmzGTp0KABmRnZ2NidOnOD48eOcPHmyxHvKPfroozz44INERUWVus9gUUInIiJShWSdyGLZrmVFkrl8eS6PJRlLynRN3YkTJ9i+fTuxsbEAbN68mXr16jFw4EA6dOjA+PHjyc3NZd++fdSrV4+ICO92uDExMezatatoLHl5jB07lqeffrrUsXz11Vd8+eWX9O7dG4Bu3bqRkpJC06ZNadq0Kf379y/2qRpr1qxh586dXHPNNaXuM5iU0ImIiFQh+4/vJyLs9M8VCA8LZ//x/aVuOzMzs9A1azk5OSxZsoRJkyaxatUqtm/fzowZM4q9ds3MipQ999xzXH311bRo0aLUscyZM4fBgwf7p363bt3Kxo0bycjIYNeuXSxcuJCPP/640DF5eXmMGTOGZ555ptT9BZsSOhERkSqkfvX65OTlnLZObl4u9avXL3XbNWrUIDs7278dExNDhw4daNmyJREREVx//fV89tlnNGrUiAMHDpCT48WRkZFBs2bNirS3YsUKpk2bRmxsLOPGjeOVV17h4YcfDiiWOXPm+KdbAd5++226du1KdHQ00dHRXHXVVaxcubLQMYcPH2b9+vUkJycTGxvLypUrGTBgAKHwHHkldCIiIlVIdLVoejTvQZgVnwKEWRg9Y3oSXa30qznr169Pbm6uP6nr3Lkz+/fvZ+/evQAsXLiQNm3aYGakpKQwd+5cAGbOnOm/7q6gWbNmsWPHDtLT05k0aRLDhw/nqaeeAmDChAm8/fbbxcbxv//9j/3799OtWzd/2QUXXMDixYvJycnh5MmTLF68uMiUa926dcnMzCQ9PZ309HS6du3KggULSExMLPW5qGhK6ERERKqY8Z3HEx0ZXSSpC7MwakfWZlziuDK33a9fP5YuXQpAeHg4kyZNok+fPsTFxeGc48477wTgD3/4A5MnT+biiy9m3759/PSnPwUgNTWVUaNGnbGfdevWcf755xe7b/bs2dx8882FpnEHDx5Mq1atiIuLo3379rRv355rr70WgIkTJ6NcNkgAACAASURBVLJgwYIyf+bKwE53D5bKJjEx0YXCsKeIiEgwbNy4sdgL/YuTcTiDp1c9zdJdSwkPCyc3L5eeMT0ZlziOmNoxZY5hzZo1TJ48mV
dffbXMbQSif//+vP/+++XaR3Gio6PJyiq6YKS4c29mq51zFTK8d/qrIkVEROQHKaZ2DH/q/SeyTmSx//h+6levX6Zp1lN16NCBlJQUcnNz/QsSykNFJ3Pbtm1j0KBBJd7qJNiU0ImIiFRh0dWiz0kiV9Add9xxTturDHRjYREREREpV0roREREREKcEjoRERGREKdr6ERERKqo7M2bOTjvLU7u3k1k06bUHTSQqEsvDXZYUgYaoRMREali8rKz2Xn3PaTfNITv/v53Dn/wAd/9/e+k3zSEnXffQ16Bpz2U1rFjx0hKSiI3N5ePPvqIhIQE/ysqKop33nkHgP/+97907NiRhIQEevTowdatW0ts89ChQzRv3pz77rvvjP1/9dVX9OnTh/j4eJKTk8nIyPCXd+rUiYSEBNq2bcvzzz9f7PHHjx9nyJAhXHzxxXTp0oX09HTAW+WakJBAdPS5XUByriihExERqWJ2PTCGIytW4LKzITfXK8zNxWVnc2TFCnaNGVPmtl9++WUGDhxIeHg4KSkppKWlkZaWxsKFC6lZsyb9+vUD4J577mHWrFmkpaVxyy238Nvf/rbENh999FGSkpIC6n/cuHEMHz6ctWvXMnHiRCZMmABA06ZNWb58OWlpaXzyySc89dRTfP3110WOnz59OvXr12fr1q2MGTOGhx56CNAqVxEREalEsjdv5sjKlbjjx4vd744f58iKlRzfsqVM7c+aNavYx3jNnTuXq666ipo1awJgZhw6dAiAgwcPFvssV4DVq1ezZ88efyJ4Jhs2bKBPnz4ApKSkMH/+fACqVatG9erVAW8ULi8vr9jj58+fz4gRIwDv6RL//e9/CYWHMCihExERqUIOznsLd/Lkaeu4kyc5MG9eqds+ceIE27dvJzY2tsi+OXPmMHToUP/2Sy+9xNVXX01MTAyvvvoqDz/8cJFj8vLyGDt2LE8//XTAMbRv3555vtjffvttDh8+zL59+wDYuXMn8fHxtGjRgoceeqjYJHLXrl20aNECgIiICOrWres/vjJTQiciIlKFnNy9+/tp1pLk5nJy9zelbjszM5N69eoVKd+9ezfr1q2jf//+/rIpU6bwz3/+k4yMDG6//XZ++ctfFjnuueee4+qrr/YnWIGYNGkSixcvpkOHDixevJjmzZsTEeGtAW3RogVr165l69atzJw5kz179hQ5vrjRuILPhK2stMpVRESkCols2hTCw0+f1IWHe/VKqUaNGmQXs6DijTfe4IYbbiAyMhKAvXv38vnnn9OlSxcAhgwZwpVXXlnkuBUrVrBkyRKee+45srKyOHHiBNHR0Tz11FMlxtCsWTPeeustALKyspg3bx5169YtUqdt27YsWbKEwYMHF9oXExPDzp07iYmJIScnh4MHD9KgQYPSnYgg0AidiIhIFVJ30EDMl1iVxCIjqTdoYKnbrl+/Prm5uUWSutmzZxeabq1fvz4HDx5k8+bNAPznP/8p8mB78K7H27FjB+np6UyaNInhw4f7k7kJEybw9ttvFzkmMzPTf33ck08+6X8MWUZGBseOHQNg//79LFu2jMsuu6zI8QMGDGDmzJmAd91f7969Q2KETgmdiIhIFRJ16aXU6toV8y0QOJVVr06tbl2pfsklZWq/X79+LF261L+dnp7Ozp07C61SjYiI4MUXX2TQoEG0b9+eV1991X+dXGpqKqNGjTpjP+vWreP8888vUr5o0SIuu+wyLr30Uvbs2cMjjzwCwMaNG+nSpQvt27cnKSmJcePGERcXB8DEiRNZsGABAD/96U/Zt28fF198MZMnTz7taGBlYqGwciNfYmKiS01NDXYYIiIildLGjRuLHek6VV52NrvGjOHIipXeAoncXAgPxyIjqdWtK82nTCEsKqpMMaxZs4bJkyfz6quvlun4QPXv35/333+/XPsoTnR0NFlZWUXKizv3ZrbaOZdYEXHpGjoREZEqJiwqihZ/+cv3T4r45hsimzal3qCBZR
6Zy9ehQwdSUlLIzc0lPDz8HEVcVEUnc9u2bWPQoEE0adKkQvsNlBI6ERGRKirq0kuJmlD0diFnK/+6tR8S3VhYRERERMqVEjoRERGREKeETkRERCTEKaETERGp4s7lHS/MjLFjx/q3J02axOOPP16oTvv27Qvdl07OnhZFiIiIVEEnsnNY8/4O1n+8i+wjJ4mqFUm7Xs3p0P8CqkWVPT2oXr06b731FhMmTKBRo0ZF9m/cuJG8vDw+/vhjjhw5Qq1atc7mY4hP0EfozCzczNaY2bvBjkVERKQqOJGdw9w/pLLmPzvIPnISgOwjJ1nznx3M/UMqJ7Jzytx2REQEd911F1OmTCl2/2uvvcawYcPo16+f/2a+cvaCntAB9wMbgx2EiIhIVbHm/R0c2ptNbk5eofLcnDwO7c1mzQc7zqr9e++9l1mzZnHw4MEi+15//XWGDBnC0KFDmT179ln1I98LakJnZjHAT4CXghmHiIhIVbL+411Fkrl8uTl5rF+866zar1OnDsOHD2fq1KmFyletWkXjxo258MIL6dOnD5999hn79+8/q77EE+wRumeBB4Hiv1WAmd1lZqlmlrp3796Ki0xEROQHyDnnn2YtSfaRk2e9UOKBBx5g+vTpHDlyxF82e/ZsNm3aRGxsLK1ateLQoUPMmzfvrPoRT9ASOjO7BvjWObf6dPWcc391ziU65xIbN25cQdGJiIj8MJkZUbUiT1snqlYkZnZW/TRo0ICbbrqJ6dOnA5CXl8ebb77J2rVrSU9PJz09nfnz52va9RwJ5ghdd2CAmaUDc4DeZvb3IMYjIiJSJbTr1ZzwiOJTgPCIMNolNT8n/YwdO5bMzEwAPv74Y5o3b07z5t+33atXLzZs2MDu3bvPSX9VWdBuW+KcmwBMADCzZGCcc+62YMUjIiJSVXTofwHb0r4tsjAiPCKMOo2j6NDvgjK3nZWV5X/fpEkTjh496t9euXJlobrh4eFK5s6RYF9DJyIiIhWsWlQEgx9KpEO/C/zTr1G1IunQ7wIGP5R4Vvehk+CoFD8x59wiYFGQwxAREakyqkVF0GVAS7oMaIlz7qyvmZPg0gidiIhIFadkLvQpoRMREfkBOZfPZZXAVIZzroRORETkByIqKop9+/ZVigSjqnDOsW/fPqKiooIaR6W4hk5ERETOXkxMDBkZGehG/BUrKiqKmJiYoMaghE5EROQHIjIykosuuijYYUgQaMpVREREJMQpoRMREREJcUroREREREKcEjoRERGREKeETkRERCTEKaETERERCXFK6ERERERCnBI6ERERkRCnhE5EREQkxCmhExEREQlxSuhEREREQpwSOhEREZEQp4ROREREJMQpoRMREREJcUroREREREKcEjoRERGREKeETkRERCTEKaETERERCXFK6ERERERCnBI6ERERkRCnhE5EREQkxCmhExEREQlxSuhEREREQpwSOhEREZEQp4ROREQCZmaMHTvWvz1p0iQef/zxc9Z+eno67dq1K1T2+OOPM2nSpHPWR6BO1+8VV1xRwdGInJ4SOhERCVj16tV56623yMzMDHYoQbV8+fIiZbm5uUGIRMSjhE5ERAIWERHBXXfdxZQpU4rs27t3L4MGDaJz58507tyZZcuWARAXF8eBAwdwztGwYUNeeeUVAIYNG8aHH35Yqv5ffPFFOnfuTPv27Rk0aBBHjx4FYOTIkdxzzz2kpKTQsmVLFi9ezB133EHr1q0ZOXKk//jo6GjGjh1Lx44d6dOnD3v37gVg6tSptGnThvj4eG6++WZ//Q0bNpCcnEzLli2ZOnVqoXYAFi1aREpKCrfccgtxcXHk5uYyfvx4OnfuTHx8PC+88EKpPp9IWSmhExGREi3/ejl3/PsOOr3aiU6vduJ47nEuH3Q5s2bN4uDBg4Xq3n///YwZM4ZVq1Yxb948Ro0aBUD37t1ZtmwZX3zxBS1btmTJkiUArFy5kq5duxbpc9u2bSQkJPhfzz//vH
/fwIEDWbVqFZ9//jmtW7dm+vTp/n379+9n4cKFTJkyhWuvvZYxY8bwxRdfsG7dOtLS0gA4cuQIHTt25LPPPiMpKYknnngCgKeeeoo1a9awdu3aQv1t2rSJ999/n08//ZQnnniCkydPFon3008/5Xe/+x0bNmxg+vTp1K1bl1WrVrFq1SpefPFFvvzyy7KefpGARQQ7ABERqZyeS3uOGV/M4FjOMX9Znsvj4U8fpl3/dkydOpUaNWr493344Yds2LDBv33o0CEOHz5Mz549+fjjj7nwwgu55557+Otf/8quXbto0KCBf6SroFatWvkTMKDQNXrr16/n//7v/zhw4ABZWVn079/fv+/aa6/FzIiLi6NJkybExcUB0LZtW9LT00lISCAsLIwhQ4YAcNtttzFw4EAA4uPjufXWW7n++uu5/vrr/W3+5Cc/oXr16lSvXp3zzjuPPXv2EBMTUyjeyy+/nIsuugiADz74gLVr1zJ37lwADh48yJYtW/z7RcqLRuhERKSItG/T+Nv6vxVK5vJl52azq8Munn/xeY4cOeIvz8vLY8WKFaSlpZGWlsauXbuoXbs2vXr1YsmSJSxZsoTk5GQaN27M3Llz6dmzZ6njGjlyJNOmTWPdunU89thjZGdn+/dVr14dgLCwMP/7/O2cnJxi2zMzAN577z3uvfdeVq9eTadOnfz1C7YTHh5ebDu1atXyv3fO8ec//9l/Dr788kv69etX6s8pUlpK6EREpIiX17/M8dzjJe7Pq5FHsyuaFZry7NevH9OmTfNv54+ytWjRgszMTLZs2ULLli3p0aMHkyZNKlNCd/jwYZo2bcrJkyeZNWtWqY/Py8vzj5699tpr9OjRg7y8PHbu3ElKSgp//OMf/aN/ZdG/f3/+8pe/+KdmN2/eXCjpFSkvmnIVEZEi1u5di8OVuD+PPKL6RJG54PvVrlOnTuXee+8lPj6enJwcevXq5b8erUuXLv5VoD179mTChAn06NGj1HH95je/oUuXLlx44YXExcVx+PDhUh1fq1YtvvjiCzp16kTdunV5/fXXyc3N5bbbbuPgwYM45xgzZgz16tUrdWwAo0aNIj09nY4dO+Kco3HjxrzzzjtlakukNMy5kn9hK5vExESXmpoa7DBERH7w+rzZh2+PfnvaOk1rNeWDwR9UUETnRnR0dJlH30RKy8xWO+cSK6IvTbmKiEgRyTHJRFjJkziRYZH0vqB3BUYkIqejhE5ERIoY3nY4EWElJ3ThFs6trW+twIjODY3OyQ+VEjoRESniwjoX8nTS00SFR1EtrJq/vHp4daLCo3gm+Rla1G4RxAhFpCAtihARkWIlt0jmvYHv8cb/3mDJLu9mwEnNk7jpRzfRqEajIEcnIgVpUYSIiIhIOdCiCBEREREJmBI6ERERkRCnhE5EREQkxCmhExEREQlxSuhEREREQpwSOhEREZEQp4ROREREJMQpoRMREREJcUroREREREKcEjoRERGREBe0hM7MoszsUzP73My+MLMnghWLiIiISCiLCGLfx4HezrksM4sElprZv5xzK4MYk4iIiEjICVpC55xzQJZvM9L3csGKR0RERCRUBfUaOjMLN7M04FvgP865T4IZj4iIiEgoCmpC55zLdc4lADHA5WbW7tQ6ZnaXmaWaWerevXsrPkgRERGRSq5SrHJ1zh0AFgFXFrPvr865ROdcYuPGjSs8NhEREZHKLpirXBubWT3f+xrAj4FNwYpHREREJFQFc5VrU2CmmYXjJZZvOOfeDWI8IiIiIiEpmKtc1wIdgtW/iIiIyA9FpbiGTkRERETKTgmdiIiISIhTQiciIiIS4pTQiYiIiIQ4JXQiIiIiIU4JnYiIiEiIU0InIiIiEuKU0ImIiIiEOCV0IiIiIiFOCZ2IiIhIiFNCJyIiIhLilNCJiIiIhDgldCIiIiIhTgmdiIiISIhTQiciIiIS4pTQiYiIiIQ4JXQiIiIiIU4JnYiIiEiIU0InIiIiEu
KU0ImIiIiEOCV0IiIiIiFOCZ2IiIhIiFNCJyIiIhLilNCJiIiIhDgldCIiIiIhTgmdiIiISIhTQiciIiIS4pTQiYiIiIQ4JXQiIiIiIe6MCZ2Z/SGQMhEREREJjkBG6PoWU3bVuQ5ERERERMomoqQdZnYP8HOgpZmtLbCrNrCsvAMTERERkcCUmNABrwH/Ap4EHi5Qftg59125RiUiIiIiASsxoXPOHQQOAkPNLBxo4qsfbWbRzrkdFRSjiIiIiJzG6UboADCz+4DHgT1Anq/YAfHlF5aIiIiIBOqMCR3wAHCZc25feQcjIiIiIqUXyCrXnXhTryIiIiJSCQUyQrcdWGRm7wHH8wudc5PLLSoRERERCVggCd0O36ua7yUiIiIilcgZEzrn3BMAZlbLOXek/EMSERERkdII5NFf3cxsA7DRt93ezJ4r98hEREREJCCBLIp4FugP7ANwzn0O9CrPoEREREQkcIEkdDjndp5SlFsOsYiIiIhIGQSyKGKnmV0BODOrBozGN/0qIiIiIsEXyAjd3cC9QHMgA0jwbYuIiIhIJRDIKtdM4NYKiEVEREREyiCQZ7k2Bu4EYgvWd87dUX5hiYiIiEigArmGbj6wBPgQLYYQERERqXQCSehqOuceKvdIRERERKRMAlkU8a6ZXV3ukYiIiIhImQSS0N2Pl9Rlm9lh3+tQeQcmIiIiIoEJZJVr7YoIRERERETKJpBr6DCzAXz/uK9Fzrl3yy8kERERESmNM065mtlTeNOuG3yv+31lIiIiIlIJBDJCdzWQ4JzLAzCzmcAa4OGz6djMWgCvAOcDecBfnXN/Ops2RURERKqiQBZFANQr8L7uOeo7BxjrnGsNdAXuNbM256htERERkSojkBG6J4E1ZvYRYHjX0k04246dc7uB3b73h81sI97zYjecbdsiIiIiVUkgq1xnm9kioLOv6CHn3DfnMggziwU6AJ8Us+8u4C6ACy644Fx2KyIiIvKDEOiUazcgGUjyvT9nzCwamAc84Jwrcn8759xfnXOJzrnExo0bn8uuRURERH4QAlnl+hxwN7AOWA/8zMz+37no3Mwi8ZK5Wc65t85FmyIiIiJVTSDX0CUB7ZxzDvyrXNedbcdmZsB0YKNzbvLZticiIiJSVQUy5fo/oODFay2Ateeg7+7AMKC3maX5XnpmrIiIiEgpBTJC1xDYaGaf+rY7AyvMbAGAc25AWTp2zi3FWzUrIiIiImchkIRuYrlHISIiIiJlFshtSxYDmFmdgvWdc9+VY1wiIiIiEqAzJnS++8D9BjiG94guAxzQsnxDExEREZFABDLlOh5o65zLLO9gRERERKT0Alnlug04Wt6BiIiIiEjZBDJCNwFYbmafAMfzC51zo8stKhEREREJWCAJ3QvAQrybCeeVbzgiIiIiUlqBJHQ5zrlflnskIiIiIlImgVxD95GZ3WVmTc2sQf6r3CMTERERkYAEMkJ3i++/EwqU6bYlIiIiIpVEIDcWvqgiAhERERGRsgnkxsKRwD1AL1/RIuAF59zJcoxLRERERAIUyJTrX4BI4Dnf9jBf2ajyCkpEREREAhdIQtfZOde+wPZCM/u8vAISERERkdIJZJVrrpm1yt8ws5ZAbvmFJCIiIiKlEeizXD8ys+2AARcCt5drVCIiIiISsEBWuf7XzC4BLsNL6DY5546f4TARERERqSBnnHI1s3uBGs65tc65z4GaZvbz8g9NRERERAIRyDV0dzrnDuRvOOf2A3eWX0giIiIiUhqBJHRhZmb5G2YWDlQrv5BEREREpDQCWRTxPvCGmT2P98ivu4F/l2tUIiIiIhKwQBK6h4C78J4WYcAHwEvlGZSIiIiIBC6QVa55wPO+l4iIiIhUMoFcQyciIiIilZgSOhEREZEQp4ROREREJMSVeA2dmf0Db1VrsZxzA8olIhEREREpldON0E0CngG+BI4BL/peWcD68g9NJHSMGTOGZ5991r/dv39/Ro0a5d8eO3
YskydPLpe+R40axYYNG8qlbRERCQ0lJnTOucXOucVAB+fcEOfcP3yvW4AeFReiSOV3xRVXsHz5cgDy8vLIzMzkiy++8O9fvnw53bt3L5e+X3rpJdq0aVMubYuISGgI5Bq6xmbWMn/DzC4CGpdfSCKhp3v37v6E7osvvqBdu3bUrl2b/fv3c/z4cTZu3MgDDzxAWlpaoWPWrl3Ld999x/XXX098fDxdu3Zl7dq1ADz++OOMGDGCfv36ERsby1tvvcWDDz5IXFwcV155JSdPngQgOTmZ1NRUAKKjo3nkkUdo3749Xbt2Zc+ePQBs27aNrl270rlzZyZOnEh0dHRFnh4RESlngSR0Y4BFZrbIzBYBHwEPlGtUIqEgIxXeugum96VZ6pNEmGPHjh0sX76cbt260aVLF1asWEFqairx8fHcfffdzJgxA4DNmzdz/Phx4uPjeeyxx+jQoQNr167l97//PcOHD/d3sW3bNt577z3mz5/PbbfdRkpKCuvWraNGjRq89957RUI6cuQIXbt25fPPP6dXr168+OKLANx///3cf//9rFq1imbNmlXI6RERkYpzxoTOOfdv4BLgft/rMufc++UdmEil9p/HYOY1sO5N2PkprJlF9wbfsXzmE/6Erlu3bixfvpzly5dzxRVXcOONN/Luu+9y8uRJXn75ZUaOHAnA0qVLGTZsGAC9e/dm3759HDx4EICrrrqKyMhI4uLiyM3N5corrwQgLi6O9PT0ImFVq1aNa665BoBOnTr566xYsYIbb7wRgFtuuaUcT4yIiARDII/+AugExPrqtzcznHOvlFtUIpXZ9sXw6V/h5LHvy1wuVzSH5f+czbojF9GuXTtatGjBM888Q506dbjjjjuoWbMmffv2Zf78+bzxxhv+aVLnii4mNzMAqlevDkBYWBiRkZH+8rCwMHJycoocV7BOeHh4sXVEROSH54wjdGb2Kt6K1x5AZ98rsZzjEqm8Vv4/OHm0SHH3C8J5d9NxGoRlER4eToMGDThw4AArVqygW7dugLcidfTo0XTu3JkGDRoA0KtXL2bNmgXAokWLaNSoEXXq1DmnIXft2pV58+YBMGfOnHPatoiIBF8gI3SJQBtX3DCCSFW0b3uxxXHnhZF5NI9bLqj2fVlcHFlZWTRq1AjwpkHr1KnD7bff7q/z+OOPc/vttxMfH0/NmjWZOXPmOQ/52Wef5bbbbuOZZ57hJz/5CXXr1j3nfYiISPDYmfI0M3sTGO2c210xIZUsMTHR5U9TiQTN3wfB1g+L32fh0HEYXPunYnd//fXXJCcns2nTJsLCKu5BLUePHqVGjRqYGXPmzGH27NnMnz+/wvoXEamKzGy1c65CZjUDGaFrBGwws0+B4/mFelKEVFld74Gvlhc77Up4JHS+s9jDXnnlFR555BEmT55cockcwOrVq7nvvvtwzlGvXj1efvnlCu1fRETKVyAjdEnFlftuOlyhNEInlYJz8M9xkPba90mdhUFEdej1EPQcE9z4RESkUqhUI3TOucVm1gRvMQTAp865b8s3LJFKzAyungStB8Anz8OBHdD4Muh6L8R0CnZ0IiJSBZ0xoTOzm4CngUWAAX82s/HOubnlHJtI5WUGLZO8l4iISJAFcg3dI0Dn/FE5M2sMfAgooRMRERGpBAK5MjvslCnWfQEeJyIiIiIVIJARun+b2fvAbN/2EOBf5ReSiIiIiJRGIIsixpvZQLwnRRjwV+fc2+UemYiIiIgEJJBFERcB/3TOveXbrmFmsc659PIOTkRERETOLJBr4d4E8gps5/rKRERERKQSCCShi3DOncjf8L2vdpr6IiIiIlKBAkno9pqZ/zFfZnYdkFl+IYmIiIhIaQSS0N0N/MrMdprZDuAh4GflG5ZUBd988w0333wzrVq1ok2bNlx99dVs3rw5aPE8++yzHD36/fNZr776ag4cOFDqdtLT03nttdfOZWgiIiKndcaEzjm3zTnXFWgNtHXOXeGc21r+oc
kPmXOOG264geTkZLZt28aGDRv4/e9/z549e4IW06kJ3T//+U/q1atX6naU0ImISEU7Y0JnZk3MbDrwpnPusJm1MbOfVkBs8gP20UcfERkZyd133+0vS0hIoEePHowfP5527doRFxfH66+/DsCiRYtITk5m8ODB/OhHP+LWW2/FOQdAbGwsjz32GB07diQuLo5NmzYBcOTIEe644w46d+5Mhw4dmD9/PgC5ubmMGzeOuLg44uPj+fOf/8zUqVP5+uuvSUlJISUlxd9uZqZ3dcErr7xCfHw87du3Z9iwYQCMHDmSuXO/f2BKdHQ0AA8//DBLliwhISGBKVOmlOdpFBERAQKbcp0BvA80821vBh4or4DkByw3Bza9B4ueYv27L9ApvnWRKm+99RZpaWl8/vnnfPjhh4wfP57du3cDsGbNGp599lk2bNjA9u3bWbZsmf+4Rv+fvfuOr/n6Hzj++tybcbOETDvD10pyswdRErRifyn9KmpU8TVrVVtUjS5aStOl+lNa9UVtpUqRCEUlkYSkNYrYNEgiO3ec3x+pW2kSo42EOs/HI4/mfsY55/O52r6d8T5OThw+fJiRI0cyb948AN566y3atWtHfHw8MTExTJ48mby8PBYvXsyZM2dISkriyJEj9O/fnxdffJG6desSExNDTExMqTalpaXx1ltvsXv3blJSUvjggw/u+Jhz5syhdevWJCcnM2HChL/71lAUxRREAuj1epydnenatevfLvuviI2NfaB1Z2Vl8cknn5g+yx5PSZKku7uXgM5JCPENv6cuEULoKUldIkn3LuM4LPCG9cMh9h04/h0c5GmObgAAIABJREFUXg77Fpa6bN++ffTt2xe1Wo2rqysRERHEx8cDEBoaSv369VGpVPj7+5Oenm667+mnnwYgKCjIdHzHjh3MmTMHf39/IiMjKSws5Ny5c+zcuZMRI0ZgZlaShtHBweGOTd+9eze9e/fGycnpnq6vbDY2NqSmplJQUADADz/8QL169aq0DVVJBnSSJEn3714CujxFURwBAaAoSgsguzIqVxTlC0VRflMUJbUyypMeUvoiWNYFcq9CcS4A3o5GEi8Ww565cOw706W3hlHLY2lpafpdrVaj1+vLnLv9uBCCdevWkZycTHJyMufOnaN58+YIIVAU5Z6bX9H1ZmZmGI1G0zXFxcVlrqksnTp1YuvWrQCsXLmSvn37ms4dOnSI8PBwAgICCA8P5/jx40BJz2JoaCj+/v74+vpy8uRJ8vLy6NKlC35+fvj4+JiGtGfPnk1ISAg+Pj4MHz7c9D38+uuvPPnkk/j5+REYGMipU6cAyM3NrXD4+9YwdUJCApGRkQDs2bMHf39//P39CQgIICcnB4D33nuPkJAQfH19mTFjBlAyZH3q1Cn8/f2ZPHlymSHs8p5LkiTpcXcvAd1EYDPQSFGUH4GvgLGVVP8yoGMllSU9rH75FnQF/P53AgDaeagpMgg+P5gFe+YAEB8fT61atVi9ejUGg4GMjAzi4uIIDQ39S9VGRUXx4YcfmoKNpKQkADp06MCiRYtMgd+NGzcAsLOzMwUat2vfvj3ffPMN169fL3W9u7s7iYmJAGzatAmdTnfHcu5Z6jr4OBRm1YK36oKhmGe7PcmqVasoLCzkyJEjhIWFmS5v1qwZcXFxJCUlMXv2bKZOnQrAokWLGDduHMnJySQkJFC/fn2+//576tatS0pKCqmpqXTsWPKv35gxY4iPjzf1BG7ZsgWA/v37M3r0aFJSUti/fz916tQxvcuKhr/LM2/ePD7++GOSk5PZu3cvVlZW7Nixg5MnT3Lo0CGSk5NJTEwkLi6OOXPm0KhRI5KTk3nvvffKDGGX91ySJEmPu3tZ5XoYiADCKUlX4i2EOFIZlQsh4oAblVGW9BC7cMjUM3eLoihs6GPND6f1NJryI97e3sycOZN+/fqZFh+0a9eOd999l9q1a/+laqdPn45Op8PX1xcfHx+mT58OwNChQ2nYsK
GpnlvDecOHD6dTp06mRRG3eHt7M23aNCIiIvDz82PixIkADBs2jD179hAaGspPP/2EjY0NAL6+vpiZmeHn53f/iyJ2vwmbxpQMUQsj6PLAqMN3339JP32SlStX0rlz51K3ZGdn88wzz+Dj48OECRNIS0sDoGXLlrz99tvMnTuXs2fPYmVlhVarZefOnbzyyivs3bsXe3t7oGSRSlhYGFqtlt27d5OWlkZOTg4XL16kZ8+eAGg0GqytrYE7D3+Xp1WrVkycOJHo6GiysrIwMzNjx44d7Nixg4CAAAIDAzl27Ng99baV91ySJEmPPSFEuT9ACFD7ts8DgU1ANOBQ0X33+wO4A6n3cm1QUJCQHkG73xZiloMQM2qU//N2g+pu4cMh86wQbziXeT825ggxy0HM6hsiHBwcxJEjR0RMTIzo0qWLEEKIQYMGiQ8++EAIIcSZM2eEm5ubqchff/1VfPDBB8LDw0Ps2rVLCCHE9evXxfLly0WrVq3ErFmzREFBgXBxcRHnzp0TQggxY8YMMWPGDJGdnS3q1atXppm31y2EEKNHjxZLly4VQgjRqFEjcfXqVSGEEHv37hURERGm644cOSLmzJkj6tWrJ3755RcxceJEsWjRojLlnzlzRnh7e1dYX0XPJUmS9LABEkQlxUt3+7lTD91nQDGAoihtgDmUDLdmA4sfSHRZDkVRhiuKkqAoSkJGRkZVVStVJm1vUJmVf05lDn59qrY9D6vU9VDRHEKjniH1TvP666+j1WpLncrOzjYtkli2bJnp+OnTp/H09OTFF1+ke/fuHDlyhEuXLmFtbc1zzz3HSy+9xOHDhyksLARKVgrn5uaaUrHUqFGD+vXrs3HjRgCKiopK5ekrz+3D0OvWrTMdP3XqFFqtlldeeYXg4GCOHTtGVFQUX3zxBbm5Jb23Fy9e5LfffiszZP3nz+U9lyRJ0uPuTgGdWghxazi0D7BYCLFOCDEd+NeDb1oJIcRiIUSwECLY2dm5qqqVKpNTYwgcCObWpY+rzMDaAdpMrp52PWzyr4Oh4oUV9W10jBtbdvrqyy+/zJQpU2jVqhUGwx8L0FevXo2Pjw/+/v4cO3aMgQMHcvToUdOCgrfeeovXXnuNmjVrMmzYMLRaLT169CAkJMRUxvLly4mOjsbX15fw8HCuXLlyx0eYMWMG48aNo3Xr1qjVatPxhQsX4uPjg5+fH1ZWVnTq1IkOHTrQr18/WrZsiVarpXfv3uTk5ODo6EirVq3w8fFh8uTJZYawy3suSZKkx50iKugR+H3lqb8QQq8oyjFguCiZ84aiKKlCCJ9KaYCiuANb7qW84OBgkZCQUBnVSlVNCDj8FcTNg+xzYKYp6blr9zrYuVZ36x4Oqetg84tl5hua1HSD8bI3SpIk6VGhKEqiECK4KuqqYBwMgJXAHkVRrgEFwN7fG/cvKi9tyUogEnBSFOUCMEMIsaQyypYeMooCQYNKfowGUFQlx6Q/NOsG371cfkBnbi17MiVJkqQKVRjQCSHeUhRlF1AH2CH+6MpTUUlpS4QQfe9+lfSPo1Lf/ZrHkZkFDPoWvuxakruvOLdkWFplBoEDIOC56m6hJEmS9JC6Uw8dQoiD5Rw78eCaI0mPOVcvmPgL/LwJLiaAphb4/gccG1V3yyRJkqSH2B0DOkmSqoGZZUkQ5/uf6m6JJEmS9Ii4l50ipAdErVbj7++Pj48P3bp1IysrC/j7m5/f7/3p6en4+Nx5TYqtre1fbs/91iVJkiRJ0v2RAV01srKyIjk5mdTUVBwcHPj444+ru0mSJEmSJD2CZED3kGjZsiUXL140fa5o8/Ndu3YREBCAVqtlyJAhFBUVAfD999/TrFkznnjiCdavX28qJy8vjyFDhhASEkJAQACbNm26YzvutvF5bm4u7du3JzAwEK1WayovPT2d5s2bM2zYMLy9venQoQMFBQUAJCYm4ufnR8uWLWXQKkmSJEkPgAzoqsnt+f8MBgO7du2ie/fupm
PlbX5eWFjI4MGDWb16NUePHkWv1/Ppp59SWFjIsGHD+Pbbb9m7d2+p5K9vvfUW7dq1Iz4+npiYGCZPnkxeXl6F7brbxucajYYNGzZw+PBhYmJimDRpkulZTp48yejRo0lLS6NmzZqmnQKef/55oqOjOXDgQKW8O0mSJEmSSpMBXRUqLtTz06bTLJm0l09GxpCfX0ATDy8cHR25ceMGTz31lOna8jY/P378OB4eHjRp0gSAQYMGERcXx7Fjx/Dw8KBx48YoisJzz/2R3mLHjh3MmTMHf39/IiMjKSws5Ny5cxW28W4bnwshmDp1Kr6+vjz55JNcvHiRq1evAuDh4YG/vz8AQUFBpKenk52dTVZWFhEREQAMGDCgcl6mJEmSJEkmMqCrIsWFetbOTSDph3MU5ukAMFdbMLHLJ8wbuZaiwqJSw5GWlpam39VqNXq9nop29QBQKkjSK4Rg3bp1JCcnk5yczLlz52jevHmF5fTr14/NmzdjZWVFVFQUu3fvLnV+xYoVZGRkkJiYSHJyMq6urqa9QCtqc0VtkyRJkiSpcsiArookbT/HzYxCDHpjqeMGvRF9jhnDn3mFefPmodPpKiyjWbNmpKen8+uvvwIl+2xGRETQrFkzzpw5w6lTpwBYuXKl6Z6oqCg+/PBDUzCYlJR0x3bebePz7OxsXFxcMDc3JyYmhrNnz96xvJo1a2Jvb8++ffuAkoBQkiRJkqTKJQO6KpIad7FMMHeLQW9Ed74mfn5+rFq1qsIyNBoNS5cu5ZlnnkGr1aJSqRgxYgQajYbFixfTpUsXnnjiCdzc3Ez3TJ8+HZ1Oh6+vLz4+PkyfPv2O7bzbxuf9+/cnISGB4OBgVqxYQbNmze767EuXLmX06NG0bNmyzBCuJEmSJEl/n3KnYbyHTXBwsEhISKjuZtw3IQSfjIy563WjPm0rhyclSZIk6R9CUZREIURwVdQle+iqgKIoaGzM73iNxsZcBnOSJEmSJP0lMqCrIj5t6qE2K/91q81U+ETUq+IWSZIkSZL0TyEDuioSENWQGs6aMkGd2kxFDWcNAR0aVlPLJEmSJEl61MmAropYaMzo/UowAR0amoZfNTbmBHRoSO9XgrHQmFVzCyVJkiRJelTJKKIKWWjMCOvuSVh3T5mfTZIkSZKkSiN76KqJDOYkSZIkSaosMqCTpAdEUZRSW53p9XqcnZ3p2rXrHe9LSEjgxRdffNDNkyRJkv5B5JCrJD0gNjY2pKamUlBQgJWVFT/88AP16t19NXNwcDDBwVWStkiSJEn6h5A9dJJUifRGPenZ6ZzPOQ9Ap06d2Lp1K1CyJVvfvn1N1x46dIjw8HACAgIIDw/n+PHjAMTGxpp68WbOnMmQIUOIjIzE09OT6Oho0/1ff/01oaGh+Pv789///heDwVBVjylJkiQ9ZGRAJ0mVQAjBV2lfEflNJP/Z8h+e3vQ0hfpCGrRuwKpVqygsLOTIkSOEhYWZ7mnWrBlxcXEkJSUxe/Zspk6dWm7Zx44dY/v27Rw6dIhZs2ah0+n45ZdfWL16NT/++CPJycmo1Wq5T64kSdJjTA65SlIlmJ84n2+OfUOBocB0TCD44voXZB7PZOXKlXTu3LnUPdnZ2QwaNIiTJ0+iKAo6na7csrt06YKlpSWWlpa4uLhw9epVdu3aRWJiIiEhIQAUFBTg4uLy4B5QkiRJeqjJgE6S/qbf8n9j5S8rKTYWlzlXaCikqGkRL730ErGxsVy/ft10bvr06bRt25YNGzaQnp5OZGRkueVbWlqafler1ej1eoQQDBo0iHfeeafSn0eSJEl69MghV0n6m3ad23XHNDROEU4MGj8IrVZb6nh2drZpkcSyZcvuq8727duzdu1afvvtNwBu3LjB2bNn76/hkiRJ0j+GDOgk6W/K0+WhN+orPK9x1NBtULcyx19++WWmTJlCq1at7ntBg5eXF2+++SYdOnTA19eXp556isuXL9932yVJkqR/BkUIUd1tuGfBwc
EiISGhupshSaUcuHSA8THjydfnl3veQm3Btz2+pa5t3Spu2eNNURSee+45li9fDpTkAaxTpw5hYWFs2bLlvsvLysrif//7H6NGjarspkqS9A+lKEqiEKJK8lDJHjpJ+pvC6oRRS1MLhbLDruYqc0JcQ2QwVw1uzwMI3HMewIpkZWXxySefVFbzJEmSKpUM6CTpb1IpKhY/tRhHK0eszaxNx63NrHGv4c7cNnOrsXWPtzvlAbxx4wY9evTA19eXFi1acOTIEaDi3H+vvvoqp06dwt/fn8mTJ5Obm0v79u0JDAxEq9WyadMmANLT02nevDnDhg3D29ubDh06mILKzz//nJCQEPz8/OjVqxf5+eX36kqSJN03IcQj8xMUFCQk6WFVpC8Sm3/dLCbvmSym7p0q9pzfI/QGfXU367FlY2MjUlJSRK9evURBQYHw8/MTMTExokuXLkIIIcaMGSNmzpwphBBi165dws/PTwghxIwZM0TLli1FYWGhyMjIEA4ODqK4uFicOXNGeHt7m8rX6XQiOztbCCFERkaGaNSokTAajeLMmTNCrVaLpKQkIYQQzzzzjFi+fLkQQohr166Z7p82bZqIjo5+8C9CkqRqAySIKoqRZNoSSaokFmoLujXqRrdGZRdASA9e7PHfiN51kuNXc7C3MkdnEDRp7k16enq5eQD37dvHunXrAGjXrh3Xr18nOzsbKD/3358JIZg6dSpxcXGoVCouXrxous7DwwN/f38AgoKCSE9PByA1NZXXXnuNrKwscnNziYqKelCvQ5Kkx4wM6CRJeuQt+/EMc78/RoHOCEBekQGdwUjfxQfp0rVbuXkARTkLwm6lnykv99+frVixgoyMDBITEzE3N8fd3Z3CwsJy77815Dp48GA2btyIn58fy5YtIzY29u8/vCRJEnIOnSRJj7jsfB3vbPsjmLvdsSs51A3tzOuvv14mD2CbNm1M26XFxsbi5OREjRo1KqzHzs6OnJycP+rNzsbFxQVzc3NiYmLuKQ9gTk4OderUQafTya3aJEmqVLKHTpKkR9quY1dRq8pP7FygM7AjXce6cePKnJs5cybPP/88vr6+WFtb8+WXX96xHkdHR1q1aoWPjw+dOnXilVdeoVu3bgQHB+Pv70+zZs3u2tY33niDsLAw3Nzc0Gq1pQJESZKkv0PmoZMk6ZH29cGzvLn1ZwrL6aEDaFbbju/Ht6niVkmSJMk8dJIkSfcsxN2hwnMWaoWIJs5V2BpJkqTqIQM6SZIeaU1r2xHsVgtLs7L/OTM3UzG4lXvVN0qSJKmKyYBOkqRH3uKBwbRt5oKFmQo7jRnWFmrcHKxZPbwldeytqrt5kiRJD5xcFCFJ0iPP2sKMRc8F8dvNQk7+lkstawua17EzpSGRJEn6p5MBnSRJ/xguNTS41NBUdzMkSZKqnBxylSRJkiRJesTJgE6SJEmSJOkRJwM6SZIkSZKkR5wM6CRJkiRJkh5xMqCTJEmSJEl6xMmATpKkKmVra1vu8cGDB7N27do73hsZGYnc/k+SJKksGdBJ0h0oisKkSZNMn+fNm8fMmTMrrfz09HQURWH69OmmY9euXcPc3JwxY8b8pTJff/11du7cWVlNlCRJkh4BMqCTpDuwtLRk/fr1XLt27YHV4enpyZYtW0yf16xZg7e3918ub/bs2Tz55JOV0bQHSgjBmDFj8PLyokuXLvz222+mc7NnzyYkJAQfHx+GDx+OEMJ0bs2aNYSGhtKkSRP27t0LQGFhIc8//zxarZaAgABiYmKq/HkkSZKqkwzoJOkOzMzMGD58OAsWLChzLiMjg169ehESEkJISAg//vgjAFqtlqysLIQQODo68tVXXwEwYMCAcnvOrKysaN68uWkocfXq1fznP/+5az3//ve/TWV/9tln9O/fHyg9dBkfH094eDh+fn6EhoaSk5NT9cGP0QAndsCWibD1JRAGEIINGzZw/Phxjh49yueff87+/ftNt4wZM4b4+HhSU1
U0BeItS6viAfxpPf48cGGJqc3A3Zf4rEopVYgmdEopR3H2wgtjzG4RaQp0At4Qke+MMeMu0laAoiYEy0XanMv3Oi/f+zwKfjf+vV9TTL8Xjsvfb661LwFmGmNeKKJdtvlrMvOF45VS6rLoLVellMMRkZux3AKdDUwAmlh3pQEVi2iyC7hZREKt7SuKiAuWW5h9rdsaALcAv15mOL2s7SOBVGNM6t/6bQucuDAqWIzvgR4iUt3apoqI1L3EeYv7rEopVYj+ElRKOaIALA8E5AHZwL+t26cB34rIkfzz6Iwx50WkF/CeiHhgmT93FzAFmCoi24AcINoYc85627OkTlvLq3gBj1q3jQVmiEgikAH0v1gHxpgdIvIf4DsRcbJ+psHAgYs0mwtMtz5Y0UPn0SmlLkbLliilVDFEZA0w0hizyd6xKKXUxegtV6WUUkqpMk5H6JRSSimlyjgdoVNKKaWUKuM0oVNKKaWUKuM0oVNKKaWUKuM0oVNKKaWUKuM0oVNKKaWUKuM0oVNKKaWUKuP+H1M387PdOxvpAAAAAElFTkSuQmCC\n", + "image/png": "", "text/plain": [ "
" ] @@ -1966,8 +1966,8 @@ "plt.subplots(figsize=(12, 10))\n", "# Note the argument below to make sure we get the colours in the ascending\n", "# order we intuitively expect!\n", - "sns.___(x=___, y=___, size=___, hue=___, \n", - " hue_order=___, data=pca_df)\n", + "sns.scatterplot(x=pca_df.PC1, y=pca_df.PC2, size='AdultWeekend', hue='Quartile', \n", + " hue_order=pca_df.Quartile.cat.categories, data=pca_df)\n", "#and we can still annotate with the state labels\n", "for s, x, y in zip(state, x, y):\n", " plt.annotate(s, (x, y)) \n", @@ -3301,7 +3301,7 @@ "#Show a seaborn heatmap of correlations in ski_data\n", "#Hint: call pandas' `corr()` method on `ski_data` and pass that into `sns.heatmap`\n", "plt.subplots(figsize=(12,10))\n", - "sns.___(ski_data.___);" + "sns.heatmap(ski_data.corr());" ] }, { @@ -3362,7 +3362,7 @@ "#Code task 13#\n", "#Use a list comprehension to build a list of features from the columns of `ski_data` that\n", "#are _not_ any of 'Name', 'Region', 'state', or 'AdultWeekend'\n", - "features = [___ for ___ in ski_data.columns if ___ not in [___, ___, ___, ___]]" + "features = [col for col in ski_data.columns if col not in ['Name', 'Region', 'state', 'AdultWeekend']]" ] }, { @@ -3372,7 +3372,7 @@ "outputs": [ { "data": { - "image/png": "\n", + "image/png": "", "text/plain": [ "
" ] @@ -3420,7 +3420,7 @@ "outputs": [ { "data": { - "image/png": "\n", + "image/png": "", "text/plain": [ "
" ] @@ -3463,7 +3463,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**A: 1** Your answer here" + "**A: 1** I found there were many numerical data features, such as: Total state area, Total state population, Resorts per state, Total skiable area, Total night skiing area, Total days open, Resort density, Average ticket price by state and an Average ticket price scatterplot. I did not find as many categorical data features, however, I was able to find a couple; Top states by order of each of the summary statistics, Top states by resort density. I could not find a specific pattern suggesting a relationship between state and ticket price. I was lead to the conclusion that seaborn is the most comprehensive feature to use regarding subsequent modeling. I've found to always remain wary of the following aspects when performing feature selection: Multicollinearity, Irrelevant features, Overfitting, Feature scaling, Data leakage, Missing values, Feature interaction, Domain knowledge, and Dimensionality." 
] }, { diff --git a/Notebooks/Copy of 05_modeling.ipynb b/Notebooks/Copy of 05_modeling.ipynb new file mode 100644 index 000000000..ca8b7bb2b --- /dev/null +++ b/Notebooks/Copy of 05_modeling.ipynb @@ -0,0 +1 @@ +{"cells":[{"cell_type":"markdown","metadata":{"id":"r6ISxyT2XvBL"},"source":["# 5 Modeling"]},{"cell_type":"markdown","metadata":{"id":"2XKgHpiPXvBO"},"source":["## 5.1 Contents\n","* [5 Modeling](#5_Modeling)\n"," * [5.1 Contents](#5.1_Contents)\n"," * [5.2 Introduction](#5.2_Introduction)\n"," * [5.3 Imports](#5.3_Imports)\n"," * [5.4 Load Model](#5.4_Load_Model)\n"," * [5.5 Load Data](#5.5_Load_Data)\n"," * [5.6 Refit Model On All Available Data (excluding Big Mountain)](#5.6_Refit_Model_On_All_Available_Data_(excluding_Big_Mountain))\n"," * [5.7 Calculate Expected Big Mountain Ticket Price From The Model](#5.7_Calculate_Expected_Big_Mountain_Ticket_Price_From_The_Model)\n"," * [5.8 Big Mountain Resort In Market Context](#5.8_Big_Mountain_Resort_In_Market_Context)\n"," * [5.8.1 Ticket price](#5.8.1_Ticket_price)\n"," * [5.8.2 Vertical drop](#5.8.2_Vertical_drop)\n"," * [5.8.3 Snow making area](#5.8.3_Snow_making_area)\n"," * [5.8.4 Total number of chairs](#5.8.4_Total_number_of_chairs)\n"," * [5.8.5 Fast quads](#5.8.5_Fast_quads)\n"," * [5.8.6 Runs](#5.8.6_Runs)\n"," * [5.8.7 Longest run](#5.8.7_Longest_run)\n"," * [5.8.8 Trams](#5.8.8_Trams)\n"," * [5.8.9 Skiable terrain area](#5.8.9_Skiable_terrain_area)\n"," * [5.9 Modeling scenarios](#5.9_Modeling_scenarios)\n"," * [5.9.1 Scenario 1](#5.9.1_Scenario_1)\n"," * [5.9.2 Scenario 2](#5.9.2_Scenario_2)\n"," * [5.9.3 Scenario 3](#5.9.3_Scenario_3)\n"," * [5.9.4 Scenario 4](#5.9.4_Scenario_4)\n"," * [5.10 Summary](#5.10_Summary)\n"," * [5.11 Further work](#5.11_Further_work)\n"]},{"cell_type":"markdown","metadata":{"id":"5orEnEkCXvBP"},"source":["## 5.2 Introduction"]},{"cell_type":"markdown","metadata":{"id":"xdD-fo8tXvBP"},"source":["In this notebook, we now take our model for ski resort ticket price and 
leverage it to gain some insights into what price Big Mountain's facilities might actually support as well as explore the sensitivity of changes to various resort parameters. Note that this relies on the implicit assumption that all other resorts are largely setting prices based on how much people value certain facilities. Essentially this assumes prices are set by a free market.\n","\n","We can now use our model to gain insight into what Big Mountain's ideal ticket price could/should be, and how that might change under various scenarios."]},{"cell_type":"markdown","metadata":{"id":"W84v0ZrjXvBQ"},"source":["## 5.3 Imports"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Qd1mQvG9XvBQ"},"outputs":[],"source":["import pandas as pd\n","import numpy as np\n","import os\n","import pickle\n","import matplotlib.pyplot as plt\n","import seaborn as sns\n","from sklearn import __version__ as sklearn_version\n","from sklearn.model_selection import cross_validate"]},{"cell_type":"markdown","metadata":{"id":"cchLgSt0XvBR"},"source":["## 5.4 Load Model"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"1b2OAqacXvBS","executionInfo":{"status":"ok","timestamp":1721138649551,"user_tz":240,"elapsed":194,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"63ebb8a3-b585-4ad8-c206-5e6be79c0d4e"},"outputs":[{"output_type":"stream","name":"stdout","text":["Expected model not found\n"]}],"source":["# This isn't exactly production-grade, but a quick check for development\n","# These checks can save some head-scratching in development when moving from\n","# one python environment to another, for example\n","expected_model_version = '1.0'\n","model_path = '../models/ski_resort_pricing_model.pkl'\n","if os.path.exists(model_path):\n"," with open(model_path, 'rb') as f:\n"," model = pickle.load(f)\n"," if model.version != expected_model_version:\n"," print(\"Expected model version doesn't 
match version loaded\")\n"," if model.sklearn_version != sklearn_version:\n"," print(\"Warning: model created under different sklearn version\")\n","else:\n"," print(\"Expected model not found\")"]},{"cell_type":"markdown","metadata":{"id":"y9dyg7T1XvBS"},"source":["## 5.5 Load Data"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"NDgmXQ2DXvBS"},"outputs":[],"source":["ski_data = pd.read_csv('https://raw.githubusercontent.com/JLindsey96/DataScienceGuidedCapstone/master/raw_data/ski_resort_data.csv')"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"qpOjDCqKXvBT"},"outputs":[],"source":["big_mountain = ski_data[ski_data.Name == 'Big Mountain Resort']"]},{"cell_type":"code","execution_count":null,"metadata":{"scrolled":true,"colab":{"base_uri":"https://localhost:8080/","height":896},"id":"Pq0I8LmeXvBT","executionInfo":{"status":"ok","timestamp":1721138655685,"user_tz":240,"elapsed":179,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"4e514691-3ea3-44b9-f943-b160a39c9ed1"},"outputs":[{"output_type":"execute_result","data":{"text/plain":[" 151\n","Name Big Mountain Resort\n","Region Montana\n","state Montana\n","summit_elev 6817\n","vertical_drop 2353\n","base_elev 4464\n","trams 0\n","fastEight 0.0\n","fastSixes 0\n","fastQuads 3\n","quad 2\n","triple 6\n","double 0\n","surface 3\n","total_chairs 14\n","Runs 105.0\n","TerrainParks 4.0\n","LongestRun_mi 3.3\n","SkiableTerrain_ac 3000.0\n","Snow Making_ac 600.0\n","daysOpenLastYear 123.0\n","yearsOpen 72.0\n","averageSnowfall 333.0\n","AdultWeekday 81.0\n","AdultWeekend 81.0\n","projectedDaysOpen 123.0\n","NightSkiing_ac 600.0"],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","summary":"{\n \"name\": \"big_mountain\",\n \"rows\": 27,\n \"fields\": [\n {\n \"column\": 151,\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 19,\n \"samples\": [\n \"Big Mountain Resort\",\n 0,\n 4.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":5}],"source":["big_mountain.T"]},{"cell_type":"markdown","metadata":{"id":"QVlVPMSUXvBU"},"source":["## 5.6 Refit Model On All Available Data (excluding Big Mountain)"]},{"cell_type":"markdown","metadata":{"id":"0BjP7G3lXvBU"},"source":["This next step requires some careful thought. We want to refit the model using all available data. But should we include Big Mountain data? On the one hand, we are _not_ trying to estimate model performance on a previously unseen data sample, so theoretically including Big Mountain data should be fine. One might first think that including Big Mountain in the model training would, if anything, improve model performance in predicting Big Mountain's ticket price. But here's where our business context comes in. The motivation for this entire project is based on the sense that Big Mountain needs to adjust its pricing. One way to phrase this problem: we want to train a model to predict Big Mountain's ticket price based on data from _all the other_ resorts! We don't want Big Mountain's current price to bias this. 
We want to calculate a price based only on its competitors."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"du2gn7zzXvBU"},"outputs":[],"source":["# Assuming 'model' is an object with an attribute 'X_columns',\n","# you need to define 'model' before using it.\n","# For example, if 'model' is a scikit-learn model:\n","\n","from sklearn.linear_model import LinearRegression\n","\n","# Initialize the model\n","model = LinearRegression()\n","\n","# Define the columns you want to use as features\n","# Replace with the actual names of columns you want to use\n","model.X_columns = [\"summit_elev\", \"vertical_drop\", \"trams\", \"fastEight\"]\n","\n","# Now you can use the 'model' object\n","X = ski_data.loc[ski_data.Name != \"Big Mountain Resort\", model.X_columns]\n","y = ski_data.loc[ski_data.Name != \"Big Mountain Resort\", 'AdultWeekend']"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Q7p1j3bNXvBU","executionInfo":{"status":"ok","timestamp":1721138666337,"user_tz":240,"elapsed":184,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"8aa72191-8241-482b-f480-1dbfbae831b7"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["(329, 329)"]},"metadata":{},"execution_count":7}],"source":["len(X), len(y)"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":75},"id":"e_w7Ac74XvBU","executionInfo":{"status":"ok","timestamp":1721138677770,"user_tz":240,"elapsed":147,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"73fb0094-61bc-4bdd-d1e5-cb9216c35dfe"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["LinearRegression()"],"text/html":["
"]},"metadata":{},"execution_count":8}],"source":["# Assuming 'ski_data' is a pandas DataFrame\n","import pandas as pd\n","from sklearn.linear_model import LinearRegression\n","\n","# Initialize the model\n","model = LinearRegression()\n","\n","# Define the columns you want to use as features\n","model.X_columns = [\"summit_elev\", \"vertical_drop\", \"trams\", \"fastEight\"]\n","\n","# Handle missing values (NaN) in 'ski_data'\n","# Option 1: Drop rows with missing values in BOTH X and y\n","ski_data_cleaned = ski_data.dropna(subset=model.X_columns + ['AdultWeekend'])\n","\n","# Option 2: Fill missing values with a specific value (e.g., 0) in BOTH X and y\n","# ski_data_cleaned = ski_data.fillna(0)\n","\n","# Now you can use the 'model' object with the cleaned data\n","X = ski_data_cleaned.loc[ski_data_cleaned.Name != \"Big Mountain Resort\", model.X_columns]\n","y = ski_data_cleaned.loc[ski_data_cleaned.Name != \"Big Mountain Resort\", 'AdultWeekend']\n","\n","# Fit the model\n","model.fit(X, y)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"mqKnM6zfXvBU"},"outputs":[],"source":["cv_results = cross_validate(model, X, y, scoring='neg_mean_absolute_error', cv=5, n_jobs=-1)"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Fua9q6edXvBV","executionInfo":{"status":"ok","timestamp":1721138688955,"user_tz":240,"elapsed":167,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"11155a85-6d01-4418-e6e1-ae3c41e5c2a6"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([-15.91489073, -11.35405252, -13.66349 , -13.94094958,\n"," 
-16.69944277])"]},"metadata":{},"execution_count":10}],"source":["cv_results['test_score']"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"OZL0Svh_XvBV","executionInfo":{"status":"ok","timestamp":1721138690822,"user_tz":240,"elapsed":151,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"5f699c63-a6d5-4b60-eb2a-8c4cfda59ca8"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["(14.314565119673142, 1.8749296358815086)"]},"metadata":{},"execution_count":11}],"source":["mae_mean, mae_std = np.mean(-1 * cv_results['test_score']), np.std(-1 * cv_results['test_score'])\n","mae_mean, mae_std"]},{"cell_type":"markdown","metadata":{"id":"RMxod_L1XvBV"},"source":["These numbers will inevitably be different to those in the previous step that used a different training data set. They should, however, be consistent. It's important to appreciate that estimates of model performance are subject to the noise and uncertainty of data!"]},{"cell_type":"markdown","metadata":{"id":"w1uzOY9fXvBV"},"source":["## 5.7 Calculate Expected Big Mountain Ticket Price From The Model"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Lma2-VIEXvBV"},"outputs":[],"source":["X_bm = ski_data.loc[ski_data.Name == \"Big Mountain Resort\", model.X_columns]\n","y_bm = ski_data.loc[ski_data.Name == \"Big Mountain Resort\", 'AdultWeekend']"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"iAu8BCwoXvBV"},"outputs":[],"source":["bm_pred = model.predict(X_bm).item()"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"RCH-2580XvBW"},"outputs":[],"source":["y_bm = y_bm.values.item()"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Db4ik3C2XvBW","executionInfo":{"status":"ok","timestamp":1721138729667,"user_tz":240,"elapsed":148,"user":{"displayName":"Jesse 
Lindsey","userId":"09886266696052659215"}},"outputId":"21a4518f-f2bb-4ed5-bacb-5656db2142ca"},"outputs":[{"output_type":"stream","name":"stdout","text":["Big Mountain Resort modelled price is $82.53, actual price is $81.00.\n","Even with the expected mean absolute error of $14.31, this suggests there is room for an increase.\n"]}],"source":["print(f'Big Mountain Resort modelled price is ${bm_pred:.2f}, actual price is ${y_bm:.2f}.')\n","print(f'Even with the expected mean absolute error of ${mae_mean:.2f}, this suggests there is room for an increase.')"]},{"cell_type":"markdown","metadata":{"id":"tn9lbbzkXvBW"},"source":["This result should be viewed with both optimism and skepticism! The validity of our model rests on the assumption that other resorts accurately set their prices according to what the market (the ticket-buying public) supports. The fact that our resort seems to be charging less than what's predicted suggests it might be undercharging.\n","But if ours is mispricing itself, are others? It's reasonable to expect that some resorts will be \"overpriced\" and some \"underpriced.\" Or, if resorts are pretty good at pricing strategies, could it be that our model is simply lacking some key data? Certainly we know nothing about operating costs, for example, and they would surely help."]},{"cell_type":"markdown","metadata":{"id":"C-p69uVLXvBW"},"source":["## 5.8 Big Mountain Resort In Market Context"]},{"cell_type":"markdown","metadata":{"id":"fosXfx2HXvBW"},"source":["Features that came up as important in the modeling (not just our final, random forest model) included:\n","* vertical_drop\n","* Snow Making_ac\n","* total_chairs\n","* fastQuads\n","* Runs\n","* LongestRun_mi\n","* trams\n","* SkiableTerrain_ac"]},{"cell_type":"markdown","metadata":{"id":"wPU523JLXvBW"},"source":["A handy glossary of skiing terms can be found on the [ski.com](https://www.ski.com/ski-glossary) site. 
Some potentially relevant contextual information is that vertical drop, although nominally the height difference from the summit to the base, is generally taken from the highest [_lift-served_](http://verticalfeet.com/) point."]},{"cell_type":"markdown","metadata":{"id":"QnNgg9hpXvBW"},"source":["It's often useful to define custom functions for visualizing data in meaningful ways. The function below takes a feature name as an input and plots a histogram of the values of that feature. It then marks where Big Mountain sits in the distribution with a vertical line drawn using `matplotlib`'s [axvline](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.axvline.html) function. It also performs a little cleaning up of missing values and adds descriptive labels and a title."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"rJa5shoGXvBW"},"outputs":[],"source":["#Code task 1#\n","#Add code to the `plot_compare` function that displays a vertical, dashed line\n","#on the histogram to indicate Big Mountain's position in the distribution\n","#Hint: plt.axvline() plots a vertical line; its position for 'feature1'\n","#would be `big_mountain['feature1'].values`. We'd like a red line, which can be\n","#specified with c='r'; a dashed linestyle is produced by ls='--',\n","#and it's nice to give it a slightly reduced alpha value, such as 0.8.\n","#Don't forget to give it a useful label (e.g. 
'Big Mountain') so it's listed\n","#in the legend.\n","import matplotlib.pyplot as plt # Import the matplotlib.pyplot module\n","\n","def plot_compare(feat_name, description, state=None, figsize=(10, 5)):\n"," \"\"\"Graphically compare distributions of features.\n","\n"," Plot histogram of values for all resorts and reference line to mark\n"," Big Mountain's position.\n","\n"," Arguments:\n"," feat_name - the feature column name in the data\n"," description - text description of the feature\n"," state - select a specific state (None for all states)\n"," figsize - (optional) figure size\n"," \"\"\"\n","\n"," plt.subplots(figsize=figsize)\n"," # quirk that hist sometimes objects to NaNs, sometimes doesn't\n"," # filtering only for finite values tidies this up\n"," if state is None:\n"," ski_x = ski_data[feat_name]\n"," else:\n"," ski_x = ski_data.loc[ski_data.state == state, feat_name]\n"," ski_x = ski_x[np.isfinite(ski_x)]\n"," plt.hist(ski_x, bins=30)\n"," plt.axvline(x=big_mountain[feat_name].values, c='r', ls='--', alpha=0.8, label='Big Mountain')\n"," plt.xlabel(description)\n"," plt.ylabel('frequency')\n"," plt.title(description + ' distribution for resorts in market share')\n"," plt.legend()"]},{"cell_type":"markdown","metadata":{"id":"-y5Q3dEtXvBX"},"source":["### 5.8.1 Ticket price"]},{"cell_type":"markdown","metadata":{"id":"hNQZFl7wXvBX"},"source":["Look at where Big Mountain sits overall amongst all resorts for price and for just other resorts in Montana."]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":487},"id":"Dk4ghqFFXvBX","executionInfo":{"status":"ok","timestamp":1721138838597,"user_tz":240,"elapsed":382,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"f425b0e8-38ad-48ee-9071-b624dc2fb85c"},"outputs":[{"output_type":"display_data","data":{"text/plain":["
"],"image/png":"\n"},"metadata":{}}],"source":["plot_compare('AdultWeekend', 'Adult weekend ticket price ($)')"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":487},"id":"B2Xbqot-XvBX","executionInfo":{"status":"ok","timestamp":1721138871501,"user_tz":240,"elapsed":485,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"96d41bd8-9d70-4d45-f895-e9999f7a306f"},"outputs":[{"output_type":"display_data","data":{"text/plain":["
"],"image/png":"\n"},"metadata":{}}],"source":["plot_compare('AdultWeekend', 'Adult weekend ticket price ($) - Montana only', state='Montana')"]},{"cell_type":"markdown","metadata":{"id":"6VpYubEYXvBX"},"source":["### 5.8.2 Vertical drop"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":487},"id":"clXnvyC3XvBY","executionInfo":{"status":"ok","timestamp":1721138882254,"user_tz":240,"elapsed":506,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"c548c394-25a2-49df-9d13-86ecf336d452"},"outputs":[{"output_type":"display_data","data":{"text/plain":["
"],"image/png":"\n"},"metadata":{}}],"source":["plot_compare('vertical_drop', 'Vertical drop (feet)')"]},{"cell_type":"markdown","metadata":{"id":"8owrkoAsXvBY"},"source":["Big Mountain is doing well for vertical drop, but there are still quite a few resorts with a greater drop."]},{"cell_type":"markdown","metadata":{"id":"P89un3eLXvBd"},"source":["### 5.8.3 Snow making area"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":487},"id":"noE5PISxXvBd","executionInfo":{"status":"ok","timestamp":1721138899684,"user_tz":240,"elapsed":420,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"f6438436-fd2e-4afb-8a10-83dcfe8497c4"},"outputs":[{"output_type":"display_data","data":{"text/plain":["
"],"image/png":"\n"},"metadata":{}}],"source":["plot_compare('Snow Making_ac', 'Area covered by snow makers (acres)')"]},{"cell_type":"markdown","metadata":{"id":"mf_Mp_DeXvBd"},"source":["Big Mountain is very high up the league table of snow making area."]},{"cell_type":"markdown","metadata":{"id":"xnDZDOE1XvBd"},"source":["### 5.8.4 Total number of chairs"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":487},"id":"tQZZA50vXvBd","executionInfo":{"status":"ok","timestamp":1721138926936,"user_tz":240,"elapsed":383,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"ccbf2e34-1eec-45b1-bd5d-f4f932b358c9"},"outputs":[{"output_type":"display_data","data":{"text/plain":["
"],"image/png":"\n"},"metadata":{}}],"source":["plot_compare('total_chairs', 'Total number of chairs')"]},{"cell_type":"markdown","metadata":{"id":"r8Xf-gU-XvBe"},"source":["Big Mountain has amongst the highest number of total chairs, resorts with more appear to be outliers."]},{"cell_type":"markdown","metadata":{"id":"E_dsjupAXvBe"},"source":["### 5.8.5 Fast quads"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":487},"id":"KTR4M9XWXvBe","executionInfo":{"status":"ok","timestamp":1721138944231,"user_tz":240,"elapsed":432,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"a6602cfb-5996-447a-e395-d7f126119a7b"},"outputs":[{"output_type":"display_data","data":{"text/plain":["
"],"image/png":"\n"},"metadata":{}}],"source":["plot_compare('fastQuads', 'Number of fast quads')"]},{"cell_type":"markdown","metadata":{"id":"MUTsI3PeXvBe"},"source":["Most resorts have no fast quads. Big Mountain has 3, which puts it high up that league table. There are some values much higher, but they are rare."]},{"cell_type":"markdown","metadata":{"id":"m_tR6skVXvBe"},"source":["### 5.8.6 Runs"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":487},"id":"D6ym1RcSXvBe","executionInfo":{"status":"ok","timestamp":1721138957714,"user_tz":240,"elapsed":670,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"25c17a82-422d-4479-bc4a-5b0b89f382ef"},"outputs":[{"output_type":"display_data","data":{"text/plain":["
"],"image/png":"\n"},"metadata":{}}],"source":["plot_compare('Runs', 'Total number of runs')"]},{"cell_type":"markdown","metadata":{"id":"Mw5KtnKZXvBf"},"source":["Big Mountain compares well for the number of runs. There are some resorts with more, but not many."]},{"cell_type":"markdown","metadata":{"id":"vkZqqNhgXvBf"},"source":["### 5.8.7 Longest run"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":487},"id":"c4KrILuHXvBf","executionInfo":{"status":"ok","timestamp":1721138967787,"user_tz":240,"elapsed":422,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"cd787e98-0eb2-4bed-964e-6713ba9b3ab8"},"outputs":[{"output_type":"display_data","data":{"text/plain":["
"],"image/png":"\n"},"metadata":{}}],"source":["plot_compare('LongestRun_mi', 'Longest run length (miles)')"]},{"cell_type":"markdown","metadata":{"id":"iSwKop2eXvBf"},"source":["Big Mountain has one of the longest runs. Although it is just over half the length of the longest, the longer ones are rare."]},{"cell_type":"markdown","metadata":{"id":"Z8bTdjxdXvBf"},"source":["### 5.8.8 Trams"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":487},"id":"T-V_w4i8XvBf","executionInfo":{"status":"ok","timestamp":1721138984560,"user_tz":240,"elapsed":631,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"1ca04ab5-9a39-4c6d-c512-343e49967dcd"},"outputs":[{"output_type":"display_data","data":{"text/plain":["
"],"image/png":"iVBORw0KGgoAAAANSUhEUgAAA1IAAAHWCAYAAAB9mLjgAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAABdPElEQVR4nO3dd3gU1fv38c+mbUJCEgKkADFU6UVpBhAQAqGKAmKhi+hXg4CAAjaKSrMgIs1GsQOKBRSkCYJIb9IEpEpCqAk1bef5g1/2YUkhExJ2gffruvZi58zZmXvOnp3szZk5azEMwxAAAAAAIMfcnB0AAAAAANxqSKQAAAAAwCQSKQAAAAAwiUQKAAAAAEwikQIAAAAAk0ikAAAAAMAkEikAAAAAMIlECgAAAABMIpECAAAAAJNIpABIkn7//XdZLBbNnTvX2aHkyPHjx9WxY0cVLlxYFotF77//vrNDcgkzZsyQxWLRwYMH7WWNGzdW48aNb8r+LRaLhg8fbl8ePny4LBaLTp48eVP2X7JkSfXo0eOm7Ota69evV7169eTr6yuLxaItW7Y4JY7bXfq56vfff3d2KHmqcePGqlKlirPDyFLJkiXVpk0bZ4cBuBQSKeAmSv+S6+3trf/++y/Delf/Q+pKXnjhBS1atEhDhw7V559/rhYtWmRa7+LFixo+fPht96Urv/35558aPny4zp496+xQMnDF2FJSUvTII4/o9OnTGj9+vD7//HNFREQ4OyyXMHnyZM2YMcPZYdxRdu7cqeHDhzv8hwqAvOfh7ACAO1FSUpLGjBmjiRMnOjuUW9ayZcvUrl07DRo0KNt6Fy9e1IgRIyTppo3KuJrffvvN9Gv+/PNPjRgxQj169FBgYGCOX3fp0iV5eOTvn5bsYtuzZ4/c3G7+/xHu379fhw4d0scff6ynnnrqpu/flU2ePFlFihTJs5HChg0b6tKlS/Ly8sqT7d2Odu7cqREjRqhx48YqWbKks8MBbluMSAFOUKNGDX388cc6duyYs0O56S5cuJAn24mPjzf1BT+n8io+V+Ll5ZWvXzptNpsuX74sSfL29s73RCo7VqtVnp6eN32/8fHxkpSnfdJsX3S1vnvx4sV82a6bm5u8vb2dkjDnB1d735zp8uXLstlszg4DyLHb4ywE3GJefvllpaWlacyYMdnWO3jwoCwWS6aXxWR1L8o///yjLl26KCAgQEWLFtVrr70mwzB05MgRtWvXTv7+/goNDdW7776b6T7T0tL08ssvKzQ0VL6+vnrwwQd15MiRDPXWrl2rFi1aKCAgQAUKFFCjRo20evVqhzrpMe3cuVNPPPGEChUqpAYNGmR7zP/++68eeeQRBQUFqUCBArrvvvu0YMEC+/r0yyMNw9CkSZNksVhksViybL+iRYtKkkaMGGGvm95uPXr0kJ+fn/bv369WrVqpYMGC6ty5syTpjz/+0COPPKK77rpLVqtV4eHheuGFF3Tp0iWHfaRv4/Dhw2rTpo38/PxUvHhxTZo0SZK0fft2NWnSRL6+voqIiNBXX33l8PqUlBSNGDFC5cqVk7e3twoXLqwGDRpo8eLF2baTJO3YsUNNmjSRj4+PSpQooTfffDPTLyGZ3SM1ceJEVa5cWQUKFFChQoVUq1Yte2zDhw/Xiy++KEkqVaqUvd3SLxOyWCzq06ePvvzyS1WuXFlWq1ULFy60r7u6X6Y7efKkOnXqJH9/fxUuXFj9+vWzJ19Szvv69WLL7B6p6/Up6f/fdzN79my99dZbKlGihLy9vdW0aVPt27cvQ0xX69Gjhxo1aiRJeuSRR2SxWBzae9myZbr//vvl6+urwMBAtWvXTrt27XLYhtnPSvrnYMWKFXruuecUHBysEiVK2Nf/+uuv9n0WLFhQrVu31o4dOxy2ERcXp549e6pEiRKyWq0KCwtTu3btMlwONnnyZPv7XKxYMcXExGS4rDL9suSNGzeqYcOGKlCggF5++WWV
LFlSO3bs0IoVK+zvVXrb5LbvZ3aPVPr+d+7cqQceeEAFChRQ8eLFNW7cuGy3lS69T8+ZM0eVKlWSj4+PIiMjtX37dknStGnTVLZsWXl7e6tx48YZ2sjs+SKzc05mfvvtNxUoUECPP/64UlNTJUm7d+9Wx44dFRQUJG9vb9WqVUs//fST/TUzZszQI488Ikl64IEH7O2e3eXNOe0LkrRq1SrVqVNH3t7eKl26tGbNmuWw/vTp0xo0aJCqVq0qPz8/+fv7q2XLltq6datDvfT38ZtvvtGrr76q4sWLq0CBAkpMTJSUs78xgLNxaR/gBKVKlVK3bt308ccfa8iQISpWrFiebfvRRx9VxYoVNWbMGC1YsEBvvvmmgoKCNG3aNDVp0kRjx47Vl19+qUGDBql27dpq2LChw+vfeustWSwWDR48WPHx8Xr//fcVFRWlLVu2yMfHR9KVL4YtW7ZUzZo1NWzYMLm5uWn69Olq0qSJ/vjjD9WpU8dhm4888ojKlSunUaNGyTCMLGM/fvy46tWrp4sXL6pv374qXLiwZs6cqQcffFBz587Vww8/rIYNG+rzzz9X165d1axZM3Xr1i3L7RUtWlRTpkzRs88+q4cffljt27eXJFWrVs1eJzU1VdHR0WrQoIHeeecdFShQQJI0Z84cXbx4Uc8++6wKFy6sdevWaeLEiTp69KjmzJnjsJ+0tDS1bNlSDRs21Lhx4/Tll1+qT58+8vX11SuvvKLOnTurffv2mjp1qrp166bIyEiVKlVK0pUv0KNHj9ZTTz2lOnXqKDExURs2bNCmTZvUrFmzLI8tLi5ODzzwgFJTUzVkyBD5+vrqo48+sr9H2fn444/Vt29fdezY0Z7QbNu2TWvXrtUTTzyh9u3b659//tHXX3+t8ePHq0iRIvb2TLds2TLNnj1bffr0UZEiRa57+VCnTp1UsmRJjR49Wn/99Zc++OADnTlzJsOXsOvJSWxXy0mfutqYMWPk5uamQYMGKSEhQePGjVPnzp21du3aLGN65plnVLx4cY0aNUp9+/ZV7dq1FRISIklasmSJWrZsqdKlS2v48OG6dOmSJk6cqPr162vTpk0Z2i2nn5V0zz33nIoWLarXX3/dPrLx+eefq3v37oqOjtbYsWN18eJFTZkyRQ0aNNDmzZvt++zQoYN27Nih559/XiVLllR8fLwWL16sw4cP2+sMHz5cI0aMUFRUlJ599lnt2bNHU6ZM0fr167V69WqH0b9Tp06pZcuWeuyxx9SlSxeFhISocePGev755+Xn56dXXnlFkuxtk9u+n5UzZ86oRYsWat++vTp16qS5c+dq8ODBqlq1qlq2bHnd1//xxx/66aefFBMTI0kaPXq02rRpo5deekmTJ0/Wc889pzNnzmjcuHF68skntWzZMvtrzZwvsjrnXGv+/Pnq2LGjHn30UX322Wdyd3fXjh07VL9+fRUvXtz+uZ89e7Yeeughfffdd/ZzZN++ffXBBx/o5ZdfVsWKFSXJ/m9mctIXJGnfvn3q2LGjevXqpe7du+uzzz5Tjx49VLNmTVWuXFnSlf+4+OGHH/TII4+oVKlSOn78uKZNm6ZGjRpp586dGf7evfHGG/Ly8tKgQYOUlJQkLy8v039jAKcxANw006dPNyQZ69evN/bv3294eHgYffv2ta9v1KiRUblyZfvygQMHDEnG9OnTM2xLkjFs2DD78rBhwwxJxtNPP20vS01NNUqUKGFYLBZjzJgx9vIzZ84YPj4+Rvfu3e1ly5cvNyQZxYsXNxITE+3ls2fPNiQZEyZMMAzDMGw2m1GuXDkjOjrasNls9noXL140SpUqZTRr1ixDTI8//niO2qd///6GJOOPP/6wl507d84oVaqUUbJkSSMtLc3h+GNiYq67zRMnTmRoq3Tdu3c3JBlDhgzJsO7ixYsZykaPHm1YLBbj0KFDGbYxatQoe1l6+1osFuObb76xl+/evTtDLNWrVzdat2593eO4VnpbrV27
1l4WHx9vBAQEGJKMAwcO2MsbNWpkNGrUyL7crl07h36WmbfffjvDdtJJMtzc3IwdO3Zkui6zfvnggw861HvuuecMScbWrVsNwzDX17OLLSIiwqFf57RPpff/ihUrGklJSfa6EyZMMCQZ27dvz7Cvq6W/fs6cOQ7lNWrUMIKDg41Tp07Zy7Zu3Wq4ubkZ3bp1s5eZ/aykn0saNGhgpKamOhxbYGCg0bt3b4f6cXFxRkBAgL38zJkzhiTj7bffznIf8fHxhpeXl9G8eXOHz96HH35oSDI+++wze1mjRo0MScbUqVMzbKdy5coO/S9dbvt+elsvX748w/5nzZplL0tKSjJCQ0ONDh06XHebkgyr1erQp6ZNm2ZIMkJDQx3OiUOHDs3Q/8yeLzI751x9/v/uu+8MT09Po3fv3g5t37RpU6Nq1arG5cuX7WU2m82oV6+eUa5cOXvZnDlzMrRRVnLSFwzjymdLkrFy5Up7WXx8vGG1Wo2BAwfayy5fvuwQs2Fc+XxbrVZj5MiR9rL097F06dIO7WfmbwzgbFzaBzhJ6dKl1bVrV3300UeKjY3Ns+1efaO7u7u7atWqJcMw1KtXL3t5YGCgypcvr3///TfD67t166aCBQvalzt27KiwsDD98ssvkqQtW7Zo7969euKJJ3Tq1CmdPHlSJ0+e1IULF9S0aVOtXLkyw+Vl//vf/3IU+y+//KI6deo4XNLk5+enp59+WgcPHtTOnTtz1ggmPfvssxnKrh7ZuXDhgk6ePKl69erJMAxt3rw5Q/2r2z29fX19fdWpUyd7efny5RUYGOjQ7oGBgdqxY4f27t1rKuZffvlF9913n8P/zBYtWjTby4Su3ufRo0e1fv16U/u8WqNGjVSpUqUc10//X/50zz//vCTZ+1V+Mdunevbs6XA/2f333y9JmX5Wric2NlZbtmxRjx49FBQUZC+vVq2amjVrlumx5/Szkq53795yd3e3Ly9evFhnz57V448/bv9snjx5Uu7u7qpbt66WL18u6Ur/9vLy0u+//64zZ85kuu0lS5YoOTlZ/fv3d7gfqXfv3vL3989weaTValXPnj1zHHtu+35W/Pz81KVLF/uyl5eX6tSpk+P3rmnTpg6jL3Xr1pV0ZbTm6nNievnV2zV7vsjsnJPu66+/1qOPPqpnnnlG06ZNs7f96dOntWzZMnXq1Ennzp2zv7enTp1SdHS09u7dm+lssNeTk76QrlKlSvbPhHTlnHPt3xKr1WqPOS0tTadOnZKfn5/Kly+vTZs2Zdhm9+7dHdovN39jAGchkQKc6NVXX1Vqaup175Uy46677nJYDggIkLe3t/0SqKvLM/ujWa5cOYdli8WismXL2q+VT//S0717dxUtWtTh8cknnygpKUkJCQkO20i/jO16Dh06pPLly2coT78k5dChQznajhkeHh4O95akO3z4sP0LsJ+fn4oWLWq/D+ba4/P29s5waVlAQIBKlCiR4f6ta9t95MiROnv2rO6++25VrVpVL774orZt23bduA8dOpThvZKUaftda/DgwfLz81OdOnVUrlw5xcTEmL73IKfvabprYy1Tpozc3NzyfXpms33q2s9PoUKFJOm6XzCz2reU+XtSsWJF+5fDq5lt12vrp38+mzRpkuHz+dtvv9knxbBarRo7dqx+/fVXhYSE2C9LjYuLu278Xl5eKl26dIa2K168uKlJTXLb97OS2eetUKFCOX7vMjt3SlJ4eHim5Vdv18z5IqtzjiQdOHBAXbp0UYcOHTRx4kSH49m3b58Mw9Brr72W4b0dNmyYpP8/6YkZOekL6a5tIyljG9tsNo0fP17lypWT1WpVkSJFVLRoUW3bti1DW0hZ92Ezf2MAZ+EeKcCJSpcurS5duuijjz7SkCFDMqzPahKFtLS0LLd59f9OZ1cmKUf3YFwr/X8C3377bdWoUSPTOn5+fg7LOblvx1mu/t/TdGlpaWrWrJlOnz6twYMHq0KFCvL19dV///2n
Hj16ZPjf0KzaNyft3rBhQ+3fv18//vijfvvtN33yyScaP368pk6dmm/TaFesWFF79uzR/PnztXDhQn333XeaPHmyXn/9dftU8ddzo+/ptX07N309P+TlZyU3zLbrtfXT++bnn3+u0NDQDPWvnlGxf//+atu2rX744QctWrRIr732mkaPHq1ly5bpnnvuyffY87rv3+h7l9vPsdnzRWbnnHRhYWH2KwA2bNigWrVq2delb2fQoEGKjo7O9PVly5bN/iCzkNO+kJM2HjVqlF577TU9+eSTeuONNxQUFCQ3Nzf1798/05GkrPqwmb8xgLOQSAFO9uqrr+qLL77Q2LFjM6xL/9/wa2fIyo+RmXTXXmZjGIb27dtnn6ChTJkykiR/f39FRUXl6b4jIiK0Z8+eDOW7d++2rzcrqy/o2dm+fbv++ecfzZw502Eyi5zMpJcbQUFB6tmzp3r27Knz58+rYcOGGj58eLZfJiMiIjK9JCqz9suMr6+vHn30UT366KNKTk5W+/bt9dZbb2no0KHy9vbOVbtlZ+/evQ7/87xv3z7ZbDb7pVRm+rqZ2PKjT5nZt5T5e7J7924VKVJEvr6+ebrP9M9ncHBwjj6fZcqU0cCBAzVw4EDt3btXNWrU0LvvvqsvvvjCIf7SpUvbX5OcnKwDBw7k+POf3fuVm77vavLyfOHt7a358+erSZMmatGihVasWGGfxCH9PfD09Lxu2+fm85tdXzBj7ty5euCBB/Tpp586lJ89ezbDlRFZxSHlz98YIK9xaR/gZGXKlFGXLl00bdq0DJdS+Pv7q0iRIlq5cqVD+eTJk/MtnlmzZuncuXP25blz5yo2NtY+61XNmjVVpkwZvfPOOzp//nyG1584cSLX+27VqpXWrVunNWvW2MsuXLigjz76SCVLljR1T0669Bmxrv2Cnp30/3W9+n9ZDcPQhAkTTO//ek6dOuWw7Ofnp7JlyyopKSnb17Vq1Up//fWX1q1bZy87ceKEvvzyS9P79PLyUqVKlWQYhlJSUiTJ/gXfTLtlJ306+HTpP0ad3q/M9HUzseVHn8qpsLAw1ahRQzNnznSI9e+//9Zvv/2mVq1a5fk+o6Oj5e/vr1GjRtnfy6ulfz4vXrzoMP28dOVcVLBgQXvfi4qKkpeXlz744AOHz8Knn36qhIQEtW7dOkcx+fr6Zvpe5bbvu5q8Pl8EBARo0aJFCg4OVrNmzbR//35JV5Ljxo0ba9q0aZneV3v1udfMZyQnfcEMd3f3DKOAc+bMyfH9W/n5NwbIa4xIAS7glVde0eeff649e/bY//cx3VNPPaUxY8boqaeeUq1atbRy5Ur9888/+RZLUFCQGjRooJ49e+r48eN6//33VbZsWfXu3VvSlR/D/OSTT9SyZUtVrlxZPXv2VPHixfXff/9p+fLl8vf3188//5yrfQ8ZMkRff/21WrZsqb59+yooKEgzZ87UgQMH9N133+XqBzh9fHxUqVIlffvtt7r77rsVFBSkKlWqqEqVKlm+pkKFCipTpowGDRqk//77T/7+/vruu+9ydZ/M9VSqVEmNGzdWzZo1FRQUpA0bNmju3Lnq06dPtq976aWX9Pnnn6tFixbq16+fffrziIiI695n0rx5c4WGhqp+/foKCQnRrl279OGHH6p169b2m+pr1qwp6UrffOyxx+Tp6am2bdvmegTlwIEDevDBB9WiRQutWbNGX3zxhZ544glVr17dXienfd1MbPnRp8x4++231bJlS0VGRqpXr1726c8DAgIy/b2tG+Xv768pU6aoa9euuvfee/XYY4+paNGiOnz4sBYsWKD69evrww8/1D///KOmTZuqU6dOqlSpkjw8PDRv3jwdP35cjz32mKQrEwkMHTpUI0aMUIsWLfTggw9qz549mjx5smrXru0wsUN2atasqSlTpujNN99U2bJlFRwcrCZNmuS677ua/DhfFClSRIsXL1aDBg0UFRWlVatW2X+frkGDBqpatap69+6t
0qVL6/jx41qzZo2OHj1q/62mGjVqyN3dXWPHjlVCQoKsVquaNGmi4ODgDPvKSV8wo02bNho5cqR69uypevXqafv27fryyy8dRjWzk59/Y4A8d7OnCQTuZFdPf36t9Glxr52W+uLFi0avXr2MgIAAo2DBgkanTp2M+Pj4LKeZPnHiRIbt+vr6ZtjftVOtp09F+/XXXxtDhw41goODDR8fH6N169YO0/em27x5s9G+fXujcOHChtVqNSIiIoxOnToZS5cuvW5M2dm/f7/RsWNHIzAw0PD29jbq1KljzJ8/P0M95XD6c8MwjD///NOoWbOm4eXl5dBuWbWNYRjGzp07jaioKMPPz88oUqSI0bt3b2Pr1q0ZpujOafumi4iIcJjy+c033zTq1KljBAYGGj4+PkaFChWMt956y0hOTr7ucW3bts1o1KiR4e3tbRQvXtx44403jE8//fS6059PmzbNaNiwof29K1OmjPHiiy8aCQkJDtt/4403jOLFixtubm4O28yu7bPqlzt37jQ6duxoFCxY0ChUqJDRp08f49KlSw6vzWlfzy62a6c/N4yc9amspi/Pblr2nLzeMAxjyZIlRv369Q0fHx/D39/faNu2rbFz506HOmY/K9mdS9LjiY6ONgICAgxvb2+jTJkyRo8ePYwNGzYYhmEYJ0+eNGJiYowKFSoYvr6+RkBAgFG3bl1j9uzZGbb14YcfGhUqVDA8PT2NkJAQ49lnnzXOnDnjUCer/m4YV6Zeb926tVGwYEFDkr0v5rbvZzX9eWb77969uxEREZHt9gwj8z6d/t5fOy14Zu/1jZ4vsjqGffv2GWFhYUbFihXtfWP//v1Gt27djNDQUMPT09MoXry40aZNG2Pu3LkOr/3444+N0qVLG+7u7tlOhZ7TvnDtuevquK8+v1y+fNkYOHCgERYWZvj4+Bj169c31qxZk6Fedp8Zw8jZ3xjA2SyGcZPuoAUAAACA2wT3SAEAAACASSRSAAAAAGASiRQAAAAAmEQiBQAAAAAmkUgBAAAAgEkkUgAAAABgEj/IK8lms+nYsWMqWLCgLBaLs8MBAAAA4CSGYejcuXMqVqxYtj/cTiIl6dixYwoPD3d2GAAAAABcxJEjR1SiRIks15NISSpYsKCkK43l7+/vvEBsNun48SvPQ0KkbDJgAAAAAHkvMTFR4eHh9hwhKyRSkv1yPn9/f+cmUpcuSZ07X3n+xx+Sj4/zYgEAAADuYNe75YchDwAAAAAwiUQKAAAAAExyaiI1ZcoUVatWzX5JXWRkpH799Vf7+suXLysmJkaFCxeWn5+fOnTooOPp9xD9n8OHD6t169YqUKCAgoOD9eKLLyo1NfVmHwoAAACAO4hT75EqUaKExowZo3LlyskwDM2cOVPt2rXT5s2bVblyZb3wwgtasGCB5syZo4CAAPXp00ft27fX6tWrJUlpaWlq3bq1QkND9eeffyo2NlbdunWTp6enRo0a5cxDAwAAwE1kGIZSU1OVlpbm7FDg4tzd3eXh4XHDP3tkMQzDyKOY8kRQUJDefvttdezYUUWLFtVXX32ljh07SpJ2796tihUras2aNbrvvvv066+/qk2bNjp27JhCQkIkSVOnTtXgwYN14sQJeXl55WifiYmJCggIUEJCgvMnm7j//ivPmWwCAAAgR5KTkxUbG6uLFy86OxTcIgoUKKCwsLBM84Wc5gYuM2tfWlqa5syZowsXLigyMlIbN25USkqKoqKi7HUqVKigu+66y55IrVmzRlWrVrUnUZIUHR2tZ599Vjt27NA999yT6b6SkpKUlJRkX05MTMy/AwMAAEC+sdlsOnDggNzd3VWsWDF5eXnd8EgDbl+GYSg5OVknTpzQgQMHVK5cuWx/dDc7Tk+ktm/frsjISF2+fFl+fn6aN2+eKlWqpC1btsjLy0uBgYEO9UNCQhQXFydJiouLc0ii0tenr8vK6NGjNWLEiLw9kLzg7i498sj/fw4AAIBsJScny2az
KTw8XAUKFHB2OLgF+Pj4yNPTU4cOHVJycrK8vb1ztR2nJ1Lly5fXli1blJCQoLlz56p79+5asWJFvu5z6NChGjBggH05/Ue3nM7LSxo82NlRAAAA3HJyO6qAO1Ne9BenJ1JeXl4qW7asJKlmzZpav369JkyYoEcffVTJyck6e/asw6jU8ePHFRoaKkkKDQ3VunXrHLaXPqtfep3MWK1WWa3WPD4SAAAAAHcKl0vdbTabkpKSVLNmTXl6emrp0qX2dXv27NHhw4cVGRkpSYqMjNT27dsVHx9vr7N48WL5+/urUqVKNz32G2YY0pkzVx6uNQcIAAAAgKs4NZEaOnSoVq5cqYMHD2r79u0aOnSofv/9d3Xu3FkBAQHq1auXBgwYoOXLl2vjxo3q2bOnIiMjdd9990mSmjdvrkqVKqlr167aunWrFi1apFdffVUxMTG35ojT5ctSs2ZXHpcvOzsaAAAAONnBgwdlsVi0ZcsWZ4fiUkqWLKn333/fqTE4NZGKj49Xt27dVL58eTVt2lTr16/XokWL1KxZM0nS+PHj1aZNG3Xo0EENGzZUaGiovv/+e/vr3d3dNX/+fLm7uysyMlJdunRRt27dNHLkSGcdEgAAAJAjPXr0kMVisT8KFy6sFi1aaNu2bfY64eHhio2NVZUqVW5oXyVLlpTFYtE333yTYV3lypVlsVg0Y8aMG9pHblgsFv3www+mX7d+/Xo9/fTTeR+QCU69R+rTTz/Ndr23t7cmTZqkSZMmZVknIiJCv/zyS16HBgAAAOS7Fi1aaPr06ZKuzDr96quvqk2bNjp8+LCkKwMH2d37b0Z4eLimT5+uxx57zF72119/KS4uTr6+vnmyj5ulaNGizg7B9e6RAgAAAPLEpUtZP5KTc173qt8fzbZuLlitVoWGhio0NFQ1atTQkCFDdOTIEZ04cUJS5pf2/fTTTypXrpy8vb31wAMPaObMmbJYLDp79my2++rcubNWrFihI0eO2Ms+++wzde7cWR4ejuMrhw8fVrt27eTn5yd/f3916tTJPqmbdGU07aGHHnJ4Tf/+/dW4cWP7cuPGjdW3b1+99NJLCgoKUmhoqIYPH25fX7JkSUnSww8/LIvFYl/ev3+/2rVrp5CQEPn5+al27dpasmSJw76uvbTPYrHok08+0cMPP6wCBQqoXLly+umnn7JtjxtFIgUAAIDb0/33Z/148UXHus2aZV33+ecd67Ztm3m9G3T+/Hl98cUXKlu2rAoXLpxpnQMHDqhjx4566KGHtHXrVj3zzDN65ZVXcrT9kJAQRUdHa+bMmZKkixcv6ttvv9WTTz7pUM9ms6ldu3Y6ffq0VqxYocWLF+vff//Vo48+avqYZs6cKV9fX61du1bjxo3TyJEjtXjxYklXLs+TpOnTpys2Nta+fP78ebVq1UpLly7V5s2b1aJFC7Vt29Y+SpeVESNGqFOnTtq2bZtatWqlzp076/Tp06ZjzimnT3+OjLb9l6BHXv1VSZ55P2HGwTGt83ybAAAAyJ358+fLz89PknThwgWFhYVp/vz5Wf7O0bRp01S+fHm9/fbbkq78Juvff/+tt956K0f7e/LJJzVw4EC98sormjt3rsqUKaMaNWo41Fm6dKm2b9+uAwcO2H9rddasWapcubLWr1+v2rVr5/j4qlWrpmHDhkmSypUrpw8//FBLly5Vs2bN7JfnBQYGOly+WL16dVWvXt2+/MYbb2jevHn66aef1KdPnyz31aNHDz3++OOSpFGjRumDDz7QunXr1KJFixzHawaJFAAAAG5Pf/yR9Tp3d8fl/xslydS1Sc3PP+c+pms88MADmjJliiTpzJkzmjx5slq2bKl169YpIiIiQ/09e/ZkSGTq1KmT4/21bt1azzzzjFauXKnPPvssw2iUJO3atUvh4eH2JEqSKlWqpMDAQO3atct0InW1sLAwh58uysz58+c1fPhwLViwQLGxsUpNTdWlS5euOyJ19b58fX3l7+9/3X3d
CBIpV+LuLrVpo2VL/pGNX+cGAAC4MT4+zq97Hb6+vipbtqx9+ZNPPlFAQIA+/vhjvfnmm3m2n3QeHh7q2rWrhg0bprVr12revHm52o6bm5uMa373NCUlJUM9T09Ph2WLxSKbzZbttgcNGqTFixfrnXfeUdmyZeXj46OOHTsq+dr72vJgXzeCb+uuxMtLGj5c7zforBR3z+vXBwAAwG3FYrHIzc1Nl7KYvKJ8+fLasGGDQ1n6vUU59eSTT2rFihVq166dChUqlGF9xYoVdeTIEYdJKXbu3KmzZ8+qUqVKkq7MmhcbG+vwutz81pWnp6fS0tIcylavXq0ePXro4YcfVtWqVRUaGqqDBw+a3nZ+I5ECAAAAnCQpKUlxcXGKi4vTrl279Pzzz+v8+fNq27ZtpvWfeeYZ7d69W4MHD9Y///yj2bNn23//yWKx5GifFStW1MmTJ+3Trl8rKipKVatWVefOnbVp0yatW7dO3bp1U6NGjVSrVi1JUpMmTbRhwwbNmjVLe/fu1bBhw/T333+bPv6SJUtq6dKliouL05kzZyRduZfq+++/15YtW7R161Y98cQT+TqylFskUq7EMKRLl2RNSbryHAAAALe1hQsXKiwsTGFhYapbt67Wr1+vOXPmOEwjfrVSpUpp7ty5+v7771WtWjVNmTLFPmuf1ZrzicoKFy4snywuUbRYLPrxxx9VqFAhNWzYUFFRUSpdurS+/fZbe53o6Gi99tpreumll1S7dm2dO3dO3bp1y/mB/593331XixcvVnh4uO655x5J0nvvvadChQqpXr16atu2raKjo3Xvvfea3nZ+sxjXXtx4B0pMTFRAQIASEhLk7+/vvEAuXZLuv//KrH1PjGXWPgAAgOu4fPmyDhw4oFKlSsnb29vZ4TjFW2+9palTpzpciofsZddvcpobMNkEAAAAcAuZPHmyateurcKFC2v16tV6++23s50WHPmDRAoAAAC4hezdu1dvvvmmTp8+rbvuuksDBw7U0KFDnR3WHYdECgAAALiFjB8/XuPHj3d2GHc8JpsAAAAAAJNIpAAAAHDLY/40mJEX/YVECgAAALcsT09PSdLFixedHAluJen9Jb3/5Ab3SLkSd3epaVP9uWK/bG7kuAAAANfj7u6uwMBAxcfHS5IKFCiQ4x+mxZ3HMAxdvHhR8fHxCgwMlLu7e663RSLlSry8pLFjNWbIAmdHAgAAcMsIDQ2VJHsyBVxPYGCgvd/kFokUAAAAbmkWi0VhYWEKDg5WSkqKs8OBi/P09Lyhkah0JFIAAAC4Lbi7u+fJF2QgJ7gRx5VcuiTVqqWfZvaXNSXJ2dEAAAAAyAKJFAAAAACYRCIFAAAAACaRSAEAAACASSRSAAAAAGASiRQAAAAAmEQiBQAAAAAmkUi5End3qX59bSxeUTY33hoAAADAVfFt3ZV4eUkTJmhE1DNKcfd0djQAAAAAskAiBQAAAAAmkUgBAAAAgEkkUq7k0iWpQQPN+eIlWVOSnB0NAAAAgCyQSLmay5dlTUt2dhQAAAAAskEiBQAAAAAmkUgBAAAAgEkkUgAAAABgEokUAAAAAJhEIgUAAAAAJnk4OwBcxc1Nuvde/Z16WIaFHBcAAABwVSRSrsRqlT76SC8PWeDsSAAAAABkg2EPAAAAADCJRAoAAAAATCKRciWXLklRUfrim1dkTUlydjQAAAAAskAi5WrOnpV/0gVnRwEAAAAgGyRSAAAAAGASiRQAAAAAmEQiBQAAAAAmkUgBAAAAgEkkUgAAAABgkoezA8BV3NykSpW09/JRGRZyXAAAAMBVkUi5EqtVmjVLA4cscHYkAAAAALLBsAcAAAAAmEQiBQAAAAAmOTWRGj16tGrXrq2CBQsqODhYDz30kPbs2eNQp3HjxrJYLA6P//3vfw51Dh8+rNatW6tAgQIKDg7Wiy++qNTU1Jt5KHnj8mWpbVt9MneErKnJzo4GAAAAQBaceo/UihUrFBMTo9q1ays1NVUv
v/yymjdvrp07d8rX19der3fv3ho5cqR9uUCBAvbnaWlpat26tUJDQ/Xnn38qNjZW3bp1k6enp0aNGnVTj+eGGYYUG6vgCwlXngMAAABwSU5NpBYuXOiwPGPGDAUHB2vjxo1q2LChvbxAgQIKDQ3NdBu//fabdu7cqSVLligkJEQ1atTQG2+8ocGDB2v48OHy8vLK12MAAAAAcOdxqXukEhISJElBQUEO5V9++aWKFCmiKlWqaOjQobp48aJ93Zo1a1S1alWFhITYy6Kjo5WYmKgdO3Zkup+kpCQlJiY6PAAAAAAgp1xm+nObzab+/furfv36qlKlir38iSeeUEREhIoVK6Zt27Zp8ODB2rNnj77//ntJUlxcnEMSJcm+HBcXl+m+Ro8erREjRuTTkQAAAAC43blMIhUTE6O///5bq1atcih/+umn7c+rVq2qsLAwNW3aVPv371eZMmVyta+hQ4dqwIAB9uXExESFh4fnLnAAAAAAdxyXuLSvT58+mj9/vpYvX64SJUpkW7du3bqSpH379kmSQkNDdfz4cYc66ctZ3VdltVrl7+/v8AAAAACAnHJqImUYhvr06aN58+Zp2bJlKlWq1HVfs2XLFklSWFiYJCkyMlLbt29XfHy8vc7ixYvl7++vSpUq5Uvc+cZikUqX1uGA0CvPAQAAALgkp17aFxMTo6+++ko//vijChYsaL+nKSAgQD4+Ptq/f7+++uortWrVSoULF9a2bdv0wgsvqGHDhqpWrZokqXnz5qpUqZK6du2qcePGKS4uTq+++qpiYmJktVqdeXjmeXtLs2erz5AFzo4EAAAAQDacOiI1ZcoUJSQkqHHjxgoLC7M/vv32W0mSl5eXlixZoubNm6tChQoaOHCgOnTooJ9//tm+DXd3d82fP1/u7u6KjIxUly5d1K1bN4ffnQIAAACAvOTUESnjOj86Gx4erhUrVlx3OxEREfrll1/yKiwAAAAAyJZLTDaB/3P5stSpkz78YYysqcnOjgYAAABAFlxm+nNIMgzp3391V0LClecAAAAAXBIjUgAAAABgEokUAAAAAJhEIgUAAAAAJpFIAQAAAIBJJFIAAAAAYBKz9rkSi0UKC1P8WbcrzwEAAAC4JBIpV+LtLf38s54assDZkQAAAADIBpf2AQAAAIBJJFIAAAAAYBKJlCtJSpK6ddO789+VV2qKs6MBAAAAkAXukXIlNpu0c6fKnUqQxbA5OxoAAAAAWWBECgAAAABMIpECAAAAAJNIpAAAAADAJBIpAAAAADCJRAoAAAAATCKRcjWBgUq0+jo7CgAAAADZYPpzV+LjIy1Zoi5DFjg7EgAAAADZYEQKAAAAAEwikQIAAAAAk0ikXElSkvT00xq1cKK8UlOcHQ0AAACALHCPlCux2aRNm1TleIIshs3Z0QAAAADIAiNSAAAAAGASiRQAAAAAmEQiBQAAAAAmkUgBAAAAgEkkUgAAAABgEomUq/H2VpK7l7OjAAAAAJANpj93JT4+0qpVemTIAmdHAgAAACAbjEgBAAAAgEkkUgAAAABgEomUK0lOlvr107Al0+SZluLsaAAAAABkgXukXElamrR6tWr+lyA3m01yd3ZAAAAAADLDiBQAAAAAmEQiBQAAAAAmkUgBAAAAgEkkUgAAAABgEokUAAAAAJhEIgUAAAAAJjH9uSvx8ZE2bNCDQxY4OxIAAAAA2WBECgAAAABMIpECAAAAAJNIpFxJcrI0eLCG/D5dnmkpzo4GAAAAQBZIpFxJWpq0dKnqHdoqN5vN2dEAAAAAyAKJFAAAAACYRCIFAAAAACaRSAEAAACASSRSAAAAAGASiRQAAAAAmEQiBQAAAAAmOTWRGj16tGrXrq2CBQsqODhYDz30kPbs2eNQ5/Lly4qJiVHhwoXl5+enDh066Pjx4w51Dh8+rNatW6tAgQIKDg7Wiy++qNTU1Jt5KHnD21v64w898sRYJXl4OTsaAAAAAFlwaiK1YsUKxcTE6K+/
/tLixYuVkpKi5s2b68KFC/Y6L7zwgn7++WfNmTNHK1as0LFjx9S+fXv7+rS0NLVu3VrJycn6888/NXPmTM2YMUOvv/66Mw7pxlgsko+PkjytV54DAAAAcEkWwzAMZweR7sSJEwoODtaKFSvUsGFDJSQkqGjRovrqq6/UsWNHSdLu3btVsWJFrVmzRvfdd59+/fVXtWnTRseOHVNISIgkaerUqRo8eLBOnDghL6+MIztJSUlKSkqyLycmJio8PFwJCQny9/e/OQebjZJDFuTbtg+OaZ1v2wYAAABudYmJiQoICLhubuBS90glJCRIkoKCgiRJGzduVEpKiqKioux1KlSooLvuuktr1qyRJK1Zs0ZVq1a1J1GSFB0drcTERO3YsSPT/YwePVoBAQH2R3h4eH4dkjnJydLw4eq/6kt5pqU4OxoAAAAAWXCZRMpms6l///6qX7++qlSpIkmKi4uTl5eXAgMDHeqGhIQoLi7OXufqJCp9ffq6zAwdOlQJCQn2x5EjR/L4aHIpLU2aP19N9q+Xm83m7GgAAAAAZMHD2QGki4mJ0d9//61Vq1bl+76sVqusVmu+7wcAAADA7cklRqT69Omj+fPna/ny5SpRooS9PDQ0VMnJyTp79qxD/ePHjys0NNRe59pZ/NKX0+sAAAAAQF5yaiJlGIb69OmjefPmadmyZSpVqpTD+po1a8rT01NLly61l+3Zs0eHDx9WZGSkJCkyMlLbt29XfHy8vc7ixYvl7++vSpUq3ZwDAQAAAHBHceqlfTExMfrqq6/0448/qmDBgvZ7mgICAuTj46OAgAD16tVLAwYMUFBQkPz9/fX8888rMjJS9913nySpefPmqlSpkrp27apx48YpLi5Or776qmJiYrh8DwAAAEC+cGoiNWXKFElS48aNHcqnT5+uHj16SJLGjx8vNzc3dejQQUlJSYqOjtbkyZPtdd3d3TV//nw9++yzioyMlK+vr7p3766RI0ferMMAAAAAcIdxaiKVk5+w8vb21qRJkzRp0qQs60REROiXX37Jy9AAAAAAIEsuM2sfJHl7S4sXq8uI35TkkfGHhAEAAAC4BhIpV2KxSIUKKdHbz9mRAAAAAMiGS0x/DgAAAAC3EhIpV5KcLI0dq//9NVeeaSnOjgYAAABAFkikXElamjRnjlrtWSU3m83Z0QAAAADIAokUAAAAAJhEIgUAAAAAJpFIAQAAAIBJJFIAAAAAYBKJFAAAAACYRCIFAAAAACZ5ODsAXMVqlX76SU+9tUTJHp7OjgYAAABAFkikXImbm1SsmOILFnZ2JAAAAACywaV9AAAAAGASiZQrSUmRJkxQzw0/yiMt1dnRAAAAAMgCiZQrSU2VPv9cD+9YLndbmrOjAQAAAJAFEikAAAAAMIlECgAAAABMMp1I/fvvv/kRBwAAAADcMkwnUmXLltUDDzygL774QpcvX86PmAAAAADApZlOpDZt2qRq1appwIABCg0N1TPPPKN169blR2wAAAAA4JJMJ1I1atTQhAkTdOzYMX322WeKjY1VgwYNVKVKFb333ns6ceJEfsQJAAAAAC4j15NNeHh4qH379pozZ47Gjh2rffv2adCgQQoPD1e3bt0UGxubl3HeGaxWafZsxbQbomQPT2dHAwAAACALuU6kNmzYoOeee05hYWF67733NGjQIO3fv1+LFy/WsWPH1K5du7yM887g5iaVLq0jgaEyLEyoCAAAALgqD7MveO+99zR9+nTt2bNHrVq10qxZs9SqVSu5uV354l+qVCnNmDFDJUuWzOtYAQAAAMAlmE6kpkyZoieffFI9evRQWFhYpnWCg4P16aef3nBwd5yUFGn6dD2+ZbvmVI1SqrvptwcAAADATWD6m/revXuvW8fLy0vdu3fPVUB3tNRU6aOP9Ph/Cfq+8gMkUgAAAICLMn0jzvTp0zVnzpwM5XPmzNHMmTPzJCgAAAAAcGWmE6nRo0erSJEiGcqDg4M1atSoPAkKAAAA
AFyZ6UTq8OHDKlWqVIbyiIgIHT58OE+CAgAAAABXZjqRCg4O1rZt2zKUb926VYULF86ToAAAAADAlZlOpB5//HH17dtXy5cvV1pamtLS0rRs2TL169dPjz32WH7ECAAAAAAuxfS0cG+88YYOHjyopk2bysPjysttNpu6devGPVIAAAAA7ggWwzCM3Lzwn3/+0datW+Xj46OqVasqIiIir2O7aRITExUQEKCEhAT5+/s7LxCbTdq9W1HvrdD+wiVkWEwPGF7XwTGt83ybAAAAwO0ip7lBrn+o6O6779bdd9+d25cjM25uUqVK2lfkgLMjAQAAAJAN04lUWlqaZsyYoaVLlyo+Pl42m81h/bJly/IsOAAAAABwRaYTqX79+mnGjBlq3bq1qlSpIovFkh9x3ZlSUqSvv1b7v7fop4qNlOqe6wFDAAAAAPnI9Df1b775RrNnz1arVq3yI547W2qq9MEH6vFfghaUb0AiBQAAALgo07MZeHl5qWzZsvkRCwAAAADcEkwnUgMHDtSECROUy8n+AAAAAOCWZ/rasVWrVmn58uX69ddfVblyZXl6ejqs//777/MsOAAAAABwRaYTqcDAQD388MP5EQsAAAAA3BJMJ1LTp0/PjzgAAAAA4JZh+h4pSUpNTdWSJUs0bdo0nTt3TpJ07NgxnT9/Pk+DAwAAAABXZHpE6tChQ2rRooUOHz6spKQkNWvWTAULFtTYsWOVlJSkqVOn5kecdwarVZo2TS9PXKVkD8/r1wcAAADgFKZHpPr166datWrpzJkz8vHxsZc//PDDWrp0aZ4Gd8dxc5Nq1tTfoWVlWHI1WAgAAADgJjA9IvXHH3/ozz//lJeXl0N5yZIl9d9//+VZYAAAAADgqkwPe9hsNqWlpWUoP3r0qAoWLJgnQd2xUlOl2bPVetcfcrdlbGMAAAAArsF0ItW8eXO9//779mWLxaLz589r2LBhatWqVV7GdudJSZHGjdMz676TR1qqs6MBAAAAkAXTl/a9++67io6OVqVKlXT58mU98cQT2rt3r4oUKaKvv/46P2IEAAAAAJdiOpEqUaKEtm7dqm+++Ubbtm3T+fPn1atXL3Xu3Nlh8gkAAAAAuF3lamo4Dw8PdenSRePGjdPkyZP11FNP5SqJWrlypdq2batixYrJYrHohx9+cFjfo0cPWSwWh0eLFi0c6pw+fVqdO3eWv7+/AgMD1atXL37PCgAAAEC+Mj0iNWvWrGzXd+vWLcfbunDhgqpXr64nn3xS7du3z7ROixYtNH36dPuy1Wp1WN+5c2fFxsZq8eLFSklJUc+ePfX000/rq6++ynEcAAAAAGCG6USqX79+DsspKSm6ePGivLy8VKBAAVOJVMuWLdWyZcts61itVoWGhma6bteuXVq4cKHWr1+vWrVqSZImTpyoVq1a6Z133lGxYsVyHAsAAAAA5JTpS/vOnDnj8Dh//rz27NmjBg0a5MtkE7///ruCg4NVvnx5Pfvsszp16pR93Zo1axQYGGhPoiQpKipKbm5uWrt2bZbbTEpKUmJiosMDAAAAAHIqV/dIXatcuXIaM2ZMhtGqG9WiRQvNmjVLS5cu1dixY7VixQq1bNnS/jtWcXFxCg4OdniNh4eHgoKCFBcXl+V2R48erYCAAPsjPDw8T+PONS8v6f33NbJpb6W4mx4sBAAAAHCT5Nm3dQ8PDx07diyvNidJeuyxx+zPq1atqmrVqqlMmTL6/fff1bRp01xvd+jQoRowYIB9OTEx0TWSKXd3qUEDbZif4OxIAAAAAGTDdCL1008/OSwbhqHY2Fh9+OGHql+/fp4FlpnSpUurSJEi2rdvn5o2barQ0FDFx8c71ElNTdXp06ezvK9KunLf1bWTVgAAAABATplOpB566CGHZYvFoqJFi6pJkyZ699138yquTB09elSnTp1SWFiYJCkyMlJnz57Vxo0bVbNmTUnSsmXLZLPZVLdu3XyNJV+kpkq//qqm+9br99K1lObm
7uyIAAAAAGTCdCJls9nybOfnz5/Xvn377MsHDhzQli1bFBQUpKCgII0YMUIdOnRQaGio9u/fr5deeklly5ZVdHS0JKlixYpq0aKFevfuralTpyolJUV9+vTRY489dmvO2JeSIo0YoX7/JWhVRA0SKQAAAMBF5clkE7m1YcMG3XPPPbrnnnskSQMGDNA999yj119/Xe7u7tq2bZsefPBB3X333erVq5dq1qypP/74w+GyvC+//FIVKlRQ06ZN1apVKzVo0EAfffSRsw4JAAAAwB3A9IjU1ZM0XM97772X7frGjRvLMIws1y9atOi6+wgKCuLHdwEAAADcVKYTqc2bN2vz5s1KSUlR+fLlJUn//POP3N3dde+999rrWSyWvIsSAAAAAFyI6USqbdu2KliwoGbOnKlChQpJuvIjvT179tT999+vgQMH5nmQAAAAAOBKTN8j9e6772r06NH2JEqSChUqpDfffDPfZ+0DAAAAAFdgOpFKTEzUiRMnMpSfOHFC586dy5OgAAAAAMCVmU6kHn74YfXs2VPff/+9jh49qqNHj+q7775Tr1691L59+/yI8c7h5SWNGaOxjXooxd30VZcAAAAAbhLT39anTp2qQYMG6YknnlBKSsqVjXh4qFevXnr77bfzPMA7iru7FBWl1UuSnB0JAAAAgGyYTqQKFCigyZMn6+2339b+/fslSWXKlJGvr2+eBwcAAAAArijXP8gbGxur2NhYlStXTr6+vtn+HhRyKC1NWrJE9Q9ukZstzdnRAAAAAMiC6UTq1KlTatq0qe6++261atVKsbGxkqRevXox9fmNSk6WhgzR4BUz5JmW6uxoAAAAAGTBdCL1wgsvyNPTU4cPH1aBAgXs5Y8++qgWLlyYp8EBAAAAgCsyfY/Ub7/9pkWLFqlEiRIO5eXKldOhQ4fyLDAAAAAAcFWmR6QuXLjgMBKV7vTp07JarXkSFAAAAAC4MtOJ1P33369Zs2bZly0Wi2w2m8aNG6cHHnggT4MDAAAAAFdk+tK+cePGqWnTptqwYYOSk5P10ksvaceOHTp9+rRWr16dHzECAAAAgEsxPSJVpUoV/fPPP2rQoIHatWunCxcuqH379tq8ebPKlCmTHzECAAAAgEsxNSKVkpKiFi1aaOrUqXrllVfyK6Y7l6enNGyYJsxcr1R304OFAAAAAG4SU9/WPT09tW3btvyKBR4eUtu2Wro617+TDAAAAOAmMP2NvUuXLvr000/zIxYAAAAAuCWYvn4sNTVVn332mZYsWaKaNWvK19fXYf17772XZ8HdcdLSpDVrVOvoDm0qVkE2N3dnRwQAAAAgEzlKpLZt26YqVarIzc1Nf//9t+69915J0j///ONQz2Kx5H2Ed5LkZKl/f73+X4IeeWKskkikAAAAAJeUo0TqnnvuUWxsrIKDg3Xo0CGtX79ehQsXzu/YAAAAAMAl5egeqcDAQB04cECSdPDgQdlstnwNCgAAAABcWY5GpDp06KBGjRopLCxMFotFtWrVkrt75ped/fvvv3kaIAAAAAC4mhwlUh999JHat2+vffv2qW/fvurdu7cKFiyY37EBAAAAgEvK8ax9LVq0kCRt3LhR/fr1I5ECAAAAcMcyPf359OnT8yMOAAAAALhlmP5BXuQjT0/ppZc0rU4HpbqbznEBAAAA3CR8W3clHh5Sp05asMn3+nUBAAAAOA0jUgAAAABgEomUK7HZpI0bVSVunywGv9UFAAAAuCoSKVeSlCQ984xGLfpQXqkpzo4GAAAAQBZIpAAAAADAJBIpAAAAADCJRAoAAAAATCKRAgAAAACTSKQAAAAAwCQSKQAAAAAwiUTKlXh4SH37akbNtkpzc3d2NAAAAACy4OHsAHAVT0+pWzd9v7OwsyMBAAAAkA1GpAAAAADAJBIpV2KzSTt3quzJw7IYNmdHAwAAACALJFKuJClJ6tZN7y14T16pKc6OBgAAAEAWSKQAAAAAwCQSKQAAAAAwiUQKAAAA
AEwikQIAAAAAk0ikAAAAAMAkEikAAAAAMIlEypV4eEhPP62vq7dQmpu7s6MBAAAAkAUPZweAq3h6Xkmk/l3g7EgAAAAAZIMRKQAAAAAwiUTKldhs0r//KvxsnCyGzdnRAAAAAMiCUxOplStXqm3btipWrJgsFot++OEHh/WGYej1119XWFiYfHx8FBUVpb179zrUOX36tDp37ix/f38FBgaqV69eOn/+/E08ijyUlCR16qRJP46RV2qKs6MBAAAAkAWnJlIXLlxQ9erVNWnSpEzXjxs3Th988IGmTp2qtWvXytfXV9HR0bp8+bK9TufOnbVjxw4tXrxY8+fP18qVK/X000/frEMAAAAAcAdy6mQTLVu2VMuWLTNdZxiG3n//fb366qtq166dJGnWrFkKCQnRDz/8oMcee0y7du3SwoULtX79etWqVUuSNHHiRLVq1UrvvPOOihUrdtOOBQAAAMCdw2XvkTpw4IDi4uIUFRVlLwsICFDdunW1Zs0aSdKaNWsUGBhoT6IkKSoqSm5ublq7dm2W205KSlJiYqLDAwAAAAByymUTqbi4OElSSEiIQ3lISIh9XVxcnIKDgx3We3h4KCgoyF4nM6NHj1ZAQID9ER4ensfRAwAAALiduWwilZ+GDh2qhIQE++PIkSPODgkAAADALcRlE6nQ0FBJ0vHjxx3Kjx8/bl8XGhqq+Ph4h/Wpqak6ffq0vU5mrFar/P39HR4AAAAAkFMum0iVKlVKoaGhWrp0qb0sMTFRa9euVWRkpCQpMjJSZ8+e1caNG+11li1bJpvNprp16970mG+Yh4fUtavmVX5AaW7uzo4GAAAAQBacOmvf+fPntW/fPvvygQMHtGXLFgUFBemuu+5S//799eabb6pcuXIqVaqUXnvtNRUrVkwPPfSQJKlixYpq0aKFevfuralTpyolJUV9+vTRY489dmvO2OfpKfXrp+mxC5wdCQAAAIBsODWR2rBhgx544AH78oABAyRJ3bt314wZM/TSSy/pwoULevrpp3X27Fk1aNBACxculLe3t/01X375pfr06aOmTZvKzc1NHTp00AcffHDTjwUAAADAncNiGIbh7CCcLTExUQEBAUpISHDu/VI2mxQXpzpvLdEJv0IyLHl/5eXBMa3zfJsAAADA7SKnuYHL3iN1R0pKkh58UJ98/4a8UlOcHQ0AAACALJBIAQAAAIBJJFIAAAAAYBKJFAAAAACYRCIFAAAAACaRSAEAAACASSRSAAAAAGASiZQrcXeXHnlEv5RvIJsbbw0AAADgqjycHQCu4uUlDR6sqWcWODsSAAAAANlg2AMAAAAATCKRciWGIZ05I//L5688BwAAAOCSSKRcyeXLUrNm+uLbV2VNTXZ2NAAAAACyQCIFAAAAACaRSAEAAACASSRSAAAAAGASiRQAAAAAmEQiBQAAAAAmkUgBAAAAgEkkUq7E3V1q00bLytSWzY23BgAAAHBVHs4OAFfx8pKGD9f7lxc4OxIAAAAA2WDYAwAAAABMIpFyJYYhXboka0rSlecAAAAAXBKJlCu5fFm6/37N+WqwrKnJzo4GAAAAQBZIpAAAAADAJBIpAAAAADCJRAoAAAAATCKRAgAAAACTSKQAAAAAwCQSKQAAAAAwiUTKlbi7S02b6s+I6rK58dYAAAAAropv667Ey0saO1ZjGvdUiruns6MBAAAAkAUSKQAAAAAwiUQKAAAAAEwikXIlly5JtWrpp5n9ZU1JcnY0AAAAALJAIgUAAAAAJpFIAQAAAIBJJFIAAAAAYBKJFAAAAACYRCIFAAAAACaRSAEAAACASSRSrsTdXapfXxuLV5TNjbcGAAAAcFV8W3clXl7ShAkaEfWMUtw9nR0NAAAAgCyQSAEAAACASSRSAAAAAGASiZQruXRJatBAc754SdaUJGdHAwAAACALJFKu5vJlWdOSnR0FAAAAgGyQSAEAAACASSRSAAAAAGASiRQAAAAAmEQiBQAAAAAmkUgB
AAAAgEkunUgNHz5cFovF4VGhQgX7+suXLysmJkaFCxeWn5+fOnTooOPHjzsx4hvk5ibde6/+Dikjw+LSbw0AAABwR3P5b+uVK1dWbGys/bFq1Sr7uhdeeEE///yz5syZoxUrVujYsWNq3769E6O9QVar9NFHernF80r28HR2NAAAAACy4OHsAK7Hw8NDoaGhGcoTEhL06aef6quvvlKTJk0kSdOnT1fFihX1119/6b777rvZoQIAAAC4Q7j8iNTevXtVrFgxlS5dWp07d9bhw4clSRs3blRKSoqioqLsdStUqKC77rpLa9asyXabSUlJSkxMdHgAAAAAQE65dCJVt25dzZgxQwsXLtSUKVN04MAB3X///Tp37pzi4uLk5eWlwMBAh9eEhIQoLi4u2+2OHj1aAQEB9kd4eHg+HoUJly5JUVH64ptXZE1JcnY0AAAAALLg0pf2tWzZ0v68WrVqqlu3riIiIjR79mz5+PjkertDhw7VgAED7MuJiYmuk0ydPSv/pAvOjgIAAABANlx6ROpagYGBuvvuu7Vv3z6FhoYqOTlZZ8+edahz/PjxTO+puprVapW/v7/DAwAAAABy6pZKpM6fP6/9+/crLCxMNWvWlKenp5YuXWpfv2fPHh0+fFiRkZFOjBIAAADA7c6lL+0bNGiQ2rZtq4iICB07dkzDhg2Tu7u7Hn/8cQUEBKhXr14aMGCAgoKC5O/vr+eff16RkZHM2AcAAAAgX7l0InX06FE9/vjjOnXqlIoWLaoGDRror7/+UtGiRSVJ48ePl5ubmzp06KCkpCRFR0dr8uTJTo4aAAAAwO3OpROpb775Jtv13t7emjRpkiZNmnSTIgIAAAAAF0+k7jhublKlStp7+agMyy11+xoAAABwRyGRciVWqzRrlgYOWeDsSAAAAABkg2EPAAAAADCJRAoAAAAATCKRciWXL0tt2+qTuSNkTU12djQAAAAAssA9Uq7EMKTYWAVfSLjyHAAAAIBLYkQKAAAAAEwikQIAAAAAk0ikAAAAAMAkEikAAAAAMIlECgAAAABMYtY+V2KxSKVL6/D52CvPAQAAALgkEilX4u0tzZ6tPkMWODsSAAAAANng0j4AAAAAMIlECgAAAABMIpFyJZcvS5066cMfxsiamuzsaAAAAABkgXukXIlhSP/+q7sSEq48BwAAAOCSGJECAAAAAJNIpAAAAADAJBIpAAAAADCJRAoAAAAATCKRAgAAAACTmLXPlVgsUliY4s+6XXkOAAAAwCWRSLkSb2/p55/11JAFzo4EAAAAQDa4tA8AAAAATCKRAgAAAACTuLTPlSQlSb17691NRzW0RV8le3g6OyJAJfP5UtODY1rn6/YBAADyA4mUK7HZpJ07Ve5UgiyGzdnRAAAAAMgCl/YBAAAAgEkkUgAAAABgEokUAAAAAJhEIgUAAAAAJpFIAQAAAIBJJFKuJjBQiVZfZ0cBAAAAIBtMf+5KfHykJUvUJZ9/twcAAADAjWFECgAAAABMIpECAAAAAJNIpFxJUpL09NMatXCivFJTnB0NAAAAgCxwj5QrsdmkTZtU5XiCLIbN2dEAAAAAyAIjUgAAAABgEokUAAAAAJhEIgUAAAAAJpFIAQAAAIBJJFIAAAAAYBKJlKvx9laSu5ezowAAAACQDaY/dyU+PtKqVXpkyAJnRwIAAAAgG4xIAQAAAIBJjEgBAPB/SubzFQEHx7TO1+0DAG4eRqRcSXKy1K+fhi2ZJs+0FGdHAwAAACALjEi5krQ0afVq1fwvQW42m+Tu7IAAAIAz5ecoKSOkwI1hRAoAAAAATCKRAgAAAACTbptEatKkSSpZsqS8vb1Vt25drVu3ztkhAQAAALhN3RaJ1LfffqsBAwZo2LBh2rRpk6pXr67o6GjFx8c7OzQAAAAAt6HbYrKJ9957T71791bPnj0lSVOnTtWCBQv02WefaciQIU6ODgAAALj1MfmJo1s+kUpOTtbGjRs1dOhQe5mbm5ui
oqK0Zs2aTF+TlJSkpKQk+3JCQoIkKTExMX+DvZ5Ll6S0NJ232ZSWdFE2W1qe78Lpx4hbji3pYr5unz4JV0J/h6vJzz5Jf4RZd0p/TI/FMIxs693yidTJkyeVlpamkJAQh/KQkBDt3r0709eMHj1aI0aMyFAeHh6eLzHmyqSu+bLZgPfzZbNArtEncSehv8OV0B/hSlyxP547d04BAQFZrr/lE6ncGDp0qAYMGGBfttlsOn36tAoXLiyLxeLEyK5kwOHh4Tpy5Ij8/f2dGsvtiPbNX7Rv/qJ98xftm79o3/xF++Yv2jd/uVr7Goahc+fOqVixYtnWu+UTqSJFisjd3V3Hjx93KD9+/LhCQ0MzfY3VapXVanUoCwwMzK8Qc8Xf398lOtLtivbNX7Rv/qJ98xftm79o3/xF++Yv2jd/uVL7ZjcSle6Wn7XPy8tLNWvW1NKlS+1lNptNS5cuVWRkpBMjAwAAAHC7uuVHpCRpwIAB6t69u2rVqqU6dero/fff14ULF+yz+AEAAABAXrotEqlHH31UJ06c0Ouvv664uDjVqFFDCxcuzDABxa3AarVq2LBhGS49RN6gffMX7Zu/aN/8RfvmL9o3f9G++Yv2zV+3avtajOvN6wcAAAAAcHDL3yMFAAAAADcbiRQAAAAAmEQiBQAAAAAmkUgBAAAAgEkkUk4wadIklSxZUt7e3qpbt67WrVuXbf05c+aoQoUK8vb2VtWqVfXLL7/cpEhvTWbad8aMGbJYLA4Pb2/vmxjtrWXlypVq27atihUrJovFoh9++OG6r/n999917733ymq1qmzZspoxY0a+x3mrMtu+v//+e4b+a7FYFBcXd3MCvoWMHj1atWvXVsGCBRUcHKyHHnpIe/bsue7rOP/mTG7al/Nvzk2ZMkXVqlWz/1hpZGSkfv3112xfQ9/NObPtS9+9MWPGjJHFYlH//v2zrXcr9GESqZvs22+/1YABAzRs2DBt2rRJ1atXV3R0tOLj4zOt/+eff+rxxx9Xr169tHnzZj300EN66KGH9Pfff9/kyG8NZttXuvIr2rGxsfbHoUOHbmLEt5YLFy6oevXqmjRpUo7qHzhwQK1bt9YDDzygLVu2qH///nrqqae0aNGifI701mS2fdPt2bPHoQ8HBwfnU4S3rhUrVigmJkZ//fWXFi9erJSUFDVv3lwXLlzI8jWcf3MuN+0rcf7NqRIlSmjMmDHauHGjNmzYoCZNmqhdu3basWNHpvXpu+aYbV+Jvptb69ev17Rp01StWrVs690yfdjATVWnTh0jJibGvpyWlmYUK1bMGD16dKb1O3XqZLRu3dqhrG7dusYzzzyTr3Heqsy27/Tp042AgICbFN3tRZIxb968bOu89NJLRuXKlR3KHn30USM6OjofI7s95KR9ly9fbkgyzpw5c1Niup3Ex8cbkowVK1ZkWYfzb+7lpH05/96YQoUKGZ988kmm6+i7Ny679qXv5s65c+eMcuXKGYsXLzYaNWpk9OvXL8u6t0ofZkTqJkpOTtbGjRsVFRVlL3Nzc1NUVJTWrFmT6WvWrFnjUF+SoqOjs6x/J8tN+0rS+fPnFRERofDw8Ov+DxTMof/eHDVq1FBYWJiaNWum1atXOzucW0JCQoIkKSgoKMs69N/cy0n7Spx/cyMtLU3ffPONLly4oMjIyEzr0HdzLyftK9F3cyMmJkatW7fO0Dczc6v0YRKpm+jkyZNKS0tTSEiIQ3lISEiW9zTExcWZqn8ny037li9fXp999pl+/PFHffHFF7LZbKpXr56OHj16M0K+7WXVfxMTE3Xp0iUnRXX7CAsL09SpU/Xdd9/pu+++U3h4uBo3bqxNmzY5OzSXZrPZ1L9/f9WvX19VqlTJsh7n39zJafty/jVn+/bt8vPzk9Vq1f/+9z/NmzdPlSpVyrQufdc8M+1L3zXvm2++0aZNmzR69Ogc1b9V+rCHswMAnCkyMtLhf5zq1aunihUratq0aXrjjTecGBlw
feXLl1f58uXty/Xq1dP+/fs1fvx4ff75506MzLXFxMTo77//1qpVq5wdym0pp+3L+dec8uXLa8uWLUpISNDcuXPVvXt3rVixIssv+zDHTPvSd805cuSI+vXrp8WLF992k3KQSN1ERYoUkbu7u44fP+5Qfvz4cYWGhmb6mtDQUFP172S5ad9reXp66p577tG+ffvyI8Q7Tlb919/fXz4+Pk6K6vZWp04dEoRs9OnTR/Pnz9fKlStVokSJbOty/jXPTPtei/Nv9ry8vFS2bFlJUs2aNbV+/XpNmDBB06ZNy1CXvmuemfa9Fn03exs3blR8fLzuvfdee1laWppWrlypDz/8UElJSXJ3d3d4za3Sh7m07yby8vJSzZo1tXTpUnuZzWbT0qVLs7wONzIy0qG+JC1evDjb63bvVLlp32ulpaVp+/btCgsLy68w7yj035tvy5Yt9N9MGIahPn36aN68eVq2bJlKlSp13dfQf3MuN+17Lc6/5thsNiUlJWW6jr5747Jr32vRd7PXtGlTbd++XVu2bLE/atWqpc6dO2vLli0ZkijpFurDzp7t4k7zzTffGFar1ZgxY4axc+dO4+mnnzYCAwONuLg4wzAMo2vXrsaQIUPs9VevXm14eHgY77zzjrFr1y5j2LBhhqenp7F9+3ZnHYJLM9u+I0aMMBYtWmTs37/f2Lhxo/HYY48Z3t7exo4dO5x1CC7t3LlzxubNm43Nmzcbkoz33nvP2Lx5s3Ho0CHDMAxjyJAhRteuXe31//33X6NAgQLGiy++aOzatcuYNGmS4e7ubixcuNBZh+DSzLbv+PHjjR9++MHYu3evsX37dqNfv36Gm5ubsWTJEmcdgst69tlnjYCAAOP33383YmNj7Y+LFy/a63D+zb3ctC/n35wbMmSIsWLFCuPAgQPGtm3bjCFDhhgWi8X47bffDMOg794os+1L371x187ad6v2YRIpJ5g4caJx1113GV5eXkadOnWMv/76y76uUaNGRvfu3R3qz54927j77rsNLy8vo3LlysaCBQtucsS3FjPt279/f3vdkJAQo1WrVsamTZucEPWtIX267Wsf6W3avXt3o1GjRhleU6NGDcPLy8soXbq0MX369Jse963CbPuOHTvWKFOmjOHt7W0EBQUZjRs3NpYtW+ac4F1cZu0qyaE/cv7Nvdy0L+ffnHvyySeNiIgIw8vLyyhatKjRtGlT+5d8w6Dv3iiz7UvfvXHXJlK3ah+2GIZh3LzxLwAAAAC49XGPFAAAAACYRCIFAAAAACaRSAEAAACASSRSAAAAAGASiRQAAAAAmEQiBQAAAAAmkUgBAAAAgEkkUgAAAABgEokUAMDlHTx4UBaLRVu2bHF2KHa7d+/WfffdJ29vb9WoUcPZ4QAAbjISKQDAdfXo0UMWi0VjxoxxKP/hhx9ksVicFJVzDRs2TL6+vtqzZ4+WLl2aaZ3GjRurf//+NzcwAMBNQSIFAMgRb29vjR07VmfOnHF2KHkmOTk516/dv3+/GjRooIiICBUuXDjX2zEMQ6mpqbl+PQDAOUikAAA5EhUVpdDQUI0ePTrLOsOHD89wmdv777+vkiVL2pd79Oihhx56SKNGjVJISIgCAwM1cuRIpaam6sUXX1RQUJBKlCih6dOnZ9j+7t27Va9ePXl7e6tKlSpasWKFw/q///5bLVu2lJ+fn0JCQtS1a1edPHnSvr5x48bq06eP+vfvryJFiig6OjrT47DZbBo5cqRKlCghq9WqGjVqaOHChfb1FotFGzdu1MiRI2WxWDR8+PAM2+jRo4dWrFihCRMmyGKxyGKx6ODBg/r9999lsVj066+/qmbNmrJarVq1apX279+vdu3aKSQkRH5+fqpdu7aWLFnisM2SJUvqzTffVLdu3eTn56eIiAj99NNPOnHihNq1ayc/Pz9Vq1ZNGzZssL/m0KFDatu2rQoVKiRfX19VrlxZv/zyS6bHDQDIORIpAECOuLu7a9SoUZo4caKO
Hj16Q9tatmyZjh07ppUrV+q9997TsGHD1KZNGxUqVEhr167V//73Pz3zzDMZ9vPiiy9q4MCB2rx5syIjI9W2bVudOnVKknT27Fk1adJE99xzjzZs2KCFCxfq+PHj6tSpk8M2Zs6cKS8vL61evVpTp07NNL4JEybo3Xff1TvvvKNt27YpOjpaDz74oPbu3StJio2NVeXKlTVw4EDFxsZq0KBBmW4jMjJSvXv3VmxsrGJjYxUeHm5fP2TIEI0ZM0a7du1StWrVdP78ebVq1UpLly7V5s2b1aJFC7Vt21aHDx922O748eNVv359bd68Wa1bt1bXrl3VrVs3denSRZs2bVKZMmXUrVs3GYYhSYqJiVFSUpJWrlyp7du3a+zYsfLz8zP5jgEAMjAAALiO7t27G+3atTMMwzDuu+8+48knnzQMwzDmzZtnXP2nZNiwYUb16tUdXjt+/HgjIiLCYVsRERFGWlqavax8+fLG/fffb19OTU01fH19ja+//towDMM4cOCAIckYM2aMvU5KSopRokQJY+zYsYZhGMYbb7xhNG/e3GHfR44cMSQZe/bsMQzDMBo1amTcc8891z3eYsWKGW+99ZZDWe3atY3nnnvOvly9enVj2LBh2W6nUaNGRr9+/RzKli9fbkgyfvjhh+vGUblyZWPixIn25YiICKNLly725djYWEOS8dprr9nL1qxZY0gyYmNjDcMwjKpVqxrDhw+/7r4AAOYwIgUAMGXs2LGaOXOmdu3alettVK5cWW5u//9PUEhIiKpWrWpfdnd3V+HChRUfH+/wusjISPtzDw8P1apVyx7H1q1btXz5cvn5+dkfFSpUkHTlfqZ0NWvWzDa2xMREHTt2TPXr13cor1+//g0d87Vq1arlsHz+/HkNGjRIFStWVGBgoPz8/LRr164MI1LVqlWzPw8JCZEkh7ZLL0tvu759++rNN99U/fr1NWzYMG3bti3PjgEA7mQkUgAAUxo2bKjo6GgNHTo0wzo3Nzf7JWXpUlJSMtTz9PR0WLZYLJmW2Wy2HMd1/vx5tW3bVlu2bHF47N27Vw0bNrTX8/X1zfE289O1cQwaNEjz5s3TqFGj9Mcff2jLli2qWrVqhgkxrm6n9BkTMytLb7unnnpK//77r7p27art27erVq1amjhxYr4cEwDcSUikAACmjRkzRj///LPWrFnjUF60aFHFxcU5JFN5+dtPf/31l/15amqqNm7cqIoVK0qS7r33Xu3YsUMlS5ZU2bJlHR5mkid/f38VK1ZMq1evdihfvXq1KlWqZCpeLy8vpaWl5aju6tWr1aNHDz388MOqWrWqQkNDdfDgQVP7y0p4eLj+97//6fvvv9fAgQP18ccf58l2AeBORiIFADCtatWq6ty5sz744AOH8saNG+vEiRMaN26c9u/fr0mTJunXX3/Ns/1OmjRJ8+bN0+7duxUTE6MzZ87oySeflHRlUoXTp0/r8ccf1/r167V//34tWrRIPXv2zHEyk+7FF1/U2LFj9e2332rPnj0aMmSItmzZon79+pnaTsmSJbV27VodPHhQJ0+ezHaErVy5cvr++++1ZcsWbd26VU888YSpEbms9O/fX4sWLdKBAwe0adMmLV++3J58AgByj0QKAJArI0eOzPBFv2LFipo8ebImTZqk6tWra926dZnOaJdbY8aM0ZgxY1S9enWtWrVKP/30k4oUKSJJ9lGktLQ0NW/eXFWrVlX//v0VGBjocD9WTvTt21cDBgzQwIEDVbVqVS1cuFA//fSTypUrZ2o7gwYNkru7uypVqqSiRYtmuN/pau+9954KFSqkevXqqW3btoqOjta9995ran+ZSUtLU0xMjCpWrKgWLVro7rvv1uTJk294uwBwp7MY117MDgAAAADIFiNSAAAAAGASiRQAAAAAmEQiBQAAAAAmkUgBAAAAgEkkUgAAAABgEokUAAAAAJhEIgUAAAAAJpFIAQAAAIBJJFIAAAAAYBKJFAAAAACYRCIF
AAAAACb9P+/3OjRmnIO5AAAAAElFTkSuQmCC\n"},"metadata":{}}],"source":["plot_compare('trams', 'Number of trams')"]},{"cell_type":"markdown","metadata":{"id":"ceKY6psBXvBf"},"source":["The vast majority of resorts, such as Big Mountain, have no trams."]},{"cell_type":"markdown","metadata":{"id":"zg1NanTuXvBg"},"source":["### 5.8.9 Skiable terrain area"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":487},"id":"Z3EZZGb3XvBg","executionInfo":{"status":"ok","timestamp":1721138998784,"user_tz":240,"elapsed":488,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"62c09821-c3d6-4f9f-db3a-3e92ac9ffb0d"},"outputs":[{"output_type":"display_data","data":{"text/plain":["
"],"image/png":"iVBORw0KGgoAAAANSUhEUgAAA1IAAAHWCAYAAAB9mLjgAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAABoC0lEQVR4nO3dd3gUVdvH8d+mB9JoKUgIvRcFFKM0pYQiUkWKSMdHQAUElccCiAqCUiyI5ZFiQ1FEbCgdBERAilQBQVAIPUCA1D3vH3mzsKSQgYQNy/dzXXNdszNnztwze2Z3752ZMzZjjBEAAAAAIMc8XB0AAAAAANxoSKQAAAAAwCISKQAAAACwiEQKAAAAACwikQIAAAAAi0ikAAAAAMAiEikAAAAAsIhECgAAAAAsIpECAAAAAItIpHBTsdlsGjRoULZl9u/fL5vNphkzZliuf9SoUbLZbDp+/PgVy5YqVUo9e/a0vI4b1YwZM2Sz2bR//35Xh+ISdrtd1apV08svv+zqUK5J586d1alTp2uux2azadSoUY7X17N99OzZU6VKlXK8Tj/mX3vttTxft3Txc8IV4uPj1bdvX4WHh8tms2nw4MEuieNm4I6f8enH6fr1610dSqasfAcDuYFECm7hjz/+UMeOHRUVFSU/Pz/dcsstatq0qd58801Xh3bdTJ069aqSP1wfn332mQ4ePHjFRD6/e/rpp/XVV19p8+bNrg5F58+f16hRo7Rs2TJXh5JBfo3tlVde0YwZM/Too4/qo48+Uvfu3V0dUr6wevVqjRo1SnFxca4O5abyyiuvaN68ea4OA7hqJFK44a1evVp16tTR5s2b1a9fP7311lvq27evPDw8NGXKFMv1RUVF6cKFCzfcD4z8nkh1795dFy5cUFRUlKtDcYkJEyaoc+fOCg4OdnUo1+S2225TnTp19Prrr+dqvVfTPs6fP6/Ro0dbTlbef/997dq1y2KE1mQX23PPPacLFy7k6fqzsmTJEt15550aOXKkHnroIdWuXdslceQ3q1ev1ujRo3M1kdq1a5fef//9XKvPHZFI4Ubn5eoAgGv18ssvKzg4WOvWrVNISIjTvKNHj1quz2azyc/PL5eiu7GlpKTIbrfLx8cnw7xz586pYMGCOa7L09NTnp6euRneVbPb7UpKSrpu7/PGjRu1efPmXE8+riQhIUE+Pj7y8Mjd/8w6deqkkSNHaurUqQoICMiVOq9H+0hvs97e3nm6nivx8vKSl5drvn6PHj2qKlWq5Fp92X1GZOZ6H3tXYvVzzApfX988qdcV0j9LIBljlJCQIH9/f1eHgnyAM1K44e3du1dVq1bNkERJUmho6BWXf+mll+Th4eG4DDCze6S2bNminj17qkyZMvLz81N4eLh69+6tEydOZFrn8ePH1alTJwUFBalIkSJ64oknlJCQcMVY4uLiNHjwYEVGRsrX11flypXTq6++Krvdnu1ypUqV0rZt27R8+XLZbDbZbDY1atTIUr2X3icyefJklS1bVr6+vtq+fbvjuvPt27era9euKlSokOrVq2dp32R2D0ypUqV033336ZdfftEdd9whPz8/lSlTRrNmzbrivpKk1157TXfddZeKFCkif39/1a5dW19++WWGcun3xn3yySeqWrWqfH19tWDBAknSv//+q969eyssLEy+vr6qWrWqPvzwQ6flk5KS9MILL6h27doKDg5WwYIFVb9+fS1dujRHcc6bN08+Pj5q0KCB0/S///5bAwYMUMWKFeXv768iRYrogQceyPQ+obi4OA0ZMkSlSpWSr6+vSpQooYcffthxL8CyZctks9k0e/ZsPffcc7rllltUoEABnTlzRpK0du1aNW/eXMHBwSpQoIAaNmyoVatWOa3j7NmzGjx4sGMdoaGhatq0qX7//Xenck2bNtW5c+e0cOHCK257YmKihgwZomLFiikwMFD333+//vnnnwzlMmsf69evV0xMjIoWLSp/
f3+VLl1avXv3lpTWXosVKyZJGj16tKPdp9931bNnTwUEBGjv3r1q2bKlAgMD1a1bN8e8S++RutSkSZMUFRUlf39/NWzYUFu3bnWa36hRI6djK92ldV4ptszukUpJSdGYMWMcx12pUqX03//+V4mJiU7lrvaYSW8f+/bt0/fff++IKX1/Hz16VH369FFYWJj8/PxUs2ZNzZw506mO7D4jsnKtx54kvfnmm6pataoKFCigQoUKqU6dOvr000+dymzcuFEtWrRQUFCQAgIC1LhxY/36669OZdLb2PLlyzVgwACFhoaqRIkSGjVqlIYPHy5JKl26dIZ9s3DhQtWrV08hISEKCAhQxYoV9d///jfb/S1lvEcqff2rVq3S0KFDVaxYMRUsWFDt2rXTsWPHrlhfeps+cOCA7rvvPgUEBOiWW27R22+/LSntEvd7771XBQsWVFRUVIZ9dPLkSQ0bNkzVq1dXQECAgoKC1KJFiwyX6V7ps+Ryp06d0h133KESJUo4zvQmJiZq5MiRKleunHx9fRUZGamnnnrKqT3bbDadO3dOM2fOdOzzK91TlpO2IKV9Xvbs2VMhISEKDg5Wr169dP78eacy06dP17333qvQ0FD5+vqqSpUqeueddzLUlX7M/fTTT6pTp478/f317rvvOtZzNd/ZcB+ckcINLyoqSmvWrNHWrVtVrVo1S8s+99xzeuWVV/Tuu++qX79+WZZbuHCh/vrrL/Xq1Uvh4eHatm2b3nvvPW3btk2//vprhh9FnTp1UqlSpTR27Fj9+uuveuONN3Tq1Klsf+ycP39eDRs21L///qtHHnlEJUuW1OrVqzVixAgdPnxYkydPznLZyZMn67HHHlNAQICeffZZSVJYWNhV1Tt9+nQlJCSof//+8vX1VeHChR3zHnjgAZUvX16vvPKKjDFXtW8ut2fPHnXs2FF9+vRRjx499OGHH6pnz56qXbu2qlatmu2yU6ZM0f33369u3bopKSlJs2fP1gMPPKDvvvtOrVq1ciq7ZMkSffHFFxo0aJCKFi2qUqVK6ciRI7rzzjsdP/aKFSumH3/8UX369NGZM2ccN+KfOXNGH3zwgbp06aJ+/frp7Nmz+t///qeYmBj99ttvuvXWW7ONc/Xq1apWrVqGMyHr1q3T6tWr1blzZ5UoUUL79+/XO++8o0aNGmn79u0qUKCApLQOAurXr68dO3aod+/eqlWrlo4fP6758+frn3/+UdGiRR11jhkzRj4+Pho2bJgSExPl4+OjJUuWqEWLFqpdu7ZGjhwpDw8Px4+IlStX6o477pAk/ec//9GXX36pQYMGqUqVKjpx4oR++eUX7dixQ7Vq1XKso0qVKvL399eqVavUrl27bLe9b9+++vjjj9W1a1fdddddWrJkSYb3JjNHjx5Vs2bNVKxYMT3zzDMKCQnR/v37NXfuXElSsWLF9M477+jRRx9Vu3bt1L59e0lSjRo1HHWkpKQoJiZG9erV02uvvebYn1mZNWuWzp49q4EDByohIUFTpkzRvffeqz/++MNxPOVETmK7XN++fTVz5kx17NhRTz75pNauXauxY8dqx44d+vrrr53KXs0xU7lyZX300UcaMmSISpQooSeffNIR64ULF9SoUSPt2bNHgwYNUunSpTVnzhz17NlTcXFxeuKJJ5zqyu4zIjPXcuy9//77evzxx9WxY0fHH1JbtmzR2rVr1bVrV0nStm3bVL9+fQUFBempp56St7e33n33XTVq1EjLly9X3bp1neIZMGCAihUrphdeeEHnzp1TixYt9Oeff+qzzz7TpEmTHMdTsWLFtG3bNt13332qUaOGXnzxRfn6+mrPnj0Z/oSw4rHHHlOhQoU0cuRI7d+/X5MnT9agQYP0+eefX3HZ1NRUtWjRQg0aNND48eP1ySefaNCgQSpYsKCeffZZdevWTe3bt9e0adP08MMPKzo6WqVLl5Yk/fXXX5o3b54eeOABlS5dWkeOHNG7776rhg0bavv2
7SpevLjTujL7LLnc8ePH1bRpU508eVLLly9X2bJlZbfbdf/99+uXX35R//79VblyZf3xxx+aNGmS/vzzT8elfB999JH69u2rO+64Q/3795cklS1bNsttz0lbSNepUyeVLl1aY8eO1e+//64PPvhAoaGhevXVVx1l3nnnHVWtWlX333+/vLy89O2332rAgAGy2+0aOHCgU327du1Sly5d9Mgjj6hfv36qWLHiNX1nw40Y4Ab3888/G09PT+Pp6Wmio6PNU089ZX766SeTlJSUoawkM3DgQGOMMU8++aTx8PAwM2bMcCqzb98+I8lMnz7dMe38+fMZ6vrss8+MJLNixQrHtJEjRxpJ5v7773cqO2DAACPJbN682TEtKirK9OjRw/F6zJgxpmDBgubPP/90WvaZZ54xnp6e5sCBA9nuh6pVq5qGDRtmmJ7TetO3OygoyBw9etSpbPp2denSJUP9Od0306dPN5LMvn37HNOioqIylDt69Kjx9fU1Tz75ZLbbm9m6k5KSTLVq1cy9997rNF2S8fDwMNu2bXOa3qdPHxMREWGOHz/uNL1z584mODjYUX9KSopJTEx0KnPq1CkTFhZmevfufcU4S5QoYTp06HDF+I0xZs2aNUaSmTVrlmPaCy+8YCSZuXPnZihvt9uNMcYsXbrUSDJlypRxqtdut5vy5cubmJgYR9n0dZcuXdo0bdrUMS04ONhxfFxJhQoVTIsWLbIts2nTJiPJDBgwwGl6165djSQzcuRIx7TL28fXX39tJJl169ZlWf+xY8cy1JOuR48eRpJ55plnMp0XFRXleJ3e9v39/c0///zjmL527VojyQwZMsQxrWHDhpkeZ5fXmV1s6cdTuvT91LdvX6dyw4YNM5LMkiVLHNOu9ZiJiooyrVq1cpo2efJkI8l8/PHHjmlJSUkmOjraBAQEmDNnzhhjsv+MyMq1Hntt2rQxVatWzXYdbdu2NT4+Pmbv3r2OaYcOHTKBgYGmQYMGjmnpbaxevXomJSXFqY4JEyZk+HwyxphJkyYZSebYsWM52t5LXf4Zn77+Jk2aOB2LQ4YMMZ6eniYuLi7b+tLb9CuvvOKYdurUKePv729sNpuZPXu2Y/rOnTsztL+EhASTmprqVOe+ffuMr6+vefHFFx3TsvosuXQb1q1bZw4fPmyqVq1qypQpY/bv3+8o89FHHxkPDw+zcuVKp2WnTZtmJJlVq1Y5phUsWNBpH2UnJ20h/di6/HO5Xbt2pkiRIk7TMvv8jYmJMWXKlHGaln7MLViwwGn6tX5nwz1waR9ueE2bNtWaNWt0//33a/PmzRo/frxiYmJ0yy23aP78+RnKG2M0aNAgTZkyRR9//LF69OhxxXVcei10QkKCjh8/rjvvvFOSMlz2JCnDv1mPPfaYJOmHH37Ich1z5sxR/fr1VahQIR0/ftwxNGnSRKmpqVqxYsUV48yNejt06OC4LOly//nPfzJMs7pvLlelShXVr1/f8bpYsWKqWLGi/vrrrysue+m6T506pdOnT6t+/fqZrrdhw4ZO94YYY/TVV1+pdevWMsY47ZuYmBidPn3aUY+np6fj31i73a6TJ08qJSVFderUydE2njhxQoUKFco2/uTkZJ04cULlypVTSEiIU71fffWVatasmenZn8vP+PXo0cOp3k2bNmn37t3q2rWrTpw44djGc+fOqXHjxlqxYoXjMpSQkBCtXbtWhw4duuI2pben7KS398cff9xpek663E6/VPe7775TcnLyFctn5dFHH81x2bZt2+qWW25xvL7jjjtUt27dbI/b3JBe/9ChQ52mp581+v77752mX8sxk9X6w8PD1aVLF8c0b29vPf7444qPj9fy5cudymf3GZGZazn2QkJC9M8//2jdunWZ1p2amqqff/5Zbdu2VZkyZRzTIyIi1LVrV/3yyy8ZLknr169fju/HS2+H33zzTa5drtW/f3+n47Z+/fpKTU3V33//naPl+/bt6xRfxYoVVbBg
QafHElSsWFEhISFObcLX19dxv2RqaqpOnDjhuFQxs8+xyz9LLvXPP/+oYcOGSk5O1ooVK5w6iZkzZ44qV66sSpUqOb239957ryTl+JLoy12pLVzq8u+q+vXr68SJE05t4dJtO336tI4fP66GDRvqr7/+0unTp52WL126tGJiYpym5dV3Nm4sXNoHt3D77bdr7ty5SkpK0ubNm/X1119r0qRJ6tixozZt2uT0JT5r1izFx8frnXfecfrhkJ2TJ09q9OjRmj17doYOLC7/wJWk8uXLO70uW7asPDw8sn1Gzu7du7Vly5Ysf6BcTccZV1Nv+mUgmclsntV9c7mSJUtmmFaoUCGdOnXqist+9913eumll7Rp06YM195fKfZjx44pLi5O7733nt57771M6790e2bOnKnXX39dO3fudPphn93+upT5/0shL3XhwgWNHTtW06dP17///utU5tJ9t3fvXnXo0CFH67k8nt27d0tStn8YnD59WoUKFdL48ePVo0cPRUZGqnbt2mrZsqUefvhhpx+ol27PlS7b/Pvvv+Xh4ZHhcp2KFStecTsaNmyoDh06aPTo0Zo0aZIaNWqktm3bqmvXrjm+id/Ly0slSpTIUVkp43ErSRUqVNAXX3yR4zquRvp+KleunNP08PBwhYSEZPiBfS3HTFbrL1++fIZOSSpXruyYf6mctvmsyls59p5++mktWrRId9xxh8qVK6dmzZqpa9euuvvuux11nT9/PtM2VblyZdntdh08eNDpkkcr8T/44IP64IMP1LdvXz3zzDNq3Lix2rdvr44dO151Jy6Xv3/pf7Lk5P3z8/PL8FkeHBysEiVKZDgeg4ODneq02+2aMmWKpk6dqn379ik1NdUxr0iRIhnWld1+6t69u7y8vLRjxw6Fh4c7zdu9e7d27NiR699lV2oLl8puHwcFBUmSVq1apZEjR2rNmjUZ7p86ffq0Uw+rme2LvPrOxo2FRApuxcfHR7fffrtuv/12VahQQb169dKcOXM0cuRIR5m7775bmzZt0ltvvaVOnTpd8fp+Ke1669WrV2v48OG69dZbFRAQILvdrubNm+foX8qcPHzTbreradOmeuqppzKdX6FChSvWkRv1ZtcTUWbzrnXfZPXPcGaJx6VWrlyp+++/Xw0aNNDUqVMVEREhb29vTZ8+PdObjy+PPT22hx56KMskI/2elo8//lg9e/ZU27ZtNXz4cIWGhsrT01Njx47V3r17r7iNRYoUyfRH0mOPPabp06dr8ODBio6OVnBwsGw2mzp37nzV/35ntZ0TJkzI8l6u9J73OnXqpPr16+vrr7/Wzz//rAkTJujVV1/V3Llz1aJFC6dlTp06lWnikVtsNpu+/PJL/frrr/r222/1008/qXfv3nr99df166+/5qi3wEv/gc/NuDJrm5f+KL2WunPiao+Z3GK1t7JrOfYqV66sXbt26bvvvtOCBQv01VdfaerUqXrhhRc0evToq4jeWvz+/v5asWKFli5dqu+//14LFizQ559/rnvvvVc///zzVfU0eS3vX1bL5qTOV155Rc8//7x69+6tMWPGqHDhwvLw8NDgwYMz/bzJbj+1b99es2bN0pQpUzR27FineXa7XdWrV9fEiRMzXTYyMjLLerNjpS1caX/s3btXjRs3VqVKlTRx4kRFRkbKx8dHP/zwgyZNmpRhf2S2L/LqOxs3FhIpuK06depIkg4fPuw0vVy5cho/frwaNWqk5s2ba/HixQoMDMyynlOnTmnx4sUaPXq0XnjhBcf09H/6M7N7926nf7D27Nkju92eZU9hUtpZq/j4eDVp0uRKm5aprH6EXWu92bmafZNbvvrqK/n5+emnn35yOkMxffr0HC2f3otcamrqFffNl19+qTJlymju3LlO+/nSBD07lSpV0r59+zKtt0ePHk7doickJGR4lk3ZsmUz9B6XU+lng4KCgnLUBiIiIjRgwAANGDBAR48eVa1atfTyyy87JVIpKSk6
ePCg7r///mzrioqKkt1u1969e53OGFh5htOdd96pO++8Uy+//LI+/fRTdevWTbNnz1bfvn1znHjkVGbt9s8//3Q6bgsVKpTpJXSXn7WxElv6ftq9e7fjLJAkHTlyRHFxcXn+7LWoqCht2bJFdrvdKfHcuXOnY35usnLsSVLBggX14IMP6sEHH1RSUpLat2+vl19+WSNGjFCxYsVUoECBTNvUzp075eHhkaMf7tm9Xx4eHmrcuLEaN26siRMn6pVXXtGzzz6rpUuX5snnal758ssvdc899+h///uf0/S4uDinDmty4rHHHlO5cuX0wgsvKDg4WM8884xjXtmyZbV582Y1btz4iseB1WM4u7ZgpUv9b7/9VomJiZo/f77T2Ssrlx3m5XcrbhzcI4Ub3tKlSzP9Jy/9voPMLvmoUaOGfvjhB+3YsUOtW7fO9uGY6f9sXb6O7HrkSe+ONl161+qX/6t/qU6dOmnNmjX66aefMsyLi4tTSkpKlstKaV8wmT1M8lrrzc7V7Jvc4unpKZvN5nQmYP/+/Tl+uKOnp6c6dOigr776KtMk5dLuiDPbzrVr12rNmjU5Wld0dLS2bt2aoStrT0/PDPvuzTffzHB2o0OHDo5LVi93pX+xa9eurbJly+q1115TfHx8hvnp25mamprhUszQ0FAVL148Q9zbt29XQkKC7rrrrmzXnd7e33jjDafpOWkfp06dyrBt6WfU0uNJ74Uvtx6iOm/ePP3777+O17/99pvWrl3rdNyWLVtWO3fudGofmzdvztCLm5XYWrZsKSnjfkn/Rz8nvRxei5YtWyo2Ntap17iUlBS9+eabCggIUMOGDXN1fVaOvcsfo+Dj46MqVarIGKPk5GR5enqqWbNm+uabb5wunT5y5Ig+/fRT1atXz3EpV3bSnyV1+ft18uTJDGUvb4c3isw+b+bMmePU5q14/vnnNWzYMI0YMcKp2/BOnTrp33//zfRhxBcuXNC5c+ccr7P63srMldqCFZl9pp8+fTrHf8RJefvdihsHZ6Rww3vsscd0/vx5tWvXTpUqVVJSUpJWr16tzz//XKVKlVKvXr0yXe7OO+/UN998o5YtW6pjx46aN29epg/qDAoKcnQ1m5ycrFtuuUU///xzpmcY0u3bt0/333+/mjdvrjVr1ji6f65Zs2aWywwfPlzz58/Xfffd5+jK+Ny5c/rjjz/05Zdfav/+/dn+a1i7dm298847eumll1SuXDmFhobq3nvvveZ6s3M1+ya3tGrVShMnTlTz5s3VtWtXHT16VG+//bbKlSunLVu25KiOcePGaenSpapbt6769eunKlWq6OTJk/r999+1aNEix4+o++67T3PnzlW7du3UqlUr7du3T9OmTVOVKlUyTU4u16ZNG40ZM0bLly9Xs2bNHNPvu+8+ffTRRwoODlaVKlW0Zs0aLVq0KMP9CsOHD9eXX36pBx54QL1791bt2rV18uRJzZ8/X9OmTcu2XXl4eOiDDz5QixYtVLVqVfXq1Uu33HKL/v33Xy1dulRBQUH69ttvdfbsWZUoUUIdO3ZUzZo1FRAQoEWLFmndunUZHiS8cOFCFShQQE2bNs12u2+99VZ16dJFU6dO1enTp3XXXXdp8eLF2rNnzxX32cyZMzV16lS1a9dOZcuW1dmzZ/X+++8rKCjIkXj4+/urSpUq+vzzz1WhQgUVLlxY1apVs/wYhHTlypVTvXr19OijjyoxMVGTJ09WkSJFnC7d6d27tyZOnKiYmBj16dNHR48e1bRp01S1atUMN7LnNLaaNWuqR48eeu+99xQXF6eGDRvqt99+08yZM9W2bVvdc889V7U9OdW/f3+9++676tmzpzZs2KBSpUrpyy+/1KpVqzR58uRsz9hfrZwee82aNVN4eLjuvvtuhYWFaceOHXrrrbfUqlUrR1wvvfSS41lPAwYMkJeXl959910lJiZq/PjxOYqndu3akqRnn31WnTt3lre3t1q3bq0XX3xRK1as
UKtWrRQVFaWjR49q6tSpKlGihONZejeK++67Ty+++KJ69eqlu+66S3/88Yc++eSTTO+BzKkJEybo9OnTGjhwoAIDA/XQQw+pe/fu+uKLL/Sf//xHS5cu1d13363U1FTt3LlTX3zxheN5TFLafl+0aJEmTpyo4sWLq3Tp0hm6q0+Xk7aQU82aNZOPj49at26tRx55RPHx8Xr//fcVGhqa4SqWrOTldytuINetf0Agj/z444+md+/eplKlSiYgIMD4+PiYcuXKmccee8wcOXLEqawu6f483TfffGO8vLzMgw8+aFJTUzPt/vyff/4x7dq1MyEhISY4ONg88MAD5tChQxm6l03venX79u2mY8eOJjAw0BQqVMgMGjTIXLhwwWm9l3eNa4wxZ8+eNSNGjDDlypUzPj4+pmjRouauu+4yr732WqbduV8qNjbWtGrVygQGBhpJTl0056Te9O2eMGFChrrTtyuzLoBzum+y6v788q6Yjcm6i+nL/e9//zPly5c3vr6+plKlSmb69OkZupY2JvP3Pd2RI0fMwIEDTWRkpPH29jbh4eGmcePG5r333nOUsdvt5pVXXjFRUVHG19fX3Hbbbea7777L0OV1dmrUqGH69OnjNO3UqVOmV69epmjRoiYgIMDExMSYnTt3Zto2Tpw4YQYNGmRuueUW4+PjY0qUKGF69Ojh6D46vcviOXPmZLr+jRs3mvbt25siRYoYX19fExUVZTp16mQWL15sjDEmMTHRDB8+3NSsWdMEBgaaggULmpo1a5qpU6dmqKtu3brmoYceytF2X7hwwTz++OOmSJEipmDBgqZ169bm4MGDV2wfv//+u+nSpYspWbKk8fX1NaGhoea+++4z69evd6p/9erVpnbt2sbHx8epzh49epiCBQtmGlNW3Z9PmDDBvP766yYyMtL4+vqa+vXrOz2yIN3HH39sypQpY3x8fMytt95qfvrpp0zbQlaxZdZGk5OTzejRo03p0qWNt7e3iYyMNCNGjDAJCQlO5a71mMlq+SNHjjjaoo+Pj6levbrTZ+Dl+ymnrvXYe/fdd02DBg0c7bZs2bJm+PDh5vTp0051/f777yYmJsYEBASYAgUKmHvuucesXr3aqcylXXdnZsyYMeaWW24xHh4ejra4ePFi06ZNG1O8eHHj4+Njihcvbrp06ZKhy+vMZNX9+eXrTz92ly5dmm19WbXphg0bZtot+OXvdUJCgnnyySdNRESE8ff3N3fffbdZs2ZNhraT3WdJZtuQmppqunTpYry8vMy8efOMMWnd57/66qumatWqxtfX1xQqVMjUrl3bjB492um927lzp2nQoIHx9/c3krLtCj0nbSGr76rMvn/mz59vatSoYfz8/EypUqXMq6++aj788MMcf08Zc23f2XAPNmOu092pAHAT++ijjzRw4EAdOHDA0aXyjWjTpk2qVauWfv/99ys+iBgAAHdGIgUA14HdbleNGjXUpUsXPfvss64O56ql9yiY112CAwCQ35FIAQAAAIBF9NoHAAAAABaRSAEAAACARSRSAAAAAGARiRQAAAAAWMQDeZXWm9ahQ4cUGBgom83m6nAAAAAAuIgxRmfPnlXx4sXl4ZH1eScSKUmHDh1SZGSkq8MAAAAAkE8cPHhQJUqUyHI+iZSkwMBASWk7KygoyMXR4KrY7dKRI2njYWFSNv8eAAAAAFk5c+aMIiMjHTlCVkikJMflfEFBQSRSN6oLF6Ru3dLGV66U/P1dGw8AAABuaFe65Ye/7QEAAADAIhIpAAAAALCIRAoAAAAALOIeKQAAANzwjDFKSUlRamqqq0NBPufp6SkvL69rfuwRiRQAAABuaElJSTp8+LDOnz/v6lBwgyhQoIAiIiLk4+Nz1XWQSAEAAOCGZbfbtW/fPnl6eqp48eLy8fG55jMNcF/GGCUlJenYsWPat2+fypcvn+1Dd7NDIgX34OkpPfDAxXEAAHBTSEpKkt1uV2RkpAoUKODqcHAD8Pf3
l7e3t/7++28lJSXJz8/vquohkYJ78PGRnn7a1VEAAAAXudqzCrg55UZ7ocUBAAAAgEWckYJ7MEaKi0sbDwmRuDYaAAAAeYgzUnAPCQlS06ZpQ0KCq6MBAADIFfv375fNZtOmTZtcHUq+UqpUKU2ePNmlMZBIAQAAAC7Qs2dP2Ww2x1CkSBE1b95cW7ZscZSJjIzU4cOHVa1atWtaV6lSpWSz2TR79uwM86pWrSqbzaYZM2Zc0zquhs1m07x58ywvt27dOvXv3z/3A7KARAoAAABwkebNm+vw4cM6fPiwFi9eLC8vL913332O+Z6engoPD5eX17XfkRMZGanp06c7Tfv1118VGxurggULXnP911OxYsVc3ksjiRQAAADc04ULWQ9JSTkvm5iYs7JXwdfXV+Hh4QoPD9ett96qZ555RgcPHtSxY8ckZX5p3/z581W+fHn5+fnpnnvu0cyZM2Wz2RSXfr94Frp166bly5fr4MGDjmkffvihunXrliFRO3DggNq0aaOAgAAFBQWpU6dOOnLkiGN+z5491bZtW6dlBg8erEaNGjleN2rUSI8//rieeuopFS5cWOHh4Ro1apRjfqlSpSRJ7dq1k81mc7zeu3ev2rRpo7CwMAUEBOj222/XokWLnNZ1+aV9NptNH3zwgdq1a6cCBQqofPnymj9/frb741qRSAEAAMA91a+f9TB8uHPZpk2zLvvYY85lW7fOvNw1io+P18cff6xy5cqpSJEimZbZt2+fOnbsqLZt22rz5s165JFH9Oyzz+ao/rCwMMXExGjmzJmSpPPnz+vzzz9X7969ncrZ7Xa1adNGJ0+e1PLly7Vw4UL99ddfevDBBy1v08yZM1WwYEGtXbtW48eP14svvqiFCxdKSrs8T5KmT5+uw4cPO17Hx8erZcuWWrx4sTZu3KjmzZurdevWOnDgQLbrGj16tDp16qQtW7aoZcuW6tatm06ePGk55pyi1758qNQz3+dZ3fvHtcqzugEAAGDNd999p4CAAEnSuXPnFBERoe+++y7L5xy9++67qlixoiZMmCBJqlixorZu3aqXX345R+vr3bu3nnzyST377LP68ssvVbZsWd16661OZRYvXqw//vhD+/btU2RkpCRp1qxZqlq1qtatW6fbb789x9tXo0YNjRw5UpJUvnx5vfXWW1q8eLGaNm2qYsWKSZJCQkIUHh7uWKZmzZqqWbOm4/WYMWP09ddfa/78+Ro0aFCW6+rZs6e6dOkiSXrllVf0xhtv6LffflPz5s1zHK8VJFIAAABwTytXZj3P09P59f+fJcnU5UnNt99efUyXueeee/TOO+9Ikk6dOqWpU6eqRYsW+u233xQVFZWh/K5duzIkMnfccUeO19eqVSs98sgjWrFihT788MMMZ6MkaceOHYqMjHQkUZJUpUoVhYSEaMeOHZYTqUtFRETo6NGj2S4THx+vUaNG6fvvv9fhw4eVkpKiCxcuXPGM1KXrKliwoIKCgq64rmtBIgX34Okppd+YefkHIwAAuDn5+7u+7BUULFhQ5cqVc7z+4IMPFBwcrPfff18vvfRSrq0nnZeXl7p3766RI0dq7dq1+vrrr6+qHg8PDxljnKYlJydnKOft7e302mazyW63Z1v3sGHDtHDhQr322msqV66c/P391bFjRyVdfl9bLqzrWpBIwT34+EiX3LwIAABwI7LZbPLw8NCFLDqvqFixon744Qenaen3FuVU79699dprr+nBBx9UoUKFMsyvXLmyDh48qIMHDzrOSm3fvl1xcXGqUqWKpLRe87Zu3eq03KZNmzIkM1fi7e2t1NRUp2mrVq1Sz5491a5dO0lpZ6j2799vqd7rgc4mAAAAABdJTExUbGysYmNjtWPHDj322GOKj49X69atMy3/yCOPaOfOnXr66af1559/6osvvnA8/8lms+VonZUrV9bx48czdIWerkmTJqpevbq6deum33//Xb/99psefvhhNWzYUHXq1JEk3Xvv
vVq/fr1mzZql3bt3a+TIkRkSq5woVaqUFi9erNjYWJ06dUpS2r1Uc+fO1aZNm7R582Z17do1T88sXS2XJlJjx47V7bffrsDAQIWGhqpt27batWuXU5lGjRo5PajMZrPpP//5j1OZAwcOqFWrVipQoIBCQ0M1fPhwpaSkXM9NgasZc7Hr0ctOMwMAAORXCxYsUEREhCIiIlS3bl2tW7dOc+bMcepG/FKlS5fWl19+qblz56pGjRp65513HL32+fr65ni9RYoUkX8WlyjabDZ98803KlSokBo0aKAmTZqoTJky+vzzzx1lYmJi9Pzzz+upp57S7bffrrNnz+rhhx/O+Yb/v9dff10LFy5UZGSkbrvtNknSxIkTVahQId11111q3bq1YmJiVKtWLct15zWbufzixuuoefPm6ty5s26//XalpKTov//9r7Zu3art27c7HgrWqFEjVahQQS+++KJjuQIFCigoKEiSlJqaqltvvVXh4eGaMGGCDh8+rIcfflj9+vXTK6+8kqM4zpw5o+DgYJ0+fdpRryvRa99VuHDhYrejK1fm6rXLAAAg/0pISNC+fftUunRp+fn5uTocl3j55Zc1bdo0p+dDIXvZtZuc5gYuvUdqwYIFTq9nzJih0NBQbdiwQQ0aNHBML1CggFOXiJf6+eeftX37di1atEhhYWG69dZbNWbMGD399NMaNWqUfHx88nQbAAAAgOtp6tSpuv3221WkSBGtWrVKEyZMyLZbcOSNfHWP1OnTpyVJhQsXdpr+ySefqGjRoqpWrZpGjBih8+fPO+atWbNG1atXV1hYmGNaTEyMzpw5o23btmW6nsTERJ05c8ZpAAAAAG4Eu3fvVps2bVSlShWNGTNGTz75pEbR6dZ1l2967bPb7Ro8eLDuvvtuVatWzTG9a9euioqKUvHixbVlyxY9/fTT2rVrl+bOnStJio2NdUqiJDlex8bGZrqusWPHavTo0Xm0JQAAAEDemTRpkiZNmuTqMG56+SaRGjhwoLZu3apffvnFaXr//v0d49WrV1dERIQaN26svXv3qmzZsle1rhEjRmjo0KGO12fOnHF64BgAAAAAZCdfXNo3aNAgfffdd1q6dKlKlCiRbdm6detKkvbs2SNJCg8P15EjR5zKpL/O6r4qX19fBQUFOQ0AAAC4cbmw/zTcgHKjvbg0kTLGaNCgQfr666+1ZMkSlS5d+orLbNq0SZIUEREhSYqOjtYff/yho0ePOsosXLhQQUFBjgeGAQAAwD2lPwD20nvogStJby9WHyB8KZde2jdw4EB9+umn+uabbxQYGOi4pyk4OFj+/v7au3evPv30U7Vs2VJFihTRli1bNGTIEDVo0EA1atSQJDVr1kxVqlRR9+7dNX78eMXGxuq5557TwIEDLfWljxucp6fUuPHFcQAAcFPw9PRUSEiI40/1AgUK5PjBtLj5GGN0/vx5HT16VCEhIfK8ht+NLn2OVFaNfPr06erZs6cOHjyohx56SFu3btW5c+cUGRmpdu3a6bnnnnO6HO/vv//Wo48+qmXLlqlgwYLq0aOHxo0bJy+vnOWJPEcKAADgxmWMUWxsrOLi4lwdCm4QISEhCg8PzzQfuSGeI3WlHC4yMlLLly+/Yj1RUVH64YcfcissAAAA3EBsNpsiIiIUGhqq5ORkV4eDfM7b2/uazkSlyze99gEAAADXwtPTM1d+IAM5kS967QOu2YULUp06acOFC66OBgAAAG6ORAoAAAAALCKRAgAAAACLSKQAAAAAwCISKQAAAACwiEQKAAAAACwikQIAAAAAi3iOFNyDp6d0990XxwEAAIA8RCIF9+DjI02Z4uooAAAAcJPg0j4AAAAAsIhECgAAAAAsIpGCe7hwQapXL224cMHV0QAAAMDNcY8U3EdCgqsjAAAAwE2CM1IAAAAAYBGJFAAAAABYRCIFAAAAABaRSAEAAACARSRSAAAAAGARvfbBPXh4SLVqXRwHAAAA8hCJFNyDr6/0
3nuujgIAAAA3Cf66BwAAAACLSKQAAAAAwCISKbiHCxekJk3ShgsXXB0NAAAA3Bz3SMF9xMW5OgIAAADcJDgjBQAAAAAWkUgBAAAAgEUkUgAAAABgEYkUAAAAAFhEIgUAAAAAFtFrH9yDh4dUpcrFcQAAACAPkUjBPfj6SrNmuToKAAAA3CT46x4AAAAALCKRAgAAAACLSKTgHhISpNat04aEBFdHAwAAADfHPVJwD8ZIhw9fHAcAAADyEGekAAAAAMAiEikAAAAAsIhECgAAAAAsIpECAAAAAItIpAAAAADAInrtg3uw2aQyZS6OAwAAAHmIRAruwc9P+uILV0cBAACAmwSX9gEAAACARSRSAAAAAGARiRTcQ0KC1KlT2pCQ4OpoAAAA4Oa4RwruwRjpr78ujgMAAAB5iDNSAAAAAGARiRQAAAAAWEQiBQAAAAAWkUgBAAAAgEUkUgAAAABgEb32wT3YbFJExMVxAAAAIA+RSME9+PlJ337r6igAAABwk+DSPgAAAACwiEQKAAAAACwikYJ7SEyUHn44bUhMdHU0AAAAcHPcIwX3YLdL27dfHAcAAADyEGekAAAAAMAiEikAAAAAsMilidTYsWN1++23KzAwUKGhoWrbtq127drlVCYhIUEDBw5UkSJFFBAQoA4dOujIkSNOZQ4cOKBWrVqpQIECCg0N1fDhw5WSknI9NwUAAADATcSlidTy5cs1cOBA/frrr1q4cKGSk5PVrFkznTt3zlFmyJAh+vbbbzVnzhwtX75chw4dUvv27R3zU1NT1apVKyUlJWn16tWaOXOmZsyYoRdeeMEVmwQAAADgJmAzxhhXB5Hu2LFjCg0N1fLly9WgQQOdPn1axYoV06effqqOHTtKknbu3KnKlStrzZo1uvPOO/Xjjz/qvvvu06FDhxQWFiZJmjZtmp5++mkdO3ZMPj4+V1zvmTNnFBwcrNOnTysoKChPtzEnSj3zfZ7VvX9cqzyr26UuXJDq108bX7lS8vd3bTwAAAC4IeU0N8hX90idPn1aklS4cGFJ0oYNG5ScnKwmTZo4ylSqVEklS5bUmjVrJElr1qxR9erVHUmUJMXExOjMmTPatm1bputJTEzUmTNnnAa4gZCQtAEAAADIY/kmkbLb7Ro8eLDuvvtuVatWTZIUGxsrHx8fhVz24zgsLEyxsbGOMpcmUenz0+dlZuzYsQoODnYMkZGRubw1uO78/aVFi9IGzkYBAAAgj+WbRGrgwIHaunWrZs+enefrGjFihE6fPu0YDh48mOfrBAAAAOA+8sUDeQcNGqTvvvtOK1asUIkSJRzTw8PDlZSUpLi4OKezUkeOHFF4eLijzG+//eZUX3qvfullLufr6ytfX99c3goAAAAANwuXnpEyxmjQoEH6+uuvtWTJEpUuXdppfu3ateXt7a3Fixc7pu3atUsHDhxQdHS0JCk6Olp//PGHjh496iizcOFCBQUFqUqVKtdnQ+B6iYlS//5pQ2Kiq6MBAACAm3PpGamBAwfq008/1TfffKPAwEDHPU3BwcHy9/dXcHCw+vTpo6FDh6pw4cIKCgrSY489pujoaN15552SpGbNmqlKlSrq3r27xo8fr9jYWD333HMaOHAgZ51uJna79PvvF8cBAACAPOTSROqdd96RJDVq1Mhp+vTp09WzZ09J0qRJk+Th4aEOHTooMTFRMTExmjp1qqOsp6envvvuOz366KOKjo5WwYIF1aNHD7344ovXazMAAAAA3GRcmkjl5BFWfn5+evvtt/X2229nWSYqKko//PBDboYGAAAAAFnKN732AQAAAMCNgkQKAAAAACwikQIAAAAAi/LFc6SAXOHn5+oIAAAAcJMgkYJ78PeXfvnF1VEAAADgJsGlfQAAAABgEYkUAAAAAFhEIgX3kJQkPfFE2pCU5OpoAAAA4Oa4RwruITVVWrXq4jgAAACQhzgjBQAAAAAWkUgBAAAAgEUkUgAAAABgEYkUAAAAAFhEIgUA
AAAAFpFIAQAAAIBFdH8O9+DvL61f7+ooAAAAcJPgjBQAAAAAWEQiBQAAAAAWkUjBPSQlSU8/nTYkJbk6GgAAALg5Eim4h9RUafHitCE11dXRAAAAwM2RSAEAAACARSRSAAAAAGARiRQAAAAAWEQiBQAAAAAWkUgBAAAAgEUkUgAAAABgkZerAwByhZ+ftHLlxXEAAAAgD5FIwT3YbJK/v6ujAAAAwE2CS/sAAAAAwCLOSME9JCVJr7ySNv7f/0o+Pq6NBwAAAG6NM1JwD6mp0nffpQ2pqa6OBgAAAG6ORAoAAAAALCKRAgAAAACLSKQAAAAAwCISKQAAAACwiEQKAAAAACwikQIAAAAAi3iOFNyDn5+0cOHFcQAAACAPkUjBPdhsUqFCro4CAAAANwku7QMAAAAAizgjBfeQlCRNmpQ2PmSI5OPj2ngAAADg1jgjBfeQmirNmZM2pKa6OhoAAAC4ORIpAAAAALCIRAoAAAAALCKRAgAAAACLSKQAAAAAwCISKQAAAACwiEQKAAAAACziOVJwD76+0vz5F8cBAACAPEQiBffg4SEVL+7qKAAAAHCT4NI+AAAAALCIM1JwD8nJ0tSpaeMDBkje3q6NBwAAAG6NM1JwDykp0kcfpQ0pKa6OBgAAAG6ORAoAAAAALCKRAgAAAACLLCdSf/31V17EAQAAAAA3DMuJVLly5XTPPffo448/VkJCQl7EBAAAAAD5muVE6vfff1eNGjU0dOhQhYeH65FHHtFvv/2WF7EBAAAAQL5kOZG69dZbNWXKFB06dEgffvihDh8+rHr16qlatWqaOHGijh07lhdxAgAAAEC+cdWdTXh5eal9+/aaM2eOXn31Ve3Zs0fDhg1TZGSkHn74YR0+fDg34wSy5+srffFF2uDr6+poAAAA4OauOpFav369BgwYoIiICE2cOFHDhg3T3r17tXDhQh06dEht2rTJzTiB7Hl4SGXKpA0edEYJAACAvGX5F+fEiRNVvXp13XXXXTp06JBmzZqlv//+Wy+99JJKly6t+vXra8aMGfr999+vWNeKFSvUunVrFS9eXDabTfPmzXOa37NnT9lsNqehefPmTmVOnjypbt26KSgoSCEhIerTp4/i4+OtbhYAAAAA5JiX1QXeeecd9e7dWz179lRERESmZUJDQ/W///3vinWdO3dONWvWVO/evdW+fftMyzRv3lzTp093vPa97LKtbt266fDhw1q4cKGSk5PVq1cv9e/fX59++qmFrcINLzlZSm8nvXpJ3t6ujQcAAABuzXIitXv37iuW8fHxUY8ePa5YrkWLFmrRokW2ZXx9fRUeHp7pvB07dmjBggVat26d6tSpI0l688031bJlS7322msqXrz4FWOAm0hJkd57L228e3cSKQAAAOQpy5f2TZ8+XXPmzMkwfc6cOZo5c2auBHWpZcuWKTQ0VBUrVtSjjz6qEydOOOatWbNGISEhjiRKkpo0aSIPDw+tXbs2yzoTExN15swZpwEAAAAAcspyIjV27FgVLVo0w/TQ0FC98soruRJUuubNm2vWrFlavHixXn31VS1fvlwtWrRQamqqJCk2NlahoaFOy3h5ealw4cKKjY3NdhuCg4MdQ2RkZK7GDQAAAMC9Wb6078CBAypdunSG6VFRUTpw4ECuBJWuc+fOjvHq1aurRo0aKlu2rJYtW6bGjRtfdb0jRozQ0KFDHa/PnDlDMgUAAAAgxyyfkQoNDdWWLVsyTN+8ebOKFCmSK0FlpUyZMipatKj27NkjSQoPD9fRo0edyqSkpOjkyZNZ3lclpd13FRQU5DQAAAAAQE5ZTqS6dOmixx9/XEuXLlVqaqpSU1O1ZMkSPfHEE05nkPLCP//8oxMnTjh6C4yOjlZcXJw2bNjgKLNkyRLZ7XbVrVs3T2MBAAAAcPOyfGnfmDFjtH//fjVu3FheXmmL2+12Pfzww5bvkYqPj3ecXZKkffv2adOmTSpcuLAKFy6s0aNHq0OHDgoP
D9fevXv11FNPqVy5coqJiZEkVa5cWc2bN1e/fv00bdo0JScna9CgQercuTM99gEAAADIMzZjjLmaBf/8809t3rxZ/v7+ql69uqKioizXsWzZMt1zzz0Zpvfo0UPvvPOO2rZtq40bNyouLk7FixdXs2bNNGbMGIWFhTnKnjx5UoMGDdK3334rDw8PdejQQW+88YYCAgJyHMeZM2cUHBys06dP54vL/Eo9832e1b1/XKs8q9ul7HZp58608UqVJA/LJ1sBAACAHOcGV51IuRMSKQAAAABSznMDy5f2paamasaMGVq8eLGOHj0qu93uNH/JkiXWowUAAACAG4jlROqJJ57QjBkz1KpVK1WrVk02my0v4gKsSU6WPvssbbxLF8nb27XxAAAAwK1ZTqRmz56tL774Qi1btsyLeICrk5IivfFG2vgDD5BIAQAAIE9ZviPfx8dH5cqVy4tYAAAAAOCGYDmRevLJJzVlyhTRRwUAAACAm5XlS/t++eUXLV26VD/++KOqVq0q78suoZo7d26uBQcAAAAA+ZHlRCokJETt2rXLi1gAAAAA4IZgOZGaPn16XsQBAAAAADcMy/dISVJKSooWLVqkd999V2fPnpUkHTp0SPHx8bkaHAAAAADkR5bPSP39999q3ry5Dhw4oMTERDVt2lSBgYF69dVXlZiYqGnTpuVFnED2fH2ld9+9OA4AAADkIctnpJ544gnVqVNHp06dkr+/v2N6u3bttHjx4lwNDsgxDw+pdu20weOqTrQCAAAAOWb5jNTKlSu1evVq+fj4OE0vVaqU/v3331wLDAAAAADyK8uJlN1uV2pqaobp//zzjwIDA3MlKMCylBQpvev99u0lL8tNGwAAAMgxy9dANWvWTJMnT3a8ttlsio+P18iRI9WyZcvcjA3IueRkafz4tCE52dXRAAAAwM1Z/tv+9ddfV0xMjKpUqaKEhAR17dpVu3fvVtGiRfXZZ5/lRYwAAAAAkK9YTqRKlCihzZs3a/bs2dqyZYvi4+PVp08fdevWzanzCQAAAABwV1d1I4mXl5ceeuih3I4FAAAAAG4IlhOpWbNmZTv/4YcfvupgAAAAAOBGYDmReuKJJ5xeJycn6/z58/Lx8VGBAgVIpAAAAAC4Pcu99p06dcppiI+P165du1SvXj06mwAAAABwU8iVh+2UL19e48aN00MPPaSdO3fmRpWANT4+Unq3/Jc9LBoAAADIbbn21FIvLy8dOnQot6oDrPH0lOrVc3UUAAAAuElYTqTmz5/v9NoYo8OHD+utt97S3XffnWuBAQAAAEB+ZTmRatu2rdNrm82mYsWK6d5779Xrr7+eW3EB1qSkSD/+mDbeooXklWsnWwEAAIAMLP/atNvteREHcG2Sk6XRo9PGmzQhkQIAAECestxrHwAAAADc7Cz/bT906NAcl504caLV6gEAAAAg37OcSG3cuFEbN25UcnKyKlasKEn6888/5enpqVq1ajnK2Wy23IsSAAAAAPIRy4lU69atFRgYqJkzZ6pQoUKS0h7S26tXL9WvX19PPvlkrgcJAAAAAPmJ5XukXn/9dY0dO9aRRElSoUKF9NJLL9FrHwAAAICbguVE6syZMzp27FiG6ceOHdPZs2dzJSgAAAAAyM8sX9rXrl079erVS6+//rruuOMOSdLatWs1fPhwtW/fPtcDBHLEx0caN+7iOAAAAJCHLCdS06ZN07Bhw9S1a1clJyenVeLlpT59+mjChAm5HiCQI56eac+PAgAAAK4Dy4lUgQIFNHXqVE2YMEF79+6VJJUtW1YFCxbM9eAAAAAAID+66gfyHj58WIcPH1b58uVVsGBBGWNyMy7AmtRUadGitCE11dXRAAAAwM1ZPiN14sQJderUSUuXLpXNZtPu3btVpkwZ9enTR4UKFaLnPrhGUpL0zDNp4ytXSv7+ro0HAAAAbs3yGakhQ4bI29tbBw4cUIECBRzTH3zwQS1YsCBXgwMAAACA/MjyGamff/5Z
P/30k0qUKOE0vXz58vr7779zLTAAAAAAyK8sn5E6d+6c05modCdPnpSvr2+uBAUAAAAA+ZnlRKp+/fqaNWuW47XNZpPdbtf48eN1zz335GpwAAAAAJAfWb60b/z48WrcuLHWr1+vpKQkPfXUU9q2bZtOnjypVatW5UWMAAAAAJCvWD4jVa1aNf3555+qV6+e2rRpo3Pnzql9+/bauHGjypYtmxcxAgAAAEC+YumMVHJyspo3b65p06bp2WefzauYAOu8vaWRIy+OAwAAAHnIUiLl7e2tLVu25FUswNXz8pJat3Z1FAAAALhJWL6076GHHtL//ve/vIgFAAAAAG4IljubSElJ0YcffqhFixapdu3aKliwoNP8iRMn5lpwQI6lpkpr1qSNR0dLnp6ujQcAAABuLUeJ1JYtW1StWjV5eHho69atqlWrliTpzz//dCpns9lyP0IgJ5KSpMGD08ZXrpT8/V0aDgAAANxbjhKp2267TYcPH1ZoaKj+/vtvrVu3TkWKFMnr2AAAAAAgX8rRPVIhISHat2+fJGn//v2y2+15GhQAAAAA5Gc5OiPVoUMHNWzYUBEREbLZbKpTp448s7gH5a+//srVAAEAAAAgv8lRIvXee++pffv22rNnjx5//HH169dPgYGBeR0bAAAAAORLOe61r3nz5pKkDRs26IknniCRAgAAAHDTstz9+fTp0/MiDgAAAAC4YVhOpIB8ydtbeuqpi+MAAABAHiKRgnvw8pI6dXJ1FAAAALhJ5Kj7cwAAAADARZyRgnuw26WNG9PGb7tN8uA/AgAAAOQdEim4h8RE6ZFH0sZXrpT8/V0bDwAAANwaf9sDAAAAgEUkUgAAAABgEYkUAAAAAFjk0kRqxYoVat26tYoXLy6bzaZ58+Y5zTfG6IUXXlBERIT8/f3VpEkT7d6926nMyZMn1a1bNwUFBSkkJER9+vRRfHz8ddwKAAAAADcblyZS586dU82aNfX2229nOn/8+PF64403NG3aNK1du1YFCxZUTEyMEhISHGW6deumbdu2aeHChfruu++0YsUK9e/f/3ptAgAAAICbkEt77WvRooVatGiR6TxjjCZPnqznnntObdq0kSTNmjVLYWFhmjdvnjp37qwdO3ZowYIFWrdunerUqSNJevPNN9WyZUu99tprKl68eKZ1JyYmKjEx0fH6zJkzubxlAAAAANxZvr1Hat++fYqNjVWTJk0c04KDg1W3bl2tWbNGkrRmzRqFhIQ4kihJatKkiTw8PLR27dos6x47dqyCg4MdQ2RkZN5tCK4PLy/p8cfTBi969QcAAEDeyreJVGxsrCQpLCzMaXpYWJhjXmxsrEJDQ53me3l5qXDhwo4ymRkxYoROnz7tGA4ePJjL0eO68/aWHn44bfD2dnU0AAAAcHM35V/3vr6+8vX1dXUYAAAAAG5Q+faMVHh4uCTpyJEjTtOPHDnimBceHq6jR486zU9JSdHJkycdZXCTsNul7dvTBrvd1dEAAADAzeXbRKp06dIKDw/X4sWLHdPOnDmjtWvXKjo6WpIUHR2tuLg4bdiwwVFmyZIlstvtqlu37nWPGS6UmHjx0r5LOhIBAAAA8oJLL+2Lj4/Xnj17HK/37dunTZs2qXDhwipZsqQGDx6sl156SeXLl1fp0qX1/PPPq3jx4mrbtq0kqXLlymrevLn69eunadOmKTk5WYMGDVLnzp2z7LEPAAAAAK6VSxOp9evX65577nG8Hjp0qCSpR48emjFjhp566imdO3dO/fv3V1xcnOrVq6cFCxbIz8/Pscwnn3yiQYMGqXHjxvLw8FCHDh30xhtvXPdtAQAAAHDzsBljjKuDcLUzZ84oODhYp0+fVlBQkKvDUalnvs+zuvePa5VndbvUhQtS/fpp4ytXSv7+ro0HAAAAN6Sc5gb59h4pAAAAAMivSKQAAAAAwCISKQAAAACw6KZ8IC/ckJeX1L//xXEAAAAgD/GLE+7B2/tiIgUAAADkMS7t
AwAAAACLOCMF92C3S/v3p42XKiV58B8BAAAA8g6JFNxDYqLUqVPaOM+RAgAAQB7jb3sAAAAAsIhECgAAAAAsIpECAAAAAItIpAAAAADAIhIpAAAAALCIRAoAAAAALKL7c7gHLy+pe/eL4wAAAEAe4hcn3IO3t/TEE66OAgAAADcJLu0DAAAAAIs4IwX3YLdLsbFp4+Hhkgf/EQAAACDvkEjBPSQmSvffnza+cqXk7+/aeAAAAODW+NseAAAAACwikQIAAAAAi0ikAAAAAMAiEikAAAAAsIhECgAAAAAsIpECAAAAAIvo/hzuwdNTeuCBi+MAAABAHiKRgnvw8ZGeftrVUQAAAOAmwaV9AAAAAGARZ6TgHoyR4uLSxkNCJJvNldEAAADAzZFIwT0kJEhNm6aNr1wp+fu7Nh4AAAC4NS7tAwAAAACLSKQAAAAAwCISKQAAAACwiEQKAAAAACwikQIAAAAAi0ikAAAAAMAiuj+He/D0lO677+I4AAAAkIdIpOAefHykUaNcHQUAAABuElzaBwAAAAAWcUYK7sEYKSEhbdzPT7LZXBsPAAAA3BpnpOAeEhKk+vXThvSECgAAAMgjJFIAAAAAYBGJFAAAAABYRCIFAAAAABaRSAEAAACARSRSAAAAAGARiRQAAAAAWMRzpOAePD2lxo0vjgMAAAB5iEQK7sHHR3r1VVdHAQAAgJsEl/YBAAAAgEUkUgAAAABgEYkU3MOFC1KdOmnDhQuujgYAAABujkQKAAAAACwikQIAAAAAi0ikAAAAAMAiEikAAAAAsIhECgAAAAAsIpECAAAAAIu8XB0AkCs8PaW77744DgAAAOQhEim4Bx8facoUV0cBAACAm0S+vrRv1KhRstlsTkOlSpUc8xMSEjRw4EAVKVJEAQEB6tChg44cOeLCiAEAAADcDPJ1IiVJVatW1eHDhx3DL7/84pg3ZMgQffvtt5ozZ46WL1+uQ4cOqX379i6MFgAAAMDNIN9f2ufl5aXw8PAM00+fPq3//e9/+vTTT3XvvfdKkqZPn67KlSvr119/1Z133nm9Q4UrXbggNW2aNr5woeTv79p4AAAA4Nby/Rmp3bt3q3jx4ipTpoy6deumAwcOSJI2bNig5ORkNWnSxFG2UqVKKlmypNasWZNtnYmJiTpz5ozTADeQkJA2AAAAAHksXydSdevW1YwZM7RgwQK988472rdvn+rXr6+zZ88qNjZWPj4+CgkJcVomLCxMsbGx2dY7duxYBQcHO4bIyMg83AoAAAAA7iZfX9rXokULx3iNGjVUt25dRUVF6YsvvpD/NVy6NWLECA0dOtTx+syZMyRTAAAAAHIsX5+RulxISIgqVKigPXv2KDw8XElJSYqLi3Mqc+TIkUzvqbqUr6+vgoKCnAYAAAAAyKkbKpGKj4/X3r17FRERodq1a8vb21uLFy92zN+1a5cOHDig6OhoF0YJAAAAwN3l60v7hg0bptatWysqKkqHDh3SyJEj5enpqS5duig4OFh9+vTR0KFDVbhwYQUFBemxxx5TdHQ0PfYBAAAAyFP5OpH6559/1KVLF504cULFihVTvXr19Ouvv6pYsWKSpEmTJsnDw0MdOnRQYmKiYmJiNHXqVBdHDZfw8JBq1bo4DgAAAOQhmzHGuDoIVztz5oyCg4N1+vTpfHG/VKlnvs+zuvePa5VndQMAAAA3upzmBvx1DwAAAAAWkUgBAAAAgEUkUnAPFy5ITZqkDRcuuDoaAAAAuLl83dkEYMllzxQDAAAA8gpnpAAAAADAIhIpAAAAALCIRAoAAAAALCKRAgAAAACLSKQAAAAAwCJ67YN78PCQqlS5OA4AAADkIRIpuAdfX2nWLFdHAQAAgJsEf90DAAAAgEUkUgAAAABgEYkU3ENCgtS6ddqQkODqaAAAAODmuEcK7sEY6fDhi+MAAABAHuKMFAAAAABYRCIFAAAAABaRSAEAAACARSRSAAAAAGARnU3cZEo9832e1r9/
XKs8rR8AAADID0ik4B5sNqlMmYvjAAAAQB4ikYJ78POTvvjC1VEAAADgJsE9UgAAAABgEYkUAAAAAFhEIgX3kJAgdeqUNiQkuDoaAAAAuDnukYJ7MEb666+L4wAAAEAe4owUAAAAAFhEIgUAAAAAFpFIAQAAAIBFJFIAAAAAYBGJFAAAAABYRK99cA82mxQRcXEcAAAAyEMkUnAPfn7St9+6OgoAAADcJEikkKtKPfN9ntW9f1yrPKsbAAAAsIJ7pAAAAADAIhIpuIfEROnhh9OGxERXRwMAAAA3x6V9cA92u7R9+8VxAAAAIA9xRgoAAAAALCKRAgAAAACLSKQAAAAAwCISKQAAAACwiEQKAAAAACyi1z64j5AQV0cAAACAmwSJFNyDv7+0aJGrowAAAMBNgkv7AAAAAMAiEikAAAAAsIhECu4hMVHq3z9tSEx0dTQAAABwc9wjBfdgt0u//35xHAAAAMhDnJECAAAAAItIpAAAAADAIhIpAAAAALCIRAoAAAAALKKzCdwwSj3zfZbzfJMTNeff05KkB577UYnevpbr3z+u1VXHBgAAgJsLiRTcRqKnj6tDAAAAwE2CRApuIdHbVw88NN7VYQAAAOAmwT1SAAAAAGARiRQAAAAAWMSlfXAL3qnJ+u/SDyVJr9zTW8me3i6OCAAAAO6MRApuwcNuV+1/dzjG5enigAAAAODWuLQPAAAAACwikQIAAAAAi9wmkXr77bdVqlQp+fn5qW7duvrtt99cHRIAAAAAN+UW90h9/vnnGjp0qKZNm6a6detq8uTJiomJ0a5duxQaGurq8ACVeuZ7V4dw1faPa+XqEAAAAPIdt0ikJk6cqH79+qlXr16SpGnTpun777/Xhx9+qGeeecbF0eFGcSMnOzeyG3m/k2QCAG4mefmdfSN+p97wiVRSUpI2bNigESNGOKZ5eHioSZMmWrNmTabLJCYmKjEx0fH69OnTkqQzZ87kbbA5ZE887+oQbjipyYmKt9vTxhPPy25PdXFE7iOvj4sbub3nl88MAACuh7z8zs5P36npsRhjsi13wydSx48fV2pqqsLCwpymh4WFaefOnZkuM3bsWI0ePTrD9MjIyDyJEdfH3ekjb3d3ZRhuJ3iyqyPIv9g3AADkjvz4nXr27FkFBwdnOf+GT6SuxogRIzR06FDHa7vdrpMnT6pIkSKy2WwujCwtA46MjNTBgwcVFBTk0ljgWrQFpKMt4FK0B6SjLeBStIfcY4zR2bNnVbx48WzL3fCJVNGiReXp6akjR444TT9y5IjCw8MzXcbX11e+vr5O00JCQvIqxKsSFBTEQQBJtAVcRFvApWgPSEdbwKVoD7kjuzNR6W747s99fHxUu3ZtLV682DHNbrdr8eLFio6OdmFkAAAAANzVDX9GSpKGDh2qHj16qE6dOrrjjjs0efJknTt3ztGLHwAAAADkJrdIpB588EEdO3ZML7zwgmJjY3XrrbdqwYIFGTqguBH4+vpq5MiRGS49xM2HtoB0tAVcivaAdLQFXIr2cP3ZzJX69QMAAAAAOLnh75ECAAAAgOuNRAoAAAAALCKRAgAAAACLSKQAAAAAwCISqXzk7bffVqlSpeTn56e6devqt99+c3VIuEajRo2SzWZzGipVquSYn5CQoIEDB6pIkSIKCAhQhw4dMjxc+sCBA2rVqpUKFCig0NBQDR8+XCkpKU5lli1bplq1asnX11flypXTjBkzrsfmIRsrVqxQ69atVbx4cdlsNs2bN89pvjFGL7zwgiIiIuTv768mTZpo9+7dTmVOnjypbt26KSgoSCEhIerTp4/i4+OdymzZskX169eXn5+fIiMjNX78+AyxzJkzR5UqVZKfn5+qV6+uH374Ide3F1m7Ulvo2bNnhs+J5s2bO5WhLbiHsWPH6vbbb1dgYKBCQ0PVtm1b7dq1y6nM9fxe4HeHa+WkPTRq1CjD58N//vMfpzK0BxcyyBdmz55tfHx8zIcffmi2bdtm
+vXrZ0JCQsyRI0dcHRquwciRI03VqlXN4cOHHcOxY8cc8//zn/+YyMhIs3jxYrN+/Xpz5513mrvuussxPyUlxVSrVs00adLEbNy40fzwww+maNGiZsSIEY4yf/31lylQoIAZOnSo2b59u3nzzTeNp6enWbBgwXXdVjj74YcfzLPPPmvmzp1rJJmvv/7aaf64ceNMcHCwmTdvntm8ebO5//77TenSpc2FCxccZZo3b25q1qxpfv31V7Ny5UpTrlw506VLF8f806dPm7CwMNOtWzezdetW89lnnxl/f3/z7rvvOsqsWrXKeHp6mvHjx5vt27eb5557znh7e5s//vgjz/cB0lypLfTo0cM0b97c6XPi5MmTTmVoC+4hJibGTJ8+3WzdutVs2rTJtGzZ0pQsWdLEx8c7ylyv7wV+d7heTtpDw4YNTb9+/Zw+H06fPu2YT3twLRKpfOKOO+4wAwcOdLxOTU01xYsXN2PHjnVhVLhWI0eONDVr1sx0XlxcnPH29jZz5sxxTNuxY4eRZNasWWOMSfsB5uHhYWJjYx1l3nnnHRMUFGQSExONMcY89dRTpmrVqk51P/jggyYmJiaXtwZX6/Ifz3a73YSHh5sJEyY4psXFxRlfX1/z2WefGWOM2b59u5Fk1q1b5yjz448/GpvNZv79919jjDFTp041hQoVcrQFY4x5+umnTcWKFR2vO3XqZFq1auUUT926dc0jjzySq9uInMkqkWrTpk2Wy9AW3NfRo0eNJLN8+XJjzPX9XuB3R/5zeXswJi2ReuKJJ7JchvbgWlzalw8kJSVpw4YNatKkiWOah4eHmjRpojVr1rgwMuSG3bt3q3jx4ipTpoy6deumAwcOSJI2bNig5ORkp/e9UqVKKlmypON9X7NmjapXr+70cOmYmBidOXNG27Ztc5S5tI70MrSd/Gvfvn2KjY11et+Cg4NVt25dp/c+JCREderUcZRp0qSJPDw8tHbtWkeZBg0ayMfHx1EmJiZGu3bt0qlTpxxlaB/537JlyxQaGqqKFSvq0Ucf1YkTJxzzaAvu6/Tp05KkwoULS7p+3wv87sifLm8P6T755BMVLVpU1apV04gRI3T+/HnHPNqDa3m5OgBIx48fV2pqqtNBIElhYWHauXOni6JCbqhbt65mzJihihUr6vDhwxo9erTq16+vrVu3KjY2Vj4+PgoJCXFaJiwsTLGxsZKk2NjYTNtF+rzsypw5c0YXLlyQv79/Hm0drlb6e5fZ+3bp+xoaGuo038vLS4ULF3YqU7p06Qx1pM8rVKhQlu0jvQ64XvPmzdW+fXuVLl1ae/fu1X//+1+1aNFCa9askaenJ23BTdntdg0ePFh33323qlWrJknX7Xvh1KlT/O7IZzJrD5LUtWtXRUVFqXjx4tqyZYuefvpp7dq1S3PnzpVEe3A1EikgD7Vo0cIxXqNGDdWtW1dRUVH64osvSHAASJI6d+7sGK9evbpq1KihsmXLatmyZWrcuLELI0NeGjhwoLZu3apffvnF1aEgH8iqPfTv398xXr16dUVERKhx48bau3evypYte73DxGW4tC8fKFq0qDw9PTP0ynPkyBGFh4e7KCrkhZCQEFWoUEF79uxReHi4kpKSFBcX51Tm0vc9PDw803aRPi+7MkFBQSRr+VT6e5fdMR8eHq6jR486zU9JSdHJkydzpX3w2ZJ/lSlTRkWLFtWePXsk0Rbc0aBBg/Tdd99p6dKlKlGihGP69fpe4HdH/pJVe8hM3bp1Jcnp84H24DokUvmAj4+PateurcWLFzum2e12LV68WNHR0S6MDLktPj5ee/fuVUREhGrXri1vb2+n933Xrl06cOCA432Pjo7WH3/84fQjauHChQoKClKVKlUcZS6tI70MbSf/Kl26tMLDw53etzNnzmjt2rVO731cXJw2bNjgKLNkyRLZ7XbHF2l0dLRWrFih5ORkR5mFCxeqYsWKKlSokKMM7ePG8s8//+jEiROKiIiQRFtw
J8YYDRo0SF9//bWWLFmS4XLM6/W9wO+O/OFK7SEzmzZtkiSnzwfagwu5urcLpJk9e7bx9fU1M2bMMNu3bzf9+/c3ISEhTr2w4Mbz5JNPmmXLlpl9+/aZVatWmSZNmpiiRYuao0ePGmPSurktWbKkWbJkiVm/fr2Jjo420dHRjuXTuzVt1qyZ2bRpk1mwYIEpVqxYpt2aDh8+3OzYscO8/fbbdH+eD5w9e9Zs3LjRbNy40UgyEydONBs3bjR///23MSat+/OQkBDzzTffmC1btpg2bdpk2v35bbfdZtauXWt++eUXU758eacur+Pi4kxYWJjp3r272bp1q5k9e7YpUKBAhi6vvby8zGuvvWZ27NhhRo4cSZfX11l2beHs2bNm2LBhZs2aNWbfvn1m0aJFplatWqZ8+fImISHBUQdtwT08+uijJjg42CxbtsypO+vz5887ylyv7wV+d7jeldrDnj17zIsvvmjWr19v9u3bZ7755htTpkwZ06BBA0cdtAfXIpHKR958801TsmRJ4+PjY+644w7z66+/ujokXKMHH3zQREREGB8fH3PLLbeYBx980OzZs8cx/8KFC2bAgAGmUKFCpkCBAqZdu3bm8OHDTnXs37/ftGjRwvj7+5uiRYuaJ5980iQnJzuVWbp0qbn11luNj4+PKVOmjJk+ffr12DxkY+nSpUZShqFHjx7GmLQu0J9//nkTFhZmfH19TePGjc2uXbuc6jhx4oTp0qWLCQgIMEFBQaZXr17m7NmzTmU2b95s6tWrZ3x9fc0tt9xixo0blyGWL774wlSoUMH4+PiYqlWrmu+//z7PthsZZdcWzp8/b5o1a2aKFStmvL29TVRUlOnXr1+GHy+0BfeQWTuQ5PSZfT2/F/jd4VpXag8HDhwwDRo0MIULFza+vr6mXLlyZvjw4U7PkTKG9uBKNmOMuX7nvwAAAADgxsc9UgAAAABgEYkUAAAAAFhEIgUAAAAAFpFIAQAAAIBFJFIAAAAAYBGJFAAAAABYRCIFAAAAABaRSAEAAACARSRSAHCDstlsmjdvXpbzS5UqpcmTJ+e4vhkzZigkJCTbMqNGjdKtt96a4zrzM6v750bx/PPPq3///q4OI1sLFizQrbfeKrvd7upQAOCqkUgBQD507NgxPfrooypZsqR8fX0VHh6umJgYrVq1Ksd1rFu3Ll/+oG7UqJEGDx7s6jDy7f65FrGxsZoyZYqeffZZV4eSrebNm8vb21uffPKJq0MBgKvm5eoAAAAZdejQQUlJSZo5c6bKlCmjI0eOaPHixTpx4kSO6yhWrFgeRuh6SUlJ8vHxcZpmjFFqaqq8vK789eaK/WMlvqvxwQcf6K677lJUVFSe1C9lvt+vRs+ePfXGG2+oe/fuuRAVAFx/nJECgHwmLi5OK1eu1Kuvvqp77rlHUVFRuuOOOzRixAjdf//9WS43cuRIRUREaMuWLZIyXro2ceJEVa9eXQULFlRkZKQGDBig+Pj4DPXMmzdP5cuXl5+fn2JiYnTw4MFs4/3ggw9UuXJl+fn5qVKlSpo6dWqWZXv27Knly5drypQpstlsstls2r9/vyRp69atatGihQICAhQWFqbu3bvr+PHjjmUbNWqkQYMGafDgwSpatKhiYmK0bNky2Ww2/fjjj6pdu7Z8fX31yy+/aO/evWrTpo3CwsIUEBCg22+/XYsWLXKK5fL9Y7PZ9MEHH6hdu3YqUKCAypcvr/nz52e77R999JHq1KmjwMBAhYeHq2vXrjp69Khjflbx2e12jR07VqVLl5a/v79q1qypL7/80rFcamqq+vTp45hfsWJFTZkyJdtYJGn27Nlq3bq107QFCxaoXr16CgkJUZEiRXTfffdp7969TmX++ecfdenSRYULF1bBggVVp04drV27VtLFyzk/+OADlS5dWn5+fpLS2mnfvn1VrFgxBQUF6d5779XmzZsddW7evFn33HOPAgMDFRQUpNq1a2v9+vWO+a1bt9b69esz
xAIANwoSKQDIZwICAhQQEKB58+YpMTHxiuWNMXrsscc0a9YsrVy5UjVq1Mi0nIeHh9544w1t27ZNM2fO1JIlS/TUU085lTl//rxefvllzZo1S6tWrVJcXJw6d+6c5bo/+eQTvfDCC3r55Ze1Y8cOvfLKK3r++ec1c+bMTMtPmTJF0dHR6tevnw4fPqzDhw8rMjJScXFxuvfee3Xbbbdp/fr1WrBggY4cOaJOnTo5LT9z5kz5+Pho1apVmjZtmmP6M888o3HjxmnHjh2qUaOG4uPj1bJlSy1evFgbN25U8+bN1bp1ax04cCDbfTl69Gh16tRJW7ZsUcuWLdWtWzedPHkyy/LJyckaM2aMNm/erHnz5mn//v3q2bNnhnKXxzd27FjNmjVL06ZN07Zt2zRkyBA99NBDWr58uSTJbrerRIkSmjNnjrZv364XXnhB//3vf/XFF19kGcvJkye1fft21alTx2n6uXPnNHToUK1fv16LFy+Wh4eH2rVr57g/KT4+Xg0bNtS///6r+fPna/PmzXrqqaec7l/as2ePvvrqK82dO1ebNm2SJD3wwAM6evSofvzxR23YsEG1atVS48aNHfurW7duKlGihNatW6cNGzbomWeekbe3t6POkiVLKiwsTCtXrsz2PQGAfMsAAPKdL7/80hQqVMj4+fmZu+66y4wYMcJs3rzZqYwkM2fOHNO1a1dTuXJl888//zjNj4qKMpMmTcpyHXPmzDFFihRxvJ4+fbqRZH799VfHtB07dhhJZu3atcYYY0aOHGlq1qzpmF+2bFnz6aefOtU7ZswYEx0dneV6GzZsaJ544okMyzRr1sxp2sGDB40ks2vXLsdyt912m1OZpUuXGklm3rx5Wa4vXdWqVc2bb77peH35/pFknnvuOcfr+Ph4I8n8+OOPV6w73bp164wkc/bs2SzjS0hIMAUKFDCrV692WrZPnz6mS5cuWdY9cOBA06FDhyznb9y40UgyBw4cyDbGY8eOGUnmjz/+MMYY8+6775rAwEBz4sSJTMuPHDnSeHt7m6NHjzqmrVy50gQFBZmEhASnsmXLljXvvvuuMcaYwMBAM2PGjGxjue2228yoUaOyLQMA+RVnpAAgH+rQoYMOHTqk+fPnq3nz5lq2bJlq1aqlGTNmOJUbMmSI1q5dqxUrVuiWW27Jts5FixapcePGuuWWWxQYGKju3bvrxIkTOn/+vKOMl5eXbr/9dsfrSpUqKSQkRDt27MhQ37lz57R371716dPHcRYtICBAL730kuXLtTZv3qylS5c61VOpUiVJcqqrdu3amS5/+VmY+Ph4DRs2TJUrV1ZISIgCAgK0Y8eOK56RuvRsXsGCBRUUFOR0qd7lNmzYoNatW6tkyZIKDAxUw4YNJSnDei6Nb8+ePTp//ryaNm3qtL2zZs1y2ta3335btWvXVrFixRQQEKD33nsv2/gvXLggSY5L79Lt3r1bXbp0UZkyZRQUFKRSpUo5xbhp0ybddtttKly4cJZ1R0VFOd1TtnnzZsXHx6tIkSJO27Bv3z7HNgwdOlR9+/ZVkyZNNG7cuEzbhL+/v1P7A4AbCZ1NAEA+5efnp6ZNm6pp06Z6/vnn1bdvX40cOdLp0rGmTZvqs88+008//aRu3bplWdf+/ft133336dFHH9XLL7+swoUL65dfflGfPn2UlJSkAgUKWI4v/f6q999/X3Xr1nWa5+npabmu1q1b69VXX80wLyIiwjFesGDBTJe/fPqwYcO0cOFCvfbaaypXrpz8/f3VsWNHJSUlZRvHpZeeSWn3TWXVRfe5c+cUExOjmJgYffLJJypWrJgOHDigmJiYDOu5NL70/fb9999nSH59fX0lpd3rNGzYML3++uuKjo5WYGCgJkyY4LhvKTNFixaVJJ06dcop6WndurWioqL0/vvvq3jx4rLb7apWrZojRn9//2z3yeXxp29DRESEli1blqFsehf6o0aNUteuXfX999/rxx9/1MiRIzV79my1
a9fOUfbkyZNu3ykKAPdFIgUAN4gqVapkeG7U/fffr9atW6tr167y9PTM8n6mDRs2yG636/XXX5eHR9rFCJndb5OSkqL169frjjvukCTt2rVLcXFxqly5coayYWFhKl68uP76669sk7jL+fj4KDU11WlarVq19NVXX6lUqVK50qPdqlWr1LNnT8eP9vj4eEenFrll586dOnHihMaNG6fIyEhJcupMIStVqlSRr6+vDhw44DiDdblVq1bprrvu0oABAxzTrnSWr2zZsgoKCtL27dtVoUIFSdKJEye0a9cuvf/++6pfv74k6ZdffnFarkaNGvrggw908uTJbM9KXapWrVqKjY2Vl5eX4wxXZipUqKAKFSpoyJAh6tKli6ZPn+54TxISErR3717ddtttOVonAOQ3XNoHAPnMiRMndO+99+rjjz/Wli1btG/fPs2ZM0fjx49XmzZtMpRv166dPvroI/Xq1cup57dLlStXTsnJyXrzzTf1119/6aOPPnLqrCGdt7e3HnvsMa1du1YbNmxQz549deeddzoSq8uNHj1aY8eO1RtvvKE///xTf/zxh6ZPn66JEydmuX2lSpXS2rVrtX//fh0/flx2u10DBw7UyZMn1aVLF61bt0579+7VTz/9pF69emVIunKifPnyjo4RNm/erK5du+b6w19LliwpHx8fxz6dP3++xowZc8XlAgMDNWzYMA0ZMkQzZ87U3r179fvvv+vNN990dNJRvnx5rV+/Xj/99JP+/PNPPf/881q3bl229Xp4eKhJkyZOiVKhQoVUpEgRvffee9qzZ4+WLFmioUOHOi3XpUsXhYeHq23btlq1apX++usvffXVV1qzZk2W62rSpImio6PVtm1b/fzzz9q/f79Wr16tZ599VuvXr9eFCxc0aNAgLVu2TH///bdWrVqldevWOSXkv/76q3x9fRUdHX3FfQYA+RGJFADkMwEBAapbt64mTZqkBg0aqFq1anr++efVr18/vfXWW5ku07FjR82cOVPdu3fX3LlzM8yvWbOmJk6cqFdffVXVqlXTJ598orFjx2YoV6BAAT399NPq2rWr7r77bgUEBOjzzz/PMta+ffvqgw8+0PTp01W9enU1bNhQM2bMUOnSpbNcZtiwYfL09FSVKlUcl8MVL15cq1atUmpqqpo1a6bq1atr8ODBCgkJcZxBs2LixIkqVKiQ7rrrLrVu3VoxMTGqVauW5XqyU6xYMc2YMUNz5sxRlSpVNG7cOL322ms5WnbMmDF6/vnnNXbsWFWuXFnNmzfX999/79hvjzzyiNq3b68HH3xQdevW1YkTJ5zOTmWlb9++mj17tiNp9PDw0OzZs7VhwwZVq1ZNQ4YM0YQJE5yW8fHx0c8//6zQ0FC1bNlS1atX17hx47K9PNNms+mHH35QgwYN1KtXL1WoUEGdO3fW33//rbCwMHl6eurEiRN6+OGHVaFCBXXq1EktWrTQ6NGjHXV89tln6tat21VdVgoA+YHNGGNcHQQAALh2xhjVrVvXcSldfnX8+HFVrFhR69evzzbpBoD8jDNSAAC4CZvNpvfee08pKSmuDiVb+/fv19SpU0miANzQOCMFAAAAABZxRgoAAAAALCKRAgAAAACLSKQAAAAAwCISKQAAAACwiEQKAAAAACwikQIAAAAAi0ikAAAAAMAiEikAAAAAsIhECgAAAAAs+j8i65ieTvVYLQAAAABJRU5ErkJggg==\n"},"metadata":{}}],"source":["plot_compare('SkiableTerrain_ac', 'Skiable terrain area (acres)')"]},{"cell_type":"markdown","metadata":{"id":"n7dVUVERXvBg"},"source":["Big Mountain is amongst the resorts with the largest amount of skiable 
terrain."]},{"cell_type":"markdown","metadata":{"id":"-zJEVNCxXvBg"},"source":["## 5.9 Modeling scenarios"]},{"cell_type":"markdown","metadata":{"id":"ed-viljDXvBg"},"source":["Big Mountain Resort has been reviewing potential scenarios for either cutting costs or increasing revenue (from ticket prices). Ticket price is not determined by any set of parameters; the resort is free to set whatever price it likes. However, the resort operates within a market where people pay more for certain facilities, and less for others. Knowing how well the facilities support a given ticket price is valuable business intelligence. This is where our model is useful.\n","\n","The business has shortlisted some options:\n","1. Permanently close up to 10 of the least used runs. This doesn't impact any other resort statistics.\n","2. Increase the vertical drop by adding a run to a point 150 feet lower down. This requires installing an additional chair lift to bring skiers back up, but no additional snow making coverage.\n","3. Same as option 2, but with 2 additional acres of snow making coverage.\n","4. Extend the longest run by 0.2 miles, to 3.5 miles, which requires 4 additional acres of snow making coverage.\n","\n","The expected number of visitors over the season is 350,000 and, on average, visitors ski for five days. 
Assume the provided data includes the additional lift that Big Mountain recently installed."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"yOJrHvlFXvBg"},"outputs":[],"source":["expected_visitors = 350_000"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":81},"id":"60cXoz4SXvBg","executionInfo":{"status":"ok","timestamp":1721139258813,"user_tz":240,"elapsed":176,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"69f60af2-3a54-46ea-dc1e-7a5c20428d80"},"outputs":[{"output_type":"execute_result","data":{"text/plain":[" vertical_drop Snow Making_ac total_chairs fastQuads Runs \\\n","151 2353 600.0 14 3 105.0 \n","\n"," LongestRun_mi trams SkiableTerrain_ac \n","151 3.3 0 3000.0 "],"text/html":["\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
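<table><thead><tr><th></th><th>vertical_drop</th><th>Snow Making_ac</th><th>total_chairs</th><th>fastQuads</th><th>Runs</th><th>LongestRun_mi</th><th>trams</th><th>SkiableTerrain_ac</th></tr></thead>
<tbody><tr><th>151</th><td>2353</td><td>600.0</td><td>14</td><td>3</td><td>105.0</td><td>3.3</td><td>0</td><td>3000.0</td></tr></tbody></table>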
\n","
\n","
\n","\n","
\n"," \n","\n"," \n","\n"," \n","
\n","\n","\n","
\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","summary":"{\n \"name\": \"big_mountain[all_feats]\",\n \"rows\": 1,\n \"fields\": [\n {\n \"column\": \"vertical_drop\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 2353,\n \"max\": 2353,\n \"num_unique_values\": 1,\n \"samples\": [\n 2353\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Snow Making_ac\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 600.0,\n \"max\": 600.0,\n \"num_unique_values\": 1,\n \"samples\": [\n 600.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"total_chairs\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 14,\n \"max\": 14,\n \"num_unique_values\": 1,\n \"samples\": [\n 14\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"fastQuads\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 3,\n \"max\": 3,\n \"num_unique_values\": 1,\n \"samples\": [\n 3\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Runs\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 105.0,\n \"max\": 105.0,\n \"num_unique_values\": 1,\n \"samples\": [\n 105.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"LongestRun_mi\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 3.3,\n \"max\": 3.3,\n \"num_unique_values\": 1,\n \"samples\": [\n 3.3\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"trams\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 0,\n \"max\": 0,\n \"num_unique_values\": 1,\n \"samples\": [\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"SkiableTerrain_ac\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 3000.0,\n 
\"max\": 3000.0,\n \"num_unique_values\": 1,\n \"samples\": [\n 3000.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":28}],"source":["all_feats = ['vertical_drop', 'Snow Making_ac', 'total_chairs', 'fastQuads',\n","             'Runs', 'LongestRun_mi', 'trams', 'SkiableTerrain_ac']\n","big_mountain[all_feats]"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"xM898tUJXvBh"},"outputs":[],"source":["#Code task 2#\n","#In this function, copy the Big Mountain data into a new data frame\n","#(Note we use .copy()!)\n","#And then for each feature, and each of its deltas (changes from the original),\n","#create the modified scenario dataframe (bm2) and make a ticket price prediction\n","#for it. The difference between the scenario's prediction and the current\n","#prediction is then calculated and returned.\n","#Complete the code to increment each feature by the associated delta\n","def predict_increase(features, deltas):\n","    \"\"\"Increase in modelled ticket price by applying delta to feature.\n","\n","    Arguments:\n","    features - list, names of the features in the ski_data dataframe to change\n","    deltas - list, the amounts by which to increase the values of the features\n","\n","    Outputs:\n","    Amount of increase in the predicted ticket price\n","    \"\"\"\n","\n","    bm2 = X_bm.copy()\n","    for f, d in zip(features, deltas):\n","        # Check if the column exists in the DataFrame before accessing it\n","        if f in bm2.columns:\n","            bm2[f] += d\n","        else:\n","            print(f\"Warning: Column '{f}' not found in the DataFrame.\")\n","    return model.predict(bm2).item() - model.predict(X_bm).item()"]},{"cell_type":"markdown","metadata":{"id":"J39S2AGJXvBh"},"source":["### 5.9.1 Scenario 1"]},{"cell_type":"markdown","metadata":{"id":"pujZUdEfXvBh"},"source":["Close up to 10 of the least used runs. 
The number of runs is the only parameter varying."]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"E53LAJ4dXvBh","executionInfo":{"status":"ok","timestamp":1721139279210,"user_tz":240,"elapsed":169,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"e591213a-365f-443f-8012-e78d61f5f60b"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["[-1, -2, -3, -4, -5, -6, -7, -8, -9, -10]"]},"metadata":{},"execution_count":30}],"source":["[i for i in range(-1, -11, -1)]"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"RmAEkll8XvBh","executionInfo":{"status":"ok","timestamp":1721139283517,"user_tz":240,"elapsed":187,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"74a24e5e-5d0f-4e68-866b-09e6eb50ab86"},"outputs":[{"output_type":"stream","name":"stdout","text":["Warning: Column 'Runs' not found in the DataFrame.\n","Warning: Column 'Runs' not found in the DataFrame.\n","Warning: Column 'Runs' not found in the DataFrame.\n","Warning: Column 'Runs' not found in the DataFrame.\n","Warning: Column 'Runs' not found in the DataFrame.\n","Warning: Column 'Runs' not found in the DataFrame.\n","Warning: Column 'Runs' not found in the DataFrame.\n","Warning: Column 'Runs' not found in the DataFrame.\n","Warning: Column 'Runs' not found in the DataFrame.\n","Warning: Column 'Runs' not found in the DataFrame.\n"]}],"source":["runs_delta = [i for i in range(-1, -11, -1)]\n","price_deltas = [predict_increase(['Runs'], [delta]) for delta in runs_delta]"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"X5rkyuC8XvBh","executionInfo":{"status":"ok","timestamp":1721139286426,"user_tz":240,"elapsed":174,"user":{"displayName":"Jesse 
Lindsey","userId":"09886266696052659215"}},"outputId":"3d37d9f4-c1a3-4b40-95ee-df9167e1daee"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]"]},"metadata":{},"execution_count":32}],"source":["price_deltas"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":487},"id":"-cPER_saXvBi","executionInfo":{"status":"ok","timestamp":1721139289677,"user_tz":240,"elapsed":409,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"42c6a2e0-77d8-4fbb-fcc0-65fbd71e83da"},"outputs":[{"output_type":"display_data","data":{"text/plain":["
"],"image/png":"\n"},"metadata":{}}],"source":["#Code task 3#\n","#Create two plots, side by side, for the predicted ticket price change (delta) for each\n","#condition (number of runs closed) in the scenario and the associated predicted revenue\n","#change on the assumption that each of the expected visitors buys 5 tickets\n","#There are two things to do here:\n","#1 - use a list comprehension to create a list of the number of runs closed from `runs_delta`\n","#2 - use a list comprehension to create a list of predicted revenue changes from `price_deltas`\n","runs_closed = [-1 * delta for delta in runs_delta] #1 Use delta instead of Runs\n","fig, ax = plt.subplots(1, 2, figsize=(10, 5))\n","fig.subplots_adjust(wspace=0.5)\n","ax[0].plot(runs_closed, price_deltas, 'o-')\n","ax[0].set(xlabel='Runs closed', ylabel='Change ($)', title='Ticket price')\n","revenue_deltas = [5 * expected_visitors * price for price in price_deltas] #2\n","ax[1].plot(runs_closed, revenue_deltas, 'o-')\n","ax[1].set(xlabel='Runs closed', ylabel='Change ($)', title='Revenue');"]},{"cell_type":"markdown","metadata":{"id":"gSXB2Kz7XvBi"},"source":["The model says closing one run makes no difference. Closing 2 and 3 successively reduces support for ticket price and so revenue. If Big Mountain closes down 3 runs, it seems they may as well close down 4 or 5 as there's no further loss in ticket price. 
Increasing the closures down to 6 or more leads to a large drop."]},{"cell_type":"markdown","metadata":{"id":"peYj8ZQLXvBi"},"source":["### 5.9.2 Scenario 2"]},{"cell_type":"markdown","metadata":{"id":"P05t5BJ9XvBi"},"source":["In this scenario, Big Mountain is adding a run, increasing the vertical drop by 150 feet, and installing an additional chair lift."]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"PqASbMB-XvBi","executionInfo":{"status":"ok","timestamp":1721139327457,"user_tz":240,"elapsed":151,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"5dc99106-384d-4731-8bd6-58827b6f21a2"},"outputs":[{"output_type":"stream","name":"stdout","text":["Warning: Column 'Runs' not found in the DataFrame.\n","Warning: Column 'total_chairs' not found in the DataFrame.\n"]}],"source":["#Code task 4#\n","#Call `predict_increase` with a list of the features 'Runs', 'vertical_drop', and 'total_chairs'\n","#and associated deltas of 1, 150, and 1\n","ticket2_increase = predict_increase(['Runs', 'vertical_drop', 'total_chairs'], [1, 150, 1])\n","revenue2_increase = 5 * expected_visitors * ticket2_increase"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"1Bec-XgAXvBi","executionInfo":{"status":"ok","timestamp":1721139333071,"user_tz":240,"elapsed":155,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"77ada47e-3fab-46ed-9cbc-7286735ce2be"},"outputs":[{"output_type":"stream","name":"stdout","text":["This scenario increases support for ticket price by $2.25\n","Over the season, this could be expected to amount to $3931729\n"]}],"source":["print(f'This scenario increases support for ticket price by ${ticket2_increase:.2f}')\n","print(f'Over the season, this could be expected to amount to ${revenue2_increase:.0f}')"]},{"cell_type":"markdown","metadata":{"id":"PgjuaZ6UXvBj"},"source":["### 5.9.3 
Scenario 3"]},{"cell_type":"markdown","metadata":{"id":"msoJZPDuXvBj"},"source":["In this scenario, you are repeating the previous one but adding 2 acres of snow making."]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"R7twETEIXvBj","executionInfo":{"status":"ok","timestamp":1721139339060,"user_tz":240,"elapsed":159,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"966c783a-8c45-4e20-8cc9-6ee80ecafc99"},"outputs":[{"output_type":"stream","name":"stdout","text":["Warning: Column 'Runs' not found in the DataFrame.\n","Warning: Column 'total_chairs' not found in the DataFrame.\n","Warning: Column 'Snow Making_ac' not found in the DataFrame.\n"]}],"source":["#Code task 5#\n","#Repeat scenario 2 conditions, but add an increase of 2 to `Snow Making_ac`\n","ticket3_increase = predict_increase(['Runs', 'vertical_drop', 'total_chairs', 'Snow Making_ac'], [1, 150, 1, 2])\n","revenue3_increase = 5 * expected_visitors * ticket3_increase"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"FFuGI-1_XvBj","executionInfo":{"status":"ok","timestamp":1721139354874,"user_tz":240,"elapsed":166,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"30ae6280-5833-4bb3-a928-5163162c71b8"},"outputs":[{"output_type":"stream","name":"stdout","text":["This scenario increases support for ticket price by $2.25\n","Over the season, this could be expected to amount to $3931729\n"]}],"source":["print(f'This scenario increases support for ticket price by ${ticket3_increase:.2f}')\n","print(f'Over the season, this could be expected to amount to ${revenue3_increase:.0f}')"]},{"cell_type":"markdown","metadata":{"id":"DJ9bJ8mRXvBj"},"source":["Such a small increase in the snow making area makes no difference!"]},{"cell_type":"markdown","metadata":{"id":"Ci4vvocBXvBj"},"source":["### 5.9.4 Scenario 
4"]},{"cell_type":"markdown","metadata":{"id":"8FYua3GJXvBj"},"source":["This scenario calls for increasing the longest run by 0.2 miles and guaranteeing its snow coverage by adding 4 acres of snow making capability."]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"XoK9KurgXvBk","executionInfo":{"status":"ok","timestamp":1721139474439,"user_tz":240,"elapsed":173,"user":{"displayName":"Jesse Lindsey","userId":"09886266696052659215"}},"outputId":"fbcb4fac-74d6-43a7-f713-f14a8d7c43d2"},"outputs":[{"output_type":"stream","name":"stdout","text":["Warning: Column 'LongestRun_mi' not found in the DataFrame.\n","Warning: Column 'Snow Making_ac' not found in the DataFrame.\n"]},{"output_type":"execute_result","data":{"text/plain":["0.0"]},"metadata":{},"execution_count":38}],"source":["#Code task 6#\n","#Predict the increase from adding 0.2 miles to `LongestRun_mi` and 4 to `Snow Making_ac`\n","predict_increase(['LongestRun_mi', 'Snow Making_ac'], [0.2, 4])"]},{"cell_type":"markdown","metadata":{"id":"13ZQzUIFXvBk"},"source":["No difference whatsoever. Although the longest run feature was used in the linear model, the random forest model (the one we chose because of its better performance) only has longest run way down in the feature importance list."]},{"cell_type":"markdown","metadata":{"id":"AT_pOGyaXvBk"},"source":["## 5.10 Summary"]},{"cell_type":"markdown","metadata":{"id":"N9aEGkAUXvBk"},"source":["**Q: 1** Write a summary of the results of modeling these scenarios. Start by stating the current position; how much does Big Mountain currently charge? What does your modelling suggest for a ticket price that could be supported in the marketplace by Big Mountain's facilities? How would you approach suggesting such a change to the business leadership? 
Discuss the additional operating cost of the new chair lift per ticket (on the basis of each visitor on average buying 5 day tickets) in the context of raising prices to cover this. For future improvements, state which, if any, of the modeled scenarios you'd recommend for further consideration. Suggest how the business might test, and progress, with any run closures."]},{"cell_type":"markdown","metadata":{"id":"9kPWSwcbXvBk"},"source":["**A: 1** Big Mountain Resort currently charges an average of 81 dollars per ticket. The model suggests an average ticket price of 82 dollars and 53 cents, with a mean absolute error of roughly 14 dollars and 31 cents. This estimate suggests room for a ticket price increase. The modeling also indicates that adding a new chair lift can increase support for a ticket price increase by about 2 dollars and 25 cents, which could be expected to amount to about 3 million 931 thousand 729 dollars over the season. Modeling also indicates that an increase in snow making area makes no difference. The model also indicates that closing one run makes no difference. Closing 2 or 3 runs successively reduces support for a ticket price increase and therefore revenue. Closing 4 or 5 runs indicates no further loss or gain in ticket price. Closing 6 or more runs indicates a large drop in support for a ticket price increase. I would recommend modeled scenario 2, which is an increase in the vertical drop by adding a run to a point 150 feet lower down but requiring the installation of an additional chair lift to bring skiers back up, without additional snow making coverage."]},{"cell_type":"markdown","metadata":{"id":"f1PtQXc1XvBk"},"source":["## 5.11 Further work"]},{"cell_type":"markdown","metadata":{"id":"ccmAjvTvXvBk"},"source":["**Q: 2** What next? Highlight any deficiencies in the data that hampered or limited this work. The only price data in our dataset were ticket prices. 
You were provided with information about the additional operating cost of the new chair lift, but what other cost information would be useful? Big Mountain was already fairly high on some of the league charts of facilities offered, but why was its modeled price so much higher than its current price? Would this mismatch come as a surprise to the business executives? How would you find out? Assuming the business leaders felt this model was useful, how would the business make use of it? Would you expect them to come to you every time they wanted to test a new combination of parameters in a scenario? We hope you would have better things to do, so how might this model be made available for business analysts to use and explore?"]},{"cell_type":"markdown","metadata":{"id":"ocyFtfSwXvBk"},"source":["**A: 2** The ‘Runs’ column not being found in the DataFrame limited the findings in this assignment. That column, along with the cost of each specific run, would be useful for understanding the data. The modeled price may be higher than the actual price because some of the competing resorts are overpriced, or because Big Mountain Resort is underpricing. Based on the data, the comparison of what is offered at Big Mountain with what is offered at other resorts, and the estimated revenue increase for Big Mountain, I think that the business executives would be surprised and pleased with this information. 
They could make use of this information by saving the file and altering the searches/information based on what findings are being requested."]}],"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.7.9"},"toc":{"base_numbering":1,"nav_menu":{},"number_sections":true,"sideBar":true,"skip_h1_title":false,"title_cell":"Table of Contents","title_sidebar":"Contents","toc_cell":false,"toc_position":{},"toc_section_display":true,"toc_window_display":true},"varInspector":{"cols":{"lenName":16,"lenType":16,"lenVar":40},"kernels_config":{"python":{"delete_cmd_postfix":"","delete_cmd_prefix":"del ","library":"var_list.py","varRefreshCmd":"print(var_dic_list())"},"r":{"delete_cmd_postfix":") ","delete_cmd_prefix":"rm(","library":"var_list.r","varRefreshCmd":"cat(var_dic_list()) "}},"types_to_exclude":["module","function","builtin_function_or_method","instance","_Feature"],"window_display":false},"colab":{"provenance":[{"file_id":"1VZUWflc8NeSSWJlQIJH33SkG2SQ5JaLj","timestamp":1721146759548}]}},"nbformat":4,"nbformat_minor":0} \ No newline at end of file diff --git a/copy-of-04_preprocessing_and_training.ipynb b/copy-of-04_preprocessing_and_training.ipynb new file mode 100644 index 000000000..896ca7414 --- /dev/null +++ b/copy-of-04_preprocessing_and_training.ipynb @@ -0,0 +1,5568 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3CoGE_woGC5a" + }, + "source": [ + "# 4 Pre-Processing and Training Data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pTx0aRgFGC5b" + }, + "source": [ + "## 4.1 Contents\n", + "* [4 Pre-Processing and Training 
Data](#4_Pre-Processing_and_Training_Data)\n", + " * [4.1 Contents](#4.1_Contents)\n", + " * [4.2 Introduction](#4.2_Introduction)\n", + " * [4.3 Imports](#4.3_Imports)\n", + " * [4.4 Load Data](#4.4_Load_Data)\n", + " * [4.5 Extract Big Mountain Data](#4.5_Extract_Big_Mountain_Data)\n", + " * [4.6 Train/Test Split](#4.6_Train/Test_Split)\n", + " * [4.7 Initial Not-Even-A-Model](#4.7_Initial_Not-Even-A-Model)\n", + " * [4.7.1 Metrics](#4.7.1_Metrics)\n", + " * [4.7.1.1 R-squared, or coefficient of determination](#4.7.1.1_R-squared,_or_coefficient_of_determination)\n", + " * [4.7.1.2 Mean Absolute Error](#4.7.1.2_Mean_Absolute_Error)\n", + " * [4.7.1.3 Mean Squared Error](#4.7.1.3_Mean_Squared_Error)\n", + " * [4.7.2 sklearn metrics](#4.7.2_sklearn_metrics)\n", + " * [4.7.2.0.1 R-squared](#4.7.2.0.1_R-squared)\n", + " * [4.7.2.0.2 Mean absolute error](#4.7.2.0.2_Mean_absolute_error)\n", + " * [4.7.2.0.3 Mean squared error](#4.7.2.0.3_Mean_squared_error)\n", + " * [4.7.3 Note On Calculating Metrics](#4.7.3_Note_On_Calculating_Metrics)\n", + " * [4.8 Initial Models](#4.8_Initial_Models)\n", + " * [4.8.1 Imputing missing feature (predictor) values](#4.8.1_Imputing_missing_feature_(predictor)_values)\n", + " * [4.8.1.1 Impute missing values with median](#4.8.1.1_Impute_missing_values_with_median)\n", + " * [4.8.1.1.1 Learn the values to impute from the train set](#4.8.1.1.1_Learn_the_values_to_impute_from_the_train_set)\n", + " * [4.8.1.1.2 Apply the imputation to both train and test splits](#4.8.1.1.2_Apply_the_imputation_to_both_train_and_test_splits)\n", + " * [4.8.1.1.3 Scale the data](#4.8.1.1.3_Scale_the_data)\n", + " * [4.8.1.1.4 Train the model on the train split](#4.8.1.1.4_Train_the_model_on_the_train_split)\n", + " * [4.8.1.1.5 Make predictions using the model on both train and test splits](#4.8.1.1.5_Make_predictions_using_the_model_on_both_train_and_test_splits)\n", + " * [4.8.1.1.6 Assess model performance](#4.8.1.1.6_Assess_model_performance)\n", + " * 
[4.8.1.2 Impute missing values with the mean](#4.8.1.2_Impute_missing_values_with_the_mean)\n", + " * [4.8.1.2.1 Learn the values to impute from the train set](#4.8.1.2.1_Learn_the_values_to_impute_from_the_train_set)\n", + " * [4.8.1.2.2 Apply the imputation to both train and test splits](#4.8.1.2.2_Apply_the_imputation_to_both_train_and_test_splits)\n", + " * [4.8.1.2.3 Scale the data](#4.8.1.2.3_Scale_the_data)\n", + " * [4.8.1.2.4 Train the model on the train split](#4.8.1.2.4_Train_the_model_on_the_train_split)\n", + " * [4.8.1.2.5 Make predictions using the model on both train and test splits](#4.8.1.2.5_Make_predictions_using_the_model_on_both_train_and_test_splits)\n", + " * [4.8.1.2.6 Assess model performance](#4.8.1.2.6_Assess_model_performance)\n", + " * [4.8.2 Pipelines](#4.8.2_Pipelines)\n", + " * [4.8.2.1 Define the pipeline](#4.8.2.1_Define_the_pipeline)\n", + " * [4.8.2.2 Fit the pipeline](#4.8.2.2_Fit_the_pipeline)\n", + " * [4.8.2.3 Make predictions on the train and test sets](#4.8.2.3_Make_predictions_on_the_train_and_test_sets)\n", + " * [4.8.2.4 Assess performance](#4.8.2.4_Assess_performance)\n", + " * [4.9 Refining The Linear Model](#4.9_Refining_The_Linear_Model)\n", + " * [4.9.1 Define the pipeline](#4.9.1_Define_the_pipeline)\n", + " * [4.9.2 Fit the pipeline](#4.9.2_Fit_the_pipeline)\n", + " * [4.9.3 Assess performance on the train and test set](#4.9.3_Assess_performance_on_the_train_and_test_set)\n", + " * [4.9.4 Define a new pipeline to select a different number of features](#4.9.4_Define_a_new_pipeline_to_select_a_different_number_of_features)\n", + " * [4.9.5 Fit the pipeline](#4.9.5_Fit_the_pipeline)\n", + " * [4.9.6 Assess performance on train and test data](#4.9.6_Assess_performance_on_train_and_test_data)\n", + " * [4.9.7 Assessing performance using cross-validation](#4.9.7_Assessing_performance_using_cross-validation)\n", + " * [4.9.8 Hyperparameter search using GridSearchCV](#4.9.8_Hyperparameter_search_using_GridSearchCV)\n", + 
" * [4.10 Random Forest Model](#4.10_Random_Forest_Model)\n", + " * [4.10.1 Define the pipeline](#4.10.1_Define_the_pipeline)\n", + " * [4.10.2 Fit and assess performance using cross-validation](#4.10.2_Fit_and_assess_performance_using_cross-validation)\n", + " * [4.10.3 Hyperparameter search using GridSearchCV](#4.10.3_Hyperparameter_search_using_GridSearchCV)\n", + " * [4.11 Final Model Selection](#4.11_Final_Model_Selection)\n", + " * [4.11.1 Linear regression model performance](#4.11.1_Linear_regression_model_performance)\n", + " * [4.11.2 Random forest regression model performance](#4.11.2_Random_forest_regression_model_performance)\n", + " * [4.11.3 Conclusion](#4.11.3_Conclusion)\n", + " * [4.12 Data quantity assessment](#4.12_Data_quantity_assessment)\n", + " * [4.13 Save best model object from pipeline](#4.13_Save_best_model_object_from_pipeline)\n", + " * [4.14 Summary](#4.14_Summary)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PK1m4ElWGC5d" + }, + "source": [ + "## 4.2 Introduction" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hu7BiJLzGC5d" + }, + "source": [ + "In preceding notebooks, performed preliminary assessments of data quality and refined the question to be answered. You found a small number of data values that gave clear choices about whether to replace values or drop a whole row. You determined that predicting the adult weekend ticket price was your primary aim. You threw away records with missing price data, but not before making the most of the other available data to look for any patterns between the states. You didn't see any and decided to treat all states equally; the state label didn't seem to be particularly useful.\n", + "\n", + "In this notebook you'll start to build machine learning models. Before even starting with learning a machine learning model, however, start by considering how useful the mean value is as a predictor. This is more than just a pedagogical device. 
You never want to go to stakeholders with a machine learning model only to have the CEO point out that it performs worse than just guessing the average! Your first model is a baseline performance comparator for any subsequent model. You then build up the process of efficiently and robustly creating and assessing models against it. The development we lay out may be a little slower than in the real world, but this step of the capstone is definitely more than just instructional. It is good practice to build up an understanding that the machine learning pipelines you build work as expected. You can validate steps with your own functions for checking expected equivalence between, say, pandas and sklearn implementations." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Kws2GzlEGC5d" + }, + "source": [ + "## 4.3 Imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3ErzW-67GC5e" + }, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "import os\n", + "import pickle\n", + "import matplotlib.pyplot as plt\n", + "import seaborn as sns\n", + "from sklearn import __version__ as sklearn_version\n", + "from sklearn.decomposition import PCA\n", + "from sklearn.preprocessing import scale\n", + "from sklearn.model_selection import train_test_split, cross_validate, GridSearchCV, learning_curve\n", + "from sklearn.preprocessing import StandardScaler, MinMaxScaler\n", + "from sklearn.dummy import DummyRegressor\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.ensemble import RandomForestRegressor\n", + "from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.impute import SimpleImputer\n", + "from sklearn.feature_selection import SelectKBest, f_regression\n", + "import datetime\n", + "\n", + "from sb_utils import save_file" + ] + }, + { + "cell_type": "markdown", + 
"metadata": { + "id": "v7uXe1oRGC5f" + }, + "source": [ + "## 4.4 Load Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "id": "hP4qHm72GC5f", + "outputId": "28209102-a56b-4908-9cb6-afd695066a40", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 896 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " 0 1 2 \\\n", + "Name Alyeska Resort Eaglecrest Ski Area Hilltop Ski Area \n", + "Region Alaska Alaska Alaska \n", + "state Alaska Alaska Alaska \n", + "summit_elev 3939 2600 2090 \n", + "vertical_drop 2500 1540 294 \n", + "base_elev 250 1200 1796 \n", + "trams 1 0 0 \n", + "fastEight 0.0 0.0 0.0 \n", + "fastSixes 0 0 0 \n", + "fastQuads 2 0 0 \n", + "quad 2 0 0 \n", + "triple 0 0 1 \n", + "double 0 4 0 \n", + "surface 2 0 2 \n", + "total_chairs 7 4 3 \n", + "Runs 76.0 36.0 13.0 \n", + "TerrainParks 2.0 1.0 1.0 \n", + "LongestRun_mi 1.0 2.0 1.0 \n", + "SkiableTerrain_ac 1610.0 640.0 30.0 \n", + "Snow Making_ac 113.0 60.0 30.0 \n", + "daysOpenLastYear 150.0 45.0 150.0 \n", + "yearsOpen 60.0 44.0 36.0 \n", + "averageSnowfall 669.0 350.0 69.0 \n", + "AdultWeekday 65.0 47.0 30.0 \n", + "AdultWeekend 85.0 53.0 34.0 \n", + "projectedDaysOpen 150.0 90.0 152.0 \n", + "NightSkiing_ac 550.0 NaN 30.0 \n", + "\n", + " 3 4 \n", + "Name Arizona Snowbowl Sunrise Park Resort \n", + "Region Arizona Arizona \n", + "state Arizona Arizona \n", + "summit_elev 11500 11100 \n", + "vertical_drop 2300 1800 \n", + "base_elev 9200 9200 \n", + "trams 0 0 \n", + "fastEight 0.0 NaN \n", + "fastSixes 1 0 \n", + "fastQuads 0 1 \n", + "quad 2 2 \n", + "triple 2 3 \n", + "double 1 1 \n", + "surface 2 0 \n", + "total_chairs 8 7 \n", + "Runs 55.0 65.0 \n", + "TerrainParks 4.0 2.0 \n", + "LongestRun_mi 2.0 1.2 \n", + "SkiableTerrain_ac 777.0 800.0 \n", + "Snow Making_ac 104.0 80.0 \n", + "daysOpenLastYear 122.0 115.0 \n", + "yearsOpen 81.0 49.0 \n", + "averageSnowfall 260.0 250.0 \n", + 
"AdultWeekday 89.0 74.0 \n", + "AdultWeekend 89.0 78.0 \n", + "projectedDaysOpen 122.0 104.0 \n", + "NightSkiing_ac NaN 80.0 " + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "ski_data" + } + }, + "metadata": {}, + "execution_count": 3 + } + ], + "source": [ + "ski_data = pd.read_csv('https://raw.githubusercontent.com/springboard-curriculum/DataScienceGuidedCapstone/master/raw_data/ski_resort_data.csv')\n", + "ski_data.head().T" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DF3ErcM2GC5g" + }, + "source": [ + "## 4.5 Extract Big Mountain Data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a1ztZZJaGC5g" + }, + "source": [ + "Big Mountain is your resort. Separate it from the rest of the data to use later." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Jx3b3TsqGC5g" + }, + "outputs": [], + "source": [ + "big_mountain = ski_data[ski_data.Name == 'Big Mountain Resort']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "C9EQfETrGC5g", + "outputId": "a35ecd1d-f6dd-4f24-fb1c-040731f9991f", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 896 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " 151\n", + "Name Big Mountain Resort\n", + "Region Montana\n", + "state Montana\n", + "summit_elev 6817\n", + "vertical_drop 2353\n", + "base_elev 4464\n", + "trams 0\n", + "fastEight 0.0\n", + "fastSixes 0\n", + "fastQuads 3\n", + "quad 2\n", + "triple 6\n", + "double 0\n", + "surface 3\n", + "total_chairs 14\n", + "Runs 105.0\n", + "TerrainParks 4.0\n", + "LongestRun_mi 3.3\n", + "SkiableTerrain_ac 3000.0\n", + "Snow Making_ac 600.0\n", + "daysOpenLastYear 123.0\n", + "yearsOpen 72.0\n", + "averageSnowfall 333.0\n", + "AdultWeekday 81.0\n", + "AdultWeekend 81.0\n", + "projectedDaysOpen 123.0\n", + "NightSkiing_ac 600.0" + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"big_mountain\",\n \"rows\": 27,\n \"fields\": [\n {\n \"column\": 151,\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 19,\n \"samples\": [\n \"Big Mountain Resort\",\n 0,\n 4.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 5 + } + ], + "source": [ + "big_mountain.T" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UdrDpfazGC5h", + "outputId": "edde232b-3d0c-4ce7-e930-79f9196de32c", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(330, 27)" + ] + }, + "metadata": {}, + "execution_count": 6 + } + ], + "source": [ + "ski_data.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "BHdNVSFJGC5h" + }, + "outputs": [], + "source": [ + "ski_data = ski_data[ski_data.Name != 'Big Mountain Resort']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "qcegR2OiGC5h", + "outputId": "544eb919-2143-4917-ab57-df20ec44621a", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(329, 27)" + ] + }, + "metadata": {}, + "execution_count": 8 + } + ], + "source": [ + "ski_data.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "w79lyK8DGC5i" + }, + "source": [ + "## 4.6 Train/Test Split" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yquy-0UNGC5i" + }, + "source": [ + "So far, you've treated ski resort data as a single entity. In machine learning, when you train your model on all of your data, you end up with no data set aside to evaluate model performance. 
You could keep making more and more complex models that fit the data better and better and not realise you were overfitting to that one set of samples. By partitioning the data into training and testing splits, without letting a model (or missing-value imputation) learn anything about the test split, you have a somewhat independent assessment of how your model might perform in the future. An often overlooked subtlety here is that people all too frequently use the test set to assess model performance _and then compare multiple models to pick the best_. This means their overall model selection process is fitting to one specific data set, now the test split. You could keep going, trying to get better and better performance on that one data set, but that's where cross-validation becomes especially useful. While training models, a test split is very useful as a final check on expected future performance." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vd2AHm-5GC5i" + }, + "source": [ + "What partition sizes would you have with a 70/30 train/test split?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "P117wBQ7GC5i", + "outputId": "f1241f44-b67f-4718-c469-5186b2e4c11c", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(230.29999999999998, 98.7)" + ] + }, + "metadata": {}, + "execution_count": 9 + } + ], + "source": [ + "len(ski_data) * .7, len(ski_data) * .3" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Vz_1LHNDGC5j" + }, + "outputs": [], + "source": [ + "X_train, X_test, y_train, y_test = train_test_split(ski_data.drop(columns='AdultWeekend'),\n", + " ski_data.AdultWeekend, test_size=0.3,\n", + " random_state=47)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "BZLkeJtlGC5j", + "outputId": "383805da-9062-4f5a-8764-e75591a5bdf5", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "((230, 26), (99, 26))" + ] + }, + "metadata": {}, + "execution_count": 11 + } + ], + "source": [ + "X_train.shape, X_test.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "OzYM68KFGC5j", + "outputId": "99f064d0-41a5-4dcc-e096-0eaf131de96c", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "((230,), (99,))" + ] + }, + "metadata": {}, + "execution_count": 12 + } + ], + "source": [ + "y_train.shape, y_test.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "kpTR4dddGC5j" + }, + "outputs": [], + "source": [ + "#Code task 1#\n", + "#Save the 'Name', 'state', and 'Region' columns from the train/test data into names_train and names_test\n", + "#Then drop those columns from `X_train` and `X_test`. 
Use 'inplace=True'\n", + "names_list = ['Name', 'state', 'Region']\n", + "names_train = X_train[names_list]\n", + "names_test = X_test[names_list]\n", + "X_train.drop(columns=names_list, inplace=True)\n", + "X_test.drop(columns=names_list, inplace=True)\n", + "X_train.shape, X_test.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IxoNlzjcGC5k", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "7abc004e-214d-42b3-849b-4e723355f750" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "Name object\n", + "Region object\n", + "state object\n", + "summit_elev int64\n", + "vertical_drop int64\n", + "base_elev int64\n", + "trams int64\n", + "fastEight float64\n", + "fastSixes int64\n", + "fastQuads int64\n", + "quad int64\n", + "triple int64\n", + "double int64\n", + "surface int64\n", + "total_chairs int64\n", + "Runs float64\n", + "TerrainParks float64\n", + "LongestRun_mi float64\n", + "SkiableTerrain_ac float64\n", + "Snow Making_ac float64\n", + "daysOpenLastYear float64\n", + "yearsOpen float64\n", + "averageSnowfall float64\n", + "AdultWeekday float64\n", + "projectedDaysOpen float64\n", + "NightSkiing_ac float64\n", + "dtype: object" + ] + }, + "metadata": {}, + "execution_count": 13 + } + ], + "source": [ + "#Code task 2#\n", + "#Check the `dtypes` attribute of `X_train` to verify all features are numeric\n", + "X_train.dtypes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7r4MaiNiGC5k", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "2c7d88d7-7408-466b-bb31-cbeff151d7bd" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "Name object\n", + "Region object\n", + "state object\n", + "summit_elev int64\n", + "vertical_drop int64\n", + "base_elev int64\n", + "trams int64\n", + "fastEight float64\n", + "fastSixes int64\n", + "fastQuads int64\n", + "quad 
int64\n", + "triple int64\n", + "double int64\n", + "surface int64\n", + "total_chairs int64\n", + "Runs float64\n", + "TerrainParks float64\n", + "LongestRun_mi float64\n", + "SkiableTerrain_ac float64\n", + "Snow Making_ac float64\n", + "daysOpenLastYear float64\n", + "yearsOpen float64\n", + "averageSnowfall float64\n", + "AdultWeekday float64\n", + "projectedDaysOpen float64\n", + "NightSkiing_ac float64\n", + "dtype: object" + ] + }, + "metadata": {}, + "execution_count": 14 + } + ], + "source": [ + "#Code task 3#\n", + "#Repeat this check for the test split in `X_test`\n", + "X_test.dtypes" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Br24BVWSGC5k" + }, + "source": [ + "You have only numeric features in your X now!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d8gYY4UqGC5k" + }, + "source": [ + "## 4.7 Initial Not-Even-A-Model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iaIhqjRYGC5k" + }, + "source": [ + "A good place to start is to see how good the mean is as a predictor. In other words, what if you simply say your best guess is the average price?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "DjCfyFmOGC5k", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "068d847a-5277-4cc3-94ff-4f0b6160f94f" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "64.5370618556701" + ] + }, + "metadata": {}, + "execution_count": 15 + } + ], + "source": [ + "#Code task 4#\n", + "#Calculate the mean of `y_train`\n", + "train_mean = y_train.mean()\n", + "train_mean" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QZ6EBcHMGC5l" + }, + "source": [ + "`sklearn`'s `DummyRegressor` easily does this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "GmGGBUE8GC5l", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "571b0cfa-b00a-41e3-d592-a30b1436c272" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[64.53706186]])" + ] + }, + "metadata": {}, + "execution_count": 21 + } + ], + "source": [ + "#Code task 5#\n", + "#Fit the dummy regressor on the training data\n", + "#Hint, call its `.fit()` method with `X_train` and `y_train` as arguments\n", + "#Then print the object's `constant_` attribute and verify it's the same as the mean above\n", + "dumb_reg = DummyRegressor(strategy='mean')\n", + "y_train_clean = y_train.dropna() # Remove rows with NaN values in y_train\n", + "X_train_clean = X_train.loc[y_train_clean.index] # Subset X_train to match the non-NaN rows\n", + "dumb_reg.fit(X_train_clean, y_train_clean)\n", + "dumb_reg.constant_" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i-oafl2cGC5l" + }, + "source": [ + "How good is this? How closely does this match, or explain, the actual values? There are many ways of assessing how good one set of values agrees with another, which brings us to the subject of metrics." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "K77RpsSsGC5l" + }, + "source": [ + "### 4.7.1 Metrics" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "o25DjTzvGC5l" + }, + "source": [ + "#### 4.7.1.1 R-squared, or coefficient of determination" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "B6iM0dMFGC5l" + }, + "source": [ + "One measure is $R^2$, the [coefficient of determination](https://en.wikipedia.org/wiki/Coefficient_of_determination). This is a measure of the proportion of variance in the dependent variable (our ticket price) that is predicted by our \"model\". The linked Wikipedia article gives a nice explanation of how negative values can arise. This is frequently a cause of confusion for newcomers who, reasonably, ask how a squared value can be negative.\n", + "\n", + "Recall the mean can be denoted by $\\bar{y}$, where\n", + "\n", + "$$\\bar{y} = \\frac{1}{n}\\sum_{i=1}^ny_i$$\n", + "\n", + "and where $y_i$ are the individual values of the dependent variable.\n", + "\n", + "The total sum of squares (error) can be expressed as\n", + "\n", + "$$SS_{tot} = \\sum_i(y_i-\\bar{y})^2$$\n", + "\n", + "The above formula should be familiar as it's simply the variance without the denominator to scale (divide) by the sample size.\n", + "\n", + "The residual sum of squares is similarly defined to be\n", + "\n", + "$$SS_{res} = \\sum_i(y_i-\\hat{y}_i)^2$$\n", + "\n", + "where $\\hat{y}_i$ are our predicted values for the dependent variable.\n", + "\n", + "The coefficient of determination, $R^2$, here is given by\n", + "\n", + "$$R^2 = 1 - \\frac{SS_{res}}{SS_{tot}}$$\n", + "\n", + "Putting it into words, it's one minus the ratio of the residual variance to the original variance. Thus, the baseline model here, which always predicts $\\bar{y}$, should give $R^2=0$. A model that perfectly predicts the observed values would have no residual error and so give $R^2=1$. 
Models that do worse than predicting the mean will have increased the sum of squares of residuals and so produce a negative $R^2$." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "34Q6oEucGC5m" + }, + "outputs": [], + "source": [ + "#Code task 6#\n", + "#Calculate the R^2 as defined above\n", + "def r_squared(y, ypred):\n", + " \"\"\"R-squared score.\n", + "\n", + " Calculate the R-squared, or coefficient of determination, of the input.\n", + "\n", + " Arguments:\n", + " y -- the observed values\n", + " ypred -- the predicted values\n", + " \"\"\"\n", + " ybar = np.sum(y) / len(y) #yes, we could use np.mean(y)\n", + " sum_sq_tot = np.sum((y - ybar)**2) #total sum of squares error\n", + " sum_sq_res = np.sum((y - ypred)**2) #residual sum of squares error\n", + " R2 = 1.0 - sum_sq_res / sum_sq_tot #one minus residual over total, as in the formula above\n", + " return R2" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "n3_ad9CbGC5m" + }, + "source": [ + "Make your predictions by creating an array of length the size of the training set with the single value of the mean." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "L9YDw26-GC5m", + "outputId": "f94a92eb-565a-4fa2-9b69-914c5fd95a39", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([64.53706186, 64.53706186, 64.53706186, 64.53706186, 64.53706186])" + ] + }, + "metadata": {}, + "execution_count": 23 + } + ], + "source": [ + "y_tr_pred_ = train_mean * np.ones(len(y_train))\n", + "y_tr_pred_[:5]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WzwgNQGRGC5m" + }, + "source": [ + "Remember the `sklearn` dummy regressor?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oTjPeahxGC5s", + "outputId": "77dc6c54-b9a9-43bc-8b2f-54ca3fbf1b68", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([64.53706186, 64.53706186, 64.53706186, 64.53706186, 64.53706186])" + ] + }, + "metadata": {}, + "execution_count": 24 + } + ], + "source": [ + "y_tr_pred = dumb_reg.predict(X_train)\n", + "y_tr_pred[:5]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BOXnR9XrGC5s" + }, + "source": [ + "You can see that `DummyRegressor` produces exactly the same results and saves you having to mess about broadcasting the mean (or whichever other statistic we used - check out the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyRegressor.html) to see what's available) to an array of the appropriate length. It also gives you an object with `fit()` and `predict()` methods as well so you can use them as conveniently as any other `sklearn` estimator." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3Md1ngf1GC5s", + "outputId": "4d507e71-e552-4392-b419-ce4fa7ecebbc", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "-0.15644844476481623" + ] + }, + "metadata": {}, + "execution_count": 25 + } + ], + "source": [ + "r_squared(y_train, y_tr_pred)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RPcxUSB1GC5t" + }, + "source": [ + "Exactly as expected, if you use the average value as your prediction, you get an $R^2$ of zero _on our training set_. What if you use this \"model\" to predict unseen values from the test set? Remember, of course, that your \"model\" is trained on the training set; you still use the training set mean as your prediction." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kWMVJH_DGC5t" + }, + "source": [ + "Make your predictions by creating an array of length the size of the test set with the single value of the (training) mean." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "xmmYDSFVGC5t", + "outputId": "d90052bc-fe12-4907-e539-8fbc292f93e1", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "-0.18384542904650614" + ] + }, + "metadata": {}, + "execution_count": 26 + } + ], + "source": [ + "y_te_pred = train_mean * np.ones(len(y_test))\n", + "r_squared(y_test, y_te_pred)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_iTc1viEGC5t" + }, + "source": [ + "Generally, you can expect performance on a test set to be slightly worse than on the training set. As you are getting an $R^2$ of zero on the training set, there's nowhere to go but negative!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "w3p4qIgRGC5t" + }, + "source": [ + "Although $R^2$ is a common metric, and interpretable in terms of the amount of variance explained, it's less appealing if you want an idea of how \"close\" your predictions are to the true values. Two metrics that summarise the difference between predicted and actual values are _mean absolute error_ and _mean squared error_." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N3_-S015GC5u" + }, + "source": [ + "#### 4.7.1.2 Mean Absolute Error" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "42lbTNmjGC5u" + }, + "source": [ + "This is very simply the average of the absolute errors:\n", + "\n", + "$$MAE = \\frac{1}{n}\\sum_{i=1}^n|y_i - \\hat{y}_i|$$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0UYJMVAYGC5u" + }, + "outputs": [], + "source": [ + "#Code task 7#\n", + "#Calculate the MAE as defined above\n", + "def mae(y, ypred):\n", + " \"\"\"Mean absolute error.\n", + "\n", + " Calculate the mean absolute error of the arguments\n", + "\n", + " Arguments:\n", + " y -- the observed values\n", + " ypred -- the predicted values\n", + " \"\"\"\n", + " abs_error = np.abs(y - ypred) #use the function's arguments, not the global train arrays\n", + " mae = np.mean(abs_error)\n", + " return mae" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bG4wIiGcGC5u", + "outputId": "45284b45-714e-4d15-d64d-7962d701aa46", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "18.717135189711982" + ] + }, + "metadata": {}, + "execution_count": 28 + } + ], + "source": [ + "mae(y_train, y_tr_pred)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "rAj4GrUgGC5u", + "outputId": "8f846fae-f3b8-484b-a5c0-5d82123bcd54", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "18.717135189711982" + ] + }, + "metadata": {}, + "execution_count": 29 + } + ], + "source": [ + "mae(y_test, y_te_pred)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6WXCrqh5GC5v" + }, + "source": [ + "Mean absolute error is arguably the most intuitive of all the metrics. 
This essentially tells you that, on average, you might expect to be off by 
around \\\\$19 if you guessed ticket price based on an average of known values." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-2a1BlzHGC5v" + }, + "source": [ + "#### 4.7.1.3 Mean Squared Error" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xOmD0rstGC5y" + }, + "source": [ + "Another common metric (and an important one internally for optimizing machine learning models) is the mean squared error. This is simply the average of the square of the errors:\n", + "\n", + "$$MSE = \\frac{1}{n}\\sum_i^n(y_i - \\hat{y})^2$$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true, + "id": "fRvXO1RnGC5y" + }, + "outputs": [], + "source": [ + "#Code task 8#\n", + "#Calculate the MSE as defined above\n", + "def mse(y, ypred):\n", + " \"\"\"Mean square error.\n", + "\n", + " Calculate the mean square error of the arguments\n", + "\n", + " Arguments:\n", + " y -- the observed values\n", + " ypred -- the predicted values\n", + " \"\"\"\n", + " sq_error = (y_train - y_tr_pred)**2\n", + " mse = np.mean(sq_error)\n", + " return mse" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "szd0zqumGC5y", + "outputId": "1b337fd0-b2d3-417b-8ff5-ffa5614046f6", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "652.2235238415348" + ] + }, + "metadata": {}, + "execution_count": 31 + } + ], + "source": [ + "mse(y_train, y_tr_pred)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "x5jEgX73GC5y", + "outputId": "77ffde96-527e-43bd-ad2c-746d4dfa4728", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "652.2235238415348" + ] + }, + "metadata": {}, + "execution_count": 32 + } + ], + "source": [ + "mse(y_test, y_te_pred)" + ] + }, + { + "cell_type": 
"markdown", + "metadata": { + "id": "iNzlDw8oGC5z" + }, + "source": [ + "So here, you get a slightly better MSE on the test set than you did on the train set. And what does a squared error mean anyway? To convert this back to our measurement space, we often take the square root, to form the _root mean square error_ thus:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "4OkH5_MfGC5z", + "outputId": "f1cb4d24-3222-497e-977f-8325d4620d2e", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([25.53866723, 25.53866723])" + ] + }, + "metadata": {}, + "execution_count": 33 + } + ], + "source": [ + "np.sqrt([mse(y_train, y_tr_pred), mse(y_test, y_te_pred)])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hJTztWA2GC5z" + }, + "source": [ + "### 4.7.2 sklearn metrics" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kdqzOj0LGC5z" + }, + "source": [ + "Functions are good, but you don't want to have to define functions every time we want to assess performance. `sklearn.metrics` provides many commonly used metrics, included the ones above." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tuxkdsw3GC5z" + }, + "source": [ + "##### 4.7.2.0.1 R-squared" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "yOKXPO87GC5z", + "outputId": "b7cf05bf-f703-4ce4-e7d5-6e2358a6d153", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "NaNs in y_train: True\n", + "NaNs in y_test: True\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(0.0, -0.0049471305923192155)" + ] + }, + "metadata": {}, + "execution_count": 37 + } + ], + "source": [ + "# Check for NaN values in y_train and y_test\n", + "print(\"NaNs in y_train:\", np.isnan(y_train).any())\n", + "print(\"NaNs in y_test:\", np.isnan(y_test).any())\n", + "\n", + "# If NaNs are present, handle them (e.g., impute or remove) before calling r2_score\n", + "# Example: Impute NaNs with the mean\n", + "y_train_imputed = np.nan_to_num(y_train, nan=np.nanmean(y_train))\n", + "y_test_imputed = np.nan_to_num(y_test, nan=np.nanmean(y_test))\n", + "\n", + "# Calculate R^2 scores using the imputed arrays\n", + "r2_score(y_train_imputed, y_tr_pred), r2_score(y_test_imputed, y_te_pred)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BSO4v_ILGC50" + }, + "source": [ + "##### 4.7.2.0.2 Mean absolute error" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3TmASnF0GC50", + "outputId": "340a6005-60dd-4046-ab22-db6ef2fd9524", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(15.787496638278801, 15.071073585635439)" + ] + }, + "metadata": {}, + "execution_count": 40 + } + ], + "source": [ + "mean_absolute_error(y_train_imputed, y_tr_pred), mean_absolute_error(y_test_imputed, y_te_pred)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"nzd8tmhaGC50" + }, + "source": [ + "##### 4.7.2.0.3 Mean squared error" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "G4KfHJbVGC50", + "outputId": "503956c8-4983-437a-ce6a-0b1c1741d71e", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(550.1363635880772, 412.9315066406178)" + ] + }, + "metadata": {}, + "execution_count": 41 + } + ], + "source": [ + "mean_squared_error(y_train_imputed, y_tr_pred), mean_squared_error(y_test_imputed, y_te_pred)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DSHt-LEaGC50" + }, + "source": [ + "### 4.7.3 Note On Calculating Metrics" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aPA8rBvJGC50" + }, + "source": [ + "When calling functions to calculate metrics, it is important to take care in the order of the arguments. Two of the metrics above actually don't care if the arguments are reversed; one does. Which one cares?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UBA1zcqdGC51" + }, + "source": [ + "In a Jupyter code cell, running `r2_score?` will bring up the docstring for the function, and `r2_score??` will bring up the actual code of the function! Try them and compare the source for `sklearn`'s function with yours. Feel free to explore what happens when you reverse the order of the arguments and compare behaviour of `sklearn`'s function and yours." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cn2uK2SrGC51", + "outputId": "264287b3-795f-41c4-fc54-838606355222", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(0.0, 0.0)" + ] + }, + "metadata": {}, + "execution_count": 42 + } + ], + "source": [ + "# train set - sklearn\n", + "# correct order, incorrect order\n", + "r2_score(y_train_imputed, y_tr_pred), r2_score(y_tr_pred, y_train_imputed)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "e8dq_UtmGC51", + "outputId": "1d362c42-485e-413d-c9fd-c863de943117", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(-0.0049471305923192155, 0.0)" + ] + }, + "metadata": {}, + "execution_count": 43 + } + ], + "source": [ + "# test set - sklearn\n", + "# correct order, incorrect order\n", + "r2_score(y_test_imputed, y_te_pred), r2_score(y_te_pred, y_test_imputed)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "hlS9kMhiGC51", + "outputId": "bd4d2a20-138a-42b6-990b-8f3ffd64aad7", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(0.0, 1.0)" + ] + }, + "metadata": {}, + "execution_count": 44 + } + ], + "source": [ + "# train set - using our homebrew function\n", + "# correct order, incorrect order\n", + "r_squared(y_train_imputed, y_tr_pred), r_squared(y_tr_pred, y_train_imputed)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "258N3g1PGC51", + "outputId": "b0ec29e1-ba4f-464f-9a7d-229a991b7205", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(0.004922776971763021, 1.0)" + ] + 
}, + "metadata": {}, + "execution_count": 45 + } + ], + "source": [ + "# test set - using our homebrew function\n", + "# correct order, incorrect order\n", + "r_squared(y_test_imputed, y_te_pred), r_squared(y_te_pred, y_test_imputed)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2HtwxjroGC52" + }, + "source": [ + "You can get very different results swapping the argument order. It's worth highlighting this because data scientists do this too often in the real world! Don't be one of them! Frequently the argument order doesn't matter, but it will bite you when you do it with a function that does care. It's sloppy, bad practice and if you don't make a habit of putting arguments in the right order, you will forget!\n", + "\n", + "Remember:\n", + "* argument order matters,\n", + "* check function syntax with `func?` in a code cell" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "57JQhd7NGC52" + }, + "source": [ + "## 4.8 Initial Models" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "B1o4nWmUGC52" + }, + "source": [ + "### 4.8.1 Imputing missing feature (predictor) values" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GY-n5SFZGC52" + }, + "source": [ + "Recall when performing EDA, you imputed (filled in) some missing values in pandas. You did this judiciously for exploratory/visualization purposes. You left many missing values in the data. You can impute missing values using scikit-learn, but note that you should learn values to impute from a train split and apply that to the test split to then assess how well your imputation worked." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "O3T0zWoWGC52" + }, + "source": [ + "#### 4.8.1.1 Impute missing values with median" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MRXqJO8AGC52" + }, + "source": [ + "There are missing values. Recall from your data exploration that many distributions were skewed. 
Your first thought might be to impute missing values using the median." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8AUeJw1cGC52" + }, + "source": [ + "##### 4.8.1.1.1 Learn the values to impute from the train set" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "a4sqUn1qGC53", + "outputId": "42bf60f8-022f-402d-dfac-104c14a0583e", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "summit_elev 3075.0\n", + "vertical_drop 1000.0\n", + "base_elev 1491.5\n", + "trams 0.0\n", + "fastEight 0.0\n", + "fastSixes 0.0\n", + "fastQuads 0.0\n", + "quad 0.0\n", + "triple 1.0\n", + "double 1.0\n", + "surface 2.0\n", + "total_chairs 7.0\n", + "Runs 33.0\n", + "TerrainParks 2.0\n", + "LongestRun_mi 1.0\n", + "SkiableTerrain_ac 200.0\n", + "Snow Making_ac 102.5\n", + "daysOpenLastYear 110.0\n", + "yearsOpen 58.0\n", + "averageSnowfall 145.0\n", + "AdultWeekday 50.0\n", + "projectedDaysOpen 120.0\n", + "NightSkiing_ac 70.0\n", + "dtype: float64" + ] + }, + "metadata": {}, + "execution_count": 49 + } + ], + "source": [ + "# These are the values we'll use to fill in any missing values\n", + "# Handle non-numerical columns before calculating the median\n", + "X_train_numeric = X_train.select_dtypes(include=['number']) # Select only columns with numerical data\n", + "X_defaults_median = X_train_numeric.median()\n", + "X_defaults_median" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2NjA4zUsGC53" + }, + "source": [ + "##### 4.8.1.1.2 Apply the imputation to both train and test splits" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bNzmyKunGC53" + }, + "outputs": [], + "source": [ + "#Code task 9#\n", + "#Call `X_train` and `X_test`'s `fillna()` method, passing `X_defaults_median` as the values to use\n", + "#Assign the results to `X_tr` and `X_te`, respectively\n", + "X_tr 
= X_train.fillna(X_defaults_median)\n", + "X_te = X_test.fillna(X_defaults_median)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pnTMgMmcGC53" + }, + "source": [ + "##### 4.8.1.1.3 Scale the data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PsZuEipQGC53" + }, + "source": [ + "As you have features measured in many different units, with numbers that vary by orders of magnitude, start off by scaling them to put them all on a consistent scale. The [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) scales each feature to zero mean and unit variance." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vr8sSgU7GC53" + }, + "outputs": [], + "source": [ + "#Code task 10#\n", + "#Call the StandardScaler's `fit()` method on `X_tr` to fit the scaler\n", + "#then use its `transform()` method to apply the scaling to both the train and test split\n", + "#data (`X_tr` and `X_te`), naming the results `X_tr_scaled` and `X_te_scaled`, respectively\n", + "# Drop the 'Region' column if it exists, as it is not a numerical feature. 
'Name' has likely been dropped previously.\n", + "if 'Region' in X_tr.columns:\n", + " X_tr = X_tr.drop(['Region'], axis=1)\n", + "if 'Region' in X_te.columns:\n", + " X_te = X_te.drop(['Region'], axis=1)\n", + "\n", + "# Drop the 'state' column as it contains non-numerical values\n", + "if 'state' in X_tr.columns:\n", + " X_tr = X_tr.drop(['state'], axis=1)\n", + "if 'state' in X_te.columns:\n", + " X_te = X_te.drop(['state'], axis=1)\n", + "\n", + "scaler = StandardScaler()\n", + "scaler.fit(X_tr)\n", + "X_tr_scaled = scaler.transform(X_tr)\n", + "X_te_scaled = scaler.transform(X_te)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mez4eqv8GC54" + }, + "source": [ + "##### 4.8.1.1.4 Train the model on the train split" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "OHtcgRuEGC54" + }, + "outputs": [], + "source": [ + "lm = LinearRegression().fit(X_tr_scaled, y_train_imputed)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ry_-iW7-GC54" + }, + "source": [ + "##### 4.8.1.1.5 Make predictions using the model on both train and test splits" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ohyybousGC54" + }, + "outputs": [], + "source": [ + "#Code task 11#\n", + "#Call the `predict()` method of the model (`lm`) on both the (scaled) train and test data\n", + "#Assign the predictions to `y_tr_pred` and `y_te_pred`, respectively\n", + "y_tr_pred = lm.predict(X_tr_scaled)\n", + "y_te_pred = lm.predict(X_te_scaled)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3nUrhDpPGC54" + }, + "source": [ + "##### 4.8.1.1.6 Assess model performance" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mAiKt_jgGC54", + "outputId": "ee12a613-0664-4564-aefa-d68e06bda398", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + 
"(0.9041522057232028, 0.7976456921768976)" + ] + }, + "metadata": {}, + "execution_count": 61 + } + ], + "source": [ + "# r^2 - train, test\n", + "median_r2 = r2_score(y_train_imputed, y_tr_pred), r2_score(y_test_imputed, y_te_pred)\n", + "median_r2" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hPEux0TIGC55" + }, + "source": [ + "Recall that you estimated ticket price by simply using a known average. As expected, this produced an $R^2$ of zero for both the training and test set, because $R^2$ tells us how much of the variance you're explaining beyond that of using just the mean, and you were using just the mean. Here we see that our simple linear regression model explains about 90% of the variance on the train set and about 80% on the test set. Clearly you are onto something, although the lower value for the test set suggests you're overfitting somewhat. This isn't a surprise as you've made no effort to select a parsimonious set of features or deal with multicollinearity in our data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "S8w76dzZGC55", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "67b799c5-fcf5-48de-c706-9c66257dfe22" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(5.926090522538955, 6.3024374323465935)" + ] + }, + "metadata": {}, + "execution_count": 64 + } + ], + "source": [ + "#Code task 12#\n", + "#Now calculate the mean absolute error scores using `sklearn`'s `mean_absolute_error` function\n", + "# as we did above for R^2\n", + "# MAE - train, test\n", + "mae_score = mean_absolute_error\n", + "median_mae = mae_score(y_train_imputed, y_tr_pred), mae_score(y_test_imputed, y_te_pred)\n", + "median_mae" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rGvbl4uIGC55" + }, + "source": [ + "Using this model, then, on average you'd expect to estimate a ticket price within \\\\$6 or so of the real price. 
This is much, much better than the \\\\$19 from just guessing using the average. There may be something to this machine learning lark after all!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "BxzkETKcGC55", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "a6d1bceb-1592-4640-dab8-e4922a72416d" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(52.72935700137537, 83.14712949661677)" + ] + }, + "metadata": {}, + "execution_count": 66 + } + ], + "source": [ + "#Code task 13#\n", + "#And also do the same using `sklearn`'s `mean_squared_error`\n", + "# MSE - train, test\n", + "mse_score = mean_squared_error\n", + "median_mse = mse_score(y_train_imputed, y_tr_pred), mse_score(y_test_imputed, y_te_pred)\n", + "median_mse" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jSszozHiGC55" + }, + "source": [ + "#### 4.8.1.2 Impute missing values with the mean" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Y56LkOw4GC55" + }, + "source": [ + "You chose to use the median for filling missing values because of the skew of many of our predictor feature distributions. What if you wanted to try something else, such as the mean?" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KgKF4R3hGC56" + }, + "source": [ + "##### 4.8.1.2.1 Learn the values to impute from the train set" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "O2oS_cDXGC56", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "ed0262c8-7e23-4d8e-e455-7f0255ea8971" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "summit_elev 4592.652174\n", + "vertical_drop 1202.478261\n", + "base_elev 3390.656522\n", + "trams 0.169565\n", + "fastEight 0.008772\n", + "fastSixes 0.178261\n", + "fastQuads 1.008696\n", + "quad 0.913043\n", + "triple 1.500000\n", + "double 1.895652\n", + "surface 2.678261\n", + "total_chairs 8.347826\n", + "Runs 49.244541\n", + "TerrainParks 2.927835\n", + "LongestRun_mi 1.491150\n", + "SkiableTerrain_ac 642.676856\n", + "Snow Making_ac 188.234694\n", + "daysOpenLastYear 113.866667\n", + "yearsOpen 57.528384\n", + "averageSnowfall 177.734234\n", + "AdultWeekday 57.231526\n", + "projectedDaysOpen 120.944444\n", + "NightSkiing_ac 93.221374\n", + "dtype: float64" + ] + }, + "metadata": {}, + "execution_count": 69 + } + ], + "source": [ + "#Code task 14#\n", + "#As we did for the median above, calculate mean values for imputing missing values\n", + "# These are the values we'll use to fill in any missing values\n", + "X_defaults_mean = X_train.select_dtypes(include='number').mean() # Select only numeric columns\n", + "X_defaults_mean" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aHLjx3vNGC56" + }, + "source": [ + "By eye, you can immediately tell that your replacement values are much higher than those from using the median." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "69UcNSjGGC56" + }, + "source": [ + "##### 4.8.1.2.2 Apply the imputation to both train and test splits" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "P_EGtIXFGC56" + }, + "outputs": [], + "source": [ + "X_tr = X_train.fillna(X_defaults_mean)\n", + "X_te = X_test.fillna(X_defaults_mean)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kX_wvi3CGC56" + }, + "source": [ + "##### 4.8.1.2.3 Scale the data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "VFlUjIBYGC57", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "7a9a608c-bc30-40f6-af0b-0c188e6804cd" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "String columns in X_tr: Index(['Name', 'Region', 'state'], dtype='object')\n", + "String columns in X_te: Index(['Name', 'Region', 'state'], dtype='object')\n" + ] + } + ], + "source": [ + "# Identify columns with string values\n", + "string_columns_tr = X_tr.select_dtypes(include='object').columns\n", + "string_columns_te = X_te.select_dtypes(include='object').columns\n", + "\n", + "print(\"String columns in X_tr:\", string_columns_tr)\n", + "print(\"String columns in X_te:\", string_columns_te)\n", + "\n", + "# Decide how to handle these string columns:\n", + "# 1. Drop them if they are not relevant for scaling.\n", + "# 2. 
Encode them numerically using techniques like one-hot encoding or label encoding if they are relevant.\n", + "\n", + "# Example: Dropping string columns\n", + "X_tr_numeric = X_tr.drop(columns=string_columns_tr)\n", + "X_te_numeric = X_te.drop(columns=string_columns_te)\n", + "\n", + "# Now, try scaling again with the numeric data:\n", + "scaler = StandardScaler()\n", + "scaler.fit(X_tr_numeric)\n", + "X_tr_scaled = scaler.transform(X_tr_numeric)\n", + "X_te_scaled = scaler.transform(X_te_numeric)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8tr7aO_QGC57" + }, + "source": [ + "##### 4.8.1.2.4 Train the model on the train split" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "tuFgNbUFGC57" + }, + "outputs": [], + "source": [ + "lm = LinearRegression().fit(X_tr_scaled, y_train_imputed)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i2IZ9qfpGC57" + }, + "source": [ + "##### 4.8.1.2.5 Make predictions using the model on both train and test splits" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "qpCTuFArGC57" + }, + "outputs": [], + "source": [ + "y_tr_pred = lm.predict(X_tr_scaled)\n", + "y_te_pred = lm.predict(X_te_scaled)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "C6Kp-72xGC57" + }, + "source": [ + "##### 4.8.1.2.6 Assess model performance" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gYlCy35UGC58", + "outputId": "0afb35e8-af23-4379-ff2a-f82f64ab8354", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(0.90228298680991, 0.8300429694208216)" + ] + }, + "metadata": {}, + "execution_count": 76 + } + ], + "source": [ + "r2_score(y_train_imputed, y_tr_pred), r2_score(y_test_imputed, y_te_pred)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": 
"8Qw2218pGC58", + "outputId": "62b76561-332b-4972-e0d0-0f10ec64f171", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(5.758652911088192, 5.990687179884243)" + ] + }, + "metadata": {}, + "execution_count": 77 + } + ], + "source": [ + "mean_absolute_error(y_train_imputed, y_tr_pred), mean_absolute_error(y_test_imputed, y_te_pred)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c3j287ojGC58", + "outputId": "fc0a455a-3d2e-4ccb-f7ba-1793ae181db1", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(53.75768229708425, 69.8351291971559)" + ] + }, + "metadata": {}, + "execution_count": 78 + } + ], + "source": [ + "mean_squared_error(y_train_imputed, y_tr_pred), mean_squared_error(y_test_imputed, y_te_pred)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EsNxnJt6GC59" + }, + "source": [ + "These results don't seem very different to when you used the median for imputing missing values. Perhaps it doesn't make much difference here. Maybe your overtraining dominates. Maybe other feature transformations, such as taking the log, would help. You could try with just a subset of features rather than using all of them as inputs.\n", + "\n", + "To perform the median/mean comparison, you copied and pasted a lot of code just to change the function for imputing missing values. It would make more sense to write a function that performed the sequence of steps:\n", + "1. impute missing values\n", + "2. scale the features\n", + "3. train a model\n", + "4. calculate model performance\n", + "\n", + "But these are common steps and `sklearn` provides something much better than writing custom functions." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xgBxLYLzGC59" + }, + "source": [ + "### 4.8.2 Pipelines" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "waQ90iXvGC59" + }, + "source": [ + "One of the most important and useful components of `sklearn` is the [pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html). In place of `pandas`' `fillna` DataFrame method, there is `sklearn`'s `SimpleImputer`. Remember the first linear model above performed the steps:\n", + "\n", + "1. replace missing values with the median for each feature\n", + "2. scale the data to zero mean and unit variance\n", + "3. train a linear regression model\n", + "\n", + "and all these steps were trained on the train split and then applied to the test split for assessment.\n", + "\n", + "The pipeline below defines exactly those same steps. Crucially, the resultant `Pipeline` object has a `fit()` method and a `predict()` method, just like the `LinearRegression()` object itself. Just as you might create a linear regression model and train it with `.fit()` and predict with `.predict()`, you can wrap the entire process of imputing and feature scaling and regression in a single object you can train with `.fit()` and predict with `.predict()`. And that's basically a pipeline: a model on steroids." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IQkam7dWGC59" + }, + "source": [ + "#### 4.8.2.1 Define the pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MnDG5M6yGC59" + }, + "outputs": [], + "source": [ + "pipe = make_pipeline(\n", + " SimpleImputer(strategy='median'),\n", + " StandardScaler(),\n", + " LinearRegression()\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3QeiOGYmGC59", + "outputId": "229268dc-e7a9-4a23-952d-825d375dd6f2", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 187 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "sklearn.pipeline.Pipeline" + ], + "text/html": [ + "
\n", + "
sklearn.pipeline.Pipeline
def __init__(steps, *, memory=None, verbose=False)
/usr/local/lib/python3.10/dist-packages/sklearn/pipeline.pyPipeline of transforms with a final estimator.\n",
+              "\n",
+              "Sequentially apply a list of transforms and a final estimator.\n",
+              "Intermediate steps of the pipeline must be 'transforms', that is, they\n",
+              "must implement `fit` and `transform` methods.\n",
+              "The final estimator only needs to implement `fit`.\n",
+              "The transformers in the pipeline can be cached using ``memory`` argument.\n",
+              "\n",
+              "The purpose of the pipeline is to assemble several steps that can be\n",
+              "cross-validated together while setting different parameters. For this, it\n",
+              "enables setting parameters of the various steps using their names and the\n",
+              "parameter name separated by a `'__'`, as in the example below. A step's\n",
+              "estimator may be replaced entirely by setting the parameter with its name\n",
+              "to another estimator, or a transformer removed by setting it to\n",
+              "`'passthrough'` or `None`.\n",
+              "\n",
+              "Read more in the :ref:`User Guide <pipeline>`.\n",
+              "\n",
+              ".. versionadded:: 0.5\n",
+              "\n",
+              "Parameters\n",
+              "----------\n",
+              "steps : list of tuple\n",
+              "    List of (name, transform) tuples (implementing `fit`/`transform`) that\n",
+              "    are chained in sequential order. The last transform must be an\n",
+              "    estimator.\n",
+              "\n",
+              "memory : str or object with the joblib.Memory interface, default=None\n",
+              "    Used to cache the fitted transformers of the pipeline. By default,\n",
+              "    no caching is performed. If a string is given, it is the path to\n",
+              "    the caching directory. Enabling caching triggers a clone of\n",
+              "    the transformers before fitting. Therefore, the transformer\n",
+              "    instance given to the pipeline cannot be inspected\n",
+              "    directly. Use the attribute ``named_steps`` or ``steps`` to\n",
+              "    inspect estimators within the pipeline. Caching the\n",
+              "    transformers is advantageous when fitting is time consuming.\n",
+              "\n",
+              "verbose : bool, default=False\n",
+              "    If True, the time elapsed while fitting each step will be printed as it\n",
+              "    is completed.\n",
+              "\n",
+              "Attributes\n",
+              "----------\n",
+              "named_steps : :class:`~sklearn.utils.Bunch`\n",
+              "    Dictionary-like object, with the following attributes.\n",
+              "    Read-only attribute to access any step parameter by user given name.\n",
+              "    Keys are step names and values are steps parameters.\n",
+              "\n",
+              "classes_ : ndarray of shape (n_classes,)\n",
+              "    The classes labels. Only exist if the last step of the pipeline is a\n",
+              "    classifier.\n",
+              "\n",
+              "n_features_in_ : int\n",
+              "    Number of features seen during :term:`fit`. Only defined if the\n",
+              "    underlying first estimator in `steps` exposes such an attribute\n",
+              "    when fit.\n",
+              "\n",
+              "    .. versionadded:: 0.24\n",
+              "\n",
+              "feature_names_in_ : ndarray of shape (`n_features_in_`,)\n",
+              "    Names of features seen during :term:`fit`. Only defined if the\n",
+              "    underlying estimator exposes such an attribute when fit.\n",
+              "\n",
+              "    .. versionadded:: 1.0\n",
+              "\n",
+              "See Also\n",
+              "--------\n",
+              "make_pipeline : Convenience function for simplified pipeline construction.\n",
+              "\n",
+              "Examples\n",
+              "--------\n",
+              ">>> from sklearn.svm import SVC\n",
+              ">>> from sklearn.preprocessing import StandardScaler\n",
+              ">>> from sklearn.datasets import make_classification\n",
+              ">>> from sklearn.model_selection import train_test_split\n",
+              ">>> from sklearn.pipeline import Pipeline\n",
+              ">>> X, y = make_classification(random_state=0)\n",
+              ">>> X_train, X_test, y_train, y_test = train_test_split(X, y,\n",
+              "...                                                     random_state=0)\n",
+              ">>> pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])\n",
+              ">>> # The pipeline can be used as any other estimator\n",
+              ">>> # and avoids leaking the test set into the train set\n",
+              ">>> pipe.fit(X_train, y_train)\n",
+              "Pipeline(steps=[('scaler', StandardScaler()), ('svc', SVC())])\n",
+              ">>> pipe.score(X_test, y_test)\n",
+              "0.88
\n", + " \n", + "
" + ] + }, + "metadata": {}, + "execution_count": 113 + } + ], + "source": [ + "type(pipe)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sItLPdQvGC59", + "outputId": "03ecb1c2-febf-493d-e222-70141b78265e", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(True, True)" + ] + }, + "metadata": {}, + "execution_count": 114 + } + ], + "source": [ + "hasattr(pipe, 'fit'), hasattr(pipe, 'predict')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1eHYXg9xGC5-" + }, + "source": [ + "#### 4.8.2.2 Fit the pipeline" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N5I0TCA8GC5-" + }, + "source": [ + "Here, a single call to the pipeline's `fit()` method combines the steps of learning the imputation (determining what values to use to fill the missing ones), the scaling (determining the mean to subtract and the variance to divide by), and then training the model. It does this all in the one call with the training data as arguments." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mB3tTsf6GC5-", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 158 + }, + "outputId": "25865cff-6b0f-4c5d-fd83-534511494ab0" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "Pipeline(steps=[('preprocessor',\n", + " ColumnTransformer(transformers=[('cat', OneHotEncoder(),\n", + " ['Name'])]))])" + ], + "text/html": [ + "
Pipeline(steps=[('preprocessor',\n",
+              "                 ColumnTransformer(transformers=[('cat', OneHotEncoder(),\n",
+              "                                                  ['Name'])]))])
" + ] + }, + "metadata": {}, + "execution_count": 121 + } + ], + "source": [ + "#Code task 15#\n", + "#Call the pipe's `fit()` method with `X_train` and `y_train` as arguments\n", + "# Assuming 'name' is the column with non-numeric values\n", + "\n", + "# Verify if 'Name' is a column in X_train (Note capitalization)\n", + "if 'Name' not in X_train.columns:\n", + " # If 'Name' is not found, identify the correct column name\n", + " print(\"Column 'Name' not found. Available columns are:\", X_train.columns)\n", + " # Replace 'Name' with the correct column name below\n", + " categorical_features = ['replace_with_correct_column_name']\n", + "else:\n", + " categorical_features = ['Name'] # Use 'Name' if it exists\n", + "\n", + "numerical_features = X_train.columns.difference(categorical_features)\n", + "\n", + "# ... (rest of your code remains unchanged)\n", + "\n", + "# Change the 'name' reference in the ColumnTransformer to 'Name'\n", + "from sklearn.compose import ColumnTransformer\n", + "from sklearn.preprocessing import OneHotEncoder\n", + "\n", + "preprocessor = ColumnTransformer(\n", + " transformers=[\n", + " ('cat', OneHotEncoder(), categorical_features), # Use categorical_features here\n", + " # ... other transformers ...\n", + " ])\n", + "\n", + "pipe = Pipeline(steps=[\n", + " ('preprocessor', preprocessor),\n", + " # ... other steps ...\n", + "])\n", + "\n", + "pipe.fit(X_train, y_train_imputed)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "69GWYGPVGC5-" + }, + "source": [ + "#### 4.8.2.3 Make predictions on the train and test sets" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lTi5LLlsGC5-" + }, + "outputs": [], + "source": [ + "# Add an estimator as the last step in the pipeline\n", + "from sklearn.linear_model import LinearRegression # Or any other suitable estimator\n", + "from sklearn.preprocessing import OneHotEncoder\n", + "\n", + "# ... 
(rest of your code remains unchanged)\n", + "\n", + "# Handle unknown categories in the OneHotEncoder\n", + "preprocessor = ColumnTransformer(\n", + " transformers=[\n", + " ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features), # Handle unknown categories\n", + " # ... other transformers ...\n", + " ])\n", + "\n", + "pipe = Pipeline(steps=[\n", + " ('preprocessor', preprocessor),\n", + " ('estimator', LinearRegression()) # Add an estimator here\n", + "])\n", + "\n", + "pipe.fit(X_train, y_train_imputed)\n", + "\n", + "# Now you can make predictions\n", + "y_tr_pred = pipe.predict(X_train)\n", + "y_te_pred = pipe.predict(X_test) # This should now work without raising an error" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QZykrRkMGC5-" + }, + "source": [ + "#### 4.8.2.4 Assess performance" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "TvxwVwloGC5_", + "outputId": "3b04ed61-b6de-46f8-ad86-2eac24719c0b", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(1.0, -0.005864211086616056)" + ] + }, + "metadata": {}, + "execution_count": 125 + } + ], + "source": [ + "r2_score(y_train_imputed, y_tr_pred), r2_score(y_test_imputed, y_te_pred)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qT9h-qLXGC5_" + }, + "source": [ + "And compare with your earlier (non-pipeline) result:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "08zPBAeyGC5_", + "outputId": "23ab1c24-df6c-486c-b0db-b35f7b37a2b1", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(0.9041522057232028, 0.7976456921768976)" + ] + }, + "metadata": {}, + "execution_count": 126 + } + ], + "source": [ + "median_r2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + 
"id": "eq9x2JbTGC5_", + "outputId": "6edc077f-0a40-42fc-f77f-5ca2c1829cce", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(4.38682906425801e-15, 15.076498452864431)" + ] + }, + "metadata": {}, + "execution_count": 127 + } + ], + "source": [ + "mean_absolute_error(y_train_imputed, y_tr_pred), mean_absolute_error(y_test_imputed, y_te_pred)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PIlfrCaOGC5_" + }, + "outputs": [], + "source": [ + "#Compare with your earlier result:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "RZsDnEJkGC5_", + "outputId": "09de590d-1dcc-4401-dc9e-46e28a59b269", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(5.926090522538955, 6.3024374323465935)" + ] + }, + "metadata": {}, + "execution_count": 130 + } + ], + "source": [ + "median_mae" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gljUh-4dGC6A", + "outputId": "def6ded4-342c-44fe-fa86-aa6dfc5b367a", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(8.209641142334843e-29, 413.3083338573864)" + ] + }, + "metadata": {}, + "execution_count": 132 + } + ], + "source": [ + "mean_squared_error(y_train_imputed, y_tr_pred), mean_squared_error(y_test_imputed, y_te_pred)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WWiA1XlEGC6A" + }, + "source": [ + "Compare with your earlier result:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "A3KXSq96GC6A", + "outputId": "e1db5807-3883-41e8-a96d-85b73d486ce2", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": 
{ + "text/plain": [ + "(52.72935700137537, 83.14712949661677)" + ] + }, + "metadata": {}, + "execution_count": 133 + } + ], + "source": [ + "median_mse" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bq9mGKVvGC6A" + }, + "source": [ + "These results do not match your earlier steps. The pipeline as refit above only one-hot encodes `Name`, so it memorizes the train split (an $R^2$ of 1.0 and near-zero errors) and fails to generalize to the test split (an $R^2$ of about zero). A pipeline that performs the same median imputation, scaling, and linear regression on the numeric features should reproduce the earlier results, and that agreement is what allows you to move faster but with confidence." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4b35Kt4XGC6A" + }, + "source": [ + "## 4.9 Refining The Linear Model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Un3ecRdXGC6A" + }, + "source": [ + "You suspected the model was overfitting. This is no real surprise given the number of features you blindly used. It's likely a judicious subset of features would generalize better. `sklearn` has a number of feature selection functions available. The one you'll use here is `SelectKBest` which, as you might guess, selects the k best features. You can read about SelectKBest\n", + "[here](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html#sklearn.feature_selection.SelectKBest). `f_regression` is just the [score function](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_regression.html#sklearn.feature_selection.f_regression) you're using because you're performing regression. It's important to choose an appropriate one for your machine learning task." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mMLJatIUGC6B" + }, + "source": [ + "### 4.9.1 Define the pipeline" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hxv4Ir53GC6B" + }, + "source": [ + "Redefine your pipeline to include this feature selection step:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "XQ88OJX9GC6B" + }, + "outputs": [], + "source": [ + "#Code task 16#\n", + "#Add `SelectKBest` as a step in the pipeline between `StandardScaler()` and `LinearRegression()`\n", + "#Don't forget to tell it to use `f_regression` as its score function\n", + "from sklearn.feature_selection import SelectKBest, f_regression\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.impute import SimpleImputer\n", + "from sklearn.preprocessing import StandardScaler, OneHotEncoder # Import OneHotEncoder\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.compose import ColumnTransformer # Import ColumnTransformer\n", + "\n", + "# Identify numerical and categorical features\n", + "numerical_features = X_train.select_dtypes(include=['number']).columns\n", + "categorical_features = X_train.select_dtypes(exclude=['number']).columns\n", + "\n", + "# Create transformers for numerical and categorical features\n", + "numerical_transformer = make_pipeline(\n", + " SimpleImputer(strategy='median'),\n", + " StandardScaler()\n", + ")\n", + "\n", + "categorical_transformer = make_pipeline(\n", + " SimpleImputer(strategy='most_frequent'), # Use most_frequent for categorical features\n", + " OneHotEncoder(handle_unknown='ignore') # One-hot encode categorical features\n", + ")\n", + "\n", + "# Combine transformers using ColumnTransformer\n", + "preprocessor = ColumnTransformer(\n", + " transformers=[\n", + " ('num', numerical_transformer, numerical_features),\n", + " ('cat', categorical_transformer, categorical_features)\n", + " ]\n", + ")\n", + "\n", + "# Create the final 
pipeline\n", + "pipe = make_pipeline(\n", + " preprocessor,\n", + " SelectKBest(score_func=f_regression), # Use SelectKBest with f_regression as the scoring function\n", + " LinearRegression()\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qb-CNIVHGC6B" + }, + "source": [ + "### 4.9.2 Fit the pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "svCBhQ59GC6B", + "outputId": "6937c6f7-d1ca-41c6-c99b-1c5ae27d41d5", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 262 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "Pipeline(steps=[('columntransformer',\n", + " ColumnTransformer(transformers=[('num',\n", + " Pipeline(steps=[('simpleimputer',\n", + " SimpleImputer(strategy='median')),\n", + " ('standardscaler',\n", + " StandardScaler())]),\n", + " Index(['summit_elev', 'vertical_drop', 'base_elev', 'trams', 'fastEight',\n", + " 'fastSixes', 'fastQuads', 'quad', 'triple', 'double', 'surface',\n", + " 'total_chairs', 'Runs', 'TerrainParks', 'Lon...\n", + " 'averageSnowfall', 'AdultWeekday', 'projectedDaysOpen',\n", + " 'NightSkiing_ac'],\n", + " dtype='object')),\n", + " ('cat',\n", + " Pipeline(steps=[('simpleimputer',\n", + " SimpleImputer(strategy='most_frequent')),\n", + " ('onehotencoder',\n", + " OneHotEncoder(handle_unknown='ignore'))]),\n", + " Index(['Name', 'Region', 'state'], dtype='object'))])),\n", + " ('selectkbest',\n", + " SelectKBest(score_func=<function f_regression at 0x7a49d3039ab0>)),\n", + " ('linearregression', LinearRegression())])" + ], + "text/html": [ + "
Pipeline(steps=[('columntransformer',\n",
+              "                 ColumnTransformer(transformers=[('num',\n",
+              "                                                  Pipeline(steps=[('simpleimputer',\n",
+              "                                                                   SimpleImputer(strategy='median')),\n",
+              "                                                                  ('standardscaler',\n",
+              "                                                                   StandardScaler())]),\n",
+              "                                                  Index(['summit_elev', 'vertical_drop', 'base_elev', 'trams', 'fastEight',\n",
+              "       'fastSixes', 'fastQuads', 'quad', 'triple', 'double', 'surface',\n",
+              "       'total_chairs', 'Runs', 'TerrainParks', 'Lon...\n",
+              "       'averageSnowfall', 'AdultWeekday', 'projectedDaysOpen',\n",
+              "       'NightSkiing_ac'],\n",
+              "      dtype='object')),\n",
+              "                                                 ('cat',\n",
+              "                                                  Pipeline(steps=[('simpleimputer',\n",
+              "                                                                   SimpleImputer(strategy='most_frequent')),\n",
+              "                                                                  ('onehotencoder',\n",
+              "                                                                   OneHotEncoder(handle_unknown='ignore'))]),\n",
+              "                                                  Index(['Name', 'Region', 'state'], dtype='object'))])),\n",
+              "                ('selectkbest',\n",
+              "                 SelectKBest(score_func=<function f_regression at 0x7a49d3039ab0>)),\n",
+              "                ('linearregression', LinearRegression())])
" + ] + }, + "metadata": {}, + "execution_count": 138 + } + ], + "source": [ + "pipe.fit(X_train, y_train_imputed)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fMJISkvYGC6B" + }, + "source": [ + "### 4.9.3 Assess performance on the train and test set" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "T6Je_FN4GC6B" + }, + "outputs": [], + "source": [ + "y_tr_pred = pipe.predict(X_train)\n", + "y_te_pred = pipe.predict(X_test)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "WicXqbcCGC6C", + "outputId": "ab1527c4-be50-4ae0-dd66-25575bc52a68", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(0.8957999130320693, 0.8383615057013177)" + ] + }, + "metadata": {}, + "execution_count": 140 + } + ], + "source": [ + "r2_score(y_train_imputed, y_tr_pred), r2_score(y_test_imputed, y_te_pred)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "w_OV3kTDGC6C", + "outputId": "5fb5e706-f77e-488c-fcb2-c09b4de20b3b", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(6.208308232381622, 5.912436455224302)" + ] + }, + "metadata": {}, + "execution_count": 141 + } + ], + "source": [ + "mean_absolute_error(y_train_imputed, y_tr_pred), mean_absolute_error(y_test_imputed, y_te_pred)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "65YIuKJJGC6C" + }, + "source": [ + "This has made things worse! Clearly selecting a subset of features has an impact on performance. `SelectKBest` defaults to k=10. You've just seen that 10 is worse than using all features. What is the best k? 
You could create a new pipeline with a different value of k:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "P740pWUvGC6C" + }, + "source": [ + "### 4.9.4 Define a new pipeline to select a different number of features" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KehdI6w8GC6C" + }, + "outputs": [], + "source": [ + "# Create a transformer for numerical features\n", + "numerical_transformer = make_pipeline(\n", + " SimpleImputer(strategy='median'),\n", + " StandardScaler()\n", + ")\n", + "\n", + "# Create a transformer for categorical features\n", + "categorical_transformer = make_pipeline(\n", + " SimpleImputer(strategy='most_frequent'), # Use a suitable strategy for categorical data\n", + " OneHotEncoder(handle_unknown='ignore') # Handle unknown categories during testing\n", + ")\n", + "\n", + "# Combine transformers using ColumnTransformer\n", + "# Explicitly specify numerical features instead of relying on exclusion\n", + "numerical_features = X_train.select_dtypes(include=['float', 'int']).columns # Dynamically select numerical columns\n", + "preprocessor = ColumnTransformer(\n", + " transformers=[\n", + " ('num', numerical_transformer, numerical_features), # Pass the list of numerical features\n", + " ('cat', categorical_transformer, categorical_features)\n", + " ]\n", + ")\n", + "\n", + "# Update the pipeline with the preprocessor\n", + "pipe15 = make_pipeline(\n", + " preprocessor,\n", + " SelectKBest(f_regression, k=15),\n", + " LinearRegression()\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Y5dNWCkeGC6C" + }, + "source": [ + "### 4.9.5 Fit the pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "a5JNDYW_GC6C", + "outputId": "25f25ece-fff3-4b12-c476-223d80848202", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 262 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ 
+ "Pipeline(steps=[('columntransformer',\n", + " ColumnTransformer(transformers=[('num',\n", + " Pipeline(steps=[('simpleimputer',\n", + " SimpleImputer(strategy='median')),\n", + " ('standardscaler',\n", + " StandardScaler())]),\n", + " Index(['summit_elev', 'vertical_drop', 'base_elev', 'trams', 'fastEight',\n", + " 'fastSixes', 'fastQuads', 'quad', 'triple', 'double', 'surface',\n", + " 'total_chairs', 'Runs', 'TerrainParks', 'Lon...\n", + " 'averageSnowfall', 'AdultWeekday', 'projectedDaysOpen',\n", + " 'NightSkiing_ac'],\n", + " dtype='object')),\n", + " ('cat',\n", + " Pipeline(steps=[('simpleimputer',\n", + " SimpleImputer(strategy='most_frequent')),\n", + " ('onehotencoder',\n", + " OneHotEncoder(handle_unknown='ignore'))]),\n", + " ['Name'])])),\n", + " ('selectkbest',\n", + " SelectKBest(k=15,\n", + " score_func=)),\n", + " ('linearregression', LinearRegression())])" + ], + "text/html": [ + "
Pipeline(steps=[('columntransformer',\n",
+              "                 ColumnTransformer(transformers=[('num',\n",
+              "                                                  Pipeline(steps=[('simpleimputer',\n",
+              "                                                                   SimpleImputer(strategy='median')),\n",
+              "                                                                  ('standardscaler',\n",
+              "                                                                   StandardScaler())]),\n",
+              "                                                  Index(['summit_elev', 'vertical_drop', 'base_elev', 'trams', 'fastEight',\n",
+              "       'fastSixes', 'fastQuads', 'quad', 'triple', 'double', 'surface',\n",
+              "       'total_chairs', 'Runs', 'TerrainParks', 'Lon...\n",
+              "       'averageSnowfall', 'AdultWeekday', 'projectedDaysOpen',\n",
+              "       'NightSkiing_ac'],\n",
+              "      dtype='object')),\n",
+              "                                                 ('cat',\n",
+              "                                                  Pipeline(steps=[('simpleimputer',\n",
+              "                                                                   SimpleImputer(strategy='most_frequent')),\n",
+              "                                                                  ('onehotencoder',\n",
+              "                                                                   OneHotEncoder(handle_unknown='ignore'))]),\n",
+              "                                                  ['Name'])])),\n",
+              "                ('selectkbest',\n",
+              "                 SelectKBest(k=15,\n",
+              "                             score_func=<function f_regression at 0x7a49d3039ab0>)),\n",
+              "                ('linearregression', LinearRegression())])
" + ] + }, + "metadata": {}, + "execution_count": 154 + } + ], + "source": [ + "pipe15.fit(X_train, y_train_imputed)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9iJedhR4GC6D" + }, + "source": [ + "### 4.9.6 Assess performance on train and test data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "BIukKDDnGC6D" + }, + "outputs": [], + "source": [ + "y_tr_pred = pipe15.predict(X_train)\n", + "y_te_pred = pipe15.predict(X_test)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2aF_4Hp-GC6D", + "outputId": "65fc1b66-82f5-46f7-aa14-1f4a82566d93", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(0.9020218697734047, 0.8357076120278937)" + ] + }, + "metadata": {}, + "execution_count": 157 + } + ], + "source": [ + "r2_score(y_train_imputed, y_tr_pred), r2_score(y_test_imputed, y_te_pred)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "rjdpOxR3GC6D", + "outputId": "608e478a-7d07-486d-e687-303a6c93d7ce", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(6.016721660114301, 5.924525833393587)" + ] + }, + "metadata": {}, + "execution_count": 158 + } + ], + "source": [ + "mean_absolute_error(y_train_imputed, y_tr_pred), mean_absolute_error(y_test_imputed, y_te_pred)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TzB0p9PqGC6D" + }, + "source": [ + "You could keep going, trying different values of k, training a model, measuring performance on the test set, and then picking the model with the best test set performance. There's a fundamental problem with this approach: _you're tuning the model to the arbitrary test set_! 
If you continue this way you'll end up with a model that works well on the particular quirks of your test set _but fails to generalize to new data_. The whole point of keeping a test set is for it to be a set of that new data, to check how well your model might perform on data it hasn't seen.\n", + "\n", + "The way around this is a technique called _cross-validation_. You partition the training set into k folds (this k is the number of folds, not the k of `SelectKBest`), train your model on k-1 of those folds, and calculate performance on the fold not used in training. This procedure then cycles through k times with a different fold held back each time. Thus you end up building k models on k sets of data with k estimates of how the model performs on unseen data but without having to touch the test set." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BLJNDobLGC6D" + }, + "source": [ + "### 4.9.7 Assessing performance using cross-validation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "wKpJO_EHGC6D" + }, + "outputs": [], + "source": [ + "cv_results = cross_validate(pipe15, X_train, y_train_imputed, cv=5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "67HH6DYcGC6E", + "outputId": "5d35ce26-83ee-456d-ba2b-eb401d3d68ba", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0.86636716, 0.86388936, 0.83936053, 0.93953111, 0.85081919])" + ] + }, + "metadata": {}, + "execution_count": 160 + } + ], + "source": [ + "cv_scores = cv_results['test_score']\n", + "cv_scores" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QPxe3b0tGC6E" + }, + "source": [ + "Without using the same random state for initializing the CV folds, your actual numbers will be different."
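One way to make the fold assignment repeatable is to pass an explicit splitter with a fixed seed instead of a bare integer for `cv`. The following is a minimal sketch on synthetic data, not the ski resort data: `X_demo`, `y_demo`, and the seed values are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in data (assumption: this is not the ski resort dataset)
X_demo, y_demo = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# An explicit splitter with a fixed seed produces the same shuffled folds on every run
cv_splitter = KFold(n_splits=5, shuffle=True, random_state=47)

scores_a = cross_validate(LinearRegression(), X_demo, y_demo, cv=cv_splitter)['test_score']
scores_b = cross_validate(LinearRegression(), X_demo, y_demo, cv=cv_splitter)['test_score']

# Same folds and a deterministic model give identical per-fold scores
print(np.allclose(scores_a, scores_b))
print(scores_a.mean(), scores_a.std())
```

The same splitter object can be handed to `GridSearchCV` through its `cv` argument, so every candidate is scored on identical folds.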
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0R56F1U4GC6E", + "outputId": "58b880dd-acc2-43ce-d41b-61956f2ffe57", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(0.8719934714224109, 0.03513381019464692)" + ] + }, + "metadata": {}, + "execution_count": 161 + } + ], + "source": [ + "np.mean(cv_scores), np.std(cv_scores)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5yTYSRO3GC6E" + }, + "source": [ + "These results highlight that assessing model performance is inherently open to variability. You'll get different results depending on the quirks of which points are in which fold. An advantage of cross-validation is that you can also obtain an estimate of the variability, or uncertainty, in your performance estimate." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "weupfQrqGC6E", + "outputId": "04d2d147-bfee-4ffd-bb1b-b22ad5b82efc", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0.8 , 0.94])" + ] + }, + "metadata": {}, + "execution_count": 162 + } + ], + "source": [ + "np.round((np.mean(cv_scores) - 2 * np.std(cv_scores), np.mean(cv_scores) + 2 * np.std(cv_scores)), 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JE9rhbdjGC6E" + }, + "source": [ + "### 4.9.8 Hyperparameter search using GridSearchCV" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4xFSt1NgGC6F" + }, + "source": [ + "Pulling the above together, we have:\n", + "* a pipeline that\n", + " * imputes missing values\n", + " * scales the data\n", + " * selects the k best features\n", + " * trains a linear regression model\n", + "* a technique (cross-validation) for estimating model performance\n", + "\n", + "Now you want to use cross-validation for multiple values of k and use
cross-validation to pick the value of k that gives the best performance. `make_pipeline` automatically names each step as the lowercase name of the step and the parameters of the step are then accessed by appending a double underscore followed by the parameter name. You know the name of the step will be 'selectkbest' and you know the parameter is 'k'.\n", + "\n", + "You can also list the names of all the parameters in a pipeline like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "5K3LrM7QGC6F", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "8d9c1a69-c463-4507-e0bf-692e1d525981" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "dict_keys(['memory', 'steps', 'verbose', 'columntransformer', 'selectkbest', 'linearregression', 'columntransformer__n_jobs', 'columntransformer__remainder', 'columntransformer__sparse_threshold', 'columntransformer__transformer_weights', 'columntransformer__transformers', 'columntransformer__verbose', 'columntransformer__verbose_feature_names_out', 'columntransformer__num', 'columntransformer__cat', 'columntransformer__num__memory', 'columntransformer__num__steps', 'columntransformer__num__verbose', 'columntransformer__num__simpleimputer', 'columntransformer__num__standardscaler', 'columntransformer__num__simpleimputer__add_indicator', 'columntransformer__num__simpleimputer__copy', 'columntransformer__num__simpleimputer__fill_value', 'columntransformer__num__simpleimputer__keep_empty_features', 'columntransformer__num__simpleimputer__missing_values', 'columntransformer__num__simpleimputer__strategy', 'columntransformer__num__simpleimputer__verbose', 'columntransformer__num__standardscaler__copy', 'columntransformer__num__standardscaler__with_mean', 'columntransformer__num__standardscaler__with_std', 'columntransformer__cat__memory', 'columntransformer__cat__steps', 'columntransformer__cat__verbose', 
'columntransformer__cat__simpleimputer', 'columntransformer__cat__onehotencoder', 'columntransformer__cat__simpleimputer__add_indicator', 'columntransformer__cat__simpleimputer__copy', 'columntransformer__cat__simpleimputer__fill_value', 'columntransformer__cat__simpleimputer__keep_empty_features', 'columntransformer__cat__simpleimputer__missing_values', 'columntransformer__cat__simpleimputer__strategy', 'columntransformer__cat__simpleimputer__verbose', 'columntransformer__cat__onehotencoder__categories', 'columntransformer__cat__onehotencoder__drop', 'columntransformer__cat__onehotencoder__dtype', 'columntransformer__cat__onehotencoder__handle_unknown', 'columntransformer__cat__onehotencoder__max_categories', 'columntransformer__cat__onehotencoder__min_frequency', 'columntransformer__cat__onehotencoder__sparse', 'columntransformer__cat__onehotencoder__sparse_output', 'selectkbest__k', 'selectkbest__score_func', 'linearregression__copy_X', 'linearregression__fit_intercept', 'linearregression__n_jobs', 'linearregression__positive'])" + ] + }, + "metadata": {}, + "execution_count": 163 + } + ], + "source": [ + "#Code task 18#\n", + "#Call `pipe`'s `get_params()` method to get a dict of available parameters and print their names\n", + "#using dict's `keys()` method\n", + "pipe.get_params().keys()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xszf93zgGC6F" + }, + "source": [ + "The above can be particularly useful as your pipelines become more complex (you can even nest pipelines within pipelines)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9VGINeR5GC6F" + }, + "outputs": [], + "source": [ + "k = list(range(1, len(X_train.columns) + 1))\n", + "grid_params = {'selectkbest__k': k}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PPkUIh_rGC6F" + }, + "source": [ + "Now you have a range of `k` to investigate. Is 1 feature best? 2? 3? 4? All of them?
You could write a for loop and iterate over each possible value, doing all the housekeeping yourself to track the best value of k. But this is a common task, so there's a built-in tool for it in `sklearn`. This is [`GridSearchCV`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html).\n", + "This takes the pipeline object; in fact, it takes anything with a `.fit()` and `.predict()` method. In simple cases with no feature selection, imputation, or feature scaling, you may see the classifier or regressor object itself passed directly into `GridSearchCV`. The other key input is the parameters and values to search over. Optional parameters include the cross-validation strategy and the number of CPUs to use." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "dpO48JYDGC6F" + }, + "outputs": [], + "source": [ + "lr_grid_cv = GridSearchCV(pipe, param_grid=grid_params, cv=5, n_jobs=-1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "rpVenczFGC6G", + "outputId": "134b3eb3-704a-4906-e462-c144943a2d73", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 288 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "GridSearchCV(cv=5,\n", + " estimator=Pipeline(steps=[('columntransformer',\n", + " ColumnTransformer(transformers=[('num',\n", + " Pipeline(steps=[('simpleimputer',\n", + " SimpleImputer(strategy='median')),\n", + " ('standardscaler',\n", + " StandardScaler())]),\n", + " Index(['summit_elev', 'vertical_drop', 'base_elev', 'trams', 'fastEight',\n", + " 'fastSixes', 'fastQuads', 'quad', 'triple', 'double', 'surface',\n", + " 'total_chairs...\n", + " SimpleImputer(strategy='most_frequent')),\n", + " ('onehotencoder',\n", + " OneHotEncoder(handle_unknown='ignore'))]),\n", + " Index(['Name', 'Region', 'state'], dtype='object'))])),\n", + " ('selectkbest',\n", + " SelectKBest(score_func=)),\n", + "
('linearregression',\n", + " LinearRegression())]),\n", + " n_jobs=-1,\n", + " param_grid={'selectkbest__k': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,\n", + " 12, 13, 14, 15, 16, 17, 18, 19, 20,\n", + " 21, 22, 23, 24, 25, 26]})" + ], + "text/html": [ + "
GridSearchCV(cv=5,\n",
+              "             estimator=Pipeline(steps=[('columntransformer',\n",
+              "                                        ColumnTransformer(transformers=[('num',\n",
+              "                                                                         Pipeline(steps=[('simpleimputer',\n",
+              "                                                                                          SimpleImputer(strategy='median')),\n",
+              "                                                                                         ('standardscaler',\n",
+              "                                                                                          StandardScaler())]),\n",
+              "                                                                         Index(['summit_elev', 'vertical_drop', 'base_elev', 'trams', 'fastEight',\n",
+              "       'fastSixes', 'fastQuads', 'quad', 'triple', 'double', 'surface',\n",
+              "       'total_chairs...\n",
+              "                                                                                          SimpleImputer(strategy='most_frequent')),\n",
+              "                                                                                         ('onehotencoder',\n",
+              "                                                                                          OneHotEncoder(handle_unknown='ignore'))]),\n",
+              "                                                                         Index(['Name', 'Region', 'state'], dtype='object'))])),\n",
+              "                                       ('selectkbest',\n",
+              "                                        SelectKBest(score_func=<function f_regression at 0x7a49d3039ab0>)),\n",
+              "                                       ('linearregression',\n",
+              "                                        LinearRegression())]),\n",
+              "             n_jobs=-1,\n",
+              "             param_grid={'selectkbest__k': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,\n",
+              "                                            12, 13, 14, 15, 16, 17, 18, 19, 20,\n",
+              "                                            21, 22, 23, 24, 25, 26]})
" + ] + }, + "metadata": {}, + "execution_count": 166 + } + ], + "source": [ + "lr_grid_cv.fit(X_train, y_train_imputed)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Gm1hUNoQGC6G" + }, + "outputs": [], + "source": [ + "score_mean = lr_grid_cv.cv_results_['mean_test_score']\n", + "score_std = lr_grid_cv.cv_results_['std_test_score']\n", + "cv_k = [k for k in lr_grid_cv.cv_results_['param_selectkbest__k']]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PGA45-iSGC6G", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "2b60edc8-cd65-4122-d252-8dadd45be062" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'selectkbest__k': 11}" + ] + }, + "metadata": {}, + "execution_count": 168 + } + ], + "source": [ + "#Code task 19#\n", + "#Print the `best_params_` attribute of `lr_grid_cv`\n", + "lr_grid_cv.best_params_" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "LZsRd14cGC6G", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 487 + }, + "outputId": "d7d3d716-66a1-4e0d-acd0-2ef572e13230" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "#Code task 20#\n", + "#Assign the value of k from the above dict of `best_params_` and assign it to `best_k`\n", + "best_k = lr_grid_cv.best_params_['selectkbest__k']\n", + "plt.subplots(figsize=(10, 5))\n", + "plt.errorbar(cv_k, score_mean, yerr=score_std)\n", + "plt.axvline(x=best_k, c='r', ls='--', alpha=.5)\n", + "plt.xlabel('k')\n", + "plt.ylabel('CV score (r-squared)')\n", + "plt.title('Pipeline mean CV score (error bars +/- 1sd)');" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gXOKq34-GC6G" + }, + "source": [ + "The above suggests a good value for k is 11, the value reported by `best_params_`. There was an initial rapid increase with k, followed by a slow decline. Also noticeable is that the variance of the results tends to increase at larger values of k. As you increasingly overfit, expect greater swings in performance as different points move in and out of the train/test folds." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NMhS_i4OGC6G" + }, + "source": [ + "Which features were most useful? Step into your best model, shown below. Starting with the fitted grid search object, you get the best estimator, then the named step 'selectkbest', on which you can call the `get_support()` method for a logical mask of the features selected."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ZV5D8DlRGC6G" + }, + "outputs": [], + "source": [ + "selected = lr_grid_cv.best_estimator_.named_steps.selectkbest.get_support()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NQX8ihXDGC6G" + }, + "source": [ + "Similarly, instead of using the 'selectkbest' named step, you can access the named step for the linear regression model and, from that, grab the model coefficients via its `coef_` attribute:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "YJm7iJ2mGC6H", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "0369c15a-4453-4e63-871a-5a404670ef6f" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "AdultWeekday 21.087076\n", + "vertical_drop 3.276549\n", + "total_chairs 2.314675\n", + "daysOpenLastYear 1.490134\n", + "fastQuads 0.630488\n", + "averageSnowfall -0.110770\n", + "LongestRun_mi -0.909929\n", + "SkiableTerrain_ac -0.991550\n", + "Runs -1.098028\n", + "projectedDaysOpen -1.177025\n", + "summit_elev -2.198581\n", + "dtype: float64" + ] + }, + "metadata": {}, + "execution_count": 178 + } + ], + "source": [ + "#Code task 21#\n", + "#Get the linear model coefficients from the `coef_` attribute and store in `coefs`,\n", + "#get the matching feature names from the column names of the dataframe,\n", + "#and display the results as a pandas Series with `coefs` as the values and `features` as the index,\n", + "#sorting the values in descending order\n", + "coefs = lr_grid_cv.best_estimator_.named_steps.linearregression.coef_\n", + "\n", + "# Refit the SelectKBest on the entire training set to get the correct support\n", + "selector = lr_grid_cv.best_estimator_.named_steps.selectkbest\n", + "\n", + "# Handle non-numeric columns (example: one-hot encoding)\n", + "X_train_encoded = pd.get_dummies(X_train, columns=[col for col in X_train.columns if 
X_train[col].dtype == 'object'])\n", + "\n", + "# Fill missing values (NaNs) with a suitable strategy, e.g., the mean\n", + "X_train_encoded.fillna(X_train_encoded.mean(), inplace=True) # Fill NaNs with mean\n", + "\n", + "# Fit the selector after encoding and filling NaNs\n", + "# Caution: `selector` is the same object as the step inside `best_estimator_`, so this call overwrites its fitted state\n", + "selector.fit(X_train_encoded, y_train_imputed)\n", + "\n", + "# Get feature names from the encoded DataFrame\n", + "features = X_train_encoded.columns[selector.get_support()]\n", + "\n", + "pd.Series(coefs, index=features).sort_values(ascending=False) # Use sort_values instead of sort" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RTcVJbkoGC6H" + }, + "source": [ + "In the output above, `AdultWeekday` has by far the largest positive coefficient. Among the physical resort features, vertical drop is your biggest positive feature. This makes intuitive sense and is consistent with what you saw during the EDA work. The total number of chairs is a positive as well. The skiable terrain area is negatively associated with ticket price! This seems odd. People will pay less for larger resorts? There could be all manner of reasons for this. It could be an effect whereby larger resorts can host more visitors at any one time and so can charge less per ticket. As has been mentioned previously, the data are missing information about visitor numbers. Bear in mind, the coefficient for skiable terrain is negative _for this model_. For example, if you kept the total number of chairs and fastQuads constant, but increased the skiable terrain extent, you might imagine the resort is worse off because the chairlift capacity is stretched thinner." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NmXpSWQBGC6H" + }, + "source": [ + "## 4.10 Random Forest Model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xYUq8iZaGC6H" + }, + "source": [ + "A model that can work very well in a lot of cases is the random forest.
For regression, this is provided by `sklearn`'s `RandomForestRegressor` class.\n", + "\n", + "Time to stop the bad practice of repeatedly checking performance on the test split. Instead, go straight from defining the pipeline to assessing performance using cross-validation. `cross_validate` will perform the fitting as part of the process. This uses the default settings for the random forest, so you'll then proceed to investigate some different hyperparameters." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kVzdHMacGC6H" + }, + "source": [ + "### 4.10.1 Define the pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JEQ7qTtpGC6H" + }, + "outputs": [], + "source": [ + "#Code task 22#\n", + "#Define a pipeline comprising the steps:\n", + "#SimpleImputer() with a strategy of 'median'\n", + "#StandardScaler(),\n", + "#and then RandomForestRegressor() with a random state of 47\n", + "RF_pipe = make_pipeline(\n", + " SimpleImputer(strategy=\"median\"),\n", + " StandardScaler(),\n", + " RandomForestRegressor(random_state=47)\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QNV7aZWRGC6H" + }, + "source": [ + "### 4.10.2 Fit and assess performance using cross-validation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "YuQpZKbKGC6I" + }, + "outputs": [], + "source": [ + "#Code task 23#\n", + "#Call `cross_validate` to estimate the pipeline's performance.\n", + "#Pass it the random forest pipe object, `X_train_encoded` and `y_train_imputed`,\n", + "#and get it to use 5-fold cross-validation\n", + "rf_default_cv_results = cross_validate(RF_pipe, X_train_encoded, y_train_imputed, cv=5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "XMu9K7yxGC6I", + "outputId": "b0570728-6a01-42e1-cbea-bfa29fcab422", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type":
"execute_result", + "data": { + "text/plain": [ + "array([0.85578693, 0.84500978, 0.85536643, 0.86738887, 0.84526525])" + ] + }, + "metadata": {}, + "execution_count": 184 + } + ], + "source": [ + "rf_cv_scores = rf_default_cv_results['test_score']\n", + "rf_cv_scores" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "D-Z9Y-mEGC6I", + "outputId": "0d3b823d-2ef1-482f-898e-b54489fbe2ec", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(0.8537634529648812, 0.008260294127408406)" + ] + }, + "metadata": {}, + "execution_count": 185 + } + ], + "source": [ + "np.mean(rf_cv_scores), np.std(rf_cv_scores)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PfG_kTu6GC6I" + }, + "source": [ + "### 4.10.3 Hyperparameter search using GridSearchCV" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ou69-GkCGC6I" + }, + "source": [ + "Random forest has a number of hyperparameters that can be explored; here, however, you'll limit yourself to exploring some different values for the number of trees. You'll try it with and without feature scaling, and try both the mean and median as strategies for imputing missing values."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lC2t7aY1GC6I", + "outputId": "66a9b626-0cd9-47e3-e7db-13d330c9988a", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'randomforestregressor__n_estimators': [10,\n", + " 12,\n", + " 16,\n", + " 20,\n", + " 26,\n", + " 33,\n", + " 42,\n", + " 54,\n", + " 69,\n", + " 88,\n", + " 112,\n", + " 143,\n", + " 183,\n", + " 233,\n", + " 297,\n", + " 379,\n", + " 483,\n", + " 615,\n", + " 784,\n", + " 1000],\n", + " 'standardscaler': [StandardScaler(), None],\n", + " 'simpleimputer__strategy': ['mean', 'median']}" + ] + }, + "metadata": {}, + "execution_count": 186 + } + ], + "source": [ + "n_est = [int(n) for n in np.logspace(start=1, stop=3, num=20)]\n", + "grid_params = {\n", + " 'randomforestregressor__n_estimators': n_est,\n", + " 'standardscaler': [StandardScaler(), None],\n", + " 'simpleimputer__strategy': ['mean', 'median']\n", + "}\n", + "grid_params" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "p1Pcry71GC6I" + }, + "outputs": [], + "source": [ + "#Code task 24#\n", + "#Call `GridSearchCV` with the random forest pipeline, passing in the above `grid_params`\n", + "#dict for parameters to evaluate, 5-fold cross-validation, and all available CPU cores (if desired)\n", + "rf_grid_cv = GridSearchCV(RF_pipe, param_grid=grid_params, cv=5, n_jobs=-1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "m8FCP8UVGC6J", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 187 + }, + "outputId": "9ca3edeb-973e-41e1-f8d3-1e01eab262c4" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "GridSearchCV(cv=5,\n", + " estimator=Pipeline(steps=[('simpleimputer',\n", + " SimpleImputer(strategy='median')),\n", + " ('standardscaler', StandardScaler()),\n", + " 
('randomforestregressor',\n", + " RandomForestRegressor(random_state=47))]),\n", + " n_jobs=-1,\n", + " param_grid={'randomforestregressor__n_estimators': [10, 12, 16, 20,\n", + " 26, 33, 42, 54,\n", + " 69, 88, 112,\n", + " 143, 183, 233,\n", + " 297, 379, 483,\n", + " 615, 784,\n", + " 1000],\n", + " 'simpleimputer__strategy': ['mean', 'median'],\n", + " 'standardscaler': [StandardScaler(), None]})" + ], + "text/html": [ + "
" + ] + }, + "metadata": {}, + "execution_count": 192 + } + ], + "source": [ + "#Code task 25#\n", + "#Now call the `GridSearchCV`'s `fit()` method with `X_train` and `y_train` as arguments\n", + "#to actually start the grid search. This may take a minute or two.\n", + "rf_grid_cv.fit(X_train_encoded, y_train_imputed)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Sz1-5h0ZGC6J", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "702db723-c309-4b34-e0ec-334daf225cec" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'randomforestregressor__n_estimators': 379,\n", + " 'simpleimputer__strategy': 'mean',\n", + " 'standardscaler': None}" + ] + }, + "metadata": {}, + "execution_count": 193 + } + ], + "source": [ + "#Code task 26#\n", + "#Print the best params (`best_params_` attribute) from the grid search\n", + "rf_grid_cv.best_params_" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5HEzyZVXGC6J" + }, + "source": [ + "It looks like imputing with the median helps, but scaling the features doesn't." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "AFzbGSHhGC6J", + "outputId": "46823dac-23bc-4760-86d1-6f53be447e89", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0.85269617, 0.84469567, 0.8611613 , 0.87258701, 0.84291051])" + ] + }, + "metadata": {}, + "execution_count": 194 + } + ], + "source": [ + "rf_best_cv_results = cross_validate(rf_grid_cv.best_estimator_, X_train_encoded, y_train_imputed, cv=5)\n", + "rf_best_scores = rf_best_cv_results['test_score']\n", + "rf_best_scores" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ZOgXZIMLGC6J", + "outputId": "2ebac412-2be6-4aea-ea17-58ea0194645e", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(0.8548101311774511, 0.010997518424168197)" + ] + }, + "metadata": {}, + "execution_count": 195 + } + ], + "source": [ + "np.mean(rf_best_scores), np.std(rf_best_scores)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FF35kRwnGC6K" + }, + "source": [ + "You've marginally improved upon the default CV results. Random forest has many more hyperparameters you could tune, but we won't dive into that here." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "whQ7rhwPGC6K", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 780 + }, + "outputId": "2658d67c-cdc0-4aa0-cb17-a0db5397c8ee" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "#Code task 27#\n", + "#Plot a barplot of the random forest's feature importances,\n", + "#assigning the `feature_importances_` attribute of\n", + "#`rf_grid_cv.best_estimator_.named_steps.randomforestregressor` to the name `imps` to then\n", + "#create a pandas Series object of the feature importances, with the index given by the\n", + "#training data column names, sorting the values in descending order\n", + "plt.subplots(figsize=(10, 5))\n", + "imps = rf_grid_cv.best_estimator_.named_steps.randomforestregressor.feature_importances_\n", + "rf_feat_imps = pd.Series(data=imps, index=X_train_encoded.columns).sort_values(ascending=False)\n", + "rf_feat_imps.plot(kind='bar')\n", + "plt.xlabel('features')\n", + "plt.ylabel('importance')\n", + "plt.title('Best random forest regressor feature importances');" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "r6yvTzzfGC6K" + }, + "source": [ + "Encouragingly, the dominant top four features are in common with your linear model:\n", + "* fastQuads\n", + "* Runs\n", + "* Snow Making_ac\n", + "* vertical_drop" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UIOUaNT3GC6K" + }, + "source": [ + "## 4.11 Final Model Selection" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ipn7j1bZGC6K" + }, + "source": [ + "Time to select your final model to use for further business modeling! It would be good to revisit the above model selection; there is undoubtedly more that could be done to explore possible hyperparameters.\n", + "It would also be worthwhile to investigate removing the least useful features. 
Gathering, calculating, and storing features adds business cost and dependencies, so features that genuinely are not needed should be removed.\n", + "Building a simpler model with fewer features can also have the advantage of being easier to sell (and/or explain) to stakeholders.\n", + "Certainly there seem to be four strong features here, so a model using only those would probably work well.\n", + "However, you want to explore some different scenarios where other features vary, so keep the fuller\n", + "model for now.\n", + "The business is waiting for this model, and you have something you are confident is much better than guessing with the average price.\n", + "\n", + "Or, rather, you have two \"somethings\". You built a best linear model and a best random forest model. You finally need to choose between them. You can calculate the mean absolute error using cross-validation. Although `cross_validate` defaults to the $R^2$ [metric for scoring](https://scikit-learn.org/stable/modules/model_evaluation.html#scoring) regression, you can specify the mean absolute error as an alternative via\n", + "the `scoring` parameter." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RNvcKTrFGC6K" + }, + "source": [ + "### 4.11.1 Linear regression model performance" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Q0EpP1ZkGC6K", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "ba8e38bc-b1f3-4b1a-ce20-63d82cc38ed1" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "ColumnTransformer(transformers=[('num',\n", + " Pipeline(steps=[('simpleimputer',\n", + " SimpleImputer(strategy='median')),\n", + " ('standardscaler',\n", + " StandardScaler())]),\n", + " Index(['summit_elev', 'vertical_drop', 'base_elev', 'trams', 'fastEight',\n", + " 'fastSixes', 'fastQuads', 'quad', 'triple', 'double', 'surface',\n", + " 'total_chairs', 'Runs', 'TerrainParks', 'LongestRun_mi',\n", + " 'SkiableTerrain_ac', 'Snow Making_ac', 'daysOpenLastYear', 'yearsOpen',\n", + " 'averageSnowfall', 'AdultWeekday', 'projectedDaysOpen',\n", + " 'NightSkiing_ac'],\n", + " dtype='object')),\n", + " ('cat',\n", + " Pipeline(steps=[('simpleimputer',\n", + " SimpleImputer(strategy='most_frequent')),\n", + " ('onehotencoder',\n", + " OneHotEncoder(handle_unknown='ignore'))]),\n", + " Index(['Name', 'Region', 'state'], dtype='object'))])\n" + ] + } + ], + "source": [ + "# Get the name of the ColumnTransformer step\n", + "lr_neg_mae = lr_grid_cv.best_estimator_.named_steps.keys()\n", + "\n", + "# Print the ColumnTransformer\n", + "print(lr_grid_cv.best_estimator_.named_steps[list(column_transformer_name)[0]])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1FjJ4V8dGC6K", + "outputId": "5c1d9f33-fe79-473e-a92b-6e55fc5293a0", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(-0.8676236539211758, 0.0033392212546386312)" + ] + }, + "metadata": {}, + "execution_count": 212 + } + ], + "source": [ + 
"# Get the cross-validation results for the best estimator\n", + "lr_cv_results = lr_grid_cv.cv_results_\n", + "\n", + "# Calculate mean and standard deviation of the negative MAE scores\n", + "lr_mae_mean = np.mean(-1 * lr_cv_results['mean_test_score']) # Use 'mean_test_score' from cv_results_\n", + "lr_mae_std = np.std(-1 * lr_cv_results['mean_test_score'])\n", + "\n", + "# Print the results\n", + "lr_mae_mean, lr_mae_std" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lHvJdY8RGC6L", + "outputId": "5da5f478-2d78-458a-81ec-8f32df77b000", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/sklearn/base.py:439: UserWarning: X does not have valid feature names, but SelectKBest was fitted with feature names\n", + " warnings.warn(\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "5.7810516759903505" + ] + }, + "metadata": {}, + "execution_count": 216 + } + ], + "source": [ + "mean_absolute_error(y_test_imputed, lr_grid_cv.best_estimator_.predict(X_test))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "G1GgDYcsGC6L" + }, + "source": [ + "### 4.11.2 Random forest regression model performance" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "SvQ4-LYVGC6L" + }, + "outputs": [], + "source": [ + "rf_neg_mae = cross_validate(rf_grid_cv.best_estimator_, X_train_encoded, y_train_imputed,\n", + " scoring='neg_mean_absolute_error', cv=5, n_jobs=-1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "-BfR-qkfGC6L", + "outputId": "ef263cf2-714a-40ca-f605-5543d38fd9ec", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(6.29875820384154, 0.5826861478214918)" + ] + }, + "metadata": {}, + 
"execution_count": 218 + } + ], + "source": [ + "rf_mae_mean = np.mean(-1 * rf_neg_mae['test_score'])\n", + "rf_mae_std = np.std(-1 * rf_neg_mae['test_score'])\n", + "rf_mae_mean, rf_mae_std" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "LBmVdE7yGC6L", + "outputId": "dbd84fe6-83be-4020-98d2-8cd579552924", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Index(['Name', 'Region', 'state', 'summit_elev', 'vertical_drop', 'base_elev',\n", + " 'trams', 'fastEight', 'fastSixes', 'fastQuads', 'quad', 'triple',\n", + " 'double', 'surface', 'total_chairs', 'Runs', 'TerrainParks',\n", + " 'LongestRun_mi', 'SkiableTerrain_ac', 'Snow Making_ac',\n", + " 'daysOpenLastYear', 'yearsOpen', 'averageSnowfall', 'AdultWeekday',\n", + " 'projectedDaysOpen', 'NightSkiing_ac'],\n", + " dtype='object')\n", + "Error: The following features are missing from X_train: {'Ticket', 'Embarked', 'Cabin', 'Sex'}\n" + ] + } + ], + "source": [ + "# Import OneHotEncoder\n", + "from sklearn.preprocessing import OneHotEncoder\n", + "\n", + "# Assuming categorical features are in columns identified by 'categorical_features'\n", + "# Double-check these column names against your X_train DataFrame\n", + "# Make sure these columns ACTUALLY exist in X_train\n", + "categorical_features = ['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked']\n", + "\n", + "# Create and fit the OneHotEncoder\n", + "encoder = OneHotEncoder(handle_unknown='ignore')\n", + "\n", + "# Verify that X_train contains the specified categorical features\n", + "print(X_train.columns) # Print columns of X_train to check for presence of categorical features\n", + "\n", + "# Check if all categorical features are in X_train\n", + "missing_features = set(categorical_features) - set(X_train.columns)\n", + "if missing_features:\n", + " print(f\"Error: The following features are missing from X_train: 
{missing_features}\")\n", + "else:\n", + " # Fit on training data using all categorical features\n", + " encoder.fit(X_train[categorical_features])\n", + "\n", + "# ... rest of the code ..." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Mueogux7GC6L" + }, + "source": [ + "### 4.11.3 Conclusion" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7V9BqNt7GC6M" + }, + "source": [ + "The random forest model has a lower cross-validation mean absolute error by almost \\\\$1. It also exhibits less variability. Performance on the test set is consistent with the cross-validation results." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "w5Ro-D5zGC6M" + }, + "source": [ + "## 4.12 Data quantity assessment" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iTD6ejcCGC6M" + }, + "source": [ + "Finally, you need to advise the business whether it needs to undertake further data collection. Would more data be useful? We're often led to believe more data is always good, but gathering data invariably has a cost associated with it. Assess this trade-off by seeing how performance varies with differing data set sizes. The `learning_curve` function does this conveniently." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Hi1lWLLPGC6M" + }, + "outputs": [], + "source": [ + "fractions = [.2, .25, .3, .35, .4, .45, .5, .6, .75, .8, 1.0]\n", + "train_size, train_scores, test_scores = learning_curve(pipe, X_train, y_train, train_sizes=fractions)\n", + "train_scores_mean = np.mean(train_scores, axis=1)\n", + "train_scores_std = np.std(train_scores, axis=1)\n", + "test_scores_mean = np.mean(test_scores, axis=1)\n", + "test_scores_std = np.std(test_scores, axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Zr9kZCNrGC6M", + "outputId": "6cb92f53-4b1e-4f2b-8393-2ccf0b142bbd", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 523 + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/matplotlib/axes/_base.py:2503: UserWarning: Warning: converting a masked element to nan.\n", + " xys = np.asarray(xys)\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "iVBORw0KGgoAAAANSUhEUgAAA2IAAAHWCAYAAAAVazrYAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAABMgElEQVR4nO3de3zP9f//8ft7Zgc7MrMZY061GVFjmgqxjFSEaCmHj6gcKyk+lUMnoUIhHT6flFKM0idJNOS0xEjOIaewIbY5bmzP3x/99v5628HG9hpzu14ur4vez9fz9Xo9Xu/X67123+v1er5txhgjAAAAAIBlnEq6AAAAAAC40RDEAAAAAMBiBDEAAAAAsBhBDAAAAAAsRhADAAAAAIsRxAAAAADAYgQxAAAAALAYQQwAAAAALEYQAwAAAACLEcQAIBfTp0+XzWbT3r177W0tWrRQixYtLrvssmXLZLPZtGzZsiKtyWazadSoUUW6Tlw/QkJC1LNnzytatqDn7o0mt895cRs1apRsNptl2wNw7SKIAdDu3bv1xBNPqGbNmnJzc5O3t7fuuOMOTZo0SWfPni3p8m4oCxYsIGxdp1avXq1Ro0YpJSWlpEsptd544w3NmzevpMsAgCJhM8aYki4CQMn5/vvv9dBDD8nV1VXdu3dXvXr1lJGRoZUrV2ru3Lnq2bOnPvzww5Iu03LTp09Xr169tGfPHoWEhEiSMjIyJEkuLi75Lrts2TLdfffdWrp0aaGvQgwYMEBTpkxRbj+az507J2dnZzk7OxdqnbDGW2+9paFDhzqcM0UpPT1dTk5OKlu2bKGXLei5e63z9PRU586dNX369CJZX2Zmps6fPy9XV1fLrlJduHBBFy5ckJubmyXbA3Dt4v/mwA1sz549evjhh1W9enUtWbJElStXts/r37+/du3ape+//z7P5bOyspSRkXHD/EJR0r/E3ijvc25Onz4tDw+Pki6jyFzJZ8fV1fWKt1fS5+61qkyZMipTpoyl2yyJP6aUts8PUFpwayJwAxs3bpxOnTql//znPw4hLFvt2rU1ePBg+2ubzaYBAwboiy++UHh4uFxdXbVw4UJJ0oYNG9S2bVt5e3vL09NTrVq10i+//OKwvvPnz2v06NGqU6eO3Nzc5OfnpzvvvFOLFy+290lKSlKvXr1UtWpVubq6qnLlymrfvn2+z3DMmTNHNptNP//8c455H3zwgWw2mzZv3ixJ+v3339WzZ0/7bZiBgYH617/+pb///vuy71duz9n89ddf6tChgzw8PFSpUiU988wzSk9Pz7HsihUr9NBDD6latWpydXVVcHCwnnnmGYdbP3v27KkpU6ZI+ue9zp6y5faMWEHe9+znYFatWqVnn31W/v7+8vDw0IMPPqijR49edr8Lekx++OEHNW/eXF5eXvL29lbjxo01c+ZMhz5xcXGKiIiQu7u7KlasqEcffVQHDx506NOzZ095enpq9+7duvfee+Xl5aVu3bpJ+ifATJw4UeHh4XJzc1NAQICeeOIJnThx4rL7UdBjf/LkST399NMKCQmRq6urKlWqpHvuuUfr16/Pc92jRo3S0KFDJUk1atSwH7vs9yi/z85bb72lpk2bys/PT+7u7oqIiNCcOXNybOPSZ8QKc1wvPXezn2OcPXu2Xn/9dVWtWlVubm5q1aqVdu3alWPbU6ZMUc2aNeXu7q7IyEitWLGiwM+dLV68WHfeead8fX3l6empm2++Wf/+978d+qSnp2vkyJGqXbu2/fPx/PPPO3yWbDabTp8+rU8//dT+/l7umbn33ntP4eHhKleunMqXL69GjRo5nJOXPiOW/fxWbtPF27qa8zC3Z8Syz4958+apXr16cnV1VXh4uP0cudjBgwfVu3dvBQUFydXVVTVq1NBTTz1lv+qZvU8///yz+vXrp0qVKqlq1ar25X/44Qfddddd8vDwkJeXl9q1a6ctW7Y4bKOoPytr1qxRm
zZt5OPjo3Llyql58+ZatWrVFa0LKE24IgbcwL777jvVrFlTTZs2LfAyS5Ys0ezZszVgwABVrFhRISEh2rJli+666y55e3vr+eefV9myZfXBBx+oRYsW+vnnn9WkSRNJ//wCMmbMGD3++OOKjIxUWlqa1q1bp/Xr1+uee+6RJHXq1ElbtmzRwIEDFRISoiNHjmjx4sXav39/nrd7tWvXTp6enpo9e7aaN2/uMG/WrFkKDw9XvXr1JP3zS+Gff/6pXr16KTAwUFu2bNGHH36oLVu26JdffinU7Ulnz55Vq1attH//fg0aNEhBQUGaMWOGlixZkqNvXFyczpw5o6eeekp+fn769ddf9d577+mvv/5SXFycJOmJJ57QoUOHtHjxYs2YMeOy2y/o+55t4MCBKl++vEaOHKm9e/dq4sSJGjBggGbNmpXvdgpyTKZPn65//etfCg8P1/Dhw+Xr66sNGzZo4cKFeuSRR+x9evXqpcaNG2vMmDFKTk7WpEmTtGrVKm3YsEG+vr72bV64cEExMTG688479dZbb6lcuXL29yh7PYMGDdKePXs0efJkbdiwQatWrcr3tr2CHvsnn3xSc+bM0YABA1S3bl39/fffWrlypbZt26bbbrst13V37NhRf/zxh7788ktNmDBBFStWlCT5+/vb++T22ZGkSZMm6YEHHlC3bt2UkZGhr776Sg899JDmz5+vdu3a5XtspCs/rpL05ptvysnJSc8995xSU1M1btw4devWTWvWrLH3ef/99zVgwADdddddeuaZZ7R371516NBB5cuXd/gFPzdbtmzRfffdp1tuuUWvvPKKXF1dtWvXLodfwrOysvTAAw9o5cqV6tu3r8LCwrRp0yZNmDBBf/zxh/2ZsBkzZth/dvTt21eSVKtWrTy3/dFHH2nQoEHq3LmzBg8erHPnzun333/XmjVr7OfkpTp27KjatWs7tCUmJmrixImqVKmSve1qzsO8rFy5Ul9//bX69esnLy8vvfvuu+rUqZP2798vPz8/SdKhQ4cUGRmplJQU9e3bV6GhoTp48KDmzJmjM2fOOFz57Nevn/z9/TVixAidPn3a/h726NFDMTExGjt2rM6cOaP3339fd955pzZs2GA/J4vys7JkyRK1bdtWERERGjlypJycnPTJJ5+oZcuWWrFihSIjIwu8LqDUMQBuSKmpqUaSad++fYGXkWScnJzMli1bHNo7dOhgXFxczO7du+1thw4dMl5eXqZZs2b2tgYNGph27drluf4TJ04YSWb8+PEF35H/LzY21lSqVMlcuHDB3nb48GHj5ORkXnnlFXvbmTNnciz75ZdfGklm+fLl9rZPPvnESDJ79uyxtzVv3tw0b97c/nrixIlGkpk9e7a97fTp06Z27dpGklm6dGm+2x0zZoyx2Wxm37599rb+/fubvH40SzIjR460vy7o+569L9HR0SYrK8ve/swzz5gyZcqYlJSUXLdnTMGOSUpKivHy8jJNmjQxZ8+edZiXvb2MjAxTqVIlU69ePYc+8+fPN5LMiBEj7G09evQwksywYcMc1rVixQojyXzxxRcO7QsXLsy1/VIFPfY+Pj6mf//++a4rN+PHj89xzmTL67OTW10ZGRmmXr16pmXLlg7t1atXNz169LC/LsxxvfTcXbp0qZFkwsLCTHp6ur190qRJRpLZtGmTMcaY9PR04+fnZxo3bmzOnz9v7zd9+nQjyWGduZkwYYKRZI4ePZpnnxkzZhgnJyezYsUKh/Zp06YZSWbVqlX2Ng8PD4f3ID/t27c34eHh+fbJ7XN+saNHj5pq1aqZ+vXrm1OnThljrv48HDlyZI7PuCTj4uJidu3aZW/buHGjkWTee+89e1v37t2Nk5OTWbt2bY71Zp8D2ft05513Ovw8PHnypPH19TV9+vRxWC4pKcn4+Pg4tBfVZyUrK8vUqVPHxMTEOJyjZ86cMTVq1DD33HNPgdcFlEbcmgjcoNLS0iRJXl5ehVquefPmq
lu3rv11ZmamFi1apA4dOqhmzZr29sqVK+uRRx7RypUr7dvy9fXVli1btHPnzlzX7e7uLhcXFy1btqxAt/hcrGvXrjpy5IjDkPFz5sxRVlaWunbt6rCNbOfOndOxY8d0++23S1Khb4FZsGCBKleurM6dO9vbypUrZ/9r/cUu3u7p06d17NgxNW3aVMYYbdiwoVDblQr3vmfr27evwxW/u+66S5mZmdq3b1+e2ynIMVm8eLFOnjypYcOG5XjmKXt769at05EjR9SvXz+HPu3atVNoaGiuzyI+9dRTDq/j4uLk4+Oje+65R8eOHbNPERER8vT01NKlS/Pcj+x9yZbfsff19dWaNWt06NChfNdXWJd+dnKr68SJE0pNTdVdd91V4PPxSo5rtl69ejlcRbnrrrskSX/++aekf47b33//rT59+jg819StWzeVL1/+suvPvsr57bffKisrK9c+cXFxCgsLU2hoqMNxbdmypSRd9rjmt+2//vpLa9euvaLlMzMzFRsbq5MnT+qbb76xP2N1tedhXqKjox2u8N1yyy3y9va2H4usrCzNmzdP999/vxo1apRj+Uuv5vfp08fh+bfFixcrJSVFsbGxDnWXKVNGTZo0cai7qD4rv/32m3bu3KlHHnlEf//9t32bp0+fVqtWrbR8+XL7eVFcnzvgWkYQA25Q3t7ekv65L78watSo4fD66NGjOnPmjG6++eYcfcPCwpSVlaUDBw5Ikl555RWlpKTopptuUv369TV06FD9/vvv9v6urq4aO3asfvjhBwUEBKhZs2YaN26ckpKS7H1SU1OVlJRkn44fPy5J9ucPLr4da9asWWrYsKFuuukme9vx48c1ePBgBQQEyN3dXf7+/vZ9Sk1NLdR7sW/fPtWuXTvHL0C5vRf79+9Xz549VaFCBXl6esrf399+G2VhtysV7n3PVq1aNYfX2b9I5xd6C3JMdu/eLUn22z9zkx0Kcqs3NDQ0R2hwdnbOcdvbzp07lZqaqkqVKsnf399hOnXqlI4cOZLn9qWCH/tx48Zp8+bNCg4OVmRkpEaNGmX/ZfhqXPrZyTZ//nzdfvvtcnNzU4UKFeTv76/333+/wOfFlRzXgi6bfVwuvV3P2dm5QCNDdu3aVXfccYcef/xxBQQE6OGHH9bs2bMdQtnOnTu1ZcuWHMc0+3N7ueOalxdeeEGenp6KjIxUnTp11L9//xzPJeXnpZde0pIlSzRz5kyHgHS152FeLj0W0j/HI/tYHD16VGlpafl+zi526fmW/Qewli1b5qh70aJFDnUX1Wcle5s9evTIsc2PP/5Y6enp9vUV1+cOuJbxjBhwg/L29lZQUJB9EIuCuvgvpYXVrFkz7d69W99++60WLVqkjz/+WBMmTNC0adP0+OOPS5Kefvpp3X///Zo3b55+/PFHvfzyyxozZoyWLFmiW2+9VYMHD9ann35qX2fz5s21bNkyubq6qkOHDvrmm280depUJScna9WqVXrjjTccaujSpYtWr16toUOHqmHDhvL09FRWVpbatGmT51/sr1ZmZqbuueceHT9+XC+88IJCQ0Pl4eGhgwcPqmfPnsW23UvlNTqcucy3mFzumBQHV1dXOTk5/q0wKytLlSpV0hdffJHrMhc/j5Wbgh77Ll266K677tI333yjRYsWafz48Ro7dqy+/vprtW3b9or3KbfPzooVK/TAAw+oWbNmmjp1qipXrqyyZcvqk08+yTHQSV6u9Lhe7bIF4e7uruXLl2vp0qX6/vvvtXDhQs2aNUstW7bUokWLVKZMGWVlZal+/fp65513cl1HcHDwFW07LCxMO3bs0Pz587Vw4ULNnTtXU6dO1YgRIzR69Oh8l503b57Gjh2rV199VW3atHGYd7XnYV6K+lhcer5ln+MzZsxQYGBgjv4XX/Esqs9Kdt/x48erYcOGudbp6elZoHUBpRFBDLiB3Xffffrwww+VkJCgqKioK1qHv7+/ypUrpx07duSYt337djk5O
Tn8IlWhQgX16tVLvXr10qlTp9SsWTONGjXKHsSkfx7AHzJkiIYMGaKdO3eqYcOGevvtt/X555/r+eef16OPPmrve/HtUV27dtWnn36q+Ph4bdu2TcYYh9sST5w4ofj4eI0ePVojRoywt+d1q+TlVK9eXZs3b5YxxuGq2KXvxaZNm/THH3/o008/Vffu3e3tF48Wma2gg4UU9n2/Wvkdk+yrBZs3b85x5SRb9erVJf3z3mTfcpZtx44d9vmXq+Gnn37SHXfcUeg/CBT22FeuXFn9+vVTv379dOTIEd122216/fXX8/2F8Eq+h2ru3Llyc3PTjz/+6DA8/SeffFLodRWH7OOya9cu3X333fb2CxcuaO/evbrlllsuuw4nJye1atVKrVq10jvvvKM33nhDL774opYuXWq/HW/jxo1q1arVZd/Dwr7HHh4e6tq1q7p27aqMjAx17NhRr7/+uoYPH57nVwf88ccf6tGjhzp06JBjdEfp6s7Dq+Hv7y9vb+9C//EsW/bntFKlSoqOjs6zX1F+VrK36e3tne82C7IuoDTi1kTgBvb888/Lw8NDjz/+uJKTk3PM3717tyZNmpTvOsqUKaPWrVvr22+/dRjOPDk5WTNnztSdd95pvw3y0qGPPT09Vbt2bfsQ1WfOnNG5c+cc+tSqVUteXl72PnXr1lV0dLR9ioiIsPeNjo5WhQoVNGvWLM2aNUuRkZEOt+dk/8X50r8wT5w4Md99zMu9996rQ4cOOQw1fubMmRxfgJ3bdo0xub632c+hpKSk5LvtwrzvV6Mgx6R169by8vLSmDFjcvTN3udGjRqpUqVKmjZtmsOQ5D/88IO2bdtWoNEBu3TposzMTL366qs55l24cCHf96ygxz4zMzPHLYGVKlVSUFBQrl9LcLGCHrtL67LZbMrMzLS37d271z5SYElr1KiR/Pz89NFHH+nChQv29i+++KJAtz5m3zp8sewrI9nvZ5cuXXTw4EF99NFHOfqePXvWPuKf9M97XND399KfNy4uLqpbt66MMTp//nyuy5w6dUoPPvigqlSpYh8m/1JXcx5eDScnJ3Xo0EHfffed1q1bl2P+5a6cxcTEyNvbW2+88Uau+5/9lQdF+VmJiIhQrVq19NZbb+nUqVN5bvNqPnfA9YwrYsANrFatWpo5c6a6du2qsLAwde/eXfXq1VNGRoZWr16tuLi4y35PjyS99tpr9u8K6tevn5ydnfXBBx8oPT1d48aNs/erW7euWrRooYiICFWoUEHr1q2zD1cs/fOX6FatWqlLly6qW7eunJ2d9c033yg5OVkPP/zwZesoW7asOnbsqK+++kqnT5/WW2+95TDf29vb/ozT+fPnVaVKFS1atEh79uwp3Bv3//Xp00eTJ09W9+7dlZiYqMqVK2vGjBn2odazhYaGqlatWnruued08OBBeXt7a+7cubn+IpsdLAcNGqSYmBiVKVMmz30v6Pt+NQpyTLy9vTVhwgQ9/vjjaty4sR555BGVL19eGzdu1JkzZ/Tpp5+qbNmyGjt2rHr16qXmzZsrNjbWPnx9SEiInnnmmcvW0rx5cz3xxBMaM2aMfvvtN7Vu3Vply5bVzp07FRcXp0mTJjkMnHKxgh77kydPqmrVqurcubMaNGggT09P/fTTT1q7dq3efvvtfOvLPnYvvviiHn74YZUtW1b3339/vl+k265dO73zzjtq06aNHnnkER05ckRTpkxR7dq1HZ6fLCkuLi4aNWqUBg4cqJYtW6pLly7au3evpk+frlq1al32CtUrr7yi5cuXq127dqpevbqOHDmiqVOnqmrVqrrzzjslSY899phmz56tJ598UkuXLtUdd9yhzMxMbd++XbNnz9aPP/5oH5wiIiJCP/30k9555x0FBQWpRo0aOb6mIVvr1q0VGBioO+64QwEBAdq2bZsmT56sdu3a5TlI0ejRo7V161a99NJL+vbbbx3m1apVS1FRUVd1Hl6tN954Q4sWLVLz5s3tQ
/0fPnxYcXFxWrlypcNXQFzK29tb77//vh577DHddtttevjhh+Xv76/9+/fr+++/1x133KHJkycX6WfFyclJH3/8sdq2bavw8HD16tVLVapU0cGDB7V06VJ5e3vru+++u6rPHXBdK4GRGgFcY/744w/Tp08fExISYlxcXIyXl5e54447zHvvvWfOnTtn7ycpz+GF169fb2JiYoynp6cpV66cufvuu83q1asd+rz22msmMjLS+Pr6Gnd3dxMaGmpef/11k5GRYYwx5tixY6Z///4mNDTUeHh4GB8fH9OkSROH4eEvZ/HixUaSsdls5sCBAznm//XXX+bBBx80vr6+xsfHxzz00EPm0KFDOYaGL8jw9cYYs2/fPvPAAw+YcuXKmYoVK5rBgwfbh7G+ePj6rVu3mujoaOPp6WkqVqxo+vTpYx+e+pNPPrH3u3Dhghk4cKDx9/c3NpvNYZjrS2s0pmDve/a+XDrkdfYQ5hfXeanCHJP//e9/pmnTpsbd3d14e3ubyMhI8+WXXzr0mTVrlrn11luNq6urqVChgunWrZv566+/HPr06NHDeHh45FnThx9+aCIiIoy7u7vx8vIy9evXN88//7w5dOhQnssYU7Bjn56eboYOHWoaNGhgvLy8jIeHh2nQoIGZOnVqvuvO9uqrr5oqVaoYJycnh/Mnv8/Of/7zH1OnTh3j6upqQkNDzSeffJLrEOd5DV9fkOOa1/D1cXFxDsvu2bMnxzlpjDHvvvuuqV69unF1dTWRkZFm1apVJiIiwrRp0ybf9yM+Pt60b9/eBAUFGRcXFxMUFGRiY2PNH3/84dAvIyPDjB071oSHhxtXV1dTvnx5ExERYUaPHm1SU1Pt/bZv326aNWtm3N3djaR8h7L/4IMPTLNmzYyfn59xdXU1tWrVMkOHDnVY36Wf8+yvTshtunRbV3oe5jV8fW7nx6XH3Jh/fuZ0797d+Pv7G1dXV1OzZk3Tv39/+9cQ5HVeZFu6dKmJiYkxPj4+xs3NzdSqVcv07NnTrFu3zt6nqD8rGzZsMB07drQfi+rVq5suXbqY+Pj4Qq8LKE1sxhTRE7kAAOCGkJWVJX9/f3Xs2DHXWwoBAJfHM2IAACBP586dy/G80Geffabjx4+rRYsWJVMUAJQCXBEDAAB5WrZsmZ555hk99NBD8vPz0/r16/Wf//xHYWFhSkxMdPhCaABAwTFYBwAAyFNISIiCg4P17rvv6vjx46pQoYK6d++uN998kxAGAFeBK2IAAAAAYDGeEQMAAAAAixHEAAAAAMBiPCNWBLKysnTo0CF5eXld9sstAQAAAJRexhidPHlSQUFBcnLK+7oXQawIHDp0SMHBwSVdBgAAAIBrxIEDB1S1atU85xPEioCXl5ekf95sb2/vEq4GAAAAQElJS0tTcHCwPSPkhSBWBLJvR/T29iaIAQAAALjsI0sM1gEAAAAAFiOIAQAAAIDFCGIAAAAAYDGCGAAAAABYjCAGAAAAABYjiAEAAACAxQhiAAAAAGAxghgAAAAAWIwgBgAAAAAWI4gBAAAAgMUIYgAAAABgMYIYAAAAAFiMIAYAAAAAFiOIAQAAAIDFCGIAAAAAYDGCGAAAAABYjCAGAAAAABYjiAEAAACAxQhiAAAAAGAxghgAAAAAWIwgBgAAAAAWI4gBAAAAgMUIYgAAAABgMYIYAAAAAFiMIAYAAAAAFiOIAQAAAIDFCGIAAAAAYDGCGAAAAABYjCAGAAAAABYjiAEAAACAxQhiAAAAAGAxghgAAAAAWIwgBgAAAAAWI4gBAAAAgMUIYgAAAABgMYIYAAAAAFiMIAYAAAAAFiOIAQAAAIDFCGIAAAAAYDGCGAAAAABYjCAGAAAAABYjiAEAAACAxQhiAAAAAGAxghgAAAAAWIwgBgAAAAAWu+6C2JQpUxQSEiI3Nzc1adJEv/76a7794+LiFBoaKjc3N9WvX18LFizIs++TTz4pm82miRMnFnHVAAAAAPB/rqsgN
mvWLD377LMaOXKk1q9frwYNGigmJkZHjhzJtf/q1asVGxur3r17a8OGDerQoYM6dOigzZs35+j7zTff6JdfflFQUFBx7wYAAACAG9x1FcTeeecd9enTR7169VLdunU1bdo0lStXTv/9739z7T9p0iS1adNGQ4cOVVhYmF599VXddtttmjx5skO/gwcPauDAgfriiy9UtmxZK3YFAAAAwA3sugliGRkZSkxMVHR0tL3NyclJ0dHRSkhIyHWZhIQEh/6SFBMT49A/KytLjz32mIYOHarw8PAC1ZKenq60tDSHCQAAAAAK6roJYseOHVNmZqYCAgIc2gMCApSUlJTrMklJSZftP3bsWDk7O2vQoEEFrmXMmDHy8fGxT8HBwYXYEwAAAAA3uusmiBWHxMRETZo0SdOnT5fNZivwcsOHD1dqaqp9OnDgQDFWCQAAAKC0uW6CWMWKFVWmTBklJyc7tCcnJyswMDDXZQIDA/Ptv2LFCh05ckTVqlWTs7OznJ2dtW/fPg0ZMkQhISF51uLq6ipvb2+HCQAAAAAK6roJYi4uLoqIiFB8fLy9LSsrS/Hx8YqKisp1maioKIf+krR48WJ7/8cee0y///67fvvtN/sUFBSkoUOH6scffyy+nQEAAABwQ3Mu6QIK49lnn1WPHj3UqFEjRUZGauLEiTp9+rR69eolSerevbuqVKmiMWPGSJIGDx6s5s2b6+2331a7du301Vdfad26dfrwww8lSX5+fvLz83PYRtmyZRUYGKibb77Z2p0DAAAAcMO4roJY165ddfToUY0YMUJJSUlq2LChFi5caB+QY//+/XJy+r+LfE2bNtXMmTP10ksv6d///rfq1KmjefPmqV69eiW1CwAAAAAgmzHGlHQR17u0tDT5+PgoNTWV58UAAACAG1hBs8F184wYAAAAAJQWBDEAAAAAsBhBDAAAAAAsRhADAAAAAIsRxAAAAADAYgQxAAAAALAYQQwAAAAALEYQAwAAAACLEcQAAAAAwGIEMQAAAACwGEEMAAAAACxGEAMAAAAAixHEAAAAAMBiBDEAAAAAsBhBDAAAAAAsRhADAAAAAIsRxAAAAADAYgQxAAAAALAYQQwAAAAALEYQAwAAAACLEcQAAAAAwGIEMQAAAACwGEEMAAAAACxGEAMAAAAAixHEAAAAAMBiBDEAAAAAsBhBDAAAAAAsRhADAAAAAIsRxAAAAADAYgQxAAAAALAYQQwAAAAALEYQAwAAAACLEcQAAAAAwGIEMQAAAACwGEEMAAAAACxGEAMAAAAAixHEAAAAAMBiBDEAAAAAsBhBDAAAAAAsRhADAAAAAIsRxAAAAADAYgQxAAAAALAYQQwAAAAALEYQAwAAAACLEcQAAAAAwGIEMQAAAACwGEEMAAAAACxGEAMAAAAAixHEAAAAAMBiBDEAAAAAsBhBDAAAAAAsRhADAAAAAIsRxAAAAADAYgQxAAAAALAYQQwAAAAALEYQAwAAAACLEcQAAAAAwGIEMQAAAACwGEEMAAAAACx23QWxKVOmKCQkRG5ubmrSpIl+/fXXfPvHxcUpNDRUbm5uql+/vhYsWGCfd/78eb3wwguqX7++PDw8FBQUpO7du+vQoUPFvRsAAAAAbmDXVRCbNWuWnn32WY0cOVLr169XgwYNFBMToyNHjuTaf/Xq1YqNjVXv3r21YcMGdejQQR06dNDmzZslSWfOnNH69ev18ssva/369fr666+1Y8cOPfDAA1buFgAAAIAbjM0YY0q6iIJq0qSJGjdurMmTJ0uSsrKyFBwcrIEDB2rYsGE5+nft2lWnT5/W/Pnz7W233367GjZsqGnTpuW6jbVr1yoyMlL79u1TtWrVClRXWlqafHx8lJqaKm9v7yvYMwAAAAClQUGzwXVzRSwjI0OJiYmKjo62tzk5OSk6OloJCQm5LpOQkODQX5JiYmLy7C9Jqampstls8vX1zbNPenq60tLSHCYAAAAAKKjrJogdO3ZMm
ZmZCggIcGgPCAhQUlJSrsskJSUVqv+5c+f0wgsvKDY2Nt/0OmbMGPn4+Nin4ODgQu4NAAAAgBvZdRPEitv58+fVpUsXGWP0/vvv59t3+PDhSk1NtU8HDhywqEoAAAAApYFzSRdQUBUrVlSZMmWUnJzs0J6cnKzAwMBclwkMDCxQ/+wQtm/fPi1ZsuSyz3m5urrK1dX1CvYCAAAAAK6jK2IuLi6KiIhQfHy8vS0rK0vx8fGKiorKdZmoqCiH/pK0ePFih/7ZIWznzp366aef5OfnVzw7AAAAAAD/33VzRUySnn32WfXo0UONGjVSZGSkJk6cqNOnT6tXr16SpO7du6tKlSoaM2aMJGnw4MFq3ry53n77bbVr105fffWV1q1bpw8//FDSPyGsc+fOWr9+vebPn6/MzEz782MVKlSQi4tLyewoAAAAgFLtugpiXbt21dGjRzVixAglJSWpYcOGWrhwoX1Ajv3798vJ6f8u8jVt2lQzZ87USy+9pH//+9+qU6eO5s2bp3r16kmSDh48qP/973+SpIYNGzpsa+nSpWrRooUl+wUAAADgxnJdfY/YtYrvEQMAAAAglcLvEQMAAACA0oIgBgAAAAAWI4gBAAAAgMUIYgAAAABgMYIYAAAAAFiMIAYAAAAAFiOIAQAAAIDFCGIAAAAAYDGCGAAAAABYjCAGAAAAABYjiAEAAACAxQhiAAAAAGAxghgAAAAAWIwgBgAAAAAWI4gBAAAAgMUIYgAAAABgMYIYAAAAAFiMIAYAAAAAFiOIAQAAAIDFCGIAAAAAYDGCGAAAAABYjCAGAAAAABYjiAEAAACAxQhiAAAAAGAxghgAAAAAWIwgBgAAAAAWI4gBAAAAgMUIYgAAAABgMYIYAAAAAFiMIAYAAAAAFiOIAQAAAIDFrjqIpaWlad68edq2bVtR1AMAAAAApV6hg1iXLl00efJkSdLZs2fVqFEjdenSRbfccovmzp1b5AUCAAAAQGlT6CC2fPly3XXXXZKkb775RsYYpaSk6N1339Vrr71W5AUCAAAAQGlT6CCWmpqqChUqSJIWLlyoTp06qVy5cmrXrp127txZ5AUCAAAAQGlT6CAWHByshIQEnT59WgsXLlTr1q0lSSdOnJCbm1uRFwgAAAAApY1zYRd4+umn1a1bN3l6eqpatWpq0aKFpH9uWaxfv35R1wcAAAAApU6hg1i/fv0UGRmpAwcO6J577pGT0z8X1WrWrMkzYgAAAABQADZjjLmSBTMyMrRnzx7VqlVLzs6FznOlSlpamnx8fJSamipvb++SLgcAAABACSloNij0M2JnzpxR7969Va5cOYWHh2v//v2SpIEDB+rNN9+88ooBAAAA4AZR6CA2fPhwbdy4UcuWLXMYnCM6OlqzZs0q0uIAAAAAoDQq9D2F8+bN06xZs3T77bfLZrPZ28PDw7V79+4iLQ4AAAAASqNCXxE7evSoKlWqlKP99OnTDsEMAAAAAJC7QgexRo0a6fvvv7e/zg5fH3/8saKiooquMgAAAAAopQp9a+Ibb7yhtm3bauvWrbpw4YImTZqkrVu3avXq1fr555+Lo0YAAAAAKFUKfUXszjvv1MaNG3XhwgXVr19fixYtUqVKlZSQkKCIiIjiqBEAAAAASpVCXRE7f/68nnjiCb388sv66KOPiqsmAAAAACjVCnVFrGzZspo7d25x1QIAAAAAN4RC35rYoUMHzZs3rxhKAQAAAIAbQ6EH66hTp45eeeUVrVq1ShEREfLw8HCYP2jQoCIrDgAAAABKI5sxxhRmgRo1auS9MptNf/7551UXdb1JS0uTj4+PUlNT5e3tXdLlAAAAACghBc0Ghb4itmfPnqsqDAAAAABudIV+RuxixhgV8oIaAAAAANzwriiIffbZZ6pfv77c3d3l7u6uW265RTNmzCjq2gAAAACgVCr0rYnvvPOOXn75ZQ0YMEB33HGHJGnlypV68skndezYMT3zzDNFXiQAAAAAlCZXN
FjH6NGj1b17d4f2Tz/9VKNGjbohnyFjsA4AAAAAUsGzQaFvTTx8+LCaNm2ao71p06Y6fPhwYVcHAAAAADecQgex2rVra/bs2TnaZ82apTp16hRJUQAAAABQmhX6GbHRo0era9euWr58uf0ZsVWrVik+Pj7XgAYAAAAAcFToK2KdOnXSmjVrVLFiRc2bN0/z5s1TxYoV9euvv+rBBx8sjhodTJkyRSEhIXJzc1OTJk3066+/5ts/Li5OoaGhcnNzU/369bVgwQKH+cYYjRgxQpUrV5a7u7uio6O1c+fO4twFAAAAADe4Kxq+PiIiQp9//rkSExOVmJiozz//XLfeemtR15bDrFmz9Oyzz2rkyJFav369GjRooJiYGB05ciTX/qtXr1ZsbKx69+6tDRs2qEOHDurQoYM2b95s7zNu3Di9++67mjZtmtasWSMPDw/FxMTo3Llzxb4/AAAAAG5MhR41ccGCBSpTpoxiYmIc2n/88UdlZWWpbdu2RVrgxZo0aaLGjRtr8uTJkqSsrCwFBwdr4MCBGjZsWI7+Xbt21enTpzV//nx72+23366GDRtq2rRpMsYoKChIQ4YM0XPPPSdJSk1NVUBAgKZPn66HH364QHUxaiIAAAAAqRhHTRw2bJgyMzNztBtjcg1DRSUjI0OJiYmKjo62tzk5OSk6OloJCQm5LpOQkODQX5JiYmLs/ffs2aOkpCSHPj4+PmrSpEme65Sk9PR0paWlOUwAAAAAUFCFDmI7d+5U3bp1c7SHhoZq165dRVJUbo4dO6bMzEwFBAQ4tAcEBCgpKSnXZZKSkvLtn/1vYdYpSWPGjJGPj499Cg4OLvT+AAAAALhxFTqI+fj46M8//8zRvmvXLnl4eBRJUde64cOHKzU11T4dOHCgpEsCAAAAcB0pdBBr3769nn76ae3evdvetmvXLg0ZMkQPPPBAkRZ3sYoVK6pMmTJKTk52aE9OTlZgYGCuywQGBubbP/vfwqxTklxdXeXt7e0wAQAAAEBBFTqIjRs3Th4eHgoNDVWNGjVUo0YNhYWFyc/PT2+99VZx1ChJcnFxUUREhOLj4+1tWVlZio+PV1RUVK7LREVFOfSXpMWLF9v716hRQ4GBgQ590tLStGbNmjzXCQAAAABXq9Bf6Ozj46PVq1dr8eLF2rhxo9zd3XXLLbeoWbNmxVGfg2effVY9evRQo0aNFBkZqYkTJ+r06dPq1auXJKl79+6qUqWKxowZI0kaPHiwmjdvrrffflvt2rXTV199pXXr1unDDz+UJNlsNj399NN67bXXVKdOHdWoUUMvv/yygoKC1KFDh2LfHwAAAAA3pkIHMemfANO6dWu1bt1akpSSklKUNeWpa9euOnr0qEaMGKGkpCQ1bNhQCxcutA+2sX//fjk5/d9FvqZNm2rmzJl66aWX9O9//1t16tTRvHnzVK9ePXuf559/XqdPn1bfvn2VkpKiO++8UwsXLpSbm5sl+wQAAADgxlPo7xEbO3asQkJC1LVrV0lSly5dNHfuXAUGBmrBggVq0KBBsRR6LeN7xAAAAABIxfg9YtOmTbMP17548WItXrxYP/zwg9q2bauhQ4deecUAAAAAcIMo9K2JSUlJ9iA2f/58denSRa1bt1ZISIiaNGlS5AUCAAAAQGlT6Cti5cuXt39v1sKFCxUdHS1JMsYoMzOzaKsDAAAAgFKo0FfEOnbsqEceeUR16tTR33//rbZt20qSNmzYoNq1axd5gQAAAABQ2hQ6iE2YMEEhISE6cOCAxo0bJ09PT0nS4cOH1a9fvyIvEAAAAABKm0KPmoicGDURAAAAgFSMoyYCAAAAAK4OQQwAAAAALEYQAwAAAACLFTiIMTQ9AAAAABSNAgexKlWqaNiwYfrjjz+Ksx4AAAAAKPUKHMT69++vOXPmKCwsTHfddZemT5+uM2fOFGdtAAAAAFAqFTiIvfzyy9q1a5fi4+NVs2ZNDRgwQJUrV1afPn20Zs2a4qwRAAAAA
EqVQg/W0aJFC3366adKSkrS22+/rW3btikqKkrh4eF65513iqNGAAAAAChViuQLnb///nt1795dKSkpN+SgHnyhMwAAAADJgi90PnPmjKZPn67mzZvrgQcekJ+fn15//fUrXR0AAAAA3DCcC7vA6tWr9d///ldxcXG6cOGCOnfurFdffVXNmjUrjvoAAAAAoNQpcBAbN26cPvnkE/3xxx9q1KiRxo8fr9jYWHl5eRVnfQAAAABQ6hQ4iI0fP16PPvqo4uLiVK9eveKsCQAAAABKtQIHsUOHDqls2bLFWQsAAAAA3BAKPFjHihUrVLduXaWlpeWYl5qaqvDwcK1YsaJIiwMAAACA0qjAQWzixInq06dPrkMw+vj46IknnuB7xAAAAACgAAocxDZu3Kg2bdrkOb9169ZKTEwskqIAAAAAoDQrcBBLTk7O9xkxZ2dnHT16tEiKAgAAAIDSrMBBrEqVKtq8eXOe83///XdVrly5SIoCAAAAgNKswEHs3nvv1csvv6xz587lmHf27FmNHDlS9913X5EWBwAAAAClkc0YYwrSMTk5WbfddpvKlCmjAQMG6Oabb5Ykbd++XVOmTFFmZqbWr1+vgICAYi34WpSWliYfHx+lpqbmOpgJAAAAgBtDQbNBgb9HLCAgQKtXr9ZTTz2l4cOHKzu/2Ww2xcTEaMqUKTdkCAMAAACAwipwEJOk6tWra8GCBTpx4oR27dolY4zq1Kmj8uXLF1d9AAAAAFDqFCqIZStfvrwaN25c1LUAAAAAwA2hwIN1AAAAAACKBkEMAAAAACxGEAMAAAAAixHEAAAAAMBiBDEAAAAAsBhBDAAAAAAsRhADAAAAAIsRxAAAAADAYgQxAAAAALAYQQwAAAAALEYQAwAAAACLEcQAAAAAwGIEMQAAAACwGEEMAAAAACxGEAMAAAAAixHEAAAAAMBiBDEAAAAAsBhBDAAAAAAsRhADAAAAAIsRxAAAAADAYgQxAAAAALAYQQwAAAAALEYQAwAAAACLEcQAAAAAwGIEMQAAAACwGEEMAAAAACxGEAMAAAAAixHEAAAAAMBiBDEAAAAAsNh1E8SOHz+ubt26ydvbW76+vurdu7dOnTqV7zLnzp1T//795efnJ09PT3Xq1EnJycn2+Rs3blRsbKyCg4Pl7u6usLAwTZo0qbh3BQAAAMAN7roJYt26ddOWLVu0ePFizZ8/X8uXL1ffvn3zXeaZZ57Rd999p7i4OP388886dOiQOnbsaJ+fmJioSpUq6fPPP9eWLVv04osvavjw4Zo8eXJx7w4AAACAG5jNGGNKuojL2bZtm+rWrau1a9eqUaNGkqSFCxfq3nvv1V9//aWgoKAcy6Smpsrf318zZ85U586dJUnbt29XWFiYEhISdPvtt+e6rf79+2vbtm1asmRJgetLS0uTj4+PUlNT5e3tfQV7CAAAAKA0KGg2uC6uiCUkJMjX19cewiQpOjpaTk5OWrNmTa7LJCYm6vz584qOjra3hYaGqlq1akpISMhzW6mpqapQoUK+9aSnpystLc1hAgAAAICCui6CWFJSkipVquTQ5uzsrAoVKigpKSnPZVxcXOTr6+vQHhAQkOcyq1ev1qxZsy57y+OYMWPk4+Njn4KDgwu+MwAAAABueCUaxIYNGyabzZbvtH37dktq2bx5s9q3b6+RI0eqdevW+fYdPny4UlNT7dOBAwcsqREAAABA6eBckhsfMmSIevbsmW+fmjVrKjAwUEeOHHFov3Dhgo4fP67AwMBclwsMDFRGRoZSUlIcroolJyfnWGbr1q1q1aqV+vbtq5deeumydbu6usrV1fWy/QAAAAAgNyUaxPz9/eXv73/ZflFRUUpJSVFiYqIiIiIkSUuWLFFWVpaaNGmS6zIREREqW7as4uPj1alTJ0nSjh07tH//fkVFRdn7bdmyRS1btlSPHj30+uuvF8FeAQAAAED+rotREyWpbdu2Sk5O1rRp03T+/Hn16tVLjRo10
syZMyVJBw8eVKtWrfTZZ58pMjJSkvTUU09pwYIFmj59ury9vTVw4EBJ/zwLJv1zO2LLli0VExOj8ePH27dVpkyZAgXEbIyaCAAAAEAqeDYo0StihfHFF19owIABatWqlZycnNSpUye9++679vnnz5/Xjh07dObMGXvbhAkT7H3T09MVExOjqVOn2ufPmTNHR48e1eeff67PP//c3l69enXt3bvXkv0CAAAAcOO5bq6IXcu4IgYAAABAKmXfIwYAAAAApQlBDAAAAAAsRhADAAAAAIsRxAAAAADAYgQxAAAAALAYQQwAAAAALEYQAwAAAACLEcQAAAAAwGIEMQAAAACwGEEMAAAAACxGEAMAAAAAixHEAAAAAMBiBDEAAAAAsBhBDAAAAAAsRhADAAAAAIsRxAAAAADAYgQxAAAAALAYQQwAAAAALEYQAwAAAACLEcQAAAAAwGIEMQAAAACwGEEMAAAAACxGEAMAAAAAixHEAAAAAMBiBDEAAAAAsBhBDAAAAAAsRhADAAAAAIsRxAAAAADAYgQxAAAAALAYQQwAAAAALEYQAwAAAACLEcQAAAAAwGIEMQAAAACwGEEMAAAAACxGEAMAAAAAixHEAAAAAMBiBDEAAAAAsBhBDAAAAAAsRhADAAAAAIsRxAAAAADAYgQxAAAAALAYQQwAAAAALEYQAwAAAACLEcQAAAAAwGIEMQAAAACwGEEMAAAAACxGEAMAAAAAixHEAAAAAMBiBDEAAAAAsBhBDAAAAAAsRhADAAAAAIsRxAAAAADAYgQxAAAAALAYQQwAAAAALEYQAwAAAACLEcQAAAAAwGIEMQAAAACwGEEMAAAAACxGEAMAAAAAi103Qez48ePq1q2bvL295evrq969e+vUqVP5LnPu3Dn1799ffn5+8vT0VKdOnZScnJxr37///ltVq1aVzWZTSkpKMewBAAAAAPzjugli3bp105YtW7R48WLNnz9fy5cvV9++ffNd5plnntF3332nuLg4/fzzzzp06JA6duyYa9/evXvrlltuKY7SAQAAAMCBzRhjSrqIy9m2bZvq1q2rtWvXqlGjRpKkhQsX6t5779Vff/2loKCgHMukpqbK399fM2fOVOfOnSVJ27dvV1hYmBISEnT77bfb+77//vuaNWuWRowYoVatWunEiRPy9fUtcH1paWny8fFRamqqvL29r25nAQAAAFy3CpoNrosrYgkJCfL19bWHMEmKjo6Wk5OT1qxZk+syiYmJOn/+vKKjo+1toaGhqlatmhISEuxtW7du1SuvvKLPPvtMTk4FezvS09OVlpbmMAEAAABAQV0XQSwpKUmVKlVyaHN2dlaFChWUlJSU5zIuLi45rmwFBATYl0lPT1dsbKzGjx+vatWqFbieMWPGyMfHxz4FBwcXbocAAAAA3NBKNIgNGzZMNpst32n79u3Ftv3hw4crLCxMjz76aKGXS01NtU8HDhwopgoBAAAAlEbOJbnxIUOGqGfPnvn2qVmzpgIDA3XkyBGH9gsXLuj48eMKDAzMdbnAwEBlZGQoJSXF4apYcnKyfZklS5Zo06ZNmjNnjiQp+3G5ihUr6sUXX9To0aNzXberq6tcXV0LsosAAAAAkEOJBjF/f3/5+/tftl9UVJRSUlKUmJioiIgISf+EqKysLDVp0iTXZSIiIlS2bFnFx8erU6dOkqQdO3Zo//79ioqKkiTNnTtXZ8+etS+zdu1a/etf/9KKFStUq1atq909AAAAAMhViQaxggoLC1ObNm3Up08fTZs2TefPn9eAAQP08MMP20dMPHjwoFq1aqXPPvtMkZGR8vHxUe/evfXss8+qQoUK8vb21sCBAxUVFWUfMfHSsHXs2DH79gozaiIAAAAAFMZ1EcQk6YsvvtCAAQPUqlUrOTk5qVOnTnr33Xft88+fP68dO3bozJkz9rYJEybY+6anpysmJkZTp04tifIBAAAAwO66+B6xax3fIwYAAABAKmXfIwYAAAAAp
QlBDAAAAAAsRhADAAAAAIsRxAAAAADAYgQxAAAAALAYQQwAAAAALEYQAwAAAACLEcQAAAAAwGIEMQAAAACwGEEMAAAAACxGEAMAAAAAixHEAAAAAMBiBDEAAAAAsBhBDAAAAAAsRhADAAAAAIsRxAAAAADAYgQxAAAAALAYQQwAAAAALEYQAwAAAACLEcQAAAAAwGIEMQAAAACwGEEMAAAAACxGEAMAAAAAixHEAAAAAMBiBDEAAAAAsBhBDAAAAAAsRhADAAAAAIsRxAAAAADAYgQxAAAAALAYQQwAAAAALEYQAwAAAACLEcQAAAAAwGIEMQAAAACwGEEMAAAAACxGEAMAAAAAixHEAAAAAMBiBDEAAAAAsBhBDAAAAAAsRhADAAAAAIsRxAAAAADAYgQxAAAAALAYQQwAAAAALEYQAwAAAACLEcQAAAAAwGIEMQAAAACwmHNJF1AaGGMkSWlpaSVcCQAAAICSlJ0JsjNCXghiReDkyZOSpODg4BKuBAAAAMC14OTJk/Lx8clzvs1cLqrhsrKysnTo0CF5eXnJZrOVdDnIRVpamoKDg3XgwAF5e3uXdDm4DnDOoLA4Z1BYnDMoLM6Z64MxRidPnlRQUJCcnPJ+EowrYkXAyclJVatWLekyUADe3t784EKhcM6gsDhnUFicMygszplrX35XwrIxWAcAAAAAWIwgBgAAAAAWI4jhhuDq6qqRI0fK1dW1pEvBdYJzBoXFOYPC4pxBYXHOlC4M1gEAAAAAFuOKGAAAAABYjCAGAAAAABYjiAEAAACAxQhiAAAAAGAxghhKjePHj6tbt27y9vaWr6+vevfurVOnTuW7zLlz59S/f3/5+fnJ09NTnTp1UnJycq59//77b1WtWlU2m00pKSnFsAewUnGcLxs3blRsbKyCg4Pl7u6usLAwTZo0qbh3BcVoypQpCgkJkZubm5o0aaJff/013/5xcXEKDQ2Vm5ub6tevrwULFjjMN8ZoxIgRqly5stzd3RUdHa2dO3cW5y7AQkV5vpw/f14vvPCC6tevLw8PDwUFBal79+46dOhQce8GLFTUP2Mu9uSTT8pms2nixIlFXDWKjAFKiTZt2pgGDRqYX375xaxYscLUrl3bxMbG5rvMk08+aYKDg018fLxZt26duf32203Tpk1z7du+fXvTtm1bI8mcOHGiGPYAViqO8+U///mPGTRokFm2bJnZvXu3mTFjhnF3dzfvvfdece8OisFXX31lXFxczH//+1+zZcsW06dPH+Pr62uSk5Nz7b9q1SpTpkwZM27cOLN161bz0ksvmbJly5pNmzbZ+7z55pvGx8fHzJs3z2zcuNE88MADpkaNGubs2bNW7RaKSVGfLykpKSY6OtrMmjXLbN++3SQkJJjIyEgTERFh5W6hGBXHz5hsX3/9tWnQoIEJCgoyEyZMKOY9wZUiiKFU2Lp1q5Fk1q5da2/74YcfjM1mMwcPHsx1mZSUFFO2bFkTFxdnb9u2bZuRZBISEhz6Tp061TRv3tzEx8cTxEqB4j5fLtavXz9z9913F13xsExkZKTp37+//XVmZqYJCgoyY8aMybV/ly5dTLt27RzamjRpYp544gljjDFZWVkmMDDQjB8/3j4/JSXFuLq6mi+//LIY9gBWKurzJTe//vqrkWT27dtXNEWjRBXXOfPXX3+ZKlWqmM2bN5vq1asTxK5h3JqIUiEhIUG+vr5q1KiRvS06OlpOTk5as2ZNrsskJibq/Pnzio6OtreFhoaqWrVqSkhIsLdt3bpVr7zyij777DM5OfGRKQ2K83y5VGpqqipUqFB0xcMSGRkZSkxMdDjeTk5Oio6OzvN4JyQkOPSXpJiYGHv/PXv2KCkpyaGPj4+PmjRpku85hGtfcZwvuUlNTZXNZpOvr2+R1I2SU1znTFZWlh577DENHTpU4eHhxVM8igy/VaJUSEpKUqVKlRzanJ2dVaFCBSUlJeW5jIuLS47/oQUEBNiXSU9PV2xsr
MaPH69q1aoVS+2wXnGdL5davXq1Zs2apb59+xZJ3bDOsWPHlJmZqYCAAIf2/I53UlJSvv2z/y3MOnF9KI7z5VLnzp3TCy+8oNjYWHl7exdN4SgxxXXOjB07Vs7Ozho0aFDRF40iRxDDNW3YsGGy2Wz5Ttu3by+27Q8fPlxhYWF69NFHi20bKDolfb5cbPPmzWrfvr1Gjhyp1q1bW7JNAKXT+fPn1aVLFxlj9P7775d0ObhGJSYmatKkSZo+fbpsNltJl4MCcC7pAoD8DBkyRD179sy3T82aNRUYGKgjR444tF+4cEHHjx9XYGBgrssFBgYqIyNDKSkpDlc5kpOT7cssWbJEmzZt0pw5cyT9M+KZJFWsWFEvvviiRo8efYV7huJQ0udLtq1bt6pVq1bq27evXnrppSvaF5SsihUrqkyZMjlGUc3teGcLDAzMt3/2v8nJyapcubJDn4YNGxZh9bBacZwv2bJD2L59+7RkyRKuhpUSxXHOrFixQkeOHHG4gyczM1NDhgzRxIkTtXfv3qLdCVw1rojhmubv76/Q0NB8JxcXF0VFRSklJUWJiYn2ZZcsWaKsrCw1adIk13VHRESobNmyio+Pt7ft2LFD+/fvV1RUlCRp7ty52rhxo3777Tf99ttv+vjjjyX988Ouf//+xbjnuBIlfb5I0pYtW3T33XerR48eev3114tvZ1GsXFxcFBER4XC8s7KyFB8f73C8LxYVFeXQX5IWL15s71+jRg0FBgY69ElLS9OaNWvyXCeuD8Vxvkj/F8J27typn376SX5+fsWzA7BccZwzjz32mH7//Xf77yy//fabgoKCNHToUP3444/FtzO4ciU9WghQVNq0aWNuvfVWs2bNGrNy5UpTp04dh+HI//rrL3PzzTebNWvW2NuefPJJU61aNbNkyRKzbt06ExUVZaKiovLcxtKlSxk1sZQojvNl06ZNxt/f3zz66KPm8OHD9unIkSOW7huKxldffWVcXV3N9OnTzdatW03fvn2Nr6+vSUpKMsYY89hjj5lhw4bZ+69atco4Ozubt956y2zbts2MHDky1+HrfX19zbfffmt+//130759e4avLyWK+nzJyMgwDzzwgKlatar57bffHH6mpKenl8g+omgVx8+YSzFq4rWNIIZS4++//zaxsbHG09PTeHt7m169epmTJ0/a5+/Zs8dIMkuXLrW3nT171vTr18+UL1/elCtXzjz44IPm8OHDeW6DIFZ6FMf5MnLkSCMpx1S9enUL9wxF6b333jPVqlUzLi4uJjIy0vzyyy/2ec2bNzc9evRw6D979mxz0003GRcXFxMeHm6+//57h/lZWVnm5ZdfNgEBAcbV1dW0atXK7Nixw4pdgQWK8nzJ/hmU23TxzyVc34r6Z8ylCGLXNpsx//+hFwAAAACAJXhGDAAAAAAsRhADAAAAAIsRxAAAAADAYgQxAAAAALAYQQwAAAAALEYQAwAAAACLEcQAAAAAwGIEMQAAAACwGEEMAHDdCwkJ0cSJEwvcf9myZbLZbEpJSSm2mq4VLVq00NNPP13SZQAALmEzxpiSLgIAcGOw2Wz5zh85cqRGjRpV6PUePXpUHh4eKleuXIH6Z2Rk6Pjx4woICLhsTSWpRYsWatiwYaFC5qWOHz+usmXLysvLq+gKAwBcNeeSLgAAcOM4fPiw/b9nzZqlESNGaMeOHfY2T09P+38bY5SZmSln58v/r8rf379Qdbi4uCgwMLBQy1yvKlSoUNIlAABywa2JAADLBAYG2icfHx/ZbDb76+3bt8vLy0s//PCDIiIi5OrqqpUrV2r37t1q3769AgIC5OnpqcaNG+unn35yWO+ltybabDZ9/PHHevDBB1WuXDnVqVNH//vf/+zzL701cfr06fL19dWPP/6osLAweXp6qk2bNg7B8cKFCxo0aJB8fX3l5+enF154QT169FCHDh3y3N99+/bp/vvvV/ny5eXh4aHw8HAtWLDAPn/z5s1q27atP
D09FRAQoMcee0zHjh2TJPXs2VM///yzJk2aJJvNJpvNpr179+a6nalTp6pOnTpyc3NTQECAOnfubJ938a2J2ft96dSzZ097/2+//Va33Xab3NzcVLNmTY0ePVoXLlzIcx8BAFeGIAYAuKYMGzZMb775prZt26ZbbrlFp06d0r333qv4+Hht2LBBbdq00f3336/9+/fnu57Ro0erS5cu+v3333XvvfeqW7duOn78eJ79z5w5o7feekszZszQ8uXLtX//fj333HP2+WPHjtUXX3yhTz75RKtWrVJaWprmzZuXbw39+/dXenq6li9frk2bNmns2LH2q34pKSlq2bKlbr31Vq1bt04LFy5UcnKyunTpIkmaNGmSoqKi1KdPHx0+fFiHDx9WcHBwjm2sW7dOgwYN0iuvvKIdO3Zo4cKFatasWa71NG3a1L6uw4cPa8mSJXJzc7P3X7Fihbp3767Bgwdr69at+uCDDzR9+nS9/vrr+e4nAOAKGAAASsAnn3xifHx87K+XLl1qJJl58+Zddtnw8HDz3nvv2V9Xr17dTJgwwf5aknnppZfsr0+dOmUkmR9++MFhWydOnLDXIsns2rXLvsyUKVNMQECA/XVAQIAZP368/fWFCxdMtWrVTPv27fOss379+mbUqFG5znv11VdN69atHdoOHDhgJJkdO3YYY4xp3ry5GTx4cJ7rN8aYuXPnGm9vb5OWlpbr/LzWcezYMVOzZk3Tr18/e1urVq3MG2+84dBvxowZpnLlyvnWAAAoPJ4RAwBcUxo1auTw+tSpUxo1apS+//57HT58WBcuXNDZs2cve0Xslltusf+3h4eHvL29deTIkTz7lytXTrVq1bK/rly5sr1/amqqkpOTFRkZaZ9fpkwZRUREKCsrK891Dho0SE899ZQWLVqk6OhoderUyV7Xxo0btXTpUofn4rLt3r1bN910U777l+2ee+5R9erVVbNmTbVp00Zt2rSx35KZl/Pnz6tTp06qXr26Jk2aZG/fuHGjVq1a5XAFLDMzU+fOndOZM2cKPBgKAODyuDURAHBN8fDwcHj93HPP6ZtvvtEbb7yhFStW6LffflP9+vWVkZGR73rKli3r8Npms+UbmnLrb65yYOHHH39cf/75px577DFt2rRJjRo10nvvvSfpn4B5//3367fffnOYdu7cmeethbnx8vLS+vXr9eWXX6py5coaMWKEGjRokO/Q/E899ZQOHDiguLg4h8FQTp06pdGjRzvUs2nTJu3cuVNubm5X/D4AAHIiiAEArmmrVq1Sz5499eCDD6p+/foKDAzMc9CK4uLj46OAgACtXbvW3paZman169dfdtng4GA9+eST+vrrrzVkyBB99NFHkqTbbrtNW7ZsUUhIiGrXru0wZYdRFxcXZWZmXnYbzs7Oio6O1rhx4/T7779r7969WrJkSa5933nnHc2ePVvffvut/Pz8HObddttt2rFjR456ateuLScnfmUAgKLErYkAgGtanTp19PXXX+v++++XzWbTyy+/nO+VreIycOBAjRkzRrVr11ZoaKjee+89nThxIt/vIXv66afVtm1b3XTTTTpx4oSWLl2qsLAwSf8M5PHRRx8pNjZWzz//vCpUqKBdu3bpq6++0scff6wyZcooJCREa9as0d69e+Xp6akKFSrkCETz58/Xn3/+qWbNmql8+fJasGCBsrKydPPNN+eo56efftLzzz+vKVOmqGLFikpKSpIkubu7y8fHRyNGjNB9992natWqqXPnznJyctLGjRu1efNmvfbaa0X4bgIA+PMWAOCa9s4776h8+fJq2rSp7r//fsXExOi2226zvI4XXnhBsbGx6t69u6KiouTp6amYmJh8b9nLzMxU//79FRYWpjZt2uimm27S1KlTJUlBQUFatWqVMjMz1bp1a9WvX19PP/20fH197WHrueeeU5kyZVS3bl35+/vn+lycr6+vvv76a7Vs2VJhYWGaNm2avvzyS4WHh+fou3LlS
mVmZurJJ59U5cqV7dPgwYMlSTExMZo/f74WLVqkxo0b6/bbb9eECRNUvXr1ongLAQAXsZmrvQEeAIAbUFZWlsLCwtSlSxe9+uqrJV0OAOA6w62JAAAUwL59+7Ro0SI1b95c6enpmjx5svbs2aNHHnmkpEsDAFyHuDURAIACcHJy0vTp09W4cWPdcccd2rRpk3766Sf7M18AABQGtyYCAAAAgMW4IgYAAAAAFiOIAQAAAIDFCGIAAAAAYDGCGAAAAABYjCAGAAAAABYjiAEAAACAxQhiAAAAAGAxghgAAAAAWOz/ASSXvnGzX7cAAAAAAElFTkSuQmCC\n" + }, + "metadata": {} + } + ], + "source": [ + "plt.subplots(figsize=(10, 5))\n", + "plt.errorbar(train_size, test_scores_mean, yerr=test_scores_std)\n", + "plt.xlabel('Training set size')\n", + "plt.ylabel('CV scores')\n", + "plt.title('Cross-validation score as training set size increases');" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GThy0cZ5GC6M" + }, + "source": [ + "This shows that you seem to have plenty of data. There's an initial rapid improvement in model scores as one would expect, but it's essentially levelled off by around a sample size of 40-50." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MgpNV-4iGC6M" + }, + "source": [ + "## 4.13 Save best model object from pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3h74xIAyGC6N" + }, + "outputs": [], + "source": [ + "#Code task 28#\n", + "#This may not be \"production grade ML deployment\" practice, but adding some basic\n", + "#information to your saved models can save your bacon in development.\n", + "#Just what version model have you just loaded to reuse? What version of `sklearn`\n", + "#created it? 
When did you make it?\n", + "#Assign the pandas version number (`pd.__version__`) to the `pandas_version` attribute,\n", + "#the numpy version (`np.__version__`) to the `numpy_version` attribute,\n", + "#the sklearn version (`sklearn_version`) to the `sklearn_version` attribute,\n", + "#and the current datetime (`datetime.datetime.now()`) to the `build_datetime` attribute\n", + "#Let's call this model version '1.0'\n", + "best_model = rf_grid_cv.best_estimator_\n", + "best_model.version = '1.0'\n", + "best_model.pandas_version = pd.__version__\n", + "best_model.numpy_version = np.__version__\n", + "best_model.sklearn_version = sklearn_version\n", + "best_model.X_columns = [col for col in X_train.columns]\n", + "best_model.build_datetime = datetime.datetime.now()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "yyMGrAm5GC6N", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "c9e79227-967d-4992-8e0c-3f75ac024af8" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Directory ../models was created.\n", + "Writing file. \"../models/ski_resort_pricing_model.pkl\"\n" + ] + } + ], + "source": [ + "# save the model\n", + "\n", + "modelpath = '../models'\n", + "save_file(best_model, 'ski_resort_pricing_model.pkl', modelpath)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4UUQ_abTGC6N" + }, + "source": [ + "## 4.14 Summary" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fMyhMImVGC6N" + }, + "source": [ + "**Q: 1** Write a summary of the work in this notebook. Capture the fact that you gained a baseline idea of performance by simply taking the average price and how well that did. Then highlight that you built a linear model and the features that it found. Comment on the estimate of its performance from cross-validation and whether its performance on the test split was consistent with this estimate.
Also highlight that a random forest regressor was tried, what preprocessing steps were found to be best, and again what its estimated performance via cross-validation was and whether its performance on the test set was consistent with that. State which model you have decided to use going forwards and why. This summary should provide a quick overview for someone wanting to know quickly why the given model was chosen for the next part of the business problem to help guide important business decisions." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bt8Qg_aGGC6N" + }, + "source": [ + "**A: 1** A baseline idea of performance was gained by simply predicting the average ticket price; that prediction was found to be within about 19 dollars of the real ticket price. To get closer to the real ticket price, a linear regression model was built. That model explains over 80% of the variance on the train set and over 70% on the test set. Using this model, on average, you'd expect to estimate a ticket price within approximately 9 dollars of the real price. As expected, its performance on the test split did not hold up consistently against the cross-validation estimate. The next model tried was a random forest regressor. Its cross-validation estimate of the error relative to the real price is lower still, by almost 1 dollar, and its performance on the test set was consistent with that estimate. Based on these results, I have chosen to use the random forest model going forwards. This decision rests on the consistency of the model's results and on the ability to apply its estimates to other areas of the data for additional proactive solutions or predictions for conflict resolution."
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.9" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": true + }, + "varInspector": { + "cols": { + "lenName": 16, + "lenType": 16, + "lenVar": 40 + }, + "kernels_config": { + "python": { + "delete_cmd_postfix": "", + "delete_cmd_prefix": "del ", + "library": "var_list.py", + "varRefreshCmd": "print(var_dic_list())" + }, + "r": { + "delete_cmd_postfix": ") ", + "delete_cmd_prefix": "rm(", + "library": "var_list.r", + "varRefreshCmd": "cat(var_dic_list()) " + } + }, + "types_to_exclude": [ + "module", + "function", + "builtin_function_or_method", + "instance", + "_Feature" + ], + "window_display": false + }, + "colab": { + "provenance": [], + "include_colab_link": true + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file