Predict Bike Sharing Demand with AutoGluon Template¶
Project: Predict Bike Sharing Demand with AutoGluon¶
This notebook is a template with each step that you need to complete for the project.
Please fill in your code where there are explicit ? markers in the notebook. You are welcome to add more cells and code as you see fit.
Once you have completed all the code implementations, please export your notebook as an HTML file so the reviewers can view your code. Make sure all outputs are rendered correctly.
File -> Export Notebook As... -> Export Notebook as HTML
There is a writeup to complete as well after all code implementation is done. Please answer all questions and attach the necessary tables and charts. You can complete the writeup in either markdown or PDF.
Completing the code template and writeup template will cover all of the rubric points for this project.
The rubric contains "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. The stand out suggestions are optional. If you decide to pursue the "Stand Out Suggestions", you can include the code in this notebook and also discuss the results in the writeup file.
Step 1: Create an account with Kaggle¶
Create Kaggle Account and download API key¶
Below is an example of the steps to get the API username and key. Each student will have their own username and key.

- Open account settings.
- Scroll down to API and click Create New API Token.
- Open up kaggle.json and use the username and key, as sketched below.
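If you would rather read the values out of the downloaded token than copy-paste them, a minimal sketch follows; the Downloads path is only an assumption about where your browser saved kaggle.json, so adjust it as needed.

import json
from pathlib import Path

# Hypothetical location of the downloaded token; change to wherever kaggle.json was saved
token_path = Path.home() / "Downloads" / "kaggle.json"
with open(token_path) as f:
    token = json.load(f)
print(token["username"])  # the username and key are used later in this notebook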
Step 2: Download the Kaggle dataset using the kaggle python library¶
Open up Sagemaker Studio and use starter template¶

- Notebook should be using an ml.t3.medium instance (2 vCPU + 4 GiB)
- Notebook should be using the kernel: Python 3 (MXNet 1.8 Python 3.7 CPU Optimized)
Install packages¶
# !pip install -U pip
# !pip install -U setuptools wheel
# !pip install -U "mxnet<2.0.0" bokeh==2.0.1
# !pip install autogluon --no-cache-dir
# # Without --no-cache-dir, smaller aws instances may have trouble installing
Setup Kaggle API Key¶
# # create the .kaggle directory and an empty kaggle.json file
# ! sudo mkdir -p /root/.kaggle
# ! sudo touch /root/.kaggle/kaggle.json
# ! sudo chmod 600 /root/.kaggle/kaggle.json

# ! pip install kaggle
Collecting kaggle
  Downloading kaggle-1.6.17.tar.gz (82 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: six>=1.10 in /opt/conda/lib/python3.10/site-packages (from kaggle) (1.16.0)
Requirement already satisfied: certifi>=2023.7.22 in /opt/conda/lib/python3.10/site-packages (from kaggle) (2024.6.2)
Requirement already satisfied: python-dateutil in /opt/conda/lib/python3.10/site-packages (from kaggle) (2.9.0)
Requirement already satisfied: requests in /opt/conda/lib/python3.10/site-packages (from kaggle) (2.32.3)
Requirement already satisfied: tqdm in /opt/conda/lib/python3.10/site-packages (from kaggle) (4.66.4)
Requirement already satisfied: python-slugify in /opt/conda/lib/python3.10/site-packages (from kaggle) (8.0.4)
Requirement already satisfied: urllib3 in /opt/conda/lib/python3.10/site-packages (from kaggle) (1.26.19)
Requirement already satisfied: bleach in /opt/conda/lib/python3.10/site-packages (from kaggle) (6.1.0)
Requirement already satisfied: webencodings in /opt/conda/lib/python3.10/site-packages (from bleach->kaggle) (0.5.1)
Requirement already satisfied: text-unidecode>=1.3 in /opt/conda/lib/python3.10/site-packages (from python-slugify->kaggle) (1.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.10/site-packages (from requests->kaggle) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests->kaggle) (3.7)
Building wheels for collected packages: kaggle
  Building wheel for kaggle (setup.py) ... done
  Created wheel for kaggle: filename=kaggle-1.6.17-py3-none-any.whl size=105786 sha256=ba1b6c14267e08a4cc0bc9040691da13e3524a488744e868c75fc27ee1e38a05
  Stored in directory: /home/sagemaker-user/.cache/pip/wheels/9f/af/22/bf406f913dc7506a485e60dce8143741abd0a92a19337d83a3
Successfully built kaggle
Installing collected packages: kaggle
Successfully installed kaggle-1.6.17
# # Fill in your user name and key from creating the kaggle account and API token file
# import json
# import os
# kaggle_username = "satyamchatrola"
# kaggle_key = "vhvvv"

# # Save the API token to the kaggle.json file
# # Note: open() does not expand "~", so expand the path explicitly
# with open(os.path.expanduser("~/.kaggle/kaggle.json"), "w") as f:
#     f.write(json.dumps({"username": kaggle_username, "key": kaggle_key}))
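As an optional sanity check that the API key was written correctly, the Kaggle CLI can search for the competition; if the credentials are wrong this fails immediately rather than at download time.

# Optional: confirm the Kaggle credentials work before downloading anything
!kaggle competitions list -s bike-sharing-demand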
Download and explore dataset¶
Go to the bike sharing demand competition and agree to the terms¶
# ! pwd
/home/sagemaker-user/cd0385-project-starter/project
# Download the dataset; it will be in a .zip file, so you'll need to unzip it as well.
!kaggle competitions download -c bike-sharing-demand
# If you already downloaded it, the -o flag overwrites the existing files when unzipping
!unzip -o bike-sharing-demand.zip
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/satyam/.kaggle/kaggle.json'
bike-sharing-demand.zip: Skipping, found more recently modified local copy (use --force to force download)
Archive:  bike-sharing-demand.zip
  inflating: sampleSubmission.csv
  inflating: test.csv
  inflating: train.csv
import pandas as pd
from autogluon.tabular import TabularPredictor

# Create the train dataset in pandas by reading the csv
# Set the parsing of the datetime column so you can use some of the `dt` features in pandas later
train = pd.read_csv("./train.csv", parse_dates=["datetime"])
train.head()
|   | datetime            | season | holiday | workingday | weather | temp | atemp  | humidity | windspeed | casual | registered | count |
|---|---------------------|--------|---------|------------|---------|------|--------|----------|-----------|--------|------------|-------|
| 0 | 2011-01-01 00:00:00 | 1      | 0       | 0          | 1       | 9.84 | 14.395 | 81       | 0.0       | 3      | 13         | 16    |
| 1 | 2011-01-01 01:00:00 | 1      | 0       | 0          | 1       | 9.02 | 13.635 | 80       | 0.0       | 8      | 32         | 40    |
| 2 | 2011-01-01 02:00:00 | 1      | 0       | 0          | 1       | 9.02 | 13.635 | 80       | 0.0       | 5      | 27         | 32    |
| 3 | 2011-01-01 03:00:00 | 1      | 0       | 0          | 1       | 9.84 | 14.395 | 75       | 0.0       | 3      | 10         | 13    |
| 4 | 2011-01-01 04:00:00 | 1      | 0       | 0          | 1       | 9.84 | 14.395 | 75       | 0.0       | 0      | 1          | 1     |
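A quick optional check (not part of the template), assuming the parse_dates argument above was used: the dt accessor should now work, which is what the later feature-engineering step relies on.

# Optional: confirm the datetime column parsed as datetime64 and that `dt` works
print(train["datetime"].dtype)
train["datetime"].dt.hour.head()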
# Simple output of the train dataset to view some of the min/max/variation of the dataset features.
train.describe()
|       | season       | holiday      | workingday   | weather      | temp        | atemp        | humidity     | windspeed    | casual       | registered   | count        |
|-------|--------------|--------------|--------------|--------------|-------------|--------------|--------------|--------------|--------------|--------------|--------------|
| count | 10886.000000 | 10886.000000 | 10886.000000 | 10886.000000 | 10886.00000 | 10886.000000 | 10886.000000 | 10886.000000 | 10886.000000 | 10886.000000 | 10886.000000 |
| mean  | 2.506614     | 0.028569     | 0.680875     | 1.418427     | 20.23086    | 23.655084    | 61.886460    | 12.799395    | 36.021955    | 155.552177   | 191.574132   |
| std   | 1.116174     | 0.166599     | 0.466159     | 0.633839     | 7.79159     | 8.474601     | 19.245033    | 8.164537     | 49.960477    | 151.039033   | 181.144454   |
| min   | 1.000000     | 0.000000     | 0.000000     | 1.000000     | 0.82000     | 0.760000     | 0.000000     | 0.000000     | 0.000000     | 0.000000     | 1.000000     |
| 25%   | 2.000000     | 0.000000     | 0.000000     | 1.000000     | 13.94000    | 16.665000    | 47.000000    | 7.001500     | 4.000000     | 36.000000    | 42.000000    |
| 50%   | 3.000000     | 0.000000     | 1.000000     | 1.000000     | 20.50000    | 24.240000    | 62.000000    | 12.998000    | 17.000000    | 118.000000   | 145.000000   |
| 75%   | 4.000000     | 0.000000     | 1.000000     | 2.000000     | 26.24000    | 31.060000    | 77.000000    | 16.997900    | 49.000000    | 222.000000   | 284.000000   |
| max   | 4.000000     | 1.000000     | 1.000000     | 4.000000     | 41.00000    | 45.455000    | 100.000000   | 56.996900    | 367.000000   | 886.000000   | 977.000000   |
# Create the test pandas dataframe in pandas by reading the csv, remember to parse the datetime!
test = pd.read_csv("./test.csv", parse_dates=["datetime"])
test.head()
|   | datetime            | season | holiday | workingday | weather | temp  | atemp  | humidity | windspeed |
|---|---------------------|--------|---------|------------|---------|-------|--------|----------|-----------|
| 0 | 2011-01-20 00:00:00 | 1      | 0       | 1          | 1       | 10.66 | 11.365 | 56       | 26.0027   |
| 1 | 2011-01-20 01:00:00 | 1      | 0       | 1          | 1       | 10.66 | 13.635 | 56       | 0.0000    |
| 2 | 2011-01-20 02:00:00 | 1      | 0       | 1          | 1       | 10.66 | 13.635 | 56       | 0.0000    |
| 3 | 2011-01-20 03:00:00 | 1      | 0       | 1          | 1       | 10.66 | 12.880 | 56       | 11.0014   |
| 4 | 2011-01-20 04:00:00 | 1      | 0       | 1          | 1       | 10.66 | 12.880 | 56       | 11.0014   |
# Read the sample submission file the same way as the train and test datasets
submission = pd.read_csv("./sampleSubmission.csv")
submission.head()
|   | datetime            | count |
|---|---------------------|-------|
| 0 | 2011-01-20 00:00:00 | 0     |
| 1 | 2011-01-20 01:00:00 | 0     |
| 2 | 2011-01-20 02:00:00 | 0     |
| 3 | 2011-01-20 03:00:00 | 0     |
| 4 | 2011-01-20 04:00:00 | 0     |
Step 3: Train a model using AutoGluon’s Tabular Prediction¶
Requirements:

- We are predicting count, so it is the label we are setting.
- Ignore the casual and registered columns, as they are not present in the test dataset.
- Use root_mean_squared_error as the evaluation metric.
- Set a time limit of 10 minutes (600 seconds).
- Use the preset best_quality to focus on creating the best model.
remove_columns_list = ['casual', 'registered']
col_names = [x for x in list(train.columns) if x not in remove_columns_list]
col_names
['datetime',
 'season',
 'holiday',
 'workingday',
 'weather',
 'temp',
 'atemp',
 'humidity',
 'windspeed',
 'count']
predictor = TabularPredictor(label="count", eval_metric="root_mean_squared_error").fit(
    train_data=train[col_names], time_limit=600, presets="best_quality"
)
No path specified. Models will be saved in: "AutogluonModels/ag-20240814_024634" +Verbosity: 2 (Standard Logging) +=================== System Info =================== +AutoGluon Version: 1.1.1 +Python Version: 3.9.19 +Operating System: Linux +Platform Machine: x86_64 +Platform Version: #1 SMP Wed Mar 2 00:30:59 UTC 2022 +CPU Count: 24 +Memory Avail: 12.37 GB / 14.49 GB (85.4%) +Disk Space Avail: 925.12 GB / 1006.85 GB (91.9%) +=================================================== +Presets specified: ['best_quality'] +Setting dynamic_stacking from 'auto' to True. Reason: Enable dynamic_stacking when use_bag_holdout is disabled. (use_bag_holdout=False) +Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=1 +DyStack is enabled (dynamic_stacking=True). AutoGluon will try to determine whether the input data is affected by stacked overfitting and enable or disable stacking as a consequence. + This is used to identify the optimal `num_stack_levels` value. Copies of AutoGluon will be fit on subsets of the data. Then holdout validation data is used to detect stacked overfitting. + Running DyStack for up to 150s of the 600s of remaining time (25%). + Running DyStack sub-fit in a ray process to avoid memory leakage. Enabling ray logging (enable_ray_logging=True). Specify `ds_args={'enable_ray_logging': False}` if you experience logging issues. +2024-08-13 22:46:39,915 INFO worker.py:1743 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 + Context path: "AutogluonModels/ag-20240814_024634/ds_sub_fit/sub_fit_ho" +(_dystack pid=19284) Running DyStack sub-fit ... +(_dystack pid=19284) Beginning AutoGluon training ... Time limit = 144s +(_dystack pid=19284) AutoGluon will save models to "AutogluonModels/ag-20240814_024634/ds_sub_fit/sub_fit_ho" +(_dystack pid=19284) Train Data Rows: 9676 +(_dystack pid=19284) Train Data Columns: 9 +(_dystack pid=19284) Label Column: count +(_dystack pid=19284) Problem Type: regression +(_dystack pid=19284) Preprocessing data ... +(_dystack pid=19284) Using Feature Generators to preprocess the data ... +(_dystack pid=19284) Fitting AutoMLPipelineFeatureGenerator... +(_dystack pid=19284) Available Memory: 10945.85 MB +(_dystack pid=19284) Train Data (Original) Memory Usage: 1.29 MB (0.0% of available memory) +(_dystack pid=19284) Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features. +(_dystack pid=19284) Stage 1 Generators: +(_dystack pid=19284) Fitting AsTypeFeatureGenerator... +(_dystack pid=19284) Note: Converting 2 features to boolean dtype as they only contain 2 unique values. +(_dystack pid=19284) Stage 2 Generators: +(_dystack pid=19284) Fitting FillNaFeatureGenerator... +(_dystack pid=19284) Stage 3 Generators: +(_dystack pid=19284) Fitting IdentityFeatureGenerator... +(_dystack pid=19284) Fitting DatetimeFeatureGenerator... +(_dystack pid=19284) Stage 4 Generators: +(_dystack pid=19284) Fitting DropUniqueFeatureGenerator... +(_dystack pid=19284) Stage 5 Generators: +(_dystack pid=19284) Fitting DropDuplicatesFeatureGenerator... 
+(_dystack pid=19284) Types of features in original data (raw dtype, special dtypes): +(_dystack pid=19284) ('float', []) : 3 | ['temp', 'atemp', 'windspeed'] +(_dystack pid=19284) ('int', []) : 5 | ['season', 'holiday', 'workingday', 'weather', 'humidity'] +(_dystack pid=19284) ('object', ['datetime_as_object']) : 1 | ['datetime'] +(_dystack pid=19284) Types of features in processed data (raw dtype, special dtypes): +(_dystack pid=19284) ('float', []) : 3 | ['temp', 'atemp', 'windspeed'] +(_dystack pid=19284) ('int', []) : 3 | ['season', 'weather', 'humidity'] +(_dystack pid=19284) ('int', ['bool']) : 2 | ['holiday', 'workingday'] +(_dystack pid=19284) ('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek'] +(_dystack pid=19284) 0.1s = Fit runtime +(_dystack pid=19284) 9 features in original data used to generate 13 features in processed data. +(_dystack pid=19284) Train Data (Processed) Memory Usage: 0.83 MB (0.0% of available memory) +(_dystack pid=19284) Data preprocessing and feature engineering runtime = 0.07s ... +(_dystack pid=19284) AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error' +(_dystack pid=19284) This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value. +(_dystack pid=19284) To change this, specify the eval_metric parameter of Predictor() +(_dystack pid=19284) Large model count detected (112 configs) ... Only displaying the first 3 models of each family. To see all, set `verbosity=3`. +(_dystack pid=19284) User-specified model hyperparameters to be fit: +(_dystack pid=19284) { +(_dystack pid=19284) 'NN_TORCH': [{}, {'activation': 'elu', 'dropout_prob': 0.10077639529843717, 'hidden_size': 108, 'learning_rate': 0.002735937344002146, 'num_layers': 4, 'use_batchnorm': True, 'weight_decay': 1.356433327634438e-12, 'ag_args': {'name_suffix': '_r79', 'priority': -2}}, {'activation': 'elu', 'dropout_prob': 0.11897478034205347, 'hidden_size': 213, 'learning_rate': 0.0010474382260641949, 'num_layers': 4, 'use_batchnorm': False, 'weight_decay': 5.594471067786272e-10, 'ag_args': {'name_suffix': '_r22', 'priority': -7}}], +(_dystack pid=19284) 'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'], +(_dystack pid=19284) 'CAT': [{}, {'depth': 6, 'grow_policy': 'SymmetricTree', 'l2_leaf_reg': 2.1542798306067823, 'learning_rate': 0.06864209415792857, 'max_ctr_complexity': 4, 'one_hot_max_size': 10, 'ag_args': {'name_suffix': '_r177', 'priority': -1}}, {'depth': 8, 'grow_policy': 'Depthwise', 'l2_leaf_reg': 2.7997999596449104, 'learning_rate': 0.031375015734637225, 'max_ctr_complexity': 2, 'one_hot_max_size': 3, 'ag_args': {'name_suffix': '_r9', 'priority': -5}}], +(_dystack pid=19284) 'XGB': [{}, {'colsample_bytree': 0.6917311125174739, 'enable_categorical': False, 'learning_rate': 0.018063876087523967, 'max_depth': 10, 'min_child_weight': 0.6028633586934382, 'ag_args': {'name_suffix': '_r33', 'priority': -8}}, {'colsample_bytree': 0.6628423832084077, 'enable_categorical': False, 'learning_rate': 0.08775715546881824, 'max_depth': 5, 'min_child_weight': 0.6294123374222513, 'ag_args': {'name_suffix': '_r89', 'priority': -16}}], +(_dystack pid=19284) 'FASTAI': [{}, {'bs': 256, 'emb_drop': 0.5411770367537934, 'epochs': 43, 'layers': [800, 400], 'lr': 0.01519848858318159, 'ps': 0.23782946566604385, 'ag_args': {'name_suffix': '_r191', 'priority': -4}}, {'bs': 2048, 'emb_drop': 
0.05070411322605811, 'epochs': 29, 'layers': [200, 100], 'lr': 0.08974235041576624, 'ps': 0.10393466140748028, 'ag_args': {'name_suffix': '_r102', 'priority': -11}}], +(_dystack pid=19284) 'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}], +(_dystack pid=19284) 'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}], +(_dystack pid=19284) 'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}], +(_dystack pid=19284) } +(_dystack pid=19284) AutoGluon will fit 2 stack levels (L1 to L2) ... +(_dystack pid=19284) Fitting 108 L1 models ... +(_dystack pid=19284) Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 95.73s of the 143.62s of remaining time. +(_dystack pid=19284) -107.445 = Validation score (-root_mean_squared_error) +(_dystack pid=19284) 0.02s = Training runtime +(_dystack pid=19284) 0.04s = Validation runtime +(_dystack pid=19284) Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 95.1s of the 143.0s of remaining time. +(_dystack pid=19284) -89.9469 = Validation score (-root_mean_squared_error) +(_dystack pid=19284) 0.02s = Training runtime +(_dystack pid=19284) 0.04s = Validation runtime +(_dystack pid=19284) Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 95.03s of the 142.92s of remaining time. +(_dystack pid=19284) Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.07%) ++
(_ray_fit pid=20780) [1000] valid_set's rmse: 128.154 +(_ray_fit pid=20780) [7000] valid_set's rmse: 125.4 [repeated 29x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.) ++
(_dystack pid=19284) -131.9758 = Validation score (-root_mean_squared_error) +(_dystack pid=19284) 9.9s = Training runtime +(_dystack pid=19284) 1.86s = Validation runtime +(_dystack pid=19284) Fitting model: LightGBM_BAG_L1 ... Training model for up to 82.43s of the 130.33s of remaining time. +(_dystack pid=19284) Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.08%) ++
(_ray_fit pid=21334) [1000] valid_set's rmse: 129.285 [repeated 8x across cluster] ++
(_dystack pid=19284) -131.8496 = Validation score (-root_mean_squared_error) +(_dystack pid=19284) 3.2s = Training runtime +(_dystack pid=19284) 0.4s = Validation runtime +(_dystack pid=19284) Fitting model: RandomForestMSE_BAG_L1 ... Training model for up to 76.52s of the 124.41s of remaining time. +(_dystack pid=19284) -119.5485 = Validation score (-root_mean_squared_error) +(_dystack pid=19284) 1.83s = Training runtime +(_dystack pid=19284) 0.58s = Validation runtime +(_dystack pid=19284) Fitting model: CatBoost_BAG_L1 ... Training model for up to 73.64s of the 121.54s of remaining time. +(_dystack pid=19284) Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.10%) +(_dystack pid=19284) -131.5393 = Validation score (-root_mean_squared_error) +(_dystack pid=19284) 59.11s = Training runtime +(_dystack pid=19284) 0.04s = Validation runtime +(_dystack pid=19284) Fitting model: ExtraTreesMSE_BAG_L1 ... Training model for up to 12.44s of the 60.33s of remaining time. +(_dystack pid=19284) -126.0411 = Validation score (-root_mean_squared_error) +(_dystack pid=19284) 1.21s = Training runtime +(_dystack pid=19284) 0.6s = Validation runtime +(_dystack pid=19284) Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 9.72s of the 57.61s of remaining time. +(_dystack pid=19284) Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.09%) +(_dystack pid=19284) -141.1826 = Validation score (-root_mean_squared_error) +(_dystack pid=19284) 9.99s = Training runtime +(_dystack pid=19284) 0.23s = Validation runtime +(_dystack pid=19284) Fitting model: WeightedEnsemble_L2 ... Training model for up to 143.63s of the 45.01s of remaining time. +(_dystack pid=19284) Ensemble Weights: {'KNeighborsDist_BAG_L1': 1.0} +(_dystack pid=19284) -89.9469 = Validation score (-root_mean_squared_error) +(_dystack pid=19284) 0.03s = Training runtime +(_dystack pid=19284) 0.0s = Validation runtime +(_dystack pid=19284) Fitting 106 L2 models ... +(_dystack pid=19284) Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 44.81s of the 44.77s of remaining time. +(_dystack pid=19284) Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.11%) ++
(_ray_fit pid=23609) [1000] valid_set's rmse: 71.3602 [repeated 7x across cluster] ++
(_dystack pid=19284) -73.3631 = Validation score (-root_mean_squared_error) +(_dystack pid=19284) 4.96s = Training runtime +(_dystack pid=19284) 0.67s = Validation runtime +(_dystack pid=19284) Fitting model: LightGBM_BAG_L2 ... Training model for up to 37.71s of the 37.67s of remaining time. +(_dystack pid=19284) Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.11%) +(_dystack pid=19284) -67.5547 = Validation score (-root_mean_squared_error) +(_dystack pid=19284) 1.78s = Training runtime +(_dystack pid=19284) 0.07s = Validation runtime +(_dystack pid=19284) Fitting model: RandomForestMSE_BAG_L2 ... Training model for up to 33.92s of the 33.88s of remaining time. +(_dystack pid=19284) -66.4069 = Validation score (-root_mean_squared_error) +(_dystack pid=19284) 4.77s = Training runtime +(_dystack pid=19284) 0.6s = Validation runtime +(_dystack pid=19284) Fitting model: CatBoost_BAG_L2 ... Training model for up to 28.02s of the 27.99s of remaining time. +(_dystack pid=19284) Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.15%) +(_dystack pid=19284) -67.7881 = Validation score (-root_mean_squared_error) +(_dystack pid=19284) 10.15s = Training runtime +(_dystack pid=19284) 0.03s = Validation runtime +(_dystack pid=19284) Fitting model: ExtraTreesMSE_BAG_L2 ... Training model for up to 15.57s of the 15.53s of remaining time. +(_dystack pid=19284) -67.1308 = Validation score (-root_mean_squared_error) +(_dystack pid=19284) 1.5s = Training runtime +(_dystack pid=19284) 0.58s = Validation runtime +(_dystack pid=19284) Fitting model: NeuralNetFastAI_BAG_L2 ... Training model for up to 13.17s of the 13.13s of remaining time. +(_dystack pid=19284) Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.13%) +(_dystack pid=19284) -68.2231 = Validation score (-root_mean_squared_error) +(_dystack pid=19284) 12.84s = Training runtime +(_dystack pid=19284) 0.27s = Validation runtime +(_dystack pid=19284) Fitting model: WeightedEnsemble_L3 ... Training model for up to 143.63s of the -1.98s of remaining time. +(_dystack pid=19284) Ensemble Weights: {'RandomForestMSE_BAG_L2': 0.478, 'NeuralNetFastAI_BAG_L2': 0.304, 'ExtraTreesMSE_BAG_L2': 0.13, 'LightGBM_BAG_L2': 0.043, 'CatBoost_BAG_L2': 0.043} +(_dystack pid=19284) -65.3437 = Validation score (-root_mean_squared_error) +(_dystack pid=19284) 0.04s = Training runtime +(_dystack pid=19284) 0.0s = Validation runtime +(_dystack pid=19284) AutoGluon training complete, total runtime = 146.18s ... Best model: WeightedEnsemble_L3 | Estimated inference throughput: 376.9 rows/s (1210 batch size) +(_dystack pid=19284) TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20240814_024634/ds_sub_fit/sub_fit_ho") +(_dystack pid=19284) Deleting DyStack predictor artifacts (clean_up_fits=True) ... 
+Leaderboard on holdout data (DyStack): + model score_holdout score_val eval_metric pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order +0 WeightedEnsemble_L3 -68.316221 -65.343654 root_mean_squared_error 7.199337 5.340708 116.342392 0.003961 0.000529 0.040190 3 True 16 +1 RandomForestMSE_BAG_L2 -69.113493 -66.406872 root_mean_squared_error 6.237178 4.394608 90.039309 0.316998 0.595419 4.773150 2 True 12 +2 LightGBM_BAG_L2 -69.854318 -67.554735 root_mean_squared_error 6.060917 3.869370 87.042305 0.140736 0.070181 1.776145 2 True 11 +3 CatBoost_BAG_L2 -70.113553 -67.788061 root_mean_squared_error 5.964736 3.826941 95.418850 0.044556 0.027752 10.152690 2 True 13 +4 ExtraTreesMSE_BAG_L2 -70.657365 -67.130750 root_mean_squared_error 6.217350 4.375010 86.763710 0.297169 0.575821 1.497550 2 True 14 +5 NeuralNetFastAI_BAG_L2 -70.982813 -68.223098 root_mean_squared_error 6.395917 4.071007 98.102666 0.475737 0.271817 12.836506 2 True 15 +6 LightGBMXT_BAG_L2 -72.312187 -73.363065 root_mean_squared_error 6.439293 4.465248 90.225748 0.519113 0.666059 4.959588 2 True 10 +7 KNeighborsDist_BAG_L1 -92.031272 -89.946854 root_mean_squared_error 0.023179 0.038167 0.015943 0.023179 0.038167 0.015943 1 True 2 +8 WeightedEnsemble_L2 -92.031272 -89.946854 root_mean_squared_error 0.024890 0.039210 0.045928 0.001711 0.001043 0.029986 2 True 9 +9 KNeighborsUnif_BAG_L1 -109.161488 -107.445008 root_mean_squared_error 0.032547 0.044175 0.016840 0.032547 0.044175 0.016840 1 True 1 +10 RandomForestMSE_BAG_L1 -118.495627 -119.548529 root_mean_squared_error 0.439944 0.584168 1.831626 0.439944 0.584168 1.831626 1 True 5 +11 ExtraTreesMSE_BAG_L1 -126.116332 -126.041120 root_mean_squared_error 0.427297 0.596470 1.206973 0.427297 0.596470 1.206973 1 True 7 +12 CatBoost_BAG_L1 -130.665239 -131.539289 root_mean_squared_error 0.443278 0.038366 59.109268 0.443278 0.038366 59.109268 1 True 6 +13 LightGBM_BAG_L1 -130.706758 -131.849580 root_mean_squared_error 0.464593 0.399042 3.203022 0.464593 0.399042 3.203022 1 True 4 +14 LightGBMXT_BAG_L1 -131.068282 -131.975832 root_mean_squared_error 1.459916 1.864125 9.895217 1.459916 1.864125 9.895217 1 True 3 +15 NeuralNetFastAI_BAG_L1 -138.126972 -141.182639 root_mean_squared_error 2.629427 0.234677 9.987271 2.629427 0.234677 9.987271 1 True 8 + 1 = Optimal num_stack_levels (Stacked Overfitting Occurred: False) + 162s = DyStack runtime | 438s = Remaining runtime +Starting main fit with num_stack_levels=1. + For future fit calls on this dataset, you can skip DyStack to save time: `predictor.fit(..., dynamic_stacking=False, num_stack_levels=1)` +Beginning AutoGluon training ... Time limit = 438s +AutoGluon will save models to "AutogluonModels/ag-20240814_024634" +Train Data Rows: 10886 +Train Data Columns: 9 +Label Column: count +Problem Type: regression +Preprocessing data ... +Using Feature Generators to preprocess the data ... +Fitting AutoMLPipelineFeatureGenerator... + Available Memory: 9929.96 MB + Train Data (Original) Memory Usage: 1.45 MB (0.0% of available memory) + Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features. + Stage 1 Generators: + Fitting AsTypeFeatureGenerator... + Note: Converting 2 features to boolean dtype as they only contain 2 unique values. + Stage 2 Generators: + Fitting FillNaFeatureGenerator... + Stage 3 Generators: + Fitting IdentityFeatureGenerator... + Fitting DatetimeFeatureGenerator... 
+ Stage 4 Generators: + Fitting DropUniqueFeatureGenerator... + Stage 5 Generators: + Fitting DropDuplicatesFeatureGenerator... + Types of features in original data (raw dtype, special dtypes): + ('float', []) : 3 | ['temp', 'atemp', 'windspeed'] + ('int', []) : 5 | ['season', 'holiday', 'workingday', 'weather', 'humidity'] + ('object', ['datetime_as_object']) : 1 | ['datetime'] + Types of features in processed data (raw dtype, special dtypes): + ('float', []) : 3 | ['temp', 'atemp', 'windspeed'] + ('int', []) : 3 | ['season', 'weather', 'humidity'] + ('int', ['bool']) : 2 | ['holiday', 'workingday'] + ('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek'] + 0.1s = Fit runtime + 9 features in original data used to generate 13 features in processed data. + Train Data (Processed) Memory Usage: 0.93 MB (0.0% of available memory) +Data preprocessing and feature engineering runtime = 0.13s ... +AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error' + This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value. + To change this, specify the eval_metric parameter of Predictor() +Large model count detected (112 configs) ... Only displaying the first 3 models of each family. To see all, set `verbosity=3`. +User-specified model hyperparameters to be fit: +{ + 'NN_TORCH': [{}, {'activation': 'elu', 'dropout_prob': 0.10077639529843717, 'hidden_size': 108, 'learning_rate': 0.002735937344002146, 'num_layers': 4, 'use_batchnorm': True, 'weight_decay': 1.356433327634438e-12, 'ag_args': {'name_suffix': '_r79', 'priority': -2}}, {'activation': 'elu', 'dropout_prob': 0.11897478034205347, 'hidden_size': 213, 'learning_rate': 0.0010474382260641949, 'num_layers': 4, 'use_batchnorm': False, 'weight_decay': 5.594471067786272e-10, 'ag_args': {'name_suffix': '_r22', 'priority': -7}}], + 'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'], + 'CAT': [{}, {'depth': 6, 'grow_policy': 'SymmetricTree', 'l2_leaf_reg': 2.1542798306067823, 'learning_rate': 0.06864209415792857, 'max_ctr_complexity': 4, 'one_hot_max_size': 10, 'ag_args': {'name_suffix': '_r177', 'priority': -1}}, {'depth': 8, 'grow_policy': 'Depthwise', 'l2_leaf_reg': 2.7997999596449104, 'learning_rate': 0.031375015734637225, 'max_ctr_complexity': 2, 'one_hot_max_size': 3, 'ag_args': {'name_suffix': '_r9', 'priority': -5}}], + 'XGB': [{}, {'colsample_bytree': 0.6917311125174739, 'enable_categorical': False, 'learning_rate': 0.018063876087523967, 'max_depth': 10, 'min_child_weight': 0.6028633586934382, 'ag_args': {'name_suffix': '_r33', 'priority': -8}}, {'colsample_bytree': 0.6628423832084077, 'enable_categorical': False, 'learning_rate': 0.08775715546881824, 'max_depth': 5, 'min_child_weight': 0.6294123374222513, 'ag_args': {'name_suffix': '_r89', 'priority': -16}}], + 'FASTAI': [{}, {'bs': 256, 'emb_drop': 0.5411770367537934, 'epochs': 43, 'layers': [800, 400], 'lr': 0.01519848858318159, 'ps': 0.23782946566604385, 'ag_args': {'name_suffix': '_r191', 'priority': -4}}, {'bs': 2048, 'emb_drop': 0.05070411322605811, 'epochs': 29, 'layers': [200, 100], 'lr': 0.08974235041576624, 'ps': 0.10393466140748028, 'ag_args': {'name_suffix': '_r102', 'priority': -11}}], + 'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': 
['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}], + 'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}], + 'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}], +} +AutoGluon will fit 2 stack levels (L1 to L2) ... +Fitting 108 L1 models ... +Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 291.77s of the 437.75s of remaining time. + -101.5462 = Validation score (-root_mean_squared_error) + 0.04s = Training runtime + 0.04s = Validation runtime +Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 291.31s of the 437.29s of remaining time. + -84.1251 = Validation score (-root_mean_squared_error) + 0.03s = Training runtime + 0.05s = Validation runtime +Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 291.0s of the 436.97s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.08%) + -131.4609 = Validation score (-root_mean_squared_error) + 10.11s = Training runtime + 2.14s = Validation runtime +Fitting model: LightGBM_BAG_L1 ... Training model for up to 278.27s of the 424.24s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.08%) + -131.0542 = Validation score (-root_mean_squared_error) + 2.57s = Training runtime + 0.49s = Validation runtime +Fitting model: RandomForestMSE_BAG_L1 ... Training model for up to 273.57s of the 419.55s of remaining time. + -116.5484 = Validation score (-root_mean_squared_error) + 1.83s = Training runtime + 0.5s = Validation runtime +Fitting model: CatBoost_BAG_L1 ... Training model for up to 270.79s of the 416.77s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.10%) + -130.4612 = Validation score (-root_mean_squared_error) + 66.44s = Training runtime + 0.04s = Validation runtime +Fitting model: ExtraTreesMSE_BAG_L1 ... Training model for up to 202.41s of the 348.39s of remaining time. + -124.6007 = Validation score (-root_mean_squared_error) + 1.13s = Training runtime + 0.49s = Validation runtime +Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 199.78s of the 345.76s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.10%) + -136.0437 = Validation score (-root_mean_squared_error) + 23.41s = Training runtime + 0.28s = Validation runtime +Fitting model: XGBoost_BAG_L1 ... Training model for up to 174.24s of the 320.22s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.13%) + -131.8939 = Validation score (-root_mean_squared_error) + 3.33s = Training runtime + 0.17s = Validation runtime +Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 168.64s of the 314.62s of remaining time. 
+ Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.06%) + -138.0613 = Validation score (-root_mean_squared_error) + 105.77s = Training runtime + 0.14s = Validation runtime +Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 60.8s of the 206.78s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.15%) + -130.1323 = Validation score (-root_mean_squared_error) + 3.33s = Training runtime + 0.29s = Validation runtime +Fitting model: CatBoost_r177_BAG_L1 ... Training model for up to 54.87s of the 200.84s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.10%) + -130.6722 = Validation score (-root_mean_squared_error) + 23.09s = Training runtime + 0.03s = Validation runtime +Fitting model: NeuralNetTorch_r79_BAG_L1 ... Training model for up to 29.43s of the 175.41s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.05%) + -142.2754 = Validation score (-root_mean_squared_error) + 25.46s = Training runtime + 0.17s = Validation runtime +Fitting model: LightGBM_r131_BAG_L1 ... Training model for up to 1.47s of the 147.45s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.12%) + -133.9805 = Validation score (-root_mean_squared_error) + 1.64s = Training runtime + 0.22s = Validation runtime +Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 143.13s of remaining time. + Ensemble Weights: {'KNeighborsDist_BAG_L1': 1.0} + -84.1251 = Validation score (-root_mean_squared_error) + 0.04s = Training runtime + 0.0s = Validation runtime +Fitting 106 L2 models ... +Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 143.06s of the 142.97s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.14%) + -60.3397 = Validation score (-root_mean_squared_error) + 7.21s = Training runtime + 1.28s = Validation runtime +Fitting model: LightGBM_BAG_L2 ... Training model for up to 133.35s of the 133.27s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.14%) + -55.0345 = Validation score (-root_mean_squared_error) + 1.91s = Training runtime + 0.1s = Validation runtime +Fitting model: RandomForestMSE_BAG_L2 ... Training model for up to 129.32s of the 129.24s of remaining time. + -53.3698 = Validation score (-root_mean_squared_error) + 6.08s = Training runtime + 0.59s = Validation runtime +Fitting model: CatBoost_BAG_L2 ... Training model for up to 122.2s of the 122.12s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.18%) + -55.6573 = Validation score (-root_mean_squared_error) + 19.22s = Training runtime + 0.03s = Validation runtime +Fitting model: ExtraTreesMSE_BAG_L2 ... Training model for up to 101.02s of the 100.94s of remaining time. + -54.1853 = Validation score (-root_mean_squared_error) + 1.74s = Training runtime + 0.55s = Validation runtime +Fitting model: NeuralNetFastAI_BAG_L2 ... 
Training model for up to 97.74s of the 97.65s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.16%) + -52.0421 = Validation score (-root_mean_squared_error) + 23.03s = Training runtime + 0.25s = Validation runtime +Fitting model: XGBoost_BAG_L2 ... Training model for up to 72.25s of the 72.17s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.24%) + -55.1093 = Validation score (-root_mean_squared_error) + 3.37s = Training runtime + 0.07s = Validation runtime +Fitting model: NeuralNetTorch_BAG_L2 ... Training model for up to 66.87s of the 66.78s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.09%) + -56.1523 = Validation score (-root_mean_squared_error) + 55.01s = Training runtime + 0.29s = Validation runtime +Fitting model: LightGBMLarge_BAG_L2 ... Training model for up to 9.75s of the 9.67s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.35%) + -55.0136 = Validation score (-root_mean_squared_error) + 4.95s = Training runtime + 0.13s = Validation runtime +Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the 2.08s of remaining time. + Ensemble Weights: {'NeuralNetFastAI_BAG_L2': 0.545, 'RandomForestMSE_BAG_L2': 0.318, 'ExtraTreesMSE_BAG_L2': 0.091, 'NeuralNetTorch_BAG_L2': 0.045} + -50.753 = Validation score (-root_mean_squared_error) + 0.04s = Training runtime + 0.0s = Validation runtime +AutoGluon training complete, total runtime = 435.91s ... Best model: WeightedEnsemble_L3 | Estimated inference throughput: 284.8 rows/s (1361 batch size) +TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20240814_024634") ++
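The log above prints where the fitted predictor was saved; if the kernel restarts, it can be reloaded instead of retraining. The path below is the one from this particular run and will differ each time.

# Reload a previously fitted predictor instead of retraining.
# The directory name comes from the training log above and changes on every run.
# predictor = TabularPredictor.load("AutogluonModels/ag-20240814_024634")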
Review AutoGluon's training run with ranking of models that did the best.¶
predictor.fit_summary()
*** Summary of fit() *** +Estimated performance of each model: + model score_val eval_metric pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order +0 WeightedEnsemble_L3 -50.753029 root_mean_squared_error 6.727206 354.093305 0.000737 0.044316 3 True 25 +1 NeuralNetFastAI_BAG_L2 -52.042138 root_mean_squared_error 5.296108 291.225050 0.245111 23.032903 2 True 21 +2 RandomForestMSE_BAG_L2 -53.369760 root_mean_squared_error 5.636729 274.271327 0.585732 6.079179 2 True 18 +3 ExtraTreesMSE_BAG_L2 -54.185253 root_mean_squared_error 5.604352 269.929027 0.553355 1.736880 2 True 20 +4 LightGBMLarge_BAG_L2 -55.013578 root_mean_squared_error 5.177926 273.146889 0.126929 4.954742 2 True 24 +5 LightGBM_BAG_L2 -55.034493 root_mean_squared_error 5.149973 270.097554 0.098976 1.905407 2 True 17 +6 XGBoost_BAG_L2 -55.109268 root_mean_squared_error 5.118389 271.565089 0.067392 3.372942 2 True 22 +7 CatBoost_BAG_L2 -55.657326 root_mean_squared_error 5.085876 287.407862 0.034879 19.215714 2 True 19 +8 NeuralNetTorch_BAG_L2 -56.152349 root_mean_squared_error 5.342269 323.200027 0.291272 55.007880 2 True 23 +9 LightGBMXT_BAG_L2 -60.339739 root_mean_squared_error 6.331887 275.407073 1.280890 7.214926 2 True 16 +10 KNeighborsDist_BAG_L1 -84.125061 root_mean_squared_error 0.053670 0.029449 0.053670 0.029449 1 True 2 +11 WeightedEnsemble_L2 -84.125061 root_mean_squared_error 0.054513 0.073837 0.000843 0.044388 2 True 15 +12 KNeighborsUnif_BAG_L1 -101.546199 root_mean_squared_error 0.043571 0.039343 0.043571 0.039343 1 True 1 +13 RandomForestMSE_BAG_L1 -116.548359 root_mean_squared_error 0.500272 1.825917 0.500272 1.825917 1 True 5 +14 ExtraTreesMSE_BAG_L1 -124.600676 root_mean_squared_error 0.490283 1.130970 0.490283 1.130970 1 True 7 +15 LightGBMLarge_BAG_L1 -130.132290 root_mean_squared_error 0.286907 3.330837 0.286907 3.330837 1 True 11 +16 CatBoost_BAG_L1 -130.461205 root_mean_squared_error 0.039545 66.443848 0.039545 66.443848 1 True 6 +17 CatBoost_r177_BAG_L1 -130.672167 root_mean_squared_error 0.032152 23.090498 0.032152 23.090498 1 True 12 +18 LightGBM_BAG_L1 -131.054162 root_mean_squared_error 0.490388 2.572617 0.490388 2.572617 1 True 4 +19 LightGBMXT_BAG_L1 -131.460909 root_mean_squared_error 2.140335 10.106941 2.140335 10.106941 1 True 3 +20 XGBoost_BAG_L1 -131.893935 root_mean_squared_error 0.171286 3.333137 0.171286 3.333137 1 True 9 +21 LightGBM_r131_BAG_L1 -133.980455 root_mean_squared_error 0.220261 1.643972 0.220261 1.643972 1 True 14 +22 NeuralNetFastAI_BAG_L1 -136.043719 root_mean_squared_error 0.275932 23.409816 0.275932 23.409816 1 True 8 +23 NeuralNetTorch_BAG_L1 -138.061323 root_mean_squared_error 0.141394 105.773959 0.141394 105.773959 1 True 10 +24 NeuralNetTorch_r79_BAG_L1 -142.275442 root_mean_squared_error 0.165001 25.460844 0.165001 25.460844 1 True 13 +Number of models trained: 25 +Types of models trained: +{'WeightedEnsembleModel', 'StackerEnsembleModel_CatBoost', 'StackerEnsembleModel_KNN', 'StackerEnsembleModel_XT', 'StackerEnsembleModel_NNFastAiTabular', 'StackerEnsembleModel_RF', 'StackerEnsembleModel_LGB', 'StackerEnsembleModel_XGBoost', 'StackerEnsembleModel_TabularNeuralNetTorch'} +Bagging used: True (with 8 folds) +Multi-layer stack-ensembling used: True (with 3 levels) +Feature Metadata (Processed): +(raw dtype, special dtypes): +('float', []) : 3 | ['temp', 'atemp', 'windspeed'] +('int', []) : 3 | ['season', 'weather', 'humidity'] +('int', ['bool']) : 2 | ['holiday', 'workingday'] +('int', ['datetime_as_int']) : 5 | ['datetime', 
'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek'] +*** End of fit() summary *** ++
/home/satyam/miniforge3/envs/udacity/lib/python3.9/site-packages/autogluon/core/utils/plots.py:169: UserWarning: AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"
  warnings.warn('AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"')
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN', + 'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN', + 'LightGBMXT_BAG_L1': 'StackerEnsembleModel_LGB', + 'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB', + 'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF', + 'CatBoost_BAG_L1': 'StackerEnsembleModel_CatBoost', + 'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT', + 'NeuralNetFastAI_BAG_L1': 'StackerEnsembleModel_NNFastAiTabular', + 'XGBoost_BAG_L1': 'StackerEnsembleModel_XGBoost', + 'NeuralNetTorch_BAG_L1': 'StackerEnsembleModel_TabularNeuralNetTorch', + 'LightGBMLarge_BAG_L1': 'StackerEnsembleModel_LGB', + 'CatBoost_r177_BAG_L1': 'StackerEnsembleModel_CatBoost', + 'NeuralNetTorch_r79_BAG_L1': 'StackerEnsembleModel_TabularNeuralNetTorch', + 'LightGBM_r131_BAG_L1': 'StackerEnsembleModel_LGB', + 'WeightedEnsemble_L2': 'WeightedEnsembleModel', + 'LightGBMXT_BAG_L2': 'StackerEnsembleModel_LGB', + 'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB', + 'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF', + 'CatBoost_BAG_L2': 'StackerEnsembleModel_CatBoost', + 'ExtraTreesMSE_BAG_L2': 'StackerEnsembleModel_XT', + 'NeuralNetFastAI_BAG_L2': 'StackerEnsembleModel_NNFastAiTabular', + 'XGBoost_BAG_L2': 'StackerEnsembleModel_XGBoost', + 'NeuralNetTorch_BAG_L2': 'StackerEnsembleModel_TabularNeuralNetTorch', + 'LightGBMLarge_BAG_L2': 'StackerEnsembleModel_LGB', + 'WeightedEnsemble_L3': 'WeightedEnsembleModel'}, + 'model_performance': {'KNeighborsUnif_BAG_L1': -101.54619908446061, + 'KNeighborsDist_BAG_L1': -84.12506123181602, + 'LightGBMXT_BAG_L1': -131.46090891834504, + 'LightGBM_BAG_L1': -131.054161598899, + 'RandomForestMSE_BAG_L1': -116.54835939455667, + 'CatBoost_BAG_L1': -130.46120460893414, + 'ExtraTreesMSE_BAG_L1': -124.60067564699747, + 'NeuralNetFastAI_BAG_L1': -136.04371928250706, + 'XGBoost_BAG_L1': -131.89393473529245, + 'NeuralNetTorch_BAG_L1': -138.061322960825, + 'LightGBMLarge_BAG_L1': -130.13228993716103, + 'CatBoost_r177_BAG_L1': -130.67216705296767, + 'NeuralNetTorch_r79_BAG_L1': -142.27544213448772, + 'LightGBM_r131_BAG_L1': -133.980455081454, + 'WeightedEnsemble_L2': -84.12506123181602, + 'LightGBMXT_BAG_L2': -60.3397388900449, + 'LightGBM_BAG_L2': -55.034493029059085, + 'RandomForestMSE_BAG_L2': -53.369760461985976, + 'CatBoost_BAG_L2': -55.65732622248002, + 'ExtraTreesMSE_BAG_L2': -54.185253383959605, + 'NeuralNetFastAI_BAG_L2': -52.04213776005317, + 'XGBoost_BAG_L2': -55.10926826253059, + 'NeuralNetTorch_BAG_L2': -56.15234881063657, + 'LightGBMLarge_BAG_L2': -55.01357827045337, + 'WeightedEnsemble_L3': -50.75302856385204}, + 'model_best': 'WeightedEnsemble_L3', + 'model_paths': {'KNeighborsUnif_BAG_L1': ['KNeighborsUnif_BAG_L1'], + 'KNeighborsDist_BAG_L1': ['KNeighborsDist_BAG_L1'], + 'LightGBMXT_BAG_L1': ['LightGBMXT_BAG_L1'], + 'LightGBM_BAG_L1': ['LightGBM_BAG_L1'], + 'RandomForestMSE_BAG_L1': ['RandomForestMSE_BAG_L1'], + 'CatBoost_BAG_L1': ['CatBoost_BAG_L1'], + 'ExtraTreesMSE_BAG_L1': ['ExtraTreesMSE_BAG_L1'], + 'NeuralNetFastAI_BAG_L1': ['NeuralNetFastAI_BAG_L1'], + 'XGBoost_BAG_L1': ['XGBoost_BAG_L1'], + 'NeuralNetTorch_BAG_L1': ['NeuralNetTorch_BAG_L1'], + 'LightGBMLarge_BAG_L1': ['LightGBMLarge_BAG_L1'], + 'CatBoost_r177_BAG_L1': ['CatBoost_r177_BAG_L1'], + 'NeuralNetTorch_r79_BAG_L1': ['NeuralNetTorch_r79_BAG_L1'], + 'LightGBM_r131_BAG_L1': ['LightGBM_r131_BAG_L1'], + 'WeightedEnsemble_L2': ['WeightedEnsemble_L2'], + 'LightGBMXT_BAG_L2': ['LightGBMXT_BAG_L2'], + 'LightGBM_BAG_L2': ['LightGBM_BAG_L2'], + 'RandomForestMSE_BAG_L2': 
['RandomForestMSE_BAG_L2'], + 'CatBoost_BAG_L2': ['CatBoost_BAG_L2'], + 'ExtraTreesMSE_BAG_L2': ['ExtraTreesMSE_BAG_L2'], + 'NeuralNetFastAI_BAG_L2': ['NeuralNetFastAI_BAG_L2'], + 'XGBoost_BAG_L2': ['XGBoost_BAG_L2'], + 'NeuralNetTorch_BAG_L2': ['NeuralNetTorch_BAG_L2'], + 'LightGBMLarge_BAG_L2': ['LightGBMLarge_BAG_L2'], + 'WeightedEnsemble_L3': ['WeightedEnsemble_L3']}, + 'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.03934335708618164, + 'KNeighborsDist_BAG_L1': 0.0294492244720459, + 'LightGBMXT_BAG_L1': 10.106940507888794, + 'LightGBM_BAG_L1': 2.5726170539855957, + 'RandomForestMSE_BAG_L1': 1.8259170055389404, + 'CatBoost_BAG_L1': 66.44384765625, + 'ExtraTreesMSE_BAG_L1': 1.130969524383545, + 'NeuralNetFastAI_BAG_L1': 23.409815549850464, + 'XGBoost_BAG_L1': 3.333136796951294, + 'NeuralNetTorch_BAG_L1': 105.77395939826965, + 'LightGBMLarge_BAG_L1': 3.3308372497558594, + 'CatBoost_r177_BAG_L1': 23.090498208999634, + 'NeuralNetTorch_r79_BAG_L1': 25.460843563079834, + 'LightGBM_r131_BAG_L1': 1.6439721584320068, + 'WeightedEnsemble_L2': 0.0443878173828125, + 'LightGBMXT_BAG_L2': 7.214926242828369, + 'LightGBM_BAG_L2': 1.9054067134857178, + 'RandomForestMSE_BAG_L2': 6.079179286956787, + 'CatBoost_BAG_L2': 19.21571445465088, + 'ExtraTreesMSE_BAG_L2': 1.736879587173462, + 'NeuralNetFastAI_BAG_L2': 23.03290319442749, + 'XGBoost_BAG_L2': 3.372941732406616, + 'NeuralNetTorch_BAG_L2': 55.007880210876465, + 'LightGBMLarge_BAG_L2': 4.954741716384888, + 'WeightedEnsemble_L3': 0.04431557655334473}, + 'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.04357123374938965, + 'KNeighborsDist_BAG_L1': 0.053670406341552734, + 'LightGBMXT_BAG_L1': 2.1403350830078125, + 'LightGBM_BAG_L1': 0.4903883934020996, + 'RandomForestMSE_BAG_L1': 0.5002720355987549, + 'CatBoost_BAG_L1': 0.03954505920410156, + 'ExtraTreesMSE_BAG_L1': 0.4902825355529785, + 'NeuralNetFastAI_BAG_L1': 0.27593159675598145, + 'XGBoost_BAG_L1': 0.17128562927246094, + 'NeuralNetTorch_BAG_L1': 0.14139389991760254, + 'LightGBMLarge_BAG_L1': 0.2869071960449219, + 'CatBoost_r177_BAG_L1': 0.032152414321899414, + 'NeuralNetTorch_r79_BAG_L1': 0.16500067710876465, + 'LightGBM_r131_BAG_L1': 0.2202608585357666, + 'WeightedEnsemble_L2': 0.0008425712585449219, + 'LightGBMXT_BAG_L2': 1.2808897495269775, + 'LightGBM_BAG_L2': 0.09897613525390625, + 'RandomForestMSE_BAG_L2': 0.5857324600219727, + 'CatBoost_BAG_L2': 0.03487873077392578, + 'ExtraTreesMSE_BAG_L2': 0.5533554553985596, + 'NeuralNetFastAI_BAG_L2': 0.24511146545410156, + 'XGBoost_BAG_L2': 0.06739234924316406, + 'NeuralNetTorch_BAG_L2': 0.2912724018096924, + 'LightGBMLarge_BAG_L2': 0.12692904472351074, + 'WeightedEnsemble_L3': 0.0007369518280029297}, + 'num_bag_folds': 8, + 'max_stack_level': 3, + 'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True, + 'use_child_oof': True}, + 'KNeighborsDist_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True, + 'use_child_oof': True}, + 'LightGBMXT_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBM_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'RandomForestMSE_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True, + 'use_child_oof': True}, + 'CatBoost_BAG_L1': 
{'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'ExtraTreesMSE_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True, + 'use_child_oof': True}, + 'NeuralNetFastAI_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'XGBoost_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'NeuralNetTorch_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBMLarge_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'CatBoost_r177_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'NeuralNetTorch_r79_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBM_r131_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'WeightedEnsemble_L2': {'use_orig_features': False, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBMXT_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBM_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'RandomForestMSE_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True, + 'use_child_oof': True}, + 'CatBoost_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'ExtraTreesMSE_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True, + 'use_child_oof': True}, + 'NeuralNetFastAI_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'XGBoost_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'NeuralNetTorch_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBMLarge_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'WeightedEnsemble_L3': {'use_orig_features': False, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}}, + 'leaderboard': model score_val eval_metric \ + 0 WeightedEnsemble_L3 -50.753029 root_mean_squared_error + 1 NeuralNetFastAI_BAG_L2 -52.042138 root_mean_squared_error + 2 RandomForestMSE_BAG_L2 -53.369760 root_mean_squared_error + 3 ExtraTreesMSE_BAG_L2 -54.185253 root_mean_squared_error + 4 LightGBMLarge_BAG_L2 -55.013578 root_mean_squared_error + 5 LightGBM_BAG_L2 -55.034493 root_mean_squared_error + 6 XGBoost_BAG_L2 -55.109268 root_mean_squared_error + 7 CatBoost_BAG_L2 -55.657326 root_mean_squared_error + 8 NeuralNetTorch_BAG_L2 -56.152349 root_mean_squared_error + 9 LightGBMXT_BAG_L2 -60.339739 root_mean_squared_error + 10 KNeighborsDist_BAG_L1 -84.125061 root_mean_squared_error + 11 WeightedEnsemble_L2 -84.125061 
root_mean_squared_error + 12 KNeighborsUnif_BAG_L1 -101.546199 root_mean_squared_error + 13 RandomForestMSE_BAG_L1 -116.548359 root_mean_squared_error + 14 ExtraTreesMSE_BAG_L1 -124.600676 root_mean_squared_error + 15 LightGBMLarge_BAG_L1 -130.132290 root_mean_squared_error + 16 CatBoost_BAG_L1 -130.461205 root_mean_squared_error + 17 CatBoost_r177_BAG_L1 -130.672167 root_mean_squared_error + 18 LightGBM_BAG_L1 -131.054162 root_mean_squared_error + 19 LightGBMXT_BAG_L1 -131.460909 root_mean_squared_error + 20 XGBoost_BAG_L1 -131.893935 root_mean_squared_error + 21 LightGBM_r131_BAG_L1 -133.980455 root_mean_squared_error + 22 NeuralNetFastAI_BAG_L1 -136.043719 root_mean_squared_error + 23 NeuralNetTorch_BAG_L1 -138.061323 root_mean_squared_error + 24 NeuralNetTorch_r79_BAG_L1 -142.275442 root_mean_squared_error + + pred_time_val fit_time pred_time_val_marginal fit_time_marginal \ + 0 6.727206 354.093305 0.000737 0.044316 + 1 5.296108 291.225050 0.245111 23.032903 + 2 5.636729 274.271327 0.585732 6.079179 + 3 5.604352 269.929027 0.553355 1.736880 + 4 5.177926 273.146889 0.126929 4.954742 + 5 5.149973 270.097554 0.098976 1.905407 + 6 5.118389 271.565089 0.067392 3.372942 + 7 5.085876 287.407862 0.034879 19.215714 + 8 5.342269 323.200027 0.291272 55.007880 + 9 6.331887 275.407073 1.280890 7.214926 + 10 0.053670 0.029449 0.053670 0.029449 + 11 0.054513 0.073837 0.000843 0.044388 + 12 0.043571 0.039343 0.043571 0.039343 + 13 0.500272 1.825917 0.500272 1.825917 + 14 0.490283 1.130970 0.490283 1.130970 + 15 0.286907 3.330837 0.286907 3.330837 + 16 0.039545 66.443848 0.039545 66.443848 + 17 0.032152 23.090498 0.032152 23.090498 + 18 0.490388 2.572617 0.490388 2.572617 + 19 2.140335 10.106941 2.140335 10.106941 + 20 0.171286 3.333137 0.171286 3.333137 + 21 0.220261 1.643972 0.220261 1.643972 + 22 0.275932 23.409816 0.275932 23.409816 + 23 0.141394 105.773959 0.141394 105.773959 + 24 0.165001 25.460844 0.165001 25.460844 + + stack_level can_infer fit_order + 0 3 True 25 + 1 2 True 21 + 2 2 True 18 + 3 2 True 20 + 4 2 True 24 + 5 2 True 17 + 6 2 True 22 + 7 2 True 19 + 8 2 True 23 + 9 2 True 16 + 10 1 True 2 + 11 2 True 15 + 12 1 True 1 + 13 1 True 5 + 14 1 True 7 + 15 1 True 11 + 16 1 True 6 + 17 1 True 12 + 18 1 True 4 + 19 1 True 3 + 20 1 True 9 + 21 1 True 14 + 22 1 True 8 + 23 1 True 10 + 24 1 True 13 }+
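For the writeup tables, the same ranking can be captured as a pandas DataFrame with predictor.leaderboard(); the CSV filename below is arbitrary.

# Save the model ranking so it can be attached to the writeup (filename is arbitrary)
leaderboard_df = predictor.leaderboard()
leaderboard_df.to_csv("initial_leaderboard.csv", index=False)
leaderboard_df[["model", "score_val"]].head()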
Create predictions from test dataset¶
+predictions = predictor.predict(test)
+predictions.head()
+
0 24.994209 +1 39.850437 +2 44.501244 +3 47.644463 +4 50.085098 +Name: count, dtype: float32+
NOTE: Kaggle will reject the submission if any predicted count is negative, so every prediction must be >= 0.¶
+# Describe the `predictions` series to see if there are any negative values
+predictions.describe()
+
count 6493.000000 +mean 100.092888 +std 88.714645 +min 1.866605 +25% 21.703074 +50% 67.577965 +75% 167.202393 +max 353.822418 +Name: count, dtype: float64+
# How many negative values do we have?
+print(f"Number of negative values in the predictions: {predictions[predictions<0].any().sum()}")
+
Number of negative values in the predictions: 0 ++
# Set them to zero
+predictions[predictions<0] = 0
+
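An equivalent, slightly more idiomatic way to enforce the non-negativity requirement is pandas' Series.clip. This is only a sketch of an alternative to the boolean-indexing assignment above, not what was run for the submission:

# Sketch: clamp any negative predictions to zero in a single call
predictions = predictions.clip(lower=0)
assert (predictions >= 0).all()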
Set predictions to submission dataframe, save, and submit¶
+submission["count"] = predictions
+submission.to_csv("submission.csv", index=False)
+
!kaggle competitions submit -c bike-sharing-demand -f submission.csv -m "first raw submission"
+
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/satyam/.kaggle/kaggle.json' +100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 480kB/s] +Successfully submitted to Bike Sharing Demand+
View the submission via the command line, or in the browser on the competition's My Submissions page
¶
+!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
+
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/satyam/.kaggle/kaggle.json' +fileName date description status publicScore privateScore +--------------------------- ------------------- --------------------------------- -------- ----------- ------------ +submission.csv 2024-08-14 03:03:28 first raw submission complete 1.79357 1.79357 +submission_new_hpo.csv 2024-06-13 05:19:39 new features with hyperparameters complete 0.48667 0.48667 +submission_new_features.csv 2024-06-13 04:47:33 new features complete 0.66136 0.66136 ++
Initial score of 1.79357
¶
+Step 4: Exploratory Data Analysis and Creating an additional feature¶
-
+
- Any additional feature will do, but a good starting point is to split the datetime into hour, day, month, and year parts, as in the sketch below. +
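As a minimal, self-contained sketch of that decomposition (the notebook applies the same idea to the real train and test frames a few cells below), pandas' dt accessor exposes the calendar parts directly; the small df frame here is only illustrative:

import pandas as pd

# Sketch: split a parsed datetime column into calendar parts
df = pd.DataFrame({"datetime": pd.to_datetime(["2011-01-01 00:00:00", "2011-01-01 13:00:00"])})
df["year"] = df["datetime"].dt.year
df["month"] = df["datetime"].dt.month
df["hour"] = df["datetime"].dt.hour
df["day"] = df["datetime"].dt.dayofweek   # Monday=0 ... Sunday=6
print(df)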
# Plot a histogram of every feature to show its distribution. This is part of the exploratory data analysis
+train.hist()
+
array([[<Axes: title={'center': 'season'}>, + <Axes: title={'center': 'holiday'}>, + <Axes: title={'center': 'workingday'}>], + [<Axes: title={'center': 'weather'}>, + <Axes: title={'center': 'temp'}>, + <Axes: title={'center': 'atemp'}>], + [<Axes: title={'center': 'humidity'}>, + <Axes: title={'center': 'windspeed'}>, + <Axes: title={'center': 'casual'}>], + [<Axes: title={'center': 'registered'}>, + <Axes: title={'center': 'count'}>, <Axes: >]], dtype=object)+
train.head()
+
+ | datetime | +season | +holiday | +workingday | +weather | +temp | +atemp | +humidity | +windspeed | +casual | +registered | +count | +
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | +2011-01-01 00:00:00 | +1 | +0 | +0 | +1 | +9.84 | +14.395 | +81 | +0.0 | +3 | +13 | +16 | +
1 | +2011-01-01 01:00:00 | +1 | +0 | +0 | +1 | +9.02 | +13.635 | +80 | +0.0 | +8 | +32 | +40 | +
2 | +2011-01-01 02:00:00 | +1 | +0 | +0 | +1 | +9.02 | +13.635 | +80 | +0.0 | +5 | +27 | +32 | +
3 | +2011-01-01 03:00:00 | +1 | +0 | +0 | +1 | +9.84 | +14.395 | +75 | +0.0 | +3 | +10 | +13 | +
4 | +2011-01-01 04:00:00 | +1 | +0 | +0 | +1 | +9.84 | +14.395 | +75 | +0.0 | +0 | +1 | +1 | +
train.tail()
+
+ | datetime | +season | +holiday | +workingday | +weather | +temp | +atemp | +humidity | +windspeed | +casual | +registered | +count | +
---|---|---|---|---|---|---|---|---|---|---|---|---|
10881 | +2012-12-19 19:00:00 | +4 | +0 | +1 | +1 | +15.58 | +19.695 | +50 | +26.0027 | +7 | +329 | +336 | +
10882 | +2012-12-19 20:00:00 | +4 | +0 | +1 | +1 | +14.76 | +17.425 | +57 | +15.0013 | +10 | +231 | +241 | +
10883 | +2012-12-19 21:00:00 | +4 | +0 | +1 | +1 | +13.94 | +15.910 | +61 | +15.0013 | +4 | +164 | +168 | +
10884 | +2012-12-19 22:00:00 | +4 | +0 | +1 | +1 | +13.94 | +17.425 | +61 | +6.0032 | +12 | +117 | +129 | +
10885 | +2012-12-19 23:00:00 | +4 | +0 | +1 | +1 | +13.12 | +16.665 | +66 | +8.9981 | +4 | +84 | +88 | +
train["datetime"] = pd.to_datetime(train['datetime'])
+test["datetime"] = pd.to_datetime(test['datetime'])
+
# Preview the day-of-week values (Monday=0 ... Sunday=6)
+train['datetime'].dt.dayofweek
+
0 5 +1 5 +2 5 +3 5 +4 5 + .. +10881 2 +10882 2 +10883 2 +10884 2 +10885 2 +Name: datetime, Length: 10886, dtype: int32+
train_org = train.copy()
+
train.head()
+
+ | datetime | +season | +holiday | +workingday | +weather | +temp | +atemp | +humidity | +windspeed | +casual | +registered | +count | +
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | +2011-01-01 00:00:00 | +1 | +0 | +0 | +1 | +9.84 | +14.395 | +81 | +0.0 | +3 | +13 | +16 | +
1 | +2011-01-01 01:00:00 | +1 | +0 | +0 | +1 | +9.02 | +13.635 | +80 | +0.0 | +8 | +32 | +40 | +
2 | +2011-01-01 02:00:00 | +1 | +0 | +0 | +1 | +9.02 | +13.635 | +80 | +0.0 | +5 | +27 | +32 | +
3 | +2011-01-01 03:00:00 | +1 | +0 | +0 | +1 | +9.84 | +14.395 | +75 | +0.0 | +3 | +10 | +13 | +
4 | +2011-01-01 04:00:00 | +1 | +0 | +0 | +1 | +9.84 | +14.395 | +75 | +0.0 | +0 | +1 | +1 | +
# Create new datetime-derived features on the training set
+train["year"] = train['datetime'].dt.year
+train["month"] = train['datetime'].dt.month
+train["hour"] = train['datetime'].dt.hour
+train["day"] = train['datetime'].dt.dayofweek
+
test["year"] = test['datetime'].dt.year
+test["month"] = test['datetime'].dt.month
+test["hour"] = test['datetime'].dt.hour
+test["day"] = test['datetime'].dt.dayofweek
+
Make category types for these so models know they are not just numbers¶
-
+
- AutoGluon infers season and weather as plain integers, but the numbers are really codes for categories. +
- Setting the dtype to category makes AutoGluon treat these columns as categorical features. +
train["season"] = train["season"].astype("category")
+train["weather"] = train["weather"].astype("category")
+test["season"] = test["season"].astype("category")
+test["weather"] = test["weather"].astype("category")
+
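A quick sanity check (an addition, not part of the original notebook) is to confirm that the two columns now carry the pandas category dtype:

# Sketch: verify that season and weather are now categorical
print(train[["season", "weather"]].dtypes)
print(test[["season", "weather"]].dtypes)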
# View our new features
+train.head()
+
+ | datetime | +season | +holiday | +workingday | +weather | +temp | +atemp | +humidity | +windspeed | +casual | +registered | +count | +year | +month | +hour | +day | +
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | +2011-01-01 00:00:00 | +1 | +0 | +0 | +1 | +9.84 | +14.395 | +81 | +0.0 | +3 | +13 | +16 | +2011 | +1 | +0 | +5 | +
1 | +2011-01-01 01:00:00 | +1 | +0 | +0 | +1 | +9.02 | +13.635 | +80 | +0.0 | +8 | +32 | +40 | +2011 | +1 | +1 | +5 | +
2 | +2011-01-01 02:00:00 | +1 | +0 | +0 | +1 | +9.02 | +13.635 | +80 | +0.0 | +5 | +27 | +32 | +2011 | +1 | +2 | +5 | +
3 | +2011-01-01 03:00:00 | +1 | +0 | +0 | +1 | +9.84 | +14.395 | +75 | +0.0 | +3 | +10 | +13 | +2011 | +1 | +3 | +5 | +
4 | +2011-01-01 04:00:00 | +1 | +0 | +0 | +1 | +9.84 | +14.395 | +75 | +0.0 | +0 | +1 | +1 | +2011 | +1 | +4 | +5 | +
test.head()
+
+ | datetime | +season | +holiday | +workingday | +weather | +temp | +atemp | +humidity | +windspeed | +year | +month | +hour | +day | +
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | +2011-01-20 00:00:00 | +1 | +0 | +1 | +1 | +10.66 | +11.365 | +56 | +26.0027 | +2011 | +1 | +0 | +3 | +
1 | +2011-01-20 01:00:00 | +1 | +0 | +1 | +1 | +10.66 | +13.635 | +56 | +0.0000 | +2011 | +1 | +1 | +3 | +
2 | +2011-01-20 02:00:00 | +1 | +0 | +1 | +1 | +10.66 | +13.635 | +56 | +0.0000 | +2011 | +1 | +2 | +3 | +
3 | +2011-01-20 03:00:00 | +1 | +0 | +1 | +1 | +10.66 | +12.880 | +56 | +11.0014 | +2011 | +1 | +3 | +3 | +
4 | +2011-01-20 04:00:00 | +1 | +0 | +1 | +1 | +10.66 | +12.880 | +56 | +11.0014 | +2011 | +1 | +4 | +3 | +
# View the histogram of all features again, now including the new datetime-derived features
+train.hist()
+
array([[<Axes: title={'center': 'datetime'}>, + <Axes: title={'center': 'holiday'}>, + <Axes: title={'center': 'workingday'}>, + <Axes: title={'center': 'temp'}>], + [<Axes: title={'center': 'atemp'}>, + <Axes: title={'center': 'humidity'}>, + <Axes: title={'center': 'windspeed'}>, + <Axes: title={'center': 'casual'}>], + [<Axes: title={'center': 'registered'}>, + <Axes: title={'center': 'count'}>, + <Axes: title={'center': 'year'}>, + <Axes: title={'center': 'month'}>], + [<Axes: title={'center': 'hour'}>, + <Axes: title={'center': 'day'}>, <Axes: >, <Axes: >]], + dtype=object)+
Step 5: Rerun the model with the same settings as before, just with more features¶
# Exclude 'casual' and 'registered': they are components of the target 'count' and are not present in the test set
+remove_columns_list = ['casual', 'registered']
+col_names = [x for x in list(train.columns) if x not in remove_columns_list]
+col_names
+
['datetime', + 'season', + 'holiday', + 'workingday', + 'weather', + 'temp', + 'atemp', + 'humidity', + 'windspeed', + 'count', + 'year', + 'month', + 'hour', + 'day']+
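Equivalently, the same feature subset could be produced with DataFrame.drop; a minimal sketch of that alternative (the train_features name is just for illustration):

# Sketch: drop the leakage columns directly; casual + registered sum to count and are absent from the test set
train_features = train.drop(columns=["casual", "registered"])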
predictor_new_features = TabularPredictor(label="count", eval_metric="root_mean_squared_error").fit(train_data=train[col_names], time_limit=600, presets="best_quality")
+
No path specified. Models will be saved in: "AutogluonModels/ag-20240814_030610" +Verbosity: 2 (Standard Logging) +=================== System Info =================== +AutoGluon Version: 1.1.1 +Python Version: 3.9.19 +Operating System: Linux +Platform Machine: x86_64 +Platform Version: #1 SMP Wed Mar 2 00:30:59 UTC 2022 +CPU Count: 24 +Memory Avail: 9.25 GB / 14.49 GB (63.9%) +Disk Space Avail: 923.64 GB / 1006.85 GB (91.7%) +=================================================== +Presets specified: ['best_quality'] +Setting dynamic_stacking from 'auto' to True. Reason: Enable dynamic_stacking when use_bag_holdout is disabled. (use_bag_holdout=False) +Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=1 +DyStack is enabled (dynamic_stacking=True). AutoGluon will try to determine whether the input data is affected by stacked overfitting and enable or disable stacking as a consequence. + This is used to identify the optimal `num_stack_levels` value. Copies of AutoGluon will be fit on subsets of the data. Then holdout validation data is used to detect stacked overfitting. + Running DyStack for up to 150s of the 600s of remaining time (25%). + Context path: "AutogluonModels/ag-20240814_030610/ds_sub_fit/sub_fit_ho" +Verbosity: 2 (Standard Logging) +=================== System Info =================== +AutoGluon Version: 1.1.1 +Python Version: 3.9.19 +Operating System: Linux +Platform Machine: x86_64 +Platform Version: #1 SMP Wed Mar 2 00:30:59 UTC 2022 +CPU Count: 24 +Memory Avail: 9.25 GB / 14.49 GB (63.9%) +Disk Space Avail: 923.64 GB / 1006.85 GB (91.7%) +=================================================== +Presets specified: ['best_quality'] +Setting dynamic_stacking from 'auto' to True. Reason: Enable dynamic_stacking when use_bag_holdout is disabled. (use_bag_holdout=False) +Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=1 +DyStack is enabled (dynamic_stacking=True). AutoGluon will try to determine whether the input data is affected by stacked overfitting and enable or disable stacking as a consequence. + This is used to identify the optimal `num_stack_levels` value. Copies of AutoGluon will be fit on subsets of the data. Then holdout validation data is used to detect stacked overfitting. + Running DyStack for up to 150s of the 600s of remaining time (25%). 
+ Context path: "AutogluonModels/ag-20240814_030610/ds_sub_fit/sub_fit_ho" +Leaderboard on holdout data (DyStack): + model score_holdout score_val eval_metric pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order +0 NeuralNetFastAI_BAG_L2 -30.692622 -31.634614 root_mean_squared_error 3.260064 4.006587 106.019453 0.145866 0.155403 12.322259 2 True 16 +1 WeightedEnsemble_L3 -30.718800 -31.325792 root_mean_squared_error 3.568474 4.436383 113.478400 0.003203 0.000381 0.020360 3 True 20 +2 ExtraTreesMSE_BAG_L2 -31.525353 -32.378755 root_mean_squared_error 3.280524 4.132840 94.545854 0.166327 0.281656 0.848660 2 True 15 +3 LightGBMLarge_BAG_L2 -31.546202 -33.053526 root_mean_squared_error 3.251405 3.917233 95.288972 0.137207 0.066048 1.591777 2 True 19 +4 RandomForestMSE_BAG_L2 -31.588065 -32.857741 root_mean_squared_error 3.270143 4.138223 96.578603 0.155945 0.287039 2.881408 2 True 13 +5 LightGBM_BAG_L2 -31.619243 -32.346399 root_mean_squared_error 3.155847 3.890092 94.563380 0.041649 0.038908 0.866185 2 True 12 +6 LightGBMXT_BAG_L2 -31.668759 -32.519268 root_mean_squared_error 3.191010 3.932434 94.727686 0.076813 0.081250 1.030491 2 True 11 +7 CatBoost_BAG_L2 -31.758722 -32.188851 root_mean_squared_error 3.134617 3.878786 98.390445 0.020419 0.027602 4.693250 2 True 14 +8 CatBoost_BAG_L1 -31.881858 -33.923209 root_mean_squared_error 0.299222 0.079131 69.168016 0.299222 0.079131 69.168016 1 True 6 +9 XGBoost_BAG_L2 -31.923653 -32.912451 root_mean_squared_error 3.263084 3.910621 94.967741 0.148886 0.059437 1.270546 2 True 17 +10 NeuralNetTorch_BAG_L2 -32.249224 -33.081794 root_mean_squared_error 3.217201 3.976349 109.437403 0.103003 0.125165 15.740209 2 True 18 +11 WeightedEnsemble_L2 -32.359143 -32.605814 root_mean_squared_error 1.796992 3.343746 81.121628 0.001777 0.000382 0.010350 2 True 10 +12 XGBoost_BAG_L1 -33.530286 -35.594788 root_mean_squared_error 0.224610 0.196381 1.575050 0.224610 0.196381 1.575050 1 True 9 +13 LightGBM_BAG_L1 -33.612440 -34.314620 root_mean_squared_error 0.238078 0.404305 2.073540 0.238078 0.404305 2.073540 1 True 4 +14 LightGBMXT_BAG_L1 -34.006355 -35.151548 root_mean_squared_error 0.772460 2.389825 7.335979 0.772460 2.389825 7.335979 1 True 3 +15 ExtraTreesMSE_BAG_L1 -37.776381 -39.306296 root_mean_squared_error 0.259632 0.287177 0.756530 0.259632 0.287177 0.756530 1 True 7 +16 RandomForestMSE_BAG_L1 -40.528990 -39.018035 root_mean_squared_error 0.260845 0.273722 0.958693 0.260845 0.273722 0.958693 1 True 5 +17 NeuralNetFastAI_BAG_L1 -45.814508 -48.382243 root_mean_squared_error 1.024690 0.156728 11.813962 1.024690 0.156728 11.813962 1 True 8 +18 KNeighborsDist_BAG_L1 -92.031272 -89.946854 root_mean_squared_error 0.016559 0.027300 0.007747 0.016559 0.027300 0.007747 1 True 2 +19 KNeighborsUnif_BAG_L1 -109.161488 -107.445008 root_mean_squared_error 0.018102 0.036615 0.007678 0.018102 0.036615 0.007678 1 True 1 + 1 = Optimal num_stack_levels (Stacked Overfitting Occurred: False) + 157s = DyStack runtime | 443s = Remaining runtime +Starting main fit with num_stack_levels=1. + For future fit calls on this dataset, you can skip DyStack to save time: `predictor.fit(..., dynamic_stacking=False, num_stack_levels=1)` +Beginning AutoGluon training ... Time limit = 443s +AutoGluon will save models to "AutogluonModels/ag-20240814_030610" +Train Data Rows: 10886 +Train Data Columns: 13 +Label Column: count +Problem Type: regression +Preprocessing data ... 
+Using Feature Generators to preprocess the data ... +Fitting AutoMLPipelineFeatureGenerator... + Available Memory: 8358.78 MB + Train Data (Original) Memory Usage: 0.77 MB (0.0% of available memory) + Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features. + Stage 1 Generators: + Fitting AsTypeFeatureGenerator... + Note: Converting 3 features to boolean dtype as they only contain 2 unique values. + Stage 2 Generators: + Fitting FillNaFeatureGenerator... + Stage 3 Generators: + Fitting IdentityFeatureGenerator... + Fitting CategoryFeatureGenerator... + Fitting CategoryMemoryMinimizeFeatureGenerator... + Fitting DatetimeFeatureGenerator... + Stage 4 Generators: + Fitting DropUniqueFeatureGenerator... + Stage 5 Generators: + Fitting DropDuplicatesFeatureGenerator... + Types of features in original data (raw dtype, special dtypes): + ('category', []) : 2 | ['season', 'weather'] + ('datetime', []) : 1 | ['datetime'] + ('float', []) : 3 | ['temp', 'atemp', 'windspeed'] + ('int', []) : 7 | ['holiday', 'workingday', 'humidity', 'year', 'month', ...] + Types of features in processed data (raw dtype, special dtypes): + ('category', []) : 2 | ['season', 'weather'] + ('float', []) : 3 | ['temp', 'atemp', 'windspeed'] + ('int', []) : 4 | ['humidity', 'month', 'hour', 'day'] + ('int', ['bool']) : 3 | ['holiday', 'workingday', 'year'] + ('int', ['datetime_as_int']) : 3 | ['datetime', 'datetime.year', 'datetime.day'] + 0.2s = Fit runtime + 13 features in original data used to generate 15 features in processed data. + Train Data (Processed) Memory Usage: 0.76 MB (0.0% of available memory) +Data preprocessing and feature engineering runtime = 0.23s ... +AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error' + This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value. + To change this, specify the eval_metric parameter of Predictor() +Large model count detected (112 configs) ... Only displaying the first 3 models of each family. To see all, set `verbosity=3`. 
+User-specified model hyperparameters to be fit: +{ + 'NN_TORCH': [{}, {'activation': 'elu', 'dropout_prob': 0.10077639529843717, 'hidden_size': 108, 'learning_rate': 0.002735937344002146, 'num_layers': 4, 'use_batchnorm': True, 'weight_decay': 1.356433327634438e-12, 'ag_args': {'name_suffix': '_r79', 'priority': -2}}, {'activation': 'elu', 'dropout_prob': 0.11897478034205347, 'hidden_size': 213, 'learning_rate': 0.0010474382260641949, 'num_layers': 4, 'use_batchnorm': False, 'weight_decay': 5.594471067786272e-10, 'ag_args': {'name_suffix': '_r22', 'priority': -7}}], + 'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'], + 'CAT': [{}, {'depth': 6, 'grow_policy': 'SymmetricTree', 'l2_leaf_reg': 2.1542798306067823, 'learning_rate': 0.06864209415792857, 'max_ctr_complexity': 4, 'one_hot_max_size': 10, 'ag_args': {'name_suffix': '_r177', 'priority': -1}}, {'depth': 8, 'grow_policy': 'Depthwise', 'l2_leaf_reg': 2.7997999596449104, 'learning_rate': 0.031375015734637225, 'max_ctr_complexity': 2, 'one_hot_max_size': 3, 'ag_args': {'name_suffix': '_r9', 'priority': -5}}], + 'XGB': [{}, {'colsample_bytree': 0.6917311125174739, 'enable_categorical': False, 'learning_rate': 0.018063876087523967, 'max_depth': 10, 'min_child_weight': 0.6028633586934382, 'ag_args': {'name_suffix': '_r33', 'priority': -8}}, {'colsample_bytree': 0.6628423832084077, 'enable_categorical': False, 'learning_rate': 0.08775715546881824, 'max_depth': 5, 'min_child_weight': 0.6294123374222513, 'ag_args': {'name_suffix': '_r89', 'priority': -16}}], + 'FASTAI': [{}, {'bs': 256, 'emb_drop': 0.5411770367537934, 'epochs': 43, 'layers': [800, 400], 'lr': 0.01519848858318159, 'ps': 0.23782946566604385, 'ag_args': {'name_suffix': '_r191', 'priority': -4}}, {'bs': 2048, 'emb_drop': 0.05070411322605811, 'epochs': 29, 'layers': [200, 100], 'lr': 0.08974235041576624, 'ps': 0.10393466140748028, 'ag_args': {'name_suffix': '_r102', 'priority': -11}}], + 'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}], + 'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}], + 'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}], +} +AutoGluon will fit 2 stack levels (L1 to L2) ... +Fitting 108 L1 models ... +Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 295.32s of the 443.08s of remaining time. + -101.5462 = Validation score (-root_mean_squared_error) + 0.02s = Training runtime + 0.03s = Validation runtime +Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 295.25s of the 443.01s of remaining time. + -84.1251 = Validation score (-root_mean_squared_error) + 0.01s = Training runtime + 0.03s = Validation runtime +Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 295.18s of the 442.94s of remaining time. 
+ Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.09%) + -34.3803 = Validation score (-root_mean_squared_error) + 6.82s = Training runtime + 2.43s = Validation runtime +Fitting model: LightGBM_BAG_L1 ... Training model for up to 286.68s of the 434.44s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.08%) + -33.9176 = Validation score (-root_mean_squared_error) + 2.76s = Training runtime + 0.53s = Validation runtime +Fitting model: RandomForestMSE_BAG_L1 ... Training model for up to 282.64s of the 430.4s of remaining time. + -38.4535 = Validation score (-root_mean_squared_error) + 1.04s = Training runtime + 0.27s = Validation runtime +Fitting model: CatBoost_BAG_L1 ... Training model for up to 281.06s of the 428.82s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.11%) + -33.1248 = Validation score (-root_mean_squared_error) + 72.6s = Training runtime + 0.09s = Validation runtime +Fitting model: ExtraTreesMSE_BAG_L1 ... Training model for up to 207.42s of the 355.19s of remaining time. + -38.5302 = Validation score (-root_mean_squared_error) + 0.6s = Training runtime + 0.27s = Validation runtime +Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 206.35s of the 354.11s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.09%) + -45.1252 = Validation score (-root_mean_squared_error) + 13.25s = Training runtime + 0.17s = Validation runtime +Fitting model: XGBoost_BAG_L1 ... Training model for up to 191.95s of the 339.71s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.15%) + -34.5999 = Validation score (-root_mean_squared_error) + 2.96s = Training runtime + 0.32s = Validation runtime +Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 187.75s of the 335.52s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.05%) + -38.2129 = Validation score (-root_mean_squared_error) + 73.36s = Training runtime + 0.07s = Validation runtime +Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 113.3s of the 261.07s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.18%) + -33.7519 = Validation score (-root_mean_squared_error) + 4.36s = Training runtime + 0.44s = Validation runtime +Fitting model: CatBoost_r177_BAG_L1 ... Training model for up to 107.55s of the 255.31s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.12%) + -33.0185 = Validation score (-root_mean_squared_error) + 41.42s = Training runtime + 0.05s = Validation runtime +Fitting model: NeuralNetTorch_r79_BAG_L1 ... Training model for up to 65.05s of the 212.82s of remaining time. 
+ Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.05%) + -52.074 = Validation score (-root_mean_squared_error) + 52.9s = Training runtime + 0.09s = Validation runtime +Fitting model: LightGBM_r131_BAG_L1 ... Training model for up to 11.1s of the 158.86s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.12%) + -33.138 = Validation score (-root_mean_squared_error) + 9.38s = Training runtime + 3.64s = Validation runtime +Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 147.11s of remaining time. + Ensemble Weights: {'LightGBMLarge_BAG_L1': 0.24, 'CatBoost_r177_BAG_L1': 0.24, 'NeuralNetTorch_BAG_L1': 0.16, 'LightGBMXT_BAG_L1': 0.12, 'CatBoost_BAG_L1': 0.08, 'KNeighborsDist_BAG_L1': 0.04, 'LightGBM_BAG_L1': 0.04, 'XGBoost_BAG_L1': 0.04, 'LightGBM_r131_BAG_L1': 0.04} + -31.4781 = Validation score (-root_mean_squared_error) + 0.02s = Training runtime + 0.0s = Validation runtime +Fitting 106 L2 models ... +Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 147.08s of the 147.06s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.15%) + -30.8862 = Validation score (-root_mean_squared_error) + 1.61s = Training runtime + 0.14s = Validation runtime +Fitting model: LightGBM_BAG_L2 ... Training model for up to 144.35s of the 144.33s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.16%) + -30.1289 = Validation score (-root_mean_squared_error) + 1.38s = Training runtime + 0.09s = Validation runtime +Fitting model: RandomForestMSE_BAG_L2 ... Training model for up to 141.86s of the 141.84s of remaining time. + -31.3299 = Validation score (-root_mean_squared_error) + 3.86s = Training runtime + 0.3s = Validation runtime +Fitting model: CatBoost_BAG_L2 ... Training model for up to 137.5s of the 137.48s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.21%) + -30.2689 = Validation score (-root_mean_squared_error) + 15.96s = Training runtime + 0.03s = Validation runtime +Fitting model: ExtraTreesMSE_BAG_L2 ... Training model for up to 120.52s of the 120.5s of remaining time. + -31.1814 = Validation score (-root_mean_squared_error) + 0.9s = Training runtime + 0.33s = Validation runtime +Fitting model: NeuralNetFastAI_BAG_L2 ... Training model for up to 119.09s of the 119.07s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.16%) + -29.5052 = Validation score (-root_mean_squared_error) + 12.71s = Training runtime + 0.16s = Validation runtime +Fitting model: XGBoost_BAG_L2 ... Training model for up to 105.31s of the 105.29s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.25%) + -30.5688 = Validation score (-root_mean_squared_error) + 1.9s = Training runtime + 0.08s = Validation runtime +Fitting model: NeuralNetTorch_BAG_L2 ... Training model for up to 102.34s of the 102.31s of remaining time. 
+ Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.09%) + -31.3839 = Validation score (-root_mean_squared_error) + 25.91s = Training runtime + 0.14s = Validation runtime +Fitting model: LightGBMLarge_BAG_L2 ... Training model for up to 75.34s of the 75.32s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.37%) + -30.822 = Validation score (-root_mean_squared_error) + 2.57s = Training runtime + 0.09s = Validation runtime +Fitting model: CatBoost_r177_BAG_L2 ... Training model for up to 71.6s of the 71.58s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.22%) + -30.5465 = Validation score (-root_mean_squared_error) + 7.76s = Training runtime + 0.03s = Validation runtime +Fitting model: NeuralNetTorch_r79_BAG_L2 ... Training model for up to 62.69s of the 62.66s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.09%) + -28.9532 = Validation score (-root_mean_squared_error) + 50.87s = Training runtime + 0.14s = Validation runtime +Fitting model: LightGBM_r131_BAG_L2 ... Training model for up to 10.76s of the 10.74s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.21%) + -30.1667 = Validation score (-root_mean_squared_error) + 3.56s = Training runtime + 0.33s = Validation runtime +Fitting model: NeuralNetFastAI_r191_BAG_L2 ... Training model for up to 5.87s of the 5.85s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.17%) + -34.365 = Validation score (-root_mean_squared_error) + 6.54s = Training runtime + 0.32s = Validation runtime +Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the -2.13s of remaining time. + Ensemble Weights: {'NeuralNetTorch_r79_BAG_L2': 0.545, 'NeuralNetFastAI_BAG_L2': 0.227, 'LightGBM_BAG_L2': 0.182, 'XGBoost_BAG_L2': 0.045} + -28.4118 = Validation score (-root_mean_squared_error) + 0.02s = Training runtime + 0.0s = Validation runtime +AutoGluon training complete, total runtime = 445.49s ... Best model: WeightedEnsemble_L3 | Estimated inference throughput: 162.3 rows/s (1361 batch size) +TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20240814_030610") ++
predictor_new_features.fit_summary()
+
*** Summary of fit() *** +Estimated performance of each model: + model score_val eval_metric pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order +0 WeightedEnsemble_L3 -28.411769 root_mean_squared_error 8.912918 348.358585 0.000472 0.023752 3 True 29 +1 NeuralNetTorch_r79_BAG_L2 -28.953248 root_mean_squared_error 8.578212 332.345866 0.141977 50.873895 2 True 26 +2 NeuralNetFastAI_BAG_L2 -29.505155 root_mean_squared_error 8.600562 294.185461 0.164327 12.713491 2 True 21 +3 LightGBM_BAG_L2 -30.128885 root_mean_squared_error 8.522249 282.849097 0.086014 1.377126 2 True 17 +4 LightGBM_r131_BAG_L2 -30.166705 root_mean_squared_error 8.763226 285.029273 0.326991 3.557302 2 True 27 +5 CatBoost_BAG_L2 -30.268944 root_mean_squared_error 8.466554 297.429360 0.030319 15.957390 2 True 19 +6 CatBoost_r177_BAG_L2 -30.546485 root_mean_squared_error 8.469838 289.235542 0.033603 7.763571 2 True 25 +7 XGBoost_BAG_L2 -30.568844 root_mean_squared_error 8.520128 283.370320 0.083893 1.898349 2 True 22 +8 LightGBMLarge_BAG_L2 -30.822019 root_mean_squared_error 8.524213 284.039513 0.087978 2.567542 2 True 24 +9 LightGBMXT_BAG_L2 -30.886235 root_mean_squared_error 8.576837 283.085844 0.140602 1.613873 2 True 16 +10 ExtraTreesMSE_BAG_L2 -31.181370 root_mean_squared_error 8.761307 282.376198 0.325072 0.904228 2 True 20 +11 RandomForestMSE_BAG_L2 -31.329898 root_mean_squared_error 8.738191 285.334698 0.301956 3.862728 2 True 18 +12 NeuralNetTorch_BAG_L2 -31.383912 root_mean_squared_error 8.577103 307.383297 0.140868 25.911327 2 True 23 +13 WeightedEnsemble_L2 -31.478087 root_mean_squared_error 7.608207 213.676847 0.000383 0.015923 2 True 15 +14 CatBoost_r177_BAG_L1 -33.018477 root_mean_squared_error 0.047098 41.419941 0.047098 41.419941 1 True 12 +15 CatBoost_BAG_L1 -33.124778 root_mean_squared_error 0.094216 72.604379 0.094216 72.604379 1 True 6 +16 LightGBM_r131_BAG_L1 -33.138027 root_mean_squared_error 3.640237 9.378333 3.640237 9.378333 1 True 14 +17 LightGBMLarge_BAG_L1 -33.751924 root_mean_squared_error 0.435275 4.357254 0.435275 4.357254 1 True 11 +18 LightGBM_BAG_L1 -33.917582 root_mean_squared_error 0.531829 2.759214 0.531829 2.759214 1 True 4 +19 NeuralNetFastAI_r191_BAG_L2 -34.365007 root_mean_squared_error 8.752540 288.010348 0.316305 6.538378 2 True 28 +20 LightGBMXT_BAG_L1 -34.380279 root_mean_squared_error 2.434291 6.815766 2.434291 6.815766 1 True 3 +21 XGBoost_BAG_L1 -34.599919 root_mean_squared_error 0.320909 2.955936 0.320909 2.955936 1 True 9 +22 NeuralNetTorch_BAG_L1 -38.212880 root_mean_squared_error 0.073614 73.355968 0.073614 73.355968 1 True 10 +23 RandomForestMSE_BAG_L1 -38.453450 root_mean_squared_error 0.274664 1.042654 0.274664 1.042654 1 True 5 +24 ExtraTreesMSE_BAG_L1 -38.530234 root_mean_squared_error 0.267140 0.603976 0.267140 0.603976 1 True 7 +25 NeuralNetFastAI_BAG_L1 -45.125231 root_mean_squared_error 0.167663 13.252673 0.167663 13.252673 1 True 8 +26 NeuralNetTorch_r79_BAG_L1 -52.074033 root_mean_squared_error 0.089250 52.895663 0.089250 52.895663 1 True 13 +27 KNeighborsDist_BAG_L1 -84.125061 root_mean_squared_error 0.030355 0.014132 0.030355 0.014132 1 True 2 +28 KNeighborsUnif_BAG_L1 -101.546199 root_mean_squared_error 0.029694 0.016081 0.029694 0.016081 1 True 1 +Number of models trained: 29 +Types of models trained: +{'WeightedEnsembleModel', 'StackerEnsembleModel_CatBoost', 'StackerEnsembleModel_KNN', 'StackerEnsembleModel_XT', 'StackerEnsembleModel_NNFastAiTabular', 'StackerEnsembleModel_RF', 'StackerEnsembleModel_LGB', 
'StackerEnsembleModel_XGBoost', 'StackerEnsembleModel_TabularNeuralNetTorch'} +Bagging used: True (with 8 folds) +Multi-layer stack-ensembling used: True (with 3 levels) +Feature Metadata (Processed): +(raw dtype, special dtypes): +('category', []) : 2 | ['season', 'weather'] +('float', []) : 3 | ['temp', 'atemp', 'windspeed'] +('int', []) : 4 | ['humidity', 'month', 'hour', 'day'] +('int', ['bool']) : 3 | ['holiday', 'workingday', 'year'] +('int', ['datetime_as_int']) : 3 | ['datetime', 'datetime.year', 'datetime.day'] +*** End of fit() summary *** ++
/home/satyam/miniforge3/envs/udacity/lib/python3.9/site-packages/autogluon/core/utils/plots.py:169: UserWarning: AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1" + warnings.warn('AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"') ++
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN', + 'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN', + 'LightGBMXT_BAG_L1': 'StackerEnsembleModel_LGB', + 'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB', + 'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF', + 'CatBoost_BAG_L1': 'StackerEnsembleModel_CatBoost', + 'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT', + 'NeuralNetFastAI_BAG_L1': 'StackerEnsembleModel_NNFastAiTabular', + 'XGBoost_BAG_L1': 'StackerEnsembleModel_XGBoost', + 'NeuralNetTorch_BAG_L1': 'StackerEnsembleModel_TabularNeuralNetTorch', + 'LightGBMLarge_BAG_L1': 'StackerEnsembleModel_LGB', + 'CatBoost_r177_BAG_L1': 'StackerEnsembleModel_CatBoost', + 'NeuralNetTorch_r79_BAG_L1': 'StackerEnsembleModel_TabularNeuralNetTorch', + 'LightGBM_r131_BAG_L1': 'StackerEnsembleModel_LGB', + 'WeightedEnsemble_L2': 'WeightedEnsembleModel', + 'LightGBMXT_BAG_L2': 'StackerEnsembleModel_LGB', + 'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB', + 'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF', + 'CatBoost_BAG_L2': 'StackerEnsembleModel_CatBoost', + 'ExtraTreesMSE_BAG_L2': 'StackerEnsembleModel_XT', + 'NeuralNetFastAI_BAG_L2': 'StackerEnsembleModel_NNFastAiTabular', + 'XGBoost_BAG_L2': 'StackerEnsembleModel_XGBoost', + 'NeuralNetTorch_BAG_L2': 'StackerEnsembleModel_TabularNeuralNetTorch', + 'LightGBMLarge_BAG_L2': 'StackerEnsembleModel_LGB', + 'CatBoost_r177_BAG_L2': 'StackerEnsembleModel_CatBoost', + 'NeuralNetTorch_r79_BAG_L2': 'StackerEnsembleModel_TabularNeuralNetTorch', + 'LightGBM_r131_BAG_L2': 'StackerEnsembleModel_LGB', + 'NeuralNetFastAI_r191_BAG_L2': 'StackerEnsembleModel_NNFastAiTabular', + 'WeightedEnsemble_L3': 'WeightedEnsembleModel'}, + 'model_performance': {'KNeighborsUnif_BAG_L1': -101.54619908446061, + 'KNeighborsDist_BAG_L1': -84.12506123181602, + 'LightGBMXT_BAG_L1': -34.3802792861704, + 'LightGBM_BAG_L1': -33.91758184996628, + 'RandomForestMSE_BAG_L1': -38.453450307199205, + 'CatBoost_BAG_L1': -33.12477843411282, + 'ExtraTreesMSE_BAG_L1': -38.53023388531077, + 'NeuralNetFastAI_BAG_L1': -45.12523070689563, + 'XGBoost_BAG_L1': -34.59991853898211, + 'NeuralNetTorch_BAG_L1': -38.21287957979157, + 'LightGBMLarge_BAG_L1': -33.75192367098915, + 'CatBoost_r177_BAG_L1': -33.01847657223508, + 'NeuralNetTorch_r79_BAG_L1': -52.07403306749506, + 'LightGBM_r131_BAG_L1': -33.13802736925506, + 'WeightedEnsemble_L2': -31.47808681941441, + 'LightGBMXT_BAG_L2': -30.886235063313997, + 'LightGBM_BAG_L2': -30.128884716636065, + 'RandomForestMSE_BAG_L2': -31.329897896458604, + 'CatBoost_BAG_L2': -30.268943676456807, + 'ExtraTreesMSE_BAG_L2': -31.181370410778847, + 'NeuralNetFastAI_BAG_L2': -29.505155329245934, + 'XGBoost_BAG_L2': -30.568843922812565, + 'NeuralNetTorch_BAG_L2': -31.383912148442754, + 'LightGBMLarge_BAG_L2': -30.822018740442346, + 'CatBoost_r177_BAG_L2': -30.546485447351753, + 'NeuralNetTorch_r79_BAG_L2': -28.953247589580503, + 'LightGBM_r131_BAG_L2': -30.166704777725265, + 'NeuralNetFastAI_r191_BAG_L2': -34.36500666636786, + 'WeightedEnsemble_L3': -28.411769339946353}, + 'model_best': 'WeightedEnsemble_L3', + 'model_paths': {'KNeighborsUnif_BAG_L1': ['KNeighborsUnif_BAG_L1'], + 'KNeighborsDist_BAG_L1': ['KNeighborsDist_BAG_L1'], + 'LightGBMXT_BAG_L1': ['LightGBMXT_BAG_L1'], + 'LightGBM_BAG_L1': ['LightGBM_BAG_L1'], + 'RandomForestMSE_BAG_L1': ['RandomForestMSE_BAG_L1'], + 'CatBoost_BAG_L1': ['CatBoost_BAG_L1'], + 'ExtraTreesMSE_BAG_L1': ['ExtraTreesMSE_BAG_L1'], + 'NeuralNetFastAI_BAG_L1': ['NeuralNetFastAI_BAG_L1'], + 'XGBoost_BAG_L1': ['XGBoost_BAG_L1'], 
+ 'NeuralNetTorch_BAG_L1': ['NeuralNetTorch_BAG_L1'], + 'LightGBMLarge_BAG_L1': ['LightGBMLarge_BAG_L1'], + 'CatBoost_r177_BAG_L1': ['CatBoost_r177_BAG_L1'], + 'NeuralNetTorch_r79_BAG_L1': ['NeuralNetTorch_r79_BAG_L1'], + 'LightGBM_r131_BAG_L1': ['LightGBM_r131_BAG_L1'], + 'WeightedEnsemble_L2': ['WeightedEnsemble_L2'], + 'LightGBMXT_BAG_L2': ['LightGBMXT_BAG_L2'], + 'LightGBM_BAG_L2': ['LightGBM_BAG_L2'], + 'RandomForestMSE_BAG_L2': ['RandomForestMSE_BAG_L2'], + 'CatBoost_BAG_L2': ['CatBoost_BAG_L2'], + 'ExtraTreesMSE_BAG_L2': ['ExtraTreesMSE_BAG_L2'], + 'NeuralNetFastAI_BAG_L2': ['NeuralNetFastAI_BAG_L2'], + 'XGBoost_BAG_L2': ['XGBoost_BAG_L2'], + 'NeuralNetTorch_BAG_L2': ['NeuralNetTorch_BAG_L2'], + 'LightGBMLarge_BAG_L2': ['LightGBMLarge_BAG_L2'], + 'CatBoost_r177_BAG_L2': ['CatBoost_r177_BAG_L2'], + 'NeuralNetTorch_r79_BAG_L2': ['NeuralNetTorch_r79_BAG_L2'], + 'LightGBM_r131_BAG_L2': ['LightGBM_r131_BAG_L2'], + 'NeuralNetFastAI_r191_BAG_L2': ['NeuralNetFastAI_r191_BAG_L2'], + 'WeightedEnsemble_L3': ['WeightedEnsemble_L3']}, + 'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.016080856323242188, + 'KNeighborsDist_BAG_L1': 0.014132261276245117, + 'LightGBMXT_BAG_L1': 6.815765857696533, + 'LightGBM_BAG_L1': 2.759213924407959, + 'RandomForestMSE_BAG_L1': 1.0426537990570068, + 'CatBoost_BAG_L1': 72.60437941551208, + 'ExtraTreesMSE_BAG_L1': 0.603975772857666, + 'NeuralNetFastAI_BAG_L1': 13.252672672271729, + 'XGBoost_BAG_L1': 2.9559364318847656, + 'NeuralNetTorch_BAG_L1': 73.35596752166748, + 'LightGBMLarge_BAG_L1': 4.3572540283203125, + 'CatBoost_r177_BAG_L1': 41.419941425323486, + 'NeuralNetTorch_r79_BAG_L1': 52.89566349983215, + 'LightGBM_r131_BAG_L1': 9.37833309173584, + 'WeightedEnsemble_L2': 0.01592254638671875, + 'LightGBMXT_BAG_L2': 1.6138732433319092, + 'LightGBM_BAG_L2': 1.3771264553070068, + 'RandomForestMSE_BAG_L2': 3.862727642059326, + 'CatBoost_BAG_L2': 15.95738959312439, + 'ExtraTreesMSE_BAG_L2': 0.9042277336120605, + 'NeuralNetFastAI_BAG_L2': 12.713490724563599, + 'XGBoost_BAG_L2': 1.8983490467071533, + 'NeuralNetTorch_BAG_L2': 25.91132664680481, + 'LightGBMLarge_BAG_L2': 2.56754207611084, + 'CatBoost_r177_BAG_L2': 7.763571262359619, + 'NeuralNetTorch_r79_BAG_L2': 50.87389540672302, + 'LightGBM_r131_BAG_L2': 3.557302236557007, + 'NeuralNetFastAI_r191_BAG_L2': 6.53837776184082, + 'WeightedEnsemble_L3': 0.023752450942993164}, + 'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.029694080352783203, + 'KNeighborsDist_BAG_L1': 0.030354976654052734, + 'LightGBMXT_BAG_L1': 2.434290647506714, + 'LightGBM_BAG_L1': 0.5318291187286377, + 'RandomForestMSE_BAG_L1': 0.27466392517089844, + 'CatBoost_BAG_L1': 0.09421634674072266, + 'ExtraTreesMSE_BAG_L1': 0.26713991165161133, + 'NeuralNetFastAI_BAG_L1': 0.1676630973815918, + 'XGBoost_BAG_L1': 0.3209085464477539, + 'NeuralNetTorch_BAG_L1': 0.07361412048339844, + 'LightGBMLarge_BAG_L1': 0.4352753162384033, + 'CatBoost_r177_BAG_L1': 0.04709815979003906, + 'NeuralNetTorch_r79_BAG_L1': 0.08925008773803711, + 'LightGBM_r131_BAG_L1': 3.6402366161346436, + 'WeightedEnsemble_L2': 0.0003833770751953125, + 'LightGBMXT_BAG_L2': 0.14060163497924805, + 'LightGBM_BAG_L2': 0.0860142707824707, + 'RandomForestMSE_BAG_L2': 0.3019559383392334, + 'CatBoost_BAG_L2': 0.0303192138671875, + 'ExtraTreesMSE_BAG_L2': 0.3250722885131836, + 'NeuralNetFastAI_BAG_L2': 0.16432666778564453, + 'XGBoost_BAG_L2': 0.083892822265625, + 'NeuralNetTorch_BAG_L2': 0.1408679485321045, + 'LightGBMLarge_BAG_L2': 0.08797788619995117, + 'CatBoost_r177_BAG_L2': 0.03360295295715332, + 
'NeuralNetTorch_r79_BAG_L2': 0.14197707176208496, + 'LightGBM_r131_BAG_L2': 0.32699060440063477, + 'NeuralNetFastAI_r191_BAG_L2': 0.31630516052246094, + 'WeightedEnsemble_L3': 0.00047206878662109375}, + 'num_bag_folds': 8, + 'max_stack_level': 3, + 'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True, + 'use_child_oof': True}, + 'KNeighborsDist_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True, + 'use_child_oof': True}, + 'LightGBMXT_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBM_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'RandomForestMSE_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True, + 'use_child_oof': True}, + 'CatBoost_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'ExtraTreesMSE_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True, + 'use_child_oof': True}, + 'NeuralNetFastAI_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'XGBoost_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'NeuralNetTorch_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBMLarge_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'CatBoost_r177_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'NeuralNetTorch_r79_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBM_r131_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'WeightedEnsemble_L2': {'use_orig_features': False, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBMXT_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBM_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'RandomForestMSE_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True, + 'use_child_oof': True}, + 'CatBoost_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'ExtraTreesMSE_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True, + 'use_child_oof': True}, + 'NeuralNetFastAI_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'XGBoost_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'NeuralNetTorch_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 
'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBMLarge_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'CatBoost_r177_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'NeuralNetTorch_r79_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBM_r131_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'NeuralNetFastAI_r191_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'WeightedEnsemble_L3': {'use_orig_features': False, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}}, + 'leaderboard': model score_val eval_metric \ + 0 WeightedEnsemble_L3 -28.411769 root_mean_squared_error + 1 NeuralNetTorch_r79_BAG_L2 -28.953248 root_mean_squared_error + 2 NeuralNetFastAI_BAG_L2 -29.505155 root_mean_squared_error + 3 LightGBM_BAG_L2 -30.128885 root_mean_squared_error + 4 LightGBM_r131_BAG_L2 -30.166705 root_mean_squared_error + 5 CatBoost_BAG_L2 -30.268944 root_mean_squared_error + 6 CatBoost_r177_BAG_L2 -30.546485 root_mean_squared_error + 7 XGBoost_BAG_L2 -30.568844 root_mean_squared_error + 8 LightGBMLarge_BAG_L2 -30.822019 root_mean_squared_error + 9 LightGBMXT_BAG_L2 -30.886235 root_mean_squared_error + 10 ExtraTreesMSE_BAG_L2 -31.181370 root_mean_squared_error + 11 RandomForestMSE_BAG_L2 -31.329898 root_mean_squared_error + 12 NeuralNetTorch_BAG_L2 -31.383912 root_mean_squared_error + 13 WeightedEnsemble_L2 -31.478087 root_mean_squared_error + 14 CatBoost_r177_BAG_L1 -33.018477 root_mean_squared_error + 15 CatBoost_BAG_L1 -33.124778 root_mean_squared_error + 16 LightGBM_r131_BAG_L1 -33.138027 root_mean_squared_error + 17 LightGBMLarge_BAG_L1 -33.751924 root_mean_squared_error + 18 LightGBM_BAG_L1 -33.917582 root_mean_squared_error + 19 NeuralNetFastAI_r191_BAG_L2 -34.365007 root_mean_squared_error + 20 LightGBMXT_BAG_L1 -34.380279 root_mean_squared_error + 21 XGBoost_BAG_L1 -34.599919 root_mean_squared_error + 22 NeuralNetTorch_BAG_L1 -38.212880 root_mean_squared_error + 23 RandomForestMSE_BAG_L1 -38.453450 root_mean_squared_error + 24 ExtraTreesMSE_BAG_L1 -38.530234 root_mean_squared_error + 25 NeuralNetFastAI_BAG_L1 -45.125231 root_mean_squared_error + 26 NeuralNetTorch_r79_BAG_L1 -52.074033 root_mean_squared_error + 27 KNeighborsDist_BAG_L1 -84.125061 root_mean_squared_error + 28 KNeighborsUnif_BAG_L1 -101.546199 root_mean_squared_error + + pred_time_val fit_time pred_time_val_marginal fit_time_marginal \ + 0 8.912918 348.358585 0.000472 0.023752 + 1 8.578212 332.345866 0.141977 50.873895 + 2 8.600562 294.185461 0.164327 12.713491 + 3 8.522249 282.849097 0.086014 1.377126 + 4 8.763226 285.029273 0.326991 3.557302 + 5 8.466554 297.429360 0.030319 15.957390 + 6 8.469838 289.235542 0.033603 7.763571 + 7 8.520128 283.370320 0.083893 1.898349 + 8 8.524213 284.039513 0.087978 2.567542 + 9 8.576837 283.085844 0.140602 1.613873 + 10 8.761307 282.376198 0.325072 0.904228 + 11 8.738191 285.334698 0.301956 3.862728 + 12 8.577103 307.383297 0.140868 25.911327 + 13 7.608207 213.676847 0.000383 0.015923 + 14 0.047098 41.419941 0.047098 41.419941 + 15 0.094216 72.604379 0.094216 72.604379 + 16 3.640237 9.378333 3.640237 9.378333 + 17 0.435275 4.357254 0.435275 4.357254 + 18 0.531829 
2.759214 0.531829 2.759214 + 19 8.752540 288.010348 0.316305 6.538378 + 20 2.434291 6.815766 2.434291 6.815766 + 21 0.320909 2.955936 0.320909 2.955936 + 22 0.073614 73.355968 0.073614 73.355968 + 23 0.274664 1.042654 0.274664 1.042654 + 24 0.267140 0.603976 0.267140 0.603976 + 25 0.167663 13.252673 0.167663 13.252673 + 26 0.089250 52.895663 0.089250 52.895663 + 27 0.030355 0.014132 0.030355 0.014132 + 28 0.029694 0.016081 0.029694 0.016081 + + stack_level can_infer fit_order + 0 3 True 29 + 1 2 True 26 + 2 2 True 21 + 3 2 True 17 + 4 2 True 27 + 5 2 True 19 + 6 2 True 25 + 7 2 True 22 + 8 2 True 24 + 9 2 True 16 + 10 2 True 20 + 11 2 True 18 + 12 2 True 23 + 13 2 True 15 + 14 1 True 12 + 15 1 True 6 + 16 1 True 14 + 17 1 True 11 + 18 1 True 4 + 19 2 True 28 + 20 1 True 3 + 21 1 True 9 + 22 1 True 10 + 23 1 True 5 + 24 1 True 7 + 25 1 True 8 + 26 1 True 13 + 27 1 True 2 + 28 1 True 1 }+
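Besides fit_summary(), the per-model scores can also be pulled as a plain pandas DataFrame via leaderboard(), which is easier to slice than the printed summary; a small sketch (column names taken from the output above):

# Sketch: leaderboard() returns a DataFrame sorted by validation score
lb = predictor_new_features.leaderboard()
print(lb[["model", "score_val", "fit_time", "stack_level"]].head(10))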
# Predict with the new-features model; remember to check for (and zero out) any negative values before submitting
+predictions = predictor_new_features.predict(test)
+
predictions.describe()
+
count 6493.000000 +mean 133.103561 +std 111.837509 +min 1.967045 +25% 49.790619 +50% 107.926331 +75% 181.343018 +max 742.234985 +Name: count, dtype: float64+
print(f"Number of negative prediction values: {(predictions < 0).sum()}")
+
Number of negative prediction values: 0 ++
# Assign the new predictions to the submission dataframe and save, same as before
+submission["count"] = predictions
+submission.to_csv("submission_new_features.csv", index=False)
+
!kaggle competitions submit -c bike-sharing-demand -f submission_new_features.csv -m "new features"
+
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/satyam/.kaggle/kaggle.json' +100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 583kB/s] +Successfully submitted to Bike Sharing Demand+
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
+
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/satyam/.kaggle/kaggle.json' +fileName date description status publicScore privateScore +--------------------------- ------------------- --------------------------------- -------- ----------- ------------ +submission_new_features.csv 2024-08-14 03:23:44 new features complete 0.75394 0.75394 +submission.csv 2024-08-14 03:03:28 first raw submission complete 1.79357 1.79357 +submission_new_hpo.csv 2024-06-13 05:19:39 new features with hyperparameters complete 0.48667 0.48667 ++
New Score of 0.75394
¶
Step 6: Hyperparameter optimization¶
-
+
- There are many options for hyperparameter optimization. +
- You can tune AutoGluon's higher-level parameters or the hyperparameters of the individual models. +
- Tuning the models' own hyperparameters requires the hyperparameters and hyperparameter_tune_kwargs arguments of fit(). +
+
hyperparameters = {
+    'GBM': [  # LightGBM configurations
+        {'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}},
+        {},
+        'GBMLarge',
+    ],
+    'CAT': {"learning_rate": 0.03, "iterations": 15, "l2_leaf_reg": 0.125},  # CatBoost
+    'XGB': {},  # XGBoost with default settings
+    'FASTAI': {},  # fastai tabular neural network with default settings
+    'RF': [  # random forest, regression criterion only
+        {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression']}},
+    ],
+    'XT': [  # extra trees, regression criterion only
+        {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression']}},
+    ]
+}
+
hyperparameter_tune_kwargs = {
+    "num_trials": 5,       # number of HPO trials per model
+    "scheduler": "local",  # run the trials on the local machine
+    "searcher": "auto",    # let AutoGluon choose the search algorithm
+    "num_folds": 7
+}
+
predictor_new_hpo = TabularPredictor(label="count", problem_type="regression", eval_metric="root_mean_squared_error").fit(train_data=train[col_names], time_limit=1000, presets="best_quality", hyperparameters=hyperparameters, hyperparameter_tune_kwargs= hyperparameter_tune_kwargs)
+
2024-08-14 00:15:05,437 WARNING experiment_state.py:205 -- Experiment state snapshotting has been triggered multiple times in the last 5.0 seconds. A snapshot is forced if `CheckpointConfig(num_to_keep)` is set, and a trial has checkpointed >= `num_to_keep` times since the last snapshot. +You may want to consider increasing the `CheckpointConfig(num_to_keep)` or decreasing the frequency of saving checkpoints. +You can suppress this error by setting the environment variable TUNE_WARN_EXCESSIVE_EXPERIMENT_CHECKPOINT_SYNC_THRESHOLD_S to a smaller value than the current threshold (5.0). +2024-08-14 00:15:05,439 INFO tune.py:1016 -- Wrote the latest version of all result files and experiment state to '/home/satyam/github-repos/udacity/aws-mle-nanodegree/cd0385-project-starter/project/AutogluonModels/ag-20240814_040800/models/NeuralNetFastAI_BAG_L2' in 0.0043s. +2024-08-14 00:15:05,443 WARNING experiment_analysis.py:190 -- Failed to fetch metrics for 5 trial(s): +- 458c2ed5: FileNotFoundError('Could not fetch metrics for 458c2ed5: both result.json and progress.csv were not found at /home/satyam/github-repos/udacity/aws-mle-nanodegree/cd0385-project-starter/project/AutogluonModels/ag-20240814_040800/models/NeuralNetFastAI_BAG_L2/458c2ed5') +- 75511c6e: FileNotFoundError('Could not fetch metrics for 75511c6e: both result.json and progress.csv were not found at /home/satyam/github-repos/udacity/aws-mle-nanodegree/cd0385-project-starter/project/AutogluonModels/ag-20240814_040800/models/NeuralNetFastAI_BAG_L2/75511c6e') +- 04d21e8a: FileNotFoundError('Could not fetch metrics for 04d21e8a: both result.json and progress.csv were not found at /home/satyam/github-repos/udacity/aws-mle-nanodegree/cd0385-project-starter/project/AutogluonModels/ag-20240814_040800/models/NeuralNetFastAI_BAG_L2/04d21e8a') +- 3647f6c5: FileNotFoundError('Could not fetch metrics for 3647f6c5: both result.json and progress.csv were not found at /home/satyam/github-repos/udacity/aws-mle-nanodegree/cd0385-project-starter/project/AutogluonModels/ag-20240814_040800/models/NeuralNetFastAI_BAG_L2/3647f6c5') +- d3405d78: FileNotFoundError('Could not fetch metrics for d3405d78: both result.json and progress.csv were not found at /home/satyam/github-repos/udacity/aws-mle-nanodegree/cd0385-project-starter/project/AutogluonModels/ag-20240814_040800/models/NeuralNetFastAI_BAG_L2/d3405d78') +Fitted model: NeuralNetFastAI_BAG_L2/458c2ed5 ... + -32.0959 = Validation score (-root_mean_squared_error) + 19.06s = Training runtime + 0.33s = Validation runtime +Fitted model: NeuralNetFastAI_BAG_L2/75511c6e ... + -32.135 = Validation score (-root_mean_squared_error) +2024-08-14 00:15:05,439 INFO tune.py:1016 -- Wrote the latest version of all result files and experiment state to '/home/satyam/github-repos/udacity/aws-mle-nanodegree/cd0385-project-starter/project/AutogluonModels/ag-20240814_040800/models/NeuralNetFastAI_BAG_L2' in 0.0043s. 
+2024-08-14 00:15:05,443 WARNING experiment_analysis.py:190 -- Failed to fetch metrics for 5 trial(s): +- 458c2ed5: FileNotFoundError('Could not fetch metrics for 458c2ed5: both result.json and progress.csv were not found at /home/satyam/github-repos/udacity/aws-mle-nanodegree/cd0385-project-starter/project/AutogluonModels/ag-20240814_040800/models/NeuralNetFastAI_BAG_L2/458c2ed5') +- 75511c6e: FileNotFoundError('Could not fetch metrics for 75511c6e: both result.json and progress.csv were not found at /home/satyam/github-repos/udacity/aws-mle-nanodegree/cd0385-project-starter/project/AutogluonModels/ag-20240814_040800/models/NeuralNetFastAI_BAG_L2/75511c6e') +- 04d21e8a: FileNotFoundError('Could not fetch metrics for 04d21e8a: both result.json and progress.csv were not found at /home/satyam/github-repos/udacity/aws-mle-nanodegree/cd0385-project-starter/project/AutogluonModels/ag-20240814_040800/models/NeuralNetFastAI_BAG_L2/04d21e8a') +- 3647f6c5: FileNotFoundError('Could not fetch metrics for 3647f6c5: both result.json and progress.csv were not found at /home/satyam/github-repos/udacity/aws-mle-nanodegree/cd0385-project-starter/project/AutogluonModels/ag-20240814_040800/models/NeuralNetFastAI_BAG_L2/3647f6c5') +- d3405d78: FileNotFoundError('Could not fetch metrics for d3405d78: both result.json and progress.csv were not found at /home/satyam/github-repos/udacity/aws-mle-nanodegree/cd0385-project-starter/project/AutogluonModels/ag-20240814_040800/models/NeuralNetFastAI_BAG_L2/d3405d78') +Fitted model: NeuralNetFastAI_BAG_L2/458c2ed5 ... + -32.0959 = Validation score (-root_mean_squared_error) + 19.06s = Training runtime + 0.33s = Validation runtime +Fitted model: NeuralNetFastAI_BAG_L2/75511c6e ... + -32.135 = Validation score (-root_mean_squared_error) + 13.25s = Training runtime + 0.22s = Validation runtime +Fitted model: NeuralNetFastAI_BAG_L2/04d21e8a ... + -32.3379 = Validation score (-root_mean_squared_error) + 18.07s = Training runtime + 0.37s = Validation runtime +Fitted model: NeuralNetFastAI_BAG_L2/3647f6c5 ... + -33.6407 = Validation score (-root_mean_squared_error) + 6.33s = Training runtime + 0.16s = Validation runtime +Fitted model: NeuralNetFastAI_BAG_L2/d3405d78 ... + -32.4363 = Validation score (-root_mean_squared_error) + 19.59s = Training runtime + 0.27s = Validation runtime +Hyperparameter tuning model: XGBoost_BAG_L2 ... Tuning model for up to 74.49s of the 574.86s of remaining time. + 0%| | 0/5 [00:00<?, ?it/s] Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.46%) + 20%|██ | 1/5 [00:03<00:13, 3.32s/it] Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.35%) + 40%|████ | 2/5 [00:06<00:09, 3.20s/it] Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.88%) + 60%|██████ | 3/5 [00:12<00:09, 4.59s/it] Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.31%) + 80%|████████ | 4/5 [00:14<00:03, 3.63s/it] Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=1.58%) +100%|██████████| 5/5 [00:23<00:00, 4.79s/it] +Fitted model: XGBoost_BAG_L2/T1 ... + -32.5075 = Validation score (-root_mean_squared_error) + 3.32s = Training runtime + 0.1s = Validation runtime +Fitted model: XGBoost_BAG_L2/T2 ... 
+ -32.1089 = Validation score (-root_mean_squared_error) + 3.11s = Training runtime + 0.08s = Validation runtime +Fitted model: XGBoost_BAG_L2/T3 ... + -32.3086 = Validation score (-root_mean_squared_error) + 6.25s = Training runtime + 0.14s = Validation runtime +Fitted model: XGBoost_BAG_L2/T4 ... + -32.1589 = Validation score (-root_mean_squared_error) + 2.15s = Training runtime + 0.06s = Validation runtime +Fitted model: XGBoost_BAG_L2/T5 ... + -32.8547 = Validation score (-root_mean_squared_error) + 9.12s = Training runtime + 0.09s = Validation runtime +Fitting model: LightGBMLarge_BAG_L2 ... Training model for up to 74.49s of the 550.84s of remaining time. + Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=3, gpus=0, memory=0.46%) + -32.5711 = Validation score (-root_mean_squared_error) + 3.09s = Training runtime + 0.08s = Validation runtime +Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the 546.6s of remaining time. + Ensemble Weights: {'NeuralNetFastAI_BAG_L2/458c2ed5': 0.3, 'RandomForestMSE_BAG_L2': 0.2, 'XGBoost_BAG_L2/T2': 0.15, 'XGBoost_BAG_L2/T4': 0.15, 'NeuralNetFastAI_BAG_L2/04d21e8a': 0.1, 'LightGBM_BAG_L2/T2': 0.05, 'ExtraTreesMSE_BAG_L2': 0.05} + -31.6349 = Validation score (-root_mean_squared_error) + 0.04s = Training runtime + 0.0s = Validation runtime +AutoGluon training complete, total runtime = 273.76s ... Best model: WeightedEnsemble_L3 | Estimated inference throughput: 29.0 rows/s (1361 batch size) +TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20240814_040800") ++
# Review the training summary for the hpo run
+predictor_new_hpo.fit_summary()
+
*** Summary of fit() *** +Estimated performance of each model: + model score_val eval_metric pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order +0 WeightedEnsemble_L3 -31.634925 root_mean_squared_error 48.001303 234.019050 0.000406 0.042475 3 True 56 +1 ExtraTreesMSE_BAG_L2 -32.057370 root_mean_squared_error 46.807935 181.995976 0.320383 1.799293 2 True 44 +2 NeuralNetFastAI_BAG_L2/458c2ed5 -32.095865 root_mean_squared_error 46.815015 199.254547 0.327463 19.057865 2 True 45 +3 XGBoost_BAG_L2/T2 -32.108923 root_mean_squared_error 46.563530 183.306910 0.075979 3.110228 2 True 51 +4 NeuralNetFastAI_BAG_L2/75511c6e -32.135025 root_mean_squared_error 46.705959 193.449495 0.218407 13.252813 2 True 46 +5 RandomForestMSE_BAG_L2 -32.138783 root_mean_squared_error 46.810102 187.009624 0.322551 6.812942 2 True 39 +6 XGBoost_BAG_L2/T4 -32.158893 root_mean_squared_error 46.551207 182.344500 0.063656 2.147817 2 True 53 +7 LightGBM_BAG_L2/T1 -32.171844 root_mean_squared_error 46.525184 182.223403 0.037632 2.026720 2 True 34 +8 LightGBM_BAG_L2/T3 -32.256870 root_mean_squared_error 46.526634 182.671433 0.039083 2.474751 2 True 36 +9 LightGBM_BAG_L2/T5 -32.279724 root_mean_squared_error 46.519275 182.450240 0.031724 2.253557 2 True 38 +10 XGBoost_BAG_L2/T3 -32.308630 root_mean_squared_error 46.627537 186.442305 0.139986 6.245622 2 True 52 +11 LightGBM_BAG_L2/T4 -32.313637 root_mean_squared_error 46.742070 185.379261 0.254518 5.182578 2 True 37 +12 NeuralNetFastAI_BAG_L2/04d21e8a -32.337897 root_mean_squared_error 46.857535 198.271484 0.369983 18.074801 2 True 47 +13 WeightedEnsemble_L2 -32.357126 root_mean_squared_error 11.939492 66.857503 0.000380 0.024284 2 True 28 +14 NeuralNetFastAI_BAG_L2/d3405d78 -32.436304 root_mean_squared_error 46.760490 199.785098 0.272939 19.588415 2 True 49 +15 LightGBM_BAG_L2/T2 -32.507101 root_mean_squared_error 46.520882 182.973630 0.033331 2.776947 2 True 35 +16 XGBoost_BAG_L2/T1 -32.507452 root_mean_squared_error 46.583291 183.514729 0.095739 3.318046 2 True 50 +17 LightGBMXT_BAG_L2/T1 -32.521347 root_mean_squared_error 46.536275 182.029741 0.048723 1.833058 2 True 29 +18 LightGBMLarge_BAG_L2 -32.571129 root_mean_squared_error 46.565442 183.282501 0.077891 3.085818 2 True 55 +19 LightGBMXT_BAG_L2/T3 -32.577648 root_mean_squared_error 46.542341 182.190957 0.054789 1.994274 2 True 31 +20 LightGBMXT_BAG_L2/T4 -32.612851 root_mean_squared_error 46.940064 184.269460 0.452512 4.072778 2 True 32 +21 XGBoost_BAG_L2/T5 -32.854665 root_mean_squared_error 46.582059 189.316947 0.094507 9.120265 2 True 54 +22 LightGBMXT_BAG_L2/T5 -32.927269 root_mean_squared_error 46.531577 182.151535 0.044025 1.954852 2 True 33 +23 LightGBM_BAG_L1/T4 -33.105858 root_mean_squared_error 7.440267 14.355493 7.440267 14.355493 1 True 9 +24 LightGBMXT_BAG_L2/T2 -33.260245 root_mean_squared_error 46.533853 182.482405 0.046302 2.285723 2 True 30 +25 LightGBM_BAG_L1/T2 -33.459710 root_mean_squared_error 0.188156 3.072921 0.188156 3.072921 1 True 7 +26 NeuralNetFastAI_BAG_L2/3647f6c5 -33.640659 root_mean_squared_error 46.647511 186.529029 0.159959 6.332346 2 True 48 +27 LightGBMLarge_BAG_L1 -33.751924 root_mean_squared_error 0.416252 4.077021 0.416252 4.077021 1 True 27 +28 LightGBM_BAG_L1/T1 -33.917582 root_mean_squared_error 0.532185 3.716702 0.532185 3.716702 1 True 6 +29 LightGBM_BAG_L1/T3 -34.045955 root_mean_squared_error 0.692351 4.366904 0.692351 4.366904 1 True 8 +30 LightGBM_BAG_L1/T5 -34.164061 root_mean_squared_error 0.276567 2.874299 0.276567 2.874299 1 
True 10 +31 XGBoost_BAG_L1/T3 -34.233626 root_mean_squared_error 0.951551 7.556990 0.951551 7.556990 1 True 24 +32 LightGBMXT_BAG_L1/T1 -34.380279 root_mean_squared_error 2.176811 8.076029 2.176811 8.076029 1 True 1 +33 XGBoost_BAG_L1/T2 -34.504350 root_mean_squared_error 0.427014 4.964226 0.427014 4.964226 1 True 23 +34 XGBoost_BAG_L1/T1 -34.599919 root_mean_squared_error 0.348029 4.369523 0.348029 4.369523 1 True 22 +35 LightGBMXT_BAG_L1/T3 -34.744666 root_mean_squared_error 4.043079 11.301724 4.043079 11.301724 1 True 3 +36 LightGBMXT_BAG_L1/T5 -35.250440 root_mean_squared_error 1.669801 7.287729 1.669801 7.287729 1 True 5 +37 LightGBMXT_BAG_L1/T2 -36.157242 root_mean_squared_error 1.859302 7.940133 1.859302 7.940133 1 True 2 +38 LightGBMXT_BAG_L1/T4 -36.445493 root_mean_squared_error 22.021861 18.097799 22.021861 18.097799 1 True 4 +39 XGBoost_BAG_L1/T5 -37.175294 root_mean_squared_error 0.402790 5.318174 0.402790 5.318174 1 True 26 +40 RandomForestMSE_BAG_L1 -38.453450 root_mean_squared_error 0.283051 2.879521 0.283051 2.879521 1 True 11 +41 XGBoost_BAG_L1/T4 -38.497062 root_mean_squared_error 1.311049 7.883113 1.311049 7.883113 1 True 25 +42 ExtraTreesMSE_BAG_L1 -38.530234 root_mean_squared_error 0.276609 1.277392 0.276609 1.277392 1 True 16 +43 NeuralNetFastAI_BAG_L1/c16cb6c4 -44.790416 root_mean_squared_error 0.300758 17.147795 0.300758 17.147795 1 True 17 +44 NeuralNetFastAI_BAG_L1/a8b2753d -60.136940 root_mean_squared_error 0.262607 13.606992 0.262607 13.606992 1 True 20 +45 NeuralNetFastAI_BAG_L1/a0c45eef -108.879407 root_mean_squared_error 0.222204 9.475138 0.222204 9.475138 1 True 18 +46 NeuralNetFastAI_BAG_L1/abcffeb1 -117.449543 root_mean_squared_error 0.191920 9.218434 0.191920 9.218434 1 True 19 +47 CatBoost_BAG_L2/T3 -119.668365 root_mean_squared_error 46.530171 182.305663 0.042619 2.108980 2 True 42 +48 CatBoost_BAG_L2/T4 -120.145455 root_mean_squared_error 46.532616 181.895744 0.045065 1.699061 2 True 43 +49 CatBoost_BAG_L2/T1 -120.308349 root_mean_squared_error 46.511923 181.706344 0.024371 1.509662 2 True 40 +50 CatBoost_BAG_L2/T2 -120.943340 root_mean_squared_error 46.518286 181.810238 0.030735 1.613555 2 True 41 +51 CatBoost_BAG_L1/T3 -134.164525 root_mean_squared_error 0.020250 1.506447 0.020250 1.506447 1 True 14 +52 CatBoost_BAG_L1/T4 -137.098754 root_mean_squared_error 0.017995 1.554463 0.017995 1.554463 1 True 15 +53 CatBoost_BAG_L1/T1 -140.403609 root_mean_squared_error 0.016952 1.436425 0.016952 1.436425 1 True 12 +54 CatBoost_BAG_L1/T2 -142.626577 root_mean_squared_error 0.016474 1.515989 0.016474 1.515989 1 True 13 +55 NeuralNetFastAI_BAG_L1/f04b1268 -146.162085 root_mean_squared_error 0.121667 5.319308 0.121667 5.319308 1 True 21 +Number of models trained: 56 +Types of models trained: +{'WeightedEnsembleModel', 'StackerEnsembleModel_CatBoost', 'StackerEnsembleModel_XT', 'StackerEnsembleModel_NNFastAiTabular', 'StackerEnsembleModel_RF', 'StackerEnsembleModel_LGB', 'StackerEnsembleModel_XGBoost'} +Bagging used: True (with 8 folds) +Multi-layer stack-ensembling used: True (with 3 levels) +Feature Metadata (Processed): +(raw dtype, special dtypes): +('category', []) : 2 | ['season', 'weather'] +('float', []) : 3 | ['temp', 'atemp', 'windspeed'] +('int', []) : 4 | ['humidity', 'month', 'hour', 'day'] +('int', ['bool']) : 3 | ['holiday', 'workingday', 'year'] +('int', ['datetime_as_int']) : 3 | ['datetime', 'datetime.year', 'datetime.day'] +*** End of fit() summary *** ++
/home/satyam/miniforge3/envs/udacity/lib/python3.9/site-packages/autogluon/core/utils/plots.py:169: UserWarning: AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1" + warnings.warn('AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"') ++
{'model_types': {'LightGBMXT_BAG_L1/T1': 'StackerEnsembleModel_LGB', + 'LightGBMXT_BAG_L1/T2': 'StackerEnsembleModel_LGB', + 'LightGBMXT_BAG_L1/T3': 'StackerEnsembleModel_LGB', + 'LightGBMXT_BAG_L1/T4': 'StackerEnsembleModel_LGB', + 'LightGBMXT_BAG_L1/T5': 'StackerEnsembleModel_LGB', + 'LightGBM_BAG_L1/T1': 'StackerEnsembleModel_LGB', + 'LightGBM_BAG_L1/T2': 'StackerEnsembleModel_LGB', + 'LightGBM_BAG_L1/T3': 'StackerEnsembleModel_LGB', + 'LightGBM_BAG_L1/T4': 'StackerEnsembleModel_LGB', + 'LightGBM_BAG_L1/T5': 'StackerEnsembleModel_LGB', + 'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF', + 'CatBoost_BAG_L1/T1': 'StackerEnsembleModel_CatBoost', + 'CatBoost_BAG_L1/T2': 'StackerEnsembleModel_CatBoost', + 'CatBoost_BAG_L1/T3': 'StackerEnsembleModel_CatBoost', + 'CatBoost_BAG_L1/T4': 'StackerEnsembleModel_CatBoost', + 'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT', + 'NeuralNetFastAI_BAG_L1/c16cb6c4': 'StackerEnsembleModel_NNFastAiTabular', + 'NeuralNetFastAI_BAG_L1/a0c45eef': 'StackerEnsembleModel_NNFastAiTabular', + 'NeuralNetFastAI_BAG_L1/abcffeb1': 'StackerEnsembleModel_NNFastAiTabular', + 'NeuralNetFastAI_BAG_L1/a8b2753d': 'StackerEnsembleModel_NNFastAiTabular', + 'NeuralNetFastAI_BAG_L1/f04b1268': 'StackerEnsembleModel_NNFastAiTabular', + 'XGBoost_BAG_L1/T1': 'StackerEnsembleModel_XGBoost', + 'XGBoost_BAG_L1/T2': 'StackerEnsembleModel_XGBoost', + 'XGBoost_BAG_L1/T3': 'StackerEnsembleModel_XGBoost', + 'XGBoost_BAG_L1/T4': 'StackerEnsembleModel_XGBoost', + 'XGBoost_BAG_L1/T5': 'StackerEnsembleModel_XGBoost', + 'LightGBMLarge_BAG_L1': 'StackerEnsembleModel_LGB', + 'WeightedEnsemble_L2': 'WeightedEnsembleModel', + 'LightGBMXT_BAG_L2/T1': 'StackerEnsembleModel_LGB', + 'LightGBMXT_BAG_L2/T2': 'StackerEnsembleModel_LGB', + 'LightGBMXT_BAG_L2/T3': 'StackerEnsembleModel_LGB', + 'LightGBMXT_BAG_L2/T4': 'StackerEnsembleModel_LGB', + 'LightGBMXT_BAG_L2/T5': 'StackerEnsembleModel_LGB', + 'LightGBM_BAG_L2/T1': 'StackerEnsembleModel_LGB', + 'LightGBM_BAG_L2/T2': 'StackerEnsembleModel_LGB', + 'LightGBM_BAG_L2/T3': 'StackerEnsembleModel_LGB', + 'LightGBM_BAG_L2/T4': 'StackerEnsembleModel_LGB', + 'LightGBM_BAG_L2/T5': 'StackerEnsembleModel_LGB', + 'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF', + 'CatBoost_BAG_L2/T1': 'StackerEnsembleModel_CatBoost', + 'CatBoost_BAG_L2/T2': 'StackerEnsembleModel_CatBoost', + 'CatBoost_BAG_L2/T3': 'StackerEnsembleModel_CatBoost', + 'CatBoost_BAG_L2/T4': 'StackerEnsembleModel_CatBoost', + 'ExtraTreesMSE_BAG_L2': 'StackerEnsembleModel_XT', + 'NeuralNetFastAI_BAG_L2/458c2ed5': 'StackerEnsembleModel_NNFastAiTabular', + 'NeuralNetFastAI_BAG_L2/75511c6e': 'StackerEnsembleModel_NNFastAiTabular', + 'NeuralNetFastAI_BAG_L2/04d21e8a': 'StackerEnsembleModel_NNFastAiTabular', + 'NeuralNetFastAI_BAG_L2/3647f6c5': 'StackerEnsembleModel_NNFastAiTabular', + 'NeuralNetFastAI_BAG_L2/d3405d78': 'StackerEnsembleModel_NNFastAiTabular', + 'XGBoost_BAG_L2/T1': 'StackerEnsembleModel_XGBoost', + 'XGBoost_BAG_L2/T2': 'StackerEnsembleModel_XGBoost', + 'XGBoost_BAG_L2/T3': 'StackerEnsembleModel_XGBoost', + 'XGBoost_BAG_L2/T4': 'StackerEnsembleModel_XGBoost', + 'XGBoost_BAG_L2/T5': 'StackerEnsembleModel_XGBoost', + 'LightGBMLarge_BAG_L2': 'StackerEnsembleModel_LGB', + 'WeightedEnsemble_L3': 'WeightedEnsembleModel'}, + 'model_performance': {'LightGBMXT_BAG_L1/T1': -34.3802792861704, + 'LightGBMXT_BAG_L1/T2': -36.15724197890103, + 'LightGBMXT_BAG_L1/T3': -34.74466591940282, + 'LightGBMXT_BAG_L1/T4': -36.445492874837356, + 'LightGBMXT_BAG_L1/T5': -35.25044046525582, + 'LightGBM_BAG_L1/T1': 
-33.91758184996628, + 'LightGBM_BAG_L1/T2': -33.45970965506699, + 'LightGBM_BAG_L1/T3': -34.04595520416203, + 'LightGBM_BAG_L1/T4': -33.10585842026952, + 'LightGBM_BAG_L1/T5': -34.16406094732935, + 'RandomForestMSE_BAG_L1': -38.453450307199205, + 'CatBoost_BAG_L1/T1': -140.40360914007874, + 'CatBoost_BAG_L1/T2': -142.6265769446745, + 'CatBoost_BAG_L1/T3': -134.16452488636386, + 'CatBoost_BAG_L1/T4': -137.0987537197958, + 'ExtraTreesMSE_BAG_L1': -38.53023388531077, + 'NeuralNetFastAI_BAG_L1/c16cb6c4': -44.79041562109406, + 'NeuralNetFastAI_BAG_L1/a0c45eef': -108.87940684601578, + 'NeuralNetFastAI_BAG_L1/abcffeb1': -117.44954283483514, + 'NeuralNetFastAI_BAG_L1/a8b2753d': -60.13694030765092, + 'NeuralNetFastAI_BAG_L1/f04b1268': -146.16208476180705, + 'XGBoost_BAG_L1/T1': -34.59991853898211, + 'XGBoost_BAG_L1/T2': -34.504350048578715, + 'XGBoost_BAG_L1/T3': -34.23362587506233, + 'XGBoost_BAG_L1/T4': -38.497062186805955, + 'XGBoost_BAG_L1/T5': -37.175293976590616, + 'LightGBMLarge_BAG_L1': -33.75192367098915, + 'WeightedEnsemble_L2': -32.35712637226378, + 'LightGBMXT_BAG_L2/T1': -32.521346776306636, + 'LightGBMXT_BAG_L2/T2': -33.26024521606246, + 'LightGBMXT_BAG_L2/T3': -32.57764808071887, + 'LightGBMXT_BAG_L2/T4': -32.61285092155835, + 'LightGBMXT_BAG_L2/T5': -32.9272693891148, + 'LightGBM_BAG_L2/T1': -32.17184422478994, + 'LightGBM_BAG_L2/T2': -32.50710053987123, + 'LightGBM_BAG_L2/T3': -32.256869658150826, + 'LightGBM_BAG_L2/T4': -32.313636863235175, + 'LightGBM_BAG_L2/T5': -32.27972421580447, + 'RandomForestMSE_BAG_L2': -32.138782566243954, + 'CatBoost_BAG_L2/T1': -120.30834887320307, + 'CatBoost_BAG_L2/T2': -120.94334041853551, + 'CatBoost_BAG_L2/T3': -119.6683645713193, + 'CatBoost_BAG_L2/T4': -120.14545477741467, + 'ExtraTreesMSE_BAG_L2': -32.057369619984755, + 'NeuralNetFastAI_BAG_L2/458c2ed5': -32.09586512237218, + 'NeuralNetFastAI_BAG_L2/75511c6e': -32.13502476810471, + 'NeuralNetFastAI_BAG_L2/04d21e8a': -32.33789707533331, + 'NeuralNetFastAI_BAG_L2/3647f6c5': -33.640658514074, + 'NeuralNetFastAI_BAG_L2/d3405d78': -32.436303624998025, + 'XGBoost_BAG_L2/T1': -32.50745228584534, + 'XGBoost_BAG_L2/T2': -32.108923498289165, + 'XGBoost_BAG_L2/T3': -32.30862991645187, + 'XGBoost_BAG_L2/T4': -32.158893087015855, + 'XGBoost_BAG_L2/T5': -32.85466531035757, + 'LightGBMLarge_BAG_L2': -32.57112927015081, + 'WeightedEnsemble_L3': -31.634925381822697}, + 'model_best': 'WeightedEnsemble_L3', + 'model_paths': {'LightGBMXT_BAG_L1/T1': ['LightGBMXT_BAG_L1', 'T1'], + 'LightGBMXT_BAG_L1/T2': ['LightGBMXT_BAG_L1', 'T2'], + 'LightGBMXT_BAG_L1/T3': ['LightGBMXT_BAG_L1', 'T3'], + 'LightGBMXT_BAG_L1/T4': ['LightGBMXT_BAG_L1', 'T4'], + 'LightGBMXT_BAG_L1/T5': ['LightGBMXT_BAG_L1', 'T5'], + 'LightGBM_BAG_L1/T1': ['LightGBM_BAG_L1', 'T1'], + 'LightGBM_BAG_L1/T2': ['LightGBM_BAG_L1', 'T2'], + 'LightGBM_BAG_L1/T3': ['LightGBM_BAG_L1', 'T3'], + 'LightGBM_BAG_L1/T4': ['LightGBM_BAG_L1', 'T4'], + 'LightGBM_BAG_L1/T5': ['LightGBM_BAG_L1', 'T5'], + 'RandomForestMSE_BAG_L1': ['RandomForestMSE_BAG_L1'], + 'CatBoost_BAG_L1/T1': ['CatBoost_BAG_L1', 'T1'], + 'CatBoost_BAG_L1/T2': ['CatBoost_BAG_L1', 'T2'], + 'CatBoost_BAG_L1/T3': ['CatBoost_BAG_L1', 'T3'], + 'CatBoost_BAG_L1/T4': ['CatBoost_BAG_L1', 'T4'], + 'ExtraTreesMSE_BAG_L1': ['ExtraTreesMSE_BAG_L1'], + 'NeuralNetFastAI_BAG_L1/c16cb6c4': ['NeuralNetFastAI_BAG_L1', 'c16cb6c4'], + 'NeuralNetFastAI_BAG_L1/a0c45eef': ['NeuralNetFastAI_BAG_L1', 'a0c45eef'], + 'NeuralNetFastAI_BAG_L1/abcffeb1': ['NeuralNetFastAI_BAG_L1', 'abcffeb1'], + 'NeuralNetFastAI_BAG_L1/a8b2753d': 
['NeuralNetFastAI_BAG_L1', 'a8b2753d'], + 'NeuralNetFastAI_BAG_L1/f04b1268': ['NeuralNetFastAI_BAG_L1', 'f04b1268'], + 'XGBoost_BAG_L1/T1': ['XGBoost_BAG_L1', 'T1'], + 'XGBoost_BAG_L1/T2': ['XGBoost_BAG_L1', 'T2'], + 'XGBoost_BAG_L1/T3': ['XGBoost_BAG_L1', 'T3'], + 'XGBoost_BAG_L1/T4': ['XGBoost_BAG_L1', 'T4'], + 'XGBoost_BAG_L1/T5': ['XGBoost_BAG_L1', 'T5'], + 'LightGBMLarge_BAG_L1': ['LightGBMLarge_BAG_L1'], + 'WeightedEnsemble_L2': ['WeightedEnsemble_L2'], + 'LightGBMXT_BAG_L2/T1': ['LightGBMXT_BAG_L2', 'T1'], + 'LightGBMXT_BAG_L2/T2': ['LightGBMXT_BAG_L2', 'T2'], + 'LightGBMXT_BAG_L2/T3': ['LightGBMXT_BAG_L2', 'T3'], + 'LightGBMXT_BAG_L2/T4': ['LightGBMXT_BAG_L2', 'T4'], + 'LightGBMXT_BAG_L2/T5': ['LightGBMXT_BAG_L2', 'T5'], + 'LightGBM_BAG_L2/T1': ['LightGBM_BAG_L2', 'T1'], + 'LightGBM_BAG_L2/T2': ['LightGBM_BAG_L2', 'T2'], + 'LightGBM_BAG_L2/T3': ['LightGBM_BAG_L2', 'T3'], + 'LightGBM_BAG_L2/T4': ['LightGBM_BAG_L2', 'T4'], + 'LightGBM_BAG_L2/T5': ['LightGBM_BAG_L2', 'T5'], + 'RandomForestMSE_BAG_L2': ['RandomForestMSE_BAG_L2'], + 'CatBoost_BAG_L2/T1': ['CatBoost_BAG_L2', 'T1'], + 'CatBoost_BAG_L2/T2': ['CatBoost_BAG_L2', 'T2'], + 'CatBoost_BAG_L2/T3': ['CatBoost_BAG_L2', 'T3'], + 'CatBoost_BAG_L2/T4': ['CatBoost_BAG_L2', 'T4'], + 'ExtraTreesMSE_BAG_L2': ['ExtraTreesMSE_BAG_L2'], + 'NeuralNetFastAI_BAG_L2/458c2ed5': ['NeuralNetFastAI_BAG_L2', '458c2ed5'], + 'NeuralNetFastAI_BAG_L2/75511c6e': ['NeuralNetFastAI_BAG_L2', '75511c6e'], + 'NeuralNetFastAI_BAG_L2/04d21e8a': ['NeuralNetFastAI_BAG_L2', '04d21e8a'], + 'NeuralNetFastAI_BAG_L2/3647f6c5': ['NeuralNetFastAI_BAG_L2', '3647f6c5'], + 'NeuralNetFastAI_BAG_L2/d3405d78': ['NeuralNetFastAI_BAG_L2', 'd3405d78'], + 'XGBoost_BAG_L2/T1': ['XGBoost_BAG_L2', 'T1'], + 'XGBoost_BAG_L2/T2': ['XGBoost_BAG_L2', 'T2'], + 'XGBoost_BAG_L2/T3': ['XGBoost_BAG_L2', 'T3'], + 'XGBoost_BAG_L2/T4': ['XGBoost_BAG_L2', 'T4'], + 'XGBoost_BAG_L2/T5': ['XGBoost_BAG_L2', 'T5'], + 'LightGBMLarge_BAG_L2': ['LightGBMLarge_BAG_L2'], + 'WeightedEnsemble_L3': ['WeightedEnsemble_L3']}, + 'model_fit_times': {'LightGBMXT_BAG_L1/T1': 8.07602858543396, + 'LightGBMXT_BAG_L1/T2': 7.940133094787598, + 'LightGBMXT_BAG_L1/T3': 11.301723957061768, + 'LightGBMXT_BAG_L1/T4': 18.097798824310303, + 'LightGBMXT_BAG_L1/T5': 7.287728786468506, + 'LightGBM_BAG_L1/T1': 3.7167015075683594, + 'LightGBM_BAG_L1/T2': 3.07292103767395, + 'LightGBM_BAG_L1/T3': 4.366904258728027, + 'LightGBM_BAG_L1/T4': 14.35549283027649, + 'LightGBM_BAG_L1/T5': 2.874298572540283, + 'RandomForestMSE_BAG_L1': 2.879520893096924, + 'CatBoost_BAG_L1/T1': 1.4364254474639893, + 'CatBoost_BAG_L1/T2': 1.515988826751709, + 'CatBoost_BAG_L1/T3': 1.5064473152160645, + 'CatBoost_BAG_L1/T4': 1.5544626712799072, + 'ExtraTreesMSE_BAG_L1': 1.2773916721343994, + 'NeuralNetFastAI_BAG_L1/c16cb6c4': 17.14779543876648, + 'NeuralNetFastAI_BAG_L1/a0c45eef': 9.475137710571289, + 'NeuralNetFastAI_BAG_L1/abcffeb1': 9.218433618545532, + 'NeuralNetFastAI_BAG_L1/a8b2753d': 13.60699200630188, + 'NeuralNetFastAI_BAG_L1/f04b1268': 5.319307804107666, + 'XGBoost_BAG_L1/T1': 4.369523048400879, + 'XGBoost_BAG_L1/T2': 4.964226245880127, + 'XGBoost_BAG_L1/T3': 7.556990146636963, + 'XGBoost_BAG_L1/T4': 7.883112907409668, + 'XGBoost_BAG_L1/T5': 5.318174362182617, + 'LightGBMLarge_BAG_L1': 4.07702112197876, + 'WeightedEnsemble_L2': 0.024283647537231445, + 'LightGBMXT_BAG_L2/T1': 1.8330581188201904, + 'LightGBMXT_BAG_L2/T2': 2.2857227325439453, + 'LightGBMXT_BAG_L2/T3': 1.9942741394042969, + 'LightGBMXT_BAG_L2/T4': 4.072777509689331, + 
'LightGBMXT_BAG_L2/T5': 1.9548518657684326, + 'LightGBM_BAG_L2/T1': 2.026719808578491, + 'LightGBM_BAG_L2/T2': 2.776947259902954, + 'LightGBM_BAG_L2/T3': 2.474750518798828, + 'LightGBM_BAG_L2/T4': 5.182577848434448, + 'LightGBM_BAG_L2/T5': 2.2535572052001953, + 'RandomForestMSE_BAG_L2': 6.812941789627075, + 'CatBoost_BAG_L2/T1': 1.5096616744995117, + 'CatBoost_BAG_L2/T2': 1.6135551929473877, + 'CatBoost_BAG_L2/T3': 2.108980178833008, + 'CatBoost_BAG_L2/T4': 1.6990611553192139, + 'ExtraTreesMSE_BAG_L2': 1.799293041229248, + 'NeuralNetFastAI_BAG_L2/458c2ed5': 19.057864665985107, + 'NeuralNetFastAI_BAG_L2/75511c6e': 13.252812623977661, + 'NeuralNetFastAI_BAG_L2/04d21e8a': 18.074800968170166, + 'NeuralNetFastAI_BAG_L2/3647f6c5': 6.332346439361572, + 'NeuralNetFastAI_BAG_L2/d3405d78': 19.588415145874023, + 'XGBoost_BAG_L2/T1': 3.3180460929870605, + 'XGBoost_BAG_L2/T2': 3.110227584838867, + 'XGBoost_BAG_L2/T3': 6.245622158050537, + 'XGBoost_BAG_L2/T4': 2.147817373275757, + 'XGBoost_BAG_L2/T5': 9.120264768600464, + 'LightGBMLarge_BAG_L2': 3.085817813873291, + 'WeightedEnsemble_L3': 0.04247474670410156}, + 'model_pred_times': {'LightGBMXT_BAG_L1/T1': 2.1768107414245605, + 'LightGBMXT_BAG_L1/T2': 1.8593015670776367, + 'LightGBMXT_BAG_L1/T3': 4.043078660964966, + 'LightGBMXT_BAG_L1/T4': 22.0218608379364, + 'LightGBMXT_BAG_L1/T5': 1.6698014736175537, + 'LightGBM_BAG_L1/T1': 0.5321853160858154, + 'LightGBM_BAG_L1/T2': 0.1881561279296875, + 'LightGBM_BAG_L1/T3': 0.6923513412475586, + 'LightGBM_BAG_L1/T4': 7.440266847610474, + 'LightGBM_BAG_L1/T5': 0.2765674591064453, + 'RandomForestMSE_BAG_L1': 0.2830510139465332, + 'CatBoost_BAG_L1/T1': 0.016952037811279297, + 'CatBoost_BAG_L1/T2': 0.016474246978759766, + 'CatBoost_BAG_L1/T3': 0.02024984359741211, + 'CatBoost_BAG_L1/T4': 0.017995357513427734, + 'ExtraTreesMSE_BAG_L1': 0.276608943939209, + 'NeuralNetFastAI_BAG_L1/c16cb6c4': 0.30075788497924805, + 'NeuralNetFastAI_BAG_L1/a0c45eef': 0.22220444679260254, + 'NeuralNetFastAI_BAG_L1/abcffeb1': 0.19191980361938477, + 'NeuralNetFastAI_BAG_L1/a8b2753d': 0.2626068592071533, + 'NeuralNetFastAI_BAG_L1/f04b1268': 0.12166690826416016, + 'XGBoost_BAG_L1/T1': 0.34802865982055664, + 'XGBoost_BAG_L1/T2': 0.4270138740539551, + 'XGBoost_BAG_L1/T3': 0.9515507221221924, + 'XGBoost_BAG_L1/T4': 1.3110485076904297, + 'XGBoost_BAG_L1/T5': 0.4027900695800781, + 'LightGBMLarge_BAG_L1': 0.41625213623046875, + 'WeightedEnsemble_L2': 0.0003802776336669922, + 'LightGBMXT_BAG_L2/T1': 0.048723459243774414, + 'LightGBMXT_BAG_L2/T2': 0.04630160331726074, + 'LightGBMXT_BAG_L2/T3': 0.054788827896118164, + 'LightGBMXT_BAG_L2/T4': 0.452512264251709, + 'LightGBMXT_BAG_L2/T5': 0.04402494430541992, + 'LightGBM_BAG_L2/T1': 0.037631988525390625, + 'LightGBM_BAG_L2/T2': 0.033330678939819336, + 'LightGBM_BAG_L2/T3': 0.03908276557922363, + 'LightGBM_BAG_L2/T4': 0.2545182704925537, + 'LightGBM_BAG_L2/T5': 0.031723737716674805, + 'RandomForestMSE_BAG_L2': 0.32255077362060547, + 'CatBoost_BAG_L2/T1': 0.024370908737182617, + 'CatBoost_BAG_L2/T2': 0.030734777450561523, + 'CatBoost_BAG_L2/T3': 0.04261946678161621, + 'CatBoost_BAG_L2/T4': 0.045064687728881836, + 'ExtraTreesMSE_BAG_L2': 0.32038283348083496, + 'NeuralNetFastAI_BAG_L2/458c2ed5': 0.32746291160583496, + 'NeuralNetFastAI_BAG_L2/75511c6e': 0.21840739250183105, + 'NeuralNetFastAI_BAG_L2/04d21e8a': 0.369983434677124, + 'NeuralNetFastAI_BAG_L2/3647f6c5': 0.1599588394165039, + 'NeuralNetFastAI_BAG_L2/d3405d78': 0.27293872833251953, + 'XGBoost_BAG_L2/T1': 0.09573936462402344, + 'XGBoost_BAG_L2/T2': 
0.07597851753234863, + 'XGBoost_BAG_L2/T3': 0.1399855613708496, + 'XGBoost_BAG_L2/T4': 0.06365561485290527, + 'XGBoost_BAG_L2/T5': 0.09450745582580566, + 'LightGBMLarge_BAG_L2': 0.07789063453674316, + 'WeightedEnsemble_L3': 0.0004062652587890625}, + 'num_bag_folds': 8, + 'max_stack_level': 3, + 'model_hyperparams': {'LightGBMXT_BAG_L1/T1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBMXT_BAG_L1/T2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBMXT_BAG_L1/T3': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBMXT_BAG_L1/T4': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBMXT_BAG_L1/T5': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBM_BAG_L1/T1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBM_BAG_L1/T2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBM_BAG_L1/T3': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBM_BAG_L1/T4': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBM_BAG_L1/T5': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'RandomForestMSE_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True, + 'use_child_oof': True}, + 'CatBoost_BAG_L1/T1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'CatBoost_BAG_L1/T2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'CatBoost_BAG_L1/T3': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'CatBoost_BAG_L1/T4': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'ExtraTreesMSE_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True, + 'use_child_oof': True}, + 'NeuralNetFastAI_BAG_L1/c16cb6c4': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'NeuralNetFastAI_BAG_L1/a0c45eef': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'NeuralNetFastAI_BAG_L1/abcffeb1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'NeuralNetFastAI_BAG_L1/a8b2753d': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'NeuralNetFastAI_BAG_L1/f04b1268': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'XGBoost_BAG_L1/T1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'XGBoost_BAG_L1/T2': {'use_orig_features': True, + 'max_base_models': 25, + 
'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'XGBoost_BAG_L1/T3': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'XGBoost_BAG_L1/T4': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'XGBoost_BAG_L1/T5': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBMLarge_BAG_L1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'WeightedEnsemble_L2': {'use_orig_features': False, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBMXT_BAG_L2/T1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBMXT_BAG_L2/T2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBMXT_BAG_L2/T3': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBMXT_BAG_L2/T4': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBMXT_BAG_L2/T5': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBM_BAG_L2/T1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBM_BAG_L2/T2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBM_BAG_L2/T3': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBM_BAG_L2/T4': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBM_BAG_L2/T5': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'RandomForestMSE_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True, + 'use_child_oof': True}, + 'CatBoost_BAG_L2/T1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'CatBoost_BAG_L2/T2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'CatBoost_BAG_L2/T3': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'CatBoost_BAG_L2/T4': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'ExtraTreesMSE_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True, + 'use_child_oof': True}, + 'NeuralNetFastAI_BAG_L2/458c2ed5': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'NeuralNetFastAI_BAG_L2/75511c6e': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'NeuralNetFastAI_BAG_L2/04d21e8a': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'NeuralNetFastAI_BAG_L2/3647f6c5': {'use_orig_features': True, + 'max_base_models': 25, + 
'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'NeuralNetFastAI_BAG_L2/d3405d78': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'XGBoost_BAG_L2/T1': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'XGBoost_BAG_L2/T2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'XGBoost_BAG_L2/T3': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'XGBoost_BAG_L2/T4': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'XGBoost_BAG_L2/T5': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'LightGBMLarge_BAG_L2': {'use_orig_features': True, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}, + 'WeightedEnsemble_L3': {'use_orig_features': False, + 'max_base_models': 25, + 'max_base_models_per_type': 5, + 'save_bag_folds': True}}, + 'leaderboard': model score_val eval_metric \ + 0 WeightedEnsemble_L3 -31.634925 root_mean_squared_error + 1 ExtraTreesMSE_BAG_L2 -32.057370 root_mean_squared_error + 2 NeuralNetFastAI_BAG_L2/458c2ed5 -32.095865 root_mean_squared_error + 3 XGBoost_BAG_L2/T2 -32.108923 root_mean_squared_error + 4 NeuralNetFastAI_BAG_L2/75511c6e -32.135025 root_mean_squared_error + 5 RandomForestMSE_BAG_L2 -32.138783 root_mean_squared_error + 6 XGBoost_BAG_L2/T4 -32.158893 root_mean_squared_error + 7 LightGBM_BAG_L2/T1 -32.171844 root_mean_squared_error + 8 LightGBM_BAG_L2/T3 -32.256870 root_mean_squared_error + 9 LightGBM_BAG_L2/T5 -32.279724 root_mean_squared_error + 10 XGBoost_BAG_L2/T3 -32.308630 root_mean_squared_error + 11 LightGBM_BAG_L2/T4 -32.313637 root_mean_squared_error + 12 NeuralNetFastAI_BAG_L2/04d21e8a -32.337897 root_mean_squared_error + 13 WeightedEnsemble_L2 -32.357126 root_mean_squared_error + 14 NeuralNetFastAI_BAG_L2/d3405d78 -32.436304 root_mean_squared_error + 15 LightGBM_BAG_L2/T2 -32.507101 root_mean_squared_error + 16 XGBoost_BAG_L2/T1 -32.507452 root_mean_squared_error + 17 LightGBMXT_BAG_L2/T1 -32.521347 root_mean_squared_error + 18 LightGBMLarge_BAG_L2 -32.571129 root_mean_squared_error + 19 LightGBMXT_BAG_L2/T3 -32.577648 root_mean_squared_error + 20 LightGBMXT_BAG_L2/T4 -32.612851 root_mean_squared_error + 21 XGBoost_BAG_L2/T5 -32.854665 root_mean_squared_error + 22 LightGBMXT_BAG_L2/T5 -32.927269 root_mean_squared_error + 23 LightGBM_BAG_L1/T4 -33.105858 root_mean_squared_error + 24 LightGBMXT_BAG_L2/T2 -33.260245 root_mean_squared_error + 25 LightGBM_BAG_L1/T2 -33.459710 root_mean_squared_error + 26 NeuralNetFastAI_BAG_L2/3647f6c5 -33.640659 root_mean_squared_error + 27 LightGBMLarge_BAG_L1 -33.751924 root_mean_squared_error + 28 LightGBM_BAG_L1/T1 -33.917582 root_mean_squared_error + 29 LightGBM_BAG_L1/T3 -34.045955 root_mean_squared_error + 30 LightGBM_BAG_L1/T5 -34.164061 root_mean_squared_error + 31 XGBoost_BAG_L1/T3 -34.233626 root_mean_squared_error + 32 LightGBMXT_BAG_L1/T1 -34.380279 root_mean_squared_error + 33 XGBoost_BAG_L1/T2 -34.504350 root_mean_squared_error + 34 XGBoost_BAG_L1/T1 -34.599919 root_mean_squared_error + 35 LightGBMXT_BAG_L1/T3 -34.744666 root_mean_squared_error + 36 LightGBMXT_BAG_L1/T5 -35.250440 root_mean_squared_error + 37 LightGBMXT_BAG_L1/T2 -36.157242 root_mean_squared_error + 38 
LightGBMXT_BAG_L1/T4 -36.445493 root_mean_squared_error + 39 XGBoost_BAG_L1/T5 -37.175294 root_mean_squared_error + 40 RandomForestMSE_BAG_L1 -38.453450 root_mean_squared_error + 41 XGBoost_BAG_L1/T4 -38.497062 root_mean_squared_error + 42 ExtraTreesMSE_BAG_L1 -38.530234 root_mean_squared_error + 43 NeuralNetFastAI_BAG_L1/c16cb6c4 -44.790416 root_mean_squared_error + 44 NeuralNetFastAI_BAG_L1/a8b2753d -60.136940 root_mean_squared_error + 45 NeuralNetFastAI_BAG_L1/a0c45eef -108.879407 root_mean_squared_error + 46 NeuralNetFastAI_BAG_L1/abcffeb1 -117.449543 root_mean_squared_error + 47 CatBoost_BAG_L2/T3 -119.668365 root_mean_squared_error + 48 CatBoost_BAG_L2/T4 -120.145455 root_mean_squared_error + 49 CatBoost_BAG_L2/T1 -120.308349 root_mean_squared_error + 50 CatBoost_BAG_L2/T2 -120.943340 root_mean_squared_error + 51 CatBoost_BAG_L1/T3 -134.164525 root_mean_squared_error + 52 CatBoost_BAG_L1/T4 -137.098754 root_mean_squared_error + 53 CatBoost_BAG_L1/T1 -140.403609 root_mean_squared_error + 54 CatBoost_BAG_L1/T2 -142.626577 root_mean_squared_error + 55 NeuralNetFastAI_BAG_L1/f04b1268 -146.162085 root_mean_squared_error + + pred_time_val fit_time pred_time_val_marginal fit_time_marginal \ + 0 48.001303 234.019050 0.000406 0.042475 + 1 46.807935 181.995976 0.320383 1.799293 + 2 46.815015 199.254547 0.327463 19.057865 + 3 46.563530 183.306910 0.075979 3.110228 + 4 46.705959 193.449495 0.218407 13.252813 + 5 46.810102 187.009624 0.322551 6.812942 + 6 46.551207 182.344500 0.063656 2.147817 + 7 46.525184 182.223403 0.037632 2.026720 + 8 46.526634 182.671433 0.039083 2.474751 + 9 46.519275 182.450240 0.031724 2.253557 + 10 46.627537 186.442305 0.139986 6.245622 + 11 46.742070 185.379261 0.254518 5.182578 + 12 46.857535 198.271484 0.369983 18.074801 + 13 11.939492 66.857503 0.000380 0.024284 + 14 46.760490 199.785098 0.272939 19.588415 + 15 46.520882 182.973630 0.033331 2.776947 + 16 46.583291 183.514729 0.095739 3.318046 + 17 46.536275 182.029741 0.048723 1.833058 + 18 46.565442 183.282501 0.077891 3.085818 + 19 46.542341 182.190957 0.054789 1.994274 + 20 46.940064 184.269460 0.452512 4.072778 + 21 46.582059 189.316947 0.094507 9.120265 + 22 46.531577 182.151535 0.044025 1.954852 + 23 7.440267 14.355493 7.440267 14.355493 + 24 46.533853 182.482405 0.046302 2.285723 + 25 0.188156 3.072921 0.188156 3.072921 + 26 46.647511 186.529029 0.159959 6.332346 + 27 0.416252 4.077021 0.416252 4.077021 + 28 0.532185 3.716702 0.532185 3.716702 + 29 0.692351 4.366904 0.692351 4.366904 + 30 0.276567 2.874299 0.276567 2.874299 + 31 0.951551 7.556990 0.951551 7.556990 + 32 2.176811 8.076029 2.176811 8.076029 + 33 0.427014 4.964226 0.427014 4.964226 + 34 0.348029 4.369523 0.348029 4.369523 + 35 4.043079 11.301724 4.043079 11.301724 + 36 1.669801 7.287729 1.669801 7.287729 + 37 1.859302 7.940133 1.859302 7.940133 + 38 22.021861 18.097799 22.021861 18.097799 + 39 0.402790 5.318174 0.402790 5.318174 + 40 0.283051 2.879521 0.283051 2.879521 + 41 1.311049 7.883113 1.311049 7.883113 + 42 0.276609 1.277392 0.276609 1.277392 + 43 0.300758 17.147795 0.300758 17.147795 + 44 0.262607 13.606992 0.262607 13.606992 + 45 0.222204 9.475138 0.222204 9.475138 + 46 0.191920 9.218434 0.191920 9.218434 + 47 46.530171 182.305663 0.042619 2.108980 + 48 46.532616 181.895744 0.045065 1.699061 + 49 46.511923 181.706344 0.024371 1.509662 + 50 46.518286 181.810238 0.030735 1.613555 + 51 0.020250 1.506447 0.020250 1.506447 + 52 0.017995 1.554463 0.017995 1.554463 + 53 0.016952 1.436425 0.016952 1.436425 + 54 0.016474 1.515989 0.016474 
1.515989 + 55 0.121667 5.319308 0.121667 5.319308 + + stack_level can_infer fit_order + 0 3 True 56 + 1 2 True 44 + 2 2 True 45 + 3 2 True 51 + 4 2 True 46 + 5 2 True 39 + 6 2 True 53 + 7 2 True 34 + 8 2 True 36 + 9 2 True 38 + 10 2 True 52 + 11 2 True 37 + 12 2 True 47 + 13 2 True 28 + 14 2 True 49 + 15 2 True 35 + 16 2 True 50 + 17 2 True 29 + 18 2 True 55 + 19 2 True 31 + 20 2 True 32 + 21 2 True 54 + 22 2 True 33 + 23 1 True 9 + 24 2 True 30 + 25 1 True 7 + 26 2 True 48 + 27 1 True 27 + 28 1 True 6 + 29 1 True 8 + 30 1 True 10 + 31 1 True 24 + 32 1 True 1 + 33 1 True 23 + 34 1 True 22 + 35 1 True 3 + 36 1 True 5 + 37 1 True 2 + 38 1 True 4 + 39 1 True 26 + 40 1 True 11 + 41 1 True 25 + 42 1 True 16 + 43 1 True 17 + 44 1 True 20 + 45 1 True 18 + 46 1 True 19 + 47 2 True 42 + 48 2 True 43 + 49 2 True 40 + 50 2 True 41 + 51 1 True 14 + 52 1 True 15 + 53 1 True 12 + 54 1 True 13 + 55 1 True 21 }+
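Two notes on the summary above: the bokeh UserWarning only means the optional summary plots were skipped (the pip install bokeh==2.0.1 command suggested in the warning would enable them), and the same leaderboard can be pulled back as a pandas DataFrame, which is convenient for the writeup tables. A minimal, hedged sketch using the standard leaderboard() method:
# Hedged sketch: fetch the leaderboard of the hpo run as a DataFrame
+lb = predictor_new_hpo.leaderboard()
+lb[["model", "score_val", "fit_time"]].head()
+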
# Remember to set all negative values to zero
+predictions = predictor_new_hpo.predict(test)
+predictions[predictions < 0] = 0
+
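Equivalently, the clamping above can be done with pandas' built-in clip (a hedged alternative to the boolean-mask assignment, not part of the original template):
# Same effect as predictions[predictions < 0] = 0
+predictions = predictions.clip(lower=0)
+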
predictions.describe()
+
count 6493.000000 +mean 190.807175 +std 173.615158 +min 2.907167 +25% 45.742661 +50% 147.337540 +75% 281.603760 +max 896.487915 +Name: count, dtype: float64+
# Submit the new predictions, same as before
+submission["count"] = predictions
+submission.to_csv("submission_new_hpo.csv", index=False)
+
!kaggle competitions submit -c bike-sharing-demand -f submission_new_hpo.csv -m "new features with hyperparameters"
+
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/satyam/.kaggle/kaggle.json' +100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 583kB/s] +Successfully submitted to Bike Sharing Demand+
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
+
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/satyam/.kaggle/kaggle.json' +fileName date description status publicScore privateScore +--------------------------- ------------------- --------------------------------- -------- ----------- ------------ +submission_new_hpo.csv 2024-08-14 04:37:04 new features with hyperparameters complete 0.45780 0.45780 +submission_new_features.csv 2024-08-14 03:23:44 new features complete 0.75394 0.75394 +submission.csv 2024-08-14 03:03:28 first raw submission complete 1.79357 1.79357 ++
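The API-key permissions warning in the two cells above can be silenced by restricting the key file, exactly as the CLI suggests (path taken from the warning output):
!chmod 600 /home/satyam/.kaggle/kaggle.json
+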
New Score of 0.45780¶
+# Take the top model score from each training run and create a line plot to show improvement
+# You can create these in the notebook and save them to PNG, or use some other tool (e.g. Google Sheets, Excel)
+fig = pd.DataFrame(
+ {
+ "model": ["initial", "add_features", "hpo"],
+ "score": [50.75, 28.41, 31.63]
+ }
+).plot(x="model", y="score", figsize=(8, 6)).get_figure()
+fig.savefig('model_train_score.png')
+
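If you would rather not hard-code the hpo score, it can be read back from the predictor (a hedged sketch; score_val is the negative RMSE column shown in the leaderboard output above):
# Best (least negative) validation score of the hpo run, reported as a positive RMSE
+hpo_best_rmse = abs(predictor_new_hpo.leaderboard()["score_val"].max())
+print(hpo_best_rmse)  # should match the 31.63 plotted above
+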
# Take the 3 Kaggle scores and create a line plot to show improvement
+fig = pd.DataFrame(
+ {
+ "test_eval": ["initial", "add_features", "hpo"],
+ "score": [1.79, 0.75, 0.45]
+ }
+).plot(x="test_eval", y="score", figsize=(8, 6)).get_figure()
+fig.savefig('model_test_score.png')
+
Hyperparameter table¶
+import numpy as np
+
# The 3 hyperparameters we tuned, with the Kaggle score as the result
+pd.DataFrame({
+ "model": ["initial", "add_features", "hpo"],
+ "time_limit": [600, 600, 1000],
+ "problem_type": [np.nan, np.nan, "regression"],
+ "catboost": [np.nan, np.nan, "learning_rate: 0.03, iterations: 15, l2_leaf_reg: 0.125"],
+ "score": [1.79, 0.75, 0.45]
+})
+
+ | model | +time_limit | +problem_type | +catboost | +score | +
---|---|---|---|---|---|
0 | +initial | +600 | +NaN | +NaN | +1.79 | +
1 | +add_features | +600 | +NaN | +NaN | +0.75 | +
2 | +hpo | +1000 | +regression | +learning_rate: 0.03, iterations: 15, l2_leaf_reg: 0.125 | +0.45 | +
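For the writeup, the same table can be exported as markdown (a hedged sketch: hpo_table is just a name for the DataFrame built above, and to_markdown relies on the optional tabulate package):
hpo_table = pd.DataFrame({
+    "model": ["initial", "add_features", "hpo"],
+    "time_limit": [600, 600, 1000],
+    "problem_type": [np.nan, np.nan, "regression"],
+    "catboost": [np.nan, np.nan, "learning_rate: 0.03, iterations: 15, l2_leaf_reg: 0.125"],
+    "score": [1.79, 0.75, 0.45]
+})
+print(hpo_table.to_markdown(index=False))
+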
# Correlation matrix of the training features, including the engineered year/month/hour/day columns
+train.corr()
+
+ | datetime | +season | +holiday | +workingday | +weather | +temp | +atemp | +humidity | +windspeed | +casual | +registered | +count | +year | +month | +hour | +day | +
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
datetime | +1.000000 | +0.480021 | +0.010988 | +-0.003658 | +-0.005048 | +0.180986 | +0.181823 | +0.032856 | +-0.086888 | +0.172728 | +0.314879 | +0.310187 | +0.866570 | +0.494087 | +-0.005663 | +-0.004676 | +
season | +0.480021 | +1.000000 | +0.029368 | +-0.008126 | +0.008879 | +0.258689 | +0.264744 | +0.190610 | +-0.147121 | +0.096758 | +0.164011 | +0.163439 | +-0.004797 | +0.971524 | +-0.006546 | +-0.010553 | +
holiday | +0.010988 | +0.029368 | +1.000000 | +-0.250491 | +-0.007074 | +0.000295 | +-0.005215 | +0.001929 | +0.008409 | +0.043799 | +-0.020956 | +-0.005393 | +0.012021 | +0.001731 | +-0.000354 | +-0.191832 | +
workingday | +-0.003658 | +-0.008126 | +-0.250491 | +1.000000 | +0.033772 | +0.029966 | +0.024660 | +-0.010880 | +0.013373 | +-0.319111 | +0.119460 | +0.011594 | +-0.002482 | +-0.003394 | +0.002780 | +-0.704267 | +
weather | +-0.005048 | +0.008879 | +-0.007074 | +0.033772 | +1.000000 | +-0.055035 | +-0.055376 | +0.406244 | +0.007261 | +-0.135918 | +-0.109340 | +-0.128655 | +-0.012548 | +0.012144 | +-0.022740 | +-0.047692 | +
temp | +0.180986 | +0.258689 | +0.000295 | +0.029966 | +-0.055035 | +1.000000 | +0.984948 | +-0.064949 | +-0.017852 | +0.467097 | +0.318571 | +0.394454 | +0.061226 | +0.257589 | +0.145430 | +-0.038466 | +
atemp | +0.181823 | +0.264744 | +-0.005215 | +0.024660 | +-0.055376 | +0.984948 | +1.000000 | +-0.043536 | +-0.057473 | +0.462067 | +0.314635 | +0.389784 | +0.058540 | +0.264173 | +0.140343 | +-0.040235 | +
humidity | +0.032856 | +0.190610 | +0.001929 | +-0.010880 | +0.406244 | +-0.064949 | +-0.043536 | +1.000000 | +-0.318607 | +-0.348187 | +-0.265458 | +-0.317371 | +-0.078606 | +0.204537 | +-0.278011 | +-0.026507 | +
windspeed | +-0.086888 | +-0.147121 | +0.008409 | +0.013373 | +0.007261 | +-0.017852 | +-0.057473 | +-0.318607 | +1.000000 | +0.092276 | +0.091052 | +0.101369 | +-0.015221 | +-0.150192 | +0.146631 | +-0.024804 | +
casual | +0.172728 | +0.096758 | +0.043799 | +-0.319111 | +-0.135918 | +0.467097 | +0.462067 | +-0.348187 | +0.092276 | +1.000000 | +0.497250 | +0.690414 | +0.145241 | +0.092722 | +0.302045 | +0.246959 | +
registered | +0.314879 | +0.164011 | +-0.020956 | +0.119460 | +-0.109340 | +0.318571 | +0.314635 | +-0.265458 | +0.091052 | +0.497250 | +1.000000 | +0.970948 | +0.264265 | +0.169451 | +0.380540 | +-0.084427 | +
count | +0.310187 | +0.163439 | +-0.005393 | +0.011594 | +-0.128655 | +0.394454 | +0.389784 | +-0.317371 | +0.101369 | +0.690414 | +0.970948 | +1.000000 | +0.260403 | +0.166862 | +0.400601 | +-0.002283 | +
year | +0.866570 | +-0.004797 | +0.012021 | +-0.002482 | +-0.012548 | +0.061226 | +0.058540 | +-0.078606 | +-0.015221 | +0.145241 | +0.264265 | +0.260403 | +1.000000 | +-0.004932 | +-0.004234 | +-0.003785 | +
month | +0.494087 | +0.971524 | +0.001731 | +-0.003394 | +0.012144 | +0.257589 | +0.264173 | +0.204537 | +-0.150192 | +0.092722 | +0.169451 | +0.166862 | +-0.004932 | +1.000000 | +-0.006818 | +-0.002266 | +
hour | +-0.005663 | +-0.006546 | +-0.000354 | +0.002780 | +-0.022740 | +0.145430 | +0.140343 | +-0.278011 | +0.146631 | +0.302045 | +0.380540 | +0.400601 | +-0.004234 | +-0.006818 | +1.000000 | +-0.002925 | +
day | +-0.004676 | +-0.010553 | +-0.191832 | +-0.704267 | +-0.047692 | +-0.038466 | +-0.040235 | +-0.026507 | +-0.024804 | +0.246959 | +-0.084427 | +-0.002283 | +-0.003785 | +-0.002266 | +-0.002925 | +1.000000 | +
+
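The correlation matrix above is easier to scan as a heatmap. A minimal sketch with matplotlib (already available, since the pandas .plot() calls above depend on it); it only re-renders train.corr() and saves it alongside the other PNGs:
import matplotlib.pyplot as plt
+
+corr = train.corr()
+fig, ax = plt.subplots(figsize=(10, 8))
+im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)  # fixed -1..1 scale for correlations
+ax.set_xticks(range(len(corr.columns)))
+ax.set_xticklabels(corr.columns, rotation=90)
+ax.set_yticks(range(len(corr.columns)))
+ax.set_yticklabels(corr.columns)
+fig.colorbar(im, ax=ax)
+fig.savefig("train_corr_heatmap.png")
+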