This project performs classification on the Forest CoverType dataset from the UCI KDD archive. Dataset link: https://www.kaggle.com/uciml/forest-cover-type-dataset

dbaofd/spark_forest_cover_type_classification

++++++++++++++++++++File List++++++++++++++++++++
Forest_Cover_Data_Visualization.ipynb
Forest_Cover_Decision_Tree.ipynb
Forest_Cover_Decision_Tree_Cross_Validation.ipynb
Forest_Cover_Logistic_Regression.ipynb
Forest_Cover_Multilayer_Perceptron.ipynb
plot_tool.py
load_dataset.py
+++++++++++++++++++++++++++++++++++++++++++++++++
Forest_Cover_Data_Visualization.ipynb
Details:
This notebook visualizes the dataset.
It can also plot the data projected onto its first two principal components (pca1 and pca2).
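
As a rough illustration of what plotting against pca1 and pca2 involves, the sketch below computes the first two principal directions of a small numeric sample and projects each row onto them. It is a dependency-free approximation using power iteration, not the notebook's code; the notebook may well rely on a library PCA instead. The function names `top_two_components` and `project` are hypothetical, and the sample points are made up.

```python
import math


def top_two_components(rows, iterations=200):
    """Estimate the first two principal directions of `rows` by power iteration.

    Hypothetical helper for illustration; only suitable for small samples.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    components = []
    for k in range(2):
        # Deterministic, non-uniform start vector.
        v = [1.0 / (j + 1 + k) for j in range(d)]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            # Deflation: remove the part lying along components already found.
            for c in components:
                dot = sum(wi * ci for wi, ci in zip(w, c))
                w = [wi - dot * ci for wi, ci in zip(w, c)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            if norm == 0.0:
                raise ValueError("data has fewer than two directions of variance")
            v = [wi / norm for wi in w]
        components.append(v)
    return means, components


def project(rows, means, components):
    """Return [pca1, pca2] coordinates for every row."""
    return [[sum((r[j] - means[j]) * c[j] for j in range(len(means)))
             for c in components]
            for r in rows]


# Made-up sample: most variance along the first axis, some along the second.
sample = [(-2, -1, 0), (-2, 1, 0), (2, -1, 0), (2, 1, 0), (-4, 0, 0), (4, 0, 0)]
sample_means, sample_components = top_two_components(sample)
sample_points = project(sample, sample_means, sample_components)
```

Each entry of `sample_points` is a two-element list that can be used directly as the x and y coordinates of a scatter plot, typically coloured by cover type.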
+++++++++++++++++++++++++++++++++++++++++++++++++
Forest_Cover_Decision_Tree.ipynb
Forest_Cover_Decision_Tree_Cross_Validation.ipynb
Details:
These two notebooks are almost the same. The only difference is that the second one adds a
line chart of the cross-validation results; it uses 10-fold cross-validation.
Use either notebook to train and evaluate a decision tree model.
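
For readers unfamiliar with 10-fold cross-validation, the sketch below shows how the row indices can be split so that each fold is held out exactly once while the other nine are used for training. It is a plain-Python illustration under assumed names (`k_fold_indices`, `seed`); the notebook itself may use a library routine to do this.

```python
import random


def k_fold_indices(n_samples, k=10, seed=42):
    """Yield (train_indices, held_out_indices) pairs for k-fold cross-validation.

    Hypothetical helper for illustration only.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding spreads the remainder so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


# Example: report the fold sizes for a small made-up sample of 25 rows.
for fold_number, (train, held_out) in enumerate(k_fold_indices(25), start=1):
    print(fold_number, len(train), len(held_out))
```

Training a model on each `train` split and scoring it on the matching `held_out` split gives ten scores, which is the kind of series a cross-validation line chart would plot.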
+++++++++++++++++++++++++++++++++++++++++++++++++
Forest_Cover_Logistic_Regression.ipynb
Details:
This notebook trains a logistic regression model to classify the dataset.
You can use it to train and evaluate the model.
Its performance is similar to that of the decision tree.
+++++++++++++++++++++++++++++++++++++++++++++++++
Forest_Cover_Multilayer_Perceptron.ipynb
Details:
This notebook trains a multilayer perceptron to classify the dataset.
You can use it to train and evaluate the model.
Its performance is the worst among the three methods.
+++++++++++++++++++++++++++++++++++++++++++++++++
plot_tool.py
Details:
This Python file provides two functions: one draws a bar chart and the other draws a PCA chart.
+++++++++++++++++++++++++++++++++++++++++++++++++
load_dataset.py
Details:
This Python file provides the functions needed to load the dataset and prepare the training data.
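
As a rough idea of what such a loading step does, the sketch below reads a CSV with a header row and separates the feature columns from the label. It is not the code in load_dataset.py: the function name `load_rows` is hypothetical, the label column name `Cover_Type` is an assumption based on the Kaggle CSV, and the two sample rows are made up for illustration.

```python
import csv
import io

# Assumption: the label column is named "Cover_Type", as in the Kaggle CSV.
LABEL_COLUMN = "Cover_Type"


def load_rows(csv_file, label_column=LABEL_COLUMN):
    """Read a CSV file object and return (features, labels).

    Hypothetical helper: every non-label column is parsed as a float feature.
    """
    features, labels = [], []
    for row in csv.DictReader(csv_file):
        labels.append(int(row.pop(label_column)))
        features.append([float(value) for value in row.values()])
    return features, labels


# Tiny made-up sample with only three feature columns.
sample_csv = io.StringIO(
    "Elevation,Aspect,Slope,Cover_Type\n"
    "2596,51,3,5\n"
    "2590,56,2,5\n"
)
sample_features, sample_labels = load_rows(sample_csv)
```

With a real file, the same function can be called as `load_rows(open(path, newline=""))`, where `path` points at the downloaded CSV.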
+++++++++++++++++++++++++++++++++++++++++++++++++
lrm_model8.model
lrm_model9.model
lrm_model10.model
Details:
These three files are trained logistic regression models. You can load them to evaluate them
or to make predictions.
