This repository contains code for a land cover classification analysis of Cuba using Landsat 8 imagery. Four different models are created, optimized, tested, and compared:
- CART (Decision Tree)
- Random Forest
- XGBoost
- Neural Network
To install the environment dependencies using conda, run the following command in the terminal:

```shell
conda env create --name cuba_classification --file=environment.yml
```

Then activate the environment with:

```shell
conda activate cuba_classification
```
You can see individual packages and their versions in the `environment.yml` file. You can now use the `cuba_classification` environment to run the code in the `analysis.ipynb` notebook.
To comply with assignment guidelines, the entirety of the code is in the `analysis.ipynb` notebook.
To run the code, you will need a Google Earth Engine account. When the notebook is first run, it will open a browser window asking you to authenticate your account. Once you have done this, you will receive an access token to copy and paste into the notebook where it is requested. You only need to do this once.
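For reference, the one-time authentication step uses the `earthengine-api` package's standard flow; the two calls below are the usual pattern, though the notebook's exact setup cell may differ slightly:

```python
import ee

# One-time step: opens a browser window to authenticate your
# Google Earth Engine account and stores the resulting credentials.
ee.Authenticate()

# Initialize the Earth Engine client for the current session.
ee.Initialize()
```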
Due to long runtimes during model training and preprocessing, the trained models and some of the preprocessed data are saved in the `models` and `temporary` folders. If you want to re-run all of the code, you can delete these folders and re-run the notebook. Note: this will take multiple hours. If you want to re-run only parts of the code, you can delete just the files you want to re-calculate.
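The caching behavior can be sketched with the load-or-train pattern below. This is an illustration, not the notebook's exact code; the path `models/random_forest.joblib` and the hyperparameters are placeholders:

```python
import os

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

MODEL_PATH = "models/random_forest.joblib"  # illustrative path, not the repo's actual filename


def load_or_train(X, y, path=MODEL_PATH):
    """Load a cached model if one exists; otherwise train and cache it."""
    if os.path.exists(path):
        return joblib.load(path)
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X, y)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    joblib.dump(model, path)
    return model


# Synthetic stand-in data: 8 features mimicking the 8 Landsat bands.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=5, n_classes=3, random_state=0
)
model = load_or_train(X, y)
```

Deleting the saved file simply forces the next run to retrain, which matches the "delete the specific files you want to re-calculate" workflow described above.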
Model Setup • Prediction Accuracies • Model Performance • Feature Importance • Testing on Isla de la Juventud • Neural Network Structure
All models were trained using the same 75/25 train-test split. Each of the 6 land cover classes was randomly sampled for 10,000 samples (60,000 total). The input data was the eight 30-meter-resolution bands of the Landsat 8 imagery. Landsat scenes were stitched together to create a mosaic covering the whole of Cuba. Preprocessing then included removing clouds and taking the median values over a year to reduce noise.
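The sampling and split described above can be sketched as follows, using synthetic data as a stand-in for the 60,000 Landsat pixel samples (8 band values each, 6 classes):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in: 10,000 pixels per class, 8 band values per pixel.
n_classes, per_class, n_bands = 6, 10_000, 8
X = rng.normal(size=(n_classes * per_class, n_bands))
y = np.repeat(np.arange(n_classes), per_class)

# 75/25 train-test split, stratified so each class keeps equal representation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
```

Stratifying on the labels keeps all six classes balanced in both splits, so test accuracy is not skewed toward any one class.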
Architecture for the neural network can be seen below (Neural Network Structure).
The CART, Random Forest, and XGBoost models were tuned with 5-fold cross-validation to determine the best hyperparameters (max depth, number of estimators, minimum samples split). The best hyperparameters were then used to train the models on the entire training set.
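A minimal sketch of this tuning step, using scikit-learn's `GridSearchCV` on a decision tree; the grid values here are illustrative and not the ones actually searched in the notebook:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with 8 features, mimicking the 8 Landsat bands.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=6, n_classes=4, random_state=0
)

# Illustrative hyperparameter grid; the notebook's actual values may differ.
param_grid = {"max_depth": [5, 10, 20], "min_samples_split": [2, 10, 50]}

# 5-fold cross-validation over the grid; refit=True (the default) retrains
# the model on the entire training set with the best hyperparameters found.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
best_model = search.best_estimator_
```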
| Model | Training Accuracy | Testing Accuracy | Difference |
|---|---|---|---|
| Neural Network | | | |
| Random Forest | | | |
| XGBoost | | | |
| CART | | | |
The confusion matrices provide a detailed breakdown of each model's predictions for each class.
We can also calculate the precision, recall, and F1-score for each class. All models excelled at predicting shallow and deep water, but struggled most with differentiating between barren and agriculture lands. Overall, all models performed well; however, the neural network model exhibited the best performance (particularly at detecting urban areas).
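The per-class precision, recall, and F1-scores can be computed with scikit-learn's `classification_report`; the labels and predictions below are illustrative (the class names match those discussed here, but "forest" is an assumed sixth class name):

```python
from sklearn.metrics import classification_report

# Illustrative class names and toy predictions for the 6 land cover classes.
classes = ["shallow water", "deep water", "urban", "forest", "barren", "agriculture"]
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
y_pred = [0, 0, 1, 1, 2, 3, 3, 3, 4, 5, 5, 4]

# output_dict=True returns per-class precision, recall, and F1-score.
report = classification_report(y_true, y_pred, target_names=classes, output_dict=True)
```

Printing `classification_report(...)` without `output_dict=True` gives the familiar per-class text table instead.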
(Confusion matrix plots: CART, Random Forest, XGBoost, Neural Network)
The feature importances for the CART, Random Forest, and XGBoost models are shown below. The neural network does not provide feature importances, as it is a black-box model. Predictably, bands 5, 4, and 3 (NIR, Red, and Green) are the most important for all models. Why this was expected will be discussed further in the final report.
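For the tree-based models, these importances come directly from the fitted estimator's `feature_importances_` attribute. A small sketch on synthetic stand-in data (band names are labels only, not real Landsat values):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Labels standing in for the 8 Landsat band names.
bands = [f"B{i}" for i in range(1, 9)]

# Synthetic stand-in data with one feature per band.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, n_classes=4, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Rank bands by mean decrease in impurity (importances sum to 1).
ranking = sorted(zip(bands, model.feature_importances_), key=lambda t: -t[1])
```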
(Feature importance plots: CART, Random Forest, XGBoost)
We can run the models on a portion of Cuba to visualize the differences in predictions. Due to processing restrictions, the models were run on Cuba's second-largest island, Isla de la Juventud. The predictions are shown below.
The neural network structure was determined through an iterative trial-and-error process in order to maximize accuracy.
This sequential model starts with two sets of Conv1D layers with ReLU activation and 'same' padding to preserve spatial information. Dropout layers are included after each convolutional set to prevent overfitting. Subsequently, two dense layers with decreasing neuron counts are employed for further feature processing and dimensionality reduction. The output layer comprises a Dense layer with softmax activation, outputting probabilities for the 6 land cover classes. The model is trained using the Adam optimizer with categorical cross-entropy loss, and early stopping is employed to prevent overfitting and improve efficiency during training.
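A Keras sketch of the architecture described above; the filter counts, dense layer sizes, dropout rates, and early-stopping patience are illustrative assumptions, not the tuned values from the notebook:

```python
from tensorflow import keras
from tensorflow.keras import layers

n_bands, n_classes = 8, 6  # 8 Landsat bands in, 6 land cover classes out

# Sketch of the described architecture; all layer sizes and dropout
# rates are assumptions, not the notebook's actual tuned values.
model = keras.Sequential([
    layers.Input(shape=(n_bands, 1)),
    # First set of Conv1D layers; 'same' padding preserves spatial size.
    layers.Conv1D(64, 3, activation="relu", padding="same"),
    layers.Conv1D(64, 3, activation="relu", padding="same"),
    layers.Dropout(0.3),
    # Second set of Conv1D layers, again followed by dropout.
    layers.Conv1D(32, 3, activation="relu", padding="same"),
    layers.Conv1D(32, 3, activation="relu", padding="same"),
    layers.Dropout(0.3),
    layers.Flatten(),
    # Two dense layers with decreasing neuron counts.
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    # Softmax output: probabilities over the 6 land cover classes.
    layers.Dense(n_classes, activation="softmax"),
])

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Early stopping halts training when validation loss stops improving.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)
```

The callback would be passed to `model.fit(..., callbacks=[early_stop])` along with a validation split.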
The training history of the neural network model is shown below: