- 6 different Machine Learning algorithms were compared for predicting individual tree mortality.
- Effects of dataset size, variable set, thinning, inventory length, and cross-validation were studied.
- Random Forest reached a higher performance level in all the proposed case studies except cross-validation.
- Logistic binomial Regression seems to be a more robust algorithm regarding cross-validation.
Tree mortality is a crucial process in forest dynamics and a key component of forest growth models and simulators. Factors such as competition, drought and pathogens drive tree mortality, but the underlying mechanism is difficult to model. Current environmental changes complicate modelling approaches even further, as they alter all the factors involved in mortality. However, innovative classification algorithms can delve into data to find patterns able to model or even explain these relationships. We use Logistic binomial Regression as the reference algorithm for predicting individual tree mortality, but different Machine Learning alternatives already applied to other forest modelling topics can be used for this purpose. Here, we compare the performance of five different Machine Learning algorithms with Logistic binomial Regression in individual tree mortality classification: Decision Trees, Random Forest, Naive Bayes, K-Nearest Neighbour, and Support Vector Machine. Different case studies involving dataset size, number of variables, thinning degrees and inventory record length were conducted to assess the performance of each algorithm. Additionally, a cross-validation among thinning degrees was performed. Our results demonstrate the potential of Machine Learning for assessing tree mortality, although the limited interpretability of its results can reduce its utility.
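As a minimal, self-contained illustration of the comparison idea (not the study's actual data or predictors; the variables `dbh`, `bal` and `spei` below are hypothetical placeholders on simulated data), the following R sketch fits the reference Logistic binomial Regression and one Machine Learning alternative (Random Forest) and compares their discrimination with AUC:

```r
# Illustrative only: the variables below (dbh, bal, spei) are hypothetical
# placeholders, not the predictors used in the original study.
library(randomForest)
library(pROC)

set.seed(42)

# Simulate a stand-in individual-tree dataset with a binary mortality flag
n  <- 2000
df <- data.frame(
  dbh  = rgamma(n, shape = 8, scale = 4),  # diameter at breast height (cm)
  bal  = runif(n, 0, 60),                  # basal area of larger trees (m2/ha)
  spei = rnorm(n)                          # a generic drought index
)
p <- plogis(-3 + 0.04 * df$bal - 0.03 * df$dbh - 0.5 * df$spei)
df$dead <- factor(rbinom(n, 1, p), levels = c(0, 1))

# Simple train/test split
idx   <- sample(nrow(df), 0.7 * nrow(df))
train <- df[idx, ]
test  <- df[-idx, ]

# Reference model: binomial logistic regression
lr      <- glm(dead ~ dbh + bal + spei, data = train, family = binomial)
lr_prob <- predict(lr, newdata = test, type = "response")

# One Machine Learning alternative: Random Forest
rf      <- randomForest(dead ~ dbh + bal + spei, data = train, ntree = 500)
rf_prob <- predict(rf, newdata = test, type = "prob")[, "1"]

# Compare discrimination with the area under the ROC curve
c(LR = as.numeric(auc(test$dead, lr_prob)),
  RF = as.numeric(auc(test$dead, rf_prob)))
```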
-
💾 1_data:
-
☀️ climate data obtained from WorldClim (not provided here)
-
🌳 tree and plot data (test data from DEN experimental plots; original data available upon reasonable request)
- 0_raw contains the original data (DEN experimental plots) regarding coordinates, tree inventory, plot age and thinning grades
- tmp_DEN contains the checkpoints saved after running each R code file, in order to explore how the data evolve. It works with the test information up to the analysis (section 5)
- 1_original_df contains the results of the original analysis (full dataset) with all the metrics tested in that study (but without the original tree and plot data). It can be used from section 6 onwards
-
-
💻 2_code: detailed R scripts; the purpose, inputs and outputs of each script are summarized below.
Script Name | Purpose | Input | Output |
---|---|---|---|
0_data_curation.r | Manage initial data: structure, IDs, impute missing data | 1_raw/final/VF_daten.xlsx, 1_raw/final/Fi-Daten__age.xlsx | 3_final/0_initial_df_clean/initial_df_clean.csv |
1.0_neighborhood_main.r, 1.1_neighborhood_functions.r | Calculate the variables needed for the analysis using a subplot of 0.33*h radius around each tree (see the illustrative sketch below the table) | 1_data/3_final/0_initial_df_clean/initial_df_clean.csv | 1_data/3_final/1_neighborhood/*: trees_r33.csv, subplot_stats_r33.csv, neighborhood_stats_r33.csv |
2_climate_data.r | Calculate climate variables by plot location | 1_raw/final/Koordinaten.xlsx | 1_data/3_final/2_clima/df_complete_r33.csv |
3_feature_visualization.r | Make graphs and manually study variable relationships | 1_data/3_final/2_clima/df_complete_r33.csv | None |
4.0_split_dataset.r, 4.1_split_variables.r, 4.2_functions_var_combis.r | Split datasets (by size and thinning) and variables to develop the case studies, excluding the first plot measurement when needed | 1_data/3_final/2_clima/df_complete_r33.csv | 1_data/3_final/4_datasets/* |
5.0_run_analysis.r, 5.1_LR_analysis.r, 5.2_DT_analysis.r, 5.3_RF_analysis.r, 5.4_NB_analysis.r, 5.5_KNN_analysis.r, 5.6_SVM_analysis.r | Run all the analyses (except ANN) in R for the different case studies | 1_data/3_final/4_datasets/* | 1_data/3_final/5_analysis/**case_study**/*: metrics.RData, models.RData |
6_HPC | Run all the simulations on the iuFOR HPC, split by case study | 1_data/3_final/4_datasets/* | 1_data/3_final/5_analysis/**case_study**/*: metrics.RData, models.RData |
7_metrics_compilation.r | Get metrics from the R analysis, calculate ANN metrics, and create a checkpoint | 1_data/3_final/5_analysis/**case_study**/metrics.RData, ann/preds/**case_study**/*, ann/timer/**case_study**/* | 1_data/3_final/6_final_results/**case_study**/final_metrics.RData |
8.0_performance_graphs.r, 8.1_functions_performance_graphs.r, 8.2_classifiers_comparison.r, 8.3_graph_functions.r, 8.4_application_thinning.r, 8.5_application_thinning_comparison.r, 8.6_time_and_performance_graphs.r | Compare the metrics for each analysis and case study using different graphs; the graphs in the original paper use scripts 8.5 and 8.6 | 1_data/3_final/6_final_results/**case_study**/final_metrics.RData | 2_scripts/4_figures/* |
9.0_location_map.r, 9.1_neighbour_graphs.r, 9.2_mortality_graphs.r, 9.3_df_mortality_rates, 9.4_paper_tables.r | Produce the location and neighborhood graphs and the information for the tables in the original paper | 1_raw/final/Koordinaten.xlsx, 1_data/3_final/2_clima/df_complete_r33.csv, 1_data/3_final/1_neighborhood/trees_r33.csv | 2_scripts/4_figures/* |
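The 0.33*h subplot computation referenced in scripts 1.0/1.1 can be illustrated with the hedged R sketch below. It is not the authors' implementation: the column names (`tree_id`, `x`, `y`, `dbh`) and the interpretation of `h` as the plot dominant height are assumptions made for illustration only.

```r
# Hypothetical sketch of a fixed-radius (0.33 * h) neighborhood computation.
# Column names and the definition of h (here assumed to be the plot dominant
# height) are illustrative assumptions, not the code of 1.x_neighborhood_*.r.
neighborhood_stats <- function(trees, h) {
  r <- 0.33 * h                                  # subplot radius around each tree
  out <- lapply(seq_len(nrow(trees)), function(i) {
    di <- sqrt((trees$x - trees$x[i])^2 + (trees$y - trees$y[i])^2)
    nb <- trees[di <= r & di > 0, ]              # neighbours inside the subplot
    data.frame(
      tree_id   = trees$tree_id[i],
      n_neigh   = nrow(nb),
      ba_neigh  = sum(pi * (nb$dbh / 200)^2),    # neighbour basal area (m2), dbh in cm
      bal_neigh = sum(pi * (nb$dbh[nb$dbh > trees$dbh[i]] / 200)^2)  # BA of larger trees
    )
  })
  do.call(rbind, out)
}

# Example with made-up coordinates
trees <- data.frame(tree_id = 1:5,
                    x = c(0, 3, 6, 9, 12), y = c(0, 4, 1, 5, 2),
                    dbh = c(22, 35, 18, 40, 27))
neighborhood_stats(trees, h = 28)
```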
-
📊 3_figures: graphs and figures used in the article and additional information:
- final_figures contains the figures used in the original paper
- tmp_figures contains different graphs used to understand the results of each analysis and metric
-
📚 4_bibliography: compilation of all the references used in the article
In order to reproduce the analysis, users must:
-
💾 Data:
- Following the conditions of the original paper (due to the sensitive nature of the data, raw data remain available only upon reasonable request), data must be requested from the authors
- WorldClim data needed to run the simulations must be downloaded from its original website (see the sketch below)
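As an illustrative, hedged example (the original `2_climate_data.r` may retrieve and process WorldClim differently, and the coordinates below are made up), WorldClim layers can be downloaded and extracted at plot locations in R, e.g. with the `geodata` and `terra` packages:

```r
# Hypothetical example of fetching WorldClim layers and extracting values at plot
# coordinates; this is not necessarily the approach used in 2_climate_data.r.
library(geodata)
library(terra)

# Download WorldClim 2.1 bioclimatic variables at 10 arc-min resolution
bio <- geodata::worldclim_global(var = "bio", res = 10, path = tempdir())

# Made-up plot coordinates (lon, lat); real coordinates come from Koordinaten.xlsx
plots <- data.frame(plot_id = c("A", "B"),
                    lon = c(-4.72, -4.50), lat = c(41.65, 41.80))

clim <- terra::extract(bio, plots[, c("lon", "lat")])
cbind(plots, clim)
```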
-
💻 Prerequisites: installation and code: R must be installed to run the code, together with the libraries used across each script (RStudio was also used to develop the code). Some analyses (specifically when training RF models) require high computational power, which can cause out-of-memory errors on a standard computer. Access to high-performance computing services is highly recommended in those cases. A possible package installation is sketched below.
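The definitive library list is declared at the top of each script in `2_code`; as an assumption, a typical installation for the algorithm families named above could look like this:

```r
# Assumed package set: the authoritative list is in the library() calls of each
# script; these are typical R packages for the methods named in the paper.
pkgs <- c("randomForest",  # Random Forest
          "rpart",         # Decision Trees
          "e1071",         # Naive Bayes and Support Vector Machine
          "class",         # K-Nearest Neighbour
          "pROC")          # performance metrics such as AUC
missing <- setdiff(pkgs, rownames(installed.packages()))
if (length(missing) > 0) install.packages(missing)
```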
-
📜 Usage: The previous section details the purpose, inputs and outputs of each script.
A flowchart detailing the training and testing process (scripts from groups 5 and 6) is shown here:
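In addition to the flowchart, the sketch below outlines how a group-5 style run could loop over the case-study datasets and save `metrics.RData` / `models.RData` checkpoints. It is a hedged outline, not the content of `5.0_run_analysis.r`: the column name `dead`, the simple train/test split and the accuracy metric are assumptions.

```r
# Hypothetical outline of a group-5 style run: loop over case-study datasets,
# fit classifiers, and store metrics/models as .RData checkpoints.
# File names, column names and object layout are assumptions, not the real scripts.
library(randomForest)

datasets <- list.files("1_data/3_final/4_datasets", pattern = "\\.csv$",
                       recursive = TRUE, full.names = TRUE)

for (f in datasets) {
  df <- read.csv(f)
  df$dead <- factor(df$dead)                     # assumed name of the mortality flag
  idx   <- sample(nrow(df), 0.7 * nrow(df))      # simple train/test split
  train <- df[idx, ]; test <- df[-idx, ]

  models <- list(
    LR = glm(dead ~ ., data = train, family = binomial),
    RF = randomForest(dead ~ ., data = train, ntree = 500)
  )

  metrics <- sapply(names(models), function(m) {
    pred <- if (m == "LR") {
      factor(as.integer(predict(models[[m]], test, type = "response") > 0.5),
             levels = levels(df$dead))
    } else {
      predict(models[[m]], test)
    }
    mean(pred == test$dead)                      # accuracy as a placeholder metric
  })

  case <- tools::file_path_sans_ext(basename(f))
  out  <- file.path("1_data/3_final/5_analysis", case)
  dir.create(out, recursive = TRUE, showWarnings = FALSE)
  save(metrics, file = file.path(out, "metrics.RData"))
  save(models,  file = file.path(out, "models.RData"))
}
```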
The content of this repository is under the MIT license.
Aitor Vázquez Veloso: ORCID \ UVa
Astor Toraño Caicoya: ORCID \ TUM
Felipe Bravo Oviedo: ORCID \ UVa
Comparing Machine Learning algorithms for assessing individual tree mortality