This project classifies patients with euthyroid sick syndrome (ESS) using machine learning. The dataset comes from the UCI Machine Learning Repository and contains 3772 instances and 25 attributes. It is imbalanced: 95% of the instances are labeled "negative" and 5% are labeled "positive". The goal is to build a model that classifies patients with ESS with high accuracy.
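Because the classes are imbalanced, accuracy alone can be misleading. The short sketch below illustrates this with a hypothetical baseline that labels every patient "negative". The positive count is only approximated from the 5% figure above, so the exact numbers are an assumption.

```python
# Figures taken from the dataset description above.
n_total = 3772
positive_rate = 0.05

# Approximate class counts (assumption: derived from the rounded 5% figure).
n_positive = round(n_total * positive_rate)
n_negative = n_total - n_positive

# Hypothetical baseline that predicts "negative" for every patient.
true_positives = 0
true_negatives = n_negative
false_negatives = n_positive

baseline_accuracy = (true_positives + true_negatives) / n_total
baseline_recall = true_positives / (true_positives + false_negatives)

print(f"accuracy={baseline_accuracy:.4f}, recall={baseline_recall:.4f}")
```

The baseline reaches roughly 95% accuracy while detecting no ESS patient at all, which is why precision, recall, and F1-score are worth reporting alongside accuracy.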
To run the project, you need:
- Python 3.6+
- pip 3+
- virtualenvwrapper (recommended but not required)

Install the dependencies:

$ pip install -r requirements.txt
Euthyroid is a term used to describe a normal thyroid function. The thyroid is a gland located in the neck that produces hormones that regulate the body's metabolism. These hormones, called triiodothyronine (T3) and thyroxine (T4), help to control the body's energy levels and metabolism, as well as heart rate and body temperature.
A euthyroid state means that the thyroid is functioning normally and producing the appropriate amount of hormones. The levels of T3 and T4 are within the normal range and the thyroid-stimulating hormone (TSH) produced by the pituitary gland is also within the normal range. This is the typical state for most people, and having euthyroid status is important for maintaining overall health and well-being.
However, if the thyroid gland is underactive (hypothyroidism) or overactive (hyperthyroidism), the levels of T3, T4, and TSH change, which can cause symptoms such as fatigue, weight gain or loss, and changes in heart rate, among many others. Hypothyroidism is usually treated with hormone replacement therapy, while hyperthyroidism is typically treated with antithyroid medication, radioactive iodine, or surgery.
- Levothyroxine (T4/T4U)
- Triiodothyronine (T3)
- Total T4 (TT4)
- Free T4 Index (FTI)
- Thyroid Stimulating Hormone (TSH)
We used the attributes above to build the model. They were chosen because they are the most important attributes in the dataset. Moreover, all of them can be measured with a blood test.
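As a minimal sketch of how these five measurements could be extracted from a CSV export, the example below keeps only those columns. The column names, the `?` missing-value marker, and the sample rows are assumptions made for illustration. The rows are synthetic and not taken from the dataset.

```python
import csv
import io

# Assumed column names for the five lab measurements; adjust to the real header.
FEATURES = ["T3", "TT4", "T4U", "FTI", "TSH"]

# Synthetic rows for illustration only (not real patient data).
SAMPLE_CSV = """age,sex,TSH,T3,TT4,T4U,FTI,label
45,F,1.3,2.0,110,1.05,105,negative
72,M,0.9,0.4,60,0.80,75,positive
60,F,?,1.8,95,?,90,negative
"""


def parse_value(raw):
    """Convert a raw cell to float, mapping the assumed '?' marker to None."""
    return None if raw.strip() == "?" else float(raw)


def load_features(text):
    """Return (feature_rows, labels), keeping only the FEATURES columns."""
    rows, labels = [], []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append([parse_value(record[name]) for name in FEATURES])
        labels.append(record["label"])
    return rows, labels


rows, labels = load_features(SAMPLE_CSV)
for row, label in zip(rows, labels):
    print(label, row)
```

Missing measurements are kept as `None` here. How they are handled afterwards (dropping rows or imputing values) is a separate choice that this sketch does not make.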
Cavalcante, Caio, Vinicius Almeida, Marcos Barros, Nathalee Lima, and Rosana Rego. "Thyroid Syndrome Detection using Machine Learning Algorithms: A Comparative Analysis." In: Congresso Brasileiro de Inteligência Computacional, 2024. Anais do XVI Congresso Brasileiro de Inteligência Computacional. p. 1.
Rego, R. C. B., Vinicius A. Almeida, Caio M. V. Cavalcante, and Nathalee C. A. Lima. "Diagnostic Support System for Euthyroid Sick Syndrome based on Machine Learning Algorithms Approaches." In: International Conference on Intelligent Systems and New Applications, 2023, Liverpool. ICISNA 23 Proceedings Book. Liverpool, 2023. v. 1. pp. 259-264.
Almeida, Vinicius A., Caio M. V. Cavalcante, Nathalee C. A. Lima, and Rosana C. B. Rego. "Classificação da Síndrome do Doente Eutireoideo com Algoritmos de Machine Learning: Uma Aplicação de Suporte ao Diagnóstico." In: IV Congresso Brasileiro Interdisciplinar em Ciência e Tecnologia, 2023. Anais do Congresso Brasileiro Interdisciplinar em Ciência e Tecnologia, 2023.
Almeida, Vinicius A., and Rosana C. B. Rego. "Síndrome do Doente Eutireoidiano: Análise de Indicadores Importantes com Machine Learning." In: VI Encontro De Computação Do Oeste Potiguar (ECOP), 2023, Pau dos Ferros. Anais Do Encontro De Computação Do Oeste Potiguar, 2023. v. 1.
We compared four machine learning approaches for classifying patients with ESS:
- Naive Bayes
- Logistic Regression
- Decision Tree
- Random Forest
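The comparison of these four approaches can be sketched as follows. This is an illustration under stated assumptions, not the project's training code: it assumes scikit-learn is available, uses synthetic data from `make_classification` in place of the UCI dataset, and picks default-like hyperparameters that the source does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 5 features, roughly 95% / 5% class split (assumption).
X, y = make_classification(
    n_samples=2000, n_features=5, n_informative=4, n_redundant=1,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hyperparameters below are illustrative choices, not the project's settings.
models = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }

for name, metrics in results.items():
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

The split is stratified so that the rare positive class appears in both the training and test sets. Scores on this synthetic data will not match the project's reported results.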
The results are shown in the table below:
Approach | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
Naive Bayes | 0.8493 | 0.7963 | 0.9285 | 0.8573 |
Logistic Regression | 0.9198 | 0.9063 | 0.9321 | 0.9190 |
Decision Tree | 0.9817 | 0.9719 | 0.9911 | 0.9814 |
Random Forest | 0.9834 | 0.9839 | 0.9821 | 0.9830 |
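
The F1-Score column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick consistency check, the sketch below recomputes it from the precision and recall values in the table above. The function and variable names are illustrative.

```python
def f1_from(precision, recall):
    """Return the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "Naive Bayes": (0.7963, 0.9285, 0.8573),
    "Logistic Regression": (0.9063, 0.9321, 0.9190),
    "Decision Tree": (0.9719, 0.9911, 0.9814),
    "Random Forest": (0.9839, 0.9821, 0.9830),
}

recomputed = {
    name: f1_from(precision, recall)
    for name, (precision, recall, _) in reported.items()
}

for name, (_, _, reported_f1) in reported.items():
    print(f"{name}: recomputed={recomputed[name]:.4f}, reported={reported_f1:.4f}")
```

For all four rows the recomputed value agrees with the reported F1-score to four decimal places.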
We compared five machine learning approaches for classifying patients with ESS:
- Logistic Regression
- Random Forest
- LightGBM
- XGBoost
- Stack Ensemble based on Random Forest and XGBoost
The results are shown in the table below:
Approach | Accuracy | Recall | Precision | F1-Score |
---|---|---|---|---|
Logistic Regression | 91.98% | 93.21% | 90.62% | 91.90% |
Random Forest | 98.34% | 98.21% | 98.38% | 98.30% |
LightGBM | 97.64% | 97.32% | 97.64% | 97.58% |
XGBoost | 98.60% | 98.77% | 98.57% | 98.57% |
Stack Ensemble | 98.78% | 98.75% | 98.75% | 98.75% |
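
A stack ensemble trains a meta-model on the predictions of its base models. The sketch below shows the general idea with scikit-learn's `StackingClassifier`, under several assumptions: scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost to avoid an extra dependency, the meta-model is a logistic regression (the source does not name one), and the data is synthetic rather than the UCI dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a roughly 95% / 5% class split (assumption).
X, y = make_classification(
    n_samples=2000, n_features=5, n_informative=4, n_redundant=1,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base models: Random Forest plus gradient boosting (stand-in for XGBoost).
base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# The meta-model is fitted on cross-validated predictions of the base models.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

stack_f1 = f1_score(y_test, stack.predict(X_test), zero_division=0)
print(f"stack ensemble F1 on synthetic data: {stack_f1:.4f}")
```

Using cross-validated predictions (`cv=5`) keeps the meta-model from being trained on outputs the base models produced for data they had already seen.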
👤 Vinicius Almeida: vinicius45anacleto@gmail.com
👤 Caio Moisés: caio.cavalcante@alunos.ufersa.edu.br
👤 Rosana Rego