This is the code for our final research project in Emory University's CS334 Machine Learning course
-
We applied several preprocessing techniques and machine learning models, including linear regression, random forest, and XGBoost, to identify the factors that contribute to human life expectancy, analyzing developing and developed countries separately
-
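The idea of analyzing the two groups of countries separately can be illustrated with a minimal sketch. This is not the code from main.ipynb: the data below is synthetic, the feature names (`schooling`, `vaccination_rate`, `gdp_per_capita`) and the helper `rank_factors` are hypothetical, and plain ordinary least squares via NumPy stands in for the models used in the project. It only shows the general pattern of splitting by developmental status, fitting one model per group, and comparing which factors carry the most weight.

```python
import numpy as np


def rank_factors(X, y, names):
    """Fit ordinary least squares on standardized features and rank
    the features by the absolute size of their coefficients."""
    # Standardize so that coefficient magnitudes are comparable.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Prepend a column of ones for the intercept term.
    A = np.column_stack([np.ones(len(Xs)), Xs])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    weights = coef[1:]  # drop the intercept
    order = np.argsort(-np.abs(weights))
    return [(names[i], float(weights[i])) for i in order]


# Synthetic stand-in data (assumption: NOT the Kaggle dataset).
rng = np.random.default_rng(0)
names = ["schooling", "vaccination_rate", "gdp_per_capita"]  # hypothetical
n = 200
X = rng.normal(size=(n, 3))
developed = rng.random(n) < 0.3  # boolean mask for developmental status

# The synthetic target deliberately uses different drivers per group.
y = np.where(
    developed,
    80 + 0.5 * X[:, 0] + 0.2 * X[:, 1] + 2.0 * X[:, 2],
    65 + 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.3 * X[:, 2],
)
y = y + rng.normal(scale=0.1, size=n)

# Fit one model per group and compare the rankings.
rankings = {}
for label, mask in [("developed", developed), ("developing", ~developed)]:
    rankings[label] = rank_factors(X[mask], y[mask], names)
    print(label, rankings[label])
```

Fitting the groups separately lets each one have its own ranking of factors, which a single pooled model would average away. The same loop could wrap a random forest or XGBoost model and read its feature importances instead of linear coefficients.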
All the code can be found in main.ipynb
-
The data was retrieved from Kaggle and can also be found in data
-
A paper summarizing our work can be found here
Life expectancy is the number of years a person can be expected to live and is often used as an indicator of citizens’ health. A country’s average life expectancy is believed to be determined by a collection of social, political, and public-health factors, such as education level and vaccination rate. As a result, a discrepancy is typically observed between developing and developed countries. Previous research has applied machine learning methods to identify crucial contributors to average lifespan, but it is limited in its examination of the distribution of those contributing factors and places inadequate emphasis on the role of developmental status. Our research uses four machine learning models (linear regression with and without regularization, random forest, and XGBoost) to address these gaps and to provide a more straightforward and interpretable view of the factors that affect average lifespan, treating developmental status as a key characteristic in life expectancy research.
-
Yunjie(Ruby) Wu @yunjiewu777
-
Xinran(Alexandra) Li @shinrannli