The only requirements for running this code are Python 3.x and the standard data science libraries: Jupyter, NumPy, pandas, Matplotlib, scikit-learn, and Seaborn.
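To confirm the environment is ready before opening the notebook, a quick import check along these lines can help. This is only a sketch: the helper name `missing_packages` is made up for illustration, and the list assumes the usual import names (scikit-learn imports as `sklearn`).

```python
import importlib

# Import names for the libraries listed above.
# Assumption: these are the standard import names; scikit-learn imports as "sklearn".
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn", "seaborn"]


def missing_packages(names):
    """Return the names from `names` that cannot be imported (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling `missing_packages(REQUIRED)` returns an empty list when everything is installed; any names it returns need to be installed before running the notebook.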
I used the Kaggle dataset about Student Performance in Math over the course of 3 years to better understand the following:
- What are the primary environmental factors that affect math performance in this dataset?
- What are the primary controllable factors that affect math performance?
- Which factors had the largest effect on final grades?
This project includes the original CSV data from Kaggle, a text file describing it, and a Jupyter notebook that explores each of the questions posed above.
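As a rough illustration of how the questions above could be approached, the sketch below ranks numeric columns by their correlation with a final-grade column. It is not necessarily the method used in the notebook. The function name `rank_numeric_factors`, the column names, and the toy values are all hypothetical and stand in for the real Kaggle columns. Correlation only shows association, so the ranking is a starting point rather than evidence of cause.

```python
import pandas as pd


def rank_numeric_factors(df, target):
    """Rank numeric columns by absolute Pearson correlation with `target`.

    Hypothetical helper; non-numeric columns are ignored.
    """
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Order by strength of association, but keep the sign in the result.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Toy frame with made-up values, only to show the call; this is NOT the Kaggle data.
toy = pd.DataFrame({
    "final_grade": [10, 12, 14, 16, 18],
    "study_hours": [1, 2, 3, 4, 5],
    "absences": [9, 4, 6, 2, 3],
    "school": ["A", "B", "A", "B", "A"],
})
ranking = rank_numeric_factors(toy, "final_grade")
print(ranking)
```

With the real data, the same call would take the DataFrame loaded from the CSV and the name of the final-grade column. Categorical columns such as `school` would need encoding first, which this sketch leaves out.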
The conclusions can be found in the post here.
Full credit goes to Kaggle for the dataset, which is available here. This was done as part of Udacity's Data Scientist Nanodegree.