This project is part of the Data Analytics course (UE18CS312) at the Department of Computer Science, PES University, Electronic City Campus. We aim to predict the income class of a person from demographic attributes such as age, education, race, gender, and marital status. Our best testing score was 0.864524, obtained with the CatBoost algorithm.
UCI Machine Learning US Census
This dataset was extracted from the 1994 US Census database.
We used both R and Python for this project: R for the exploratory data analysis and Python for building the models. Splitting the work between the two languages this way is a common workflow.
To clone the code to your local system run -
git clone https://github.com/vishnureddys/income-prediction
After cloning, you can open the notebook in Jupyter Notebook with Anaconda or Miniconda. If you have the ipynb extension for VS Code, you can use that as well. Make sure the following packages are installed, and install any that are missing.
catboost
scikit-learn
matplotlib
numpy
pandas
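As a quick way to confirm the packages above are available before opening the notebook, here is a minimal sketch that uses only the Python standard library. The helper name `missing_packages` and the `REQUIRED` mapping are illustrative and are not part of this repository. The only subtlety is that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Maps the pip package name to the name used in "import ...".
# scikit-learn is the one package here whose two names differ.
REQUIRED = {
    "catboost": "catboost",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "pandas": "pandas",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for pip_name, import_name in required.items()
        if find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Package names are separated by spaces, not commas.
        print("Missing packages. Install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```

Run this with the same Python interpreter (or conda environment) that Jupyter uses, otherwise the result may not reflect what the notebook sees.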
To perform data cleaning use -
pip install pandas numpy
python cleaning.py
To run the Exploratory Data Analysis (EDA), you will need an R kernel installed for Jupyter Notebook, which can be set up from the shell or the Anaconda console.
In Jupyter, click Run to execute the code. The notebook is interactive, so no additional commands are needed. For more information on how to use Jupyter, please refer to this article.
- Pranav L Nambiar
- Vishnu S Reddy
- P Varshith