This project analyzes demographic data using Pandas to provide insights into various aspects such as race distribution, education levels, work hours, and income. The analysis includes calculating the average age of men, the percentage of people with Bachelor's degrees, and the percentage of people earning more than 50K based on their education level.
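As a rough illustration of the three calculations named above, the sketch below runs them on a tiny in-memory sample. The column names (`age`, `sex`, `education`, `salary`) and the sample rows are assumptions made for this example; the actual script and `adult_data.csv` may use different names and values.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are assumptions,
# not taken from the real dataset.
df = pd.DataFrame({
    "age": [39, 50, 38, 28, 37, 52],
    "sex": ["Male", "Male", "Male", "Female", "Female", "Male"],
    "education": ["Bachelors", "Bachelors", "HS-grad", "Bachelors", "Masters", "HS-grad"],
    "salary": ["<=50K", ">50K", "<=50K", "<=50K", ">50K", ">50K"],
})

# Average age of men.
average_age_men = df.loc[df["sex"] == "Male", "age"].mean()

# Percentage of people whose education level is "Bachelors".
percentage_bachelors = (df["education"] == "Bachelors").mean() * 100

# Percentage of people earning more than 50K within each education level.
is_rich = df["salary"] == ">50K"
rich_by_education = is_rich.groupby(df["education"]).mean() * 100

print(average_age_men)
print(percentage_bachelors)
print(rich_by_education.round(1))
```

Taking the mean of a boolean column gives the fraction of `True` values, which is why each percentage is a single expression.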
The project is organized into the following directories:
demographic-data-analysis/
├── data/
│ ├── adult_data.csv
├── scripts/
│ ├── demographics_analysis.py
├── results/
│ ├── analysis_results.txt
├── figures/
│ ├── race_distribution.png
│ ├── education_income.png
├── docs/
│ ├── data-description.md
│ ├── methodology.md
│ ├── results.md
├── requirements.txt
├── README.md
- data/: This directory contains the dataset file.
  - adult_data.csv: The dataset used for the analysis.
- scripts/: This directory contains the script for data analysis.
  - demographics_analysis.py: The main script that performs the demographic data analysis.
- results/: This directory contains the results of the analysis.
  - analysis_results.txt: A text file with the results of the demographic analysis.
- figures/: This directory contains the generated plots and visualizations.
  - race_distribution.png: Plot of race distribution.
  - education_income.png: Plot of education vs. income.
- docs/: This directory contains documentation files.
  - data-description.md: Detailed description of the dataset.
  - methodology.md: Methodology of the analysis.
  - results.md: Results of the analysis.
- requirements.txt: This file contains the necessary Python libraries for the project.
- README.md: The readme file for the project.
To set up the project on your local machine, follow these steps:
- Clone the repository:

      git clone <repository_url>
      cd demographic-data-analysis

- Install dependencies:

      pip install -r requirements.txt

- Place the dataset: Ensure the dataset adult_data.csv is located in the data directory.
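Before running the analysis, a quick check can confirm that the dataset is where the script expects it. This is a minimal sketch; the helper name `dataset_is_in_place` is hypothetical and not part of the project.

```python
from pathlib import Path


def dataset_is_in_place(project_root="."):
    """Return True if data/adult_data.csv exists under project_root.

    Hypothetical helper for illustration; not part of the project's script.
    """
    return (Path(project_root) / "data" / "adult_data.csv").is_file()


if dataset_is_in_place():
    print("Found data/adult_data.csv")
else:
    print("Missing data/adult_data.csv; copy the dataset into data/ first")
```

Run it from the repository root so that the relative path resolves against the project directory.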
To run the analysis, use the following command:
python scripts/demographics_analysis.py
This analysis offers valuable insights into demographic factors influencing income. It highlights the importance of education and work hours in determining income levels and provides a comprehensive view of demographic distributions.
- Extend the analysis to include more demographic factors.
- Explore the impact of additional variables such as marital status and occupation type on income.
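One possible starting point for these extensions is a small helper that reports the share of high earners within each value of a chosen column. The sketch below assumes columns named `marital-status`, `occupation`, and `salary`, and uses made-up rows; the helper `high_income_share` is hypothetical and the real dataset may be labeled differently.

```python
import pandas as pd

# Hypothetical rows; the column names and values are assumptions for illustration.
df = pd.DataFrame({
    "marital-status": ["Married", "Married", "Never-married", "Never-married", "Divorced"],
    "occupation": ["Sales", "Tech-support", "Sales", "Sales", "Tech-support"],
    "salary": [">50K", ">50K", "<=50K", ">50K", "<=50K"],
})


def high_income_share(frame, column):
    """Percentage of rows earning >50K within each value of `column`."""
    return (frame["salary"] == ">50K").groupby(frame[column]).mean() * 100


by_marital_status = high_income_share(df, "marital-status")
by_occupation = high_income_share(df, "occupation")

print(by_marital_status.round(1))
print(by_occupation.round(1))
```

Because the helper takes the column name as a parameter, the same function can be reused for any other categorical variable added to the analysis later.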
Contributions are welcome! Please fork the repository and submit pull requests with any improvements or additional features.
This project is licensed under the MIT License - see the LICENSE.md file for details.
The conversation continues on Kaggle