This project performs Exploratory Data Analysis (EDA) on the AMCAT dataset to understand its features and how they relate to the target variable, Salary. It also uses hypothesis testing to examine two specific claims: the relationship between education and earning potential, and the claim made in a Times of India article about the earning potential of Computer Science Engineering graduates.
- EDA on AMCAT Dataset: An open-ended EDA is conducted to explore the AMCAT dataset, analyzing various features and their distributions, correlations, and relationships with the target variable, Salary.
- Exploration of Gender and Specialization Preferences: The relationship between gender and specialization preferences among individuals in the AMCAT dataset is explored to understand potential patterns and trends.
- Hypothesis Testing: Statistical tests are applied to validate hypotheses and investigate specific claims, including the earning potential of Computer Science Engineering graduates and the relationship between education and earning potential.
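The two kinds of test described above can be sketched as follows. This is a minimal illustration on synthetic data, not the notebooks' actual code: the column names (`Gender`, `Specialization`, `Salary`), the category labels, and the claimed mean salary are all assumptions chosen for the example. It uses a chi-square test of independence for gender versus specialization and a one-sample t-test for a salary claim; the notebooks may use different tests.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the real data. Column names, categories, and
# salary values are assumptions for illustration only.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "Gender": rng.choice(["m", "f"], size=n, p=[0.75, 0.25]),
    "Specialization": rng.choice(["CS", "ECE", "IT", "Mechanical"], size=n),
    "Salary": rng.normal(loc=300_000, scale=80_000, size=n).round(-3),
})


def gender_specialization_test(data):
    """Chi-square test of independence between Gender and Specialization."""
    # Contingency table of counts: rows are genders, columns are specializations.
    table = pd.crosstab(data["Gender"], data["Specialization"])
    chi2, p_value, dof, _expected = stats.chi2_contingency(table)
    return chi2, p_value, dof


def salary_claim_test(data, specialization, claimed_mean):
    """One-sample t-test: does the group's mean salary differ from claimed_mean?"""
    salaries = data.loc[data["Specialization"] == specialization, "Salary"]
    result = stats.ttest_1samp(salaries, popmean=claimed_mean)
    return result.statistic, result.pvalue


chi2, p_indep, dof = gender_specialization_test(df)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_indep:.3f}")

# 275_000 is a placeholder value, not the figure from the article.
t_stat, p_claim = salary_claim_test(df, "CS", claimed_mean=275_000)
print(f"t={t_stat:.2f}, p={p_claim:.3f}")
```

A small p-value in the first test would suggest that specialization choice is not independent of gender; a small p-value in the second would suggest that the group's mean salary differs from the claimed value. With the real dataset, replace the synthetic `df` with the loaded data and adjust the column names to match.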
This repository contains the following files:
- `dataset/`: Directory containing the AMEO Dataset.
- `notebooks/`: Directory containing Jupyter notebooks for exploratory data analysis.
- `reports/`: Directory containing the report files.
- `README.md`: This file, providing an overview of the project and instructions.
- `requirements.txt`: File listing all required dependencies for the project.
To replicate the analysis:
- Clone this repository to your local machine.
- Navigate to the project directory.
- Install the required dependencies using `pip install -r requirements.txt`.
- Run the Jupyter notebooks in the `notebooks/` directory to perform data analysis and extract information.
- Review the results and conclusions drawn from the EDA and hypothesis testing.
The EDA and hypothesis tests show how the features of the AMCAT dataset relate to salary. The analysis describes gender and specialization preferences and tests specific claims about earning potential.
This project is licensed under the MIT License.