The data analysis part of the employee attrition project involves exploring the dataset to gain insights into the factors that may be influencing employee attrition. Here's a breakdown of the key steps involved in the data analysis process:
-
Loading the Data :
- The dataset is loaded into a pandas DataFrame from a CSV file using the provided URL.
-
Checking for Null Values:
- The code checks if there are any null values in the dataset and reports that there are no null values present.
-
Descriptive Statistics:
- Descriptive statistics, such as mean, standard deviation, minimum, maximum, etc., are calculated for the numerical columns in the dataset.
- Two columns,
Employ_Count
andStandard_Hours
, have a standard deviation of 0, indicating that they have unique values and may not be useful for analysis.
-
Categorical Columns Analysis:
- Analysis is performed on categorical columns with respect to the target column
Attrition
. - Visualizations, such as count plots, are used to show the distribution of employees across different categories (e.g., Business Travel, Department, Education Field, Gender, Overtime, Job Role) and their relation to attrition.
- Insights are drawn from these visualizations, such as:
- Employees who travel rarely are more likely to leave.
- Research and Development department has the highest attrition rate.
- Employees from the Life Sciences and Medical fields are more likely to leave.
- Males are more likely to quit than females.
- Overtime does not seem to have a significant impact on attrition.
- Certain job roles, such as Laboratory Technician and Sales Executive, have higher attrition rates.
- Analysis is performed on categorical columns with respect to the target column
-
Continuous Data Analysis:
- Analysis is performed on continuous numerical columns with respect to the target column
Attrition
. - Histograms and other visualizations are used to show the distribution of data and its relation to attrition.
- Insights are drawn from these visualizations, such as:
- Employees aged 25 to 35 are more likely to leave.
- Employees living closer to work are more likely to leave.
- Higher monthly income is associated with lower attrition rates.
- Employees who have worked for fewer companies are more likely to leave.
- Higher salary hike percentage is associated with lower attrition rates.
- Analysis is performed on continuous numerical columns with respect to the target column
-
Discrete Data Analysis:
- Analysis is performed on discrete numerical columns with respect to the target column
Attrition
. - Count plots are used to show the distribution of data and its relation to attrition.
- Insights are drawn from these visualizations, such as:
- Higher job satisfaction and environmental satisfaction are associated with lower attrition rates.
- Higher job level is associated with lower attrition rates.
- Higher job involvement is associated with higher attrition rates.
- Analysis is performed on discrete numerical columns with respect to the target column
Overall, the data analysis part provides a comprehensive understanding of the dataset and highlights key factors that may be contributing to employee attrition. These insights can be used to guide further analysis and develop strategies to reduce attrition rates.