Skip to content

Latest commit

 

History

History
42 lines (35 loc) · 3.11 KB

Data Analysis Process.md

File metadata and controls

42 lines (35 loc) · 3.11 KB

The data analysis part of the employee attrition project involves exploring the dataset to gain insights into the factors that may be influencing employee attrition. Here's a breakdown of the key steps involved in the data analysis process:

  1. Loading the Data :

    • The dataset is loaded into a pandas DataFrame from a CSV file using the provided URL.
  2. Checking for Null Values:

    • The code checks if there are any null values in the dataset and reports that there are no null values present.
  3. Descriptive Statistics:

    • Descriptive statistics, such as mean, standard deviation, minimum, maximum, etc., are calculated for the numerical columns in the dataset.
    • Two columns, Employ_Count and Standard_Hours, have a standard deviation of 0, indicating that they have unique values and may not be useful for analysis.
  4. Categorical Columns Analysis:

    • Analysis is performed on categorical columns with respect to the target column Attrition.
    • Visualizations, such as count plots, are used to show the distribution of employees across different categories (e.g., Business Travel, Department, Education Field, Gender, Overtime, Job Role) and their relation to attrition.
    • Insights are drawn from these visualizations, such as:
      • Employees who travel rarely are more likely to leave.
      • Research and Development department has the highest attrition rate.
      • Employees from the Life Sciences and Medical fields are more likely to leave.
      • Males are more likely to quit than females.
      • Overtime does not seem to have a significant impact on attrition.
      • Certain job roles, such as Laboratory Technician and Sales Executive, have higher attrition rates.
  5. Continuous Data Analysis:

    • Analysis is performed on continuous numerical columns with respect to the target column Attrition.
    • Histograms and other visualizations are used to show the distribution of data and its relation to attrition.
    • Insights are drawn from these visualizations, such as:
      • Employees aged 25 to 35 are more likely to leave.
      • Employees living closer to work are more likely to leave.
      • Higher monthly income is associated with lower attrition rates.
      • Employees who have worked for fewer companies are more likely to leave.
      • Higher salary hike percentage is associated with lower attrition rates.
  6. Discrete Data Analysis:

    • Analysis is performed on discrete numerical columns with respect to the target column Attrition.
    • Count plots are used to show the distribution of data and its relation to attrition.
    • Insights are drawn from these visualizations, such as:
      • Higher job satisfaction and environmental satisfaction are associated with lower attrition rates.
      • Higher job level is associated with lower attrition rates.
      • Higher job involvement is associated with higher attrition rates.

Overall, the data analysis part provides a comprehensive understanding of the dataset and highlights key factors that may be contributing to employee attrition. These insights can be used to guide further analysis and develop strategies to reduce attrition rates.