You work as an analyst in a company. The head of HR has given you three datasets. The first two contain information about employees' performance in offices A and B: how much they work, their salaries, the number of their projects, their departments, and so on. The third is an extensive dataset with information on employees' satisfaction with their jobs, their latest evaluation scores, and their current status in the company. Your task is to analyze the data and answer some of HR's questions.
Conduct data analysis on a case that resembles the actual tasks a data analyst may encounter on the job. Practice merging, grouping, and aggregating data, and build pivot tables with pandas.
Stage 1: Learn how to load data from the XML format, explore it, and reindex it properly.
Stage 2: Practice merging several datasets into a single one.
Stage 3: Master the pandas methods for extracting insights from the data.
Stage 4: Aggregate pandas DataFrames to quickly compute metrics such as the mean or standard deviation across columns.
Stage 5: Explore how to generate pivot tables with pandas in order to summarize data.
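The merging, aggregation, and pivot-table stages can be sketched roughly as follows. This is a minimal illustration, not the code in code.py: the rows are made up, only the column names come from the project description, and the inner join on the index and the chosen metrics are assumptions.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the project description.
office = pd.DataFrame(
    {
        "Department": ["sales", "sales", "IT", "IT"],
        "salary": ["low", "high", "low", "high"],
        "number_project": [3, 5, 4, 6],
        "average_monthly_hours": [150, 210, 180, 240],
    },
    index=pd.Index(["A1", "A2", "B1", "B2"], name="employee_id"),
)
hr = pd.DataFrame(
    {"left": [0, 1, 0, 1], "satisfaction_level": [0.8, 0.3, 0.7, 0.4]},
    index=pd.Index(["A1", "A2", "B1", "B3"], name="employee_id"),
)

# Merging: an inner join on the index keeps only employees present in both frames
# (assumed here; B2 and B3 are dropped).
merged = office.merge(hr, left_index=True, right_index=True, how="inner")

# Aggregation: several metrics per group in one call, using named aggregation.
by_left = merged.groupby("left").agg(
    mean_hours=("average_monthly_hours", "mean"),
    median_projects=("number_project", "median"),
)

# Pivot table: median monthly hours by department (rows) and salary (columns).
pivot = merged.pivot_table(
    index="Department",
    columns="salary",
    values="average_monthly_hours",
    aggfunc="median",
)

print(merged)
print(by_left)
print(pivot)
```

Combinations with no matching rows (here, IT with a high salary) show up as NaN in the pivot table.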
To learn more about this project, please visit the HyperSkill website: HR Data Analyst.
HyperSkill labels this project's difficulty as Hard. It describes its four difficulty levels as follows:
- Easy Projects - if you're just starting
- Medium Projects - to build upon the basics
- Hard Projects - to practice all the basic concepts and learn new ones
- Challenging Projects - to perfect your knowledge with challenging tasks
This repository contains one .py file and one folder:
- code.py - contains the code used to complete the data analysis requirements
- Data repository - holds the three .xml data files: A_office_data.xml, B_office_data.xml, and hr_data.xml
The project was built using Python 3.11.3.
Columns in A_office_data.xml and B_office_data.xml:
- number_project: number of projects an employee has worked on
- average_monthly_hours: typical workload per month in hours
- time_spend_company: how many years an employee has worked in the company
- Work_accident: whether an employee has had an injury at work
- promotion_last_5years: whether an employee has had any promotions during the last five years
- Department: employee's department
- salary: employee's salary rate
- employee_office_id: employee's ID within the office (1, 2, 3, etc.)
Columns in hr_data.xml:
- satisfaction_level: how satisfied an employee is with their job
- last_evaluation: the last evaluation score of an employee
- left: whether an employee has left the company
- employee_id: employee's ID in the company (for example, A125 is an employee from the A office, where 125 is the employee_office_id)
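To illustrate how the office-level ID relates to the company-wide ID, here is a small sketch that loads XML with pandas and rebuilds the index. The inline XML and its element names (`data`, `row`) are hypothetical stand-ins for A_office_data.xml, and the approach is an assumption rather than the exact code in code.py.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-row sample standing in for A_office_data.xml;
# the real file's element names may differ.
A_XML = """<data>
  <row><employee_office_id>125</employee_office_id><Department>sales</Department><salary>low</salary></row>
  <row><employee_office_id>7</employee_office_id><Department>IT</Department><salary>high</salary></row>
</data>"""

# parser="etree" relies on the standard library; pandas defaults to lxml.
a_df = pd.read_xml(StringIO(A_XML), parser="etree")

# Prefix the office letter to the office-level ID: 125 becomes "A125".
a_df.index = ["A" + str(i) for i in a_df["employee_office_id"]]
a_df.index.name = "employee_id"

print(a_df.index.tolist())  # ['A125', 'A7']
```

With the index in this form, the office data lines up with the employee_id values used in hr_data.xml.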
Download the files to a local directory, open the project in the IDE of your choice, and run it. The data frames and their dictionary forms are printed to the console as required by each stage's docstring. Please read each stage's docstring for the requirements.