This project predicts salaries from job-related features using a Random Forest Regressor model trained on a job dataset.
The dataset describes each job by title, category, experience level, employment type, and related attributes.
The dataset includes columns such as:
- work_year: Year of data record
- job_title: Title of the job
- job_category: Category of the job
- salary_currency: Currency used for salary
- salary: Salary in the local currency
- salary_in_usd: Salary converted to USD
- experience_level: Level of experience (Junior, Mid-level, Senior, etc.)
- employment_type: Type of employment
- company_location: Location of the company
The analysis covers the following steps:
- Load the job dataset from a CSV file.
- Clean the data by handling missing values and encoding categorical features using one-hot encoding.
- Split the data into features (X) and the target variable (y).
- Split the data into training and testing sets (80% train, 20% test).
- Utilize a Random Forest Regressor model for predicting salaries.
- Train the model using the training data.
- Evaluate model performance using R-squared and Mean Squared Error (MSE).
- Visualize temporal trends in salaries using line plots.
- Analyze salary distributions across job categories with box plots.
- Display scatter plots comparing actual vs. predicted salaries.
- Create histograms to illustrate the distribution of salaries.
The project uses:
- Python
- Pandas for data manipulation
- Matplotlib for data visualization
- Scikit-learn for machine learning modeling
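The preparation, training, and evaluation steps described earlier can be sketched with Pandas and Scikit-learn roughly as follows. This is a minimal illustration, not the repository's actual code. Several details are assumptions: `salary_in_usd` is used as the target, `salary` and `salary_currency` are dropped from the features because they would leak the target, missing values are handled by dropping rows, and `make_demo_frame` generates made-up rows so the sketch runs without the real CSV file.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

TARGET = "salary_in_usd"  # assumption: the USD salary is the prediction target


def make_demo_frame(n=50, seed=0):
    """Build a small synthetic frame with the documented columns (made-up values)."""
    rng = np.random.default_rng(seed)
    usd = rng.integers(50_000, 150_000, size=n)
    return pd.DataFrame({
        "work_year": rng.choice([2021, 2022, 2023], size=n),
        "job_title": rng.choice(["Data Analyst", "Data Engineer"], size=n),
        "job_category": rng.choice(["Analysis", "Engineering"], size=n),
        "salary_currency": ["USD"] * n,
        "salary": usd,
        "salary_in_usd": usd,
        "experience_level": rng.choice(["Junior", "Mid-level", "Senior"], size=n),
        "employment_type": rng.choice(["Full-time", "Contract"], size=n),
        "company_location": rng.choice(["US", "DE", "IN"], size=n),
    })


def prepare_features(df):
    """Drop rows with missing values and one-hot encode the categorical columns."""
    df = df.dropna()  # assumption: missing values are handled by dropping rows
    # Remove the target and the columns that directly reveal it.
    X = df.drop(columns=[TARGET, "salary", "salary_currency"], errors="ignore")
    X = pd.get_dummies(X)  # one-hot encoding of string columns
    y = df[TARGET]
    return X, y


def train_and_evaluate(df, random_state=42):
    """Split 80/20, fit a Random Forest Regressor, and report R-squared and MSE."""
    X, y = prepare_features(df)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    model = RandomForestRegressor(n_estimators=100, random_state=random_state)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    metrics = {
        "r2": r2_score(y_test, y_pred),
        "mse": mean_squared_error(y_test, y_pred),
    }
    return model, metrics, (X_train, X_test)


if __name__ == "__main__":
    # With the real data, replace the demo frame with pd.read_csv(<path to the CSV>).
    model, metrics, _ = train_and_evaluate(make_demo_frame())
    print(metrics)
```

On the synthetic rows the scores are meaningless, since the salaries are random; the sketch only shows how the steps fit together.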
To run the project:
- Clone or download this repository to your local machine.
- Ensure you have Python installed along with the necessary libraries (Pandas, Matplotlib, Scikit-learn).
- Run the provided code in your Python environment to perform the salary prediction analysis.
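Before running the analysis, it can help to confirm that the libraries are importable. The snippet below is a small optional check, not part of the project itself; the helper name `find_missing_modules` is hypothetical, and note that Scikit-learn is imported under the name `sklearn`.

```python
from importlib import import_module

# Import names for the libraries the project relies on.
REQUIRED_MODULES = ["pandas", "matplotlib", "sklearn"]


def find_missing_modules(module_names=REQUIRED_MODULES):
    """Return the module names that cannot be imported in this environment."""
    missing = []
    for name in module_names:
        try:
            import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_modules()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are importable.")
```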