
omotuno/baseball_salary_prediction


Baseball Salary Prediction

This repository contains the complete code and analysis for the Baseball Salary Prediction project. The project, developed in RStudio, uses regression modeling to predict the annual salaries of Major League Baseball players from the Hitters dataset in the ISLR package. The dataset includes 18 variables for players from the 1986 and 1987 MLB seasons.

The workflow includes:

-- Fitting a full multiple linear regression model
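As a minimal sketch of this first step, the full model can be fit with base R's `lm()`. This assumes the ISLR package is installed and that rows with a missing `Salary` are dropped first. The object names `hitters` and `full_fit` are illustrative and not taken from the project code.

```r
# Assumes: install.packages("ISLR") has been run
library(ISLR)

# Salary is missing for some players; drop those rows before modeling
hitters <- na.omit(Hitters)

# Full model: Salary regressed on every other variable in the data set
full_fit <- lm(Salary ~ ., data = hitters)
summary(full_fit)
```

The `Salary ~ .` formula is shorthand for "use all remaining columns as predictors", which makes it a convenient baseline before any variable selection.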

-- Checking model assumptions such as linearity, normality, and constant variance
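One common way to check these assumptions is with the built-in diagnostic plots for `lm` objects plus a Shapiro-Wilk test on the residuals. The sketch below is an assumption about how such checks might look, not a reproduction of the project's own code.

```r
library(ISLR)

hitters <- na.omit(Hitters)
full_fit <- lm(Salary ~ ., data = hitters)

# Four standard diagnostic plots in a 2x2 grid:
# residuals vs fitted (linearity), normal Q-Q (normality),
# scale-location (constant variance), residuals vs leverage (influence)
op <- par(mfrow = c(2, 2))
plot(full_fit)
par(op)

# Numeric check on residual normality; a small p-value suggests non-normality
sw <- shapiro.test(residuals(full_fit))
sw
```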


-- Addressing issues such as multicollinearity through variable selection
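The sketch below shows one way to quantify multicollinearity and reduce the model. Variance inflation factors are computed in base R from the inverse of the predictor correlation matrix, and backward stepwise selection by AIC stands in for whichever selection procedure the project actually used, which the README does not specify.

```r
library(ISLR)

hitters <- na.omit(Hitters)
full_fit <- lm(Salary ~ ., data = hitters)

# Design matrix without the intercept column (factors become 0/1 dummies)
X <- model.matrix(full_fit)[, -1]

# VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal entry
# of the inverse of the predictor correlation matrix
vif <- diag(solve(cor(X)))
sort(vif, decreasing = TRUE)

# Backward elimination by AIC (an assumed selection method)
reduced_fit <- step(full_fit, direction = "backward", trace = 0)
formula(reduced_fit)
```

VIF values well above 10 are a conventional warning sign. In this data set the career totals (for example `CAtBat` and `CHits`) are strongly correlated with each other, so large values for those columns are expected.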

-- Applying transformations to satisfy linearity assumptions
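A Box-Cox search is one standard way to choose a power transformation for a positive response. The sketch below uses `MASS::boxcox()` (MASS ships with standard R installations). The lambda grid and the object names are assumptions, and the README does not state which power the project settled on.

```r
library(ISLR)
library(MASS)

hitters <- na.omit(Hitters)
full_fit <- lm(Salary ~ ., data = hitters)

# Profile log-likelihood over a grid of candidate powers
bc <- boxcox(full_fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat

# Apply the Box-Cox transform; lambda near 0 corresponds to log()
hitters_bc <- hitters
hitters_bc$Salary <- if (abs(lambda_hat) < 1e-8) {
  log(hitters$Salary)
} else {
  (hitters$Salary^lambda_hat - 1) / lambda_hat
}

# Refit on the transformed response
bc_fit <- lm(Salary ~ ., data = hitters_bc)
summary(bc_fit)$adj.r.squared
```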

-- Identifying and removing influential observations
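Cook's distance is a common measure of influence. The sketch below flags observations above the rule-of-thumb cutoff of 4/n and refits without them. Both the cutoff and the choice of Cook's distance are assumptions, since the README does not say which criterion was applied.

```r
library(ISLR)

hitters <- na.omit(Hitters)
full_fit <- lm(Salary ~ ., data = hitters)

# Cook's distance for every observation in the fitted model
cooks <- cooks.distance(full_fit)

# Rule-of-thumb threshold (assumed; other cutoffs are also used)
cutoff <- 4 / nrow(hitters)

# Logical mask avoids the empty-index pitfall of negative subscripts
keep <- cooks <= cutoff
trimmed <- hitters[keep, ]

# Refit on the trimmed data
trimmed_fit <- lm(Salary ~ ., data = trimmed)
c(removed = sum(!keep), remaining = nrow(trimmed))
```

Removing points changes the sample the model describes, so it is worth comparing coefficients before and after rather than treating the trimmed fit as automatically better.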


-- Evaluating model fit and prediction accuracy
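A hold-out evaluation could be sketched as follows. The 80/20 split, the seed, the four predictors, and the log transform are all illustrative assumptions. The project's final model uses 10 predictors and a power transformation that the README does not list in full.

```r
library(ISLR)

hitters <- na.omit(Hitters)

# Reproducible 80/20 train/test split (split ratio is an assumption)
set.seed(1)
n <- nrow(hitters)
train_idx <- sample(n, size = floor(0.8 * n))
train <- hitters[train_idx, ]
test <- hitters[-train_idx, ]

# Illustrative model: log(Salary) on a few performance metrics
fit <- lm(log(Salary) ~ Hits + Walks + CRBI + CWalks, data = train)

# Back-transform predictions to the original salary scale
pred <- exp(predict(fit, newdata = test))

# Prediction error on the held-out players
rmse <- sqrt(mean((test$Salary - pred)^2))
mae <- mean(abs(test$Salary - pred))
c(adj_r2 = summary(fit)$adj.r.squared, rmse = rmse, mae = mae)
```

Reporting error on the original salary scale keeps the numbers interpretable, although exponentiating a log-scale prediction estimates the conditional median rather than the mean.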


-- The final model incorporates 10 predictor variables and uses a power transformation on the response, Salary. It has an adjusted R-squared of 0.75, meaning it explains roughly 75% of the variance in the transformed salary after adjusting for the number of predictors. Career runs batted in (CRBI) and career walks (CWalks) have the strongest relationships with salary.

-- Predictions on test data show low error rates, suggesting that key performance metrics such as hits, walks, and runs batted in are strongly predictive of player salary. This model could help teams evaluate player contracts or arbitration cases based on statistical production.