Predicting Year in the Million Song Dataset with Linear Regression using PySpark

santoshd97/Machine-learning-in-Spark

Machine Learning in Spark

Objective

Predict the year of songs in the Million Song Dataset with machine learning (Spark's MLlib package) using PySpark.

Data

This project uses the "Year Prediction MSD Dataset" from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/yearpredictionmsd

Data pre-processing

  • Load the dataset and apply min-max scaling so that every feature lies between 0 and 1
  • Normalize the labels by subtracting the minimum year, so the earliest year maps to 0
  • Split the dataset into train (70%), test (20%), and validation (10%) sets
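
The pre-processing steps above can be sketched without Spark to show the arithmetic involved. This is a minimal, library-free illustration, not the repository's PySpark code. The helper names (`parse_line`, `min_max_scale`, `shift_labels`, `split_dataset`) are hypothetical, the row layout (year first, then comma-separated features) is an assumption about the UCI file, and the exact-size shuffle split is a stand-in for whatever splitting routine the project actually uses.

```python
import random


def parse_line(line):
    """Parse one row into (year, features).

    Assumption: each row is comma-separated with the year first,
    followed by the numeric features.
    """
    values = [float(v) for v in line.strip().split(",")]
    return values[0], values[1:]


def min_max_scale(rows):
    """Rescale each feature column to [0, 1] using that column's min and max."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (x - lo) / (hi - lo) if hi > lo else 0.0  # constant column maps to 0
            for x, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


def shift_labels(years):
    """Subtract the minimum year so that labels start at 0."""
    min_year = min(years)
    return [y - min_year for y in years], min_year


def split_dataset(examples, fractions=(0.7, 0.2, 0.1), seed=42):
    """Shuffle, then cut into train / test / validation parts.

    The validation set receives whatever remains after train and test.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(fractions[0] * len(shuffled))
    n_test = round(fractions[1] * len(shuffled))
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


# Tiny made-up rows, for illustration only (not real dataset values).
raw_rows = ["2001,10.0,200.0", "1995,20.0,100.0", "2010,30.0,300.0"]
years, features = zip(*(parse_line(row) for row in raw_rows))
scaled = min_max_scale(features)      # [[0.0, 0.5], [0.5, 0.0], [1.0, 1.0]]
labels, min_year = shift_labels(years)  # ([6.0, 0.0, 15.0], 1995.0)
```

Keeping `min_year` around matters: a model trained on the shifted labels predicts an offset, and adding `min_year` back recovers the calendar year.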

Model

Linear regression is used to predict the normalized year from the scaled features.
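
To make the modelling step concrete, here is a minimal sketch of what a linear regression fit computes, written in plain Python rather than with Spark MLlib. It is an illustration under stated assumptions, not the repository's implementation: the function names (`predict`, `fit_linear_regression`, `rmse`) are hypothetical, batch gradient descent is only one of several ways to fit the model, the toy data is made up, and RMSE is used here as an illustrative error measure that the project may or may not report.

```python
import math


def predict(weights, bias, x):
    """Linear model: dot(weights, x) + bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def fit_linear_regression(features, labels, learning_rate=0.5, iterations=2000):
    """Fit weights and bias by batch gradient descent on mean squared error."""
    n = len(features)
    dim = len(features[0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(iterations):
        # Residuals of the current model on every training example.
        errors = [predict(weights, bias, x) - y for x, y in zip(features, labels)]
        # Gradient of (1 / 2n) * sum(error^2) with respect to each parameter.
        grad_w = [
            sum(e * x[j] for e, x in zip(errors, features)) / n
            for j in range(dim)
        ]
        grad_b = sum(errors) / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


def rmse(weights, bias, features, labels):
    """Root mean squared error of the model on a dataset."""
    squared = [(predict(weights, bias, x) - y) ** 2 for x, y in zip(features, labels)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up data that follows y = 2*x1 + 3*x2 + 1 exactly, with features
# already in [0, 1] as they would be after min-max scaling.
toy_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
toy_labels = [1.0, 3.0, 4.0, 6.0, 3.5]

weights, bias = fit_linear_regression(toy_features, toy_labels)
# weights end up close to [2.0, 3.0] and bias close to 1.0
error = rmse(weights, bias, toy_features, toy_labels)
```

In the project's setting, the model would be fitted on the training split, hyperparameters such as the learning rate would be compared on the validation split, and the final error would be measured on the test split. Because the labels were shifted by the minimum year, a prediction is converted back to a calendar year by adding that minimum.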
