Skip to content

🎹 Human Learning meets Machine Learning - 1,200+ hours of piano practice. Predictive modelling on my own piano practice data.

Notifications You must be signed in to change notification settings

peterhontaru/Piano-Journal

Repository files navigation

Human Learning meets Machine Learning - 1,200+ hours of piano practice

by Peter Hontaru

I have a soft spot for grand pianos

Problem statement

the why

Learning a piano piece is a time-intensive process. Like with most other things, we tend to overestimate our own ability and then become frustrated that we cannot learn and play that Chopin piece like a concert pianist after only 30 minutes of practice. Fortunately, unlike what you might hear on Wall Street, previous performance is indicative of future success.

There’s also a secondary goal here to hopefully provide a source of inspiration for other people that have always thought to themselves “one day I’ll learn a musical instrument”. Any other skill qualifies here, though. I aim to be doing this by, at the very least, allowing for visibility into my own journey. If this is what you want, why not give it a try?

the what

Can we predict how long it would take to learn a piano piece based on a number of factors? If so, which factors influenced the total amount of hours required to learn the piece the most?

context

I started playing the piano in 2018 as a complete beginner and I’ve been tracking my practice time for around 2 and a half years. I’ve now decided to put that data to good use and see what interesting patterns I might be able to find and hopefully develop a tool that others might be able to use in their journeys.

Timeline

Here’s an example of a recent performance - I mainly play classical music but cannot help but love Elton John’s music.

click to view

Elton John - Your Song

Key insights

  • identified various trends in my practice habits (can you guess in which month I had my piano exam in 2019?)

trends

  • pieces could take me anywhere from ~4 hours to 40+ hours of practice, subject to difficulty (as assessed by the ABRSM grade)

difficulty

  • the Random Forest model was shown to be the most optimal model (bootstrap resampling, 25x)

    • Rsquared - 0.57
    • MAE - 6.0 hours
    • RMSE - 7.6 hours
  • looking at the variability of errors, there is still a tendency to over-predict for pieces that took very little time to learn and under-predict for the more difficult ones. There could be two main reasons for this:

    • artificially inflating the number of hours spent on a piece by returning to it a second time (due to a recital performance, wanting to improve the interpretation further or simply just liking it enough to play it again)
    • learning easier pieces later on in my journey which means I will learn them faster than expected (based on my earlier data where a piece of a similar difficulty took longer)

Residuals

  • the most important variables were shown to be the length of the piece, standard of playing(performance vs casual) and experience(lifetime total practice before first practice session on each piece)

factors

Data collection

  • imputed conservative estimations for the first 10 months of the first year (Jan ’18 to Oct ’18) and on Excel spreadsheet for Nov ’18
  • everything from Dec ’18 onwards was tracked using Toggl, a time-tracking app/tool
  • time spent in piano lessons was not tracked/included (usually 2-3 hours total per month)
  • the Extract, Transform, Load script is available in the global.R file of this repo;
  • for security reasons, I am not able to share the API script as the token also gives the option to change/remove the profile data; the raw data however, is stored in the raw data folder of this repo (not having the API call in simply just means that it won’t be up to date for the current year)

Disclaimer: I am not affiliated with Toggl. I started using it a few years ago because it provided all the functionality I needed and loved its minimalistic design. The standard membership, which I use, is free of charge.

Credits

Limitations

  • very limited data which did not allow for a train/test split; however, a bootstrap resampling method is known to be a good substitute
  • biased to one person’s learning ability (others might learn faster or slower)
  • on top of total hours of practice, quality of practice is a significant factor which is not captured in this dataset
  • very difficult to assess when a piece is “finished” as you can always further improve on your interpretation
  • not all pieces had official ABRSM ratings and a few had to be estimated; even for those that do have an official rating, the difficulty of a piece is highly subjective to each pianist and hard to quantify with one number
  • memorisation might be a confounding variable that was not accounted for and it could lead to inflating the numbers required on a specific piece

What’s next?

  • keep practicing, gather more data and refresh this analysis + adjust the model
  • add a recommender tab to the shiny dashboard to recommend pieces based on specific features

Extended analysis

Full project available

  • at the following link, in HTML format
  • in the EDA-and-modelling.md file of this repo (however, I recommend previewing it at the link above since it was originally designed as a HTML document)

Interactive application

Screenshot

About

🎹 Human Learning meets Machine Learning - 1,200+ hours of piano practice. Predictive modelling on my own piano practice data.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages