If you find my work to be useful, please star this repository!
This is a collection of methods for collecting, compiling, cleaning, analyzing, modeling, and predicting team and player (skaters and goalies) performances and strategies. This repository does not claim ownership of the data and reflects the perspectives of the organizations or entities mentioned. All original code (including generic and model algorithms) may be used freely, provided proper citation and credit are given to this repository.
All of my analyses and deep-dive insights are published on my Medium blog.
I use publicly available data to build analytics capabilities and insights that go beyond the easily measured headline statistics.
- Capture the complex strategic, behavioral, and performance trends that fans of the sport ask about
- Integrate different data sources (e.g. college hockey rosters, to extend performance trends beyond players' professional careers)
I hope the work saved in this repository supports replication, exploration, and the development of new measurements and insights.
Applied Tools
Capabilities I use for data collection, processing, and analysis to derive insights, data visualizations, and predictive models.
Capability | Tools used |
---|---|
General | |
Data Collection & Processing | |
ML Model Build | |
Interactive Data Visualization | |
Data Pull & Process Automation with GitHub Actions
GitHub Actions updates the data saved in this repository's ./latest/ folder. Data collection runs every day and covers:
- Team-level rank
- Game-level stats
- Game-level betting odds
- Play-by-play records
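As a rough illustration of the daily refresh described above, the sketch below overwrites one file per dataset in a latest folder. The function name `refresh_latest`, the fetcher callables, and the CSV output format are all assumptions made for this example; they are not taken from the repository's code.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def refresh_latest(
    fetchers: Dict[str, Callable[[], Iterable[Row]]],
    out_dir: Path = Path("latest"),
) -> List[Path]:
    """Run every fetcher and overwrite <out_dir>/<name>.csv with its rows.

    `fetchers` maps a dataset name to a hypothetical zero-argument function
    that returns an iterable of dict rows. CSV output is an assumption.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for name, fetch in fetchers.items():
        rows = list(fetch())
        if not rows:
            # Keep yesterday's file rather than replacing it with an empty one.
            continue
        target = out_dir / f"{name}.csv"
        with target.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        written.append(target)
    return written
```

A scheduled job could call `refresh_latest` once per day with one fetcher per dataset in the list above. Skipping empty results is a design choice in this sketch so that a failed pull does not wipe the previous day's data.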
The required package versions are saved in ./src/requirement through a .sh command. Note that Python function imports are resolved relative to where the script is located, whereas data file references are resolved relative to the GitHub repository's head directory.
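The path note above can be illustrated with a small sketch. It assumes the script is launched from the repository's head directory, which is the default working directory for GitHub Actions steps after checkout. The helper names `add_script_dir_to_path` and `data_path` are hypothetical and do not come from this repository.

```python
import sys
from pathlib import Path
from typing import Optional


def add_script_dir_to_path(script_file: str) -> Path:
    """Make modules that sit next to the script importable.

    Returns the script's directory as an absolute path.
    """
    script_dir = Path(script_file).resolve().parent
    if str(script_dir) not in sys.path:
        sys.path.insert(0, str(script_dir))
    return script_dir


def data_path(relative: str, repo_root: Optional[Path] = None) -> Path:
    """Resolve a data file against the repository head directory.

    Assumes the current working directory is the repository head
    unless `repo_root` is passed explicitly.
    """
    root = Path.cwd() if repo_root is None else Path(repo_root)
    return root / relative
```

Inside a script, `add_script_dir_to_path(__file__)` would handle the function imports, while `data_path("latest/...")` would locate data files no matter which subfolder the script itself lives in.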
,
- \O , .-.___
- /\ O/ /xx\XXX\
- __/\ `\ /\ |xx|XXX|
` \, \_ = _/` << |xx|XXX|
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""