This is github repository for project for "Machine learning" class at FIB UPC.
- Martin Galajda
- Florian König
- Aitor Ortiz de Latierro
⚠️ When working with R scripts, working directory is set to the project root (all code should follow this convention! 🙏)
-
The purpose of this directory is to contain models which will be built for different predictions (match result, goals scored, player analytics)
-
- predicting match result (win, tie, loss)
-
- predicting goals scored
- The purpose of this directory is to define functions which will load different data with different features (used in models). Possibly uses precomputed features which take a bit time to compute.
- The purpose of this directory is to define everything related to feature extraction.
-
- this directory contains functions which when executed (source(path) in R) will make sure that cached precomputed features are present in data/ directory
- this are used by some loaders to make sure that precomputed features are saved in csv file in data/ directory
-
- compute attack strength ratio for each match by taking into account all matches from past (so we don't leak future information into our data)
- saves precomputed features in data/ directory
-
- compute win ratios for given match
- takes parameter which determines how many matches from past to take into account
- saves precomputed features in data/ directory
- The purpose of this directory is to contain all data for predicting / analysing
- Specifically, we store here even precomputed features which are not easily computed on the run (take ~ 5 min)
- The purpose of this directory is to contain all code that extracts usable data from the SQLite database.
- The purpose of this directory is to contain all code related to validation and selecting right models/features. Some of the scripts compute CV-errors for different subset of features/ parameters and save them as Rdata (because it takes a lot of time to compute).
- The purpose of this directory is to observe obtained results (for match_result). It uses saved RData to present obtained results from cross-validation and test error from best model.
- R environment (RStudio preferably)
- set root of the project as working directory (some scripts require that)
- run the code
data_conversion\X.R
generates necessarydata\db_X.CSV
database from originaldata/database.sqlite
.validation/match_result/*.R
- scripts contain logic for feature selection, tuning parameters for different models. However, execution time is very big for some scripts (therefore they store results inresults/
directory after computing).results/match_results/validation/*.R
- scripts contain interpretation of results achieved by running scripts fromvalidation/match_result.R
scripts. They use saved RData fromresults/match_results/saved_Rdata
directory.results/match_result/test_error_best_model.R
- script computing test error for best model for predicting match resultsresults/match_result/compare_models.R
- script comparing different models for predicting match result and observing their respective CV-errors.analyze_players.R
- full code for Player Analysis sectionmodels/goals_scored/*.R
- self-contained scripts to tune and train the different number of goals models including the baseline +goals_distribution.R
plot script.