This package analyzes Serie A soccer (Calcio) data. It builds an accessible R data frame containing match results, team stats, Elo ratings, and overall standings. The data frame is used to generate visualizations in a Shiny app: https://datavisr.shinyapps.io/calcior/
The data is sourced from https://github.com/openfootball, which contains the results of all Serie A matches since the 2013/14 season. The data is extracted using Ruby with the sportdb gem. Running the extraction creates a local SQLite database, sport.db, that we can read into R.
#> List of 4
#> $ teams :Classes 'tbl_df', 'tbl' and 'data.frame': 28 obs. of 10 variables:
#> ..$ id : int [1:28] 1 2 3 4 5 6 7 8 9 10 ...
#> ..$ key : chr [1:28] "milan" "inter" "lazio" "roma" ...
#> ..$ title : chr [1:28] "Milan" "Inter" "Lazio" "Roma" ...
#> ..$ code : chr [1:28] "MIL" "INT" "LAZ" "ROM" ...
#> ..$ synonyms : chr [1:28] "AC Milan|Associazione Calcio Milan" "Internazionale|FC Internazionale Milano" "SS Lazio|Società Sportiva Lazio|Lazio Roma" "AS Roma|Associazione Sportiva Roma" ...
#> ..$ country_id: int [1:28] 117 117 117 117 117 117 117 117 117 117 ...
#> ..$ club : chr [1:28] "t" "t" "t" "t" ...
#> ..$ since : int [1:28] NA NA NA NA NA NA NA NA NA NA ...
#> ..$ web : chr [1:28] NA NA NA NA ...
#> ..$ national : chr [1:28] "f" "f" "f" "f" ...
#> $ events:Classes 'tbl_df', 'tbl' and 'data.frame': 4 obs. of 8 variables:
#> ..$ id : int [1:4] 1 2 3 4
#> ..$ key : chr [1:4] "it.2016/17" "it.2015/16" "it.2014/15" "it.2013/14"
#> ..$ league_id: int [1:4] 1 1 1 1
#> ..$ season_id: int [1:4] 5 6 7 8
#> ..$ start_at : chr [1:4] "2016-08-21" "2015-08-22" "2014-08-30" "2013-08-24"
#> ..$ team3 : chr [1:4] "t" "t" "t" "t"
#> ..$ sources : chr [1:4] "seriea-i,seriea-ii" "seriea-i,seriea-ii" "seriea-i,seriea-ii" "seriea-i,seriea-ii"
#> ..$ config : chr [1:4] "seriea.yml" "seriea.yml" "seriea.yml" "seriea.yml"
#> $ games :Classes 'tbl_df', 'tbl' and 'data.frame': 1523 obs. of 13 variables:
#> ..$ id : int [1:1523] 1 2 3 4 5 6 7 8 9 10 ...
#> ..$ round_id : int [1:1523] 1 1 1 1 1 1 1 1 1 1 ...
#> ..$ pos : int [1:1523] 1 2 3 4 5 6 7 8 9 10 ...
#> ..$ team1_id : int [1:1523] 4 7 10 17 11 16 5 1 15 20 ...
#> ..$ team2_id : int [1:1523] 13 12 3 19 2 6 18 8 14 9 ...
#> ..$ play_at : chr [1:1523] "2016-08-20 12:00:00.000000" "2016-08-20 12:00:00.000000" "2016-08-21 12:00:00.000000" "2016-08-21 12:00:00.000000" ...
#> ..$ postponed: chr [1:1523] "f" "f" "f" "f" ...
#> ..$ knockout : chr [1:1523] "f" "f" "f" "f" ...
#> ..$ home : chr [1:1523] "t" "t" "t" "t" ...
#> ..$ score1 : int [1:1523] 4 2 3 1 2 0 3 3 0 2 ...
#> ..$ score2 : int [1:1523] 0 1 4 0 0 1 1 2 1 2 ...
#> ..$ winner : int [1:1523] 1 1 2 1 1 2 1 1 2 0 ...
#> ..$ winner90 : int [1:1523] 1 1 2 1 1 2 1 1 2 0 ...
#> $ rounds:Classes 'tbl_df', 'tbl' and 'data.frame': 154 obs. of 8 variables:
#> ..$ id : int [1:154] 1 2 3 4 5 6 7 8 9 10 ...
#> ..$ event_id: int [1:154] 1 1 1 1 1 1 1 1 1 1 ...
#> ..$ title : chr [1:154] "1^ Giornata" "Pescara 1-2 Fiorentina (19.Giornata) 02.02." "3^ Giornata" "4^ Giornata" ...
#> ..$ pos : int [1:154] 1 2 3 4 5 6 7 8 9 10 ...
#> ..$ knockout: chr [1:154] "f" "f" "f" "f" ...
#> ..$ start_at: chr [1:154] "2016-08-20" "2016-08-27" "2016-09-10" "2016-09-16" ...
#> ..$ end_at : chr [1:154] "2016-08-21" "2016-08-28" "2016-09-12" "2016-09-18" ...
#> ..$ auto : chr [1:154] "t" "t" "t" "t" ...
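A list like the one shown above could be read from sport.db roughly as follows. This is a minimal sketch, not necessarily how the package does it: it assumes the DBI and RSQLite packages are installed, read_sport_db() is a hypothetical helper name, and the table names are taken from the output above. It returns plain data frames; wrapping each in tibble::as_tibble() would give the tbl_df classes seen in the output.

```r
# Sketch: read the four source tables from the local SQLite file into a named list.
# read_sport_db() is a hypothetical helper, not a function exported by this package.
read_sport_db <- function(path = "sport.db",
                          tables = c("teams", "events", "games", "rounds")) {
  con <- DBI::dbConnect(RSQLite::SQLite(), path)
  # Always close the connection, even if a read fails
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  # One data frame per table, named after the table
  out <- lapply(tables, function(tbl) DBI::dbReadTable(con, tbl))
  names(out) <- tables
  out
}

# Only attempt the read when the database file is actually present
if (file.exists("sport.db")) {
  str(read_sport_db("sport.db"))
}
```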
The source data is transformed from a set of relational tables into a single data frame, serie_a, which contains list columns of data frames to preserve the relationship of teams and matches to match days (rounds) and seasons. Summary data and Elo ratings are also calculated (details below).
serie_a
#> # A tibble: 4 × 6
#> season results match_days_complete teams
#> <dbl> <list> <dbl> <list>
#> 1 1 <tibble [38 × 2]> 38 <tibble [20 × 1]>
#> 2 2 <tibble [38 × 2]> 38 <tibble [20 × 1]>
#> 3 3 <tibble [38 × 2]> 38 <tibble [20 × 1]>
#> 4 4 <tibble [38 × 2]> 32 <tibble [20 × 1]>
#> # ... with 2 more variables: ratings <list>, standings <list>
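The two-level nesting visible in the results column (one tibble per season, each holding one row per match day with its own data list column) can be sketched with tidyr::nest(). This is an illustration under assumptions: it needs tidyr 1.0.0 or later, the toy values are made up, and the package may build its list columns differently.

```r
library(tidyr)

# Toy flat table with made-up values, one row per team per match day
flat <- data.frame(
  season    = c(1, 1, 1, 1, 2, 2),
  match_day = c(1, 1, 2, 2, 1, 1),
  p_team    = c("a", "b", "a", "b", "a", "c"),
  points    = c(3, 0, 1, 1, 0, 3),
  stringsAsFactors = FALSE
)

# Inner nest: one row per (season, match_day); team rows move into `data`
by_day <- nest(flat, data = c(p_team, points))

# Outer nest: one row per season; match days move into `results`
by_season <- nest(by_day, results = c(match_day, data))

# Unnesting twice recovers the flat table
round_trip <- unnest(unnest(by_season, results), data)
```

Keeping match days inside seasons this way means a whole season can be passed around as a single row while the per-match detail stays attached to it.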
season: the Serie A season, numbered 1 to 4 for 2013/14 through 2016/17.
match_days_complete: the number of match days completed so far in each season.
teams: the teams in Serie A for each season. They change every season because the bottom three teams are relegated to Serie B and three teams are promoted from Serie B.
serie_a %>% select(season, teams) %>% tidyr::unnest(teams) %>% glimpse()
#> Observations: 80
#> Variables: 2
#> $ season <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
#> $ p_team <chr> "catania", "lazio", "juventus", "napoli", "chievoverona...
For every season, match_day and team (p_team, for primary team), the results column shows the team's score (p_score), the opponent's score (o_score), whether the team played at home (p_home), and how many points the p_team earned from the result.
serie_a %>% select(season, results) %>% tidyr::unnest(results) %>% tidyr::unnest(data) %>% glimpse()
#> Observations: 3,040
#> Variables: 8
#> $ season <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
#> $ match_day <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
#> $ p_team <chr> "hellasverona", "sampdoria", "inter", "cagliari", "l...
#> $ o_team <chr> "milan", "juventus", "genoa", "atalanta", "udinese",...
#> $ p_score <int> 2, 0, 2, 2, 2, 0, 3, 0, 2, 2, 1, 1, 0, 1, 1, 2, 0, 0...
#> $ o_score <int> 1, 1, 0, 1, 1, 2, 0, 0, 0, 1, 2, 0, 2, 2, 2, 0, 3, 0...
#> $ p_home <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE...
#> $ points <dbl> 3, 0, 3, 3, 3, 0, 3, 1, 3, 3, 0, 3, 0, 0, 0, 3, 0, 1...
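The team-perspective layout above, where each game appears once for each side, can be sketched in base R. Assumptions are labeled: to_team_rows() is a hypothetical helper, the team1/team2 name columns stand in for team1_id/team2_id after joining to the teams table, team1 is taken to be the home side, and the 3/1/0 points rule is inferred from the sample output rather than from the package source.

```r
# Sketch: expand one-row-per-game data into one-row-per-team data with points.
# to_team_rows() is a hypothetical helper; team1 is ASSUMED to be the home side.
to_team_rows <- function(games) {
  home <- data.frame(p_team = games$team1, o_team = games$team2,
                     p_score = games$score1, o_score = games$score2,
                     p_home = TRUE, stringsAsFactors = FALSE)
  away <- data.frame(p_team = games$team2, o_team = games$team1,
                     p_score = games$score2, o_score = games$score1,
                     p_home = FALSE, stringsAsFactors = FALSE)
  rows <- rbind(home, away)

  # 3 points for a win, 1 for a draw, 0 for a loss
  rows$points <- ifelse(rows$p_score > rows$o_score, 3,
                        ifelse(rows$p_score == rows$o_score, 1, 0))
  rows
}

# The first two games from the sample output above
games <- data.frame(team1 = c("hellasverona", "sampdoria"),
                    team2 = c("milan", "juventus"),
                    score1 = c(2, 0), score2 = c(1, 1),
                    stringsAsFactors = FALSE)
team_rows <- to_team_rows(games)
```

Doubling the rows this way is why 1,520 games would yield the 3,040 observations reported above.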
For every season, match_day and team (p_team), the ratings column shows the team's Elo rating, r.
The Elo calculations are mostly based on http://www.eloratings.net/system.html, with k = 20 and a season reverting factor of 0.25.
serie_a %>% select(season, ratings) %>% tidyr::unnest(ratings) %>% tidyr::unnest(data) %>% glimpse()
#> Observations: 3,120
#> Variables: 4
#> $ season <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
#> $ match_day <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
#> $ p_team <chr> "atalanta", "bologna", "cagliari", "catania", "chiev...
#> $ r <dbl> 1492.801, 1487.402, 1507.199, 1492.801, 1502.801, 15...
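A single rating update in the style of eloratings.net can be sketched as below. Only k = 20 and the 0.25 reverting factor come from the text. The 100-point home advantage and the goal-difference multiplier follow the eloratings.net description, and the 1500 starting rating is an assumption. With these settings the sketch gives 1507.199 and 1487.402 for a first match, values that also appear in the sample output above, which suggests but does not confirm that the package uses them. elo_update() and revert_rating() are hypothetical names, and pulling ratings toward a 1500 baseline between seasons is one plausible reading of "season reverting factor".

```r
# Sketch of one Elo update from the primary team's perspective.
# ASSUMPTIONS: home_adv = 100 and the goal-difference multiplier are taken from
# eloratings.net; the package may deviate ("mostly based on").
elo_update <- function(r_p, r_o, p_score, o_score, p_home,
                       k = 20, home_adv = 100) {
  # Rating difference, shifted in favour of whichever side is at home
  dr <- r_p - r_o + ifelse(p_home, home_adv, -home_adv)
  # Expected result for the primary team
  w_e <- 1 / (10^(-dr / 400) + 1)
  # Actual result: win = 1, draw = 0.5, loss = 0
  w <- ifelse(p_score > o_score, 1, ifelse(p_score == o_score, 0.5, 0))
  # Goal-difference multiplier: x1.5 for 2 goals, x1.75 for 3, +1/8 per extra goal
  gd <- abs(p_score - o_score)
  g <- ifelse(gd <= 1, 1, ifelse(gd == 2, 1.5, 1.75 + (gd - 3) / 8))
  r_p + k * g * (w - w_e)
}

# ASSUMED form of the between-season reversion: move a fraction of the way
# back toward a 1500 baseline.
revert_rating <- function(r, factor = 0.25, baseline = 1500) {
  r + factor * (baseline - r)
}

round(elo_update(1500, 1500, 2, 1, p_home = TRUE), 3)   # 1507.199
round(elo_update(1500, 1500, 0, 3, p_home = FALSE), 3)  # 1487.402
revert_rating(1600)                                     # 1575
```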
For every season, match_day and team (p_team), the standings column shows the team's cumulative points, goals_for, goals_against and goal_diff, along with its position relative to the other teams.
serie_a %>% select(season, standings) %>% tidyr::unnest(standings) %>% tidyr::unnest(data) %>% glimpse()
#> Observations: 3,120
#> Variables: 9
#> $ season <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
#> $ match_day <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
#> $ p_team <chr> "lazio", "juventus", "fiorentina", "cagliari", ...
#> $ position <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, ...
#> $ points <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 0, 0, 0, 0, 0,...
#> $ matches_played <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
#> $ goals_for <dbl> 2, 1, 2, 2, 2, 2, 2, 2, 3, 0, 0, 0, 0, 0, 0, 1,...
#> $ goals_against <dbl> 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 3, 2, 2, 2, 2,...
#> $ goal_diff <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 3, 0, 0, -3, -2, -2, -2...
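Cumulative standings of this shape can be derived from team-perspective results with running sums per team. The sketch below is base R for a single season with made-up toy data. standings_for_season() is a hypothetical helper, and the tie-break it uses (points, then goal difference, then goals scored) is an assumption that may not match the ordering the package applies to teams level on points.

```r
# Sketch: cumulative standings per match day for one season.
# team_rows needs columns match_day, p_team, p_score, o_score, points.
standings_for_season <- function(team_rows) {
  # Sort so that cumsum runs in match-day order within each team
  team_rows <- team_rows[order(team_rows$p_team, team_rows$match_day), ]
  cum <- function(x) ave(x, team_rows$p_team, FUN = cumsum)

  out <- data.frame(
    match_day      = team_rows$match_day,
    p_team         = team_rows$p_team,
    points         = cum(team_rows$points),
    matches_played = cum(rep(1, nrow(team_rows))),
    goals_for      = cum(team_rows$p_score),
    goals_against  = cum(team_rows$o_score),
    stringsAsFactors = FALSE
  )
  out$goal_diff <- out$goals_for - out$goals_against

  # ASSUMED tie-break: points, then goal difference, then goals scored
  out <- out[order(out$match_day, -out$points, -out$goal_diff, -out$goals_for), ]
  out$position <- ave(seq_len(nrow(out)), out$match_day, FUN = seq_along)
  rownames(out) <- NULL
  out
}

# Toy season (made-up): day 1 a 2-0 b, c 1-1 d; day 2 b 1-0 c, a 0-0 d
toy <- data.frame(
  match_day = c(1, 1, 1, 1, 2, 2, 2, 2),
  p_team    = c("a", "b", "c", "d", "b", "c", "a", "d"),
  p_score   = c(2, 0, 1, 1, 1, 0, 0, 0),
  o_score   = c(0, 2, 1, 1, 0, 1, 0, 0),
  points    = c(3, 0, 1, 1, 3, 0, 1, 1),
  stringsAsFactors = FALSE
)
st <- standings_for_season(toy)
```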