- Fixed a bug where `calculate_stats()` incorrectly counted `receiving_air_yards`. (#500)
- Added new function `calculate_stats()` that combines the output of all `calculate_player_stats*()` functions with a more robust and faster approach. The `calculate_player_stats*()` functions will be deprecated in a future release. (#470)
- Added new exported dataframe `nfl_stats_variables`. It lists and explains all variables returned by `calculate_stats()`. A searchable table is available at https://www.nflfastr.com/articles/stats_variables.html. (#470)
- Drop `{crayon}`, `{DT}`, `{httr}`, `{jsonlite}`, `{qs}` dependencies. (#453)
- The function `calculate_player_stats_def` now returns `season_type` if argument `weekly` is set to `TRUE` for consistency with the other player stats functions. (#455)
- The function `missing_raw_pbp()` now allows filtering by season. (#457)
- More robust handling of player IDs in `decode_player_ids()`. (#458)
- Fixed rare cases where the value of the `yrdln` variable didn't equal `"MID 50"` at midfield. (#459)
- Fixed rare cases where `drive_start_yard_line` missed the blank space between team name and yard line number. (#459)
- Fixed play description in some 1999 and 2000 games where the string "D.Holland" replaced the kick distance. (#459)
- Fixed a problem where the `goal_to_go` variable was `FALSE` in actual goal to go situations. (#460)
- Fixed a bug in `fixed_drive` and `fixed_drive_result` where the second weather delay in `2023_13_ARI_PIT` wasn't identified correctly. (#461)
- `punter_player_id` and `punter_player_name` are filled for blocked punt attempts. (#463)
- Fixed an issue affecting scores of 2022 games involving a return touchdown (#466)
- Added identification of scrambles from 1999 through 2004 with thanks to Aaron Schatz (#468, #489)
- Updated the dataframe `stat_ids` with some IDs that were previously missing. (#470)
- nflfastR tried to fix bugs in the underlying pbp data of JAX home games prior to the 2016 season. An update of the raw pbp data resolved those bugs so nflfastR needs to remove the hard coded adjustments. This means that nflfastR <= v4.6.1 will return incorrect pbp data for all Jacksonville home games prior to the 2016 season! (#478)
- Fixed a problem where `clean_pbp()` returned `pass = 1` in actual rush plays in very rare cases. (#479)
- Removed extra lines for injury timeouts that were breaking `fixed_drive` (#482)
- The variable `penalty_type` now correctly lists the penalty "Kickoff Short of Landing Zone" introduced in the 2024 season. (#486)
- Fixed a bug where `ep` was incorrect on PAT attempts preceded by a timeout and then a penalty (extremely rare). This bug also caused the variables `total_home_epa` and `total_away_epa` to be incorrect for all subsequent plays in the same game. (#493)
Thank you to @ahmed-cheema, @andrewtek, @guga31bb, @isaactpetersen, @JoeMarino2021, @john-b-edwards, @marcusSasser, @mlounsberry, @morganandrew, @mrcaseb, @mscoop16, @parsnipz, @rjthompson2, and @Useight for their questions, feedback, and contributions towards this release.
- The function `calculate_series_conversion_rates()` now correctly aggregates season level conversion rates. Performance has also been improved. (#440)
- Adjusted test behavior at CRAN's request.
Thank you to @andrewtek, @gregalvi86, @Ic4ru5Wing, @JoeMarino2021, @jreddy1990, @marvin3FF, @mrcaseb, @RicShern, @SPNE, and @trivialfis for their questions, feedback, and contributions towards this release.
- nflfastR now fully supports loading raw pbp data from the local file system. The best way to use this feature is to set `options("nflfastR.raw_directory" = {"your/local/directory"})`. Alternatively, both `build_nflfastR_pbp()` and `fast_scraper()` support the argument `dir` which defaults to the above option. (#423)
- Added the new function `save_raw_pbp()` which efficiently downloads raw play-by-play data and saves it to the local file system. This serves as a helper to set up the system for faster play-by-play parsing via the above functionality. (#423)
- Added the new function `missing_raw_pbp()` that computes a vector of game IDs missing in the local raw play-by-play directory. (#423)
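
The local raw data workflow described above could be set up roughly as follows. This is a minimal sketch, not an official recipe: the directory is a placeholder (a temporary folder), and the exact arguments of `missing_raw_pbp()` and `save_raw_pbp()` are assumptions that should be checked against their help pages.

```r
# Sketch: point nflfastR at a local raw play-by-play directory.
# The directory is a placeholder; a temporary folder is used here.
raw_dir <- file.path(tempdir(), "nflfastR_raw")
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)

# Set the option named in the release notes above
options(nflfastR.raw_directory = raw_dir)

# The calls below download data, so they only run in an interactive
# session with nflfastR installed. Argument usage is an assumption;
# see ?missing_raw_pbp and ?save_raw_pbp.
if (interactive() && requireNamespace("nflfastR", quietly = TRUE)) {
  missing_ids <- nflfastR::missing_raw_pbp(raw_dir)
  # Save only a handful of games to keep the example small
  nflfastR::save_raw_pbp(head(missing_ids, 5), dir = raw_dir)
}
```

Setting the option once per session means `dir` does not have to be passed to every call.
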
- The internal function `get_pbp_nfl()` now uses `ifelse()` instead of `dplyr::if_else()` to handle some null-checking, fixes bug found in `2022_21_CIN_KC` match.
- The function `calculate_player_stats()` now summarises target share and air yards share correctly when called with argument `weekly = FALSE` (#413)
- The function `calculate_player_stats()` now returns the opponent team when called with argument `weekly = TRUE` (#414)
- The function `calculate_player_stats_def()` no longer errors when small subsets of pbp data are missing stats. (#415)
- The function `calculate_series_conversion_rates()` no longer returns `NA` values if a small subset of pbp data is missing series on offense or defense. (#417)
- `fixed_drive` now correctly increments on plays where posteam lost a fumble but remains posteam because defteam also lost a fumble during the same play. (#419)
- nflfastR now fixes missing drive number counts in raw pbp data in order to provide accurate drive information. (#420)
- nflfastR now returns correct `kick_distance` on all punts and kickoffs. (#422)
- Decode player IDs in 2023 pbp. (#425)
- Drop the pseudo plays TV Timeout and Two-Minute Warning. (#426)
- Fix posteam on kickoffs and PATs following a defensive TD in 2023+ pbp. (#427)
- `calculate_player_stats()` no longer counts lost fumbles on plays where a player fumbles, a teammate recovers and then loses a fumble to the defense. (#431)
- The variables `passer`, `receiver`, and `rusher` no longer return `NA` on "abnormal" plays - like direct snaps, aborted snaps, laterals etc. - that resulted in a penalty. (#435)
Thank you to @903124, @ak47twq, @andrewtek, @darkhark, @dennisbrookner, @marvin3FF, @mistakia, @mrcaseb, @nicholasmendoza22, @rickstarblazer, @RileyJohnson22, and @tanho63 for their questions, feedback, and contributions towards this release.
- New implementation of tests to be able to identify breaking changes in reverse dependencies (#396, #406)
- `calculate_standings()` no longer freezes when computing standings from schedules where some games are missing results, i.e. upcoming games.
- Fixed a bug that caused problems with upcoming dplyr and tidyselect updates that weren't backward compatible.
- Significant performance improvements of internal functions. (#402)
- Wrap examples in `try()` to avoid CRAN problems. (#404)
- Fixed a bug where `calculate_standings()` wasn't able to handle nflverse pbp data. (#404)
- Added new function `calculate_player_stats_def()` that aggregates defensive player stats either at game level or overall. (#288)
- The situation report `nflverse_sitrep` which is an alias of the already available `report()`
- Added new function `calculate_player_stats_kicking()` that aggregates player stats for field goals and extra points at game level or overall. (#381)
- Added new function `calculate_series_conversion_rates()` that computes series conversion and series result rates at a game level or season level. (#393)
- Internal change to `calculate_player_stats()` that reflects new nflverse data infrastructure.
- `calculate_player_stats()` now unifies player names and joins the following player information via `nflreadr::load_players()`:
  - `player_display_name` - Full name of the player
  - `position` - Position of the player
  - `position_group` - Position group of the player
  - `headshot_url` - URL to a player headshot image
- Make data work in 2022 (hopefully)
- Fix Amon-Ra St. Brown breaking the name parser
- Add gsis_id patch to `clean_pbp()`.
- `calculate_player_stats_def()` failed in situations where play-by-play data is missing certain stats. (#382)
- Spot-fixing `calculate_player_stats()` for `NA` names.
- Added new function `calculate_standings()` that computes regular season division standings and playoff seeds from nflverse data.
- The database function `update_db()` now supports the option "nflfastR.dbdirectory" which can be used to set the directory of the nflfastR pbp database globally and independent of any project structure or working directories.
- The embedded data frame `?teams_colors_logos` has been updated to reflect the most recent team color themes and gained additional variables for conference and division as well as logo urls to the conference and league logos. (#290)
- The embedded data frame `?teams_colors_logos` has been updated with the Washington Commanders. (#312)
- The argument `qs` in the functions `load_pbp()` and `load_player_stats()` has been deprecated as of nflfastR 4.3.0. This release removes the argument entirely.
- Fixed bug where a player could be duplicated in `calculate_player_stats()` in very rare cases caused by plays with laterals. (#289)
- Fixed a bug where the function `add_xpass()` failed when called with an empty data frame. (#296)
- Fixed a bug where `play_type` showed `no_play` on plays with penalties that don't result in a replay of the down. (#277, #281)
- Fixed a bug in the variable descriptions of `total_home_score` and `total_away_score`. (#300)
- `fast_scraper_rosters()` and `fast_scraper_schedules()` now call `nflreadr::load_rosters()` and `nflreadr::load_schedules()` under the hood (#304)
- Fixed a bug causing missing EPA on game-ending turnovers in overtime
- Bump minimum nflreadr version to 1.2.0 for data repository change
- Fix a bug affecting yardline for a very small number of plays in the 2000 season (#323)
- `update_db()` now uses a default play to predefine column types for all db drivers. (#324)
- Fix a bug that resulted in incorrect `xyac_mean_yardage` on 4th downs (#327)
- Fix a bug that resulted in missing `xyac` information for plays involving J.O'Shaughnessy (#329)
- Fix a bug that resulted in missing `epa` on the last play of some games involving NE and BUF (#331)
- `fast_scraper()` and `build_nflfastR_pbp()` now return data frames of class `nflverse_data` to be consistent with `nflreadr`.
- Fix behavior of EP model in neutral site games (treats both teams as away teams)
- Add nflreadr to dependencies and drop lubridate and magrittr dependency
- The functions `load_pbp()` and `load_player_stats()` now call `nflreadr::load_pbp()` and `nflreadr::load_player_stats()` respectively. Therefore the argument `qs` has been deprecated in both functions. It will be removed in a future release. Running `load_player_stats()` without any argument will now return player stats of the current season only (the default in `nflreadr`).
- The deprecated arguments `source` and `pp` in the functions `fast_scraper_*()` and `build_nflfastR_pbp()` have been removed
- Added the variables `racr` ("Receiver Air Conversion Ratio"), `target_share`, `air_yards_share`, `wopr` ("Weighted Opportunity Rating") and `pacr` ("Passing Air Conversion Ratio") to the output of `calculate_player_stats()`
- Added the function `report()` which will be used by the maintainers to help users debug their problems (#274).
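
As an illustration of the ratio variables listed above, the sketch below computes them by hand for a single player line. The formulas are the commonly cited definitions (RACR as receiving yards per air yard, PACR as passing yards per passing air yard, WOPR as `1.5 * target_share + 0.7 * air_yards_share`) and are stated here as an assumption, not as the package's exact implementation. All numbers are made up.

```r
# Made-up numbers for one receiver and one passer (illustration only)
receiving_yards     <- 90
receiving_air_yards <- 120
targets             <- 10
team_targets        <- 40
team_air_yards      <- 300
passing_yards       <- 250
passing_air_yards   <- 200

# Shares of the team's passing volume
target_share    <- targets / team_targets
air_yards_share <- receiving_air_yards / team_air_yards

# Conversion ratios: yards produced per air yard
racr <- receiving_yards / receiving_air_yards
pacr <- passing_yards / passing_air_yards

# Weighted Opportunity Rating (assumed weights 1.5 and 0.7)
wopr <- 1.5 * target_share + 0.7 * air_yards_share

print(c(racr = racr, pacr = pacr, wopr = wopr))
```
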
- Fixed a minor bug in the console output of `update_db()`
- Fix for a handful of missing `receiver` names (#270)
- Fixed bug with missing `return_team` on interception return touchdowns (#275)
- Fixed a rare bug where an internal object wasn't predefined (#272)
- All `wpa` variables are `NA` on end game line
- All `wp` variables are 0, 0.5, 1, or `NA` on end game line
- Fix bug where win prob on PATs assumed a PAT placed at 15 yard line, even in older seasons
- The function `decode_player_ids()` now really decodes the new variable `fantasy_id` (#229)
- Fixed a bug that caused slightly differing `wp` values depending on the first game in the data set (#183)
- Edited GitHub references to point to nflverse
- Added the variables `sack_yards`, `sack_fumbles`, `rushing_fumbles` and `receiving_fumbles` to the output of the function `calculate_player_stats()`, thanks to Mike Filicicchia (@TheMathNinja). (#239)
- Fixed a bug where `calculate_player_stats()` falsely counted lost fumbles on aborted snaps (#238)
- Added the variable `season_type` to the output of `calculate_player_stats()` and `load_player_stats()` in preparation of the extended Regular Season starting in 2021 (#240)
- Updated `season_type` definitions in preparation of the extended Regular Season starting in 2021 (#242)
- Fix for `fixed_drive` where it wasn't incrementing when there was a muffed punt followed by timeout (#244)
- Fix for `fixed_drive` where it wasn't incrementing following an interception with the intercepting player then losing a fumble (#247)
- Fix for more issues with missing play info in 2018_01_ATL_PHI (#246)
- Added the variables `safety_player_name` and `safety_player_id` to the play-by-play data (#252)
- Dropped the dependency `usethis`
- Added the function `calculate_player_stats()` that aggregates official passing, rushing, and receiving stats either at game level or overall
- Added the function `load_player_stats()` that loads weekly player stats from 1999 to the most recent season
- The performance of the functions `add_xyac()` and `clean_pbp()` has been significantly improved
- Added the new columns `td_player_name` and `td_player_id` to clearly identify the player who scored a touchdown (this is especially helpful for plays with multiple fumbles or laterals resulting in a touchdown)
- The function `calculate_player_stats()` now adds the variable `dakota`, the `epa` + `cpoe` composite, for players with minimum 5 pass attempts.
- Added column `home_opening_kickoff` to `clean_pbp()`
- Added the variables `sack_player_id`, `sack_player_name`, `half_sack_1_player_id`, `half_sack_1_player_name`, `half_sack_2_player_id` and `half_sack_2_player_name` which identify players that recorded sacks (or half sacks). Also updated the description of the variables `qb_hit_1_player_id`, `qb_hit_1_player_name`, `qb_hit_2_player_id` and `qb_hit_2_player_name` to make it clearer that they did not record a sack. (#180)
- The variable `qb_scramble` was incomplete for the 2005 season because of missing scramble indicators in the play description. This has been mostly fixed courtesy of charting data from Football Outsiders (with thanks to Aaron Schatz!). Some notes on this fix: Weeks 1-16 are based on charting. Weeks 17-21 are guesses (basically every QB run except those that were a) a loss, b) no gain, or c) on 3/4 down with 1-2 to go). Plays nullified by penalty are not included.
- Change `name`, `id`, `rusher`, and `rusher_id` to be the player charged with the fumble on aborted snaps when the QB is unable to make a play (i.e. pass, sack, or scramble) (#162)
- The function `clean_pbp()` now standardizes the team name columns `tackle_with_assist_*_team`
- Fix bug in `drive` that was causing incorrect overtime win probabilities (#194)
- Fixed a bug where `posteam` was not `NA` on end of quarter 2 (or end of quarter 4 in overtime games) causing wrong values for `fixed_drive`, `fixed_drive_result`, `series` and `series_result`
- Fixed a bug where `fixed_drive` and `series` were falsely incrementing on kickoffs recovered by the kicking team or on defensive touchdowns followed by timeouts
- Fixed a bug where `fixed_drive` and `series` were falsely incrementing on muffed punts recovered by the punting team for a touchdown
- Fixed a bug where `add_xpass()` crashed when run with data already including xpass variables.
- Fixed a bug in `epa` when a safety is scored by the team beginning the play in possession of the ball (#186)
- Fix some bugs related to David and Duke Johnson on the Texans in 2020 (#163)
- Fix yet another bug related to correctly identifying possession team on kickoffs nullified by penalty (#199)
- Fixed a bug where `calculate_player_stats()` forgot to clean player names by using their IDs
- Fixed a bug where special teams touchdowns were missing in the output of `calculate_player_stats()` (#203)
- Fixed for some old Jaguars games where the wrong team was awarded points for safeties and kickoff return TDs (#209)
- The function `update_db()` no longer falsely closes a database connection provided by the argument `db_connection` (#210)
- Fixed a bug where `yards_gained` was missing yardage on plays with laterals. (#216)
- Fixed a bug where there were stats wrongly given on a play with penalty (#218)
- `fixed_drive` now increments properly on onside kick recoveries (#215)
- `fixed_drive` no longer counts a muffed kickoff as a one-play drive on its own (#217)
- `fixed_drive` now properly increments after a safety (#219)
- Improved parser for `penalty_type` and updated the description of the variable to make it clearer that it's the first penalty that happened on a play. (#223)
- Deprecated the arguments `source` and `pp` all across the package. Using them will cause a warning. Parallel processing has to be activated by choosing an appropriate `future::plan()` before calling the relevant functions. For more information please see the package documentation.
- The function `build_nflfastR_pbp()` will now run `decode_player_ids()` by default (can be deactivated with the argument `decode = FALSE`).
- The function `build_nflfastR_pbp()` will now run `add_xpass()` by default and add the new variables `xpass` and `pass_oe`.
- The functions `fast_scraper()` and `build_nflfastR_pbp()` now allow the output of `fast_scraper_schedules()` directly as input so it's not necessary anymore to pull the `game_id` first.
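
Activating parallel processing through `future::plan()`, as described above, might look like the sketch below. The choice of `multisession` with two workers and the two game IDs are illustrative assumptions; everything that starts workers or downloads data is guarded so that it only runs in an interactive session with both packages installed.

```r
# Sketch: choose a parallel backend before calling nflfastR functions.
has_future   <- requireNamespace("future", quietly = TRUE)
has_nflfastR <- requireNamespace("nflfastR", quietly = TRUE)

if (interactive() && has_future && has_nflfastR) {
  # plan() returns the previous plan so it can be restored afterwards
  old_plan <- future::plan(future::multisession, workers = 2)

  # Example game IDs in the "season_week_away_home" format
  pbp <- nflfastR::build_nflfastR_pbp(c("2019_01_GB_CHI", "2019_02_MIN_GB"))

  # Restore whatever plan was active before
  future::plan(old_plan)
}
```

Restoring the previous plan at the end keeps the example from changing how other code in the same session is evaluated.
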
- Added the new function `load_pbp()` that loads complete seasons into memory for fast access of the play-by-play data.
- Added the new variables `rushing_yards`, `lateral_rushing_yards`, `passing_yards`, `receiving_yards`, `lateral_receiving_yards` to fix an old bug where `yards_gained` gets overwritten on plays with laterals (#115).
- Added columns `vegas_wpa` and `vegas_home_wpa` which contain Win Probability Added from the spread-adjusted WP model
- Added column `out_of_bounds`
- Added columns `fantasy`, `fantasy_id`, `fantasy_player_name`, and `fantasy_player_id` that indicate the rusher or receiver on the play
- Added columns `tackle_with_assist`, `tackle_with_assist_1_player_id`, `tackle_with_assist_1_player_name`, `tackle_with_assist_1_team`, `tackle_with_assist_2_player_id`, `tackle_with_assist_2_player_name`, `tackle_with_assist_2_team`
- Tuned spread-adjusted win probability model one final (?) time. Expected points is now no longer required for `calculate_win_probability()`
- Added field descriptions `vignette("field_descriptions")` with a searchable list of all nflfastR variables
- Switched data source for 2001-2010 to what is used for 2011 and on
- All models have been moved to the fastrmodels package
- Added the data frames `?field_descriptions` and `?stat_ids` to the package
- Fix bug where `fixed_drive` and `series` weren't updating after muffed punt (#144)
- Fix bug induced by fixing the above (#149)
- Fix bug where some special teams plays were incorrectly being labeled as pass plays (#125)
- Fix bug where points for safeties were given to the `defteam` instead of the `posteam` (#152)
- Fix bug where a muffed punt TD was given to the wrong team in a 2011 Jaguars game (#154)
- Win probability is now calculated prior to PAT attempts rather than using WP on the ensuing kickoff
- Improved performance of internal functions that speed up the rebuilding process in `update_db()` (added `qs` and `curl` to dependencies)
- Fixed a bug where `calculate_expected_points()` and `calculate_win_probability()` duplicated some existing variables instead of replacing them (#170)
- Fixed a bug where `penalty_type` wasn't `"no_play"` although it should have been (#172)
- Fixed a bug where `penalty_team` could be incorrect in games of the Jaguars in the seasons 2011 - 2015 (#174)
- Fixed a bug related to the calculation of `epa` on plays before a failed pass interference challenge in a few 2019 games (#175)
- Fixed a bug related to lots of fields with `NA` on offsetting penalties (#44)
- Fixed a bug in `epa` when possession team changes at end of 1st or 3rd quarter (#182)
- Fixed a bug where various functions have left open connections
- `vegas_wp` is now `NA` on final line since there is no possession team
- Performance update for win probability model with point spread (`vegas_wp`)
- Added `yardline_100` as an input to both win probability models (not having it included was an oversight)
- Fixed a bug where `series` was increased on PATs
- Fixed a bug affecting the week 10 Raiders-Broncos game
- Added the column `team_wordmark` - which contains URLs to the team's wordmarks - to the included data frame `?teams_colors_logos`
- The argument `force_rebuild` of the function `update_db()` is now of hybrid type. It can rebuild the play by play data table either for the whole nflfastR era (with `force_rebuild = TRUE`) or just for specified seasons (e.g. `force_rebuild = 2019:2020`). The latter is intended to be used for running seasons because the NFL fixes bugs in the play by play data during the week and we recommend rebuilding the current season every Thursday.
- Fixed a bug where `update_db()` disconnected the connection to a database provided by the argument `db_connection` (#102)
- Fixed a bug where `update_db()` didn't build a fresh database without providing the argument `force_rebuild`
- `update_db()` no longer removes the complete data table when a numeric argument `force_rebuild` is passed but only removes the rows within the table (#109)
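
To make the "hybrid type" of `force_rebuild` concrete, the sketch below shows one way an argument that is either logical or a numeric vector of seasons can be interpreted. `seasons_to_rebuild()` and `all_seasons` are hypothetical names used only for illustration; this is not the code inside `update_db()`.

```r
# Hypothetical helper (not part of nflfastR): decide which seasons
# a hybrid `force_rebuild` argument refers to.
seasons_to_rebuild <- function(force_rebuild, all_seasons) {
  # TRUE: rebuild every available season
  if (isTRUE(force_rebuild)) return(all_seasons)
  # FALSE: rebuild nothing
  if (isFALSE(force_rebuild)) return(integer(0))
  # Numeric: rebuild only the requested seasons that actually exist
  if (is.numeric(force_rebuild)) return(intersect(all_seasons, force_rebuild))
  stop("force_rebuild must be TRUE, FALSE, or a numeric vector of seasons")
}

# Assumed season range for the illustration
all_seasons <- 1999:2020

print(seasons_to_rebuild(TRUE, all_seasons))       # every season
print(seasons_to_rebuild(2019:2020, all_seasons))  # only the two given seasons
```
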
- Added the new function `build_nflfastR_pbp()`, a convenient wrapper around multiple nflfastR functions for an easy creation of the nflfastR play-by-play data set
- Added a function that applies our experimental expected pass model, `add_xpass()`, that creates columns `xpass` and `pass_oe`
- More fixes for `fixed_drive` which was not incrementing properly on drives that began following a timeout
- Fixed more bugs in EPA and win probability on PATs and kickoffs with penalties
- Fixed a bug where scoring probabilities weren't adding to 1 on field goal attempts near the end of a half
- Messages to the user are now created with the new dependency `usethis`
- Fixed bug where plays with "backward pass" in play description were counted as pass plays (`pass` = 1)
- Fixed missing kick distance on touchbacks and blocked punts (#53)
- Added the option `fast` (either `TRUE` or `FALSE`) to the function `decode_player_ids()` to activate the highly efficient C++ decoder of the package `gsisdecoder`
- `fast_scraper_roster()` is finally back! It loads the NFL roster of a given season.
- Added the function `decode_player_ids()` to decode all player IDs to the commonly known GSIS ID format (00-00xxxxx)
- Add option `source = "old"` to `fast_scraper()` to enable scraping of old source. This is mostly useless as it doesn't work for 2020 and provides less info
- Added new option `db_connection` to `update_db()` to allow advanced users to use other DBI drivers, such as `RMariaDB::MariaDB()`, `RPostgres::Postgres()` or `odbc::odbc()` (please see dbplyr for more information)
- `clean_pbp()` now fixes some bugs in jersey numbers
- `clean_pbp()`, `add_qb_epa()` and `add_xyac()` can now handle empty data frames
- Fix empty line causing `fast_scraper()` to fail (affects multiple games of the 2020 season)
- Fix bug in `fixed_drive` that counted PAT after defensive TD as its own drive
- Fixed a bug which caused too high number of tackles in special cases
- Fixed a bug where CPOE was NA when targeting players with apostrophe in last name
- Fix `add_xyac()` breaking with some old packages
- Fix `add_xyac()` and `add_qb_epa()` calculations being wrong for some failed 4th downs
- Updated Readme with ep and cp model plots
- Updated `vignette("examples")` with the new `add_xyac()` function
- Added xYAC model to `vignette("nflfastR-models")`
- Added variables `fixed_drive` and `fixed_drive_result` to the output of `fast_scraper()` because the NFL-provided drive info is extremely buggy
- Added variable `series_result`
- `clean_pbp()` now adds 4 new variables `passer_jersey_number`, `rusher_jersey_number`, `receiver_jersey_number` and `jersey_number`. These can be used to join rosters.
- Fixed incorrect `timeout_team`, `return_team`, `fumble_recovery_1_team` for JAX games from 2011-2015
- Re-trained EPA model with `fixed_drive` and corrections to `timeout_team`
- New function `add_xyac()` which adds the following columns associated with expected yards after the catch (xYAC): `xyac_epa`, `xyac_success`, `xyac_fd`, `xyac_mean_yardage`, `xyac_median_yardage`
- Fixed a bug in `series_success` caused by bad `drive` information provided by NFL
- Added the following columns that are available 2011 and later: `special_teams_play`, `st_play_type`, `time_of_day`, and `order_sequence`
- Added `old_game_id` column (useful for merging to external data that still uses this ID: format is YYYYMMDDxx)
- The `clean_pbp()` function now adds an `aborted_play` column
- Fixed a bug where pass plays with a penalty at end of play were classified as `play_type` = `no_play` rather than `pass`
- Fixed bug where EPA on defensive 2 point return was -0.95 instead of -2.95
- Fixed some remaining failed challenge plays that incorrectly had 0 for EPA
- Updated the included dataframe `teams_colors_logos` for the interim name of the 'Washington Football Team' and the corresponding logo urls.
- Some internal code improvements causing the required `tidyselect` version to be >= 1.1.0
- `clean_pbp()` now standardizes player IDs across the old (1999-2010) and new (2011+) data sources. Player IDs once again uniquely identify players, and each unique player has one unique ID (as they did before the NFL data source change):
  - For players whose careers finished before 2011, their IDs remain the same
  - For players who played in both eras or only in the new era, their ID is the new ID
  - For example, Akili Smith (ID: 00-0015082) and Alex Smith (ID: 32013030-2d30-3032-3334-3336b638d37d) are both abbreviated as "A.Smith" but can be distinguished by their IDs, with Akili showing what the old format ID looks like, and Smith the new one
  - Standardization is realized by using an ID map available in the data repo
- `clean_pbp()` now removes all variables it is about to create to make sure nothing unexpected can happen
- Added minimum version requirements to some package dependencies because installation broke for some users with outdated packages
- Made a minor bug fix to catch more out-of-order plays and fixed a bug where some plays were being incorrectly dropped in older seasons
- Standardized team names (e.g. `SD` --> `LAC`) in some columns we had missed
- Removed `week` from Expected Points models along with an update of `vignette("nflfastR-models")` and `vignette("examples")`
- Added function `update_db()` which adds all completed games to a SQLite database
- Added function `calculate_win_probability()`
- Added new examples to `vignette("examples")` demonstrating the usage of the above mentioned functions
- Fixed a problem with inconsistent data types of the variable `drive_real_start_time` pre and post 2011
- Fixed a problem where some `game_id`s were overwritten during the play by play parsing
- Fix some more WP bugs on kickoffs with penalties and rare play description
- `fast_scraper()` now loads the raw game data from a separate raw data repo
- Completely overhauled the entire code base to directly implement tidy evaluation using `.data` from the rlang package (this is a major code change that takes some getting used to but we need it in preparation of a future release)
- Fixed a problem where defensive two point conversions were not counted
- Kneels on kickoffs are no longer counted as qb kneels
- Variable `yards_gained` more precisely defined
- Bugfixes for more games with plays out of order
- Fix bug related to EPA on plays with a failed pass interference challenge
- Added new example to `vignette("examples")` to demonstrate Expected Points calculator `calculate_expected_points()`
- Fix for WP on 2-pt conversion negated by penalty
- Add more variables (containing team names) to team standardization in `clean_pbp()`
- Fix WP for onside kicks
- Fix yet another bug caused by NFL providing plays out of order
- Fix bugs related to penalties on PATs and kickoffs
- Fix bugs related to NFL providing wrong scoring team on defensive touchdowns in older games involving the Jaguars
- Fix some minor issues related to wrong `first_down_rush` and `return_touchdown`
- Improved error handling of `fast_scraper()` for not yet played games
- Improved variable documentation and prepared for new website
- Improved performance for dplyr v1.0.0
- Rebuilt EP and WP models due to bugfixes in the underlying data in the versions 2.0.3, 2.0.4 and 2.0.5
- Fix another bug with out of order plays
- Fix bug affecting cumulative totals for WPA, air_WPA and yac_WPA
- Fix bug affecting cumulative totals for air_EPA and yac_EPA
- Fix for NFL providing plays out of order
- Fix for series not incrementing following defensive TD
- Fixed a bug in the series and series success calculations caused by timeouts following a possession change
- Fixed win probability on PATs
- Added minimum version requirement on `xgboost` (>= 1.1) as the recent `xgboost` update caused a breaking change leading to failure in adding model results to data
- Added new models for Expected Points, Win Probability and Completion Probability and removed `nflscrapR` dependency. This is a major change as we are stepping away from the well established `nflscrapR` models. But we believe it is a good step forward. See `data-raw/MODEL-README.md` for detailed model information.
- Added internal functions for `EPA` and `WPA` to `helper_add_ep_wp.R`.
- Added new function `calculate_expected_points()` usable for the enduser.
- Completely overhauled `fast_scraper()` to make it work with the NFL's new server backend. The option `source` is still available but will be deprecated since there is only one source now. There are some changes in the output as well (please see below).
- `fast_scraper()` now adds game data to the play by play data set courtesy of Lee Sharpe. Game data include: away_score, home_score, location, result, total, spread_line, total_line, div_game, roof, surface, temp, wind, home_coach, away_coach, stadium, stadium_id, gameday
- `fast_scraper_schedules()` now incorporates Lee Sharpe's `games.rds`.
- The functions `fast_scraper_clips()` and `fast_scraper_roster()` are deactivated due to the missing data source. They might be reactivated or completely dropped in future versions.
- The function `fix_fumbles()` has been renamed to `add_qb_epa()` as the new name much better describes what the function is actually doing.
- Added progress information using the `progressr` package and removed the `furrr` progress bars.
- `clean_pbp()` now adds the column `id` which is the id of the player in the column `name`. Because we have to piece together different data to cover the full span of years, player IDs are not consistent between the early (1999-2010) and recent (2011 onward) periods.
- Added a `NEWS.md` file to track changes to the package.
- Fixed several bugs inherited from `nflscrapR`, including one where EPA was missing when a play was followed by two timeouts (for example, a two-minute warning followed by a timeout), and another where `play_type` was incorrect on plays with declined penalties.
- Fixed a bug where `receiver_player_name` and `receiver` didn't name the correct players on plays with lateral passes.
The output has changed a little bit.
Dropped Variables | Description |
---|---|
game_key | RS feed game identifier. |
game_time_local | Kickoff time in local time zone. |
iso_time | Kickoff time according to ISO 8601. |
game_type | One of 'REG', 'WC', 'DIV', 'CON', 'SB' indicating if a game was a regular season game or one of the playoff rounds. |
site_id | RS feed id for game site. |
site_city | Game site city. |
site_state | Game site state. |
drive_possession_team_abbr | Abbreviation of the possession team in a given drive. |
scoring_team_abbr | Abbreviation of the scoring team if the play was a scoring play. |
scoring_type | String indicating the scoring type. One of 'FG', 'TD', 'PAT', 'SFTY', 'PAT2'. |
alert_play_type | String describing the play type of a play the NFL has listed as alert play. For most of those plays there are highlight clips available through fast_scraper_clips. |
time_of_day | Local time at the beginning of the play. |
yards | Analogous to yards_gained but with the kicking team being the possession team (which means that there are many yards gained through kickoffs and punts). |
end_yardline_number | Yardline number within the above given side at the end of the given play. |
end_yardline_side | String indicating the side of the field at the end of the given play. |
Renamed Variables | Description |
---|---|
game_time_eastern -> start_time | Kickoff time in eastern time zone. |
site_fullname -> stadium | Game site name. |
drive_how_started -> drive_start_transition | String indicating how the offense got the ball. |
drive_how_ended -> drive_end_transition | String indicating how the offense lost the ball. |
drive_start_time -> drive_game_clock_start | Game time at the beginning of a given drive. |
drive_end_time -> drive_game_clock_end | Game time at the end of a given drive. |
drive_start_yardline -> drive_start_yard_line | String indicating where a given drive started consisting of team half and yard line number. |
drive_end_yardline -> drive_end_yard_line | String indicating where a given drive ended consisting of team half and yard line number. |
roof_type -> roof | One of 'dome', 'outdoors', 'closed', 'open' indicating the roof status of the stadium the game was played in. (Source: Pro-Football-Reference) |
Added Variables | Description |
---|---|
vegas_wp | Estimated win probability for the posteam given the current situation at the start of the given play, incorporating pre-game Vegas line. |
vegas_home_wp | Estimated win probability for the home team incorporating pre-game Vegas line. |
weather | String describing the weather including temperature, humidity and wind (direction and speed). Doesn't change during the game! |
nfl_api_id | UUID of the game in the new NFL API. |
play_clock | Time on the playclock when the ball was snapped. |
play_deleted | Binary indicator for deleted plays. |
end_clock_time | Game time at the end of a given play. |
end_yard_line | String indicating the yardline at the end of the given play consisting of team half and yard line number. |
drive_real_start_time | Local day time when the drive started (currently not used by the NFL and therefore mostly 'NA'). |
drive_ended_with_score | Binary indicator the drive ended with a score. |
drive_quarter_start | Numeric value indicating in which quarter the given drive has started. |
drive_quarter_end | Numeric value indicating in which quarter the given drive has ended. |
drive_play_id_started | Play_id of the first play in the given drive. |
drive_play_id_ended | Play_id of the last play in the given drive. |
away_score | Total points scored by the away team. |
home_score | Total points scored by the home team. |
location | Either 'Home' or 'Neutral' indicating if the home team played at home or at a neutral site. |
result | Equals home_score - away_score and means the game outcome from the perspective of the home team. |
total | Equals home_score + away_score and means the total points scored in the given game. |
spread_line | The closing spread line for the game. A positive number means the home team was favored by that many points, a negative number means the away team was favored by that many points. (Source: Pro-Football-Reference) |
total_line | The closing total line for the game. (Source: Pro-Football-Reference) |
div_game | Binary indicator for if the given game was a division game. |
roof | One of 'dome', 'outdoors', 'closed', 'open' indicating the roof status of the stadium the game was played in. (Source: Pro-Football-Reference) |
surface | What type of ground the game was played on. (Source: Pro-Football-Reference) |
temp | The temperature at the stadium only for 'roof' = 'outdoors' or 'open'. (Source: Pro-Football-Reference) |
wind | The speed of the wind in miles/hour only for 'roof' = 'outdoors' or 'open'. (Source: Pro-Football-Reference) |
home_coach | First and last name of the home team coach. (Source: Pro-Football-Reference) |
away_coach | First and last name of the away team coach. (Source: Pro-Football-Reference) |
stadium_id | ID of the stadium the game was played in. (Source: Pro-Football-Reference) |
game_stadium | Name of the stadium the game was played in. (Source: Pro-Football-Reference) |
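
The definitions of `result`, `total`, `spread_line` and `total_line` in the table above can be checked with a small worked example. The scores and lines below are made up, and the "covered" and "over" comparisons are an interpretation of the sign convention described for `spread_line` (positive means the home team was favored), not something the package computes under these names.

```r
# Made-up final score and closing lines for one game
home_score  <- 27
away_score  <- 20
spread_line <- 3.5   # positive: home team favored by 3.5 points
total_line  <- 44.5

# Definitions from the table above
result <- home_score - away_score   # outcome from the home team's perspective
total  <- home_score + away_score   # total points scored in the game

# Interpretation (assumption): the home team covers if it wins by more than the spread
home_covered <- result > spread_line
# Interpretation (assumption): the game goes "over" if total points exceed the total line
went_over <- total > total_line

print(c(result = result, total = total))
print(c(home_covered = home_covered, went_over = went_over))
```
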