Skip to content

bdilday/pychadwick

Repository files navigation

pychadwick

A Python package to interface with the Chadwick libray.
Chadwick is a set of tools for parsing retrosheet data and is available at

http://chadwick.sourceforge.net/doc/index.html

https://github.com/chadwickbureau/chadwick

Features

As of now this package supports retrosheet event data only.

Installation

$ pip install pychadwick

Example use

Python replacement for cwevent

When you install pychadwick, it will install a Python exe that mimic the cwevent exe from the chadwick project. It reads a set of event files and prints them out in csv format to stdout.

This downloads a fresh copy of the retrosheet event files, and parses them with 7 CPUs

$ time pycwevent -n 7  > /tmp/events1.csv
stderr: data_root not given as argument, downloading fresh copy of retrosheet events...
stderr: found 2254 files
Warning: Invalid integer value 'b'

real	3m14.517s
user	12m18.104s
sys	0m25.264s

$ wc -l /tmp/events1.csv 
13976191 /tmp/events1.csv

This uses a pre-downloaded copy of the retrosheet event files, with 7 CPUs

$ time pycwevent -n 7 --data-root /tmp/retrosheet-master/event/regular/ > /tmp/events2.csv
stderr: found 2254 files
Warning: Invalid integer value 'b'

real	1m57.499s
user	9m52.236s
sys	0m17.672s

$ wc -l /tmp/events2.csv 
13976184 /tmp/events2.csv

Python interface to cwevent

Load events

Load events for a game from a file stored on the web

>>> from pychadwick.chadwick import Chadwick                                                                                    

>>> chadwick = Chadwick()                                                                                                       

>>> file_path = "https://raw.githubusercontent.com/chadwickbureau/retrosheet/master/event/regular/1982OAK.EVA" 

>>> games = chadwick.games(file_path)                                                                                           

>>> game = next(games)                                                                                                          

>>> df = chadwick.game_to_dataframe(game)                                                                                       

>>> df                                                                                                                           
         GAME_ID AWAY_TEAM_ID  INN_CT  BAT_HOME_ID  ...  ASS9_FLD_CD  ASS10_FLD_CD  UNKNOWN_OUT_EXC_FL UNCERTAIN_PLAY_EXC_FL
0   OAK198204060          CAL       1            0  ...            0             0                   F                     F
1   OAK198204060          CAL       1            0  ...            0             0                   F                     F
2   OAK198204060          CAL       1            0  ...            0             0                   F                     F
3   OAK198204060          CAL       1            1  ...            0             0                   F                     F
4   OAK198204060          CAL       1            1  ...            0             0                   F                     F
..           ...          ...     ...          ...  ...          ...           ...                 ...                   ...
81  OAK198204060          CAL      11            1  ...            0             0                   F                     F
82  OAK198204060          CAL      11            1  ...            0             0                   F                     F
83  OAK198204060          CAL      11            1  ...            0             0                   F                     F
84  OAK198204060          CAL      11            1  ...            0             0                   F                     F
85  OAK198204060          CAL      11            1  ...            0             0                   F                     F

[86 rows x 159 columns]

Load events for a game from a local file

>>> file_path = " /tmp/retrosheet-master/event/regular/1982OAK.EVA"

>>> games = chadwick.games(file_path)                                                                                           

>>> game = next(games)                                                                                                          

>>> df = chadwick.game_to_dataframe(game)                                                                                       

>>> df                                                                                                                           
         GAME_ID AWAY_TEAM_ID  INN_CT  BAT_HOME_ID  ...  ASS9_FLD_CD  ASS10_FLD_CD  UNKNOWN_OUT_EXC_FL UNCERTAIN_PLAY_EXC_FL
0   OAK198204060          CAL       1            0  ...            0             0                   F                     F
1   OAK198204060          CAL       1            0  ...            0             0                   F                     F
2   OAK198204060          CAL       1            0  ...            0             0                   F                     F
3   OAK198204060          CAL       1            1  ...            0             0                   F                     F
4   OAK198204060          CAL       1            1  ...            0             0                   F                     F
..           ...          ...     ...          ...  ...          ...           ...                 ...                   ...
81  OAK198204060          CAL      11            1  ...            0             0                   F                     F
82  OAK198204060          CAL      11            1  ...            0             0                   F                     F
83  OAK198204060          CAL      11            1  ...            0             0                   F                     F
84  OAK198204060          CAL      11            1  ...            0             0                   F                     F
85  OAK198204060          CAL      11            1  ...            0             0                   F                     F

[86 rows x 159 columns]

Check which columns are defined

>>>  chadwick.all_headers

Check which columns are enabled

>>>  chadwick.active_headers

Disable all columns, and add only GAME_ID and BAT_ID

>>> _ = [chadwick.unset_event_field(e) for e in chadwick.all_headers]                                                          

>>> chadwick.active_headers                                                                                                    
[]

>>> chadwick.set_event_field("GAME_ID")                                                                                        

>>> chadwick.set_event_field("BAT_ID")                                                                                         

>>> games = chadwick.games(file_path)                                                                                          

>>>  game = next(games)                                                                                                         

>>> df = chadwick.game_to_dataframe(game)                                                                                      

>>> df

         GAME_ID    BAT_ID
0   OAK198204060  burlr001
1   OAK198204060  lynnf001
2   OAK198204060  carer001
3   OAK198204060  hendr001
4   OAK198204060  murpd002
..           ...       ...
81  OAK198204060  meyed001
82  OAK198204060  armat001
83  OAK198204060  grosw001
84  OAK198204060  spenj101
85  OAK198204060  loped001

[86 rows x 2 columns]

Activate all the columns again

>>> chadwick.set_all_headers()