An ongoing compilation of publicly available datasets for sport science projects.
The importance of data skills for sport scientists is not new. Regardless of your level of experience, being able to showcase skills in this area will help in various ways, such as in future job interviews, in networking, or in creating opportunities to collaborate with others in the field.
Although there are more sport analytics courses and learning materials available nowadays, a comment that I usually get when it comes to learning data skills is that the datasets used during early learning stages are not motivating and not sport specific.
Unfortunately, sport scientists may not always have access to the type of data that is usually available to professional teams and sport organizations. However, there are more and more publicly available datasets that can be used to develop and show your data skills, analytical process and creativity in sport science analysis.
This resource aims to provide a list of some of those publicly available datasets, which can hopefully be used to create sport science data projects. The goal is to continue adding more over time.
Click here to see the list of all available datasets.
- Tennis Player Tracking ATP Tour Australian Open Final: Tracking data from the 2019 Australian Open Final between Nadal and Djokovic. Includes information about events as well as 2D player positions. | Download | Source | Type: CSV |
- NBA Player Shooting Motions: 3D ball tracking data of basketball shots for a selected group of NBA players. | Download | Source | Type: Feather |
- NBA SportVU Athlete Tracking: Positional tracking data for the 2015 NBA season captured via SportVU. Includes raw x/y data, play-by-play logs and space coordinates for shots. | Download | Source | Type: 7-zip, CSV |
- NBA Schedule Metrics Since 1947: NBA schedule and travel-related metrics since 1947 (distance traveled, rest between games, location, time zone shifts, etc.) for both teams in a game. | Download | Source | Type: CSV |
- NBA Draft & Combine: NBA Draft selections since 1947, along with two files containing anthropometric and physical performance data from combines since the 2000-01 season. | Download | Source | Type: CSV |
- NFL Combine & Pro Day Data: Data from NFL combines and pro days since 1987. This dataset contains more than 13K observations with anthropometric and physical profile metrics. | Download | Source | Type: CSV |
- NFL Game Tracking: Athlete tracking data from each game in the 2017 NFL season. Includes files with information about players, events and play by play. | Download | Source | Type: Feather |
- MLB Sprint Running Metrics: Split times (0 to 90 ft) and max running speed (ft/s) for every MLB position player from 2015 to May 2021. | Download | Source | Type: XLSX |
- Annotated Sport Videos Dataset: This dataset contains links to 1,133,158 YouTube videos annotated with 487 sports labels. Suitable for machine learning and computer vision related work. | Download | Source | Type: Video |
- Video Database of Golf Swing Sequencing: GolfDB is a high-quality video dataset created for general recognition applications in the sport of golf, and specifically for the task of golf swing sequencing. | Download | Source | Type: Video |
- Oura Ring Data: This dataset contains a year's worth of wellness data collected with the Oura ring. | Download | Source | Type: CSV |
- Sleep Dataset: Acceleration (in units of g) and heart rate (bpm, measured from photoplethysmography) recorded from the Apple Watch, as well as sleep labels scored from gold-standard polysomnography, for 31 subjects. | Download | Source | Type: TXT |
- NHL Tracking and Play by Play: All the official metrics measured for each NHL game from 2015 to 2021. Information includes tracking, events, play-by-play, etc. | Download | Source | Type: FST |
- IPL Cricket Dataset: Ball-by-ball data for 845 IPL matches in CSV format. An extra file, 'all_matches.csv', combines the information from all matches in a single file. | Download | Source | Type: CSV |
- Soccer StatsBomb Data: Includes event, lineup, and match data in JSON format for hundreds of matches from various leagues. | Download | Source | Type: JSON | Documentation | Terms |
- Soccer Bio-banding Data: This data was downloaded from Ally Hamilton's dissertation, "The effect of Bio-banding on the technical, tactical and physical demands of soccer specific small-sided games". It contains athlete maturation and bio-banding categories, as well as a number of tracking, technical and tactical variables. | Download | Source | Type: XLSX |
- Mid-Long Distance Running Injuries Dataset: Two files containing training logs (weekly and daily) along with injury records. Information includes 7 years' worth of data with more than 70 variables, including distances, intensities, perceived efforts, training quality, etc. | Download | Source | Type: CSV |
- eSports Dataset: Psycho-physiological data collected on 10 pro and amateur eSport athletes in 22 League of Legends matches. The dataset includes in-game logs and match info as well as monitoring data such as environmental, IMU movements, EMG, GSR, HR, EEG, mouse/keyboard activity, face skin temperature, eye tracking, post-game surveys, etc., collected simultaneously for 5 players. | Download | Source | Research | Type: CSV/JSON |
- 24h Monitoring HRV, Sleep, Saliva: The Multilevel Monitoring of Activity and Sleep in Healthy people (MMASH) dataset provides 24 hours of continuous beat-to-beat heart data, triaxial accelerometer data, sleep quality, physical activity and psychological characteristics (i.e., anxiety status, stress events and emotions) for 22 healthy participants. Saliva bio-markers (i.e., cortisol and melatonin) and an activity log are also provided in this dataset. | Download | Source | Research | Type: CSV |
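
Many of the datasets above are distributed as CSV files. As a starting point, here is a minimal sketch of a first look at one numeric column, using only the Python standard library. The sample rows and the column name `max_speed_ft_s` are hypothetical and are not taken from any of the listed datasets. Blank or non-numeric cells are skipped, not treated as zero.

```python
import csv
import io
import statistics

# Hypothetical sample data; column names are illustrative only.
SAMPLE_CSV = """player,max_speed_ft_s
A,27.5
B,29.0
C,
D,30.1
"""


def read_rows(handle):
    """Read a CSV file-like object into a list of dict rows."""
    return list(csv.DictReader(handle))


def summarize_column(rows, column):
    """Return count, mean, min and max for a numeric column, skipping blanks."""
    values = []
    for row in rows:
        raw = (row.get(column) or "").strip()
        if not raw:
            continue  # missing value
        try:
            values.append(float(raw))
        except ValueError:
            continue  # non-numeric cell
    if not values:
        return {"count": 0, "mean": None, "min": None, "max": None}
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
    }


rows = read_rows(io.StringIO(SAMPLE_CSV))
summary = summarize_column(rows, "max_speed_ft_s")
print(summary)
```

For a downloaded file, the same functions can be used by replacing `io.StringIO(SAMPLE_CSV)` with a file opened via `open(path, newline="", encoding="utf-8")`. Formats such as Feather, FST and XLSX need third-party libraries and are not covered by this sketch.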
Contributions to help grow this resource are more than welcome so others can benefit. Every contributor will be visible on this page.
Contributing guidelines:
- Update the README file with a bullet point referring to your dataset, following the same format: title, brief explanation, download link, source link, file type.
- If you have access to the raw dataset, upload it to the repo inside a folder. The name of the folder should briefly describe the data inside. Consider adding a document that briefly explains the metrics along with the files if needed.
- Use the source link in the README entry to credit the person who made the data available, or the original location where the dataset can be found. If this is your own dataset, then credit yourself! The source link is important, so users know where to go to learn more about each specific dataset.
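
To illustrate the entry format described above, here is a small sketch of a helper that builds one README bullet from its five parts. The function name `format_entry` is hypothetical. The bold title and the Markdown link syntax are assumptions about how the entries are written in the README source, so check an existing entry before copying the output.

```python
def format_entry(title, description, download_url, source_url, file_type):
    """Build one README bullet: title, description, download, source, file type."""
    fields = {
        "title": title,
        "description": description,
        "download_url": download_url,
        "source_url": source_url,
        "file_type": file_type,
    }
    cleaned = {}
    for name, value in fields.items():
        text = str(value).strip()
        if not text:
            # Every part of the entry is required by the guidelines.
            raise ValueError(f"Missing required field: {name}")
        cleaned[name] = text
    # Assumed Markdown layout; adjust to match existing entries.
    return (
        "- **{title}**: {description} | [Download]({download_url}) "
        "| [Source]({source_url}) | Type: {file_type} |"
    ).format(**cleaned)


entry = format_entry(
    "Example Dataset",
    "Short description.",
    "https://example.com/data.csv",
    "https://example.com",
    "CSV",
)
print(entry)
```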
Topics of interest include:
- optical/sensor athlete tracking
- athlete monitoring data
- physical profiling
- sport physiology data
- injuries
- schedule & travel metrics
- sport biomechanics
- video materials
- etc.
Companies that provide data through technology are also welcome to upload sample datasets as a way to help sport scientists become more familiar with the data.
If you are not sure how to make a contribution, here is a tutorial that explains how to contribute to a GitHub project: Link
Thanks for your contribution!
Special thanks to the companies and individuals that made these datasets public. Please check the source link on each dataset to visit the original resource.
The aim of this repository is to feature and provide direct access to datasets that are currently publicly available or that someone wishes to make available for others to use. We do not modify the datasets.