Skip to content

A Jupyter Notebook project to scrape detailed football data from FBref, covering various leagues. The repository includes both the data collection process and a final cleaned dataset for immediate use in analysis. Ideal for football data science projects, including xG, passing metrics, and tactical insights.

License

Notifications You must be signed in to change notification settings

GuechtouliAnis/Football-Data-Scraping

Repository files navigation

Football Data Scraper

Overview

This repository contains a Python notebook designed for scraping football data from various sources. The main goal of this project is to collect detailed player statistics from the past season to create a comprehensive football database for further analysis.

Introduction

This is my first attempt at building a football database. As a data science student in college, I am passionate about football and data analysis, and this project is a step towards combining these interests. By collecting and analyzing data from past football seasons, I aim to uncover insights and trends that can be useful for various applications in the football industry.

Table of Contents

Key Features

Player Statistics

Detailed statistics for players, including their performance in various areas like defense, goalkeeping, passing, shooting, and more.

Data Organization

The data is organized into multiple tables, each focusing on a specific aspect of the game, such as defense, goalkeeping, passing types, and playing time.

Comprehensive Analysis

The collected data can be used for in-depth analysis and research, providing insights into player performance and team dynamics.

Leagues Contained

The data covers various leagues split by country. The leagues included are only those that contain all the detailed data that FBref provides. Leagues that do not have advanced stats such as passing and pass types are not included.

European Leagues

  • England : Premier League & Championship (EFL)
  • Spain : La Liga & Segunda Division
  • Italy : Serie A & Serie B
  • Germany : Bundesliga & 2. Bundesliga
  • France : Ligue 1 & Ligue 2
  • Portugal : Primeira Liga
  • Netherlands : Eredivisie
  • Belgium : Pro League

American Leagues

  • USA : MLS (east & west)
  • Argentina : Primera Division
  • Brazil : Serie A (Brazilian League)
  • Mexico : Liga MX

Note that Brazil's 1st division is called "Serie A" too but was changed to "Brazilian League" to avoid confusion

Installation

To run this project, ensure you have Python installed. You can clone the repository and install the required libraries using the following commands:

git clone https://github.com/GuechtouliAnis/Football-Data-Scraping
cd Football-Data-Scraping
pip install -r requirements.txt

Dependencies

  • pandas
  • numpy
  • requests
  • beautifulsoup4

Usage

Open the Jupyter notebook data-scraping.ipynb in your Jupyter environment and follow the instructions provided in the notebook to scrape and process the football data.

Motivation and Goals

As a football enthusiast and data science student, my goal is to break into the field of football data analysis. This project is an opportunity to apply my data science skills to real-world football data, gaining practical experience and insights that will be valuable for my future career. By analyzing past seasons' data, I hope to identify patterns and trends that can contribute to a deeper understanding of the game.

Contributing

Feel free to submit issues or pull requests if you would like to contribute to this project. Make sure to follow the repository's contribution guidelines.

Data Source

The data for this project is sourced from FBref. Only leagues with comprehensive detailed data provided by FBref are included in this dataset. Leagues without advanced statistics such as passing and pass types are excluded.

Kaggle Database

The processed database is available on Kaggle. You can find the dataset here in CSV and SQLite formats both normalized and denormalized.

NOTE: The dataset provided here was further cleaned personally by me after scraping, so the results of the scraping notebook won't be identical to the data in the database.

License

This project is licensed under the MIT License. Feel free to use, modify, and distribute the code as needed.

Author

Anis Guechtouli

Contact

For any questions or suggestions, feel free to contact me at [guechtoulianiss7@gmail.com].

About

A Jupyter Notebook project to scrape detailed football data from FBref, covering various leagues. The repository includes both the data collection process and a final cleaned dataset for immediate use in analysis. Ideal for football data science projects, including xG, passing metrics, and tactical insights.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published