Football Data Scraper

Overview

This repository contains a Python notebook designed for scraping football data from various sources. The main goal of this project is to collect detailed player statistics from the past season to create a comprehensive football database for further analysis.

Introduction

This is my first attempt at building a football database. As a data science student in college, I am passionate about football and data analysis, and this project is a step towards combining these interests. By collecting and analyzing data from past football seasons, I aim to uncover insights and trends that can be useful for various applications in the football industry.

Key Features

Player Statistics

Detailed statistics for players, including their performance in various areas like defense, goalkeeping, passing, shooting, and more.

Data Organization

The data is organized into multiple tables, each focusing on a specific aspect of the game, such as defense, goalkeeping, passing types, and playing time.

Comprehensive Analysis

The collected data can be used for in-depth analysis and research, providing insights into player performance and team dynamics.

Leagues Contained

The data covers various leagues split by country. The leagues included are only those that contain all the detailed data that FBref provides. Leagues that do not have advanced stats such as passing and pass types are not included.

European Leagues

England : Premier League & Championship (EFL)
Spain : La Liga & Segunda Division
Italy : Serie A & Serie B
Germany : Bundesliga & 2. Bundesliga
France : Ligue 1 & Ligue 2
Portugal : Primeira Liga
Netherlands : Eredivisie
Belgium : Pro League

American Leagues

USA : MLS (east & west)
Argentina : Primera Division
Brazil : Serie A (Brazilian League)
Mexico : Liga MX

Note that Brazil's 1st division is called "Serie A" too but was changed to "Brazilian League" to avoid confusion

Installation

To run this project, ensure you have Python installed. You can clone the repository and install the required libraries using the following commands:

git clone https://github.com/GuechtouliAnis/Football-Data-Scraping
cd Football-Data-Scraping
pip install -r requirements.txt

Dependencies

pandas
numpy
requests
beautifulsoup4

Usage

Open the Jupyter notebook data-scraping.ipynb in your Jupyter environment and follow the instructions provided in the notebook to scrape and process the football data.

Motivation and Goals

As a football enthusiast and data science student, my goal is to break into the field of football data analysis. This project is an opportunity to apply my data science skills to real-world football data, gaining practical experience and insights that will be valuable for my future career. By analyzing past seasons' data, I hope to identify patterns and trends that can contribute to a deeper understanding of the game.

Contributing

Feel free to submit issues or pull requests if you would like to contribute to this project. Make sure to follow the repository's contribution guidelines.

Data Source

The data for this project is sourced from FBref. Only leagues with comprehensive detailed data provided by FBref are included in this dataset. Leagues without advanced statistics such as passing and pass types are excluded.

Kaggle Database

The processed database is available on Kaggle. You can find the dataset here in CSV and SQLite formats both normalized and denormalized.

NOTE: The dataset provided here was further cleaned personally by me after scraping, so the results of the scraping notebook won't be identical to the data in the database.

License

This project is licensed under the MIT License. Feel free to use, modify, and distribute the code as needed.

Author

Anis Guechtouli

Contact

For any questions or suggestions, feel free to contact me at [guechtoulianiss7@gmail.com].

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
Data		Data
.gitignore		.gitignore
DS_Functions.py		DS_Functions.py
Data Cleaning.md		Data Cleaning.md
Features Explained.md		Features Explained.md
LICENSE		LICENSE
README.md		README.md
app.py		app.py
from_db_to_csv.py		from_db_to_csv.py
notebook-data-cleaning.ipynb		notebook-data-cleaning.ipynb
notebook-data-scraping.ipynb		notebook-data-scraping.ipynb
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Football Data Scraper

Overview

Introduction

Table of Contents

Key Features

Player Statistics

Data Organization

Comprehensive Analysis

Leagues Contained

European Leagues

American Leagues

Installation

Dependencies

Usage

Motivation and Goals

Contributing

Data Source

Kaggle Database

License

Author

Contact

About

Releases

Packages

Languages

License

GuechtouliAnis/Football-Data-Scraping

Folders and files

Latest commit

History

Repository files navigation

Football Data Scraper

Overview

Introduction

Table of Contents

Key Features

Player Statistics

Data Organization

Comprehensive Analysis

Leagues Contained

European Leagues

American Leagues

Installation

Dependencies

Usage

Motivation and Goals

Contributing

Data Source

Kaggle Database

License

Author

Contact

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages