This project is for Machine Learning practice. I will learn how to predict the winner of football matches in the English Premier League (EPL).
Project Steps
- Scrape match data using requests, BeautifulSoup, and pandas.
- Clean the data and get it ready for machine learning using pandas.
- Make predictions about who will win a match using scikit-learn.
- Measure error and improve our predictions.
- Add comments to the code.
The code will be in two files:
- matches.ipynb: a Jupyter notebook that scrapes our data.
- predictions.ipynb: a Jupyter notebook that makes predictions.
To follow this project, you need the following installed locally:
- JupyterLab
- Python 3.8+
- Python packages:
  - pandas
  - requests
  - BeautifulSoup
  - scikit-learn
  - html5lib
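A quick way to confirm that the packages listed above are importable is sketched below. Note that two import names differ from the package names: BeautifulSoup is imported as `bs4` (and installed from PyPI as `beautifulsoup4`), and scikit-learn is imported as `sklearn`. The helper name `missing_packages` is made up for this example.

```python
from importlib.util import find_spec

# Display name -> module name used in an import statement.
REQUIRED_MODULES = {
    "pandas": "pandas",
    "requests": "requests",
    "BeautifulSoup": "bs4",
    "scikit-learn": "sklearn",
    "html5lib": "html5lib",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the display names of packages that cannot be found."""
    return [name for name, module in required.items() if find_spec(module) is None]


print("Missing packages:", missing_packages() or "none")
```

Running this in a notebook cell before starting shows which packages still need to be installed.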
I will be scraping FBref to get the data first.
For the predictions, I will use the CSV file containing all the scraped data.
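The prediction step could look roughly like the sketch below. It is not the notebook's actual code: the column names (`date`, `team`, `opponent`, `venue`, `result`), the random forest model, and the date cutoff are all assumptions, and the few rows in `CSV_TEXT` are synthetic stand-ins for the scraped CSV so the example runs on its own.

```python
import io

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic rows standing in for the scraped CSV (assumed column names).
CSV_TEXT = """date,team,opponent,venue,result
2021-08-14,Team A,Team B,Home,W
2021-08-21,Team A,Team C,Away,L
2021-08-28,Team B,Team C,Home,D
2021-09-11,Team B,Team A,Away,L
2021-09-18,Team C,Team A,Home,W
2021-09-25,Team C,Team B,Away,D
2022-01-15,Team A,Team B,Away,W
2022-01-22,Team B,Team C,Home,L
"""


def prepare(matches):
    """Add a binary target and numeric predictor columns."""
    matches = matches.copy()
    matches["date"] = pd.to_datetime(matches["date"])
    matches["target"] = (matches["result"] == "W").astype(int)  # 1 = win
    matches["venue_code"] = matches["venue"].astype("category").cat.codes
    matches["opp_code"] = matches["opponent"].astype("category").cat.codes
    matches["day_code"] = matches["date"].dt.dayofweek
    return matches


def train_and_score(matches, cutoff, predictors):
    """Train on matches before the cutoff date and score on the rest."""
    cutoff = pd.Timestamp(cutoff)
    train = matches[matches["date"] < cutoff]
    test = matches[matches["date"] >= cutoff]
    model = RandomForestClassifier(n_estimators=50, random_state=1)
    model.fit(train[predictors], train["target"])
    predictions = model.predict(test[predictors])
    return accuracy_score(test["target"], predictions)


PREDICTORS = ["venue_code", "opp_code", "day_code"]
matches = prepare(pd.read_csv(io.StringIO(CSV_TEXT)))
accuracy = train_and_score(matches, "2022-01-01", PREDICTORS)
print("Accuracy:", accuracy)
```

With the real data, `io.StringIO(CSV_TEXT)` would be replaced by the path to the scraped CSV file. Splitting by date rather than at random keeps future matches out of the training set.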
After running the code I found a couple of issues:
- FBref still kicks me out of the server, even though I respect the minimum 3-second pause between requests.
- To get a better dataset, I manually set the link for each season I wanted to scrape.
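The two points above (pausing between requests and listing the season links by hand) could be sketched as follows. The URLs in `SEASON_URLS` are placeholders, not real FBref links, and the names `http_get` and `fetch_seasons` are made up for this example. The 3-second value comes from the note above; a longer pause may be worth trying if the server still refuses requests.

```python
import time

MIN_DELAY_SECONDS = 3  # minimum pause between requests, as noted above

# Hypothetical placeholders: replace with the real season pages to scrape.
SEASON_URLS = [
    "https://example.com/season-2021-2022",
    "https://example.com/season-2020-2021",
]


def http_get(url):
    """Download one page and return its HTML text."""
    import requests  # imported here so the helper below works without it

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def fetch_seasons(urls, fetch=http_get, delay=MIN_DELAY_SECONDS, sleep=time.sleep):
    """Fetch each URL in order, pausing `delay` seconds between requests."""
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # no pause is needed before the first request
        pages[url] = fetch(url)
    return pages
```

Calling `fetch_seasons(SEASON_URLS)` returns a dictionary that maps each season URL to its HTML. The `fetch` and `sleep` parameters exist only so the loop can be tried without touching the network.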