I worked on this project during the 2018 World Cup, an event that captivated me. As a former competitive soccer player, and as someone who appreciates an event that brings the entire world together, I set out to explore trends in World Cup data. The project was done mainly in Python using Pandas, Plotly, and Seaborn. Thank you to Sowmya (https://github.com/sowmya0627) for partnering with me on this project.
Players tend to gravitate to clubs in the major European leagues: England, France, Germany, Italy, and Spain (visible as the vertical bands in the heatmap). This can be explained by the fact that the European leagues are the most competitive and have the most money available for top players. Players who aren't in Europe tend to play for clubs in their home country (visible along the main diagonal of the heatmap).
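A heatmap like this can be built from a national-team vs. club-country count table. The sketch below is a minimal illustration, not the notebook's actual code: it assumes a player table with hypothetical columns `team` (national team) and `club_country` (country of the player's club), and uses made-up toy rows. Column totals correspond to the vertical bands, and the diagonal gives the share of each squad playing at home.

```python
import pandas as pd


def club_country_matrix(players: pd.DataFrame) -> pd.DataFrame:
    """Count players by national team (rows) and club country (columns)."""
    # Assumes hypothetical columns "team" and "club_country".
    return pd.crosstab(players["team"], players["club_country"])


def home_league_share(matrix: pd.DataFrame) -> pd.Series:
    """Fraction of each squad whose club is in the player's own country (the main diagonal)."""
    totals = matrix.sum(axis=1)
    home = pd.Series(
        {team: matrix.at[team, team] if team in matrix.columns else 0 for team in matrix.index}
    )
    return home / totals


# Hypothetical toy rows, not the real 2018 squad data.
players = pd.DataFrame({
    "team": ["Mexico", "Mexico", "Mexico", "France", "France", "Egypt"],
    "club_country": ["Mexico", "Spain", "Mexico", "France", "England", "England"],
})

matrix = club_country_matrix(players)
print(matrix)
print(home_league_share(matrix))
```

The resulting `matrix` can be passed directly to `seaborn.heatmap(matrix)` for the visual version; a large column total (here, England) shows up as a vertical band.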
Every team's average age falls between 25 and 29. Interestingly, all four semi-finalists (England, Belgium, Croatia, and France) were younger than the tournament average. France, the eventual winner, was the youngest team in the World Cup.
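The age comparison comes down to a group-by mean per team checked against an overall average. This is a minimal sketch under stated assumptions: the column names `team` and `age` are hypothetical, the "tournament average" is taken to be the mean over all players (an assumption; the notebook may define it differently), and the rows are made-up toy data.

```python
import pandas as pd


def age_summary(players: pd.DataFrame, semifinalists: list) -> dict:
    """Compare each team's mean age with the tournament-wide mean."""
    # Assumes hypothetical columns "team" and "age".
    team_mean = players.groupby("team")["age"].mean()
    overall = players["age"].mean()  # assumption: mean over all players
    return {
        "team_mean": team_mean,
        "overall_mean": overall,
        "semis_below_avg": bool((team_mean[semifinalists] < overall).all()),
        "youngest_team": team_mean.idxmin(),
    }


# Hypothetical toy rows, not real squad ages.
players = pd.DataFrame({
    "team": ["France"] * 3 + ["Belgium"] * 2 + ["Germany"] * 3,
    "age": [22, 24, 26, 25, 27, 28, 30, 32],
})

summary = age_summary(players, ["France", "Belgium"])
print(summary["team_mean"])
print(summary["overall_mean"], summary["semis_below_avg"], summary["youngest_team"])
```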
Further insights on player BMI, player height, and each country's historical finishes are in the attached Jupyter Notebook.
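BMI is weight in kilograms divided by the square of height in meters. As a minimal sketch, assuming the player table stores height in centimeters and weight in kilograms under the hypothetical columns `height_cm` and `weight_kg` (the dataset's actual columns and units may differ), the column can be derived like this:

```python
import pandas as pd


def add_bmi(players: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the table with a BMI column: kg / m^2."""
    # Assumes hypothetical columns "height_cm" and "weight_kg".
    out = players.copy()
    height_m = out["height_cm"] / 100
    out["bmi"] = out["weight_kg"] / height_m ** 2
    return out


# Hypothetical toy rows, not real player measurements.
players = pd.DataFrame({
    "team": ["France", "Belgium"],
    "height_cm": [180, 190],
    "weight_kg": [81, 90],
})

with_bmi = add_bmi(players)
print(with_bmi.round(2))
```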
I would like to expand the analysis to relate match performance (wins, goals scored, saves) to physical traits, explore specific head-to-head matchups between countries, and compare how teams perform as the host nation against their historical average.