Data analysis projects, mostly about the NBA, written mostly in R with some SQL; visualizations are made with ggplot2. I do this for fun, for self-learning, and to help shift the discussion toward a data-informed one.
Scraping and parsing data from Basketball-reference.com (using rvest, tidyverse) code
Scraping play-by-play data (text) and parsing it into tabular data (using rvest, tidyverse, lubridate, stringr) code
Scraping and parsing NBA Draft data from Basketball-reference.com (using rvest, tidyverse) code
Scraping data from stats.nba.com (using httr, stringr) code
Matching an ID to each player (using rvest) code
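The play-by-play parsing step can be sketched in base R. The raw line format below (`clock | description | score`) is a hypothetical simplification, not the actual Basketball-Reference page layout. Base R functions stand in for the stringr and lubridate calls the project uses.

```r
# Hypothetical raw play-by-play lines: "clock | description | away-home score".
raw <- c(
  "11:42.0 | L. James makes 3-pt jump shot from 25 ft | 3-0",
  "11:20.0 | J. Tatum misses 2-pt layup from 2 ft | 3-0",
  "10:58.0 | J. Tatum makes free throw 1 of 2 | 3-1"
)

# Split each line on the " | " separator into its three fields.
parts <- strsplit(raw, " | ", fixed = TRUE)
pbp <- data.frame(
  clock       = vapply(parts, `[`, character(1), 1),
  description = vapply(parts, `[`, character(1), 2),
  score       = vapply(parts, `[`, character(1), 3),
  stringsAsFactors = FALSE
)

# Convert the "MM:SS.s" game clock to seconds remaining in the period.
clock_parts <- strsplit(pbp$clock, ":", fixed = TRUE)
pbp$seconds_left <- vapply(clock_parts,
                           function(p) 60 * as.numeric(p[1]) + as.numeric(p[2]),
                           numeric(1))

# Split the running score into away and home points.
score_parts <- strsplit(pbp$score, "-", fixed = TRUE)
pbp$away_pts <- as.integer(vapply(score_parts, `[`, character(1), 1))
pbp$home_pts <- as.integer(vapply(score_parts, `[`, character(1), 2))

# Classify the shot type and the outcome from the description text.
pbp$shot_type <- ifelse(grepl("3-pt", pbp$description, fixed = TRUE), "3PT",
                 ifelse(grepl("2-pt", pbp$description, fixed = TRUE), "2PT",
                 ifelse(grepl("free throw", pbp$description, fixed = TRUE), "FT",
                        NA_character_)))
pbp$made <- grepl(" makes ", pbp$description, fixed = TRUE)

print(pbp[, c("seconds_left", "away_pts", "home_pts", "shot_type", "made")])
```

Once every event is a row with numeric time and score, the shot-distribution analyses reduce to grouping and counting.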
I tested the hypothesis that time and score affect shot distribution. I expected teams to take riskier (higher-variance) shots under certain conditions, for example when the game is getting out of hand. However, visual analysis of shot distribution per minute did not show any clear effect, and neither did a heatmap of time (x axis) against score (y axis). I also fitted a regression with time, score, and their interaction to estimate their impact, but the coefficients were small.
Instead, I used the visualizations to illustrate a different problem: free throws make up a very large share of shots at the end of games, especially close games, which can really harm the viewing experience.
Visualization: bar chart and heatmap.
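Below is a minimal sketch of a regression with a time-by-score interaction. It assumes each shot is one row containing the minutes remaining, the shooting team's margin, and a 0/1 flag for a three-point attempt. The data are simulated, not real NBA shots. The logistic model is also an assumption, and the original regression may have used a different outcome or model.

```r
set.seed(42)
n <- 2000

# Simulated shots for illustration only (not real NBA data).
shots <- data.frame(
  minutes_left = runif(n, 0, 48),          # minutes remaining in the game
  margin       = round(rnorm(n, 0, 10))    # shooting team's score minus opponent's
)
# Three-point attempts drawn with a constant 38% probability,
# i.e. the simulation has no true time or score effect.
shots$is_three <- rbinom(n, 1, 0.38)

# `minutes_left * margin` expands to both main effects plus their interaction.
fit <- glm(is_three ~ minutes_left * margin, family = binomial, data = shots)
print(summary(fit)$coefficients)
```

In R's formula syntax, `a * b` means `a + b + a:b`, so the `minutes_left:margin` row estimates the interaction between time and score.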
The regression mentioned above did indicate that a comeback win (a win after a large deficit) has an impact on the team's shot distribution. To investigate further, I took each comeback win of 2020-2021 and compared the shot distributions of the losing team and the winning team during the comeback. As can be seen below, there was no difference in shot distribution and a large difference in shot making. This indicates that a team does not need to change its strategy to come back from a large deficit; it only needs to make more shots.
Visualization: shot distribution and shot making.
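The winner/loser comparison can be reduced to two summaries: the share of each team's attempts by zone (shot distribution) and the field-goal percentage (shot making). The `team`, `zone`, and `made` columns and their values are hypothetical and serve only as an illustration.

```r
# Hypothetical shots taken during a single comeback (illustrative values only).
shots <- data.frame(
  team = c(rep("winner", 6), rep("loser", 6)),
  zone = c("rim", "rim", "mid", "3PT", "3PT", "3PT",
           "rim", "rim", "mid", "3PT", "3PT", "3PT"),
  made = c(1, 1, 0, 1, 1, 0,
           1, 0, 0, 0, 1, 0)
)

# Shot distribution: share of each team's attempts by zone (each row sums to 1).
distribution <- prop.table(table(shots$team, shots$zone), margin = 1)
print(distribution)

# Shot making: overall field-goal percentage, then broken down by zone.
fg_pct <- tapply(shots$made, shots$team, mean)
fg_by_zone <- tapply(shots$made, list(shots$team, shots$zone), mean)
print(fg_pct)
print(fg_by_zone)
```

In this toy example the two rows of `distribution` are identical, while `fg_pct` differs. That is the pattern described above: the same shot selection with different shot making.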
After I published the findings above, I received a follow-up question: are these wins driven more by the strength of the defense or by the offense? To answer it, I created a lollipop chart that details each game, ordered by the winning team's offensive efficiency. The chart indicates whether the winning team's offensive efficiency was above the league average (dashed line), or whether its defensive efficiency was better than the league average. code
Visualization:
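Offensive efficiency is points scored per 100 possessions, and defensive efficiency is points allowed per 100 possessions. The sketch below makes three assumptions. It estimates possessions with the common approximation FGA - ORB + TOV + 0.44 * FTA, whereas Basketball-Reference uses a more detailed formula. It reuses the same possession count for the opponent. It compares both ratings with a hypothetical league average. The box-score numbers are made up.

```r
# Simplified possession estimate (assumption; not Basketball-Reference's exact formula).
possessions <- function(fga, orb, tov, fta) fga - orb + tov + 0.44 * fta

# Hypothetical box-score totals for the winning team in two games.
games <- data.frame(
  game = c("G1", "G2"),
  pts = c(120, 102), fga = c(88, 85), orb = c(10, 12),
  tov = c(12, 14), fta = c(25, 20),
  opp_pts = c(115, 95)
)

games$poss <- possessions(games$fga, games$orb, games$tov, games$fta)
games$off_eff <- 100 * games$pts / games$poss      # points scored per 100 possessions
games$def_eff <- 100 * games$opp_pts / games$poss  # points allowed per 100 possessions

league_avg <- 112  # hypothetical league-average points per 100 possessions
games$good_offense <- games$off_eff > league_avg   # higher is better on offense
games$good_defense <- games$def_eff < league_avg   # lower is better on defense
print(games[, c("game", "poss", "off_eff", "def_eff", "good_offense", "good_defense")])
```

Sorting `games` by `off_eff` and drawing one segment per game, with `league_avg` as a dashed reference line, gives the lollipop layout described above.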
Using the Wilcoxon signed-rank test to find out which player stats' distributions have changed between decades. I used the Wilcoxon test because it is non-parametric, and the distributions of most stats are unknown.
Visualization:
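In base R, `wilcox.test()` covers both Wilcoxon variants. With `paired = TRUE` it runs the signed-rank test, which needs paired observations (for example, the same players measured in both decades). With `paired = FALSE` it runs the rank-sum (Mann-Whitney) test for two independent samples, such as different players from two decades. The sketch uses the unpaired form on hypothetical values.

```r
# Hypothetical three-point attempts per game for players from two decades.
threes_2000s <- c(0.5, 1.2, 2.0, 3.1, 0.8, 1.5, 2.4, 0.2, 1.9, 2.7)
threes_2010s <- c(2.2, 3.5, 4.1, 5.0, 1.8, 3.9, 6.2, 2.9, 4.4, 3.3)

# Independent samples -> rank-sum test (paired = FALSE).
# Use paired = TRUE (signed-rank) only when each value has a matching partner.
res <- wilcox.test(threes_2010s, threes_2000s,
                   alternative = "two.sided", paired = FALSE)
print(res)
```

Running the same test once per stat, and adjusting the p-values for multiple comparisons (for example with `p.adjust()`), gives a list of stats whose distributions shifted.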
A box plot indicating that the average margin of victory this season has increased drastically.
Visualization:
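Here is a minimal sketch of the margin-of-victory comparison, using hypothetical final scores for two seasons. `boxplot(..., plot = FALSE)` returns the five-number summaries that a box plot would draw, so the comparison can be checked without rendering anything. The project's own charts use ggplot2.

```r
# Hypothetical final scores for two seasons (illustrative values only).
games <- data.frame(
  season   = rep(c("previous", "current"), each = 5),
  home_pts = c(110, 105, 98, 120, 101, 125, 99, 130, 104, 118),
  away_pts = c(104, 109, 95, 115, 99, 105, 117, 108, 90, 101)
)

# Margin of victory is the absolute score difference, whoever won.
games$margin <- abs(games$home_pts - games$away_pts)

# Mean margin per season (groups sorted alphabetically: current, previous).
avg_margin <- tapply(games$margin, games$season, mean)
print(avg_margin)

# Box-plot statistics without drawing: rows are lower whisker, lower hinge,
# median, upper hinge, upper whisker; columns are seasons.
box_stats <- boxplot(margin ~ season, data = games, plot = FALSE)$stats
print(box_stats)
```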
Checking which teams change their strategy (shot distribution) in the final minutes of close games, and whether that decision is affected by the team's position in the game (trailing/leading).
Visualization:
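One way to isolate the final minutes of close games is the NBA's clutch definition: the last five minutes of the fourth quarter or overtime, with the score within five points. The sketch assumes hypothetical columns: `period`, `seconds_left` in the period, and `margin` from the shooting team's perspective before the shot. It then compares three-point rate by team and game status.

```r
# Hypothetical shots (illustrative values only).
shots <- data.frame(
  team         = c("BOS", "BOS", "BOS", "BOS", "MIA", "MIA", "MIA", "MIA"),
  period       = c(1, 4, 4, 4, 2, 4, 4, 5),        # 5 = first overtime
  seconds_left = c(300, 200, 120, 30, 500, 250, 100, 60),
  margin       = c(3, -4, -2, 1, 8, 2, 4, -3),     # shooting team minus opponent
  is_three     = c(0, 1, 1, 0, 0, 0, 1, 1)
)

# Clutch: 4th quarter or overtime, last 5 minutes, score within 5 points.
shots$clutch <- shots$period >= 4 & shots$seconds_left <= 300 & abs(shots$margin) <= 5

# Game status from the shooting team's point of view.
shots$status <- ifelse(shots$margin < 0, "trailing",
                ifelse(shots$margin > 0, "leading", "tied"))

# Three-point rate by team and status, restricted to clutch shots.
three_rate <- aggregate(is_three ~ team + status,
                        data = shots[shots$clutch, ], FUN = mean)
print(three_rate)
```

Comparing these clutch rates with each team's rate outside the clutch window shows which teams shift their shot selection, and whether the shift depends on trailing or leading.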
Counting which initials (first name and last name) have produced the most NBA players, and the best NBA players (total and per capita) (SQL)
Visualization:
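The project does this in SQL; the sketch below shows the same grouping logic in R. The player names and the `win_shares` values are made up, with `win_shares` standing in for whatever quality measure is used. "Per capita" is interpreted here as per player sharing the initials, which is an assumption.

```r
# Hypothetical players and a hypothetical quality metric.
players <- data.frame(
  first      = c("Alex", "Adam", "Chris", "Carl", "Aaron"),
  last       = c("Brown", "Baker", "Smith", "Stone", "Bell"),
  win_shares = c(50, 10, 80, 20, 5)
)

# Initials = first letter of the first name + first letter of the last name.
players$initials <- paste0(substr(players$first, 1, 1), substr(players$last, 1, 1))

# Total metric per initials (like SQL: SELECT initials, SUM(win_shares) ... GROUP BY initials).
by_initials <- aggregate(win_shares ~ initials, data = players, FUN = sum)

# Number of players per initials, and the metric per player ("per capita").
by_initials$players <- as.vector(table(players$initials)[by_initials$initials])
by_initials$per_player <- by_initials$win_shares / by_initials$players

print(by_initials[order(-by_initials$players), ])
```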