Data-Analysis-Projects

Data analysis projects, mostly about the NBA, written mostly in R with some SQL and visualized with ggplot2. I do this for fun, for self-learning, and to move the discourse toward a data-informed one.

Scraping

Basketball-reference game id scraper

Scraping and parsing data from Basketball-reference.com (using rvest, tidyverse). code

Basketball-reference Play by Play scraper

Scraping and parsing play-by-play data (text) into tabular data (using rvest, tidyverse, lubridate, stringr). code

Basketball-reference Player stats scraper

Scraping and parsing data from Basketball-reference.com (using rvest, tidyverse). code

Basketball-reference Draft Scraper

Scraping and parsing NBA Draft data from Basketball-reference.com (using rvest, tidyverse). code

Stats.nba.com Scraper

Scraping data from stats.nba.com (using httr, stringr) code

ESPN ID Scraper

Matching an ESPN ID to each player (using rvest). code

Projects

Analyzing the effect of time and score on shot distribution

I tested the hypothesis that time and score affect shot distribution, expecting teams to take riskier (higher-variance) shots under certain conditions, such as when the game is getting out of hand. However, visual analysis of shot distribution per minute showed no such effect, nor did a heatmap of time (x-axis) against score (y-axis). I also fitted a regression on time and score (with their interaction) to estimate their impact, but the coefficients were small.

Instead, I used the visualizations to illustrate a different problem: the very large share of free throws at the end of games, especially close games, which can really harm the viewing experience.
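The regression with a time-by-score interaction described above could look roughly like the sketch below. It is not the project's actual code: the data are synthetic, the column names (`minutes_left`, `score_margin`, `is_three`) are hypothetical, and the choice of a logistic model on a three-pointer indicator is an assumption, since the README does not say how shot distribution was encoded.

```r
# Minimal sketch on SYNTHETIC data; all column names are hypothetical.
set.seed(42)
n <- 2000

shots <- data.frame(
  minutes_left = runif(n, min = 0, max = 48),        # game time remaining
  score_margin = round(rnorm(n, mean = 0, sd = 10))  # shooting team's lead
)

# Synthetic outcome: 1 if the shot is a three-pointer.
# The 0.38 rate is arbitrary and, by construction, unrelated to the predictors.
shots$is_three <- rbinom(n, size = 1, prob = 0.38)

# `a * b` in an R formula expands to a + b + a:b (main effects + interaction).
fit <- glm(is_three ~ minutes_left * score_margin,
           data = shots, family = binomial)

# Coefficient table: estimates, standard errors, z values, p-values.
coef_table <- coef(summary(fit))
print(round(coef_table, 3))
```

Because the synthetic outcome does not depend on time or score, the slope estimates should come out close to zero, which is the kind of "small coefficients" result the paragraph above describes.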

code

Visualization:

Bar Chart Heatmap

Analyzing whether a comeback (closing a large deficit) impacts shot distribution

The regression mentioned above did indicate that a comeback win (a win after a large deficit) has an impact on the team's shot distribution. To investigate further, I compared the shot distributions of the losing and winning teams, during the comeback, in each of the comeback wins of 2020-2021. As can be seen below, there is no difference in shot distribution and a large difference in shot making. This suggests that to come back from a large deficit, a team does not need to change its strategy; it only needs to make more shots.

code

Visualization:

Shot Distribution Shot Making

Analyzing whether comeback is impacted by defense or offense

After publishing the findings above, I received a follow-up question asking whether a comeback win is driven more by the strength of the defense or of the offense. I created a lollipop chart that details each game (ordered by the winning team's offensive efficiency) and indicates whether the winning team's offensive efficiency, or its defensive efficiency, is above the league average (dashed line). code

Visualization:

alt text

Testing differences between decades in NBA with Wilcoxon signed-rank test

Using the Wilcoxon signed-rank test to find out which player stat distributions have changed between decades. I used the Wilcoxon test because it is non-parametric, and the distributions of most of the stats are unknown.
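As a rough illustration of the test named above, base R's `wilcox.test()` runs the signed-rank test when called with `paired = TRUE`, which assumes the two vectors are matched observation by observation. The data below are synthetic and the variable names are hypothetical; how the project actually pairs observations across decades is not stated here. For two independent samples, the same function without `paired = TRUE` performs the Wilcoxon rank-sum (Mann-Whitney) test instead.

```r
# Minimal sketch on SYNTHETIC data; variable names are hypothetical.
set.seed(1)
n_units <- 30

# The same stat measured on matched units in two decades (assumed pairing).
stat_decade_a <- rnorm(n_units, mean = 10, sd = 2)
stat_decade_b <- stat_decade_a + rnorm(n_units, mean = 1, sd = 1)

# Paired samples: Wilcoxon signed-rank test.
signed_rank <- wilcox.test(stat_decade_a, stat_decade_b, paired = TRUE)

# Independent samples: Wilcoxon rank-sum (Mann-Whitney) test.
rank_sum <- wilcox.test(stat_decade_a, stat_decade_b)

print(signed_rank$method)
print(signed_rank$p.value)
print(rank_sum$method)
print(rank_sum$p.value)
```

Neither variant assumes a particular distribution for the stat, which is the reason given above for choosing a non-parametric test.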

code

Visualization:

alt text

Checking difference between the past 7 years in margin of victory

Box-plot indicating that the average margin of victory this year has drastically increased.

Visualization:

alt text

Comet Plot

Checking which teams change their strategy (shot distribution) in the final minutes of close games, and whether the decision is impacted by their position in the game (trailing/leading)

code

Visualization:

alt text

alt text

What's the effect of guarding shots in game?

code

Visualization:

alt text

Just for fun

Counting Buzzer beaters (end of game shots) per NBA team

Counting which initials (first name and last name) produce the most NBA players, and the best NBA players (total and per capita) (SQL)

code
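The project itself does this count in SQL. The idea, a group-by on the first letters of the first and last name, can be sketched in base R as follows. The player names are made-up placeholders, and the `players` data frame and its columns are hypothetical.

```r
# Minimal sketch; the names below are placeholders, not real players.
players <- data.frame(
  first_name = c("Alex", "Aaron", "Brian", "Andre", "Ben"),
  last_name  = c("Baker", "Brown", "Adams", "Bell", "Archer"),
  stringsAsFactors = FALSE
)

# Build the two-letter initials from the first character of each name.
players$initials <- paste0(
  substr(players$first_name, 1, 1),
  substr(players$last_name, 1, 1)
)

# Count players per initials pair, most common first.
initial_counts <- sort(table(players$initials), decreasing = TRUE)
print(initial_counts)
```

A "best players" variant would presumably aggregate a quality measure per initials pair instead of counting rows, and the per-capita version would divide that total by the count.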

Counting which duo of players have played in the most teams together (SQL)

code

Are the Boston Celtics the worst team on a Sunday afternoon?

code

Visualization:

alt text

Counting wire-to-wire wins