An interactive dashboard to explore bicycle collisions resulting in injury in San Francisco from 2013 to 2023, identify trends, and see if bicycling has gotten safer.
- Overview
- Setup and Usage
- Data
- Repository Structure
- Project Evaluation
- Future Work
- Acknowledgements
- License
The primary purpose of this project was to create an interactive dashboard to explore traffic collisions resulting in injury that involved bicyclists. Specifically, I wanted to better understand when, where, and how these collisions occur in San Francisco. This project also provided the opportunity to expand my Power BI skills and learn more about the types of data publicly available through DataSF.
The following software requirements and instructions were developed and have only been tested on a Windows 11 computer.
- Applications: Microsoft Power BI Desktop
- Python Version: 3.10.13
- Python Packages: requests, pathlib, pandas, census
- Editors: VS Code 1.86 and Jupyter Notebook 6.5.4
Note: Items 2-4 are only needed if you will be using the Jupyter notebook to import and clean US census data.
-
Verify the above software requirements and dependencies have been met on your computer.
-
Clone this repository on your local computer. If you do not want to clone the entire repository, you can download the bicycle_collisions_sf.pbix Power Bi report and the two CSV files in the resources folder.
-
Open the Power BI Report.
-
Select Transform data from the Home tab to open the Power Query Editor.
-
Select the resourcesFolderPath query and replace the folder path with the full path to the location of the CSV files on your computer.
-
If you would like to access traffic collision data directly through DataSF via oData,
-
Select the dataTrafficCrashes query.
-
Click on the Source step.
-
Replace sourceTrafficCrashesCSV with sourceTrafficCrashesDataSF.
Note: You may need to review and remove rows with errors when importing data.
-
-
Select Close & Apply from the Home tab.
-
Edit the config_blank.py file in the data_processing folder. Update the variable with your API key. You can get an API key at https://api.census.gov/data/key_signup.html.
census_key = '[Enter your API key without the brackets]'
-
Save and close the file, then rename it to config.py.
-
Open a terminal window in the data_processing directory and start Jupyter Notebook.
jupyter notebook
-
From Jupyter notebook, open the file data_processing_us_census.ipynb.
-
Modify the code if necessary and then run each cell to create and export a dataframe to a CSV file.
-
Open the Power BI Report, select Transform data from the Home tab and modify the dimCensusData query as needed.
-
Open the bicycle_collisions_sf.pbix Power Bi report. There are 5 dashboard pages and 2 ToolTip pages.
-
Use the page tabs at the bottom or 5 icons on the left navigation menu to move between the dashboard pages. Be sure to control-click on the navigation icons - this is not needed with published reports.
-
Use the 3 slicers along the top to filter data on any of the 5 dashboard pages. You can also filter visuals directly or use the filters panel.
- DataSF: Data on traffic crashes resulting in injury was available at https://data.sfgov.org/Public-Safety/Traffic-Crashes-Resulting-in-Injury/ubvf-ztfx/about_data.
- US Census Bureau: Data on bike commuters in San Francisco was available at https://data.census.gov.
- DataSF: Data was exported to a csv for initial exploration and then imported into Power Bi using an oData connection. The Power Bi report is designed to use both methods.
- US Census Bureau: Data was imported into a pandas dataframe using the https://pypi.org/project/census.
The US Census Bureau data was processed using Python in a Jupyter notebook as follows:
-
Population data for each year was imported into a separate dataframe and then merged into one dataframe.
-
The data was transformed by isolating and renaming the year, population, and bike commuter columns and correcting the data type for the last two columns. The dataframe was then exported to a CSV file.
The DataSF dataset was processed using Power Query in Power Bi as follows:
-
The sourceTrafficCrashesCSV query was created to import the data from a csv file. Unneeded columns were then removed from the query.
-
The dataTrafficCrashes query was created directly referencing the above query. The data was then filtered (removing rows with null values, crashes not involving bicycle and duplicates) and column values were replaced for clarity and simplicity.
-
The dimCollisionType query was created referencing the dataTrafficCrashes query to get a list of the unique collision types and create a primary key column. Similar queries were created for lists of the unique collision severities, neighborhoods, parties involved in the collisions, parties at fault and vehicle code violations.
-
The factCollisions query was created referencing the dataTrafficCrashes query to merge all of the list queries and replace text values with their primary keys to optimize the data and minimize the project's storage size.
The following data model was created in Power Bi:
This repository is organized into the following folders:
- images - image files for the Power Bi report and this readme file.
- data_processing - Jupyter notebook and python file to import and processes US Census data.
- resources - data source files and helpful information obtained on various city websites
- root - Power Bi Project, license file, and this readme.
This project was an incredible learning opportunity not only in expanding my Power BI skills, but in exploring the vast amounts of data publicly available through DataSF. In the process of gathering this data, I learned about Vision Zero and became more familiar with the city's extensive efforts to reduce traffic fatalities and create safer streets for pedestrians, bicyclists and drivers. This led me to the city's online dashboards on their progress and a recent benchmarking project to compare San Francisco's efforts on key metrics alongside its peer cities. These dashboards demonstrate the city's commitment to using and sharing data to help guide policy and action. For more information, I encourage you to go to https://www.sf.gov/data/vision-zero-benchmarking and https://www.sfmta.com/vision-zero-sf.
So what did I find? A few observations:
- The collision rate has decreased by 33.5% between 2013 and 2022. Although the rate has increased since the Covid-19 pandemic, the rate in 2022 is still lower than the pre-pandemic rate in 2019 by 6.8%.
- The Mission has consistently had the highest number of bicycle collisions per year followed by the South of Market and Financial District. Neighborhood collision rates were not easily available by based on options for summarizing US Census data.
- On average, September and October have the highest number of collisions while most collisions occur from 8 to 10am and 5 to 7pm. Interestingly, Wednesdays have the highest number of collisions.
- Unsafe speed by a cyclist and unsafe turn or lane change by a driver were the top two violations cited from bicycle collisions. Drivers were identified as the faulty party more than bicyclists every year except in 2020 and 2021. This may be due to the significant changes in the number of vehicles and bicycles on the road during the pandemic.
Some recommended areas of future work on this project include:
- Neighborhood Level Census Data - Collision rates specific to each neighborhood would allow a comparison of bicycle safety between neighborhoods.
- Automated Bike Counters - As more data is available over time, these counts would provide a more accurate measure of the collision rate and would allow a comparision of bicycle safety on specific streets and between neighborhoods.
- Vision Zero High Injury Network - Overlay high injury network on the map visual and analyze collision rates over time for the identified street sections. Map of network is available at https://sfgov.maps.arcgis.com/apps/webappviewer/index.html?id=b2743a3fc0b14dd9814cf6668fc34773.
- Vision Zero Protected Bike Lanes - Isolate collisions on streets before and after protected lanes were installed.
- Other Vision Zero Projects - Compare collision numbers before and after implementation of other projects including improvements to the High Injury Network, 20 mph corridors, intersections with no turn on red signs, intersections with turn calming, and traffic calming devices. For more information on these projects, visit https://www.sfmta.com/vision-zero-safe-streets-progress.
I foremost want to thank Maia Moran and Jacob Henke from the San Francisco Municipal Transportation Agency. They provided manual counts for 2018 and 2019, as well as automated bike counts from 2018 through 2022. They were extremely helpful in the midst of system upgrades and their daily work. Given the changes in the number and locations of automated bike counters over the years, I decided not to use this data as an indicator of the annual number of cyclists. They have published this data and more on the MTA website at https://www.sfmta.com/reports/average-weekday-bike-volumes-dashboard.
I'd also like to acknowledge the following resources:
- DataSF and the Department of Public Health (DPH)- The "Traffic Crashes Resulting in Injury" dataset provides detailed information regarding traffic collisions in San Francisco and was the primary data source. DPH obtains and analyzes data from the SFPD, and publishes it through DataSF. DataSF provides access to this data through file export, API, oData and querying/visualizing the data using their online tools. Visit https://www.sf.gov/departments/city-administrator/datasf to learn more about available data and their services.
- US Census Bureau - One of the many available datasets includes bicycle commuting as a means of transportation to work. This was used as an indicator of the annual number of bicyclists in San Francisco to calculate the collision rate and assess changes in the safety of bicycling over the years.
- Jeremy Carbaug - Jeremy is the author of the Python package, census, which is a simplye wrapper for the United States Census Bureau’s API. This package simplifies the process of importing census data through their API.
This project is licensed under the MIT license.