This is a team-based project that explored a traffic accident data set.
- Dataset
- Project Outline
- Example Plots
- Findings Reports and Presentation
- Dependencies and Setup Required
- How to View / Run the code
- Jupyter Notebooks File Guide
- Repository Structure
- Team
We used the UK Road Safety: Traffic Accidents and Vehicles data set, a detailed dataset of road accidents and involved vehicles in the UK (2005-2017).
Available from Kaggle.com
There are 2 CSV files in this data set.
- Accident_Information.csv
- Vehicle_Information.csv
Both files should be placed in the Resources/ directory.
Both CSV files were merged into a single dataframe. Because the resulting data file was extremely large, we decided to focus on the years 2010-2016. The filtered data was written to a new CSV file, all.csv, which is the main data source for all of the investigation, analysis and plots.
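The merge-and-filter step described above could be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: it assumes both files share an `Accident_Index` key column, that the accident file has a numeric `Year` column, and that an inner join is wanted. The function name `merge_and_filter` is hypothetical.

```python
import pandas as pd


def merge_and_filter(accidents, vehicles, start_year=2010, end_year=2016):
    """Join accident and vehicle rows, keeping only the chosen years.

    Assumed column names (not confirmed by this README):
    "Accident_Index" as the shared key and "Year" in the accident table.
    """
    # Keep only accidents whose year falls in the inclusive range.
    in_range = accidents["Year"].between(start_year, end_year)
    recent = accidents.loc[in_range]

    # Drop vehicle columns that duplicate accident columns (except the key),
    # so the merged frame does not end up with "_x"/"_y" suffixed copies.
    overlap = [
        col for col in vehicles.columns
        if col in recent.columns and col != "Accident_Index"
    ]
    vehicles = vehicles.drop(columns=overlap)

    # Inner join: one row per vehicle involved in an in-range accident.
    return recent.merge(vehicles, on="Accident_Index", how="inner")


# Possible usage, assuming the raw files sit in Resources/:
#   accidents = pd.read_csv("Resources/Accident_Information.csv")
#   vehicles = pd.read_csv("Resources/Vehicle_Information.csv")
#   merge_and_filter(accidents, vehicles).to_csv("Resources/all.csv", index=False)
```

Filtering before merging keeps the intermediate dataframe smaller, which may matter given the size of the raw files.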
Data Limitations
Note that the data was limited in scope. The plots drawn from it are accurate, and some of the findings are interesting, but they do not tell the entire story.
We chose "Which factors contribute to accident risk?" as our client question. From it we formulated the hypotheses below, used them to select the relevant data, and attempted to turn that raw data into meaningful information.
- Hypothesis 1: Greater volume of traffic increases the number of accidents
- Hypothesis 2: Urban areas have a greater number of accidents than rural areas
- Hypothesis 3: The time or day of the week does not affect the number of accidents
- Hypothesis 4: Speed limits do not influence the number of accidents
- Hypothesis 5: Biological factors like gender and age do not influence the number of accidents
- Hypothesis 6: Location on the road or vehicle manoeuvre does not influence the number of accidents
- Hypothesis 7: Poor weather conditions influence the number of accidents
For each hypothesis we created a number of visualisations to present the data in a format that is easier to analyse, which helped us interpret it.
Here are 2 example plots we created from the data.
The plots can be found in the /Images folder after running the code in the Notebook files that are in the root directory.
The findings of this project can be found in the /Presentation directory.
There are 3 files:
- 01_Project_scope_notes.pdf
- 02_Presentation.pdf
- 02_Traffic Accidents Report.pdf
In order to run the files you will need to install the following packages.
- gmaps
pip install gmaps
- pandas
pip install pandas
- seaborn
pip install seaborn
- matplotlib
pip install matplotlib
- scipy
pip install scipy
- jupyter notebook
pip install notebook
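As a quick sanity check after installing, something like the sketch below can report which of the packages above are still missing. It uses only the standard library; the import names are assumed to match the pip package names listed above, and `missing_packages` is a hypothetical helper, not part of the repository.

```python
from importlib.util import find_spec

# Import names assumed to match the pip packages listed above.
REQUIRED = ("gmaps", "pandas", "seaborn", "matplotlib", "scipy", "notebook")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```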
Other Required Files:
Add the below 2 files into your local cloned repository!
- File 1: all.csv - Click to Download (accidents from 2010-2016) - File was not included in the repository due to the large file size.
The all.csv file must be placed in the "/Resources" directory.
Gmaps API Key requirement
For gmaps you will also need an API key from the Google Maps Platform. Please visit the Google Maps Platform to set up an API key if you do not already have one.
- File 2: config.py - Click to Download
- Open the file in a text editor or VS Code and change "YOUR API KEY HERE" to your API key from the Google Maps API.
- The config.py file should be stored in your local repository root folder.
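A small guard like the one below can catch the case where the placeholder in config.py was never replaced. It is only a sketch: the variable name `gkey` is an assumption about what config.py defines, and `validated_key` is a hypothetical helper. `gmaps.configure(api_key=...)` is the call the gmaps package provides for setting the key.

```python
PLACEHOLDER = "YOUR API KEY HERE"


def validated_key(key):
    """Return the key with surrounding whitespace removed.

    Raises ValueError if the key is empty or still the placeholder text.
    """
    cleaned = (key or "").strip()
    if not cleaned or cleaned == PLACEHOLDER:
        raise ValueError("Set your Google Maps API key in config.py first.")
    return cleaned


# Possible use in a notebook (gkey is an assumed variable name in config.py):
#   from config import gkey
#   import gmaps
#   gmaps.configure(api_key=validated_key(gkey))
```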
The work was completed primarily using Jupyter Notebooks and the modules listed in the Dependencies section.
- Clone the repository.
- Complete the steps in the Dependencies and Setup Required section above.
- Open any of the Jupyter Notebook files (.ipynb) in the root directory and run the cells in order.
The Jupyter notebook files have comments in the code and Markdown cells beneath each step explaining what was done in the cell above.
For a short description of what each notebook contains, please see the Jupyter Notebooks File Guide section below.
- 01_data_retrieval_step_1.ipynb - Initial data processing and filtering
- 01_data_retrieval_step_2.ipynb - Initial data processing and filtering
- 01_data_retrieval_step_3.ipynb - Initial data processing and filtering
- 02_traffic_vol_vs_accidents.ipynb - Volume of Traffic vs Number of Accidents
- 03_When_accidents_happen.ipynb - Days of the week, Time of day, Gender, Age, vs Accidents
- 04_Where - Heatmaps.ipynb - Google heatmaps of accidents across the UK and accidents in Birmingham
- 05_RoadSafety.ipynb - Number of Casualties vs Speed limit and Number of Casualties vs Time of day
- 06_Speed Limit Project - FINAL.ipynb - Number of Accidents vs Speed Limit and Number of Accidents vs Vehicle Manoeuvre
- 07_Accidents By Road Class and Road Type - Number of Accidents by Severity for Road Class and Road Type
- 08_weather.ipynb - Number of Accidents vs Weather Condition
- Notebook code files in the root directory root/
- Presentation and report files in the Presentation directory Presentation/
- Images and plots in the Images directory Images/
- Dataset files in the Resources directory Resources/
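Before running the notebooks, it can help to confirm that the two files that are not shipped with the repository are in place. The sketch below checks for them relative to the repository root; the helper name `missing_paths` is hypothetical, and the paths are taken from the setup instructions above.

```python
from pathlib import Path

# Files the setup instructions ask you to add to your local clone.
EXPECTED = ("Resources/all.csv", "config.py")


def missing_paths(repo_root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_paths(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files are in place.")
```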