Traffic accidents are one of the leading causes of death and injury globally, and in Germany they cause over 3,000 deaths and 300,000 injuries yearly. Studying where and why traffic accidents happen can help policymakers make our roads safer. Therefore, we built and trained several machine learning models to predict the risk of a traffic accident occurring on a road segment for a given hour and day in Berlin. From this analysis, local policymakers can derive valuable insights into which factors make road segments dangerous and when and where collisions are more likely to happen.
Take a look at our project presentation slides here to get a better overview of the project!
We conducted this study on accidents in Berlin, using the following data sources:
- Records of traffic accidents in Berlin: Open Data Berlin shares records of all traffic accidents that happened between 2018 and 2020, a total of 38,851 occurrences. For this study we used the GPS coordinates, year, month, day of the week, and hour of each accident.
- Berlin road segment data: We used two datasets provided by the Open Data Informationsstelle (ODIS):
  - To match the locations of traffic accidents (Figure 1a) to road segments in Berlin (Figure 1b), we used the existing dataset of geometric information on Berlin road segments. We also took the length of each road segment from this dataset.
  - We used the road segment surface dataset to determine whether a road segment is a main road or a side street.
- Weather data: Using the Wetterdienst API, we collected data on temperature, humidity, precipitation duration, precipitation height, and visibility from five Berlin weather stations for every accident location, day of the week, and hour of the day between 2018 and 2020.
- Sun elevation: We used the Python library PySolar to collect data on sun elevation angles per date and location in Berlin.
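
To illustrate the step of matching accident locations to road segments, here is a minimal sketch of a nearest-segment lookup in plain Python. It is not our actual pipeline: the function names, the reference point used for the flat projection, and the toy coordinates are all hypothetical, and a real implementation over all Berlin segments would more likely rely on a geospatial library with a spatial index.

```python
import math

# Assumed reference point near central Berlin for a local flat projection (hypothetical choice).
REF_LAT, REF_LON = 52.52, 13.405
EARTH_RADIUS_M = 6_371_000.0


def to_local_xy(lat, lon):
    """Project WGS84 degrees to approximate metres (equirectangular, city scale only)."""
    x = math.radians(lon - REF_LON) * EARTH_RADIUS_M * math.cos(math.radians(REF_LAT))
    y = math.radians(lat - REF_LAT) * EARTH_RADIUS_M
    return x, y


def point_to_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b, all given as local (x, y) in metres."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_segment(accident, segments):
    """Return (segment_id, distance_m) for the road segment closest to the accident.

    accident: (lat, lon); segments: {segment_id: [(lat, lon), ...]} polylines.
    """
    p = to_local_xy(*accident)
    best_id, best_dist = None, math.inf
    for seg_id, coords in segments.items():
        pts = [to_local_xy(lat, lon) for lat, lon in coords]
        for a, b in zip(pts, pts[1:]):
            d = point_to_segment_distance(p, a, b)
            if d < best_dist:
                best_id, best_dist = seg_id, d
    return best_id, best_dist


# Made-up toy data: two short segments and one accident location.
segments = {
    "seg_a": [(52.5200, 13.4000), (52.5200, 13.4100)],  # runs east-west
    "seg_b": [(52.5300, 13.4050), (52.5400, 13.4050)],  # runs north-south
}
accident = (52.5205, 13.4050)

seg_id, dist_m = nearest_segment(accident, segments)
print(seg_id, round(dist_m, 1))
```

The brute-force loop compares every accident with every segment, which is fine for a toy example but slow at city scale. The flat projection is also only reasonable over short distances such as within one city.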