This project provides a web platform that processes and visualizes incident data from the Norman, Oklahoma Police Department's daily incident reports. Users can upload reports as CSV or PDF files or submit report URLs, and the application extracts, augments, and visualizes the data to reveal insights about incident patterns. The visualizations help identify trends such as peak times, high-incident areas, and weather conditions associated with incidents. The platform also lets users download an augmented dataset with additional analytical fields.
Note: This application is designed to analyze and visualize Norman PD incident reports found on the Norman PD website.
- Multi-Format Upload Support: Users can upload CSV or PDF files, or submit URLs linking to incident reports.
- Data Augmentation: Each incident is enriched with additional fields, including Weather, Location Rank, and Incident Rank.
- Visual Analytics: Displays 7 key visualizations, including incident trends, weather patterns, and hotspot maps.
- Data Download: Users can download the augmented dataset in CSV format for further analysis.
- /app: Contains the core web app code, templates, and utilities.
- templates/: HTML templates for the frontend pages.
- utils/: Helper scripts for data processing, including geocoding, ranking, weather lookups, and data augmentation.
- resources/: Stores the SQLite database and augmented CSV files.
- /tests: Includes test cases for verifying data processing functions.
- run.py: The main entry point for running the Flask application.
- Pipfile: Manages dependencies using Pipenv.
Ensure Python 3.11 or later is installed on your system.
git clone https://github.com/Vveanta/norman-incident-vis.git
cd norman-incident-vis
pipenv install
pipenv shell
Copy the example environment file and update it with your actual Google Maps API key:
cp .example_env .env
Then open .env and replace the placeholder with your API key:
GOOGLE_API_KEY=YOUR_ACTUAL_API_KEY_HERE
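How the application reads this key is not described here, and it may use a library such as python-dotenv. As a standard-library sketch only (the function name load_google_api_key is hypothetical), a lookup that prefers an exported GOOGLE_API_KEY variable and falls back to the .env file could look like this:

```python
import os


def load_google_api_key(env_path: str = ".env") -> str:
    """Return GOOGLE_API_KEY from the environment, falling back to a .env file.

    Hypothetical helper; the project's real configuration loading may differ.
    """
    key = os.environ.get("GOOGLE_API_KEY")
    if key:
        return key  # an exported variable wins over the file
    try:
        with open(env_path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                # Skip blanks, comments, and lines that are not NAME=VALUE pairs.
                if not line or line.startswith("#") or "=" not in line:
                    continue
                name, value = line.split("=", 1)
                if name.strip() == "GOOGLE_API_KEY":
                    return value.strip().strip('"').strip("'")
    except FileNotFoundError:
        pass
    raise RuntimeError("GOOGLE_API_KEY is not set; copy .example_env to .env and add your key")
```

Failing loudly when the key is missing surfaces a configuration problem at startup rather than as a geocoding error later.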
Set up the environment and start the Flask server:
export FLASK_APP=run
flask run --port 5200
The application will be accessible at http://localhost:5200.
Users can upload incident data in multiple formats. Each file should follow the data structure specified by the Norman PD, containing fields such as Date/Time, Incident Number, Location, Nature, and Incident ORI.
- CSV Files: Upload a CSV file listing URLs of incident PDFs hosted on the Norman PD website.
- PDF Files: Upload PDFs from the Norman PD’s incident summary reports.
- URLs: Provide URLs directly to the PDF incident reports hosted on the Norman PD website.
- Default PDFs: Select from a set of pre-loaded default PDFs available on the platform.
The platform processes up to three incident reports at a time. To process more reports, please contact the project owner.
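The layout of the CSV upload is not specified beyond containing report URLs. The sketch below assumes one URL per row in the first column and that report URLs end in .pdf; read_report_urls is a hypothetical helper that also applies the three-report limit described above.

```python
import csv
import io

MAX_REPORTS = 3  # mirrors the three-report limit described above


def read_report_urls(csv_text: str, limit: int = MAX_REPORTS) -> list[str]:
    """Collect PDF URLs from the first CSV column, keeping at most `limit` of them.

    Assumes one URL per row; header rows and non-URL cells are skipped.
    """
    urls: list[str] = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # csv.reader yields [] for blank lines
        candidate = row[0].strip()
        lowered = candidate.lower()
        if lowered.startswith(("http://", "https://")) and lowered.endswith(".pdf"):
            urls.append(candidate)
        if len(urls) == limit:
            break  # stop once the cap is reached
    return urls
```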
The initial data includes the following fields extracted from the Norman PD reports:
- Date/Time: Timestamp of the incident.
- Incident Number: Unique identifier for each incident.
- Location: Place where the incident occurred.
- Nature: Type/category of the incident.
- Incident ORI: Identifier indicating the originating agency or unit.
The application augments this data with additional fields to enable comprehensive analysis:
- Day of the Week: Numeric value from 1 (Sunday) to 7 (Saturday).
- Time of Day: Hour of the incident in 24-hour format.
- Weather: WMO (World Meteorological Organization) code indicating weather conditions at the time of the incident.
- Location Rank: Rank of the location based on incident frequency.
- Location: Name of the incident location from the original data.
- Side of Town: Directional classification (e.g., N, S, E) relative to Norman’s town center.
- Incident Rank: Rank of the incident type based on frequency.
- Nature: Category or type of the incident (from the original data).
- EMSSTAT: Boolean that is true if the incident ORI is "EMSSTAT", or if one of the next two records shares the incident's time and location and has an ORI of "EMSSTAT".
The main goal of this project is to enhance the structured dataset extracted from Norman PD reports by adding attributes to each record. Below is a breakdown of each attribute, including how it is calculated and integrated into the dataset.
The "Day of the Week" and "Time of Day" attributes are derived using Python's datetime module. By converting the incident_time field to a datetime object, we extract both the day of the week and the hour of the incident. The isoweekday() method returns the day as an integer from 1 (Monday) to 7 (Sunday), so matching the 1 (Sunday) to 7 (Saturday) convention listed above requires a shift such as (isoweekday() % 7) + 1. The hour attribute gives the hour of the incident.
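A minimal sketch of this step, assuming incident_time strings look like "8/1/2024 0:04" (month/day/year, 24-hour time, as printed in the Norman PD reports). The function name day_and_hour and the format string are illustrative assumptions, not the project's code.

```python
from datetime import datetime

# Assumed timestamp layout, e.g. "8/1/2024 0:04"; strptime accepts non-padded fields.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M"


def day_and_hour(incident_time: str) -> tuple[int, int]:
    """Return (day_of_week, hour) with day 1 = Sunday ... 7 = Saturday."""
    ts = datetime.strptime(incident_time, TIMESTAMP_FORMAT)
    # isoweekday(): Monday=1 ... Sunday=7. Shift so Sunday=1 ... Saturday=7.
    day_of_week = ts.isoweekday() % 7 + 1
    return day_of_week, ts.hour
```

For example, 4 August 2024 was a Sunday, so day_and_hour("8/4/2024 0:04") returns (1, 0).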
The "Weather" attribute is determined by querying the Open-Meteo API using the latitude, longitude, and date of each incident. The API response includes historical weather data, from which we extract the WMO weather code for the specific hour of the incident. This process is handled in the fetch_weather_data function.
Both "Location Rank" and "Incident Rank" are based on the frequency of each unique location and incident type, respectively. The calculate_frequencies function counts occurrences, while assign_ranks assigns a rank based on frequency, accounting for ties.
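The README does not say which tie-handling scheme is used. The sketch below assumes standard competition ranking, where tied values share a rank and the following rank is skipped (1, 1, 3); dense ranking (1, 1, 2) would be the other common choice. The helper names count_frequencies and competition_ranks are hypothetical stand-ins, not the project's calculate_frequencies and assign_ranks.

```python
from collections import Counter


def count_frequencies(values: list[str]) -> Counter:
    """Count how often each location (or incident nature) appears."""
    return Counter(values)


def competition_ranks(freq: Counter) -> dict[str, int]:
    """Rank keys by descending count; ties share a rank and the next rank is skipped."""
    ordered = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    ranks: dict[str, int] = {}
    prev_count = None
    rank = 0
    for position, (key, count) in enumerate(ordered, start=1):
        if count != prev_count:
            rank = position  # a new frequency level starts at its 1-based position
            prev_count = count
        ranks[key] = rank
    return ranks
```

With two locations seen twice and two seen once, the ranks come out as 1, 1, 3, 3.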
To categorize each incident's "Side of Town," we first geocode the location to obtain latitude and longitude (using the Google Maps API). The bearing from the town center (35.220833, -97.443611) is calculated, and based on this angle, the determine_side_of_town function classifies each incident as being in one of eight directional categories (N, S, E, W, NW, NE, SW, SE).
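One way to implement this is to compute the initial great-circle bearing from the town center and split the compass into eight 45-degree sectors centered on each direction (so N covers 337.5 to 22.5 degrees). The sector layout and the names initial_bearing and side_of_town are assumptions for illustration; only the center coordinates come from this README.

```python
import math

TOWN_CENTER = (35.220833, -97.443611)  # Norman town center used above
SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def initial_bearing(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial great-circle bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0


def side_of_town(lat: float, lon: float, center: tuple[float, float] = TOWN_CENTER) -> str:
    """Map a point to one of eight 45-degree sectors around the center."""
    bearing = initial_bearing(center[0], center[1], lat, lon)
    # Offset by 22.5 degrees so each sector is centered on its compass direction.
    return SECTORS[int((bearing + 22.5) // 45) % 8]
```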
The "Nature" field is a direct copy from the original data, representing the type or category of the incident.
The "EMSSTAT" field is a boolean that is true if the incident ORI is EMSSTAT, or if either of the next two records shares the same time and location and has an ORI of EMSSTAT. This check is implemented in the check_emsstat function.
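A sketch of this look-ahead rule over incidents in report order. The record keys time, location, and ori and the name emsstat_flags are hypothetical; this is an illustration of the rule, not the project's check_emsstat.

```python
def emsstat_flags(records: list[dict]) -> list[bool]:
    """Flag records whose ORI is EMSSTAT, or that match an EMSSTAT record
    (same time and location) among the next two records."""
    flags: list[bool] = []
    for i, rec in enumerate(records):
        flag = rec["ori"] == "EMSSTAT"
        for nxt in records[i + 1 : i + 3]:  # look ahead at most two records
            if (
                nxt["ori"] == "EMSSTAT"
                and nxt["time"] == rec["time"]
                and nxt["location"] == rec["location"]
            ):
                flag = True
        flags.append(flag)
    return flags
```

Because the rule only looks forward, a police record listed after its matching EMSSTAT record is not flagged by this sketch.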
The augment_data function coordinates the data augmentation process. It retrieves incidents from the database, augments each record with the new attributes, and saves the results as a new dataset for visualization and download.
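The overall flow (read from SQLite, enrich each row, write CSV) can be sketched as below. The table name incidents, its column names, and the helper augment_to_csv are assumptions for illustration; the enrichment step is passed in as a callable so the per-field helpers can be plugged in without this function knowing about them.

```python
import csv
import io
import sqlite3

# Hypothetical schema; the project's real table and column names may differ.
SELECT_SQL = (
    "SELECT incident_time, incident_number, incident_location, nature, incident_ori "
    "FROM incidents"
)


def augment_to_csv(conn: sqlite3.Connection, enrich, out) -> int:
    """Read every incident, merge in the dict returned by enrich(row), and write CSV to `out`.

    Returns the number of rows written; writes nothing (not even a header) if there are none.
    """
    conn.row_factory = sqlite3.Row  # note: changes the caller's connection setting
    rows = conn.execute(SELECT_SQL).fetchall()
    writer = None
    for row in rows:
        record = dict(row)
        record.update(enrich(record))  # new columns are appended after the original ones
        if writer is None:
            writer = csv.DictWriter(out, fieldnames=list(record))
            writer.writeheader()
        writer.writerow(record)
    return len(rows)
```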
The platform provides a visualization dashboard for exploring incident trends and patterns. Charts are built with Plotly for interactivity, and maps are rendered with Leaflet.
- Incidents by Day of the Week:
  - Chart Type: Bar Chart
  - Description: Shows the frequency of incidents by day of the week to highlight weekly trends.
- Incidents by Time of Day:
  - Chart Type: Line Chart
  - Description: Visualizes incident frequency by hour, identifying peak times.
- Weather Conditions:
  - Chart Type: Bar Chart with Icons
  - Description: Displays incident frequency by WMO weather codes. Each weather type includes a descriptive icon from the WMO Code Table.
- Incident Hotspots Map:
  - Visualization Tool: Leaflet
  - Description: Geographical map showing the distribution of incidents across the city. Color intensity indicates incident density, with red as the highest frequency.
- Top Incident Types:
  - Chart Type: Horizontal Bar Chart
  - Description: Lists the top 20 incident types, with the option to adjust the number of types displayed.
- Side of Town Distribution:
  - Chart Type: Pie Chart
  - Description: Depicts the proportion of incidents across different areas (e.g., N, S, E) within Norman, Oklahoma.
- EMSSTAT Incidents:
  - Chart Type: Pie Chart
  - Description: Shows the proportion of incidents flagged as EMSSTAT (emergency medical service) versus all other incidents.
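For the weather chart, numeric WMO codes need human-readable labels before counting. The sketch below uses the WMO code descriptions listed in Open-Meteo's documentation; the dictionary, the "Unknown" fallback, and the name weather_counts are illustrative assumptions, and the dashboard's own labels and icons may differ.

```python
from collections import Counter

# WMO weather interpretation codes as described in Open-Meteo's documentation.
WMO_LABELS = {
    0: "Clear sky", 1: "Mainly clear", 2: "Partly cloudy", 3: "Overcast",
    45: "Fog", 48: "Depositing rime fog",
    51: "Light drizzle", 53: "Moderate drizzle", 55: "Dense drizzle",
    56: "Light freezing drizzle", 57: "Dense freezing drizzle",
    61: "Slight rain", 63: "Moderate rain", 65: "Heavy rain",
    66: "Light freezing rain", 67: "Heavy freezing rain",
    71: "Slight snowfall", 73: "Moderate snowfall", 75: "Heavy snowfall",
    77: "Snow grains",
    80: "Slight rain showers", 81: "Moderate rain showers", 82: "Violent rain showers",
    85: "Slight snow showers", 86: "Heavy snow showers",
    95: "Thunderstorm", 96: "Thunderstorm with slight hail", 99: "Thunderstorm with heavy hail",
}


def weather_counts(codes) -> Counter:
    """Count incidents per weather description, ready to feed a bar chart."""
    return Counter(WMO_LABELS.get(code, f"Unknown ({code})") for code in codes)
```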
After processing, users can download the augmented dataset from the results page. The CSV file includes all original fields plus the newly added fields, offering a rich dataset for further analysis.
- Geocoding Accuracy: Geocoding accuracy depends on the Google Maps API, and the API's rate limits may slow processing of large uploads.
- Weather Data Efficiency: Weather data retrieval can be slow, especially for large datasets. Caching mechanisms could improve efficiency.
- File Processing Limit: Only three files are processed per session to manage API calls and processing load.
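As one illustration of the caching idea mentioned above, weather lookups could be memoized per rounded coordinate and date, so that one request serves every incident nearby on the same day. Rounding to two decimal places and caching whole days are assumptions that trade a little spatial precision for fewer API calls; make_cached_lookup and the fetch_day callable (returning 24 hourly codes) are hypothetical.

```python
from functools import lru_cache


def make_cached_lookup(fetch_day, precision: int = 2):
    """Wrap fetch_day(lat, lon, day) -> 24 hourly codes with a per-day cache."""

    @lru_cache(maxsize=4096)
    def cached_day(lat_r: float, lon_r: float, day: str) -> tuple:
        return tuple(fetch_day(lat_r, lon_r, day))  # tuple keeps cached data immutable

    def lookup(lat: float, lon: float, day: str, hour: int):
        # Nearby incidents on the same day round to the same key and share one request.
        return cached_day(round(lat, precision), round(lon, precision), day)[hour]

    return lookup
```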
- Google Maps API: Used for geocoding locations.
- Open-Meteo API: Provides historical weather data by location.
- WMO Code Table: Used for interpreting weather conditions.
- Name: Vedansh Maheshwari
- LinkedIn: Vedansh Maheshwari