My attempts at using the R language to collect, save, and visualize daily police reports, which are listed on the Newport News Police Open Data page.
Using the daily-collection.R file, you can run the runDailyCollection function to automatically download all the daily Newport News Police Open Data reports and append them to CSV data sets. Note that you must pass the function the path to the directory where these repo files are stored.
```r
# Path to the directory where this repo's files are stored
repo <- "/Users/adam/Documents/Newport News Open Police Data"
source(file.path(repo, "daily-collection.R"))
runDailyCollection(repo)
```
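daily-collection.R handles the appending itself. Purely as an illustration of how an append step like this can avoid duplicate rows when it runs more than once a day, here is a minimal sketch. It assumes each report type has a key column (or combination of columns) that uniquely identifies a row. appendReports and rowKey are hypothetical names, not functions from the repo. For arrests, where one ID can carry several charges, the key would have to include the charge column as well.

```r
# Sketch only: appendReports() and rowKey() are hypothetical helpers,
# not part of daily-collection.R.

# Build one string per row from the key column(s), so composite keys work.
rowKey <- function(df, key) {
  do.call(paste, c(unname(as.list(df[key])), sep = "\r"))
}

# Append rows of newDf to csvPath, skipping rows whose key is already stored.
# Returns the number of rows actually written.
appendReports <- function(newDf, csvPath, key) {
  if (!file.exists(csvPath)) {
    write.csv(newDf, csvPath, row.names = FALSE)
    return(nrow(newDf))
  }
  # Read everything as character so keys compare consistently, and keep
  # column names exactly as written (no "Arrest ID" -> "Arrest.ID").
  old <- read.csv(csvPath, colClasses = "character", check.names = FALSE)
  isNew <- !(rowKey(newDf, key) %in% rowKey(old, key))
  # Reorder columns to match the existing file before appending.
  fresh <- newDf[isNew, names(old), drop = FALSE]
  if (nrow(fresh) > 0) {
    write.table(fresh, csvPath, sep = ",", append = TRUE,
                col.names = FALSE, row.names = FALSE, qmethod = "double")
  }
  nrow(fresh)
}
```

Reading the existing file as character keeps a numeric ID like 101 from failing to match the string "101" that comes back from disk.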
If you're savvy, you could use a scheduler like cron to run this every day shortly after midnight. This would let you build your own daily snapshot of police reports.
Using the plot-reports.R file, you can run the plotReports function to automatically plot all police activity stored in the CSV data sets on a Leaflet map. It saves the plot as an HTML file that you can view from a web server. I use MAMP on my Mac for this; on Windows, you can use WAMP or XAMPP to serve the HTML Leaflet plot. Note that you must pass the function the path to the directory where these repo files are stored.
You can view a sample of the Leaflet plot here.
```r
# Path to the directory where this repo's files are stored
repo <- "/Users/adam/Documents/Newport News Open Police Data"
source(file.path(repo, "plot-reports.R"))
plotReports(repo)
```
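For a sense of what a plotting step like this involves, here is a minimal sketch using the leaflet and htmlwidgets packages. It is not the code in plot-reports.R. The column names latitude, longitude, and description, and the helpers keepMappable and mapReports, are assumptions for illustration. Rows with missing or out-of-range coordinates are dropped first, since Leaflet can't place them.

```r
# Sketch only: keepMappable() and mapReports() are hypothetical, and the
# latitude/longitude/description column names are assumed.

# Keep rows whose coordinates parse as numbers within valid ranges.
keepMappable <- function(df) {
  lat <- suppressWarnings(as.numeric(df$latitude))
  lng <- suppressWarnings(as.numeric(df$longitude))
  ok <- !is.na(lat) & !is.na(lng) & abs(lat) <= 90 & abs(lng) <= 180
  out <- df[ok, , drop = FALSE]
  out$latitude <- lat[ok]
  out$longitude <- lng[ok]
  out
}

# Plot each report as a circle marker and save the map as an HTML file.
# Requires the leaflet and htmlwidgets packages.
mapReports <- function(df, outFile) {
  pts <- keepMappable(df)
  m <- leaflet::leaflet()
  m <- leaflet::addTiles(m)
  m <- leaflet::addCircleMarkers(m, lng = pts$longitude, lat = pts$latitude,
                                 popup = pts$description, radius = 4)
  htmlwidgets::saveWidget(m, outFile, selfcontained = TRUE)
  invisible(outFile)
}
```

Something like mapReports(reports, file.path(repo, "index.html")) would then write the map next to the repo files, ready for MAMP or a similar server.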
I've included some scripts for automating the entire process of downloading and cleaning the daily CSVs, uploading them to a Google Cloud Storage account (via API keys), and creating the HTML Leaflet plot.
Modify them for your needs, but here's how they work for me:
- cron-job.R: This R script automates the jobs. You must change the repo variable to point to your repo directory.
- cronScript: This bash shell script calls the cron-job.R R script to run the jobs. It then pushes the updated index.html Leaflet plot to your fork of this repo. I call this script daily, as you'd guess, via cron on my Mac.
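Here is a minimal sketch of a cron-friendly driver script. Unlike the cron-job.R described above, which expects you to edit its repo variable, this version assumes the repo path is passed as the first command-line argument; runJobs is a hypothetical name. It checks that both scripts exist and catches errors, so a failed run exits with a non-zero status that cron or a calling shell script can detect.

```r
# Sketch only: runJobs() is hypothetical. Unlike the repo's cron-job.R, the
# repo path comes from the command line instead of an edited variable.

# Source both scripts and run the collection and plotting jobs.
# Returns TRUE on success and FALSE (with a message) on any failure.
runJobs <- function(repo) {
  scripts <- file.path(repo, c("daily-collection.R", "plot-reports.R"))
  absent <- scripts[!file.exists(scripts)]
  if (length(absent) > 0) {
    message("Missing script(s): ", paste(absent, collapse = ", "))
    return(FALSE)
  }
  tryCatch({
    for (s in scripts) source(s)  # defines runDailyCollection() and plotReports()
    runDailyCollection(repo)
    plotReports(repo)
    TRUE
  }, error = function(e) {
    message("Job failed: ", conditionMessage(e))
    FALSE
  })
}

# When run as `Rscript cron-job.R "/path/to/repo"`, exit non-zero on failure.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) >= 1) {
  quit(save = "no", status = if (runJobs(args[[1]])) 0 else 1)
}
```

With this approach, the shell script only has to call Rscript with the quoted repo path and can check the exit status before pushing index.html.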
Just one caveat...
Before running the Leaflet plot via R on the command line, I suggest you also install Pandoc. On a Mac, you can install it via Homebrew. Pandoc is responsible for combining and encoding all the Leaflet HTML and JavaScript assets into the single index.html file. Pandoc ships with RStudio and runs automatically via the GUI, but that copy isn't available when you run R headless (for example, from cron).
If you don't install Pandoc, you'll still get the index.html file, but you'll also get an index_files subfolder in the repo containing all the assets needed to run Leaflet, which is quite a lot of files.
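If a script might run on machines with or without Pandoc, one option is to check for it before saving and fall back to the multi-file output instead of failing. This sketch assumes, as described above, that a single self-contained index.html needs Pandoc; pandocOnPath and saveMap are hypothetical names. It only looks for a pandoc executable on the PATH using base R's Sys.which; if you already use rmarkdown, rmarkdown::pandoc_available() is an alternative check.

```r
# Sketch only: pandocOnPath() and saveMap() are hypothetical helpers.

# TRUE if a pandoc executable can be found on the PATH.
pandocOnPath <- function() {
  unname(nzchar(Sys.which("pandoc")))
}

# Save a Leaflet (or other htmlwidgets) widget. With Pandoc, request a single
# self-contained file; without it, write the HTML plus a "<name>_files"
# folder of assets (e.g. index_files for index.html).
saveMap <- function(widget, file) {
  selfContained <- pandocOnPath()
  if (!selfContained) {
    message("Pandoc not found; assets will be written to a separate folder.")
  }
  htmlwidgets::saveWidget(widget, file, selfcontained = selfContained)
  invisible(selfContained)
}
```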
Public versions of the appended daily reports are available as CSV files on Google Cloud Storage, which I maintain:
- Accident Reports
- Arrest Reports
- Juvenile Reports
- Offenses Reports
- Field Contacts Reports
- Theft from Vehicle Reports
I still need to incorporate these daily arrest and offense reports, since they list additional charges that don't show up in the CSV data sets (see Things I've learned below):
Daily Offense Reports:
Daily Arrest Reports:
Things I've learned:
- Arrest IDs can have multiple charges, which are all shown on the Daily Arrest Reports. In the Daily Arrest Report (24 hours) on the Newport News Police Open Data page, only the first charge is listed.
- The times of arrest on the Daily Arrest Reports do not necessarily match the times listed in the Daily Arrest Report (24 hours). For my purposes, the earliest times will be kept.
- The Daily Offense Reports show the Beat number, whereas the Daily Offenses report on the Newport News Police Open Data page does not.
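The first two observations suggest one way to reconcile the arrest sources: collapse the rows to one per arrest ID, keeping the earliest time and every distinct charge. The sketch below assumes columns named arrest_id, time (formatted like 2018-03-01 14:05), and charge; collapseArrests is a hypothetical name.

```r
# Sketch only: collapseArrests() is hypothetical; column names are assumed.
collapseArrests <- function(df) {
  # Parse times so sorting is chronological rather than alphabetical.
  # UTC is used for simplicity; ordering is unaffected as long as every
  # row uses the same zone.
  df$time <- as.POSIXct(df$time, format = "%Y-%m-%d %H:%M", tz = "UTC")
  df <- df[order(df$arrest_id, df$time), , drop = FALSE]

  # After sorting, the first row for each arrest ID holds its earliest time.
  out <- df[!duplicated(df$arrest_id), c("arrest_id", "time"), drop = FALSE]

  # Combine every distinct charge for an arrest ID into one string.
  charges <- tapply(df$charge, df$arrest_id,
                    function(x) paste(unique(x), collapse = "; "))
  out$charges <- as.vector(charges[as.character(out$arrest_id)])
  rownames(out) <- NULL
  out
}
```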
- Important: check for and handle blank reports
- Include Precinct and Beat in data sets
- Replace empty cells with NA values
- Try to make this more DRY: put config variables like file and column names into vectors or lists that can be passed to granular worker functions
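Several of these items can be sketched together in base R. A config list describes each report, so one worker function can handle every report type without hard-coded names. A reader treats empty cells as NA and returns NULL for blank reports. The report names, file names, and key columns below are placeholders. reportConfig, readReport, and readAllReports are hypothetical names.

```r
# Sketch only: names and file names are placeholders, not the repo's config.
reportConfig <- list(
  arrests   = list(file = "arrests.csv",   key = "arrest_id"),
  accidents = list(file = "accidents.csv", key = "report_id")
)

# Read one report; return NULL if the file is missing, empty, or header-only.
# Empty cells become NA via na.strings.
readReport <- function(path) {
  if (!file.exists(path) || file.size(path) == 0) return(NULL)
  df <- tryCatch(
    read.csv(path, na.strings = c("", "NA"), strip.white = TRUE,
             stringsAsFactors = FALSE),
    error = function(e) NULL
  )
  if (is.null(df) || nrow(df) == 0) return(NULL)
  df
}

# Apply the same worker to every configured report; blank reports are dropped.
readAllReports <- function(config, dir) {
  dfs <- lapply(config, function(cfg) readReport(file.path(dir, cfg$file)))
  Filter(Negate(is.null), dfs)
}
```

Adding a report type then means adding one entry to the config list rather than copying a function.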