My attempts at using the R language to collect, save, and visualize daily police reports, which are listed on the Newport News Police Open Data page.

adamcarrier/Newport-News-Open-Police-Data

Newport-News-Open-Police-Data

My attempts at using the R language to collect, save, and visualize daily police reports, which are listed on the Newport News Police Open Data page.

Usage

Daily report collection

Using the daily-collection.R file, you can run the runDailyCollection function to automatically download all the daily Newport News Police Open Data reports and append them to the CSV data sets. Note that you must pass the function the working directory where these repo files are stored.

# Directory where you cloned this repo
repo <- "/Users/adam/Documents/Newport News Open Police Data"
source(file.path(repo, "daily-collection.R"))
runDailyCollection(repo)
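To illustrate the "append" step, here is a minimal sketch of adding one day's rows to a CSV data set without duplicating rows that are already stored. The function name appendToDataset and the columns id and offense are hypothetical, the sample rows are made up, and this is not necessarily how runDailyCollection does it.

```r
# Hypothetical helper: append new rows to a CSV data set, dropping exact duplicates.
appendToDataset <- function(newRows, datasetPath) {
  if (file.exists(datasetPath)) {
    existing <- read.csv(datasetPath, stringsAsFactors = FALSE)
    combined <- rbind(existing, newRows)
  } else {
    combined <- newRows
  }
  # Re-running the same day should not store the same report twice
  combined <- unique(combined)
  write.csv(combined, datasetPath, row.names = FALSE)
  invisible(combined)
}

# Made-up sample rows for two consecutive days (column names are assumptions)
day1 <- data.frame(id = c("A1", "A2"),
                   offense = c("Larceny", "Vandalism"),
                   stringsAsFactors = FALSE)
day2 <- data.frame(id = c("A2", "A3"),
                   offense = c("Vandalism", "Trespassing"),
                   stringsAsFactors = FALSE)

datasetPath <- tempfile(fileext = ".csv")
appendToDataset(day1, datasetPath)
result <- appendToDataset(day2, datasetPath)
print(result)
```

Row A2 appears in both days but is stored only once, so the data set ends up with three rows.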

If you're savvy, you could use a scheduler like cron to run this daily after midnight. This would allow you to create your own daily snapshot of police reports.
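As a rough sketch of what such a cron entry could look like, the snippet below builds a crontab line that runs an R script every day at 00:30. The helper buildCronLine and the wrapper script name run-daily.R are hypothetical, and it assumes Rscript is on the PATH that cron uses, which is often not the case by default.

```r
# Hypothetical helper: build a crontab line that runs an R script once a day.
buildCronLine <- function(script, minute = 30, hour = 0) {
  stopifnot(minute >= 0, minute <= 59, hour >= 0, hour <= 23)
  # Fields are: minute hour day-of-month month day-of-week command.
  # shQuote() protects paths containing spaces, like this repo's directory.
  sprintf("%d %d * * * Rscript %s",
          as.integer(minute), as.integer(hour),
          shQuote(script, type = "sh"))
}

# run-daily.R is an assumed wrapper script containing the three lines above
cronLine <- buildCronLine(
  "/Users/adam/Documents/Newport News Open Police Data/run-daily.R"
)
cat(cronLine, "\n")
```

You would paste the printed line into your crontab (crontab -e). If cron can't find Rscript, use its full path in the command instead.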

Plotting the reports

Using the plot-reports.R file, you can run the plotReports function to automatically plot all police activity stored in the CSV data sets on a Leaflet map. It saves the plot in an HTML file that you can view from a web server. I use MAMP on my Mac for this. On Windows, you can use WAMP or XAMPP to serve the HTML Leaflet plot. Note that you must pass the function the working directory where these repo files are stored.

You can view a sample of the Leaflet plot here.

# Directory where you cloned this repo
repo <- "/Users/adam/Documents/Newport News Open Police Data"
source(file.path(repo, "plot-reports.R"))
plotReports(repo)
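For readers new to Leaflet in R, here is a minimal sketch of plotting a few points and saving them to an HTML file with the leaflet and htmlwidgets packages. The column names lat, lon, and offense and the sample rows are made up for illustration, plotSample is a hypothetical name, and the real plotReports function may work differently.

```r
# Made-up sample rows; the column names lat, lon, and offense are assumptions.
reports <- data.frame(
  lat = c(37.0871, 36.9788),
  lon = c(-76.4730, -76.4280),
  offense = c("Larceny", "Vandalism"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: draw one marker per report and save the map as HTML.
plotSample <- function(df, outFile) {
  map <- leaflet::addCircleMarkers(
    leaflet::addTiles(leaflet::leaflet(df)),
    lng = ~lon, lat = ~lat, popup = ~offense
  )
  # selfcontained = FALSE avoids needing Pandoc, at the cost of an
  # extra folder of asset files next to the HTML file.
  htmlwidgets::saveWidget(map, outFile, selfcontained = FALSE)
  invisible(outFile)
}

# Only run the plot if both packages are installed
if (requireNamespace("leaflet", quietly = TRUE) &&
    requireNamespace("htmlwidgets", quietly = TRUE)) {
  plotSample(reports, file.path(tempdir(), "index.html"))
}
```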

Automatic data collection

I've included some scripts for automating the entire process of downloading and cleaning the daily CSVs, uploading them to a Google Cloud Storage account (via API keys), and creating the HTML Leaflet plot.

Modify them for your needs, but here's how they work for me:

  • cron-job.R: This R script automates the jobs. You must change the repo variable to point to your repo directory.
  • cronScript: This bash shell script calls the cron-job.R script to run the jobs. It then pushes the updated index.html Leaflet plot to your fork of this repo. I call this script daily, as you'd guess, via cron on my Mac.

Just one caveat...

Before running the Leaflet plot via R from the command line, I suggest you also install Pandoc. On a Mac, you can do this via Homebrew. Pandoc is responsible for combining and encoding all the Leaflet HTML and JavaScript assets into the single index.html file. Pandoc is included with RStudio's binaries and runs automatically via the GUI, but it's unavailable if you're running R headless.

If you don't install Pandoc, you'll still get the index.html file, but you'll also get a subfolder index_files in the repo with all the assets needed to run Leaflet; it's quite a lot of files.
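A quick way to see which case you're in is to check whether a pandoc executable is visible from your R session. This is a minimal sketch using only base R; pandocOnPath is a hypothetical helper name, and it only looks at the PATH, so it won't detect the copy bundled with RStudio.

```r
# Hypothetical helper: TRUE if a pandoc executable is found on the PATH.
pandocOnPath <- function() {
  unname(nzchar(Sys.which("pandoc")))
}

selfContained <- pandocOnPath()
if (selfContained) {
  message("Pandoc found: a single self-contained index.html should be possible.")
} else {
  message("Pandoc not found: expect an index_files folder next to index.html.")
}
```

Running this from the same shell that cron would use is a reasonable check before scheduling the plot job.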

Public Data Sets

Public versions of the appended daily reports are available as CSV files on Google Cloud Storage, which I maintain:

Roadmap

Need to incorporate these daily arrest and offense reports, since they list additional charges that don't show up in the CSV data sets (see Things I've learned below):

Daily Offense Reports:

Daily Arrest Reports:

Things I've learned

  • Arrest IDs can have multiple charges, which are all shown on the Daily Arrest Reports. In the Daily Arrest Report (24 hours) on the Newport News Police Open Data page, only the first charge is listed.

  • The times of arrest on the Daily Arrest Reports do not necessarily match the times listed on the Daily Arrest Report (24 hours). For my purposes, the earliest times will be kept.

  • The Daily Offense Reports show the Beat number, whereas the Daily Offenses report on the Newport News Police Open Data page does not.
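The "keep the earliest time" rule from the list above can be sketched as follows, assuming the arrest times from both sources have been stacked into one data frame. The column names arrest_id, arrest_time, and source are hypothetical, and the rows are made-up sample values.

```r
# Made-up arrest times from two sources; column names are assumptions.
times <- data.frame(
  arrest_id = c("A1", "A2", "A1", "A2"),
  arrest_time = as.POSIXct(
    c("2016-01-01 02:15:00", "2016-01-01 03:00:00",
      "2016-01-01 01:50:00", "2016-01-01 03:10:00"),
    tz = "UTC"
  ),
  source = c("csv", "csv", "report", "report"),
  stringsAsFactors = FALSE
)

# Sort by arrest ID, then by time, so the earliest row for each ID comes first
sorted <- times[order(times$arrest_id, times$arrest_time), ]

# Keep only the first (earliest) row per arrest ID
earliest <- sorted[!duplicated(sorted$arrest_id), ]
print(earliest)
```

Note that this keeps one row per arrest ID, so any additional charges would need to be stored separately before collapsing the rows.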

More to do

  • Important: check for and handle blank reports
  • Include Precinct and Beat in data sets
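One possible way to handle blank reports is to treat a missing, empty, or header-only CSV as "no data" and skip it. This is a minimal sketch, not code from this repo; readReport is a hypothetical helper name.

```r
# Hypothetical helper: read a report CSV, returning NULL if it has no data rows.
readReport <- function(path) {
  # Missing or zero-byte file: nothing to read
  if (!file.exists(path) || file.size(path) == 0) return(NULL)
  # read.csv() can still fail on malformed input, so trap errors too
  df <- tryCatch(
    read.csv(path, stringsAsFactors = FALSE),
    error = function(e) NULL
  )
  # A header-only file parses fine but has zero rows
  if (is.null(df) || nrow(df) == 0) return(NULL)
  df
}

# Example: a header-only report is treated as blank
blankFile <- tempfile(fileext = ".csv")
writeLines("id,offense", blankFile)
if (is.null(readReport(blankFile))) {
  message("Blank report, skipping")
}
```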

Tidy data tasks

  • Replace empty cells with NA values
  • Try to make this more DRY: put config variables like file and column names into lists that can be passed to granular worker functions
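The first task could be sketched like this in base R. The helper name blankToNA and the sample columns are hypothetical, and it only touches character columns.

```r
# Hypothetical helper: turn empty or whitespace-only strings into NA.
blankToNA <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) {
      # Guard with !is.na() so existing NA values don't break the indexing
      col[!is.na(col) & trimws(col) == ""] <- NA
    }
    col
  })
  df
}

# Made-up sample data with blank cells (column names are assumptions)
raw <- data.frame(
  id = c("A1", "A2", "A3"),
  beat = c("12", "", "  "),
  stringsAsFactors = FALSE
)
clean <- blankToNA(raw)
print(clean)
```

If the blanks come straight from a CSV, passing na.strings = c("", "NA") to read.csv is another option that handles this at read time.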
