- Codeholics Edition | Mother Jones Mass Shooter Database 1982-2023
- Changes To Original Data
- Dependencies
- Executing The Script
- Output
- Check out the Statistics
- Links
This project was initiated in response to the lack of consensus among media companies and government agencies regarding statistical reporting on mass shootings. The difficulty in sharing statistics arises from the controversial nature of defining what constitutes a mass shooting, compounded by the fact that record sources are not being shared.
To address this issue, extensive research was conducted to settle on a definition of a mass shooting. The National Center for Victims of Crime's definition, three or more victims (excluding other crimes that include gun violence), was identified as the most appropriate. Before 2013, the threshold was four victims.
Despite this authoritative definition, it remains unclear why other government agencies or media outlets have not adhered to it, opting instead to create their own definitions.
For this reason, the Mother Jones dataset, which adheres to the National Center for Victims of Crime's definition, was selected as the primary data source. Only media reports that meet this definition are included in the dataset, which is updated regularly and publicly available to download as a CSV.
However, the original Mother Jones dataset had a few issues that were addressed before being uploaded to Data.World. These changes were previously done manually, requiring significant time and attention to detail to ensure their accuracy. Our goal is to make this valuable dataset more useful for other data scientists and preserve its integrity.
The original dataset used in this project had some inconsistencies, including duplicate headers and capitalization errors. On closer inspection, some columns were also formatted inconsistently, and some values were missing even though they were available in the cited sources. One such column, `weapon_type`, required multiple changes to give it a consistent data structure.
To address these issues, we have automated the process using PowerShell. This allows us to ensure consistency in the changes made to the dataset while being transparent about what was modified. The PowerShell script includes the following steps:
- Downloading the latest public copy of the Mother Jones dataset.
- Renaming duplicate headers.
- Fixing columns as a whole (trimming, splitting, and forcing case).
- Updating data to include any record updates.
- Adding the Mother Jones Dataset and Codeholics version to a SQLite database (with record IDs created starting with the oldest record).
- Exporting CSVs.
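The header and column cleanup steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's PowerShell code: the function names `dedupe_headers`, `clean_row`, and `clean_csv`, the `_2` suffix scheme, and the sample columns are all assumptions made for the example.

```python
import csv
import io


def dedupe_headers(headers):
    """Rename repeated header names by appending _2, _3, ... (hypothetical scheme)."""
    seen = {}
    result = []
    for name in headers:
        key = name.strip()
        seen[key] = seen.get(key, 0) + 1
        result.append(key if seen[key] == 1 else f"{key}_{seen[key]}")
    return result


def clean_row(row, title_case_columns=()):
    """Trim whitespace in every field and force title case on the chosen columns."""
    cleaned = {}
    for column, value in row.items():
        value = value.strip()
        if column in title_case_columns:
            value = value.title()
        cleaned[column] = value
    return cleaned


def clean_csv(text, title_case_columns=()):
    """Parse CSV text, fix duplicate headers, and clean every row."""
    reader = csv.reader(io.StringIO(text))
    headers = dedupe_headers(next(reader))
    return [
        clean_row(dict(zip(headers, fields)), title_case_columns)
        for fields in reader
    ]


# Made-up sample data, not rows from the real dataset.
sample = "case,location,location\nExample Case , example city ,Workplace\n"
rows = clean_csv(sample, title_case_columns=("location",))
print(rows)
```

Renaming the headers before building row dictionaries matters here: with two identical header names, a dictionary keyed by header would silently drop one of the columns.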
We no longer manually update the spreadsheet, as the modifications are now automatically executed by the script. The changes made to the data are clearly documented in the PowerShell script, SQLite database, and a new column on the final CSV output, ensuring full transparency. Once the changes have been made, the updated dataset is uploaded to Data.World, where it is available for use by other data scientists.
It is not required, but I found it best to review the data in the SQLite file using DB Browser for SQLite.
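If you prefer a script over a GUI, a similar quick review can be done with Python's built-in `sqlite3` module. The sketch below is only an illustration: the helper name `summarize_tables` is hypothetical, and the table names are read from the file itself rather than assumed.

```python
import sqlite3
from contextlib import closing


def summarize_tables(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file."""
    with closing(sqlite3.connect(db_path)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
            # Skip SQLite's internal bookkeeping tables.
            if not row[0].startswith("sqlite_")
        ]
        counts = {}
        for name in names:
            # Quote the identifier so names with spaces or dashes still work.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts


# Example call (the path is an assumption based on the Export folder):
# print(summarize_tables("Export/MassShooterDatabase.sqlite"))
```

Note that `sqlite3.connect` creates an empty database when the path does not exist, so check the path first if an empty result is unexpected.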
- Download the source code from the project: `https://github.com/Codeholics/US-Mass-Shootings.git`
- Update the variable `$CPSScriptRoot` in `start.ps1` to be the path to the repo project folder you created in the first step.
- Execute `start.ps1`.
All artifacts produced by running the script can be found in the `/Export` folder.
Path | Purpose |
---|---|
Mother Jones Raw.csv | The Mother Jones dataset in its original, unmodified form. The CSV file contains duplicate headers, which makes it unsuitable for direct use in this project. |
Mother Jones - Mass Shootings Database 1982-2023.csv | The Mother Jones dataset with the duplicated header from `Mother Jones Raw.csv` corrected, so that it can be used in this project. |
Codeholics - Mass Shootings Database 1982-2023.csv | Final report after the data changes made by `start.ps1`. |
MassShooterDatabase.sqlite | SQLite database containing both the original Mother Jones dataset and the Codeholics Edition. Sample statistics queries are provided in the `/SQL` folder as a starting point for statistical analysis. |
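
As a rough illustration of how the exported CSV might be used, the Python sketch below counts the values in one column. The helper name `count_values` is hypothetical, the UTF-8 encoding is an assumption, and `weapon_type` is used only because it is the one column named in this document.

```python
import csv
from collections import Counter


def count_values(csv_path, column):
    """Count how often each value appears in one column of a CSV file."""
    # UTF-8 is an assumption; adjust the encoding if the export differs.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"column {column!r} not found in {csv_path}")
        return Counter(row[column] for row in reader)


# Example call (path assumed from the table above):
# counts = count_values(
#     "Export/Codeholics - Mass Shootings Database 1982-2023.csv", "weapon_type"
# )
# print(counts.most_common(5))
```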
Check out the Statistics
This is an auto-generated Markdown document that displays the results of our SQL queries and a summary of the statistics.