fixes #6

thegargiulian · Mar 21, 2024 · 6869d42
1 parent 9d21219
Showing 9 changed files with 660,541 additions and 648,493 deletions.
diff --git a/README.md b/README.md
@@ -3,9 +3,9 @@
 [doi]: https://doi.org/10.17605/OSF.IO/U8DC3
 
 
-# Monthly municipal-level homicide rates in Mexico from January 2000 to December 2021
+# Monthly municipal-level homicide rates in Mexico from January 2000 to December 2022
 
-Data on crude monthly municipal-level homicide rates is available in `mexico-muni-month-homicide-rates-2000-2021.csv`; state-level aggregations are available in `mexico-state-month-homicide-rates-2000-2021.csv`. Note that both files use `|` as the separator.
+Data on crude monthly municipal-level homicide rates is available in `mexico-muni-month-homicide-rates-2000-2022.csv`; state-level aggregations are available in `mexico-state-month-homicide-rates-2000-2022.csv`. Note that both files use `|` as the separator.
 
 If you use `R` you can use `readr` package to load the file and specify the separator with `readr::read_delim("PATH_TO_FILE", delim = "|")` 
 
@@ -18,24 +18,24 @@ To replicate the results, first run the `import` sub-task within the `census-dat
 - iter_00_cpv2010.csv, retreived from https://www.inegi.org.mx/programas/ccpv/2010/#Datos_abiertos (download data for "Estados Unidos Mexicanos")
 - conjunto_de_datos_iter_00CSV20.csv, retreived from https://www.inegi.org.mx/programas/ccpv/2020/#Datos_abiertos (download data for "Estados Unidos Mexicanos")
 
-Next run the `interpolate` sub-task within the `census-task` using the `Makefile` in the `census-data/interpolate` directory. This task uses the population counts from the `import` sub-task to linearly interpolate mid-year (1 July) population counts for each municipality from 2000-2021.
+Next run the `interpolate` sub-task within the `census-task` using the `Makefile` in the `census-data/interpolate` directory. This task uses the population counts from the `import` sub-task to linearly interpolate mid-year (1 July) population counts for each municipality from 2000-2022.
 
-After running both sub-tasks in the `census-data` task, run the sub-tasks in the `deaths-data` directory. Again, this task begins with an `import` sub-task, which you can run using the `Makefile`. This task reads in death certificate files published in `.dbf` format by INEGI and writes their contents to `.csv` files. This task expects death certificate files from 2000-2021 in a sub-directory called `death-certificates` within the top-level `data` directory. These files can be downloaded from https://www.inegi.org.mx/programas/mortalidad/#Microdatos.
+After running both sub-tasks in the `census-data` task, run the sub-tasks in the `deaths-data` directory. Again, this task begins with an `import` sub-task, which you can run using the `Makefile`. This task reads in death certificate files published in `.dbf` format by INEGI and writes their contents to `.csv` files. This task expects death certificate files from 2000-2022 in a sub-directory called `death-certificates` within the top-level `data` directory. These files can be downloaded from https://www.inegi.org.mx/programas/mortalidad/#Microdatos.
 
-Next, run the `homicide-counts` sub-task using the `Makefile`. This task uses the death certificate files imported in the `deaths-data/import` task to generate counts of homicide deaths in each municipality in each month from January 2000-December 2021. The cause of death classification file, found in the `hand` subdirectory follows the cause of death classification scheme used by [Elo, Beltrán-Sánchez and Macinko (2014)](https://pubmed.ncbi.nlm.nih.gov/24554793/). Note that deaths that occurred outside of Mexico and deaths that were missing cause of death, country of occurrence, or municipality of occurrence were excluded from these calculations. We also opted to use data from the location where the death occurred rather than the location where the individual was from because this information was more complete for homicides. One day we might impute this information and recalculate the counts accordingly.
+Next, run the `homicide-counts` sub-task using the `Makefile`. This task uses the death certificate files imported in the `deaths-data/import` task to generate counts of homicide deaths in each municipality in each month from January 2000-December 2022. The cause of death classification file, found in the `hand` subdirectory follows the cause of death classification scheme used by [Elo, Beltrán-Sánchez and Macinko (2014)](https://pubmed.ncbi.nlm.nih.gov/24554793/). Note that deaths that occurred outside of Mexico and deaths that were missing cause of death, country of occurrence, or municipality of occurrence were excluded from these calculations. We also opted to use data from the location where the death occurred rather than the location where the individual was from because this information was more complete for homicides. One day we might impute this information and recalculate the counts accordingly.
 
-Finally, run the top-level `homicide-rates` task to calculate the monthly municipal-level crude homicide rates for January 2000-December 2021.
+Finally, run the top-level `homicide-rates` task to calculate the monthly municipal-level crude homicide rates for January 2000-December 2022.
 
 If you use this data please use the BibTeX entry below or see the [OSF repository](https://osf.io/u8dc3/) for other citation formats:
 
 ```
 @misc{Gargiulo_Aburto_Floridi_2023,
-  title={Monthly municipal-level homicide rates in Mexico (January 2000–December 2021)},
+  title={Monthly municipal-level homicide rates in Mexico (January 2000–December 2022)},
   url={osf.io/u8dc3},
   DOI={10.17605/OSF.IO/U8DC3},
   publisher={OSF},
   author={Gargiulo, Maria and Aburto, José Manuel and Floridi, Ginevra},
-  year={2023},
-  month={Feb}
+  year={2024},
+  month={March}
 }
 ```
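Here is a minimal base-R sketch for loading one of the pipe-separated files without `readr`. A small toy file stands in for the real output. The column names (`ent_mun`, `year`, `month`, `homicides`) are hypothetical because the output schema is not shown in this diff. The `colClasses` argument only matters if municipality codes are zero-padded, since base R would otherwise parse them as integers.

```r
# Toy pipe-separated file standing in for the real output; columns are hypothetical
path <- tempfile(fileext = ".csv")
writeLines(c("ent_mun|year|month|homicides",
             "01001|2022|12|3",
             "01002|2022|12|0"), path)

# Base-R alternative to readr::read_delim(path, delim = "|");
# the named colClasses entry keeps codes such as "01001" as text
muni <- read.delim(path, sep = "|", colClasses = c(ent_mun = "character"))
str(muni)
```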
diff --git a/code/census-data/interpolate/src/interpolate.R b/code/census-data/interpolate/src/interpolate.R
@@ -33,7 +33,7 @@ interpolation_wrapper <- function(ent) {
 
     # use 1 July as mid-year date
     mid_years_1 <- seq(ymd(20000701), ymd(20090701), by = "year")
-    mid_years_2 <- seq(ymd(20100701), ymd(20210701), by = "year") # apply same slope through 2021
+    mid_years_2 <- seq(ymd(20100701), ymd(20220701), by = "year") # apply same slope through 2022
 
     estimates_1 <- map_dfr(.x = mid_years_1,
                            ~interpolate_population(pop_1 = ent$total_pop_2000,
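The body of `interpolate_population()` is not part of this diff. The following is only an illustration of linear interpolation to 1 July, where the same slope is carried past the last anchor date as the comment on `mid_years_2` describes. `interpolate_linear()` is a hypothetical helper. The anchor dates and population values are invented inputs, not census figures or the dates the script actually uses.

```r
# Hypothetical stand-in for interpolate_population(); the real signature and
# census reference dates used by interpolate.R are not shown in this diff
interpolate_linear <- function(pop_1, pop_2, date_1, date_2, target_dates) {
  # Slope in persons per day between the two anchor counts
  slope <- (pop_2 - pop_1) / as.numeric(date_2 - date_1)
  # Target dates after date_2 reuse the same slope (linear extrapolation)
  pop_1 + slope * as.numeric(target_dates - date_1)
}

# Illustrative inputs only, not real census figures
mid_years_2 <- seq(as.Date("2010-07-01"), as.Date("2022-07-01"), by = "year")
estimates <- data.frame(
  est_date = mid_years_2,
  population = interpolate_linear(pop_1 = 1000, pop_2 = 1100,
                                  date_1 = as.Date("2010-07-01"),
                                  date_2 = as.Date("2020-07-01"),
                                  target_dates = mid_years_2)
)
print(estimates)
```

Because the slope is fixed, the 2021 and 2022 estimates continue the 2010 to 2020 trend rather than reflecting any newer count.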

diff --git a/code/deaths-data/homicide-counts/Makefile b/code/deaths-data/homicide-counts/Makefile
@@ -9,12 +9,12 @@ DEATHS := $(wildcard $(HERE)/code/deaths-data/import/output/DEFUN*.csv)
 
 .PHONY: all clean
 
-all: output/muni-month-homicides-2000-2021.csv
+all: output/muni-month-homicides-2000-2022.csv
 
 clean:
 		-rm output/*
 
-output/muni-month-homicides-2000-2021.csv: \
+output/muni-month-homicides-2000-2022.csv: \
 		src/calculate.R \
 		$(DEATHS)
 	-mkdir output

diff --git a/code/deaths-data/homicide-counts/src/calculate.R b/code/deaths-data/homicide-counts/src/calculate.R
@@ -45,15 +45,15 @@ homicide_codes <- cod_mapping %>%
     filter(cod_group == "Homicides")
 
 # collect all death certificate file paths
-years <- 2000:2021
+years <- 2000:2022
 input_files <- glue("{args$import_stub}/DEFUN{years}.csv")
 
 # read in and concatenate records from all files
 deaths_data <- map_dfr(input_files, read_file)
 
 homicide_deaths <- deaths_data %>%
     # filter out deaths that occurred outside of time period or are missing year information
-    filter(between(year, 2000, 2021)) %>%
+    filter(between(year, 2000, 2022)) %>%
     # filter out deaths missing month information
     filter(month != 99) %>%
     # filter out deaths that occurred outside of Mexico or are missing state info

diff --git a/code/homicide-rates/Makefile b/code/homicide-rates/Makefile
@@ -8,29 +8,29 @@ HERE := $(shell git rev-parse --show-toplevel)
 
 .PHONY: all clean
 
-all: output/mexico-muni-month-homicide-rates-2000-2021.csv \
-	 output/mexico-state-month-homicide-rates-2000-2021.csv
+all: output/mexico-muni-month-homicide-rates-2000-2022.csv \
+	 output/mexico-state-month-homicide-rates-2000-2022.csv
 
 clean:
 		-rm output/*
 
-output/mexico-muni-month-homicide-rates-2000-2021.csv: \
+output/mexico-muni-month-homicide-rates-2000-2022.csv: \
 		src/muni-calculate.R \
-		$(HERE)/code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2021.csv \
+		$(HERE)/code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2022.csv \
 		$(HERE)/code/census-data/interpolate/output/population-estimates.csv
 	-mkdir output
 	Rscript --vanilla $< \
-			--homicides_data=$(HERE)/code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2021.csv \
+			--homicides_data=$(HERE)/code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2022.csv \
 			--population_estimates=$(HERE)/code/census-data/interpolate/output/population-estimates.csv \
 			--output=$@
 
-output/mexico-state-month-homicide-rates-2000-2021.csv: \
+output/mexico-state-month-homicide-rates-2000-2022.csv: \
 		src/state-calculate.R \
-		$(HERE)/code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2021.csv \
+		$(HERE)/code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2022.csv \
 		$(HERE)/code/census-data/interpolate/output/population-estimates.csv
 	-mkdir output
 	Rscript --vanilla $< \
-			--homicides_data=$(HERE)/code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2021.csv \
+			--homicides_data=$(HERE)/code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2022.csv \
 			--population_estimates=$(HERE)/code/census-data/interpolate/output/population-estimates.csv \
 			--output=$@
 

diff --git a/code/homicide-rates/src/muni-calculate.R b/code/homicide-rates/src/muni-calculate.R
@@ -12,11 +12,11 @@ pacman::p_load(argparse, here, dplyr, readr, tidyr, lubridate)
 
 parser <- ArgumentParser()
 parser$add_argument("--homicides_data",
-                    default = here::here("code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2021.csv"))
+                    default = here::here("code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2022.csv"))
 parser$add_argument("--population_estimates",
                     default = here::here("code/census-data/interpolate/output/population-estimates.csv"))
 parser$add_argument("--output",
-                    default = "output/mexico-muni-month-homicide-rates-2000-2021.csv")
+                    default = "output/mexico-muni-month-homicide-rates-2000-2022.csv")
 
 args <- parser$parse_args()
 
@@ -45,9 +45,9 @@ population <- read_delim(args$population_estimates, delim = "|") %>%
     select(-month, -day, -est_date)
 
 # start by creating a grid with all municipalities and months between January
-# 2000 and December 2021
+# 2000 and December 2022
 munis <- union(homicides$ent_mun, population$ent_mun)
-months <- seq(ym("200001"), ym("202112"), by = "month")
+months <- seq(ym("200001"), ym("202212"), by = "month")
 
 homicide_rates <- crossing(munis, months) %>% # expand grid
     mutate(year = as.numeric(year(months)),
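The sketch below shows how the extended month range feeds a crude rate. Every municipality is crossed with all 276 months from January 2000 to December 2022. Homicide counts are then joined by municipality and month, and annual mid-year population is joined by year, which mirrors how `population` drops `month`, `day`, and `est_date` above. The sketch uses base R's `expand.grid()` and `merge()` in place of `tidyr::crossing()` and `dplyr` joins. The column names, the zero-filling of months without records, and the per-100,000 scaling are all assumptions, because the rest of `muni-calculate.R` is not shown.

```r
months <- seq(as.Date("2000-01-01"), as.Date("2022-12-01"), by = "month")
length(months)  # 276: 23 years x 12 months

# Hypothetical toy inputs; not the pipeline's real schema or data
munis <- c("01001", "01002")
homicides <- data.frame(ent_mun = "01001", month = as.Date("2022-12-01"), homicides = 3)
population <- data.frame(ent_mun = rep(munis, each = 23),
                         year = rep(2000:2022, times = 2),
                         population = 150000)

# Expand grid: every municipality x every month
rates <- expand.grid(ent_mun = munis, month = months, stringsAsFactors = FALSE)
rates$year <- as.integer(format(rates$month, "%Y"))

# Join counts by municipality and month; assume months with no record had zero homicides
rates <- merge(rates, homicides, by = c("ent_mun", "month"), all.x = TRUE)
rates$homicides[is.na(rates$homicides)] <- 0

# Join mid-year population by municipality and year, then compute the crude rate
rates <- merge(rates, population, by = c("ent_mun", "year"), all.x = TRUE)
rates$rate_per_100k <- rates$homicides / rates$population * 1e5
```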

diff --git a/code/homicide-rates/src/state-calculate.R b/code/homicide-rates/src/state-calculate.R
@@ -12,7 +12,7 @@ pacman::p_load(argparse, here, dplyr, readr, tidyr, lubridate, assertr, stringr)
 
 parser <- ArgumentParser()
 parser$add_argument("--homicides_data",
-                    default = here::here("code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2021.csv"))
+                    default = here::here("code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2022.csv"))
 parser$add_argument("--population_estimates",
                     default = here::here("code/census-data/interpolate/output/population-estimates.csv"))
 parser$add_argument("--output",
@@ -61,9 +61,9 @@ population <- population %>%
     ungroup()
 
 # start by creating a grid with all states and months between January
-# 2000 and December 2021
+# 2000 and December 2022
 states <- union(homicides$cve_ent, population$cve_ent)
-months <- seq(ym("200001"), ym("202112"), by = "month")
+months <- seq(ym("200001"), ym("202212"), by = "month")
 
 homicide_rates <- crossing(states, months) %>% # expand grid
     mutate(year = as.numeric(year(months)),