fixes #6
thegargiulian committed Mar 21, 2024
1 parent 9d21219 commit 6869d42
Showing 9 changed files with 660,541 additions and 648,493 deletions.
18 changes: 9 additions & 9 deletions README.md
@@ -3,9 +3,9 @@
[doi]: https://doi.org/10.17605/OSF.IO/U8DC3


# Monthly municipal-level homicide rates in Mexico from January 2000 to December 2021
# Monthly municipal-level homicide rates in Mexico from January 2000 to December 2022

Data on crude monthly municipal-level homicide rates is available in `mexico-muni-month-homicide-rates-2000-2021.csv`; state-level aggregations are available in `mexico-state-month-homicide-rates-2000-2021.csv`. Note that both files use `|` as the separator.
Data on crude monthly municipal-level homicide rates is available in `mexico-muni-month-homicide-rates-2000-2022.csv`; state-level aggregations are available in `mexico-state-month-homicide-rates-2000-2022.csv`. Note that both files use `|` as the separator.

If you use `R` you can use `readr` package to load the file and specify the separator with `readr::read_delim("PATH_TO_FILE", delim = "|")`
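
As a minimal sketch of the `readr` call described above: the snippet below writes a tiny pipe-delimited sample to a temporary file and reads it back. The column names (`ent_mun`, `year`, `month`, `homicides`) and the values are assumptions for illustration only; the published files may use different columns.

```r
library(readr)

# Hypothetical two-row sample; the real file's columns and values may differ.
sample_path <- tempfile(fileext = ".csv")
writeLines(c("ent_mun|year|month|homicides",
             "01001|2022|1|5",
             "01002|2022|1|0"),
           sample_path)

# delim = "|" makes readr split fields on the pipe character.
# Reading ent_mun as character keeps leading zeros in the municipality code.
rates <- read_delim(sample_path, delim = "|",
                    col_types = cols(ent_mun = col_character()))
```

Reading the identifier column as text is a design choice rather than a requirement: without it, `readr` may guess a numeric type and drop the leading zero from codes such as `01001`.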

@@ -18,24 +18,24 @@ To replicate the results, first run the `import` sub-task within the `census-dat
- iter_00_cpv2010.csv, retreived from https://www.inegi.org.mx/programas/ccpv/2010/#Datos_abiertos (download data for "Estados Unidos Mexicanos")
- conjunto_de_datos_iter_00CSV20.csv, retreived from https://www.inegi.org.mx/programas/ccpv/2020/#Datos_abiertos (download data for "Estados Unidos Mexicanos")

Next run the `interpolate` sub-task within the `census-task` using the `Makefile` in the `census-data/interpolate` directory. This task uses the population counts from the `import` sub-task to linearly interpolate mid-year (1 July) population counts for each municipality from 2000-2021.
Next run the `interpolate` sub-task within the `census-task` using the `Makefile` in the `census-data/interpolate` directory. This task uses the population counts from the `import` sub-task to linearly interpolate mid-year (1 July) population counts for each municipality from 2000-2022.
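
The interpolation step can be sketched roughly as follows. This is not the repository's `interpolate_population` function, whose full signature is not shown here; `interpolate_linear` is a hypothetical helper, and the census dates and population counts are made-up values chosen only to illustrate the idea of carrying the 2010-2020 slope forward to 1 July 2022.

```r
# Hypothetical helper: linear interpolation (or extrapolation) between two
# dated population counts. Not the repository's interpolate_population().
interpolate_linear <- function(pop_1, pop_2, date_1, date_2, target) {
  slope <- (pop_2 - pop_1) / as.numeric(date_2 - date_1)  # people per day
  pop_1 + slope * as.numeric(target - date_1)
}

# Illustrative census dates and counts (assumptions, not values from the data)
census_2010 <- as.Date("2010-06-12")
census_2020 <- as.Date("2020-03-15")

# 1 July of each year from 2010 through 2022
mid_years <- seq(as.Date("2010-07-01"), as.Date("2022-07-01"), by = "year")

# Dates after the second census are extrapolated along the same slope
estimates <- interpolate_linear(pop_1 = 10000, pop_2 = 12000,
                                date_1 = census_2010, date_2 = census_2020,
                                target = mid_years)
```

Because the function is vectorised over `target`, one call returns an estimate for every mid-year date; the last few values lie beyond the second census and are therefore extrapolations.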

After running both sub-tasks in the `census-data` task, run the sub-tasks in the `deaths-data` directory. Again, this task begins with an `import` sub-task, which you can run using the `Makefile`. This task reads in death certificate files published in `.dbf` format by INEGI and writes their contents to `.csv` files. This task expects death certificate files from 2000-2021 in a sub-directory called `death-certificates` within the top-level `data` directory. These files can be downloaded from https://www.inegi.org.mx/programas/mortalidad/#Microdatos.
After running both sub-tasks in the `census-data` task, run the sub-tasks in the `deaths-data` directory. Again, this task begins with an `import` sub-task, which you can run using the `Makefile`. This task reads in death certificate files published in `.dbf` format by INEGI and writes their contents to `.csv` files. This task expects death certificate files from 2000-2022 in a sub-directory called `death-certificates` within the top-level `data` directory. These files can be downloaded from https://www.inegi.org.mx/programas/mortalidad/#Microdatos.

Next, run the `homicide-counts` sub-task using the `Makefile`. This task uses the death certificate files imported in the `deaths-data/import` task to generate counts of homicide deaths in each municipality in each month from January 2000-December 2021. The cause of death classification file, found in the `hand` subdirectory follows the cause of death classification scheme used by [Elo, Beltrán-Sánchez and Macinko (2014)](https://pubmed.ncbi.nlm.nih.gov/24554793/). Note that deaths that occurred outside of Mexico and deaths that were missing cause of death, country of occurrence, or municipality of occurrence were excluded from these calculations. We also opted to use data from the location where the death occurred rather than the location where the individual was from because this information was more complete for homicides. One day we might impute this information and recalculate the counts accordingly.
Next, run the `homicide-counts` sub-task using the `Makefile`. This task uses the death certificate files imported in the `deaths-data/import` task to generate counts of homicide deaths in each municipality in each month from January 2000-December 2022. The cause of death classification file, found in the `hand` subdirectory follows the cause of death classification scheme used by [Elo, Beltrán-Sánchez and Macinko (2014)](https://pubmed.ncbi.nlm.nih.gov/24554793/). Note that deaths that occurred outside of Mexico and deaths that were missing cause of death, country of occurrence, or municipality of occurrence were excluded from these calculations. We also opted to use data from the location where the death occurred rather than the location where the individual was from because this information was more complete for homicides. One day we might impute this information and recalculate the counts accordingly.

Finally, run the top-level `homicide-rates` task to calculate the monthly municipal-level crude homicide rates for January 2000-December 2021.
Finally, run the top-level `homicide-rates` task to calculate the monthly municipal-level crude homicide rates for January 2000-December 2022.
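
A rough sketch of what a crude monthly rate calculation could look like, using base R only. The input values are invented, the column names are assumptions, and the scaling (deaths per 100,000 mid-year population, not annualized) is also an assumption, since the scripts' exact formula is not shown in this diff.

```r
# Illustrative inputs with made-up values and assumed column names
homicides <- data.frame(ent_mun = c("01001", "01001"),
                        year = c(2022, 2022),
                        month = c(1L, 3L),
                        homicides = c(2, 1))
population <- data.frame(ent_mun = "01001", year = 2022, population = 50000)

# Full municipality-by-month grid, so months without homicides appear as zeros
grid <- expand.grid(ent_mun = "01001", year = 2022, month = 1:12,
                    stringsAsFactors = FALSE)

rates <- merge(grid, homicides, by = c("ent_mun", "year", "month"), all.x = TRUE)
rates$homicides[is.na(rates$homicides)] <- 0
rates <- merge(rates, population, by = c("ent_mun", "year"))

# Assumed scaling: deaths per 100,000 mid-year population, not annualized
rates$rate <- rates$homicides / rates$population * 1e5
rates <- rates[order(rates$month), ]
```

Building the grid first mirrors the "expand grid" step visible in `muni-calculate.R`: a municipality-month with no recorded homicides gets an explicit rate of zero instead of being silently absent from the output.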

If you use this data please use the BibTeX entry below or see the [OSF repository](https://osf.io/u8dc3/) for other citation formats:

```
@misc{Gargiulo_Aburto_Floridi_2023,
title={Monthly municipal-level homicide rates in Mexico (January 2000–December 2021)},
title={Monthly municipal-level homicide rates in Mexico (January 2000–December 2022)},
url={osf.io/u8dc3},
DOI={10.17605/OSF.IO/U8DC3},
publisher={OSF},
author={Gargiulo, Maria and Aburto, José Manuel and Floridi, Ginevra},
year={2023},
month={Feb}
year={2024},
month={March}
}
```
2 changes: 1 addition & 1 deletion code/census-data/interpolate/src/interpolate.R
@@ -33,7 +33,7 @@ interpolation_wrapper <- function(ent) {

# use 1 July as mid-year date
mid_years_1 <- seq(ymd(20000701), ymd(20090701), by = "year")
mid_years_2 <- seq(ymd(20100701), ymd(20210701), by = "year") # apply same slope through 2021
mid_years_2 <- seq(ymd(20100701), ymd(20220701), by = "year") # apply same slope through 2022

estimates_1 <- map_dfr(.x = mid_years_1,
~interpolate_population(pop_1 = ent$total_pop_2000,
4 changes: 2 additions & 2 deletions code/deaths-data/homicide-counts/Makefile
@@ -9,12 +9,12 @@ DEATHS := $(wildcard $(HERE)/code/deaths-data/import/output/DEFUN*.csv)

.PHONY: all clean

all: output/muni-month-homicides-2000-2021.csv
all: output/muni-month-homicides-2000-2022.csv

clean:
-rm output/*

output/muni-month-homicides-2000-2021.csv: \
output/muni-month-homicides-2000-2022.csv: \
src/calculate.R \
$(DEATHS)
-mkdir output
4 changes: 2 additions & 2 deletions code/deaths-data/homicide-counts/src/calculate.R
@@ -45,15 +45,15 @@ homicide_codes <- cod_mapping %>%
filter(cod_group == "Homicides")

# collect all death certificate file paths
years <- 2000:2021
years <- 2000:2022
input_files <- glue("{args$import_stub}/DEFUN{years}.csv")

# read in and concatenate records from all files
deaths_data <- map_dfr(input_files, read_file)

homicide_deaths <- deaths_data %>%
# filter out deaths that occurred outside of time period or are missing year information
filter(between(year, 2000, 2021)) %>%
filter(between(year, 2000, 2022)) %>%
# filter out deaths missing month information
filter(month != 99) %>%
# filter out deaths that occurred outside of Mexico or are missing state info
16 changes: 8 additions & 8 deletions code/homicide-rates/Makefile
@@ -8,29 +8,29 @@ HERE := $(shell git rev-parse --show-toplevel)

.PHONY: all clean

all: output/mexico-muni-month-homicide-rates-2000-2021.csv \
output/mexico-state-month-homicide-rates-2000-2021.csv
all: output/mexico-muni-month-homicide-rates-2000-2022.csv \
output/mexico-state-month-homicide-rates-2000-2022.csv

clean:
-rm output/*

output/mexico-muni-month-homicide-rates-2000-2021.csv: \
output/mexico-muni-month-homicide-rates-2000-2022.csv: \
src/muni-calculate.R \
$(HERE)/code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2021.csv \
$(HERE)/code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2022.csv \
$(HERE)/code/census-data/interpolate/output/population-estimates.csv
-mkdir output
Rscript --vanilla $< \
--homicides_data=$(HERE)/code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2021.csv \
--homicides_data=$(HERE)/code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2022.csv \
--population_estimates=$(HERE)/code/census-data/interpolate/output/population-estimates.csv \
--output=$@

output/mexico-state-month-homicide-rates-2000-2021.csv: \
output/mexico-state-month-homicide-rates-2000-2022.csv: \
src/state-calculate.R \
$(HERE)/code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2021.csv \
$(HERE)/code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2022.csv \
$(HERE)/code/census-data/interpolate/output/population-estimates.csv
-mkdir output
Rscript --vanilla $< \
--homicides_data=$(HERE)/code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2021.csv \
--homicides_data=$(HERE)/code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2022.csv \
--population_estimates=$(HERE)/code/census-data/interpolate/output/population-estimates.csv \
--output=$@

8 changes: 4 additions & 4 deletions code/homicide-rates/src/muni-calculate.R
@@ -12,11 +12,11 @@ pacman::p_load(argparse, here, dplyr, readr, tidyr, lubridate)

parser <- ArgumentParser()
parser$add_argument("--homicides_data",
default = here::here("code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2021.csv"))
default = here::here("code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2022.csv"))
parser$add_argument("--population_estimates",
default = here::here("code/census-data/interpolate/output/population-estimates.csv"))
parser$add_argument("--output",
default = "output/mexico-muni-month-homicide-rates-2000-2021.csv")
default = "output/mexico-muni-month-homicide-rates-2000-2022.csv")

args <- parser$parse_args()

@@ -45,9 +45,9 @@ population <- read_delim(args$population_estimates, delim = "|") %>%
select(-month, -day, -est_date)

# start by creating a grid with all municipalities and months between January
# 2000 and December 2021
# 2000 and December 2022
munis <- union(homicides$ent_mun, population$ent_mun)
months <- seq(ym("200001"), ym("202112"), by = "month")
months <- seq(ym("200001"), ym("202212"), by = "month")

homicide_rates <- crossing(munis, months) %>% # expand grid
mutate(year = as.numeric(year(months)),
6 changes: 3 additions & 3 deletions code/homicide-rates/src/state-calculate.R
@@ -12,7 +12,7 @@ pacman::p_load(argparse, here, dplyr, readr, tidyr, lubridate, assertr, stringr)

parser <- ArgumentParser()
parser$add_argument("--homicides_data",
default = here::here("code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2021.csv"))
default = here::here("code/deaths-data/homicide-counts/output/muni-month-homicides-2000-2022.csv"))
parser$add_argument("--population_estimates",
default = here::here("code/census-data/interpolate/output/population-estimates.csv"))
parser$add_argument("--output",
@@ -61,9 +61,9 @@ population <- population %>%
ungroup()

# start by creating a grid with all states and months between January
# 2000 and December 2021
# 2000 and December 2022
states <- union(homicides$cve_ent, population$cve_ent)
months <- seq(ym("200001"), ym("202112"), by = "month")
months <- seq(ym("200001"), ym("202212"), by = "month")

homicide_rates <- crossing(states, months) %>% # expand grid
mutate(year = as.numeric(year(months)),