smoothHP
is an R package with a set of conveince functions and data
for making population projections using a modified version of the
Hamilton-Perry (HP) method outlined by Takashi
Inoue 2016.
Unlike the more prevalent cohort component method, the HP method allows
for population projections to be made in the absence of fertility,
mortality, and migration data, only relying on changes in population.
The version utilized in this R package allows for hierarchical smoothing
comparable to an empirical Bayes approach such that sub-population
projections may more closely resemble their group average.
# Install directly from GitHub
remotes::install_github("nmmarquez/smoothHP")
smoothHP
is designed to allow you to make projections testing multiple
hierarchical smoothing strategies. To demonstrate this we use data from
the Washington State Office of Fincance and Management official
population
estimates
to make small area population projections for king county.
Currently smoothHP
requires strict input data requirements which are
as follows
- A column named
Sex
indicating either “Female” or “Male” - A column named
Year
which is numeric, and contains at least two years that are five years apart - A column names
Age5
which is a factor with 18 levels or integers 1 through 18 representing 5 year age groups from 0 to 85+ - A column named
value
representing a population estimates of the corresponding demographic group. - At least one more column indicating the level to make projections at
To help with this process the package provides a utility function to assess compatibility with the package functions.
library(smoothHP)
check_smooth_input(kc_pop_data)
The smoothHP
package is capable of making projections at multiple
levels with various smoothing strategies using the main function
multi_stage_group_HP_project
. Using the provided data lets say we were
interested in making projections for all provided racial and ethnic
groups without smoothing. The code would be the following
unsmooth_DF <- multi_stage_group_HP_project(
kc_pop_data, stages = list(c("County", "Race")))
Here we provided the multi_stage_group_HP_project
a stages parameter
which is a list of length 1 which the function interprets to only do
only one layer of HP projections with no smoothing. If we instead we
wanted to smooth race specific estimates to the county average we would
provide a list of length 2 as seen below.
smooth_DF <- multi_stage_group_HP_project(
kc_pop_data, stages = list("County", "Race"))
We can compare the smooth to the non-smoothed estimates visually to see the effects that the smoothing process had on the projections.
library(ggplot2)
raceth_DF <- kc_pop_data[
,.(value = sum(value)), by = list(Year, Sex, Race, Age5, County)]
raceth_DF[,Type:="Data"]
smooth_DF[,Type:="Smoothed"]
unsmooth_DF[,Type:="Direct"]
bind_rows(raceth_DF, smooth_DF, unsmooth_DF) %>%
group_by(Year, Race, Type) %>%
summarize(Population = sum(value) / 1000000) %>%
ggplot(aes(x = Year, y = Population, color = Race, linetype = Type)) +
geom_line() +
theme_classic() +
labs(
y = "Population Estimate (in millions)",
color = "Racial and Ethnic Categories") +
ggtitle(
"Population Forecasts by Race and Ethnicity",
"Comparison of Smoothing Approaches")
Direct estimates using the HP method seem to provide implausible results in this case while using a smoothing approach gives us more reasonable growth projections for each group.
We may also want to do additional layers of smoothing to make race and ethnicity specific estimates at smaller areas (and subsequently smaller population estimates) which may be more unstable with direct estimates. Here again we demonstrate two approaches, one where we look at each small area estimate racial and ethic group and smooth them to the county population totals and another which introduces a two tiered smoothing approach. In this case rather than looking directly at the projections we instead examine the components of the projections, specifically the CWR estimate of the HP method.
bind_rows(
multi_stage_CWR_estimates(
kc_pop_data, stages = list("County", c("Race", "GEOID"))) %>%
mutate(Method = "One Stage"),
multi_stage_CWR_estimates(
kc_pop_data, stages = list("County", "Race", "GEOID")) %>%
mutate(Method = "Two Stage")) %>%
filter(Sex == "Female" & Race != "Two or More Races") %>%
ggplot(aes(x = CWR, fill = Method, color = Method)) +
geom_density(alpha = .4) +
facet_wrap(~Race, scales = "free") +
theme_classic() +
ggtitle(
"Distribution of Female CWR estimates by Race and Ethnicity",
"Comparison of Smoothing Approaches")
In this example by using a one stage smoothing approach, we may over-smooth estimates of CWR to the county average while a two stage approach allows more variation between racial and ethnic groups.
All of the functions provided by the smoothHP
package allow you to do
any number of layers of smoothing for Projections or CWR and CCR
estimates directly.
The development of this R
package and the forthcoming research
articles associated with it were made possible by financial support from
the following organizations.