---
title: "Is it true that _as people get richer, they care and do more for the environment?_"
author: "Felipe Valencia"
date: "2023-04-05"
output:
  html_document:
    keep_md: true
    toc: true
    toc_float: true
    code_folding: hide
    fig_height: 6
    fig_width: 12
    fig_align: center
---
```r
# Import data
# Packages used throughout this document
library(downloader) # download()
library(readxl)     # read_excel()
library(tidyverse)  # readr, dplyr, tidyr, stringr, ggplot2
library(scales)     # comma()
library(vtable)     # st()

############# Survey of owners of BEVs in California #############
# Temporary file for the xlsx download
temp_excel <- tempfile()
# Download using the downloader package
download("https://datadryad.org/stash/downloads/file_stream/153034",
         destfile = temp_excel, # Save temp file
         mode = "wb")
# Read xlsx from temp file
BEV_FCV <- read_excel(temp_excel)

############# GDP per capita (current US$) #############
# Temporary file for the zip and temporary directory to unzip it into
temp_zip <- tempfile()
temp_dir <- tempdir()
# Download the zip using the downloader package
download("https://api.worldbank.org/v2/en/indicator/NY.GDP.PCAP.CD?downloadformat=csv",
         destfile = temp_zip, # Save temp file
         mode = "wb")
# Unzip temporary file in temporary directory
unzip(temp_zip, exdir = temp_dir)

# Read the data csv with readr; the numeric suffix in the file name can change between
# World Bank releases, so the file is matched by pattern instead of hard-coded
GDP <- read_csv(list.files(temp_dir, pattern = "^API_NY\\.GDP\\.PCAP\\.CD_.*\\.csv$", full.names = TRUE), skip = 3)

############# CO2 emissions (metric tons per capita) #############
# Temporary file for the zip and temporary directory to unzip it into
temp_zip <- tempfile()
temp_dir <- tempdir()
# Download the zip using the downloader package
download("https://api.worldbank.org/v2/en/indicator/EN.ATM.CO2E.PC?downloadformat=csv",
         destfile = temp_zip, # Save temp file
         mode = "wb")
# Unzip temporary file in temporary directory
unzip(temp_zip, exdir = temp_dir)

# Read the data csv with readr (file name matched by pattern, as above)
CO2_emissions <- read_csv(list.files(temp_dir, pattern = "^API_EN\\.ATM\\.CO2E\\.PC_.*\\.csv$", full.names = TRUE), skip = 3)

############# ISO 3166 Countries with Regional Codes #############

# Read csv of the ISO 3166 countries with regional codes (from Luke Duncalfe's GitHub repo)
countries_ISO3166 <- read.csv("https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv?raw=true")
```

Executive Summary

In this project, we asked whether people CARE more about the environment, and DO more for sustainability, as they get richer. To gauge whether they CARE, we explored a survey of thousands of battery electric vehicle (BEV) owners in the US state of California. These owners already help the environment by driving an electric vehicle. We found that those with higher incomes do CARE about reducing greenhouse gas emissions, but on average they CARE less than lower-income BEV owners. To assess whether people DO more for environmental sustainability as they get richer, we explored the relationship between GDP per capita and CO2 emissions (metric tons per capita) from 1990 to 2019, using World Bank data for countries worldwide. We found that in Europe and in the World Bank's high-income group, CO2 emissions per capita have fallen as GDP per capita has risen. Other regions have not followed this pattern so far.

CARE: Owners of BEVs in California

This data set contains sociodemographic data for California households that own battery electric vehicles (from the NCST project "Understanding the Early Adopters of Fuel Cell Vehicles"). We focus only on the following variables:

- vehicle owned;
- household income;
- highest level of education;
- age;
- gender;
- number of vehicles in the household;
- a scale of the importance of reducing greenhouse gas emissions (GGE).

The data set also contains responses from around 906 fuel cell vehicle (FCV) owners, which we set aside to focus on battery electric vehicle (BEV) respondents. We assume that BEV owners are "environmentally friendly" and examine whether their wealth is related to how much they say they care about the environment.

Data Wrangling

Here's the data set we have cleaned and prepared for our analysis:

```r
# Clean & wrangle data

BEV <- BEV_FCV %>%
  filter(!is.na(`submitdate. Date submitted`)) %>%
  rename(Household_Income = `Household Income`,
         ID = `id. Response ID`,
         importance_reduce_GGE = `Importance of reducing greenhouse gas emissions (-3 not important, 3 important)`,
         edu_level = `Highest Level of Education`,
         number_vehicles_H = `Number of vehicles in the household`,
         gender = `Gender (Male 1)`) %>%
  mutate(Household_Income_Thousands = Household_Income / 1000,
         edu_level = str_replace_all(as.character(edu_level), c("1" = "Some High School", "2" = "High School Graduate", "3" = "College Graduate", "4" = "Masters, Doctorate, \nor Professional Degree")),
         gender = str_replace_all(as.character(gender), c("1" = "Male", "0" = "Female"))) %>%
  select(ID, Carmain, Household_Income_Thousands, importance_reduce_GGE, edu_level, number_vehicles_H, gender, Age) %>%
  filter(grepl('PHASE', ID))

# Turn household income in thousands and gender into factors
BEV$Household_Income_Thousands <- as.factor(BEV$Household_Income_Thousands)
BEV$gender <- as.factor(BEV$gender)
# Turn edu_level into a factor with a custom level order
BEV$edu_level <- factor(BEV$edu_level, levels = c("Some High School", "High School Graduate", "College Graduate", "Masters, Doctorate, \nor Professional Degree"))
BEV
```

```
## # A tibble: 19,357 × 8
##    ID         Carmain               House…¹ impor…² edu_l…³ numbe…⁴ gender   Age
##    <chr>      <chr>                 <fct>     <dbl> <fct>     <dbl> <fct>  <dbl>
##  1 PHASE_1_54 2012 Nissan Leaf      500        2.63 "Maste…       3 Female    65
##  2 PHASE_1_48 2013 Toyota Prius Pl… <NA>       2.76 "Maste…       2 Male      65
##  3 PHASE_1_49 2013 Honda Fit EV     125       -2.97 "Colle…       3 Female    55
##  4 PHASE_1_52 2013 Toyota Prius Pl… <NA>      NA     <NA>         1 <NA>      NA
##  5 PHASE_1_57 2013 Nissan Leaf      325        2.87 "Maste…       5 Female    75
##  6 PHASE_1_58 2015 Ford Fusion Ene… 275        1.9  "Maste…       3 Female    45
##  7 PHASE_1_59 2014 Chevrolet Spark… 225        1.58 "Maste…       2 Female    55
##  8 PHASE_1_62 2013 Nissan Leaf      375        2.57 "Colle…       2 Female    45
##  9 PHASE_1_63 2014 Tesla Model S    225        2.33 "High …       2 Female    35
## 10 PHASE_1_64 2013 Tesla Model S    275        2.64 "Maste…       3 Female    75
## # … with 19,347 more rows, and abbreviated variable names
## #   ¹Household_Income_Thousands, ²importance_reduce_GGE, ³edu_level,
## #   ⁴number_vehicles_H
```
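The wrangling code above recodes education and gender with `str_replace_all()` and then sets the level order in a separate `factor()` call. A minimal base-R alternative does both in one `factor()` call with explicit `levels` and `labels`. The codes below are made up for illustration; only the labels and the 1–4 and 0/1 coding come from the survey columns used above.

```r
# Hypothetical survey codes, made up for illustration: education 1-4 and
# gender 0/1 ("Gender (Male 1)"), with NA for missing answers.
edu_codes <- c(3, 4, 2, NA, 1, 4)
gender_codes <- c(0, 1, 0, NA, 1, 0)

# One factor() call maps each code to its label and fixes the level order;
# any code not listed in `levels` becomes NA.
edu_level <- factor(edu_codes,
                    levels = 1:4,
                    labels = c("Some High School", "High School Graduate",
                               "College Graduate",
                               "Masters, Doctorate, \nor Professional Degree"))
gender <- factor(gender_codes, levels = c(0, 1), labels = c("Female", "Male"))

print(table(edu_level, useNA = "ifany"))
print(levels(gender))
```

Pattern-based replacement works here because none of the labels contain digits. The `factor()` form does not depend on that, and the level order lives next to the labels.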

Summary Statistics

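```r
# Get Summary Statistics
BEV %>% st(title = "Owners of BEVs - socioeconomic variables and data on attitudes towards sustainability")
```

Owners of BEVs - socioeconomic variables and data on attitudes towards sustainability

| Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 75 | Max |
|:---|---:|---:|---:|---:|---:|---:|---:|
| Household_Income_Thousands | 16175 | | | | | | |
| ... 50 | 585 | 4% | | | | | |
| ... 75 | 2395 | 15% | | | | | |
| ... 125 | 3902 | 24% | | | | | |
| ... 175 | 3366 | 21% | | | | | |
| ... 225 | 2260 | 14% | | | | | |
| ... 275 | 1432 | 9% | | | | | |
| ... 325 | 747 | 5% | | | | | |
| ... 375 | 374 | 2% | | | | | |
| ... 425 | 265 | 2% | | | | | |
| ... 475 | 179 | 1% | | | | | |
| ... 500 | 670 | 4% | | | | | |
| importance_reduce_GGE | 17376 | 1.7 | 1.6 | -3 | 1.1 | 2.8 | 3 |
| edu_level | 18117 | | | | | | |
| ... Some High School | 43 | 0% | | | | | |
| ... High School Graduate | 2307 | 13% | | | | | |
| ... College Graduate | 7300 | 40% | | | | | |
| ... Masters, Doctorate, or Professional Degree | 8467 | 47% | | | | | |
| number_vehicles_H | 19357 | 2.3 | 0.97 | 1 | 2 | 3 | 5 |
| gender | 18029 | | | | | | |
| ... Female | 13309 | 74% | | | | | |
| ... Male | 4720 | 26% | | | | | |
| Age | 18045 | 50 | 13 | 18 | 35 | 55 | 80 |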

Data Visualization and Discussion

```r
# Plot & visualize data
BEV %>% ggplot(aes(x = importance_reduce_GGE)) +
  geom_histogram(binwidth = 0.35, color = "green4", fill = "gray80") +
  theme_classic() +
  scale_x_continuous(breaks = c(-3, -2, -1, 0, 1, 2, 3),
                     expand = expansion(mult = c(0, 0))) +
  scale_y_continuous(expand = expansion(mult = c(0, 0))) +
  theme(panel.grid.major.y = element_line(color = "gray", linetype = 1),
        plot.title.position = "plot",
        plot.title = element_text(size = 25, family = "serif", color = "black"),
        plot.subtitle = element_text(size = 18, family = "serif", color = "gray30"),
        plot.caption = element_text(hjust = 1, family = "serif", color = "gray30", size = 12),
        axis.title = element_text(color = "gray30", size = 16, family = "serif"),
        axis.text.x = element_text(color = "gray30", size = 14, family = "serif"),
        axis.text.y = element_text(color = "gray30", size = 14, family = "serif")) +
  labs(title = "Distribution of attitudes toward sustainability", subtitle = "Owners of BEVs in California were asked about how important is reducing greenhouse gas emissions. \nMeasured with a continuous scale from -3= “Not important” to 3= “Important”", y = "Count", x = "Not important                                                                                                                                                                                 Important", caption = "Source: National Center for Sustainable Transportation\nhttps://doi.org/10.25338/B8P313")
```

In the summary statistics table, the mean of this scale is about 1.7 (standard deviation 1.6), and the 25th percentile is already 1.1. Three quarters of respondents therefore scored 1.1 or higher, which points to a clear attitude of care for the environment among owners of BEVs in California. The histogram confirms that most respondents rate reducing greenhouse gas emissions as "Important" to some degree.

According to the summary statistics, most owners of BEVs in California report a household income between $75k and $225k. Still, there are enough respondents with incomes above $225k to check one question among these "environmentally friendly" BEV owners: do people with higher incomes care more about the environment, on average, than those with lower incomes?
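The word "most" can be made precise with the income counts from the summary table above. This sketch copies those counts into a named vector and computes two shares: respondents in the $75k–$225k codes, and respondents above $225k.

```r
# Respondents per household-income code (thousands of US$), copied from the
# summary statistics table above.
income_counts <- c(`50` = 585, `75` = 2395, `125` = 3902, `175` = 3366,
                   `225` = 2260, `275` = 1432, `325` = 747, `375` = 374,
                   `425` = 265, `475` = 179, `500` = 670)

income_codes <- as.numeric(names(income_counts))
total <- sum(income_counts)  # respondents with income data

share_75_225    <- sum(income_counts[income_codes >= 75 & income_codes <= 225]) / total
share_above_225 <- sum(income_counts[income_codes > 225]) / total

round(c(share_75_225 = share_75_225, share_above_225 = share_above_225), 3)
# share_75_225 0.737, share_above_225 0.227
```

About 74% of the 16,175 respondents with income data fall in the $75k–$225k codes. The 3,667 respondents (about 23%) above $225k are the observations behind the upper end of the income curves.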

```r
# Plot & visualize data

# Turn the household income factor back into numbers (via as.character, so the
# income values are kept rather than the factor's internal level indices)
BEV$Household_Income_Thousands <- as.numeric(as.character(BEV$Household_Income_Thousands))

BEV %>%
  filter(!is.na(Household_Income_Thousands)) %>%
  filter(!is.na(edu_level)) %>%
  ggplot(aes(y = importance_reduce_GGE, x = Household_Income_Thousands)) +
  # Smoothed conditional mean of the score by income, with its confidence band
  geom_smooth(color = "green4") +
  theme_classic() +
  #coord_cartesian(ylim = c(-3, 3), xlim = c(50, 500))
  scale_x_continuous(breaks = c(50, 75, 125, 175, 225, 275, 325, 375, 425, 475, 500),
                     expand = expansion(mult = c(0, 0))) +
  theme(panel.grid.major.y = element_line(color = "gray", linetype = 1),
        plot.title.position = "plot",
        plot.title = element_text(size = 25, family = "serif", color = "black"),
        plot.subtitle = element_text(size = 18, family = "serif", color = "gray30"),
        plot.caption = element_text(hjust = 1, family = "serif", color = "gray30", size = 12),
        axis.title = element_text(color = "gray30", size = 16, family = "serif"),
        axis.text.x = element_text(color = "gray30", size = 14, family = "serif", hjust = c(0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1)),
        axis.text.y = element_text(color = "gray30", size = 14, family = "serif")) +
  labs(title = "Correlation between household income and attitude toward environmental sustainability", subtitle = "Owners of BEVs in California were asked about how important is reducing greenhouse gas emissions. \nMeasured with a continuous scale from -3= “Not important” to 3= “Important”", y = "Mean importance of reducing GGE", x = "Income in thousands of dollars", caption = "Source: National Center for Sustainable Transportation\nhttps://doi.org/10.25338/B8P313")
```

The smoothed mean stays between about 1.5 and 1.8, on the "Important" side of the scale. Even so, more affluent owners of BEVs in California rate reducing GGE as less important, on average, than those with lower incomes.
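Household income is recorded as 11 discrete codes, so the smoothed curve can be cross-checked against the raw mean score per income code. The sketch below runs base R's `tapply()` on a small made-up data frame that reuses the `BEV` column names. On the real data, the equivalent call would be `tapply(BEV$importance_reduce_GGE, BEV$Household_Income_Thousands, mean, na.rm = TRUE)`.

```r
# Made-up toy data using the same column names as the BEV tibble.
toy <- data.frame(
  Household_Income_Thousands = c(75, 75, 125, 125, 125, 225, 225, 500, 500),
  importance_reduce_GGE      = c(2.5, 1.9, 2.0, NA, 1.6, 1.8, 1.2, 0.9, 1.5)
)

# Mean score per income code, ignoring missing answers...
group_mean <- tapply(toy$importance_reduce_GGE, toy$Household_Income_Thousands,
                     mean, na.rm = TRUE)
# ...and how many non-missing answers each mean is based on.
group_n <- tapply(!is.na(toy$importance_reduce_GGE), toy$Household_Income_Thousands, sum)

by_income <- data.frame(income     = as.numeric(names(group_mean)),
                        mean_score = as.vector(group_mean),
                        n          = as.vector(group_n))
by_income
```

Reporting `n` next to each mean also shows where the curve rests on only a few observations.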

```r
# Plot & visualize data
BEV %>%
  filter(!is.na(Household_Income_Thousands)) %>%
  filter(!is.na(edu_level)) %>%
  # Remove "Some High School": too few observations (N = 43) for a reliable curve
  filter(edu_level != "Some High School") %>%
  ggplot(aes(y = importance_reduce_GGE, x = Household_Income_Thousands)) +
  geom_smooth(color = "green4") +
  facet_wrap(~edu_level, nrow = 1) +
  theme_classic() +
  scale_x_continuous(breaks = c(50, 75, 125, 175, 225, 275, 325, 375, 425, 475, 500)) +
  theme(panel.grid.major.y = element_line(color = "gray", linetype = 1),
        plot.title.position = "plot",
        plot.title = element_text(size = 25, family = "serif", color = "black"),
        plot.subtitle = element_text(size = 18, family = "serif", color = "gray30"),
        plot.caption = element_text(hjust = 1, family = "serif", color = "gray30", size = 12),
        axis.title = element_text(color = "gray30", size = 16, family = "serif"),
        axis.text.x = element_text(color = "gray30", size = 14, family = "serif", angle = 90, hjust = 1, vjust = 0.5),
        axis.text.y = element_text(color = "gray30", size = 14, family = "serif"),
        strip.text.x = element_text(color = "gray10", size = 14, family = "serif"),
        strip.background = element_rect(color = "white", fill = "white")) +
  labs(title = "Correlation between household income and attitude toward environmental sustainability\nby the highest level of education", subtitle = "Owners of BEVs in California were asked about how important is reducing greenhouse gas emissions. \nMeasured with a continuous scale from -3= “Not important” to 3= “Important”", y = "Mean importance of reducing GGE", x = "Income in thousands of dollars", caption = "Source: National Center for Sustainable Transportation\nhttps://doi.org/10.25338/B8P313")
```

Note that the "Some High School" group was removed because it has too few observations (N = 43) for a reliable estimate.

Segmenting the relationship between household income and attitude toward environmental sustainability by highest level of education reveals several patterns among owners of BEVs in California:

- People with more education give reducing greenhouse gas emissions a higher "Important" score, on average, than those with less education.
- Within each education level, more affluent people consider reducing emissions less important than those with lower incomes.
- We cannot be sure about the High School Graduate group, because its confidence interval becomes very wide at the highest incomes. A likely reason is that few respondents in this group report such high incomes.
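The link between few observations and a wide confidence interval can be made concrete with the normal-approximation interval for a mean, whose half-width is 1.96 × sd / √n. `geom_smooth()` builds its band from the fitted smoother rather than from this exact formula, but the same basic scaling applies: quadrupling the number of observations roughly halves the width. The sketch uses the overall standard deviation of the score from the summary table (1.6) with illustrative group sizes.

```r
# Approximate 95% confidence-interval half-width for a mean: z * sd / sqrt(n)
ci_half_width <- function(sd, n, z = 1.96) z * sd / sqrt(n)

# sd = 1.6 is the overall standard deviation of importance_reduce_GGE from the
# summary table; the group sizes are illustrative.
group_sizes <- c(25, 100, 400, 1600)
round(ci_half_width(sd = 1.6, n = group_sizes), 3)
# 0.627 0.314 0.157 0.078
```

With only a few dozen respondents in a cell, the interval spans more than ±0.6 points on a 6-point scale, enough to hide the differences discussed above.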

An important question for further analysis is why wealthier owners of BEVs in California seem to care less about GGE than less affluent owners with the same level of education. Do they know something about reducing GGE that less affluent people in their education group do not, or is it something to do with the psychology of the wealthiest?

Since this was not a longitudinal study, we cannot conclude that people CARE more or less about the environment as they get richer. We do see, however, that higher-income owners of BEVs in California CARE about reducing GGE on average, although they CARE less than less affluent owners. These owners are already doing something good for the environment by driving battery electric vehicles instead of fuel-based ones.

To take the analysis further, it would be interesting to find data on owners of fuel-based vehicles and compare their perspectives and incomes with those of BEV owners.

DO: GDP and CO2 Emissions

The GDP per capita and CO2 emissions (metric tons per capita) data sets let us examine the correlation between people's wealth and their CO2 emissions, and therefore whether people do more for the environment as they get richer.

Data Wrangling

Here are the data sets we have cleaned and prepared for our analysis:

```r
# Clean & wrangle data

# GDP per Capita tidy
GDP_long <- GDP %>%
  select(-`Indicator Name`, -`Indicator Code`) %>%
  pivot_longer(!`Country Name`:`Country Code`, names_to = "year", values_to = "GDP_per_cap", values_drop_na = TRUE)

# CO2 Emissions tidy
CO2emissions_long <- CO2_emissions %>%
  select(-`Indicator Name`, -`Indicator Code`) %>%
  pivot_longer(!`Country Name`:`Country Code`, names_to = "year", values_to = "CO2_m_tons_per_cap", values_drop_na = TRUE)

# Join between GDP per Capita and CO2 Emissions tidy
GDP_CO2Emissions <- inner_join(GDP_long, CO2emissions_long, by = c("Country Name", "Country Code", "year")) %>%
  mutate(year = as.double(year))
GDP_CO2Emissions
```

```
## # A tibble: 6,937 × 5
##    `Country Name`              `Country Code`  year GDP_per_cap CO2_m_tons_per…¹
##    <chr>                       <chr>          <dbl>       <dbl>            <dbl>
##  1 Africa Eastern and Southern AFE             1990        817.            0.982
##  2 Africa Eastern and Southern AFE             1991        858.            0.938
##  3 Africa Eastern and Southern AFE             1992        729.            0.903
##  4 Africa Eastern and Southern AFE             1993        705.            0.905
##  5 Africa Eastern and Southern AFE             1994        697.            0.906
##  6 Africa Eastern and Southern AFE             1995        763.            0.926
##  7 Africa Eastern and Southern AFE             1996        739.            0.937
##  8 Africa Eastern and Southern AFE             1997        758.            0.958
##  9 Africa Eastern and Southern AFE             1998        696.            0.958
## 10 Africa Eastern and Southern AFE             1999        670.            0.897
## # … with 6,927 more rows, and abbreviated variable name ¹CO2_m_tons_per_cap
```
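The World Bank files are wide, with one column per year. `pivot_longer()` turns each file into one row per country-year and drops blanks. `inner_join()` then keeps only the country-years where both indicators exist. Below is a base-R sketch of the same two steps on a made-up two-country extract. `to_long()` is a hypothetical helper written for this illustration, not part of any package.

```r
# Toy extract in the World Bank CSV layout (values made up; NA = missing).
gdp_wide <- data.frame(`Country Name` = c("A", "B"), `Country Code` = c("AAA", "BBB"),
                       `1990` = c(100, NA), `1991` = c(110, 210), check.names = FALSE)
co2_wide <- data.frame(`Country Name` = c("A", "B"), `Country Code` = c("AAA", "BBB"),
                       `1990` = c(1.0, 2.0), `1991` = c(NA, 2.1), check.names = FALSE)

# Wide -> long: one row per country-year, missing values dropped
# (like pivot_longer(..., values_drop_na = TRUE)).
to_long <- function(wide, value_name) {
  id_cols <- c("Country Name", "Country Code")
  years <- setdiff(names(wide), id_cols)
  long <- data.frame(
    `Country Name` = rep(wide[["Country Name"]], times = length(years)),
    `Country Code` = rep(wide[["Country Code"]], times = length(years)),
    year  = rep(as.numeric(years), each = nrow(wide)),
    value = unlist(wide[years], use.names = FALSE),  # column by column, like the reps above
    check.names = FALSE
  )
  names(long)[names(long) == "value"] <- value_name
  long[!is.na(long[[value_name]]), ]
}

gdp_long <- to_long(gdp_wide, "GDP_per_cap")
co2_long <- to_long(co2_wide, "CO2_m_tons_per_cap")

# Inner join: keep only country-years present in both tables.
joined <- merge(gdp_long, co2_long, by = c("Country Name", "Country Code", "year"))
joined
```

Country A in 1991 (no CO2 value) and country B in 1990 (no GDP value) drop out of the joined table. This is why the joined table has fewer rows than either long table.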
```r
# Get country regions using ISO 3166
code_region <- countries_ISO3166 %>%
  select(alpha.3, region)

# Join GDP per capita and CO2 emissions with the country regions; World Bank
# aggregates without an ISO 3166 match (such as "AFE") get an NA region and are dropped
GDP_CO2Emissions_Regions <- right_join(code_region, GDP_CO2Emissions, by = c("alpha.3" = "Country Code")) %>%
  filter(!is.na(region)) %>%
  as_tibble()
GDP_CO2Emissions_Regions
```

```
## # A tibble: 5,497 × 6
##    alpha.3 region `Country Name`  year GDP_per_cap CO2_m_tons_per_cap
##    <chr>   <chr>  <chr>          <dbl>       <dbl>              <dbl>
##  1 AFG     Asia   Afghanistan     2002        184.             0.0490
##  2 AFG     Asia   Afghanistan     2003        200.             0.0539
##  3 AFG     Asia   Afghanistan     2004        222.             0.0437
##  4 AFG     Asia   Afghanistan     2005        255.             0.0635
##  5 AFG     Asia   Afghanistan     2006        274.             0.0692
##  6 AFG     Asia   Afghanistan     2007        375.             0.0683
##  7 AFG     Asia   Afghanistan     2008        388.             0.135 
##  8 AFG     Asia   Afghanistan     2009        444.             0.178 
##  9 AFG     Asia   Afghanistan     2010        555.             0.252 
## 10 AFG     Asia   Afghanistan     2011        622.             0.305 
## # … with 5,487 more rows
```
```r
# Compute simple (unweighted) means across countries for each region and year;
# .groups = "drop_last" keeps the result grouped by region, as summarise() does by default
Totals_GDP_CO2Emissions_Regions <- GDP_CO2Emissions_Regions %>%
  group_by(region, year) %>%
  summarise(mean_GDP_per_cap = mean(GDP_per_cap),
            mean_CO2_m_tons_per_cap = mean(CO2_m_tons_per_cap),
            .groups = "drop_last")
Totals_GDP_CO2Emissions_Regions
```

```
## # A tibble: 150 × 4
## # Groups:   region [5]
##    region  year mean_GDP_per_cap mean_CO2_m_tons_per_cap
##    <chr>  <dbl>            <dbl>                   <dbl>
##  1 Africa  1990            1098.                   0.819
##  2 Africa  1991            1119.                   0.830
##  3 Africa  1992            1112.                   0.841
##  4 Africa  1993            1057.                   0.859
##  5 Africa  1994             989.                   0.884
##  6 Africa  1995            1066.                   0.925
##  7 Africa  1996            1113.                   0.957
##  8 Africa  1997            1128.                   0.965
##  9 Africa  1998            1094.                   1.02 
## 10 Africa  1999            1145.                   1.07 
## # … with 140 more rows
```

Summary Statistics

```r
# Get Summary Statistics
GDP_CO2Emissions_Regions %>% st(title = "All years' data for each country with the region, GDP per capita, and CO2 emissions (metric tons per capita)")
```

All years' data for each country with the region, GDP per capita, and CO2 emissions (metric tons per capita)

| Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 75 | Max |
|:---|---:|---:|---:|---:|---:|---:|---:|
| region | 5497 | | | | | | |
| ... Africa | 1541 | 28% | | | | | |
| ... Americas | 1045 | 19% | | | | | |
| ... Asia | 1345 | 24% | | | | | |
| ... Europe | 1180 | 21% | | | | | |
| ... Oceania | 386 | 7% | | | | | |
| year | 5497 | 2005 | 8.6 | 1990 | 1997 | 2012 | 2019 |
| GDP_per_cap | 5497 | 10252 | 17380 | 23 | 942 | 11081 | 179458 |
| CO2_m_tons_per_cap | 5497 | 4.3 | 5.5 | 0 | 0.59 | 6.3 | 48 |

```r
# Get Summary Statistics
Totals_GDP_CO2Emissions_Regions %>% st(title = "All years' data for each region, mean GDP per capita and mean CO2 emissions (metric tons per capita)")
```

All years' data for each region, mean GDP per capita and mean CO2 emissions (metric tons per capita)

| Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 75 | Max |
|:---|---:|---:|---:|---:|---:|---:|---:|
| region | 150 | | | | | | |
| ... Africa | 30 | 20% | | | | | |
| ... Americas | 30 | 20% | | | | | |
| ... Asia | 30 | 20% | | | | | |
| ... Europe | 30 | 20% | | | | | |
| ... Oceania | 30 | 20% | | | | | |
| year | 150 | 2004 | 8.7 | 1990 | 1997 | 2012 | 2019 |
| mean_GDP_per_cap | 150 | 10246 | 9109 | 989 | 4188 | 12631 | 36680 |
| mean_CO2_m_tons_per_cap | 150 | 4.3 | 2.3 | 0.82 | 2.7 | 6.4 | 9.1 |

Data Visualization and Discussion

```r
# Plot & visualize data

Totals_GDP_CO2Emissions_Regions %>%
  ggplot(aes(x = year)) +
  geom_line(aes(y = mean_GDP_per_cap), color = "green4", size = 1) +
  # CO2 is multiplied by 1500 so both series share the GDP axis;
  # sec_axis() divides by 1500 to label the right-hand axis in metric tons
  geom_line(aes(y = mean_CO2_m_tons_per_cap * 1500), color = "red4", size = 1) +
  facet_wrap(~region, nrow = 1) +
  scale_y_continuous(name = "Mean GDP per capita (current US$)",
                     breaks = c(0, 10000, 20000, 30000, 40000, 50000, 60000),
                     label = comma,
                     sec.axis = sec_axis(~.x /1500, name = "Mean CO2 emissions (metric tons per capita)")) +
  theme_classic() +
  scale_x_continuous(breaks = c(1990, 2000, 2010, 2019)) +
  theme(axis.title.y = element_text(color = "green4"),
        axis.title.y.right = element_text(color = "red4"),
        panel.grid.major.y = element_line(color = "gray", linetype = 1),
        plot.title.position = "plot",
        plot.title = element_text(size = 25, family = "serif", color = "black"),
        plot.subtitle = element_text(size = 18, family = "serif", color = "gray30"),
        plot.caption = element_text(hjust = 1, family = "serif", color = "gray30", size = 12),
        axis.title = element_text(color = "gray30", size = 16, family = "serif"),
        axis.text.x = element_text(color = "gray30", size = 14, family = "serif", angle = 90, hjust = 1, vjust = 0.5),
        axis.text.y = element_text(color = "gray30", size = 14, family = "serif"),
        strip.text.x = element_text(color = "gray10", size = 14, family = "serif"),
        strip.background = element_rect(color = "white", fill = "white")) +
  labs(title = "Evolution of GDP per capita and CO2 emissions in metric tons per capita (5 regions)", subtitle = "Only Europe seems to follow the pattern that as people get richer their CO2 emissions decrease", x = "Year", caption = "Source: World Bank CC BY-4.0")
```
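The plot above draws both series against a single y-axis. CO2 values are multiplied by 1500 so they fit on the GDP scale, and `sec_axis(~.x /1500)` labels the right-hand axis by undoing that multiplication. The sketch below runs this round trip on the rounded maxima from the regional summary table (36,680 US$ and 9.1 metric tons). The helper names are hypothetical.

```r
scale_factor <- 1500                              # factor used in the plots above
to_primary   <- function(co2) co2 * scale_factor  # height at which a CO2 value is drawn
to_secondary <- function(y) y / scale_factor      # value the right-hand axis shows at height y

max_mean_co2 <- 9.1    # largest mean_CO2_m_tons_per_cap in the summary table (rounded)
max_mean_gdp <- 36680  # largest mean_GDP_per_cap in the summary table (rounded)

to_primary(max_mean_co2)                  # 13650 on the GDP scale
to_secondary(to_primary(max_mean_co2))    # back to 9.1 metric tons
to_primary(max_mean_co2) / max_mean_gdp   # about 0.37 of the tallest GDP point
```

The factor is chosen by hand, so the vertical gap between the green and red lines carries no meaning. Only the direction of each line over time can be compared.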

Looking at the evolution of both metrics, Europe stands out: as people in Europe have become richer, their CO2 emissions per capita have decreased. This does not explain why, but it is a pattern we observe, and it is in line with the initial hypothesis of this project.

The other large regions of the world do not follow this trend. That does not refute the hypothesis, because none of them has reached Europe's GDP per capita. Once they reach a similar level, we can compare them and determine whether the trend appears when a region reaches a certain level of wealth.

Each of these large regions mixes rich and poor countries that pull the averages in different directions. To deepen the analysis, we therefore used the World Bank's smaller regional groupings to see whether the phenomenon also appears in smaller groups.
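The regional means computed earlier use `mean()` across countries, so every country counts equally: a small, very rich country moves a region's average as much as a populous one. One alternative is a population-weighted mean, computed with base R's `weighted.mean()`; it equals the region's total GDP divided by its total population. Population data is not part of this analysis, so the figures below are made up purely to show the effect.

```r
# Made-up GDP per capita (US$) and population (millions) for three countries
# in one hypothetical region.
gdp_per_cap <- c(80000, 10000, 5000)
population  <- c(1, 50, 100)

unweighted <- mean(gdp_per_cap)                       # each country counts once
weighted   <- weighted.mean(gdp_per_cap, population)  # each person counts once

c(unweighted = unweighted, weighted = weighted)
# unweighted about 31667, weighted about 7152
```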

```r
# Plot & visualize data
GDP_CO2Emissions %>%
  filter(`Country Name` %in% c("North America", "Latin America & Caribbean", "Europe & Central Asia", "East Asia & Pacific", "South Asia", "Middle East & North Africa", "Sub-Saharan Africa")) %>%
  ggplot(aes(x = year)) +
  geom_line(aes(y = GDP_per_cap), color = "green4", size = 1) +
  geom_line(aes(y = CO2_m_tons_per_cap * 1500), color = "red4", size = 1) +
  facet_wrap(~`Country Name`, nrow = 1, labeller = label_wrap_gen(width = 15, multi_line = TRUE)) +
  scale_y_continuous(name = "GDP per capita (current US$)",
                     breaks = c(0, 10000, 20000, 30000, 40000, 50000, 60000),
                     label = comma,
                     sec.axis = sec_axis(~.x /1500, name = "CO2 emissions (metric tons per capita)")) +
  theme_classic() +
  scale_x_continuous(breaks = c(1990, 2000, 2010, 2019)) +
  theme(axis.title.y = element_text(color = "green4"),
        axis.title.y.right = element_text(color = "red4"),
        panel.grid.major.y = element_line(color = "gray", linetype = 1),
        plot.title.position = "plot",
        plot.title = element_text(size = 25, family = "serif", color = "black"),
        plot.subtitle = element_text(size = 18, family = "serif", color = "gray30"),
        plot.caption = element_text(hjust = 1, family = "serif", color = "gray30", size = 12),
        axis.title = element_text(color = "gray30", size = 16, family = "serif"),
        axis.text.x = element_text(color = "gray30", size = 14, family = "serif", angle = 90, hjust = 1, vjust = 0.5),
        axis.text.y = element_text(color = "gray30", size = 14, family = "serif"),
        strip.text.x = element_text(color = "gray10", size = 14, family = "serif"),
        strip.background = element_rect(color = "white", fill = "white")) +
  labs(title = "Evolution of GDP per capita and CO2 emissions in metric tons per capita (7 regions)", subtitle = "The richest regions in terms of GDP per capita, while having higher CO2 emissions in metric tons per capita, have\nexperienced a decline in CO2 emissions per capita since 1990.", x = "Year", caption = "Source: World Bank CC BY-4.0")
```

As expected, splitting the large regions into smaller ones shows that North America's figures were heavily diluted when it was grouped with Latin America & Caribbean in the Americas region. North America has a higher GDP per capita than Europe and also supports the claim that people's CO2 emissions fall as they get richer. However, its CO2 emissions per capita are almost double Europe's. Latin America & Caribbean, by contrast, has remained roughly constant, with perhaps a slight downward tendency. We will need to follow the evolution of CO2 emissions per capita in this region more closely before drawing firmer conclusions.

Fortunately, the World Bank classifies countries by income as well, so we explored how GDP per capita and CO2 emissions in metric tons per capita behave under such categories.

```r
# Plot & visualize data
# World Bank income groups, in the order the facets should appear
income_groups <- c("Low income", "Low & middle income", "Lower middle income", "Middle income", "Upper middle income", "High income")
GDP_CO2Emissions %>%
  filter(`Country Name` %in% income_groups) %>%
  # Order the facets without overwriting `Country Name` in GDP_CO2Emissions itself
  mutate(`Country Name` = factor(`Country Name`, levels = income_groups)) %>%
  ggplot(aes(x = year)) +
  geom_line(aes(y = GDP_per_cap), color = "green4", size = 1) +
  geom_line(aes(y = CO2_m_tons_per_cap * 1500), color = "red4", size = 1) +
  facet_wrap(~`Country Name`, nrow = 1, labeller = label_wrap_gen(width = 15, multi_line = TRUE)) +
  scale_y_continuous(name = "GDP per capita (current US$)",
                     breaks = c(0, 10000, 20000, 30000, 40000),
                     label = comma,
                     sec.axis = sec_axis(~.x /1500, name = "CO2 emissions (metric tons per capita)")) +
  theme_classic() +
  scale_x_continuous(breaks = c(1990, 2000, 2010, 2019)) +
  theme(axis.title.y = element_text(color = "green4"),
        axis.title.y.right = element_text(color = "red4"),
        panel.grid.major.y = element_line(color = "gray", linetype = 1),
        plot.title.position = "plot",
        plot.title = element_text(size = 25, family = "serif", color = "black"),
        plot.subtitle = element_text(size = 18, family = "serif", color = "gray30"),
        plot.caption = element_text(hjust = 1, family = "serif", color = "gray30", size = 12),
        axis.title = element_text(color = "gray30", size = 16, family = "serif"),
        axis.text.x = element_text(color = "gray30", size = 14, family = "serif", angle = 90, hjust = 1, vjust = 0.5),
        axis.text.y = element_text(color = "gray30", size = 14, family = "serif"),
        strip.text.x = element_text(color = "gray10", size = 14, family = "serif"),
        strip.background = element_rect(color = "white", fill = "white")) +
  labs(title = "Evolution of GDP per capita and CO2 emissions in metric tons per capita (Income level)", subtitle = "High-income countries as they get richer in terms of GDP per capita experienced a reduction in CO2 emissions in\nmetric tons per capita.", x = "Year", caption = "Source: World Bank CC BY-4.0")
```

High-income countries follow the hypothesis, so countries that move into the high-income group may experience a similar reduction in CO2 emissions per capita.

```r
# Plot & visualize data

# Labels for the first and last years
first_and_last <- GDP_CO2Emissions %>%
  filter(`Country Name` == "High income", year == 1990 | year == 2019) %>%
  mutate(GDP_per_cap = round(GDP_per_cap, 0),
         CO2_m_tons_per_cap = round(CO2_m_tons_per_cap, 1))

GDP_CO2Emissions %>%
  filter(`Country Name` == "High income") %>%
  ggplot(aes(x = year)) +
  geom_line(aes(y = GDP_per_cap), color = "green4", size = 1) +
  geom_line(aes(y = CO2_m_tons_per_cap * 1500), color = "red4", size = 1) +
  geom_text(data = first_and_last, aes(x = year, y = GDP_per_cap, label = comma(GDP_per_cap)), nudge_y = 2000, color = "green4", size = 5, family = "serif") +
  geom_point(data = first_and_last, aes(x = year, y = GDP_per_cap), color = "green4") +
  geom_text(data = first_and_last, aes(x = year, y = CO2_m_tons_per_cap * 1500, label = CO2_m_tons_per_cap), nudge_y = -2000, color = "red4", size = 5, family = "serif") +
  geom_point(data = first_and_last, aes(x = year, y = CO2_m_tons_per_cap * 1500), color = "red4") +
  scale_y_continuous(name = "GDP per capita (current US$)",
                     breaks = c(0, 10000, 20000, 30000, 40000),
                     label = comma,
                     sec.axis = sec_axis(~.x /1500, name = "CO2 emissions (metric tons per capita)")) +
  coord_cartesian(ylim = c(0, 46500)) +
  theme_classic() +
  scale_x_continuous(breaks = c(1990, 2000, 2010, 2019)) +
  theme(axis.title.y = element_text(color = "green4"),
        axis.title.y.right = element_text(color = "red4"),
        panel.grid.major.y = element_line(color = "gray", linetype = 1),
        plot.title.position = "plot",
        plot.title = element_text(size = 25, family = "serif", color = "black"),
        plot.subtitle = element_text(size = 18, family = "serif", color = "gray30"),
        plot.caption = element_text(hjust = 1, family = "serif", color = "gray30", size = 12),
        axis.title = element_text(color = "gray30", size = 16, family = "serif"),
        axis.text.x = element_text(color = "gray30", size = 14, family = "serif"),
        axis.text.y = element_text(color = "gray30", size = 14, family = "serif")) +
  labs(title = "When high-income countries get richer there is a reduction in CO2 emissions per capita", subtitle = "More than double GDP per capita was required over 30 years to achieve a reduction of just over half a metric ton per\ncapita of CO2", x = "Year", caption = "Source: World Bank CC BY-4.0")
```

Comparing the first and last measurements, GDP per capita more than doubled over 30 years, while CO2 emissions fell by just over half a metric ton per capita over the same period.
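The comparison above rests on two numbers: the ratio of the last to the first GDP per capita, and the difference in CO2 emissions per capita. A minimal sketch of that arithmetic, assuming a data frame shaped like `first_and_last` (one row per year, with the `year`, `GDP_per_cap`, and `CO2_m_tons_per_cap` columns used in the plot); `first_last_change()` is a hypothetical helper:

```r
# Hypothetical helper: change between the earliest and latest year in the data frame
first_last_change <- function(df) {
  df <- df[order(df$year), ]   # make sure rows run from earliest to latest year
  first <- df[1, ]
  last  <- df[nrow(df), ]
  c(
    years      = last$year - first$year,
    gdp_ratio  = last$GDP_per_cap / first$GDP_per_cap,               # > 2 means "more than doubled"
    co2_change = last$CO2_m_tons_per_cap - first$CO2_m_tons_per_cap  # negative = reduction (t per capita)
  )
}

# Usage (not run): first_last_change(first_and_last)
```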

# Plot & visualize data

GDP_CO2Emissions_Regions %>%
  filter(year %in% c(1990, 2000, 2010, 2019), !is.na(region)) %>%
  ggplot(aes(x = GDP_per_cap, y = CO2_m_tons_per_cap)) +
  geom_smooth(color = "grey30") +
  facet_wrap(~year, nrow = 1) +
  geom_point(aes(color = region)) +
  theme_classic() +
  scale_x_continuous(breaks = c(0, 75000, 150000),
                     labels = comma) +
  theme(panel.grid.major.y = element_line(color = "gray", linetype = 1),
        plot.title.position = "plot",
        plot.title = element_text(size = 25, family = "serif", color = "black"),
        plot.subtitle = element_text(size = 18, family = "serif", color = "gray30"),
        plot.caption = element_text(hjust = 1, family = "serif", color = "gray30", size = 12),
        axis.title = element_text(color = "gray30", size = 16, family = "serif"),
        axis.text.x = element_text(color = "gray30", size = 14, family = "serif"),
        axis.text.y = element_text(color = "gray30", size = 14, family = "serif"),
        legend.title = element_text(color = "gray10", size = 14, family = "serif", face = "bold"),
        legend.text = element_text(color = "gray30", size = 14, family = "serif"),
        strip.text.x = element_text(color = "gray10", size = 14, family = "serif"),
        strip.background = element_rect(color = "white", fill = "white")) +
  labs(title = "When people get richer, their CO2 emissions in metric tons decrease", subtitle = "Europeans are leading the race to be rich and reduce CO2 emissions in metric tons per capita", y = "CO2 emissions (metric tons per capita)", x = "GDP per capita (current US$)", caption = "Source: World Bank CC BY-4.0") +
  guides(col = guide_legend(title = "Region"))

Overall, over the past 30 years, CO2 emissions (metric tons per capita) have tended to decrease as people around the world got richer.
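The grey trend line in each panel comes from geom_smooth(). With no `method` argument, ggplot2 chooses the smoother from the size of the largest group: stats::loess() for fewer than 1,000 observations, which likely applies to these per-year panels of countries. A minimal sketch, assuming you want to read the fitted curve off numerically for one year; `smooth_trend()` is a hypothetical helper, and `df` is assumed to have the `GDP_per_cap` and `CO2_m_tons_per_cap` columns used in the plots:

```r
# Hypothetical helper: approximate the geom_smooth() curve for one year with stats::loess()
smooth_trend <- function(df, at) {
  ok <- !is.na(df$GDP_per_cap) & !is.na(df$CO2_m_tons_per_cap)
  fit <- stats::loess(CO2_m_tons_per_cap ~ GDP_per_cap,
                      data = df[ok, ], span = 0.75)  # 0.75 is the default span
  # Fitted CO2 per capita at the requested GDP per capita values; values outside
  # the observed GDP range come back as NA because loess() does not extrapolate
  # by default.
  as.numeric(stats::predict(fit, newdata = data.frame(GDP_per_cap = at)))
}

# Usage (not run):
# fit_2019 <- smooth_trend(subset(GDP_CO2Emissions_Regions, year == 2019),
#                          at = c(10000, 50000))
# diff(fit_2019) < 0 would mean the fitted curve falls between those GDP levels
```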

# Plot & visualize data

GDP_CO2Emissions_Regions %>%
  filter(year %in% c(1990, 2000, 2010, 2019), !is.na(region)) %>%
  ggplot(aes(x = GDP_per_cap, y = CO2_m_tons_per_cap)) +
  geom_smooth(method = "loess", color = "grey30") +
  geom_point(aes(color = region)) +
  facet_grid(vars(region), vars(year), scales = "free_y") +
  theme_classic() +
  scale_x_continuous(breaks = c(0, 75000, 150000),
                     labels = comma) +
  theme(panel.grid.major.y = element_line(color = "gray", linetype = 1),
        plot.title.position = "plot",
        plot.title = element_text(size = 25, family = "serif", color = "black"),
        plot.subtitle = element_text(size = 18, family = "serif", color = "gray30"),
        plot.caption = element_text(hjust = 1, family = "serif", color = "gray30", size = 12),
        axis.title = element_text(color = "gray30", size = 16, family = "serif"),
        axis.text.x = element_text(color = "gray30", size = 14, family = "serif"),
        axis.text.y = element_text(color = "gray30", size = 14, family = "serif"),
        legend.title = element_text(color = "gray10", size = 14, family = "serif", face = "bold"),
        legend.text = element_text(color = "gray30", size = 14, family = "serif"),
        legend.position = "none",
        strip.text.x = element_text(color = "gray10", size = 14, family = "serif"),
        strip.text.y = element_text(color = "gray10", size = 14, family = "serif"),
        strip.background = element_rect(color = "white", fill = "white")) +
  labs(title = "When people get richer, their CO2 emissions in metric tons decrease", subtitle = "At their scale, many countries in Europe and Asia followed by the Americas are reducing CO2 emissions (metric tons\nper capita) as their GDP per capita increases", y = "CO2 emissions (metric tons per capita)", x = "GDP per capita (current US$)", caption = "Source: World Bank CC BY-4.0") +
  guides(col = guide_legend(title = "Region"))

At their scale and on average, individuals in Europe and Asia, followed by those in the Americas, are reducing CO2 emissions (metric tons per capita) as their GDP per capita increases. To extend the analysis, each country could be tracked over time to see whether it follows this trend.
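Tracking each country over time could start with a simple two-year comparison: did GDP per capita rise while CO2 per capita fell? A minimal sketch, assuming a long data frame with one row per country and year and columns `country`, `year`, `GDP_per_cap`, and `CO2_m_tons_per_cap` (the `country` column name and the `country_trend()` helper are hypothetical):

```r
# Hypothetical helper: per-country change between two years
country_trend <- function(df, start = 1990, end = 2019) {
  cols <- c("country", "GDP_per_cap", "CO2_m_tons_per_cap")
  s <- df[which(df$year == start), cols]
  e <- df[which(df$year == end), cols]
  # Inner join on country; the shared value columns get _start / _end suffixes
  m <- merge(s, e, by = "country", suffixes = c("_start", "_end"))
  m$gdp_change <- m$GDP_per_cap_end - m$GDP_per_cap_start
  m$co2_change <- m$CO2_m_tons_per_cap_end - m$CO2_m_tons_per_cap_start
  # TRUE when the country got richer and cut CO2 per capita over the period
  m$follows_trend <- m$gdp_change > 0 & m$co2_change < 0
  m[, c("country", "gdp_change", "co2_change", "follows_trend")]
}

# Usage (not run): table(country_trend(GDP_CO2Emissions_Regions)$follows_trend)
```

Comparing only two endpoints is a deliberate simplification; a per-country regression over all years would be less sensitive to an unusual first or last year.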

It is important to note that a per capita decrease in CO2 emissions does not show how a country is doing in terms of its overall environmental impact and sustainability. Countries vary in population size, so a country with relatively low CO2 emissions (metric tons per capita) can still have a large total impact because of its large population.
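To make the population point concrete: total emissions are per capita emissions multiplied by population, so the ranking of countries can flip. A minimal sketch, assuming you have a population series (for example, the World Bank's SP.POP.TOTL indicator, which this analysis does not use); `compare_per_cap_vs_total()` is a hypothetical helper:

```r
# Hypothetical helper: rank countries by per capita and by total CO2 emissions
compare_per_cap_vs_total <- function(country, co2_per_cap, population) {
  total <- co2_per_cap * population  # metric tons = (metric tons per person) * people
  data.frame(
    country      = country,
    co2_per_cap  = co2_per_cap,
    total_co2_t  = total,
    rank_per_cap = rank(-co2_per_cap),  # 1 = highest per capita
    rank_total   = rank(-total)         # 1 = highest total
  )
}
```

For instance, with hypothetical numbers, a country emitting 2 metric tons per person with 1 billion people emits 2 billion metric tons in total, far more than a country emitting 15 metric tons per person with 10 million people (150 million metric tons), even though the second ranks higher per capita.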

The purpose of this analysis was to look for a pattern in the behavior of individuals, as represented by the per capita averages we selected, rather than to measure each country's overall impact and efforts.

Conclusions

  • Californians who are doing something good for the environment by driving battery electric vehicles (BEVs) instead of fuel-based ones, on average, CARE about reducing greenhouse gas emissions (GGE), but more affluent owners CARE less than less affluent owners.
  • When grouped by highest level of education, the most educated BEV owners in California consider reducing GGE more important than less educated owners do.
  • Looking at the relationship between household income (in thousands) and the mean importance that BEV owners in California assign to reducing GGE, grouped by educational level, the wealthiest CARE slightly less than less affluent owners in the same educational group.
  • Europe and North America show that as people get richer, they DO do something for the environment in terms of reducing CO2 emissions per capita, but their CO2 emissions (metric tons per capita) are still large compared to other countries. Even so, the hypothesized pattern holds.
  • Individuals in high-income countries also follow the pattern of reducing CO2 emissions as they get richer; however, the reduction is small and slow.
  • Overall, over the past 30 years, CO2 emissions (metric tons per capita) have tended to decrease as people around the world got richer.
  • At their scale and on average, individuals in Europe and Asia, followed by those from the Americas, are reducing CO2 emissions (metric tons per capita) as their GDP per capita increases.
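One way to quantify the third conclusion is a within-group correlation between income and the GGE-importance score. A minimal sketch, assuming column names `income_k`, `gge_importance`, and `education` (hypothetical; the survey file's actual names may differ). Note that this correlates individual responses, which can differ from correlating income with group means as described above:

```r
# Hypothetical helper: Pearson correlation between income and GGE importance
# within each education group (negative = wealthier owners rate it lower)
cor_by_group <- function(df, x = "income_k", y = "gge_importance", group = "education") {
  groups <- split(df, df[[group]], drop = TRUE)
  sapply(groups, function(g) {
    stats::cor(g[[x]], g[[y]], use = "complete.obs")
  })
}

# Usage (not run): cor_by_group(BEV_owners)  # BEV_owners is a hypothetical data frame
```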

Impact on Actions and Decisions

I think my visualizations can inform actions and decisions. The visualization of the importance of reducing GGE against income by educational level suggests that people with higher education CARE more for the environment, so if we want to foster CARE for the environment, we should also support education. The visualizations of GDP per capita and CO2 emissions (metric tons per capita) suggest that if we want people to DO more for the environment in terms of reducing CO2 emissions, we might want to help lift them out of poverty quickly so that they can DO their part to reduce their emissions while still prospering.

Techniques

  • Load data: I used temporary files, unzip(), and read from the web using the raw=true approach.

  • Clean & Wrangle: I used filter() and select() to get the data I needed. I used str_replace_all() to edit data and mutate() to compute new columns. I used factor() to set a custom order for my categorical data so it displays better in the visualizations. I used pivot_longer() to tidy data into a long form. I used inner_join() and right_join() to merge three data sets. I used group_by() and summarise() to aggregate data.

  • Plot & Visualize: I used ggplot2 to create all my data visualizations for this project. I used aes() to define the aesthetics of my data visualizations. I used multiple geometries like geom_line(), geom_smooth(), geom_histogram(), geom_point(), and geom_text(). I used facets with facet_wrap() and facet_grid(). I customized my plots with theme(). For the first time, I added a secondary y-axis.

  • Interpret: I used sumtable() to get summary statistics that helped me better understand and explain the data sets and my data visualizations.
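The loading and reshaping steps listed above can be sketched in base R. This is a hedged alternative, not the original code (which uses downloader, readr, tidyr, and dplyr). It assumes the World Bank zip contains a data file whose name starts with `API_<indicator>_`, and that the CSV is read without renaming columns, so "Country Name" and the year columns ("1990", "1991", ...) are intact. `find_wb_csv()` and `wb_to_long()` are hypothetical helpers; locating the file by prefix avoids hard-coding the versioned suffix (e.g. `_5358417`), which can change between releases:

```r
# Hypothetical helper: find the data CSV in an unzipped World Bank download
find_wb_csv <- function(dir, indicator) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  # Metadata files start with "Metadata_", so this prefix matches only the data file
  hits <- files[startsWith(basename(files), paste0("API_", indicator, "_"))]
  if (length(hits) != 1) stop("expected exactly one data file for ", indicator)
  hits
}

# Hypothetical helper: wide layout (one column per year) -> one row per country
# and year, similar in spirit to tidyr::pivot_longer()
wb_to_long <- function(wide, value_name) {
  year_cols <- grep("^[0-9]{4}$", names(wide), value = TRUE)
  long <- data.frame(
    country = rep(wide[["Country Name"]], times = length(year_cols)),
    year    = rep(as.integer(year_cols), each = nrow(wide)),
    value   = unlist(wide[year_cols], use.names = FALSE)  # stacks year columns in order
  )
  names(long)[names(long) == "value"] <- value_name
  long
}

# Usage (not run): inner join of the two indicators on country and year
# gdp_long <- wb_to_long(GDP, "GDP_per_cap")
# co2_long <- wb_to_long(CO2_emissions, "CO2_m_tons_per_cap")
# both     <- merge(gdp_long, co2_long, by = c("country", "year"))
```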

Appendix

Data Sources

Here are the data sets we used for this project:

  • GDP per capita - current US dollars from The World Bank.
    • This data set contains the GDP per capita in current US dollars for almost all countries with data from 1960 to 2021. According to Investopedia, GDP per capita "measures the economic output of a nation per person. It seeks to determine the prosperity of a nation by economic growth per person in that nation. Per capita income measures the amount of money earned per person in a nation."
    • GDP per capita is our socioeconomic metric.
  • CO2 emissions (metric tons per capita) from The World Bank.
    • This data set contains the CO2 emissions (metric tons per capita) for almost all countries with data from 1990 to 2019. The World Bank says: "Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring."
    • CO2 emissions (metric tons per capita) is our environmental metric.
  • Sociodemographic data for battery electric vehicle owning households in California (From NCST Project "Understanding the Early Adopters of Fuel Cell Vehicles") from Scott Hardman of the University of California.
    • This data set contains "Sociodemographic data for fuel cell and battery electric vehicle owning households in California."
    • This data set is from owners of BEVs and FCVs in California; it includes their estimated income and a score measuring their perception of the importance of reducing greenhouse gas emissions.