Skip to content

Latest commit

 

History

History
97 lines (72 loc) · 4.39 KB

readme.md

File metadata and controls

97 lines (72 loc) · 4.39 KB

Logo for the Water Point Data Exchange

Water Sources

The data this week comes from Water Point Data Exchange. Note that the data is limited to some core columns as the full dataset is ~300 Mb. Also data is largely filtered down to African sources.

Katy Sill and Adam Kariv are speaking about this data source at csv,conf.

Per WPDX

The amount of water point data being collected is growing rapidly as governments and development partners increasingly monitor water points over time. Without harmonization among these different data sources, the opportunity for learning will be limited, with the true potential of this information remaining untapped.

By establishing a platform for sharing water point data throughout the global water sector, WPDx adds value to the data already being collected. By bringing together diverse data sets, the water sector can establish an unprecedented understanding of water services.

Sharing this data has the potential to improve water access for millions of people as a result of better information available to governments, service providers, researchers, NGOs, and others.

Get the data here

# Get the Data

# Read in with tidytuesdayR package 
# Install from CRAN via: install.packages("tidytuesdayR")
# This loads the readme and all the datasets for the week of interest

# Either ISO-8601 date or year/week works!

tuesdata <- tidytuesdayR::tt_load('2021-05-04')
tuesdata <- tidytuesdayR::tt_load(2021, week = 19)

water <- tuesdata$water

# Or read in the data manually

water <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-05-04/water.csv')

Data Dictionary

water.csv

variable class description
row_id double Unique ID
lat_deg double Latitude
lon_deg double Longitude
report_date character Date water source was reported on
status_id character Identify if any water is available on the day of the visit, recognizing that it may be a limited flow
water_source_clean character Describe the water source (e.g. shallow well, spring, borehole, river, pond, etc.)
water_tech character Describe the system being used to transport the water from the source to the point of collection (e.g. Handpump (include manufacturer, i.e. Afridev, IndiaMark II, Malda,etc.),Kiosk, Tapstand,etc.)
facility_type character Categorized facility type based on JMP definitions
country_name character Country name
install_year double Install year
installer character Installer
pay character Provide the payment amount and basis (e.g. monthly, per jerry can, when broken,etc.). If no amount is provided, the basis can be provided alone. An amount without a payment basis cannot be included
status character Provide a status of the physical/mechanical condition of thewater point

Cleaning Script

library(tidyverse)

raw_df <- read_csv("2021/2021-05-04/Water_Point_Data_Exchange__WPDx-Basic_.csv")

clean_df <- raw_df %>% 
  janitor::clean_names() %>%
  rename_with(~str_remove(.x, "number_")) %>% 
  select(
    row_id:status_id, water_source_clean:subjective_quality,rehab_year,
    install_year,
    -contains("adm"), -contains("rehab"), -contains("fecal"),
    -count, -new_georeferenced_column, -source, -management,-subjective_quality
      ) %>% 
  mutate(
    status_id = case_when(
      status_id == "Yes"~ "y",
      status_id == "No" ~ "n",
      status_id == "Unknown" ~ "u"
    ),
    report_date = str_sub(report_date, 1, 10)
    ) %>% 
  filter(
    !country_name %in% c(
      "NA", "Slovenia", "El Salvador", "Iraq", "Malawi", "Bangladesh",
      "Benin", "India", "Bolivia", "Honduras", "Guatemala", "Nicaragua",
      "Afghanistan", "Nepal", "Haiti", "Mexico", "Indonesia", "Sri Lanka",
      "Dominican Republublic", "Belize", "Myanmar", "Vanuatu", "Jordan"
      )
    ) %>% 
  select(-water_tech) %>% 
  rename(water_source = water_source_clean, water_tech = water_tech_clean)

clean_df %>% 
  write_csv("2021/2021-05-04/water.csv")