Refactor Standard Survey Creation #12

Closed
wants to merge 30 commits into from
30 commits
d633275
Begin refactor of standard survey data creation script
jhelsel11 Oct 8, 2018
a3f9458
Refactor standard db creation to method library for standardization.
jhelsel11 Oct 8, 2018
f433934
Update bespoke survey process
jhelsel11 Oct 9, 2018
5696fcf
Update to standard database code
jhelsel11 Oct 9, 2018
2fbb38d
Update build standard database
jhelsel11 Oct 10, 2018
dc5c04a
Completed first pass through Build Standard Database
jhelsel11 Oct 10, 2018
5563ee0
Move standard database path lookup into rmd. New users will need to a…
jhelsel11 Oct 10, 2018
ac440f1
Remove legacy decomposition script
jhelsel11 Oct 10, 2018
00f3853
Create new decomposition rproject, add new author to metadata.
jhelsel11 Oct 10, 2018
d5f6f3b
Refactor decomposition analysis.
jhelsel11 Oct 10, 2018
3d5a745
fix path update
jhelsel11 Oct 10, 2018
a22e0e2
trying to get rid of decomp analysis in this branch
jhelsel11 Oct 10, 2018
8d53617
Merge branch 'refactor_survey' of https://github.com/BayAreaMetro/onb…
jhelsel11 Oct 10, 2018
07f0257
Revert "trying to get rid of decomp analysis in this branch"
jhelsel11 Oct 10, 2018
ac7e7f0
dave pass at making tidy and easier to read
DavidOry Oct 11, 2018
7c63246
added temp compare script
DavidOry Oct 11, 2018
392a247
Resolve Issue - Save in RDS
jhelsel11 Oct 12, 2018
7704659
Create check against levels recorded in survey but dropped from Dicti…
jhelsel11 Oct 12, 2018
43a1822
update standard database.
jhelsel11 Oct 15, 2018
497e8f4
renamed RDS output, fleshed out compare
DavidOry Oct 16, 2018
b34f4ca
start on ac transit to standard
DavidOry Oct 24, 2018
48e635f
initial pass through dictionary
DavidOry Oct 24, 2018
c14d8d7
Merge branch 'add-ac-transit-standard-REBASED' into refactor_survey
jhelsel11 Oct 31, 2018
a0d9a39
put file reads in alphabetical order.
jhelsel11 Oct 31, 2018
12c1415
add AC transit to survey tech csv
jhelsel11 Oct 31, 2018
2f742f3
minor adjustments to order of code.
jhelsel11 Oct 31, 2018
bdda105
Fixed AC Transit worker code.
jhelsel11 Nov 1, 2018
a33460f
Update number_transfers_orig_board and number_transfer_alight_dest fo…
jhelsel11 Nov 1, 2018
b7f67d7
remove shuttle from bespoke survey tech
jhelsel11 Nov 1, 2018
5fe891a
small changes to build standard database
jhelsel11 Nov 1, 2018
9 changes: 4 additions & 5 deletions make-uniform/production/Build Standard Database.Rmd
@@ -72,9 +72,8 @@ user_list <- data.frame(
When adding a new operator, the user must: add the path to the survey data in
the code block below, e.g., `f_bart_survey_path`
```{r file-names}
me <- Sys.getenv("USERNAME")
dir_path <- user_list %>%
filter(user == me) %>%
filter(user == Sys.getenv("USERNAME")) %>%
.$path

f_spatial_to_be_geocoded_path <- paste0(dir_path,
@@ -108,19 +107,19 @@ f_vta_survey_path <- paste0(dir_path,
"VTA/As CSV/VTA_DRAFTFINAL_20171114 NO POUND OR SINGLE QUOTE.csv")

f_output_rdata_path <- paste0(dir_path,
"_data Standardized/survey_standard.RData")
"_data Standardized/survey_standard.RDS")

@jhelsel11: When using saveRDS, we should use the RDS extension, so users know that it's an RDS file and not an Rdata file.
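
For context, a minimal sketch of why the extension matters: the two formats are read back by different functions. The toy data frame and the temporary file paths below are assumptions made for illustration, not files from this repository.

```r
# Hypothetical stand-in for the standardized survey data frame.
survey_standard <- data.frame(
  unique_ID = c("1---BART---2015", "2---BART---2015"),
  stringsAsFactors = FALSE
)

# Hypothetical temporary paths, one per format.
rds_path   <- file.path(tempdir(), "survey_standard.RDS")
rdata_path <- file.path(tempdir(), "survey_standard.RData")

# saveRDS() stores a single object without its name; readRDS() returns it,
# so the caller picks the variable name on read.
saveRDS(survey_standard, rds_path)
current_df <- readRDS(rds_path)

# save() stores named objects; load() restores them under their original
# names and returns those names as a character vector.
save(survey_standard, file = rdata_path)
restored_names <- load(rdata_path)
```

An `.RDS` extension signals that the file should be opened with `readRDS()` and assigned to a name, whereas an `.RData` file is opened with `load()`.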


f_output_csv_path <- paste0(dir_path,
"_data Standardized/survey_standard.csv")

f_ancillary_output_rdata_path <- paste0(dir_path,
"_data Standardized/ancillary_variables.RData")
"_data Standardized/ancillary_variables.RDS")

f_ancillary_output_csv_path <- paste0(dir_path,
"_data Standardized/ancillary_variables.csv")

f_output_decom_rdata_path <- paste0(dir_path,
"_data Standardized/decomposition/survey_decomposition.RData")
"_data Standardized/decomposition/survey_decomposition.RDS")

f_output_decom_csv_path <- paste0(dir_path,
"_data Standardized/decomposition/survey_decomposition.csv")
28 changes: 21 additions & 7 deletions make-uniform/production/compare_against_previous.R
@@ -4,8 +4,7 @@ load("~/GitHub/onboard-surveys/Data and Reports/_data Standardized/survey_standa
previous_df <- survey.standard %>%
rename(unique_ID = Unique_ID)

load("~/GitHub/onboard-surveys/Data and Reports/_data Standardized/survey_standard.Rdata")
current_df <- survey_standard
current_df <- readRDS("~/GitHub/onboard-surveys/Data and Reports/_data Standardized/survey_standard.Rdata")

find_differences <- function(anti_outcomes_df, diffed_df) {

@@ -47,11 +46,26 @@ find_differences <- function(anti_outcomes_df, diffed_df) {

}

anti_df <- anti_join(previous_df, current_df, by = c("unique_ID"))
diff_df <- find_differences(anti_df, current_df)
# do both ways
anti_previous_df <- anti_join(previous_df, current_df, by = c("unique_ID"))
diff_previous_df <- find_differences(anti_previous_df, current_df)

relevant_df <- diff_df %>%
filter(!(previous_outcome == "missing" & current_outcome == "NA"))
anti_current_df <- anti_join(current_df, previous_df, by = c("unique_ID"))
diff_current_df <- find_differences(anti_current_df, previous_df)

table(thin_df$var_name)
# update the Caltrain IDs and do again
update_current_df <- current_df %>%

@jhelsel11: you are correct re: Caltrain bug. When the IDs are updated to remove the leading "S", the previous and current data sets match.

mutate(ID = ifelse(str_detect(ID, "S"), str_replace(ID, "S", ""), ID)) %>%
mutate(unique_ID = paste(ID, operator, survey_year, sep = "---"))

anti_previous_df <- anti_join(previous_df, update_current_df, by = c("unique_ID"))
diff_previous_df <- find_differences(anti_previous_df, update_current_df)

anti_current_df <- anti_join(update_current_df, previous_df, by = c("unique_ID"))
diff_current_df <- find_differences(anti_current_df, previous_df)

# okay they now match
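
The two-way comparison and the Caltrain ID fix discussed in the review comment can be illustrated on toy data. This is a sketch under stated assumptions: the data frames are made up, and the pattern `"^S"` is anchored so that only a leading "S" is removed, which differs from the unanchored `"S"` used in the diff.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for the previous and current outputs.
previous_df <- data.frame(
  ID = c("101", "102", "7"),
  operator = c("Caltrain", "Caltrain", "BART"),
  survey_year = c(2014, 2014, 2015),
  stringsAsFactors = FALSE
) %>%
  mutate(unique_ID = paste(ID, operator, survey_year, sep = "---"))

current_df <- data.frame(
  ID = c("S101", "S102", "7"),
  operator = c("Caltrain", "Caltrain", "BART"),
  survey_year = c(2014, 2014, 2015),
  stringsAsFactors = FALSE
) %>%
  mutate(unique_ID = paste(ID, operator, survey_year, sep = "---"))

# Compare both ways: rows whose key appears on one side only.
only_in_previous <- anti_join(previous_df, current_df, by = "unique_ID")
only_in_current  <- anti_join(current_df, previous_df, by = "unique_ID")

# Strip a leading "S" from the ID, then rebuild the key.
updated_current_df <- current_df %>%
  mutate(ID = str_remove(ID, "^S")) %>%
  mutate(unique_ID = paste(ID, operator, survey_year, sep = "---"))

# Repeat the comparison in both directions with the updated keys.
still_only_in_previous <- anti_join(previous_df, updated_current_df, by = "unique_ID")
still_only_in_current  <- anti_join(updated_current_df, previous_df, by = "unique_ID")
```

In this toy case the two Caltrain rows are unmatched in both directions before the update, and both anti-joins come back empty afterwards.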