Helper needed when setting up standard survey entry #13

DavidOry · 2018-10-09T18:14:44Z

When adding an operator to the Standard Database Builder, a helper method or two is needed to make sure the dictionary file is complete. The previous approach was iterative and hand-checked using code snippets.

DavidOry · 2018-10-11T14:09:19Z

This code block should be removed when helper code is established.

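library(dplyr)

# Count rows per ID/operator/year/technology/variable key.
ref_count <- survey_cat %>%
  group_by(ID, operator, survey_year, survey_tech, survey_variable) %>%
  summarise(count = n(), .groups = "drop")

# Keys that appear more than once indicate duplicate entries.
mult_ref_count <- ref_count %>%
  filter(count > 1)

# Halt the run if any duplicates were found.
stopifnot(nrow(mult_ref_count) == 0)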
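
One possible shape for the helper that would replace the snippet above is sketched here. The function name `find_duplicate_refs` and the toy data are hypothetical; only the column names come from the snippet. It returns the offending keys rather than halting, so the caller can inspect them.

```r
library(dplyr)

# Hypothetical helper: return every key that occurs more than once.
find_duplicate_refs <- function(survey_cat) {
  survey_cat %>%
    count(ID, operator, survey_year, survey_tech, survey_variable,
          name = "count") %>%
    filter(count > 1)
}

# Toy data, invented for illustration: ID 1 has two rows for one variable.
toy <- tibble(
  ID = c(1, 1, 2),
  operator = "BART",
  survey_year = 2015,
  survey_tech = "tablet",
  survey_variable = "HH_INCOME_CODE"
)

dups <- find_duplicate_refs(toy)
```

A caller that still wants the hard stop could keep `stopifnot(nrow(dups) == 0)` after the call.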

jhelsel11 · 2018-10-12T18:44:22Z

@DavidOry, should the helper function assume that every variable in the survey needs to be in the standard database? Or should it only check that, once at least one level of a variable has been included in the Standard Database, all of that variable's levels are included?

I'm going to start with the second assumption (since there are so many unused variables) and can modify from there.

jhelsel11 · 2018-10-12T21:27:59Z

In starting to code the second approach, I believe the current code is missing numerous variable levels.

For instance, looking just at BART, there are 183 distinct combinations of survey_variable and survey_response in the Dictionary for Standard Database.csv that are noncategorical. However, if we filter the BART survey to look only at the survey_variables in the Dictionary, there are still 290 combinations.

The data dictionary is missing generic conversions of 92 actual survey responses. (e.g. 3 for BART_TICKET_CODE, 4 for HH_INCOME_CODE, etc.)

Thoughts?

jhelsel11 · 2018-10-12T21:35:16Z

Look at the function check_dropped_variables for the source.

DavidOry · 2018-10-16T19:09:31Z

> In starting to code the second approach, I believe the current code is missing numerous variable levels.
> For instance, looking just at BART, there are 183 distinct combinations of survey_variable and survey_response in the Dictionary for Standard Database.csv that are noncategorical. However, if we filter the BART survey to look only at the survey_variables in the Dictionary, there are still 290 combinations.
> The data dictionary is missing generic conversions of 92 actual survey responses. (e.g. 3 for BART_TICKET_CODE, 4 for HH_INCOME_CODE, etc.)
> Thoughts?

I'm not sure I fully understand. Can you please flesh out a full example?

DavidOry · 2018-10-16T19:28:09Z

Okay. So what I think you're saying is that, for example, the dictionary has the following entries for the BART_TICKET_CODE --> fare_category crosswalk:

BART | BART_TICKET_CODE | 1 | fare_category | adult
BART | BART_TICKET_CODE | 2 | fare_category | adult
BART | BART_TICKET_CODE | 3 | fare_category | senior
BART | BART_TICKET_CODE | 4 | fare_category | disabled
BART | BART_TICKET_CODE | 5 | fare_category | youth
BART | BART_TICKET_CODE | 6 | fare_category | student
BART | BART_TICKET_CODE | 7 | fare_category | adult

And the BART survey data also includes an entry for BART_TICKET_CODE = 102. So when we do the crosswalk, we code the fare_category for records with BART_TICKET_CODE = 102 as NA. My memory, which is fuzzy, is that this is intentional. In simplifying the data, there will be categories that we leave behind. For example, BART employees ride free on BART (as on most transit systems), and the BART survey may have had an employee fare category. When we translate the BART data to the standard data, this response is effectively recoded as NA. That seems right: it keeps the standard survey tidy and, for regional analysis, not much information is lost. The details remain in the BART survey itself. @shimonisrael: please let us know if you disagree.
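
To make the NA behaviour concrete, here is a minimal sketch using the example rows above. The column names (`survey_response`, `generic_variable`, `generic_response`) and the survey IDs are assumptions for illustration, and the sketch does not claim to show which join the builder actually uses. A left join keeps the unmapped record with an NA; an inner join drops it.

```r
library(dplyr)

# Dictionary rows from the example above; column names are assumed.
dictionary <- tibble(
  operator = "BART",
  survey_variable = "BART_TICKET_CODE",
  survey_response = as.character(1:7),
  generic_variable = "fare_category",
  generic_response = c("adult", "adult", "senior", "disabled",
                       "youth", "student", "adult")
)

# Toy survey records in long form (IDs invented for illustration).
survey <- tibble(
  ID = 1:3,
  operator = "BART",
  survey_variable = "BART_TICKET_CODE",
  survey_response = c("1", "3", "102")
)

keys <- c("operator", "survey_variable", "survey_response")

# Left join: the record with code 102 survives with NA generic fields.
kept <- left_join(survey, dictionary, by = keys)

# Inner join: the record with code 102 is removed from the result.
dropped <- inner_join(survey, dictionary, by = keys)
```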

I do think your helper method is useful. But instead of stopping the run, it would be better to have the method return something useful, like the missing_variables dataframe.
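
A rough sketch of a helper that returns the missing entries instead of stopping might look like the following. The name `find_missing_responses`, the column names, and the toy data are all hypothetical; this is not the existing check_dropped_variables function. It follows the second assumption discussed earlier: only variables that already appear in the dictionary are checked.

```r
library(dplyr)

# Hypothetical helper: for variables that appear in the dictionary at all,
# return the survey responses that have no dictionary row.
find_missing_responses <- function(survey, dictionary) {
  survey %>%
    distinct(operator, survey_variable, survey_response) %>%
    # keep only variables the dictionary already covers
    semi_join(dictionary, by = c("operator", "survey_variable")) %>%
    # keep only responses the dictionary does not map
    anti_join(dictionary,
              by = c("operator", "survey_variable", "survey_response"))
}

# Toy inputs, invented for illustration.
dictionary <- tibble(
  operator = "BART",
  survey_variable = "BART_TICKET_CODE",
  survey_response = c("1", "2")
)
survey <- tibble(
  ID = 1:4,
  operator = "BART",
  survey_variable = c("BART_TICKET_CODE", "BART_TICKET_CODE",
                      "BART_TICKET_CODE", "UNUSED_VAR"),
  survey_response = c("1", "102", "102", "9")
)

missing_variables <- find_missing_responses(survey, dictionary)
```

In this toy case the unused variable is ignored and only the unmapped BART_TICKET_CODE response is returned, which the caller can review or write out.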

I think we can wait on this, though, as we'll eventually have you enter an operator and at that time you'll be able to craft a handful of helper methods that will help you create the dictionary file.

jhelsel11 · 2018-10-16T19:39:34Z

It's not that they are coded as missing. They are currently removed entirely from the resulting dataframe.

shimonisrael · 2018-10-16T19:51:49Z

@DavidOry, I think my preference might be something like "Other", with NA reserved for missing variables. It would be helpful in quantifying variables that have responses, though unused ones, versus non-response values. Would there be a problem with this approach?

DavidOry · 2018-10-17T13:48:48Z

> It's not that they are coded as missing. They are currently removed entirely from the resulting dataframe.

What do you mean removed entirely?

DavidOry · 2018-10-17T13:52:35Z

> @DavidOry, I think my preference might be something like "Other", with NA reserved for missing variables. It would be helpful in quantifying variables that have responses, though unused ones, versus non-response values. Would there be a problem with this approach?

That makes sense. This could be accomplished via the dictionary by explicitly mapping all of the unmapped entries to an "other" category. It would make the dictionary much larger, but would also make it a more faithful record of the translation from the survey to the standard database. I've added an Asana task for this.
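
As a sketch of how the explicit "other" rows could be generated rather than typed by hand: the column names and toy values below are assumptions, and the sketch assumes each survey variable maps to a single generic variable.

```r
library(dplyr)

# Existing dictionary rows (toy subset; column names are assumed).
dictionary <- tibble(
  operator = "BART",
  survey_variable = "BART_TICKET_CODE",
  survey_response = c("1", "3"),
  generic_variable = "fare_category",
  generic_response = c("adult", "senior")
)

# Responses observed in the survey but absent from the dictionary (toy values).
unmapped <- tibble(
  operator = "BART",
  survey_variable = "BART_TICKET_CODE",
  survey_response = c("102", "103")
)

# Look up which generic variable each survey variable feeds.
targets <- distinct(dictionary, operator, survey_variable, generic_variable)

# Build one explicit "other" row per unmapped response.
other_rows <- unmapped %>%
  inner_join(targets, by = c("operator", "survey_variable")) %>%
  mutate(generic_response = "other")

expanded <- bind_rows(dictionary, other_rows)
```

With rows like these in place, NA in the standard database would be left to mean a genuine non-response.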

jhelsel11 · 2018-10-17T16:08:34Z

> > It's not that they are coded as missing. They are currently removed entirely from the resulting dataframe.
>
> What do you mean removed entirely?

I was mistaken.

jhelsel11 · 2018-11-16T00:59:59Z

@DavidOry Is this issue resolved to your satisfaction?

DavidOry · 2018-11-16T18:44:28Z

Let's leave it open. I'd like us to iterate on adding a few surveys and see what other functions we may need.

DavidOry assigned jhelsel11 Oct 9, 2018

DavidOry added the enhancement label Oct 9, 2018

DavidOry mentioned this issue Oct 9, 2018 in Refactor Standard Survey Creation #12 (Closed)
