Helper needed when setting up standard survey entry #13
Comments
This code block should be removed when helper code is established.
@DavidOry, should the helper function assume that every variable in the survey needs to be in the standard database or should I only check that, if at least one level of a variable has been included in the Standard Database, then all levels are included? I'm going to start with the second assumption (since there are so many unused variables) and can modify from there. |
In starting to code the second approach, I believe the current code is missing numerous variable levels. For instance, looking just at BART, there are 183 distinct noncategorical combinations of survey_variable and survey_response in the Dictionary for Standard Database.csv. However, if we filter the BART survey to only the survey_variables in the Dictionary, there are still 290 combinations. The data dictionary is missing generic conversions of 92 actual survey responses (e.g., 3 for BART_TICKET_CODE, 4 for HH_INCOME_CODE, etc.). Thoughts? |
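The comparison described above can be sketched as a set difference. This is only an illustration in Python, not the project's helper: the function name `find_missing_levels` and the toy rows are hypothetical, and only the column names `survey_variable` and `survey_response` come from the thread. It follows the second assumption, so variables with no dictionary entry at all are ignored, and it returns the missing pairs rather than stopping the run.

```python
def find_missing_levels(dictionary_rows, survey_rows):
    """Return (survey_variable, survey_response) pairs that appear in the
    survey but have no entry in the dictionary.

    Only variables with at least one level in the dictionary are checked,
    so entirely unused survey variables are skipped.
    """
    # Every (variable, response) pair the dictionary already maps.
    mapped = {(r["survey_variable"], r["survey_response"]) for r in dictionary_rows}
    # Variables that have at least one mapped level.
    tracked_variables = {variable for variable, _ in mapped}
    # Every non-missing (variable, response) pair observed in the survey
    # for a tracked variable.
    observed = {
        (variable, response)
        for row in survey_rows
        for variable, response in row.items()
        if variable in tracked_variables and response is not None
    }
    return sorted(observed - mapped)


# Made-up rows for illustration only; real data would come from the CSV files.
dictionary_rows = [
    {"survey_variable": "BART_TICKET_CODE", "survey_response": "1"},
    {"survey_variable": "BART_TICKET_CODE", "survey_response": "2"},
    {"survey_variable": "HH_INCOME_CODE", "survey_response": "1"},
]
survey_rows = [
    {"BART_TICKET_CODE": "1", "HH_INCOME_CODE": "1", "UNUSED_VAR": "9"},
    {"BART_TICKET_CODE": "3", "HH_INCOME_CODE": "4", "UNUSED_VAR": "9"},
    {"BART_TICKET_CODE": "2", "HH_INCOME_CODE": None, "UNUSED_VAR": "7"},
]

missing = find_missing_levels(dictionary_rows, survey_rows)
print(missing)
```

Here `UNUSED_VAR` is never reported because it has no level in the dictionary, while response 3 for `BART_TICKET_CODE` and response 4 for `HH_INCOME_CODE` are.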
Look at the function |
I'm not sure I fully understand. Can you please flesh out a full example? |
Okay. So what I think you're saying is that, for example, the dictionary has the following entries for BART_TICKET_CODE --> fare_category crosswalk:
And the BART survey data also includes an entry for
I do think your helper method is useful. But instead of stopping the run, it would be better to have the method return something useful, like the
I think we can wait on this, though, as we'll eventually have you enter an operator, and at that time you'll be able to craft a handful of helper methods that will help you create the dictionary file. |
It's not that they are coded as missing. They are currently removed entirely from the resulting dataframe. |
@DavidOry, I think my preference might be something like "Other", with NA reserved for missing variables. It would be helpful in quantifying variables that have responses, though unused ones, versus non-response values. Would there be a problem with this approach? |
What do you mean removed entirely? |
That makes sense. This could be accomplished via the dictionary by explicitly mapping all of the entries to a "other" category. It would make the dictionary much larger, but would also make it a more faithful record of the translation from the survey to the standard database. I've added an Asana task for this. |
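A minimal sketch of the explicit "other" mapping, again in Python and under stated assumptions: the helper name `other_rows` and the columns `generic_variable` and `generic_response` are hypothetical stand-ins for whatever the dictionary file actually uses; only `survey_variable`, `survey_response`, and the BART_TICKET_CODE --> fare_category crosswalk appear in the thread.

```python
def other_rows(missing_pairs, generic_variable_for):
    """Build one dictionary row per unmapped (variable, response) pair,
    mapping each explicitly to an "other" category."""
    rows = []
    for variable, response in missing_pairs:
        rows.append({
            "survey_variable": variable,
            "survey_response": response,
            # Hypothetical column names for the standard-database side.
            "generic_variable": generic_variable_for[variable],
            "generic_response": "other",
        })
    return rows


# Made-up input for illustration only.
missing_pairs = [("BART_TICKET_CODE", "3")]
generic_variable_for = {"BART_TICKET_CODE": "fare_category"}

new_rows = other_rows(missing_pairs, generic_variable_for)
print(new_rows)
```

Rows built this way would be appended to the dictionary file, so every survey response has a recorded translation and NA stays reserved for genuinely missing values.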
I was mistaken. |
@DavidOry Is this issue resolved to your satisfaction? |
Let's leave it open. I'd like us to iterate on adding a few surveys and see what other functions we may need. |
When adding an operator to the Standard Database Builder, a helper method or two is needed to make sure the dictionary file is complete. The previous approach was iterative and hand-checked using code snippets.