Develop data validation and QC job #34

Open
nmarchio opened this issue Jan 20, 2022 · 2 comments
Labels
documentation (Improvements or additions to documentation), enhancement (New feature or request)

Comments

@nmarchio
Member

nmarchio commented Jan 20, 2022

Run a job that checks through the raw data files on S3 and the output files on RDS to answer the following questions:

Question checks (requires resolution of raw data S3 Issue)

  • Return the list of questions present in the raw data but not in the output data
  • Return the list of questions present in the output data but not in the raw data
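The two question checks above amount to a set difference in each direction. A minimal sketch, assuming the question identifiers have already been read from S3 and RDS into plain Python lists (the function name and the `Q1`-style identifiers are hypothetical, for illustration only):

```python
def compare_questions(raw_questions, output_questions):
    """Return the questions missing on each side, as sorted lists."""
    raw, out = set(raw_questions), set(output_questions)
    return {
        "in_raw_not_output": sorted(raw - out),
        "in_output_not_raw": sorted(out - raw),
    }


# Hypothetical question variable names, not taken from the survey.
diff = compare_questions(["Q1", "Q2", "Q3"], ["Q2", "Q3", "Q4"])
```

Sorting the results keeps the report stable between runs, which makes it easier to diff one report against the next.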

Response checks (requires resolution of raw data S3 Issue)

  • Return the list of responses present in the raw data but not in the output data
  • Return the list of responses present in the output data but not in the raw data
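For responses, the comparison probably needs to be made per question, since the same response label can appear under several questions. A possible sketch, assuming each row is a dict with hypothetical `question` and `response` keys:

```python
def compare_responses(raw_rows, output_rows):
    """Compare (question, response) pairs between raw and output rows."""
    raw = {(r["question"], r["response"]) for r in raw_rows}
    out = {(r["question"], r["response"]) for r in output_rows}
    return sorted(raw - out), sorted(out - raw)


# Hypothetical rows, for illustration only.
raw_rows = [
    {"question": "Q1", "response": "Yes"},
    {"question": "Q1", "response": "No"},
]
output_rows = [
    {"question": "Q1", "response": "Yes"},
    {"question": "Q1", "response": "Maybe"},
]
missing_in_output, missing_in_raw = compare_responses(raw_rows, output_rows)
```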

Time series continuity checks

  • Return question occurrences showing gaps in the time series, along with the start point and end point of each series, as a table with these columns: question variable, week, reported_in_survey (a new column flagging whether the question is reported in that week's survey).
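One way to build that table is to cross every question with every survey week and flag the pairs that were observed. This is only a sketch: it assumes weeks are integers and that the observed (question, week) pairs have already been extracted from the output data. The function names are hypothetical.

```python
def continuity_table(observed, all_weeks):
    """Build one row per (question, week) with a reported_in_survey flag."""
    seen = set(observed)
    questions = sorted({question for question, _ in seen})
    return [
        {"question": q, "week": w, "reported_in_survey": (q, w) in seen}
        for q in questions
        for w in all_weeks
    ]


def series_summary(table):
    """Return the start week, end week, and interior gap weeks per question."""
    by_question = {}
    for row in table:
        by_question.setdefault(row["question"], []).append(row)
    summary = {}
    for question, rows in by_question.items():
        reported = [r["week"] for r in rows if r["reported_in_survey"]]
        start, end = min(reported), max(reported)
        gaps = [
            r["week"]
            for r in rows
            if not r["reported_in_survey"] and start < r["week"] < end
        ]
        summary[question] = {"start": start, "end": end, "gaps": gaps}
    return summary


# Hypothetical observations: Q1 skips week 3, Q2 runs weeks 2 to 3 only.
observed = [("Q1", 1), ("Q1", 2), ("Q1", 4), ("Q2", 2), ("Q2", 3)]
table = continuity_table(observed, all_weeks=[1, 2, 3, 4])
summary = series_summary(table)
```

Only weeks strictly between a question's first and last appearance count as gaps here, so a question that starts late or ends early is reported through its start and end points instead.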

Proportion checks

  • Report questions and time periods where the responses do not sum to 100%. For question_type == "Select one" or "Yes / No", summing the response value proportions by question variable ID and crosstab subgroup should give 100%.
  • For question_type == "Select all", each response item summed with its -99 response values should equal 100%.
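The "Select one" and "Yes / No" case could be sketched as a grouped sum with a small tolerance for rounding. The row keys (`question_id`, `question_type`, `week`, `subgroup`, `proportion`), the 0 to 100 scale, and the tolerance are all assumptions. The "Select all" rule is left out because the issue does not say how -99 rows are paired with response items.

```python
from collections import defaultdict

SINGLE_CHOICE_TYPES = {"Select one", "Yes / No"}


def find_bad_totals(rows, expected_total=100.0, tolerance=0.5):
    """Return (key, total) pairs whose proportions do not sum to expected_total."""
    totals = defaultdict(float)
    for row in rows:
        if row["question_type"] in SINGLE_CHOICE_TYPES:
            key = (row["question_id"], row["week"], row["subgroup"])
            totals[key] += row["proportion"]
    return sorted(
        (key, total)
        for key, total in totals.items()
        if abs(total - expected_total) > tolerance
    )


# Hypothetical rows: week 1 sums to 100, week 2 only to 90.
rows = [
    {"question_id": "Q1", "question_type": "Yes / No", "week": 1,
     "subgroup": "Total", "proportion": 60.0},
    {"question_id": "Q1", "question_type": "Yes / No", "week": 1,
     "subgroup": "Total", "proportion": 40.0},
    {"question_id": "Q1", "question_type": "Yes / No", "week": 2,
     "subgroup": "Total", "proportion": 60.0},
    {"question_id": "Q1", "question_type": "Yes / No", "week": 2,
     "subgroup": "Total", "proportion": 30.0},
]
bad_totals = find_bad_totals(rows)
```

If the stored proportions are fractions rather than percentages, passing `expected_total=1.0` with a smaller tolerance should be enough.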

Unexpected value checks

  • Check for missing or placeholder values such as NaN, 0, -99, -88, etc. Return the questions and dates where they occur.
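A rough sketch of this check, assuming each row is a dict with hypothetical `question_id`, `date`, and `value` keys. Treating `None` as missing is an extra assumption. Whether 0 should always be flagged probably depends on the question.

```python
import math

# Placeholder codes listed in the issue; extend as needed.
SENTINELS = {0, -99, -88}


def is_suspect(value):
    """Return True for NaN, None (an assumption), or a listed sentinel code."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return value in SENTINELS


def find_suspect_values(rows):
    """Return (question_id, date) pairs whose value looks missing."""
    return [
        (row["question_id"], row["date"])
        for row in rows
        if is_suspect(row["value"])
    ]


# Hypothetical rows, for illustration only.
rows = [
    {"question_id": "Q1", "date": "2022-01-10", "value": 12.5},
    {"question_id": "Q1", "date": "2022-01-17", "value": float("nan")},
    {"question_id": "Q2", "date": "2022-01-10", "value": -99},
]
flagged = find_suspect_values(rows)
```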

The output of this job can be a log or .txt file containing descriptive information, written to S3 or emailed as an attachment.
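As a sketch of what the report text might look like, the function below turns a mapping of check names to findings into plain text. The function name and report layout are hypothetical. Writing the text to S3 or attaching it to an email is left out.

```python
def format_report(findings):
    """Render {check name: list of findings} as a plain-text report."""
    lines = []
    for check_name, items in findings.items():
        lines.append(f"{check_name}: {len(items)} issue(s)")
        lines.extend(f"  - {item}" for item in items)
    return "\n".join(lines) + "\n"


# Hypothetical findings, for illustration only.
report = format_report({
    "Questions in raw data not in output data": ["Q1"],
    "Proportion totals not equal to 100%": [],
})
```

Listing checks with zero findings shows in the log that the check ran and passed.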

nmarchio added the documentation and enhancement labels Jan 20, 2022
@nmarchio
Member Author

Link to interface: https://householdpulse.com/food

@nmarchio
Member Author

nmarchio commented Mar 18, 2022

@baltierra


2 participants