
Should we include tests of the built dataset in dataset_test? #71

Closed · ehwenk opened this issue Sep 17, 2023 · 3 comments
Labels: enhancement (New feature or request)

Comments

@ehwenk (Collaborator) commented Sep 17, 2023

Right now, dataset_test only looks for errors in the metadata file.

To be more helpful for traits.build users, would it be a good idea to test both the metadata file and the built output?

For instance, right now, dataset_test will indicate if an incorrect substitution has been added to the metadata file, but not if there are categorical values requiring substitutions.

In a sense, many of the errors that are documented in excluded_data would also surface as errors in dataset_test, such as values out of range, unsupported trait values, etc.
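As a rough sketch of what an output-level check could look like (the `excluded_data` columns and error strings here are assumptions based on the AusTraits output format, not actual dataset_test code):

```r
library(dplyr)

# Flag categorical values that were dropped because they don't match
# the trait dictionary, i.e. values that likely still need substitutions.
# Assumes a built `excluded_data` table with `error`, `dataset_id`,
# `trait_name` and `value` columns.
needs_substitution <- excluded_data %>%
  filter(error == "Unsupported trait value") %>%
  distinct(dataset_id, trait_name, value)

if (nrow(needs_substitution) > 0) {
  message("Categorical values possibly requiring substitutions:")
  print(needs_substitution)
}
```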

(I thought about this as I'm writing the tutorials for traits.build.)

ehwenk added the enhancement (New feature or request) label on Sep 17, 2023
@yangsophieee (Collaborator) commented Sep 20, 2023

When I added the pivot-wider test, I added code to build the output within dataset_test, so it will now be easy to test things about the output, as you suggested.
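For example, with the output built, a pivot-wider style check might look roughly like this (a sketch only; the grouping columns are assumptions about the built traits table, not the actual test code):

```r
library(dplyr)

# Rows sharing all identifier columns can't be pivoted wider cleanly,
# so any group with more than one row indicates a duplicate value.
duplicates <- traits %>%
  group_by(dataset_id, observation_id, trait_name, value_type) %>%
  filter(n() > 1) %>%
  ungroup()

testthat::expect_equal(nrow(duplicates), 0)
```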

@yangsophieee (Collaborator) commented

I've already messaged @ehwenk about this, but I'm putting it here so we don't forget:

Should we really make dataset_test fail for excluded_data rows? What if those rows legitimately belong in excluded_data because they are out of range, or don't fit the categorical values in our trait dictionary? Do you think the user should instead manually exclude them in the exclude_observations metadata section? I can imagine that could get tedious.
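For reference, a manual exclusion in the metadata would look something like this (illustrative values only; the `variable`/`find`/`reason` fields follow the metadata.yml format as I remember it, so double-check against the docs):

```yaml
exclude_observations:
- variable: trait_name
  find: leaf_area
  reason: hypothetical example - values out of range, awaiting confirmation from authors
```

Doing this for every out-of-range value in a large dataset is where the tedium would come in.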

@yangsophieee (Collaborator) commented

@ehwenk has similar concerns. If we add these as tests in dataset_test, the unsupported trait values in AusTraits will cause failures and the tests will never pass.

There can't be tests where "failure" is an allowed outcome. It's almost as if we need two sets of tests: one for things that must never fail, and one for things that are allowed to fail and are just for the user.

So dataset_test holds the tests that aren't allowed to fail, and the dataset_check functions suggested in #137 are allowed to fail and are just for the user to check for potential data problems.
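Roughly, the split might look like this (a sketch of the idea only; the function names, columns, and error strings are illustrative, not the traits.build API):

```r
library(testthat)

# Hard checks: run in dataset_test; any failure blocks the build.
check_trait_names <- function(traits, definitions) {
  test_that("all trait names exist in the trait dictionary", {
    expect_true(all(traits$trait_name %in% definitions$trait_name))
  })
}

# Soft checks: run by the dataset_check functions; report potential
# problems without failing, so rows that legitimately belong in
# excluded_data don't break the test suite.
check_excluded_rows <- function(excluded_data) {
  flagged <- excluded_data[
    excluded_data$error %in%
      c("Unsupported trait value", "Value out of allowable range"), ]
  if (nrow(flagged) > 0) {
    message(nrow(flagged), " excluded row(s) may need review")
  }
  invisible(flagged)
}
```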

Closing issue now.
