
Should we include tests of the built dataset in dataset_test? #71

Closed · ehwenk opened this issue Sep 17, 2023 · 3 comments
Labels: enhancement (New feature or request)

Comments

@ehwenk (Collaborator) commented Sep 17, 2023

Right now, dataset_test only looks for errors in the metadata file.

To be more helpful for traits.build users, would it be a good idea to test both the metadata file and the built output?

For instance, right now, dataset_test will indicate if an incorrect substitution has been added to the metadata file, but not if there are categorical values requiring substitutions.

In a sense, many of the errors that are documented in excluded_data would also surface as errors in dataset_test, such as values out of range, unsupported trait values, etc.
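As a rough sketch of what an output-level check could look like (the `excluded_data` columns and error strings here are assumptions based on the AusTraits output format, not actual dataset_test code):

```r
library(dplyr)

# Flag categorical values that were dropped because they don't match
# the trait dictionary, i.e. values that likely still need substitutions.
# Assumes a built `excluded_data` table with `error`, `dataset_id`,
# `trait_name` and `value` columns.
needs_substitution <- excluded_data %>%
  filter(error == "Unsupported trait value") %>%
  distinct(dataset_id, trait_name, value)

if (nrow(needs_substitution) > 0) {
  message("Categorical values possibly requiring substitutions:")
  print(needs_substitution)
}
```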

(I thought about this as I'm writing the tutorials for traits.build.)

ehwenk added the enhancement (New feature or request) label on Sep 17, 2023
@yangsophieee (Collaborator) commented Sep 20, 2023

When I added the pivot-wider test, I added code to build the output within dataset_test, so it will now be easy to test things about the output, as you suggested.
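For example, with the output built, a pivot-wider style check might look roughly like this (a sketch only; the grouping columns are assumptions about the built traits table, not the actual test code):

```r
library(dplyr)

# Rows sharing all identifier columns can't be pivoted wider cleanly,
# so any group with more than one row indicates a duplicate value.
duplicates <- traits %>%
  group_by(dataset_id, observation_id, trait_name, value_type) %>%
  filter(n() > 1) %>%
  ungroup()

testthat::expect_equal(nrow(duplicates), 0)
```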

@yangsophieee (Collaborator) commented

I've already messaged @ehwenk about this, but I'm putting it here so we don't forget:

Should we really make dataset_test fail for excluded_data rows? What if those rows legitimately belong in excluded_data because they are out of range, or don't fit the categorical values in our trait dictionary? Do you think the user should instead manually exclude them in the exclude_observations metadata section? I can imagine that could get tedious.
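For reference, a manual exclusion in the metadata would look something like this (illustrative values only; the `variable`/`find`/`reason` fields follow the metadata.yml format as I remember it, so double-check against the docs):

```yaml
exclude_observations:
- variable: trait_name
  find: leaf_area
  reason: hypothetical example - values out of range, awaiting confirmation from authors
```

Doing this for every out-of-range value in a large dataset is where the tedium would come in.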

@yangsophieee (Collaborator) commented

@ehwenk has similar concerns. If we add these as tests in dataset_test, the unsupported trait values in AusTraits will cause failures and the tests will never pass.

There can't be tests where "failure" is an allowed outcome. It's almost as if we need two sets of tests: one for things that must never fail, and one for things that are allowed to fail and are just for the user.

So dataset_test holds the tests that aren't allowed to fail, and the dataset_check functions suggested in #137 are allowed to fail and are just for the user to check for potential data problems.
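Roughly, the split might look like this (a sketch of the idea only; the function names, columns, and error strings are illustrative, not the traits.build API):

```r
library(testthat)

# Hard checks: run in dataset_test; any failure blocks the build.
check_trait_names <- function(traits, definitions) {
  test_that("all trait names exist in the trait dictionary", {
    expect_true(all(traits$trait_name %in% definitions$trait_name))
  })
}

# Soft checks: run by the dataset_check functions; report potential
# problems without failing, so rows that legitimately belong in
# excluded_data don't break the test suite.
check_excluded_rows <- function(excluded_data) {
  flagged <- excluded_data[
    excluded_data$error %in%
      c("Unsupported trait value", "Value out of allowable range"), ]
  if (nrow(flagged) > 0) {
    message(nrow(flagged), " excluded row(s) may need review")
  }
  invisible(flagged)
}
```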

Closing issue now.
