Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[traits.build workflow] Precedence of column-level metadata over trait-level metadata over location-level metadata not working #60

Open
yangsophieee opened this issue Sep 6, 2023 · 9 comments
Assignees
Labels
bug Something isn't working

Comments

@yangsophieee
Copy link
Collaborator

yangsophieee commented Sep 6, 2023

Currently location-level metadata is overwriting trait-level metadata, e.g. for Test_2023_2 - Alstonia scholaris, leaf_mass_per_area should have basis_of_value as measurement but it's currently expert_score (the value for the location Cape Tribulation).

I think this is occurring in the loop around Line 161 of process.R. It's happening for both long (see Test_2023_4) and wide datasets.

In addition, column metadata at the dataset level is being overwritten by trait-level metadata for Test_2023_2 and Test_2023_4. E.g. in Test_2023_2, Homalanthus novoguineensis should have basis_of_value as model_derived but instead it's overwritten by the trait-level value, measurement.

@yangsophieee yangsophieee added the bug Something isn't working label Sep 6, 2023
@yangsophieee yangsophieee self-assigned this Sep 6, 2023
@ehwenk
Copy link
Collaborator

ehwenk commented Sep 6, 2023

This is actually intentional - we had to code either trait-level or location-level metadata to take precedence over the other and fairly randomly picked location. If you have a good reason to swap it around to trait-level metadata taking precedence, you're welcome to change it.

@yangsophieee yangsophieee changed the title Precedence of trait-level metadata over location-level metadata not working Precedence of column-level metadata over trait-level metadata over location-level metadata not working Sep 6, 2023
@yangsophieee
Copy link
Collaborator Author

@ehwenk According to the original tests before the big test structure overhaul, the intention was that column metadata (dataset level) > trait metadata > location metadata.
Also with the new functionality of reading in columns from trait level, I guess that's column metadata (trait level) > location metadata > dataset metadata.
Or without columns (i.e. fixed value): trait metadata > location metadata > dataset metadata.

Let me know if you agree with these precedences, and I can change them.

@ehwenk
Copy link
Collaborator

ehwenk commented Sep 6, 2023

@yangsophieee I am very happy for you to switch it to trait metadata > location metadata > dataset metadata!

@yangsophieee
Copy link
Collaborator Author

@ehwenk thanks! what about the column precedences?

@ehwenk
Copy link
Collaborator

ehwenk commented Sep 6, 2023

@yangsophieee I think it probably would always be the same - you shouldn't fill in anything at the trait_level if there is a universal value or column given at the dataset level. Filling in at the trait_level (or location_level) should be a clear signal that this is the correct variable value/column for that trait/location

@ehwenk
Copy link
Collaborator

ehwenk commented Oct 9, 2023

@yangsophieee I hadn't thought about the different workflows for fixed values vs values from columns.

For fixed values, we decided values specified in the traits-section overwrites values from the dataset-section; and location-section values overwrite traits-section values. (Although I have no preference actually for the order or traits vs location.)

Meanwhile dataset-level column-specified values should have ultimate precedence - otherwise what does it mean to read in a value from a column if you're going to overwrite it? Then that cell in the data.csv file should be edited (either actually edited, or using custom_R_code). Is it realisitic to add an ifelse that only allows the location-level or trait-level overwriting of values if the dataset-level is a fixed value?

@ehwenk
Copy link
Collaborator

ehwenk commented Oct 12, 2023

@yangsophieee

I've just spent a long time staring at Test_2023_4 and Test_2023_2 outputs and am not sure anything needs to be changed. As in, I don't think issue #60 and issue #38 are issues.

What are specific variables, datasets that you think are giving erroneous outcomes?

I also think that I misstated something in my last comment.

In order of information being read in (i.e. lower in this list overwrites items at top):

  1. Read in dataset-level fixed value
  2. Read in dataset-level values from column
  3. Read in traits-level fixed value, over-writing dataset-level values for that trait
  4. Read in traits-level values from column, over-writing all dataset-level values for that trait
  5. Read in location-level fixed value, over-writing all pre-existing values for that location
  6. Read in location-level values from column, over-writing all pre-existing values for that location

As I said before it was arbitrary is 3-4 or 5-6 occurs first, but I don't want to change that right now.

But, I have some (possibly unfounded) concerns with this list of variables in Line 137 of process.R

    vars <- c("basis_of_record", "life_stage", "collection_date", "measurement_remarks", "unit_in", "entity_type",
              "value_type", "basis_of_value", "replicates", "population_id", "individual_id")

I don't think unit_in & replicates should be location-aligned variables. And probably also value_type and basis_of_value. I can see examples where, for 1 trait (or even all traits), they'd be different by location, but then there should be a column that indicates those differences, by location, and that is read in for the specific traits to which it applies (or at the dataset level), but reading in fixed values for "value metadata" for different locations seems like a bit of a sloppy shortcut - you're using the fields that should be about the "locations" to document information that is actually about "sampling methods" (either a dataset or trait sort of metadata). Information like collection_date, basis_of_record and measurement_remarks can be location-level properties.

@yangsophieee
Copy link
Collaborator Author

  1. Read in location-level values from column, over-writing all pre-existing values for that location

I'm pretty sure the code doesn't allow you to read in a column of metadata from a location right now. I discussed with @ehwenk and we agree it's not something we need to implement.

@yangsophieee
Copy link
Collaborator Author

In terms of precedence of trait vs location metadata, both @ehwenk and I agree that we think trait metadata should have precedence over location metadata. But in terms of urgency this can be fixed later if it's too difficult.

yangsophieee added a commit that referenced this issue Oct 18, 2023
Addresses #60, checking precedence of metadata fields across various sections of metadata are working correctly (e.g. location > trait metadata)
- Added additional testing for checking precedence of metadata
- Fixed an error with `dataset_test` where I didn't realise the data variable already uses `process_custom_code` so it was applying it twice
- Removed old testing files
- @ehwenk and I agree that trait metadata should probably take precedence over location metadata but we will leave this to a later date (trait metadata is read in via `process_parse_data` and location metadata replaces it afterwards in `dataset_process` around Line 141, so you'd have to move the location metadata part into `process_parse_data`, which will presumably require splitting location data into location properties vs the other variables and having them input at different times)
@ehwenk ehwenk added on hold and removed bug Something isn't working labels Oct 19, 2023
@ehwenk ehwenk removed the on hold label Jul 31, 2024
@ehwenk ehwenk changed the title Precedence of column-level metadata over trait-level metadata over location-level metadata not working [traits.build workflow] Precedence of column-level metadata over trait-level metadata over location-level metadata not working Jul 31, 2024
@ehwenk ehwenk added this to AusTraits Jul 31, 2024
@ehwenk ehwenk moved this to Backlog in AusTraits Jul 31, 2024
@ehwenk ehwenk added the bug Something isn't working label Jul 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Backlog
Development

No branches or pull requests

2 participants