[traits.build workflow] Precedence of column-level metadata over trait-level metadata over location-level metadata not working #60

yangsophieee · 2023-09-06T09:14:26Z

Currently location-level metadata is overwriting trait-level metadata, e.g. for Test_2023_2 - Alstonia scholaris, leaf_mass_per_area should have basis_of_value as measurement but it's currently expert_score (the value for the location Cape Tribulation).

I think this is occurring in the loop around Line 161 of process.R. It's happening for both long (see Test_2023_4) and wide datasets.

In addition, column metadata at the dataset level is being overwritten by trait-level metadata for Test_2023_2 and Test_2023_4. E.g. in Test_2023_2, Homalanthus novoguineensis should have basis_of_value as model_derived but instead it's overwritten by the trait-level value, measurement.

The text was updated successfully, but these errors were encountered:

ehwenk · 2023-09-06T09:19:55Z

This is actually intentional - we had to code either trait-level or location-level metadata to take precedence over the other and fairly randomly picked location. If you have a good reason to swap it around to trait-level metadata taking precedence, you're welcome to change it.

yangsophieee · 2023-09-06T09:33:26Z

@ehwenk According to the original tests before the big test structure overhaul, the intention was that column metadata (dataset level) > trait metadata > location metadata.
Also with the new functionality of reading in columns from trait level, I guess that's column metadata (trait level) > location metadata > dataset metadata.
Or without columns (i.e. fixed value): trait metadata > location metadata > dataset metadata.

Let me know if you agree with these precedences, and I can change them.

ehwenk · 2023-09-06T09:37:57Z

@yangsophieee I am very happy for you to switch it to trait metadata > location metadata > dataset metadata!

yangsophieee · 2023-09-06T09:39:12Z

@ehwenk thanks! what about the column precedences?

ehwenk · 2023-09-06T10:41:01Z

@yangsophieee I think it probably would always be the same - you shouldn't fill in anything at the trait_level if there is a universal value or column given at the dataset level. Filling in at the trait_level (or location_level) should be a clear signal that this is the correct variable value/column for that trait/location

ehwenk · 2023-10-09T04:54:57Z

@yangsophieee I hadn't thought about the different workflows for fixed values vs values from columns.

For fixed values, we decided values specified in the traits-section overwrites values from the dataset-section; and location-section values overwrite traits-section values. (Although I have no preference actually for the order or traits vs location.)

Meanwhile dataset-level column-specified values should have ultimate precedence - otherwise what does it mean to read in a value from a column if you're going to overwrite it? Then that cell in the data.csv file should be edited (either actually edited, or using custom_R_code). Is it realisitic to add an ifelse that only allows the location-level or trait-level overwriting of values if the dataset-level is a fixed value?

ehwenk · 2023-10-12T21:30:13Z

@yangsophieee

I've just spent a long time staring at Test_2023_4 and Test_2023_2 outputs and am not sure anything needs to be changed. As in, I don't think issue #60 and issue #38 are issues.

What are specific variables, datasets that you think are giving erroneous outcomes?

I also think that I misstated something in my last comment.

In order of information being read in (i.e. lower in this list overwrites items at top):

Read in dataset-level fixed value
Read in dataset-level values from column
Read in traits-level fixed value, over-writing dataset-level values for that trait
Read in traits-level values from column, over-writing all dataset-level values for that trait
Read in location-level fixed value, over-writing all pre-existing values for that location
Read in location-level values from column, over-writing all pre-existing values for that location

As I said before it was arbitrary is 3-4 or 5-6 occurs first, but I don't want to change that right now.

But, I have some (possibly unfounded) concerns with this list of variables in Line 137 of process.R

    vars <- c("basis_of_record", "life_stage", "collection_date", "measurement_remarks", "unit_in", "entity_type",
              "value_type", "basis_of_value", "replicates", "population_id", "individual_id")

I don't think unit_in & replicates should be location-aligned variables. And probably also value_type and basis_of_value. I can see examples where, for 1 trait (or even all traits), they'd be different by location, but then there should be a column that indicates those differences, by location, and that is read in for the specific traits to which it applies (or at the dataset level), but reading in fixed values for "value metadata" for different locations seems like a bit of a sloppy shortcut - you're using the fields that should be about the "locations" to document information that is actually about "sampling methods" (either a dataset or trait sort of metadata). Information like collection_date, basis_of_record and measurement_remarks can be location-level properties.

yangsophieee · 2023-10-13T01:09:30Z

Read in location-level values from column, over-writing all pre-existing values for that location

I'm pretty sure the code doesn't allow you to read in a column of metadata from a location right now. I discussed with @ehwenk and we agree it's not something we need to implement.

yangsophieee · 2023-10-13T01:10:18Z

In terms of precedence of trait vs location metadata, both @ehwenk and I agree that we think trait metadata should have precedence over location metadata. But in terms of urgency this can be fixed later if it's too difficult.

@ehwenk

Addresses #60, checking precedence of metadata fields across various sections of metadata are working correctly (e.g. location > trait metadata) - Added additional testing for checking precedence of metadata - Fixed an error with `dataset_test` where I didn't realise the data variable already uses `process_custom_code` so it was applying it twice - Removed old testing files - @ehwenk and I agree that trait metadata should probably take precedence over location metadata but we will leave this to a later date (trait metadata is read in via `process_parse_data` and location metadata replaces it afterwards in `dataset_process` around Line 141, so you'd have to move the location metadata part into `process_parse_data`, which will presumably require splitting location data into location properties vs the other variables and having them input at different times)

yangsophieee added the bug Something isn't working label Sep 6, 2023

yangsophieee self-assigned this Sep 6, 2023

yangsophieee changed the title ~~Precedence of trait-level metadata over location-level metadata not working~~ Precedence of column-level metadata over trait-level metadata over location-level metadata not working Sep 6, 2023

yangsophieee mentioned this issue Oct 18, 2023

Check precedences of metadata #99

Merged

ehwenk added on hold and removed bug Something isn't working labels Oct 19, 2023

ehwenk removed the on hold label Jul 31, 2024

ehwenk changed the title ~~Precedence of column-level metadata over trait-level metadata over location-level metadata not working~~ [traits.build workflow] Precedence of column-level metadata over trait-level metadata over location-level metadata not working Jul 31, 2024

ehwenk added this to AusTraits Jul 31, 2024

ehwenk moved this to Backlog in AusTraits Jul 31, 2024

ehwenk added the bug Something isn't working label Jul 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[traits.build workflow] Precedence of column-level metadata over trait-level metadata over location-level metadata not working #60

[traits.build workflow] Precedence of column-level metadata over trait-level metadata over location-level metadata not working #60

yangsophieee commented Sep 6, 2023 •

edited

Loading

ehwenk commented Sep 6, 2023

yangsophieee commented Sep 6, 2023

ehwenk commented Sep 6, 2023

yangsophieee commented Sep 6, 2023

ehwenk commented Sep 6, 2023

ehwenk commented Oct 9, 2023

ehwenk commented Oct 12, 2023

yangsophieee commented Oct 13, 2023

yangsophieee commented Oct 13, 2023

[traits.build workflow] Precedence of column-level metadata over trait-level metadata over location-level metadata not working #60

[traits.build workflow] Precedence of column-level metadata over trait-level metadata over location-level metadata not working #60

Comments

yangsophieee commented Sep 6, 2023 • edited Loading

ehwenk commented Sep 6, 2023

yangsophieee commented Sep 6, 2023

ehwenk commented Sep 6, 2023

yangsophieee commented Sep 6, 2023

ehwenk commented Sep 6, 2023

ehwenk commented Oct 9, 2023

ehwenk commented Oct 12, 2023

yangsophieee commented Oct 13, 2023

yangsophieee commented Oct 13, 2023

yangsophieee commented Sep 6, 2023 •

edited

Loading