Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attribute with repeated data type. #991

Merged
merged 21 commits into from
Sep 13, 2024
Merged

Conversation

marikomedlock
Copy link
Collaborator

  • This is needed for the aouCT/variant entity. The prep_vat table that the entity is based on has string array columns.
  • Updated indexing jobs to handle repeated attribute types: ValidateDataTypes, CreateEntityMain, WriteEntityLevelDisplayHints, WriteTextSearchField.
  • Query engine & OpenAPI changes
    • Select repeated attribute field (all values)
    • Select hints for repeated attribute field (each value separate hint)
    • Group by repeated attribute field (contains value)
    • Filter on repeated attribute field (contains value)

type: array
items:
$ref: "#/components/schemas/Literal"
isRepeatedValue:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for? Doesn't the existence of the repeatedValue (or it being non-empty) already indicate this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to distinguish an empty array from a non-repeated/scalar value. Some of the rows in the variant table have empty arrays.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will there potentially be other types as well and this should be an enum? Should value be deprecated and replaced with values instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have plans for any other types, though if they come up then an enum certainly sounds like a good choice to consider.

I'm happy to remove value here and just always populate a single element of values instead -- would that be easier for you? I didn't do it initially because it wouldn't be backwards compatible, so I figured you'd need to make a bunch of changes on the frontend to accommodate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per offline discussion, we decided to:

  • Always populate the values array, so the UI can move over to always using that, without breaking backwards compatibility immediately.
  • Continue populating the is repeated flag, just in case the UI hits a case where it needs to distinguish between a single element array and a non-repeated value. If it turns out the UI doesn't need to distinguish between those two cases, we can remove this flag from OpenAPI in the future.

if (valueDisplay == null) {
return apiObject;
} else if (valueDisplay.isRepeatedValue()) {
return apiObject
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the existence of a display field not independent of whether the value is an array?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, it's impossible to have both a display string and a repeated value. In the future, we could support that though, or maybe it would be better to support an array of value-display pairs, not sure. For the variant entity, we only have arrays of strings.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that's what I'm wondering and trying to see if we're baking in the way variant search uses arrays the right amount. The array of values as used here seems like it would map to a single display string. Presumably this works differently when a DisplayValue object is returned as part of an enum hint and there wouldn't be repeated values there, but maybe that's not always true? Should some places have a list of ValueDisplays instead of a single one? Should the array be inside Literal instead? Maybe there are two versions, this isn't a general array but specifically support for a list of repeated, individual items, and there could exist another where the array is treated as a single unit?

I don't have a particular proposal and I'm just trying to understand the model and how it will work going forward.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the backend (indexing & query engine) does not support repeated values that have both a value & display component. I think that would be a good enhancement, but it's not needed for variant search and we don't have any other immediate use cases, so in the interest of time I decided to tackle this smaller change first.

I don't think array should be inside literal, because then we'd need a new data type for each array (e.g. string-array, int64-array).

For hints, we don't return repeated values. The indexing calculates counts considering each array element separately. e.g. If a condition could be included in >1 vocabulary (condition.vocabulary is a repeated attribute), then we count the condition once per vocabulary. The returned hints are still vocabulary-count pairs, not array of vocabularies-count pairs.

Currently, the only time the API could return a list of values is when fetching an attribute in a list query. I think we may want to expand this in the future, though probably not all at once here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per offline discussion, we decided that in the future it might be useful to support:

  • A single display string for an array type attribute. The backend doesn't support that now, but the existing definition of the ValueDisplay OpenAPI object would support that.
  • A list of related entity instances. This would support the case where there is more than just a list of "tags" (e.g. a list of locations, each of which has an id, a display name, and maybe also a description). Supporting this larger use case obviates the need to support a list of value-display pairs here, since that would be possible with the related entity instances, with each instance returning two attributes. As a general rule, there are 2 cases we'd like to support: "tagging" where it's just a list of values (could be strings or ints or any other data type), and "related instances" where it's more than a list values (could be one or more additional pieces of information per element).

@marikomedlock marikomedlock merged commit a4e913d into main Sep 13, 2024
8 checks passed
@marikomedlock marikomedlock deleted the mm-repeated-data-type branch September 13, 2024 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants