-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AJ-1959 Expression evaluation in WDS #931
Conversation
// plus one to see if there are more records | ||
pageSize + 1, | ||
offset); | ||
// the query results may have an extra record to see if there are more records | ||
// if there are more records, remove the extra record | ||
var hasNext = | ||
resultsByQuery.values().stream().map(LinkedHashMap::size).anyMatch(s -> s > pageSize); | ||
if (hasNext) { | ||
removeExtraRecords(pageSize, resultsByQuery); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious what people think about this hasNext
implementation. It seems a little shifty because of the removeExtraRecords
thing and reliance on LinkedHashMap
. OTOH, it avoids another db call and associated code to get a full count that either needs to happen on each page or in a different api call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly about it, and so would like to see what other AJ people think, but it seems that getting a full count is arguably a more intuitive approach, and that might be worth something from a code readability point of view. Whether it's worth another db call / more code, I don't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Info I'd want to know:
- Is there benefit to returning the actual count to the API caller instead of just a
hasNext: true|false
? - What's the difference in computation cost to evaluate one extra expression (to test for hasNext) vs. to calculate the actual count?
I'm guessing this approach is slimmer, so I like it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to leave this in. I think the main thing to optimize for here is code burden and I think this is less code and not hard to follow. I don't think performance will be much of a factor either way (although I am sure the current code has the best performance).
* ""str" is invalid. | ||
*/ | ||
|
||
grammar TerraExpression; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -117,6 +126,39 @@ public RecordQueryResponse queryForRecords( | |||
instanceId, recordType, version, searchRequest); | |||
} | |||
|
|||
@PostMapping("/{instanceid}/records/{version}/{recordType}/{recordId}/evaluateExpressions") | |||
public EvaluateExpressionsResponse evaluateExpressions( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
response will look like
{
"evaluations": {
"<expression name>": <expression value>,
...
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're moving away from the use of instance
to collection
- it's not reflected in these other APIs because they're the last to be updated since that requires releasing a new API version, but for new APIs we should avoid use of the term "instance". Just renaming variables
|
||
@PostMapping( | ||
"/{instanceid}/records/{version}/{recordType}/{recordId}/evaluateExpressionsWithArray") | ||
public EvaluateExpressionsWithArrayResponse evaluateExpressionsWithArray( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
response will look like
{
"hasNext": boolean,
"results": [
{"recordId": string, "evaluations": {"<expression name>": <expression value>, ... }
]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot going on here -- it might be worth a walkthrough to go over the new ANTLR-related classes, or more commentary about how those new classes work for those that aren't already familiar with it from the Rawls implementation, but I wouldn't insist on it for my sake alone. I'll leave that up to you and the other members of AJ who may be more familiar.
@@ -229,6 +229,74 @@ paths: | |||
'application/json': | |||
schema: | |||
$ref: '#/components/schemas/ErrorResponse' | |||
/{instanceid}/records/{v}/{type}/{id}/evaluateExpressions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DataBiosphere/analysisjourneys , do we want this to be part of the V0.2 or V1 API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
V1
// plus one to see if there are more records | ||
pageSize + 1, | ||
offset); | ||
// the query results may have an extra record to see if there are more records | ||
// if there are more records, remove the extra record | ||
var hasNext = | ||
resultsByQuery.values().stream().map(LinkedHashMap::size).anyMatch(s -> s > pageSize); | ||
if (hasNext) { | ||
removeExtraRecords(pageSize, resultsByQuery); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly about it, and so would like to see what other AJ people think, but it seems that getting a full count is arguably a more intuitive approach, and that might be worth something from a code readability point of view. Whether it's worth another db call / more code, I don't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment details inline. Thank you so much for all the contribution. My main points for review are:
- move to v1 APIs, which use generated models and controller interfaces
- do we need both APIs or would one do?
For follow-on work after this PR:
- add Micrometer
Observation
s (which will inherit tracing when we tackle that)
service/src/main/java/org/databiosphere/workspacedataservice/controller/RecordController.java
Outdated
Show resolved
Hide resolved
* LinkedHashMap is used to maintain the order of the records. | ||
*/ | ||
public LinkedHashMap<String, List<Record>> queryRelatedRecordsWithArray( | ||
UUID collectionId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where possible, we've been trying to use CollectionId
and WorkspaceId
instead of raw UUIDs. The codebase is still pretty split and we have lots of raw UUIDs, so take your pick
request.expressionsMap(), | ||
request.pageSize(), | ||
request.offset()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: do we need both APIs? Can they be collapsed into one?
|
||
/** Test query resulting from an expressions like `this.relationAttr.xxx` */ | ||
@Test | ||
@Transactional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been trying to remove @Transactional
from our test cases; is it actually needed for any of these? You might just be following existing code patterns which we're trying to clean up :)
$ref: '#/components/schemas/ErrorResponse' | ||
/{instanceid}/records/{v}/{type}/{id}/evaluateExpressionsWithArray: | ||
post: | ||
summary: Evaluate expressions on array of records |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need both APIs? Could we collapse them into a single API, potentially treating "single record" as an array of size 1? Or, to rephrase the question: what's the benefit of having both APIs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the benefit is in clarity of the API. If we collapse them then recordType
and recordId
are handled differently based on if arrayExpression
is present. Also pageSize
and offset
are unused if arrayExpression
is absent.
The benefit of collapsing them for the WDS code is minimal because the code paths diverge early on.
service/src/main/java/org/databiosphere/workspacedataservice/expressions/ExpressionService.java
Show resolved
Hide resolved
"Expected a single value for attribute %s but got %s" | ||
.formatted(lookup.attribute(), attributeValues)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider returning the count of values found instead of the values themselves:
"Expected a single value for attribute %s but got %s" | |
.formatted(lookup.attribute(), attributeValues)); | |
"Expected a single value for attribute %s but got %s" | |
.formatted(lookup.attribute(), attributeValues.size())); |
/** | ||
* If the attribute is of the form [recordType]_id, return the record id. Otherwise, return the | ||
* attribute value. | ||
*/ | ||
private Object lookupAttributeValue( | ||
RecordType recordType, AttributeLookup lookup, Record record, List<Relation> relations) { | ||
return lookup.attribute().equalsIgnoreCase(getIdName(recordType, relations)) | ||
? record.getId() | ||
: record.getAttributeValue(lookup.attribute()); | ||
} | ||
|
||
/** Get the name of the id attribute for the record type in the form [recordType]_id. */ | ||
private String getIdName(RecordType recordType, List<Relation> relations) { | ||
return (relations.isEmpty() | ||
? recordType.getName() | ||
: relations.get(relations.size() - 1).relationRecordType().getName()) | ||
+ "_id"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, we probably need this for backwards compatibility, huh? We'd been trying to do away with magic naming like this. I wonder if it's actually needed … if we migrated existing Rawls-powered workspaces to WDS, those tables would end up with a physical column named ${recordType}_id
, so we wouldn't need special handling. See also PrimaryKeyDao.getPrimaryKeyColumn()
.
} else if (attributeValue instanceof JsonNode) { | ||
return (JsonNode) attributeValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else if (attributeValue instanceof JsonNode) { | |
return (JsonNode) attributeValue; | |
} else if (attributeValue instanceof JsonNode jsonNode) { | |
return jsonNode; |
throw new IllegalArgumentException("Array relations must not be empty"); | ||
} | ||
|
||
var rootRecordType = arrayRelations.get(arrayRelations.size() - 1).relationRecordType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment in ExpressionService about validating that all relations have the same root Type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they will not all have the same type and here we care only about the type of the last one because we are only selecting columns for that type
I would also love a walkthrough! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you! Two hopefully not-big comments inline; I'm giving this a proactive 👍 assuming you can tackle those two changes
this.recordOrchestratorService = recordOrchestratorService; | ||
this.permissionService = permissionService; | ||
this.expressionService = expressionService; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these changes are now obsolete - RecordController
doesn't need ExpressionService
any more
HttpStatus.BAD_REQUEST, | ||
"The relation %s in expression %s does not exist" | ||
.formatted(relationName, arrayRelationExpression))); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, I like this!
@@ -279,7 +279,66 @@ paths: | |||
application/json: | |||
schema: | |||
$ref: '#/components/schemas/DeleteRecordsResponse' | |||
|
|||
/records/v1/{collectionId}/{recordType}/{recordId}/evaluateExpressions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to expose these APIs in the actual live swagger UIs, you'll need to add $ref
s to the main YAML file, cwds-api-docs-preview.yaml. See here for an example.
Our OpenAPI yaml is currently pretty complicated. We have:
- apis-v1.yaml: source for autogeneration, containing new-style APIs, but never displayed to users directly
- openapi-docs.yaml: APIs for the single-tenant data plane, includes $refs to apis-v1.yaml
- cwds-api-docs.yaml: APIs for the multi-tenant control plane IN PROD, includes $refs to apis-v1.yaml
- cwds-api-docs-preview.yaml: APIs for the multi-tenant control plane IN DEV, includes $refs to apis-v1.yaml
cwds-api-docs.yaml and cwds-api-docs-preview.yaml both exist for a short time while we're in appsec review and have some APIs turned on in dev but not in prod. The -preview will go away soon after we get approvals.
openapi-docs.yaml would go away once we migrate out of the data plane and into the multi-tenant control plane
so, if you want these APIs to appear in the dev env, but not in prod until we get approved, cwds-api-docs-preview.yaml is the right place.
Quality Gate passedIssues Measures |
https://broadworkbench.atlassian.net/browse/AJ-1959