Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AJ-1959 Expression evaluation in WDS #931

Merged
merged 17 commits into from
Sep 19, 2024
Merged

AJ-1959 Expression evaluation in WDS #931

merged 17 commits into from
Sep 19, 2024

Conversation

dvoet
Copy link
Contributor

@dvoet dvoet commented Sep 10, 2024

@davidangb davidangb self-requested a review September 10, 2024 15:45
@dvoet dvoet changed the title Expressions AJ-1959 Expression evaluation in WDS Sep 12, 2024
@dvoet dvoet marked this pull request as ready for review September 12, 2024 19:23
Comment on lines +129 to +138
// plus one to see if there are more records
pageSize + 1,
offset);
// the query results may have an extra record to see if there are more records
// if there are more records, remove the extra record
var hasNext =
resultsByQuery.values().stream().map(LinkedHashMap::size).anyMatch(s -> s > pageSize);
if (hasNext) {
removeExtraRecords(pageSize, resultsByQuery);
}
Copy link
Contributor Author

@dvoet dvoet Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious what people think about this hasNext implementation. It seems a little shifty because of the removeExtraRecords thing and reliance on LinkedHashMap. OTOH, it avoids another db call and associated code to get a full count that either needs to happen on each page or in a different api call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly about it, and so would like to see what other AJ people think, but it seems that getting a full count is arguably a more intuitive approach, and that might be worth something from a code readability point of view. Whether it's worth another db call / more code, I don't know.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Info I'd want to know:

  • Is there benefit to returning the actual count to the API caller instead of just a hasNext: true|false?
  • What's the difference in computation cost to evaluate one extra expression (to test for hasNext) vs. to calculate the actual count?

I'm guessing this approach is slimmer, so I like it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to leave this in. I think the main thing to optimize for here is code burden and I think this is less code and not hard to follow. I don't think performance will be much of a factor either way (although I am sure the current code has the best performance).

* ""str" is invalid.
*/

grammar TerraExpression;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -117,6 +126,39 @@ public RecordQueryResponse queryForRecords(
instanceId, recordType, version, searchRequest);
}

@PostMapping("/{instanceid}/records/{version}/{recordType}/{recordId}/evaluateExpressions")
public EvaluateExpressionsResponse evaluateExpressions(
Copy link
Contributor Author

@dvoet dvoet Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

response will look like

{
  "evaluations": {
    "<expression name>": <expression value>,
    ...
  }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're moving away from the use of instance to collection - it's not reflected in these other APIs because they're the last to be updated since that requires releasing a new API version, but for new APIs we should avoid use of the term "instance". Just renaming variables


@PostMapping(
"/{instanceid}/records/{version}/{recordType}/{recordId}/evaluateExpressionsWithArray")
public EvaluateExpressionsWithArrayResponse evaluateExpressionsWithArray(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

response will look like

{
  "hasNext": boolean,
  "results":  [
    {"recordId": string, "evaluations": {"<expression name>": <expression value>, ... }
  ]
} 

Copy link
Contributor

@mspector mspector left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot going on here -- it might be worth a walkthrough to go over the new ANTLR-related classes, or more commentary about how those new classes work for those that aren't already familiar with it from the Rawls implementation, but I wouldn't insist on it for my sake alone. I'll leave that up to you and the other members of AJ who may be more familiar.

@@ -229,6 +229,74 @@ paths:
'application/json':
schema:
$ref: '#/components/schemas/ErrorResponse'
/{instanceid}/records/{v}/{type}/{id}/evaluateExpressions:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DataBiosphere/analysisjourneys , do we want this to be part of the V0.2 or V1 API?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

V1

Comment on lines +129 to +138
// plus one to see if there are more records
pageSize + 1,
offset);
// the query results may have an extra record to see if there are more records
// if there are more records, remove the extra record
var hasNext =
resultsByQuery.values().stream().map(LinkedHashMap::size).anyMatch(s -> s > pageSize);
if (hasNext) {
removeExtraRecords(pageSize, resultsByQuery);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly about it, and so would like to see what other AJ people think, but it seems that getting a full count is arguably a more intuitive approach, and that might be worth something from a code readability point of view. Whether it's worth another db call / more code, I don't know.

Copy link
Collaborator

@davidangb davidangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment details inline. Thank you so much for all the contribution. My main points for review are:

  • move to v1 APIs, which use generated models and controller interfaces
  • do we need both APIs or would one do?

For follow-on work after this PR:

  • add Micrometer Observations (which will inherit tracing when we tackle that)

* LinkedHashMap is used to maintain the order of the records.
*/
public LinkedHashMap<String, List<Record>> queryRelatedRecordsWithArray(
UUID collectionId,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where possible, we've been trying to use CollectionId and WorkspaceId instead of raw UUIDs. The codebase is still pretty split and we have lots of raw UUIDs, so take your pick

request.expressionsMap(),
request.pageSize(),
request.offset());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: do we need both APIs? Can they be collapsed into one?


/** Test query resulting from an expressions like `this.relationAttr.xxx` */
@Test
@Transactional
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to remove @Transactional from our test cases; is it actually needed for any of these? You might just be following existing code patterns which we're trying to clean up :)

$ref: '#/components/schemas/ErrorResponse'
/{instanceid}/records/{v}/{type}/{id}/evaluateExpressionsWithArray:
post:
summary: Evaluate expressions on array of records
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need both APIs? Could we collapse them into a single API, potentially treating "single record" as an array of size 1? Or, to rephrase the question: what's the benefit of having both APIs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the benefit is in clarity of the API. If we collapse them then recordType and recordId are handled differently based on if arrayExpression is present. Also pageSize and offset are unused if arrayExpression is absent.

The benefit of collapsing them for the WDS code is minimal because the code paths diverge early on.

Comment on lines 374 to 375
"Expected a single value for attribute %s but got %s"
.formatted(lookup.attribute(), attributeValues));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider returning the count of values found instead of the values themselves:

Suggested change
"Expected a single value for attribute %s but got %s"
.formatted(lookup.attribute(), attributeValues));
"Expected a single value for attribute %s but got %s"
.formatted(lookup.attribute(), attributeValues.size()));

Comment on lines 385 to 402
/**
* If the attribute is of the form [recordType]_id, return the record id. Otherwise, return the
* attribute value.
*/
private Object lookupAttributeValue(
RecordType recordType, AttributeLookup lookup, Record record, List<Relation> relations) {
return lookup.attribute().equalsIgnoreCase(getIdName(recordType, relations))
? record.getId()
: record.getAttributeValue(lookup.attribute());
}

/** Get the name of the id attribute for the record type in the form [recordType]_id. */
private String getIdName(RecordType recordType, List<Relation> relations) {
return (relations.isEmpty()
? recordType.getName()
: relations.get(relations.size() - 1).relationRecordType().getName())
+ "_id";
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, we probably need this for backwards compatibility, huh? We'd been trying to do away with magic naming like this. I wonder if it's actually needed … if we migrated existing Rawls-powered workspaces to WDS, those tables would end up with a physical column named ${recordType}_id, so we wouldn't need special handling. See also PrimaryKeyDao.getPrimaryKeyColumn().

Comment on lines 407 to 408
} else if (attributeValue instanceof JsonNode) {
return (JsonNode) attributeValue;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else if (attributeValue instanceof JsonNode) {
return (JsonNode) attributeValue;
} else if (attributeValue instanceof JsonNode jsonNode) {
return jsonNode;

throw new IllegalArgumentException("Array relations must not be empty");
}

var rootRecordType = arrayRelations.get(arrayRelations.size() - 1).relationRecordType();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment in ExpressionService about validating that all relations have the same root Type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they will not all have the same type and here we care only about the type of the last one because we are only selecting columns for that type

@calypsomatic
Copy link
Contributor

There's a lot going on here -- it might be worth a walkthrough to go over the new ANTLR-related classes, or more commentary about how those new classes work for those that aren't already familiar with it from the Rawls implementation, but I wouldn't insist on it for my sake alone. I'll leave that up to you and the other members of AJ who may be more familiar.

I would also love a walkthrough!

Copy link
Collaborator

@davidangb davidangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you! Two hopefully not-big comments inline; I'm giving this a proactive 👍 assuming you can tackle those two changes

this.recordOrchestratorService = recordOrchestratorService;
this.permissionService = permissionService;
this.expressionService = expressionService;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these changes are now obsolete - RecordController doesn't need ExpressionService any more

HttpStatus.BAD_REQUEST,
"The relation %s in expression %s does not exist"
.formatted(relationName, arrayRelationExpression)));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, I like this!

@@ -279,7 +279,66 @@ paths:
application/json:
schema:
$ref: '#/components/schemas/DeleteRecordsResponse'

/records/v1/{collectionId}/{recordType}/{recordId}/evaluateExpressions:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to expose these APIs in the actual live swagger UIs, you'll need to add $refs to the main YAML file, cwds-api-docs-preview.yaml. See here for an example.

Our OpenAPI yaml is currently pretty complicated. We have:

  • apis-v1.yaml: source for autogeneration, containing new-style APIs, but never displayed to users directly
  • openapi-docs.yaml: APIs for the single-tenant data plane, includes $refs to apis-v1.yaml
  • cwds-api-docs.yaml: APIs for the multi-tenant control plane IN PROD, includes $refs to apis-v1.yaml
  • cwds-api-docs-preview.yaml: APIs for the multi-tenant control plane IN DEV, includes $refs to apis-v1.yaml

cwds-api-docs.yaml and cwds-api-docs-preview.yaml both exist for a short time while we're in appsec review and have some APIs turned on in dev but not in prod. The -preview will go away soon after we get approvals.

openapi-docs.yaml would go away once we migrate out of the data plane and into the multi-tenant control plane

so, if you want these APIs to appear in the dev env, but not in prod until we get approved, cwds-api-docs-preview.yaml is the right place.

Copy link

sonarcloud bot commented Sep 19, 2024

@dvoet dvoet merged commit 467effe into main Sep 19, 2024
25 checks passed
@dvoet dvoet deleted the expressions branch September 19, 2024 20:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants