AJ-1959 Expression evaluation in WDS #931

dvoet · 2024-09-10T15:11:07Z

https://broadworkbench.atlassian.net/browse/AJ-1959

dvoet · 2024-09-12T19:32:25Z

service/src/main/java/org/databiosphere/workspacedataservice/expressions/ExpressionService.java

+            // plus one to see if there are more records
+            pageSize + 1,
+            offset);
+    // the query results may have an extra record to see if there are more records
+    // if there are more records, remove the extra record
+    var hasNext =
+        resultsByQuery.values().stream().map(LinkedHashMap::size).anyMatch(s -> s > pageSize);
+    if (hasNext) {
+      removeExtraRecords(pageSize, resultsByQuery);
+    }


I am curious what people think about this hasNext implementation. It seems a little shifty because of the removeExtraRecords thing and reliance on LinkedHashMap. OTOH, it avoids another db call and associated code to get a full count that either needs to happen on each page or in a different api call.

I don't feel strongly about it, and so would like to see what other AJ people think, but it seems that getting a full count is arguably a more intuitive approach, and that might be worth something from a code readability point of view. Whether it's worth another db call / more code, I don't know.

Info I'd want to know:

Is there benefit to returning the actual count to the API caller instead of just a hasNext: true|false?

What's the difference in computation cost to evaluate one extra expression (to test for hasNext) vs. to calculate the actual count?

I'm guessing this approach is slimmer, so I like it

I am going to leave this in. I think the main thing to optimize for here is code burden and I think this is less code and not hard to follow. I don't think performance will be much of a factor either way (although I am sure the current code has the best performance).
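
For readers who have not seen the "fetch one extra row" technique discussed above, here is a minimal, self-contained sketch. It is not the WDS implementation: `PageSketch`, `HasNextSketch`, and the in-memory `fetch` stand-in for the database query are hypothetical names, and it pages over a single list rather than the map of `LinkedHashMap` results used in the PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical page holder for illustration; not a WDS class. */
record PageSketch<T>(List<T> items, boolean hasNext) {}

final class HasNextSketch {
  /** Stand-in for the database query: returns at most `limit` rows starting at `offset`. */
  static List<String> fetch(List<String> table, int limit, int offset) {
    int from = Math.min(offset, table.size());
    int to = Math.min(from + limit, table.size());
    return new ArrayList<>(table.subList(from, to));
  }

  static PageSketch<String> page(List<String> table, int pageSize, int offset) {
    // Ask for one row more than the caller wants, purely as a probe.
    List<String> rows = fetch(table, pageSize + 1, offset);
    // If the probe row came back, at least one more page exists.
    boolean hasNext = rows.size() > pageSize;
    if (hasNext) {
      // Drop the probe row so the caller never sees more than pageSize rows.
      rows.remove(rows.size() - 1);
    }
    return new PageSketch<>(rows, hasNext);
  }
}
```

The trade-off matches the discussion: no second count query is needed, at the cost of one extra row per page and a trimming step that depends on the results being ordered.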

dvoet · 2024-09-12T19:33:42Z

...ain/antlr/org/databiosphere/workspacedataservice/expressions/parser/antlr/TerraExpression.g4

+ * ""str" is invalid.
+ */
+
+grammar TerraExpression;


copied from https://github.com/broadinstitute/rawls/blob/develop/core/src/main/antlr4/org/broadinstitute/dsde/rawls/expressions/parser/antlr/TerraExpression.g4

dvoet · 2024-09-12T19:37:31Z

service/src/main/java/org/databiosphere/workspacedataservice/controller/RecordController.java

@@ -117,6 +126,39 @@ public RecordQueryResponse queryForRecords(
        instanceId, recordType, version, searchRequest);
  }

+  @PostMapping("/{instanceid}/records/{version}/{recordType}/{recordId}/evaluateExpressions")
+  public EvaluateExpressionsResponse evaluateExpressions(


response will look like

{ "evaluations": { "<expression name>": <expression value>, ... } }

We're moving away from the use of instance to collection - it's not reflected in these other APIs because they're the last to be updated since that requires releasing a new API version, but for new APIs we should avoid use of the term "instance". Just renaming variables

dvoet · 2024-09-12T19:40:38Z

service/src/main/java/org/databiosphere/workspacedataservice/controller/RecordController.java

+
+  @PostMapping(
+      "/{instanceid}/records/{version}/{recordType}/{recordId}/evaluateExpressionsWithArray")
+  public EvaluateExpressionsWithArrayResponse evaluateExpressionsWithArray(


response will look like

{ "hasNext": boolean, "results": [ { "recordId": string, "evaluations": { "<expression name>": <expression value>, ... } } ] }
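
As a rough illustration of that response shape, the JSON above could be modeled with two Java records. These type names are hypothetical and are not the generated models used by WDS; expression values are typed as `Object` only because the comment does not say what types they may take.

```java
import java.util.List;
import java.util.Map;

/** One entry of "results": a record id plus its evaluated expressions (hypothetical type). */
record RecordEvaluationsSketch(String recordId, Map<String, Object> evaluations) {}

/** The whole paged response: "hasNext" plus the list of per-record results (hypothetical type). */
record PagedEvaluationsSketch(boolean hasNext, List<RecordEvaluationsSketch> results) {}

final class ResponseShapeDemo {
  /** Builds a tiny example response with one record and one named expression. */
  static PagedEvaluationsSketch example() {
    var first = new RecordEvaluationsSketch("record-1", Map.of("exampleExpression", "value"));
    return new PagedEvaluationsSketch(true, List.of(first));
  }
}
```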

mspector

There's a lot going on here -- it might be worth a walkthrough to go over the new ANTLR-related classes, or more commentary about how those new classes work for those that aren't already familiar with it from the Rawls implementation, but I wouldn't insist on it for my sake alone. I'll leave that up to you and the other members of AJ who may be more familiar.

mspector · 2024-09-16T18:18:39Z

service/src/main/resources/static/swagger/openapi-docs.yaml

@@ -229,6 +229,74 @@ paths:
            'application/json':
              schema:
                $ref: '#/components/schemas/ErrorResponse'
+  /{instanceid}/records/{v}/{type}/{id}/evaluateExpressions:


@DataBiosphere/analysisjourneys, do we want this to be part of the V0.2 or V1 API?

mspector · 2024-09-16T19:00:36Z

service/src/main/java/org/databiosphere/workspacedataservice/expressions/ExpressionService.java

+            // plus one to see if there are more records
+            pageSize + 1,
+            offset);
+    // the query results may have an extra record to see if there are more records
+    // if there are more records, remove the extra record
+    var hasNext =
+        resultsByQuery.values().stream().map(LinkedHashMap::size).anyMatch(s -> s > pageSize);
+    if (hasNext) {
+      removeExtraRecords(pageSize, resultsByQuery);
+    }


I don't feel strongly about it, and so would like to see what other AJ people think, but it seems that getting a full count is arguably a more intuitive approach, and that might be worth something from a code readability point of view. Whether it's worth another db call / more code, I don't know.

davidangb

See comment details inline. Thank you so much for all the contribution. My main points for review are:

move to v1 APIs, which use generated models and controller interfaces
do we need both APIs or would one do?

For follow-on work after this PR:

add Micrometer Observations (which will inherit tracing when we tackle that)

davidangb · 2024-09-17T13:57:14Z

service/src/main/java/org/databiosphere/workspacedataservice/dao/RecordDao.java

+   *     LinkedHashMap is used to maintain the order of the records.
+   */
+  public LinkedHashMap<String, List<Record>> queryRelatedRecordsWithArray(
+      UUID collectionId,


where possible, we've been trying to use CollectionId and WorkspaceId instead of raw UUIDs. The codebase is still pretty split and we have lots of raw UUIDs, so take your pick
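
To illustrate why typed IDs can be preferable to raw UUIDs, here is a minimal sketch. `CollectionIdSketch` and `WorkspaceIdSketch` are hypothetical stand-ins written for this example; the actual `CollectionId` and `WorkspaceId` classes in the codebase may look different.

```java
import java.util.Objects;
import java.util.UUID;

/** Hypothetical typed wrapper around a collection's UUID. */
record CollectionIdSketch(UUID id) {
  CollectionIdSketch {
    Objects.requireNonNull(id, "id");
  }
}

/** Hypothetical typed wrapper around a workspace's UUID. */
record WorkspaceIdSketch(UUID id) {
  WorkspaceIdSketch {
    Objects.requireNonNull(id, "id");
  }
}

final class TypedIdDemo {
  // With typed parameters, swapping the two arguments is a compile error,
  // whereas two raw UUID parameters could be swapped silently.
  static String describe(CollectionIdSketch collectionId, WorkspaceIdSketch workspaceId) {
    return "collection=" + collectionId.id() + " workspace=" + workspaceId.id();
  }
}
```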

davidangb · 2024-09-17T13:58:42Z

service/src/main/java/org/databiosphere/workspacedataservice/controller/RecordController.java

+        request.expressionsMap(),
+        request.pageSize(),
+        request.offset());
+  }


TODO: do we need both APIs? Can they be collapsed into one?

davidangb · 2024-09-17T14:07:25Z

service/src/test/java/org/databiosphere/workspacedataservice/dao/RecordDaoTest.java

+
+  /** Test query resulting from an expressions like `this.relationAttr.xxx` */
+  @Test
+  @Transactional


I've been trying to remove @Transactional from our test cases; is it actually needed for any of these? You might just be following existing code patterns which we're trying to clean up :)

davidangb · 2024-09-17T14:12:08Z

service/src/main/resources/static/swagger/openapi-docs.yaml

+                $ref: '#/components/schemas/ErrorResponse'
+  /{instanceid}/records/{v}/{type}/{id}/evaluateExpressionsWithArray:
+    post:
+      summary: Evaluate expressions on array of records


do we need both APIs? Could we collapse them into a single API, potentially treating "single record" as an array of size 1? Or, to rephrase the question: what's the benefit of having both APIs?

I think that the benefit is in clarity of the API. If we collapse them then recordType and recordId are handled differently based on if arrayExpression is present. Also pageSize and offset are unused if arrayExpression is absent.

The benefit of collapsing them for the WDS code is minimal because the code paths diverge early on.

davidangb · 2024-09-17T15:09:29Z

service/src/main/java/org/databiosphere/workspacedataservice/expressions/ExpressionService.java

+                                "Expected a single value for attribute %s but got %s"
+                                    .formatted(lookup.attribute(), attributeValues));


consider returning the count of values found instead of the values themselves:

Suggested change

-                                "Expected a single value for attribute %s but got %s"
-                                    .formatted(lookup.attribute(), attributeValues));
+                                "Expected a single value for attribute %s but got %s"
+                                    .formatted(lookup.attribute(), attributeValues.size()));

davidangb · 2024-09-17T15:11:39Z

service/src/main/java/org/databiosphere/workspacedataservice/expressions/ExpressionService.java

+  /**
+   * If the attribute is of the form [recordType]_id, return the record id. Otherwise, return the
+   * attribute value.
+   */
+  private Object lookupAttributeValue(
+      RecordType recordType, AttributeLookup lookup, Record record, List<Relation> relations) {
+    return lookup.attribute().equalsIgnoreCase(getIdName(recordType, relations))
+        ? record.getId()
+        : record.getAttributeValue(lookup.attribute());
+  }
+
+  /** Get the name of the id attribute for the record type in the form [recordType]_id. */
+  private String getIdName(RecordType recordType, List<Relation> relations) {
+    return (relations.isEmpty()
+            ? recordType.getName()
+            : relations.get(relations.size() - 1).relationRecordType().getName())
+        + "_id";
+  }


ah, we probably need this for backwards compatibility, huh? We'd been trying to do away with magic naming like this. I wonder if it's actually needed … if we migrated existing Rawls-powered workspaces to WDS, those tables would end up with a physical column named ${recordType}_id, so we wouldn't need special handling. See also PrimaryKeyDao.getPrimaryKeyColumn().

davidangb · 2024-09-17T15:14:47Z

service/src/main/java/org/databiosphere/workspacedataservice/expressions/ExpressionService.java

+    } else if (attributeValue instanceof JsonNode) {
+      return (JsonNode) attributeValue;


Suggested change

-    } else if (attributeValue instanceof JsonNode) {
-      return (JsonNode) attributeValue;
+    } else if (attributeValue instanceof JsonNode jsonNode) {
+      return jsonNode;
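
The suggestion relies on pattern matching for `instanceof`, a standard language feature since Java 16. The sketch below shows the same idea without the Jackson dependency; `PatternMatchSketch` and `render` are hypothetical names used only for illustration.

```java
final class PatternMatchSketch {
  /** Describes a value, binding the narrowed type directly in the instanceof test. */
  static String render(Object value) {
    if (value instanceof CharSequence text) {
      // `text` is already typed as CharSequence; no explicit cast is needed.
      return "text:" + text;
    } else if (value instanceof Number number) {
      return "number:" + number.intValue();
    }
    // instanceof is false for null, so null also lands here.
    return "other";
  }
}
```

The binding variable removes the separate cast, which is all the suggested change does for `JsonNode`.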

davidangb · 2024-09-17T15:22:49Z

service/src/main/java/org/databiosphere/workspacedataservice/dao/RecordDao.java

+      throw new IllegalArgumentException("Array relations must not be empty");
+    }
+
+    var rootRecordType = arrayRelations.get(arrayRelations.size() - 1).relationRecordType();


see comment in ExpressionService about validating that all relations have the same root Type

they will not all have the same type and here we care only about the type of the last one because we are only selecting columns for that type

calypsomatic · 2024-09-17T17:58:43Z

There's a lot going on here -- it might be worth a walkthrough to go over the new ANTLR-related classes, or more commentary about how those new classes work for those that aren't already familiar with it from the Rawls implementation, but I wouldn't insist on it for my sake alone. I'll leave that up to you and the other members of AJ who may be more familiar.

I would also love a walkthrough!

davidangb

thank you! Two hopefully not-big comments inline; I'm giving this a proactive 👍 assuming you can tackle those two changes

davidangb · 2024-09-19T16:22:11Z

service/src/main/java/org/databiosphere/workspacedataservice/controller/RecordController.java

    this.recordOrchestratorService = recordOrchestratorService;
    this.permissionService = permissionService;
+    this.expressionService = expressionService;


these changes are now obsolete - RecordController doesn't need ExpressionService any more

davidangb · 2024-09-19T16:30:47Z

service/src/main/java/org/databiosphere/workspacedataservice/expressions/ExpressionService.java

+                    HttpStatus.BAD_REQUEST,
+                    "The relation %s in expression %s does not exist"
+                        .formatted(relationName, arrayRelationExpression)));
+  }


nice, I like this!

davidangb · 2024-09-19T16:45:50Z

service/src/main/resources/static/swagger/apis-v1.yaml

@@ -279,7 +279,66 @@ paths:
            application/json:
              schema:
                $ref: '#/components/schemas/DeleteRecordsResponse'
-
+  /records/v1/{collectionId}/{recordType}/{recordId}/evaluateExpressions:


to expose these APIs in the actual live swagger UIs, you'll need to add $refs to the main YAML file, cwds-api-docs-preview.yaml. See here for an example.

Our OpenAPI yaml is currently pretty complicated. We have:

apis-v1.yaml: source for autogeneration, containing new-style APIs, but never displayed to users directly

openapi-docs.yaml: APIs for the single-tenant data plane, includes $refs to apis-v1.yaml

cwds-api-docs.yaml: APIs for the multi-tenant control plane IN PROD, includes $refs to apis-v1.yaml

cwds-api-docs-preview.yaml: APIs for the multi-tenant control plane IN DEV, includes $refs to apis-v1.yaml

cwds-api-docs.yaml and cwds-api-docs-preview.yaml both exist for a short time while we're in appsec review and have some APIs turned on in dev but not in prod. The -preview will go away soon after we get approvals.

openapi-docs.yaml would go away once we migrate out of the data plane and into the multi-tenant control plane

so, if you want these APIs to appear in the dev env, but not in prod until we get approved, cwds-api-docs-preview.yaml is the right place.

sonarcloud · 2024-09-19T19:08:24Z

Quality Gate passed

Issues
18 New issues
0 Accepted issues

Measures
0 Security Hotspots
93.4% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

dvoet added 7 commits August 29, 2024 10:58

add antlr and TerraExpression grammar

e1858bc

wip

a4fa7ae

queries work

ce51818

works for single record

6c63afd

api layer

065d6ee

Merge remote-tracking branch 'origin/main' into expressions

6767b82

fixes post merge

5727087

davidangb self-requested a review September 10, 2024 15:45

dvoet added 5 commits September 10, 2024 14:20

spotless

d3c9f0c

controller tests

c92f4d1

swagger

5e629f1

Merge branch 'main' into expressions

2e649ac

implement hasNext

36b2bc8

dvoet changed the title from "Expressions" to "AJ-1959 Expression evaluation in WDS" Sep 12, 2024

dvoet marked this pull request as ready for review September 12, 2024 19:23

dvoet commented Sep 12, 2024

Merge branch 'main' into expressions

71b6d94

mspector approved these changes Sep 16, 2024

davidangb reviewed Sep 17, 2024

dvoet added 3 commits September 18, 2024 15:42

v1 api

a107928

PR

97f3ee1

Merge branch 'main' into expressions

f545058

davidangb approved these changes Sep 19, 2024

final PR

7207891

dvoet merged commit 467effe into main Sep 19, 2024
25 checks passed

dvoet deleted the expressions branch September 19, 2024 20:01
