
Aggregate task metrics #476

Merged: 12 commits into develop on Nov 21, 2023
Conversation

@kathy-t (Contributor) commented Nov 10, 2023

Description
Corresponding PR: dockstore/dockstore#5738

This PR modifies the metrics aggregator so that it aggregates task execution metrics into a workflow-level execution metric, which can then be aggregated with the other user-provided workflow execution metrics and aggregated metrics.

I did some refactoring of AggregationHelper because it was getting hard to keep track of all the aggregation methods for each metric type, and there was a lot of repetitive logic. I introduced an Aggregator interface that defines the methods common to aggregating each metric type, and each metric type now has an aggregator class that implements this interface. A lot of existing code from AggregationHelper was moved into these classes, so my suggestion is to focus on the task-related changes; I'll try to highlight them in the files.
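As a rough illustration of the refactor described above, the Aggregator interface might look something like this minimal sketch. The names, method signature, and types here are assumptions for illustration only, not the actual Dockstore classes:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the Aggregator refactor; names and signatures are
// illustrative assumptions, not the real Dockstore API.
public class AggregatorSketch {

    // Common operations for aggregating one metric type.
    interface Aggregator<T> {
        // Collapse the tasks of a single workflow run into one workflow-level value.
        Optional<T> getWorkflowValueFromTaskValues(List<T> taskValues);
    }

    // CPU: the workflow-level requirement is the highest requirement among tasks.
    static class CpuAggregator implements Aggregator<Integer> {
        @Override
        public Optional<Integer> getWorkflowValueFromTaskValues(List<Integer> taskCpus) {
            return taskCpus.stream().max(Integer::compareTo);
        }
    }

    public static void main(String[] args) {
        System.out.println(new CpuAggregator()
                .getWorkflowValueFromTaskValues(List.of(2, 8, 4))); // Optional[8]
    }
}
```

In the PR itself, each metric type (status, time, CPU, memory, cost) gets its own aggregator class, so the per-metric logic that used to live in one large helper is split along the interface.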

How a list of task executions is aggregated into a workflow-level execution for each metric type:

  • ExecutionStatus:
    • If all task executions are successful, then the workflow execution is successful
    • If there are failed task executions, which may be either FAILED_RUNTIME_INVALID or FAILED_SEMANTIC_INVALID, the workflow execution status is the most frequent failed status
  • ExecutionTime:
    • This one is trickier and is a best guess because we can't accurately calculate the duration of a workflow execution from the durations of its task executions. For example, if each task took 1 minute to run, the tasks may have executed sequentially or concurrently, and we currently have no way of distinguishing the two from the durations provided in executionTime.
    • This PR calculates a best guess duration using the earliest and latest dateExecuted from the list of tasks.
    • Ideally, we would record startTime and endTime so that we can calculate this accurately, but I think it's best to leave that for a follow-up ticket because it may affect existing metrics data that we have.
  • CpuRequirements:
    • The workflow-level CPU requirement is the highest CPU requirement from the list of tasks
  • MemoryRequirements:
    • The workflow-level memory requirement is the highest memory requirement from the list of tasks
  • Cost:
    • The workflow-level cost is the sum of costs from each task
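The rules above could be sketched roughly as follows. The Task record and helper names are hypothetical, purely for illustration (ExecutionTime is omitted here since it is derived from dates rather than task values):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the per-metric aggregation rules listed above.
// Names are illustrative; they do not match the real Dockstore classes.
public class TaskRollup {

    enum Status { SUCCESSFUL, FAILED_RUNTIME_INVALID, FAILED_SEMANTIC_INVALID }

    record Task(Status status, int cpu, double memoryGb, double cost) { }

    // ExecutionStatus: successful only if every task succeeded;
    // otherwise the most frequent failed status wins.
    static Status workflowStatus(List<Task> tasks) {
        Map<Status, Long> failures = tasks.stream()
                .map(Task::status)
                .filter(s -> s != Status.SUCCESSFUL)
                .collect(Collectors.groupingBy(s -> s, Collectors.counting()));
        if (failures.isEmpty()) {
            return Status.SUCCESSFUL;
        }
        return failures.entrySet().stream()
                .max(Comparator.comparingLong(Map.Entry::getValue))
                .get().getKey();
    }

    // CPU and memory: highest requirement across tasks.
    static int workflowCpu(List<Task> tasks) {
        return tasks.stream().mapToInt(Task::cpu).max().orElse(0);
    }

    static double workflowMemory(List<Task> tasks) {
        return tasks.stream().mapToDouble(Task::memoryGb).max().orElse(0);
    }

    // Cost: sum of all task costs.
    static double workflowCost(List<Task> tasks) {
        return tasks.stream().mapToDouble(Task::cost).sum();
    }

    public static void main(String[] args) {
        List<Task> tasks = List.of(
                new Task(Status.SUCCESSFUL, 2, 4.0, 0.10),
                new Task(Status.FAILED_RUNTIME_INVALID, 8, 16.0, 0.25),
                new Task(Status.FAILED_RUNTIME_INVALID, 4, 8.0, 0.05));
        System.out.println(workflowStatus(tasks)); // FAILED_RUNTIME_INVALID
        System.out.println(workflowCpu(tasks));    // 8
        System.out.println(workflowCost(tasks));   // total cost, roughly 0.40
    }
}
```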

Misc.

  • There are failing toolbackup tests, which I think is because I updated the dockstore version. Does anyone have ideas about how to resolve them?

Review Instructions
Submit task metrics in QA then run the metrics-aggregator. Confirm that the metrics show up in qa.dockstore.org for your workflow.

Issue
SEAB-5944

Security
If there are any concerns that require extra attention from the security team, highlight them here.

Please make sure that you've checked the following before submitting your pull request. Thanks!

  • Check that you pass the basic style checks and unit tests by running mvn clean install in the project that you have modified (until https://ucsc-cgl.atlassian.net/browse/SEAB-5300 adds multi-module support properly)
  • Ensure that the PR targets the correct branch. Check the milestone or fix version of the ticket.
  • If you are changing dependencies, check with dependabot to ensure you are not introducing new high/critical vulnerabilities
  • If this PR is for a user-facing feature, create and link a documentation ticket for this feature (usually in the same milestone as the linked issue). Style points if you create a documentation PR directly and link that instead.

}

@Override
public Optional<RunExecution> getWorkflowExecutionFromTaskExecutions(TaskExecutions taskExecutionsForOneWorkflowRun) {
Contributor Author:

This is the new function that aggregates tasks into a workflow execution for the cost metric

}

@Override
public Optional<RunExecution> getWorkflowExecutionFromTaskExecutions(TaskExecutions taskExecutionsForOneWorkflowRun) {
Contributor Author:

This is the new function that aggregates tasks into a workflow execution for the CPU metric

}

@Override
public Optional<RunExecution> getWorkflowExecutionFromTaskExecutions(TaskExecutions taskExecutionsForOneWorkflowRun) {
Contributor Author:

This is the new function that aggregates tasks into a workflow execution for the execution status metric

}

@Override
public Optional<RunExecution> getWorkflowExecutionFromTaskExecutions(TaskExecutions taskExecutionsForOneWorkflowRun) {
Contributor Author:

This is the new function that aggregates tasks into a workflow execution for the execution time metric

}

@Override
public Optional<RunExecution> getWorkflowExecutionFromTaskExecutions(TaskExecutions taskExecutionsForOneWorkflowRun) {
Contributor Author:

This is the new function that aggregates tasks into a workflow execution for the memory metric

Comment on lines 69 to 79
final List<RunExecution> workflowExecutions = new ArrayList<>(allSubmissions.getRunExecutions());

// If task executions are present, calculate the workflow RunExecution containing the overall workflow-level execution time for each list of tasks
if (!allSubmissions.getTaskExecutions().isEmpty()) {
    final List<RunExecution> calculatedWorkflowExecutionsFromTasks = allSubmissions.getTaskExecutions().stream()
            .map(taskExecutions -> getWorkflowExecutionFromTaskExecutions(taskExecutions))
            .filter(Optional::isPresent)
            .map(Optional::get)
            .toList();
    workflowExecutions.addAll(calculatedWorkflowExecutionsFromTasks);
}
Contributor Author:

This is the change in the main aggregation function that aggregates task executions into workflow executions. The rest of the code in this function is existing logic that was moved over from each individual getAggregated<Metric> function in AggregationHelper

pom.xml (Outdated)
@@ -38,7 +38,7 @@

     <github.url>scm:git:git@github.com:dockstore/dockstore-support.git</github.url>
     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
-    <dockstore-core.version>1.15.0-alpha.5</dockstore-core.version>
+    <dockstore-core.version>1.15.0-SNAPSHOT</dockstore-core.version>
Contributor Author:

I will update this to a tag when dockstore/dockstore#5738 is merged

codecov bot commented Nov 10, 2023

Codecov Report

Attention: 29 lines in your changes are missing coverage. Please review.

Comparison is base (ff68caf) 42.42% compared to head (1bd65af) 52.79%.

Files Patch % Lines
...store/metricsaggregator/helper/CostAggregator.java 82.22% 2 Missing and 6 partials ⚠️
...ricsaggregator/helper/ExecutionTimeAggregator.java 86.53% 1 Missing and 6 partials ⚠️
...csaggregator/helper/ExecutionStatusAggregator.java 81.25% 2 Missing and 4 partials ⚠️
...kstore/metricsaggregator/helper/CpuAggregator.java 87.87% 1 Missing and 3 partials ⚠️
...ore/metricsaggregator/helper/MemoryAggregator.java 87.09% 1 Missing and 3 partials ⚠️
Additional details and impacted files
@@              Coverage Diff               @@
##             develop     #476       +/-   ##
==============================================
+ Coverage      42.42%   52.79%   +10.36%     
- Complexity       171      248       +77     
==============================================
  Files             24       30        +6     
  Lines           1591     1665       +74     
  Branches         131      141       +10     
==============================================
+ Hits             675      879      +204     
+ Misses           878      721      -157     
- Partials          38       65       +27     
Flag Coverage Δ
metricsaggregator 43.42% <87.16%> (+0.99%) ⬆️
toolbackup 42.76% <83.62%> (+10.64%) ⬆️
tooltester 33.39% <83.62%> (+1.27%) ⬆️

Flags with carried forward coverage won't be shown.


@kathy-t kathy-t marked this pull request as ready for review November 10, 2023 21:34
@coverbeck (Contributor) left a comment:

There are failed tool-backup tests which I think is because I updated the dockstore version. Anyone have any ideas about how to resolve it?

Not off-hand, I'm not familiar with the code. I'd be OK with disabling it for now, with a followup ticket. Or Denis should be back soon and should know more about it.

.max(Date::compareTo);

if (earliestTaskExecutionDate.isPresent() && latestTaskExecutionDate.isPresent()) {
long durationInMs = latestTaskExecutionDate.get().getTime() - earliestTaskExecutionDate.get().getTime();


If we're trying to calculate entry-level "wall clock" execution time, assuming the task executionTime is also wall clock time, and the execution dates are defined as the start time of task execution, you could compute a lastTaskEndedDate by incorporating the task's executionTime (if it exists), and from that, a more accurate duration as:

 long durationInMs = latestTaskEndedDate.get().getTime() - earliestTaskExecutionDate.get().getTime();
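A minimal sketch of this first suggestion, assuming each task's execution date is its start time and executionTime is a wall-clock duration. All names here are hypothetical, not the actual Dockstore code:

```java
import java.time.Duration;
import java.util.Date;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the reviewer's suggestion: treat each task's
// execution date as its start time, add its executionTime to get an end
// time, and use latest-end minus earliest-start as the wall-clock duration.
public class WallClockEstimate {

    record Task(Date started, Duration executionTime) { }

    static Optional<Duration> estimate(List<Task> tasks) {
        Optional<Date> earliestStart = tasks.stream().map(Task::started).min(Date::compareTo);
        Optional<Date> latestEnd = tasks.stream()
                .map(t -> new Date(t.started().getTime() + t.executionTime().toMillis()))
                .max(Date::compareTo);
        if (earliestStart.isPresent() && latestEnd.isPresent()) {
            return Optional.of(Duration.ofMillis(latestEnd.get().getTime() - earliestStart.get().getTime()));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Two overlapping tasks: one starts at t=0 and runs 5 min, another
        // starts at t=2 min and runs 10 min, so the run spans 12 min overall.
        List<Task> tasks = List.of(
                new Task(new Date(0), Duration.ofMinutes(5)),
                new Task(new Date(120_000), Duration.ofMinutes(10)));
        System.out.println(estimate(tasks).get().toMinutes()); // 12
    }
}
```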


Another idea is to compute "wall clock" duration estimates via several different methods: maximum of all execution times, duration as difference between min/max execution dates, etc. If we know that each method produces a lower bound on the duration, take the max at the end.
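A sketch of this second idea, assuming each estimator is known to under-estimate the true wall-clock duration, so the largest estimate is the tightest lower bound (names hypothetical):

```java
import java.time.Duration;
import java.util.List;

// Hypothetical sketch: each estimation method produces a lower bound on the
// true wall-clock duration, so taking the max yields the tightest bound.
public class DurationBounds {

    static Duration bestLowerBound(List<Duration> lowerBoundEstimates) {
        return lowerBoundEstimates.stream()
                .max(Duration::compareTo)
                .orElse(Duration.ZERO);
    }

    public static void main(String[] args) {
        // e.g. longest single task time = 10 min; spread of start dates = 7 min
        Duration best = bestLowerBound(List.of(Duration.ofMinutes(10), Duration.ofMinutes(7)));
        System.out.println(best.toMinutes()); // 10
    }
}
```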

Contributor Author:

I implemented your first suggestion by adding the executionTime of the last task to the calculated wall-clock execution time

.minimum(newStatistic.getMinimum())
.maximum(newStatistic.getMaximum())
.average(newStatistic.getAverage())
.numberOfDataPointsForAverage(newStatistic.getNumberOfDataPoints()));
@svonworl commented Nov 14, 2023:

This code repeats a lot, possibly we could create a function that does the same thing in the base Metric, so we could shorten to something like:

return Optional.of(new MemoryMetric().set(newStatistic))?

Contributor Author:

Attempted to create a helper function in Statistics that would do this, but it requires that the StatisticMetric subclasses (CpuMetric, MemoryMetric, ExecutionTimeMetric, CostMetric) define their inheritance structure in the openapi.yaml. I took a crack at this and it resulted in some weird side effects. It felt a little risky pursuing this any further since we're so close to the release of 1.15 and I didn't want to accidentally break anything.

Perhaps it can be an enhancement for 1.16

Contributor Author:

These were the changes that were needed to get the toolbackup tests to pass, FYI @denis-yuen

@denis-yuen (Member) commented:

SonarCloud Quality Gate failed.

Think these are trivial but worth it if going in for something else

List<Cost> taskCosts = taskExecutions.stream()
.map(RunExecution::getCost)
.toList();
boolean containsMalformedCurrencies = taskCosts.stream().anyMatch(cost -> !isValidCurrencyCode(cost.getCurrency()));
Member:

Not an issue yet since we don't have examples of cost data yet. But it could be a follow-up issue: if I understand properly, the total cost is skipped entirely when any currencies are malformed. Would it be an improvement to instead provide a total of the properly formed currencies, with an asterisk or warning?

Member:

On the other hand, it might be ok if this is just ignoring one workflow run with malformed currencies and not other runs. (which I think is the case)

So maybe, just confirm my understanding.

Contributor Author:

Would it be an improvement to instead provide a total of the properly formed currencies, with an asterisk or warning?

Hmm, in this function, I'm not sure if it would be worth it since these tasks make up one workflow execution.

On the other hand, it might be ok if this is just ignoring one workflow run with malformed currencies and not other runs. (which I think is the case)

In this function, which aggregates the task execution metrics belonging to a single workflow execution into one workflow-level execution metric, your understanding is correct: it will just ignore the one set of task executions belonging to one workflow run.

However, in the function below where we get the aggregated metric from workflow executions, we don't aggregate the metric if one of the executions is malformed. This was an existing pattern prior to this PR and also applies to the execution time metric. It sounds like we should instead ignore the one malformed workflow run and aggregate the others?
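The alternative being discussed (summing only the well-formed costs and flagging that some were skipped) might be sketched like this. All names are hypothetical, and java.util.Currency stands in for whatever currency validation Dockstore actually uses:

```java
import java.util.Currency;
import java.util.List;

// Hypothetical sketch of the suggested alternative: sum only the costs
// whose currency codes are valid and flag that some were skipped, instead
// of refusing to aggregate the whole set when any currency is malformed.
public class PartialCostSum {

    record Cost(double value, String currency) { }

    record Result(double total, boolean someSkipped) { }

    static boolean isValidCurrencyCode(String code) {
        try {
            Currency.getInstance(code); // throws on unknown ISO 4217 codes
            return true;
        } catch (IllegalArgumentException | NullPointerException e) {
            return false;
        }
    }

    static Result sumValidCosts(List<Cost> costs) {
        double total = costs.stream()
                .filter(c -> isValidCurrencyCode(c.currency()))
                .mapToDouble(Cost::value)
                .sum();
        boolean someSkipped = costs.stream().anyMatch(c -> !isValidCurrencyCode(c.currency()));
        return new Result(total, someSkipped);
    }

    public static void main(String[] args) {
        Result r = sumValidCosts(List.of(
                new Cost(1.50, "USD"), new Cost(0.75, "USD"), new Cost(2.00, "NOPE")));
        System.out.println(r.total() + " (skipped some: " + r.someSkipped() + ")");
    }
}
```

The someSkipped flag is where the "asterisk or warning" from the discussion would hang off.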


if (earliestTaskExecutionDate.isPresent() && latestTaskExecutionDate.isPresent() && latestTaskExecuted.isPresent()) {
    // Execution dates are the start dates; calculate a rough duration from the execution dates of the earliest and latest tasks
    long durationInMs = latestTaskExecutionDate.get().getTime() - earliestTaskExecutionDate.get().getTime();
    Duration duration = Duration.of(durationInMs, ChronoUnit.MILLIS);
Member:

FWIW, I can't imagine people would care much about anything more granular than minutes. Assuming the most granular billing is per minute.

That said, this probably gets converted for display anyway

@@ -228,6 +229,11 @@
<groupId>org.glassfish.hk2</groupId>
<artifactId>hk2-api</artifactId>
</dependency>
<dependency>
Member:

Think this is one of the artifacts that is removed in Java 11
https://stackoverflow.com/questions/52502189/java-11-package-javax-xml-bind-does-not-exist

I think a follow-up ticket should update this to jakarta but probably not a high priority.


sonarcloud bot commented Nov 21, 2023

SonarCloud Quality Gate failed.

0 Bugs (rated A)
0 Vulnerabilities (rated A)
0 Security Hotspots (rated A)
0 Code Smells (rated A)

0.0% Coverage
0.0% Duplication


@kathy-t kathy-t merged commit f23a0a5 into develop Nov 21, 2023
10 of 11 checks passed
@kathy-t kathy-t deleted the feature/seab-5944/aggregate-task-metrics branch November 21, 2023 15:51