Ingest Terra metrics #478

Merged (20 commits) on Jan 19, 2024
Changes from 9 commits
18 changes: 9 additions & 9 deletions THIRD-PARTY-LICENSES.txt
@@ -131,10 +131,10 @@ Lists of 417 third-party dependencies.
(The Apache Software License, Version 2.0) docker-java-core (com.github.docker-java:docker-java-core:3.3.0 - https://github.com/docker-java/docker-java)
(The Apache Software License, Version 2.0) docker-java-transport (com.github.docker-java:docker-java-transport:3.3.0 - https://github.com/docker-java/docker-java)
(The Apache Software License, Version 2.0) docker-java-transport-httpclient5 (com.github.docker-java:docker-java-transport-httpclient5:3.3.0 - https://github.com/docker-java/docker-java)
(Apache Software License, Version 2.0) dockstore-common (io.dockstore:dockstore-common:1.15.0-alpha.13 - no url defined)
(Apache Software License, Version 2.0) dockstore-integration-testing (io.dockstore:dockstore-integration-testing:1.15.0-alpha.13 - no url defined)
(Apache Software License, Version 2.0) dockstore-language-plugin-parent (io.dockstore:dockstore-language-plugin-parent:1.15.0-alpha.13 - no url defined)
(Apache Software License, Version 2.0) dockstore-webservice (io.dockstore:dockstore-webservice:1.15.0-alpha.13 - no url defined)
(Apache Software License, Version 2.0) dockstore-common (io.dockstore:dockstore-common:1.15.0-SNAPSHOT - no url defined)
(Apache Software License, Version 2.0) dockstore-integration-testing (io.dockstore:dockstore-integration-testing:1.15.0-SNAPSHOT - no url defined)
(Apache Software License, Version 2.0) dockstore-language-plugin-parent (io.dockstore:dockstore-language-plugin-parent:1.15.0-SNAPSHOT - no url defined)
(Apache Software License, Version 2.0) dockstore-webservice (io.dockstore:dockstore-webservice:1.15.0-SNAPSHOT - no url defined)
(Apache License 2.0) Dropwizard (io.dropwizard:dropwizard-core:4.0.2 - http://www.dropwizard.io/4.0.2/dropwizard-bom/dropwizard-dependencies/dropwizard-parent/dropwizard-core)
(Apache License 2.0) Dropwizard Asset Bundle (io.dropwizard:dropwizard-assets:4.0.2 - http://www.dropwizard.io/4.0.2/dropwizard-bom/dropwizard-dependencies/dropwizard-parent/dropwizard-assets)
(Apache License 2.0) Dropwizard Authentication (io.dropwizard:dropwizard-auth:4.0.2 - http://www.dropwizard.io/4.0.2/dropwizard-bom/dropwizard-dependencies/dropwizard-parent/dropwizard-auth)
@@ -309,9 +309,9 @@ Lists of 417 third-party dependencies.
(MIT License) liquibase-slf4j (com.mattbertolini:liquibase-slf4j:5.0.0 - https://github.com/mattbertolini/liquibase-slf4j)
(Apache License 2.0) localstack-utils (cloud.localstack:localstack-utils:0.2.22 - http://localstack.cloud)
(Apache Software Licenses) Log4j Implemented Over SLF4J (org.slf4j:log4j-over-slf4j:2.0.9 - http://www.slf4j.org)
(Eclipse Public License - v 1.0) (GNU Lesser General Public License) Logback Access Module (ch.qos.logback:logback-access:1.4.11 - http://logback.qos.ch/logback-access)
(Eclipse Public License - v 1.0) (GNU Lesser General Public License) Logback Classic Module (ch.qos.logback:logback-classic:1.4.11 - http://logback.qos.ch/logback-classic)
(Eclipse Public License - v 1.0) (GNU Lesser General Public License) Logback Core Module (ch.qos.logback:logback-core:1.4.11 - http://logback.qos.ch/logback-core)
(Eclipse Public License - v 1.0) (GNU Lesser General Public License) Logback Access Module (ch.qos.logback:logback-access:1.4.12 - http://logback.qos.ch/logback-access)
(Eclipse Public License - v 1.0) (GNU Lesser General Public License) Logback Classic Module (ch.qos.logback:logback-classic:1.4.12 - http://logback.qos.ch/logback-classic)
(Eclipse Public License - v 1.0) (GNU Lesser General Public License) Logback Core Module (ch.qos.logback:logback-core:1.4.12 - http://logback.qos.ch/logback-core)
(Apache License, Version 2.0) (MIT License) Logstash Logback Encoder (net.logstash.logback:logstash-logback-encoder:4.11 - https://github.com/logstash/logstash-logback-encoder)
(Apache License, Version 2.0) Lucene Core (org.apache.lucene:lucene-core:8.7.0 - https://lucene.apache.org/lucene-parent/lucene-core)
(MIT) mbknor-jackson-jsonSchema (com.kjetland:mbknor-jackson-jsonschema_2.12:1.0.34 - https://github.com/mbknor/mbknor-jackson-jsonSchema)
@@ -354,7 +354,7 @@ Lists of 417 third-party dependencies.
(Apache License, Version 2.0) Objenesis (org.objenesis:objenesis:3.2 - http://objenesis.org/objenesis)
(The Apache Software License, Version 2.0) okhttp (com.squareup.okhttp3:okhttp:4.10.0 - https://square.github.io/okhttp/)
(The Apache Software License, Version 2.0) okio (com.squareup.okio:okio-jvm:3.0.0 - https://github.com/square/okio/)
(Apache Software License, Version 2.0) openapi-java-client (io.dockstore:openapi-java-client:1.15.0-alpha.13 - no url defined)
(Apache Software License, Version 2.0) openapi-java-client (io.dockstore:openapi-java-client:1.15.0-SNAPSHOT - no url defined)
(The Apache License, Version 2.0) OpenCensus (io.opencensus:opencensus-api:0.31.0 - https://github.com/census-instrumentation/opencensus-java)
(Apache 2) opencsv (com.opencsv:opencsv:5.7.1 - http://opencsv.sf.net)
(Apache 2.0) optics (io.circe:circe-optics_2.13:0.14.1 - https://github.com/circe/circe-optics)
@@ -395,7 +395,7 @@ Lists of 417 third-party dependencies.
(Apache License 2.0) swagger-core-jakarta (io.swagger.core.v3:swagger-core-jakarta:2.2.15 - https://github.com/swagger-api/swagger-core/modules/swagger-core-jakarta)
(Apache License 2.0) swagger-integration-jakarta (io.swagger.core.v3:swagger-integration-jakarta:2.2.15 - https://github.com/swagger-api/swagger-core/modules/swagger-integration-jakarta)
(Apache Software License, Version 2.0) swagger-java-bitbucket-client (io.dockstore:swagger-java-bitbucket-client:2.0.3 - no url defined)
(Apache Software License, Version 2.0) swagger-java-client (io.dockstore:swagger-java-client:1.15.0-alpha.13 - no url defined)
(Apache Software License, Version 2.0) swagger-java-client (io.dockstore:swagger-java-client:1.15.0-SNAPSHOT - no url defined)
(Apache Software License, Version 2.0) swagger-java-discourse-client (io.dockstore:swagger-java-discourse-client:2.0.1 - no url defined)
(Apache Software License, Version 2.0) swagger-java-quay-client (io.dockstore:swagger-java-quay-client:2.0.2 - no url defined)
(Apache Software License, Version 2.0) swagger-java-sam-client (io.dockstore:swagger-java-sam-client:2.0.2 - no url defined)
29 changes: 28 additions & 1 deletion metricsaggregator/README.md
@@ -79,6 +79,22 @@ Usage: <main class> [options] [command] [command options]
Possible Values: [MINIWDL, WOMTOOL, CWLTOOL, NF_VALIDATION, OTHER]
* -vv, --validatorVersion
The version of the validator tool used to validate the workflows

submit-terra-metrics Submits workflow metrics provided by Terra via a
CSV file to Dockstore
Usage: submit-terra-metrics [options]
Options:
-c, --config
The config file path.
Default: ./metrics-aggregator.config
* -d, --data
The file path to the CSV file containing workflow metrics from
Terra. The first line of the file should contain the CSV fields: workflow_id,status,workflow_start,workflow_end,workflow_runtime_minutes,source_url
--help
Prints help for metricsaggregator
-r, --recordSkipped
Record skipped executions and the reason skipped to a CSV file
Default: false
```
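For reference, a minimal input file matching the documented header might look like the sketch below. All values are illustrative, not real Terra data, and the exact status and timestamp formats accepted are assumptions.

```
workflow_id,status,workflow_start,workflow_end,workflow_runtime_minutes,source_url
4a2b0a8e-0000-0000-0000-000000000000,Succeeded,2023-05-01T12:00:00Z,2023-05-01T12:42:00Z,42,https://raw.githubusercontent.com/example/repo/main/workflow.wdl
```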

### aggregate-metrics
@@ -101,4 +117,15 @@ java -jar target/metricsaggregator-*-SNAPSHOT.jar submit-validation-data --confi
--data <path-to-my-data-file> --validator MINIWDL --validatorVersion 1.0 --platform DNA_STACK
```

After running this command, you will want to run the `aggregate-metrics` command to aggregate the newly submitted validation data.

### submit-terra-metrics

The following example command submits metrics from a Terra-provided CSV file, recording any skipped executions and the reason they were skipped to an output CSV file.

```
java -jar target/metricsaggregator-*-SNAPSHOT.jar submit-terra-metrics --config my-custom-config \
--data <path-to-terra-metrics-csv-file> --recordSkipped
```

After running this command, you will want to run the `aggregate-metrics` command to aggregate the newly submitted Terra metrics.
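The CSV handling can be sketched in plain Java as follows. This is a minimal, stdlib-only illustration of the documented input format: the real client uses Apache commons-csv, the class and method names here are hypothetical, and a naive `split` does not handle quoted commas.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TerraCsvSketch {

    // Header per the README: the first line of the data file must match this.
    public static final String HEADER =
            "workflow_id,status,workflow_start,workflow_end,workflow_runtime_minutes,source_url";

    /**
     * Parses CSV lines (header first). Rows with the wrong column count are
     * appended to skippedOut with a reason, loosely mirroring --recordSkipped.
     */
    public static List<Map<String, String>> parse(List<String> lines, List<String> skippedOut) {
        String[] fields = HEADER.split(",");
        List<Map<String, String>> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] values = line.split(",", -1); // -1 keeps trailing empty fields
            if (values.length != fields.length) {
                skippedOut.add(line + ",wrong_column_count");
                continue;
            }
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < fields.length; i++) {
                row.put(fields[i], values[i]);
            }
            rows.add(row);
        }
        return rows;
    }
}
```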
14 changes: 14 additions & 0 deletions metricsaggregator/pom.xml
@@ -132,6 +132,14 @@
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
</dependency>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-core</artifactId>
</dependency>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
</dependency>
<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>s3</artifactId>
@@ -144,6 +152,10 @@
<groupId>javax.money</groupId>
<artifactId>money-api</artifactId>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-csv</artifactId>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-api</artifactId>
@@ -278,6 +290,8 @@
<usedDependency>org.glassfish.jersey.inject:jersey-hk2</usedDependency>
<usedDependency>javax.money:money-api</usedDependency>
<usedDependency>org.javamoney.moneta:moneta-core</usedDependency>
<usedDependency>ch.qos.logback:logback-classic</usedDependency>
<usedDependency>ch.qos.logback:logback-core</usedDependency>
</usedDependencies>
</configuration>
</execution>
@@ -48,6 +48,9 @@ public class MetricsAggregatorS3Client {

private static final Logger LOG = LoggerFactory.getLogger(MetricsAggregatorS3Client.class);
private static final Gson GSON = new Gson();
private int numberOfDirectoriesProcessed = 0;
private int numberOfMetricsSubmitted = 0;
private int numberOfMetricsSkipped = 0;

private final String bucketName;

@@ -67,60 +70,72 @@ public MetricsAggregatorS3Client(String bucketName, String s3EndpointOverride) t
}

public void aggregateMetrics(ExtendedGa4GhApi extendedGa4GhApi) {
LOG.info("Getting directories to process...");
List<S3DirectoryInfo> metricsDirectories = getDirectories();

if (metricsDirectories.isEmpty()) {
System.out.println("No directories found to aggregate metrics");
LOG.info("No directories found to aggregate metrics");
return;
}

System.out.println("Aggregating metrics...");
for (S3DirectoryInfo directoryInfo : metricsDirectories) {
String toolId = directoryInfo.toolId();
String versionName = directoryInfo.versionId();
List<String> platforms = directoryInfo.platforms();
String platformsString = String.join(", ", platforms);
String versionS3KeyPrefix = directoryInfo.versionS3KeyPrefix();

// Collect metrics for each platform, so we can calculate metrics across all platforms
List<Metrics> allMetrics = new ArrayList<>();
for (String platform : platforms) {
ExecutionsRequestBody allSubmissions;
try {
allSubmissions = getExecutions(toolId, versionName, platform);
} catch (Exception e) {
LOG.error("Error aggregating metrics: Could not get all executions from directory {}", versionS3KeyPrefix, e);
continue; // Continue aggregating metrics for other directories
}

try {
getAggregatedMetrics(allSubmissions).ifPresent(metrics -> {
extendedGa4GhApi.aggregatedMetricsPut(metrics, platform, toolId, versionName);
System.out.printf("Aggregated metrics for tool ID %s, version %s, platform %s from directory %s%n", toolId, versionName, platform, versionS3KeyPrefix);
allMetrics.add(metrics);
});
} catch (Exception e) {
LOG.error("Error aggregating metrics: Could not put all executions from directory {}", versionS3KeyPrefix, e);
// Continue aggregating metrics for other platforms
}
LOG.info("Aggregating metrics for {} directories", metricsDirectories.size());
metricsDirectories.stream()
.parallel()
.forEach(directoryInfo -> aggregateMetricsForDirectory(directoryInfo, extendedGa4GhApi));
LOG.info("Completed aggregating metrics. Processed {} directories, submitted {} metrics, and skipped {} metrics", numberOfDirectoriesProcessed, numberOfMetricsSubmitted, numberOfMetricsSkipped);
}

private void aggregateMetricsForDirectory(S3DirectoryInfo directoryInfo, ExtendedGa4GhApi extendedGa4GhApi) {
LOG.info("Processing directory {}", directoryInfo);
String toolId = directoryInfo.toolId();
String versionName = directoryInfo.versionId();
List<String> platforms = directoryInfo.platforms();
String platformsString = String.join(", ", platforms);
String versionS3KeyPrefix = directoryInfo.versionS3KeyPrefix();

// Collect metrics for each platform, so we can calculate metrics across all platforms
List<Metrics> allMetrics = new ArrayList<>();
for (String platform : platforms) {
ExecutionsRequestBody allSubmissions;
try {
allSubmissions = getExecutions(toolId, versionName, platform);
} catch (Exception e) {
LOG.error("Error aggregating metrics: Could not get all executions from directory {}", versionS3KeyPrefix, e);
++numberOfMetricsSkipped;
continue; // Continue aggregating metrics for other directories
}

if (!allMetrics.isEmpty()) {
// Calculate metrics across all platforms by aggregating the aggregated metrics from each platform
try {
getAggregatedMetrics(new ExecutionsRequestBody().aggregatedExecutions(allMetrics)).ifPresent(metrics -> {
extendedGa4GhApi.aggregatedMetricsPut(metrics, Partner.ALL.name(), toolId, versionName);
System.out.printf("Aggregated metrics across all platforms (%s) for tool ID %s, version %s from directory %s%n",
platformsString, toolId, versionName, versionS3KeyPrefix);
allMetrics.add(metrics);
});
} catch (Exception e) {
LOG.error("Error aggregating metrics across all platforms ({}) for tool ID {}, version {} from directory {}", platformsString, toolId, versionName, versionS3KeyPrefix, e);
// Continue aggregating metrics for other directories
}
try {
getAggregatedMetrics(allSubmissions).ifPresent(metrics -> {
extendedGa4GhApi.aggregatedMetricsPut(metrics, platform, toolId, versionName);
System.out.printf("Aggregated metrics for tool ID %s, version %s, platform %s from directory %s%n", toolId, versionName, platform, versionS3KeyPrefix);
allMetrics.add(metrics);
});
} catch (Exception e) {
LOG.error("Error aggregating metrics: Could not put all executions from directory {}", versionS3KeyPrefix, e);
// Continue aggregating metrics for other platforms
}
}

if (!allMetrics.isEmpty()) {
// Calculate metrics across all platforms by aggregating the aggregated metrics from each platform
try {
getAggregatedMetrics(new ExecutionsRequestBody().aggregatedExecutions(allMetrics)).ifPresent(metrics -> {
extendedGa4GhApi.aggregatedMetricsPut(metrics, Partner.ALL.name(), toolId, versionName);
System.out.printf("Aggregated metrics across all platforms (%s) for tool ID %s, version %s from directory %s%n",
platformsString, toolId, versionName, versionS3KeyPrefix);
allMetrics.add(metrics);
});
} catch (Exception e) {
LOG.error("Error aggregating metrics across all platforms ({}) for tool ID {}, version {} from directory {}", platformsString, toolId, versionName, versionS3KeyPrefix, e);
// Continue aggregating metrics for other directories
}
++numberOfMetricsSubmitted;
} else {
++numberOfMetricsSkipped;
}
System.out.println("Completed aggregating metrics");
++numberOfDirectoriesProcessed;
LOG.info("Processed {} directories", numberOfDirectoriesProcessed);
}

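One thing worth noting about the hunk above: the new counters (`numberOfDirectoriesProcessed`, `numberOfMetricsSubmitted`, `numberOfMetricsSkipped`) are plain `int` fields incremented from a parallel stream, and `++` on an `int` is not atomic. A hedged sketch of a thread-safe alternative, not part of the PR, using `AtomicInteger`:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelCounterSketch {

    private final AtomicInteger processed = new AtomicInteger();
    private final AtomicInteger skipped = new AtomicInteger();

    public void run(List<String> directories) {
        directories.parallelStream().forEach(dir -> {
            if (dir.isEmpty()) {
                skipped.incrementAndGet();   // analogous to ++numberOfMetricsSkipped
            }
            processed.incrementAndGet();     // analogous to ++numberOfDirectoriesProcessed
        });
    }

    public int getProcessed() {
        return processed.get();
    }

    public int getSkipped() {
        return skipped.get();
    }
}
```

The atomic increments make the final tallies reliable regardless of how the parallel stream schedules its work.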
@@ -42,7 +42,7 @@ public File getConfig() {
}

@Parameters(commandNames = { "submit-validation-data" }, commandDescription = "Formats workflow validation data specified in a file then submits it to Dockstore")
public static class SubmitValidationData extends CommandLineArgs {
@Parameter(names = {"-c", "--config"}, description = "The config file path.")
private File config = new File("./" + MetricsAggregatorClient.CONFIG_FILE_NAME);

@@ -78,4 +78,45 @@ public Partner getPlatform() {
return platform;
}
}

@Parameters(commandNames = { "submit-terra-metrics" }, commandDescription = "Submits workflow metrics provided by Terra via a CSV file to Dockstore")
public static class SubmitTerraMetrics extends CommandLineArgs {
@Parameter(names = {"-c", "--config"}, description = "The config file path.")
private File config = new File("./" + MetricsAggregatorClient.CONFIG_FILE_NAME);


@Parameter(names = {"-d", "--data"}, description = "The file path to the CSV file containing workflow metrics from Terra. The first line of the file should contain the CSV fields: workflow_id,status,workflow_start,workflow_end,workflow_runtime_minutes,source_url", required = true)
private String dataFilePath;

@Parameter(names = {"-r", "--recordSkipped"}, description = "Record skipped executions and the reason skipped to a CSV file")
private boolean recordSkippedExecutions;

public File getConfig() {
return config;
}


public String getDataFilePath() {
return dataFilePath;
}

public boolean isRecordSkippedExecutions() {
return recordSkippedExecutions;
}

/**
* Headers for the input data file
*/
public enum TerraMetricsCsvHeaders {
workflow_id, status, workflow_start, workflow_end, workflow_runtime_minutes, source_url
}

/**
* Headers for the output file containing workflow executions that were skipped.
     * The headers are the same as the input file headers, with the addition of a "reason_skipped" header indicating why an execution was skipped.
*/
public enum SkippedTerraMetricsCsvHeaders {
workflow_id, status, workflow_start, workflow_end, workflow_runtime_minutes, source_url, reason_skipped
}
}
}
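A small illustration of how the header enums above can drive the output file: the CSV header line can be derived directly from the enum constant names. The class and enum names below are hypothetical, not from the PR.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class SkippedHeaderSketch {

    // Mirrors SkippedTerraMetricsCsvHeaders: the input headers plus reason_skipped.
    enum SkippedHeaders {
        workflow_id, status, workflow_start, workflow_end, workflow_runtime_minutes, source_url, reason_skipped
    }

    // Joins the enum constant names into a CSV header line.
    public static String headerLine() {
        return Arrays.stream(SkippedHeaders.values())
                .map(Enum::name)
                .collect(Collectors.joining(","));
    }
}
```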