diff --git a/THIRD-PARTY-LICENSES.txt b/THIRD-PARTY-LICENSES.txt index 9b4ac2f1..3ab6c02c 100644 --- a/THIRD-PARTY-LICENSES.txt +++ b/THIRD-PARTY-LICENSES.txt @@ -1,5 +1,5 @@ -Lists of 416 third-party dependencies. +Lists of 426 third-party dependencies. (The Apache Software License, Version 2.0) Adapter: RxJava 2 (com.squareup.retrofit2:adapter-rxjava2:2.9.0 - https://github.com/square/retrofit) (Apache License, Version 2.0) akka-actor (com.typesafe.akka:akka-actor_2.13:2.5.32 - https://akka.io/) (Apache License, Version 2.0) akka-protobuf (com.typesafe.akka:akka-protobuf_2.13:2.5.32 - https://akka.io/) @@ -12,7 +12,7 @@ Lists of 416 third-party dependencies. (EPL 2.0) (GPL2 w/ CPE) aopalliance version 1.0 repackaged as a module (org.glassfish.hk2.external:aopalliance-repackaged:3.0.4 - https://github.com/eclipse-ee4j/glassfish-hk2/external/aopalliance-repackaged) (Apache-2.0) Apache Avro (org.apache.avro:avro:1.11.3 - https://avro.apache.org) (Apache License, Version 2.0) Apache Commons BeanUtils (commons-beanutils:commons-beanutils:1.9.4 - https://commons.apache.org/proper/commons-beanutils/) - (Apache License, Version 2.0) Apache Commons Codec (commons-codec:commons-codec:1.15 - https://commons.apache.org/proper/commons-codec/) + (Apache-2.0) Apache Commons Codec (commons-codec:commons-codec:1.17.1 - https://commons.apache.org/proper/commons-codec/) (Apache License, Version 2.0) Apache Commons Collections (commons-collections:commons-collections:3.2.2 - http://commons.apache.org/collections/) (Apache-2.0) Apache Commons Compress (org.apache.commons:commons-compress:1.26.0 - https://commons.apache.org/proper/commons-compress/) (Apache-2.0) (The Apache Software License, Version 2.0) Apache Commons Configuration (org.apache.commons:commons-configuration2:2.10.1 - https://commons.apache.org/proper/commons-configuration/) @@ -45,32 +45,44 @@ Lists of 416 third-party dependencies. 
(Apache License 2.0) Asynchronous Http Client Netty Utils (org.asynchttpclient:async-http-client-netty-utils:2.10.3 - http://github.com/AsyncHttpClient/async-http-client/async-http-client-netty-utils) (Apache 2.0) AutoValue Annotations (com.google.auto.value:auto-value-annotations:1.9 - https://github.com/google/auto/tree/master/value) (Apache License, Version 2.0) AWS Event Stream (software.amazon.eventstream:eventstream:1.0.1 - https://github.com/awslabs/aws-eventstream-java) - (Apache License, Version 2.0) AWS Java SDK :: Annotations (software.amazon.awssdk:annotations:2.17.186 - https://aws.amazon.com/sdkforjava/core/annotations) - (Apache License, Version 2.0) AWS Java SDK :: Arns (software.amazon.awssdk:arns:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache License, Version 2.0) AWS Java SDK :: Auth (software.amazon.awssdk:auth:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache License, Version 2.0) AWS Java SDK :: AWS Core (software.amazon.awssdk:aws-core:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache License, Version 2.0) AWS Java SDK :: Core :: Protocols :: AWS Cbor Protocol (software.amazon.awssdk:aws-cbor-protocol:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache License, Version 2.0) AWS Java SDK :: Core :: Protocols :: AWS Json Protocol (software.amazon.awssdk:aws-json-protocol:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache License, Version 2.0) AWS Java SDK :: Core :: Protocols :: AWS Query Protocol (software.amazon.awssdk:aws-query-protocol:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache License, Version 2.0) AWS Java SDK :: Core :: Protocols :: AWS Xml Protocol (software.amazon.awssdk:aws-xml-protocol:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache License, Version 2.0) AWS Java SDK :: Core :: Protocols :: Json Utils (software.amazon.awssdk:json-utils:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache License, Version 2.0) AWS Java SDK :: Core :: Protocols :: Protocol Core 
(software.amazon.awssdk:protocol-core:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache License, Version 2.0) AWS Java SDK :: HTTP Client Interface (software.amazon.awssdk:http-client-spi:2.17.186 - https://aws.amazon.com/sdkforjava/http-client-spi) - (Apache License, Version 2.0) AWS Java SDK :: HTTP Clients :: Apache (software.amazon.awssdk:apache-client:2.17.186 - https://aws.amazon.com/sdkforjava/http-clients/apache-client) - (Apache License, Version 2.0) AWS Java SDK :: HTTP Clients :: Netty Non-Blocking I/O (software.amazon.awssdk:netty-nio-client:2.17.186 - https://aws.amazon.com/sdkforjava/http-clients/netty-nio-client) - (Apache License, Version 2.0) AWS Java SDK :: Metrics SPI (software.amazon.awssdk:metrics-spi:2.17.186 - https://aws.amazon.com/sdkforjava/core/metrics-spi) - (Apache License, Version 2.0) AWS Java SDK :: Profiles (software.amazon.awssdk:profiles:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache License, Version 2.0) AWS Java SDK :: Regions (software.amazon.awssdk:regions:2.17.186 - https://aws.amazon.com/sdkforjava/core/regions) - (Apache License, Version 2.0) AWS Java SDK :: SDK Core (software.amazon.awssdk:sdk-core:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache License, Version 2.0) AWS Java SDK :: Services :: Amazon Athena (software.amazon.awssdk:athena:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache License, Version 2.0) AWS Java SDK :: Services :: Amazon CloudWatch (software.amazon.awssdk:cloudwatch:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache License, Version 2.0) AWS Java SDK :: Services :: Amazon EC2 Container Service (software.amazon.awssdk:ecs:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache License, Version 2.0) AWS Java SDK :: Services :: Amazon Kinesis (software.amazon.awssdk:kinesis:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache License, Version 2.0) AWS Java SDK :: Services :: Amazon S3 (software.amazon.awssdk:s3:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache 
License, Version 2.0) AWS Java SDK :: Services :: Amazon SQS (software.amazon.awssdk:sqs:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache License, Version 2.0) AWS Java SDK :: Third Party :: Jackson-core (software.amazon.awssdk:third-party-jackson-core:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache License, Version 2.0) AWS Java SDK :: Third Party :: Jackson-dataformat-cbor (software.amazon.awssdk:third-party-jackson-dataformat-cbor:2.17.186 - https://aws.amazon.com/sdkforjava) - (Apache License, Version 2.0) AWS Java SDK :: Utilities (software.amazon.awssdk:utils:2.17.186 - https://aws.amazon.com/sdkforjava/utils) + (Apache License, Version 2.0) AWS Java SDK :: Annotations (software.amazon.awssdk:annotations:2.28.7 - https://aws.amazon.com/sdkforjava/core/annotations) + (Apache License, Version 2.0) AWS Java SDK :: Arns (software.amazon.awssdk:arns:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Auth (software.amazon.awssdk:auth:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: AWS Core (software.amazon.awssdk:aws-core:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: AWS CRT Core (software.amazon.awssdk:crt-core:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Checksums (software.amazon.awssdk:checksums:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Checksums SPI (software.amazon.awssdk:checksums-spi:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Core :: Protocols :: AWS Cbor Protocol (software.amazon.awssdk:aws-cbor-protocol:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Core :: Protocols :: AWS Json Protocol (software.amazon.awssdk:aws-json-protocol:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Core 
:: Protocols :: AWS Query Protocol (software.amazon.awssdk:aws-query-protocol:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Core :: Protocols :: AWS Xml Protocol (software.amazon.awssdk:aws-xml-protocol:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Core :: Protocols :: Json Utils (software.amazon.awssdk:json-utils:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Core :: Protocols :: Protocol Core (software.amazon.awssdk:protocol-core:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Endpoints SPI (software.amazon.awssdk:endpoints-spi:2.28.7 - https://aws.amazon.com/sdkforjava/core/endpoints-spi) + (Apache License, Version 2.0) AWS Java SDK :: HTTP Auth (software.amazon.awssdk:http-auth:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: HTTP Auth AWS (software.amazon.awssdk:http-auth-aws:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: HTTP Auth Event Stream (software.amazon.awssdk:http-auth-aws-eventstream:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: HTTP Auth SPI (software.amazon.awssdk:http-auth-spi:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: HTTP Client Interface (software.amazon.awssdk:http-client-spi:2.28.7 - https://aws.amazon.com/sdkforjava/http-client-spi) + (Apache License, Version 2.0) AWS Java SDK :: HTTP Clients :: Apache (software.amazon.awssdk:apache-client:2.28.7 - https://aws.amazon.com/sdkforjava/http-clients/apache-client) + (Apache License, Version 2.0) AWS Java SDK :: HTTP Clients :: Netty Non-Blocking I/O (software.amazon.awssdk:netty-nio-client:2.28.7 - https://aws.amazon.com/sdkforjava/http-clients/netty-nio-client) + (Apache License, Version 2.0) AWS Java SDK :: Identity SPI 
(software.amazon.awssdk:identity-spi:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Metrics SPI (software.amazon.awssdk:metrics-spi:2.28.7 - https://aws.amazon.com/sdkforjava/core/metrics-spi) + (Apache License, Version 2.0) AWS Java SDK :: Profiles (software.amazon.awssdk:profiles:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Regions (software.amazon.awssdk:regions:2.28.7 - https://aws.amazon.com/sdkforjava/core/regions) + (Apache License, Version 2.0) AWS Java SDK :: Retries (software.amazon.awssdk:retries:2.28.7 - https://aws.amazon.com/sdkforjava/core/retries) + (Apache License, Version 2.0) AWS Java SDK :: Retries API (software.amazon.awssdk:retries-spi:2.28.7 - https://aws.amazon.com/sdkforjava/core/retries-spi) + (Apache License, Version 2.0) AWS Java SDK :: SDK Core (software.amazon.awssdk:sdk-core:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Services :: Amazon Athena (software.amazon.awssdk:athena:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Services :: Amazon CloudWatch (software.amazon.awssdk:cloudwatch:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Services :: Amazon EC2 Container Service (software.amazon.awssdk:ecs:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Services :: Amazon Kinesis (software.amazon.awssdk:kinesis:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Services :: Amazon S3 (software.amazon.awssdk:s3:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Services :: Amazon SQS (software.amazon.awssdk:sqs:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Services :: Bedrock Runtime (software.amazon.awssdk:bedrockruntime:2.28.7 - 
https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Third Party :: Jackson-core (software.amazon.awssdk:third-party-jackson-core:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Third Party :: Jackson-dataformat-cbor (software.amazon.awssdk:third-party-jackson-dataformat-cbor:2.28.7 - https://aws.amazon.com/sdkforjava) + (Apache License, Version 2.0) AWS Java SDK :: Utilities (software.amazon.awssdk:utils:2.28.7 - https://aws.amazon.com/sdkforjava/utils) (Apache License, Version 2.0) AWS Java SDK for Amazon S3 (com.amazonaws:aws-java-sdk-s3:1.11.83 - https://aws.amazon.com/sdkforjava) (Apache License, Version 2.0) AWS Java SDK for AWS KMS (com.amazonaws:aws-java-sdk-kms:1.11.83 - https://aws.amazon.com/sdkforjava) (Apache License, Version 2.0) AWS SDK for Java - Core (com.amazonaws:aws-java-sdk-core:1.11.83 - https://aws.amazon.com/sdkforjava) @@ -129,10 +141,10 @@ Lists of 416 third-party dependencies. (The Apache Software License, Version 2.0) docker-java-core (com.github.docker-java:docker-java-core:3.3.6 - https://github.com/docker-java/docker-java) (The Apache Software License, Version 2.0) docker-java-transport (com.github.docker-java:docker-java-transport:3.3.6 - https://github.com/docker-java/docker-java) (The Apache Software License, Version 2.0) docker-java-transport-httpclient5 (com.github.docker-java:docker-java-transport-httpclient5:3.3.6 - https://github.com/docker-java/docker-java) - (Apache Software License, Version 2.0) dockstore-common (io.dockstore:dockstore-common:1.16.0-alpha.16 - no url defined) - (Apache Software License, Version 2.0) dockstore-integration-testing (io.dockstore:dockstore-integration-testing:1.16.0-alpha.16 - no url defined) - (Apache Software License, Version 2.0) dockstore-language-plugin-parent (io.dockstore:dockstore-language-plugin-parent:1.16.0-alpha.16 - no url defined) - (Apache Software License, Version 2.0) dockstore-webservice 
(io.dockstore:dockstore-webservice:1.16.0-alpha.16 - no url defined) + (Apache Software License, Version 2.0) dockstore-common (io.dockstore:dockstore-common:1.16.0-beta.1 - no url defined) + (Apache Software License, Version 2.0) dockstore-integration-testing (io.dockstore:dockstore-integration-testing:1.16.0-beta.1 - no url defined) + (Apache Software License, Version 2.0) dockstore-language-plugin-parent (io.dockstore:dockstore-language-plugin-parent:1.16.0-beta.1 - no url defined) + (Apache Software License, Version 2.0) dockstore-webservice (io.dockstore:dockstore-webservice:1.16.0-beta.1 - no url defined) (Apache License 2.0) Dropwizard (io.dropwizard:dropwizard-core:4.0.2 - http://www.dropwizard.io/4.0.2/dropwizard-bom/dropwizard-dependencies/dropwizard-parent/dropwizard-core) (Apache License 2.0) Dropwizard Asset Bundle (io.dropwizard:dropwizard-assets:4.0.2 - http://www.dropwizard.io/4.0.2/dropwizard-bom/dropwizard-dependencies/dropwizard-parent/dropwizard-assets) (Apache License 2.0) Dropwizard Authentication (io.dropwizard:dropwizard-auth:4.0.2 - http://www.dropwizard.io/4.0.2/dropwizard-bom/dropwizard-dependencies/dropwizard-parent/dropwizard-auth) @@ -287,7 +299,7 @@ Lists of 416 third-party dependencies. 
(Public Domain) JSON in Java (org.json:json:20231013 - https://github.com/douglascrockford/JSON-java) (The MIT License) jsoup Java HTML Parser (org.jsoup:jsoup:1.10.2 - https://jsoup.org/) (Apache License, Version 2.0) JSR 354 (Money and Currency API) (javax.money:money-api:1.1 - https://javamoney.github.io/) - (MIT License) JTokkit (com.knuddels:jtokkit:1.0.0 - https://github.com/knuddelsgmbh/jtokkit) + (MIT License) JTokkit (com.knuddels:jtokkit:1.1.0 - https://github.com/knuddelsgmbh/jtokkit) (MIT License) JUL to SLF4J bridge (org.slf4j:jul-to-slf4j:2.0.9 - http://www.slf4j.org) (Eclipse Public License 1.0) JUnit (junit:junit:4.13.2 - http://junit.org) (Eclipse Public License v2.0) JUnit Jupiter (Aggregator) (org.junit.jupiter:junit-jupiter:5.10.0 - https://junit.org/junit5/) @@ -305,7 +317,7 @@ Lists of 416 third-party dependencies. (WDL License https://github.com/openwdl/wdl/blob/master/LICENSE) language-factory-core (org.broadinstitute:language-factory-core_2.13:85 - no url defined) (Apache License, Version 2.0) Liquibase (org.liquibase:liquibase-core:4.23.0 - http://www.liquibase.com) (MIT License) liquibase-slf4j (com.mattbertolini:liquibase-slf4j:5.0.0 - https://github.com/mattbertolini/liquibase-slf4j) - (Apache License 2.0) localstack-utils (cloud.localstack:localstack-utils:0.2.22 - http://localstack.cloud) + (Apache License 2.0) localstack-utils (cloud.localstack:localstack-utils:0.2.23 - http://localstack.cloud) (Apache Software Licenses) Log4j Implemented Over SLF4J (org.slf4j:log4j-over-slf4j:2.0.9 - http://www.slf4j.org) (Eclipse Public License - v 1.0) (GNU Lesser General Public License) Logback Access Module (ch.qos.logback:logback-access:1.4.12 - http://logback.qos.ch/logback-access) (Eclipse Public License - v 1.0) (GNU Lesser General Public License) Logback Classic Module (ch.qos.logback:logback-classic:1.4.12 - http://logback.qos.ch/logback-classic) @@ -331,28 +343,26 @@ Lists of 416 third-party dependencies. 
(The MIT License) mockito-inline (org.mockito:mockito-inline:3.12.4 - https://github.com/mockito/mockito) (Apache 2 License) Moneta Core (org.javamoney.moneta:moneta-core:1.4.2 - http://javamoney.org) (MIT) mouse (org.typelevel:mouse_2.13:1.0.11 - https://typelevel.org/mouse) - (Apache License, Version 2.0) Netty Reactive Streams HTTP support (com.typesafe.netty:netty-reactive-streams-http:2.0.5 - https://github.com/playframework/netty-reactive-streams/netty-reactive-streams-http) (Apache License, Version 2.0) Netty Reactive Streams Implementation (com.typesafe.netty:netty-reactive-streams:2.0.5 - https://github.com/playframework/netty-reactive-streams/netty-reactive-streams) - (Apache License, Version 2.0) Netty/Buffer (io.netty:netty-buffer:4.1.72.Final - https://netty.io/netty-buffer/) - (Apache License, Version 2.0) Netty/Codec (io.netty:netty-codec:4.1.72.Final - https://netty.io/netty-codec/) - (Apache License, Version 2.0) Netty/Codec/DNS (io.netty:netty-codec-dns:4.1.72.Final - https://netty.io/netty-codec-dns/) - (Apache License, Version 2.0) Netty/Codec/HTTP (io.netty:netty-codec-http:4.1.72.Final - https://netty.io/netty-codec-http/) - (Apache License, Version 2.0) Netty/Codec/HTTP2 (io.netty:netty-codec-http2:4.1.72.Final - https://netty.io/netty-codec-http2/) - (Apache License, Version 2.0) Netty/Codec/Socks (io.netty:netty-codec-socks:4.1.72.Final - https://netty.io/netty-codec-socks/) - (Apache License, Version 2.0) Netty/Common (io.netty:netty-common:4.1.72.Final - https://netty.io/netty-common/) - (Apache License, Version 2.0) Netty/Handler (io.netty:netty-handler:4.1.72.Final - https://netty.io/netty-handler/) - (Apache License, Version 2.0) Netty/Handler/Proxy (io.netty:netty-handler-proxy:4.1.72.Final - https://netty.io/netty-handler-proxy/) - (Apache License, Version 2.0) Netty/Resolver (io.netty:netty-resolver:4.1.72.Final - https://netty.io/netty-resolver/) - (Apache License, Version 2.0) Netty/Resolver/DNS 
(io.netty:netty-resolver-dns:4.1.72.Final - https://netty.io/netty-resolver-dns/) - (https://github.com/netty/netty-tcnative/blob/main/LICENSE.txt) Netty/TomcatNative [OpenSSL - Classes] (io.netty:netty-tcnative-classes:2.0.46.Final - https://github.com/netty/netty-tcnative/netty-tcnative-classes/) - (Apache License, Version 2.0) Netty/Transport (io.netty:netty-transport:4.1.72.Final - https://netty.io/netty-transport/) - (Apache License, Version 2.0) Netty/Transport/Classes/Epoll (io.netty:netty-transport-classes-epoll:4.1.72.Final - https://netty.io/netty-transport-classes-epoll/) - (Apache License, Version 2.0) Netty/Transport/Native/Epoll (io.netty:netty-transport-native-epoll:4.1.72.Final - https://netty.io/netty-transport-native-epoll/) - (Apache License, Version 2.0) Netty/Transport/Native/Unix/Common (io.netty:netty-transport-native-unix-common:4.1.72.Final - https://netty.io/netty-transport-native-unix-common/) + (Apache License, Version 2.0) Netty/Buffer (io.netty:netty-buffer:4.1.112.Final - https://netty.io/netty-buffer/) + (Apache License, Version 2.0) Netty/Codec (io.netty:netty-codec:4.1.112.Final - https://netty.io/netty-codec/) + (Apache License, Version 2.0) Netty/Codec/DNS (io.netty:netty-codec-dns:4.1.112.Final - https://netty.io/netty-codec-dns/) + (Apache License, Version 2.0) Netty/Codec/HTTP (io.netty:netty-codec-http:4.1.112.Final - https://netty.io/netty-codec-http/) + (Apache License, Version 2.0) Netty/Codec/HTTP2 (io.netty:netty-codec-http2:4.1.112.Final - https://netty.io/netty-codec-http2/) + (Apache License, Version 2.0) Netty/Codec/Socks (io.netty:netty-codec-socks:4.1.112.Final - https://netty.io/netty-codec-socks/) + (Apache License, Version 2.0) Netty/Common (io.netty:netty-common:4.1.112.Final - https://netty.io/netty-common/) + (Apache License, Version 2.0) Netty/Handler (io.netty:netty-handler:4.1.112.Final - https://netty.io/netty-handler/) + (Apache License, Version 2.0) Netty/Handler/Proxy 
(io.netty:netty-handler-proxy:4.1.112.Final - https://netty.io/netty-handler-proxy/) + (Apache License, Version 2.0) Netty/Resolver (io.netty:netty-resolver:4.1.112.Final - https://netty.io/netty-resolver/) + (Apache License, Version 2.0) Netty/Resolver/DNS (io.netty:netty-resolver-dns:4.1.112.Final - https://netty.io/netty-resolver-dns/) + (Apache License, Version 2.0) Netty/Transport (io.netty:netty-transport:4.1.112.Final - https://netty.io/netty-transport/) + (Apache License, Version 2.0) Netty/Transport/Classes/Epoll (io.netty:netty-transport-classes-epoll:4.1.112.Final - https://netty.io/netty-transport-classes-epoll/) + (Apache License, Version 2.0) Netty/Transport/Native/Epoll (io.netty:netty-transport-native-epoll:4.1.112.Final - https://netty.io/netty-transport-native-epoll/) + (Apache License, Version 2.0) Netty/Transport/Native/Unix/Common (io.netty:netty-transport-native-unix-common:4.1.112.Final - https://netty.io/netty-transport-native-unix-common/) (Apache License, Version 2.0) Objenesis (org.objenesis:objenesis:3.2 - http://objenesis.org/objenesis) (The Apache Software License, Version 2.0) okhttp (com.squareup.okhttp3:okhttp:4.10.0 - https://square.github.io/okhttp/) (The Apache Software License, Version 2.0) okio (com.squareup.okio:okio-jvm:3.0.0 - https://github.com/square/okio/) - (Apache Software License, Version 2.0) openapi-java-client (io.dockstore:openapi-java-client:1.16.0-alpha.16 - no url defined) + (Apache Software License, Version 2.0) openapi-java-client (io.dockstore:openapi-java-client:1.16.0-beta.1 - no url defined) (The Apache License, Version 2.0) OpenCensus (io.opencensus:opencensus-api:0.31.0 - https://github.com/census-instrumentation/opencensus-java) (Apache 2) opencsv (com.opencsv:opencsv:5.7.1 - http://opencsv.sf.net) (Apache 2.0) optics (io.circe:circe-optics_2.13:0.14.1 - https://github.com/circe/circe-optics) @@ -365,7 +375,7 @@ Lists of 416 third-party dependencies. 
(MIT) pprint_2.13 (com.lihaoyi:pprint_2.13:0.7.3 - https://github.com/lihaoyi/PPrint) (The Apache Software License, Version 2.0) rank-eval (org.elasticsearch.plugin:rank-eval-client:7.10.2 - https://github.com/elastic/elasticsearch) (Apache License 2.0) Reactive Relational Database Connectivity - SPI (io.r2dbc:r2dbc-spi:1.0.0.RELEASE - https://r2dbc.io/r2dbc-spi) - (CC0) reactive-streams (org.reactivestreams:reactive-streams:1.0.3 - http://www.reactive-streams.org/) + (MIT-0) reactive-streams (org.reactivestreams:reactive-streams:1.0.4 - http://www.reactive-streams.org/) (MIT) refined (eu.timepit:refined_2.13:0.10.1 - https://github.com/fthomas/refined) (The Apache Software License, Version 2.0) rest (org.elasticsearch.client:elasticsearch-rest-client:7.10.2 - https://github.com/elastic/elasticsearch) (The Apache Software License, Version 2.0) rest-high-level (org.elasticsearch.client:elasticsearch-rest-high-level-client:7.10.2 - https://github.com/elastic/elasticsearch) @@ -394,11 +404,11 @@ Lists of 416 third-party dependencies. 
(Apache License 2.0) swagger-core-jakarta (io.swagger.core.v3:swagger-core-jakarta:2.2.15 - https://github.com/swagger-api/swagger-core/modules/swagger-core-jakarta) (Apache License 2.0) swagger-integration-jakarta (io.swagger.core.v3:swagger-integration-jakarta:2.2.15 - https://github.com/swagger-api/swagger-core/modules/swagger-integration-jakarta) (Apache Software License, Version 2.0) swagger-java-bitbucket-client (io.dockstore:swagger-java-bitbucket-client:2.0.3 - no url defined) - (Apache Software License, Version 2.0) swagger-java-client (io.dockstore:swagger-java-client:1.16.0-alpha.16 - no url defined) + (Apache Software License, Version 2.0) swagger-java-client (io.dockstore:swagger-java-client:1.16.0-beta.1 - no url defined) (Apache Software License, Version 2.0) swagger-java-discourse-client (io.dockstore:swagger-java-discourse-client:2.0.1 - no url defined) (Apache Software License, Version 2.0) swagger-java-quay-client (io.dockstore:swagger-java-quay-client:2.0.2 - no url defined) (Apache Software License, Version 2.0) swagger-java-sam-client (io.dockstore:swagger-java-sam-client:2.0.2 - no url defined) - (Apache Software License, Version 2.0) swagger-java-zenodo-client (io.dockstore:swagger-java-zenodo-client:2.0.4 - no url defined) + (Apache Software License, Version 2.0) swagger-java-zenodo-client (io.dockstore:swagger-java-zenodo-client:2.1.3 - no url defined) (Apache License 2.0) swagger-jaxrs2-jakarta (io.swagger.core.v3:swagger-jaxrs2-jakarta:2.2.15 - https://github.com/swagger-api/swagger-core/modules/swagger-jaxrs2-jakarta) (Apache License 2.0) swagger-jaxrs2-servlet-initializer-jakarta (io.swagger.core.v3:swagger-jaxrs2-servlet-initializer-jakarta:2.2.15 - https://github.com/swagger-api/swagger-core/modules/swagger-jaxrs2-servlet-initializer-jakarta) (Apache License 2.0) swagger-models (io.swagger:swagger-models:1.6.8 - https://github.com/swagger-api/swagger-core/modules/swagger-models) diff --git 
a/metricsaggregator/src/test/resources/metrics-aggregator.config b/metricsaggregator/src/test/resources/metrics-aggregator.config
index 159f9b08..7d7ef0a9 100644
--- a/metricsaggregator/src/test/resources/metrics-aggregator.config
+++ b/metricsaggregator/src/test/resources/metrics-aggregator.config
@@ -6,7 +6,7 @@ token: 08932ab0c9ae39a880905666902f8659633ae0232e94ba9f3d2094cb928397e7

 [s3]
 bucketName: local-dockstore-metrics-data
-endpointOverride: http://localhost:4566
+endpointOverride: https://s3.localhost.localstack.cloud:4566

 [athena]
 workgroup: local-dockstore-metrics-workgroup
diff --git a/pom.xml b/pom.xml
index 8f2ca77b..cec002df 100644
--- a/pom.xml
+++ b/pom.xml
@@ -38,7 +38,7 @@ scm:git:git@github.com:dockstore/dockstore-support.git
 UTF-8
-1.16.0-alpha.16
+1.16.0-beta.1
 3.0.0-M5
 2.22.2
 false
diff --git a/topicgenerator/README.md b/topicgenerator/README.md
index 06fa0997..21124868 100644
--- a/topicgenerator/README.md
+++ b/topicgenerator/README.md
@@ -1,6 +1,6 @@
 # Topic Generator

-This is a Java program that generates topics for public Dockstore entries using OpenAI's gpt-3.5-turbo-16k AI model.
+This is a Java program that generates topics for public Dockstore entries using AI.

 The [entries.csv](entries.csv) file contains the TRS ID and default versions of public Dockstore entries to generate topics for.
 The [results](results) directory contains the generated topics for those entries from running the topic generator.
@@ -8,15 +8,12 @@ The [entries.csv](entries.csv) file contains the TRS ID and default versions of

 ### Configuration file

-Create a configuration file like the following. A template `metrics-aggregator.config` file can be found [here](templates/topic-generator.config).
+Create a configuration file like the following. A template `topic-generator.config` file can be found [here](templates/topic-generator.config).
 ```
 [dockstore]
 server-url:
 token:
-
-[ai]
-openai-api-key:
 ```

 **Required:**
@@ -26,7 +23,26 @@ openai-api-key:
   - `https://staging.dockstore.org/api`
   - `https://dockstore.org/api`
 - `token`: The Dockstore token of an admin or curator. This token is used to upload topics to the webservice.
-- `openai-api-key`: The OpenAI API key required for using the OpenAI APIs. See https://platform.openai.com/docs/api-reference/authentication for more details. This is used to generate topics.
+
+### Authentication to invoke AI models
+
+#### AWS Bedrock
+
+By default, the program uses AWS Bedrock to invoke the Anthropic Claude 3 Haiku model to generate topics.
+AWS credentials with permission to use the AWS Bedrock API are required, and they must have access to the Anthropic Claude models on AWS.
+There are several ways to provide these credentials.
+See the AWS documentation on the [default credential provider chain](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html#credentials-chain).
+
+
+#### OpenAI (deprecated)
+
+We have moved away from using OpenAI models to generate topics, but if you wish to use them, add the following section to your configuration file.
+See https://platform.openai.com/docs/api-reference/authentication for more details.
+
+```
+[ai]
+openai-api-key:
+```

 ## Running the program

@@ -49,6 +65,10 @@ Usage:
 [options] [command] [command options]
           name of the entries to generate topics for. The first line
           of the file should contain the CSV fields: trsID,version
           Default: ./entries.csv
+      -m, --model
+          The AI model to use
+          Default: CLAUDE_3_HAIKU
+          Possible Values: [CLAUDE_3_5_SONNET, CLAUDE_3_HAIKU, GPT_4O_MINI]
     upload-topics      Upload AI topics, generated by the generate-topics
           command, for public Dockstore entries.
@@ -59,7 +79,7 @@ Usage:
 [options] [command] [command options]
           of the entries to upload topics for. The first line of the file
           should contain the CSV fields: trsId,aiTopic. The output file
           generated by the generate-topics command can be used as the
-          argument. 
+          argument.
 ```

 ### generate-topics
diff --git a/topicgenerator/pom.xml b/topicgenerator/pom.xml
index ae6ca26e..081c95cb 100644
--- a/topicgenerator/pom.xml
+++ b/topicgenerator/pom.xml
@@ -112,7 +112,7 @@
 com.knuddels
 jtokkit
-1.0.0
+1.1.0
 org.apache.commons
@@ -148,6 +148,19 @@
 org.apache.commons
 commons-csv
+
+software.amazon.awssdk
+bedrockruntime
+
+software.amazon.awssdk
+auth
+
+software.amazon.awssdk
+sdk-core
+
 io.dockstore
 dockstore-webservice
diff --git a/topicgenerator/results/generated-topics_CLAUDE_3_HAIKU_20240926T154233Z.csv b/topicgenerator/results/generated-topics_CLAUDE_3_HAIKU_20240926T154233Z.csv
new file mode 100644
index 00000000..2bef1811
--- /dev/null
+++ b/topicgenerator/results/generated-topics_CLAUDE_3_HAIKU_20240926T154233Z.csv
@@ -0,0 +1,24 @@
+trsId,version,descriptorUrl,descriptorChecksum,isTruncated,promptTokens,completionTokens,cost,finishReason,aiTopic
+"#workflow/github.com/iwc-workflows/sars-cov-2-pe-illumina-artic-variant-calling/COVID-19-PE-ARTIC-ILLUMINA",main,https://raw.githubusercontent.com/iwc-workflows/sars-cov-2-pe-illumina-artic-variant-calling/main//pe-artic-variation.ga,dcc2761eb35156d7d09479112daf089439774fc29938f02cb6ee8cda87906758,false,24595,43,0.0062025,end_turn,"Trim ARTIC primer sequences, realign reads, call and filter variants, annotate variants, and apply a strand-bias soft filter to the final annotated variants."
+"#workflow/github.com/iwc-workflows/sars-cov-2-variation-reporting/COVID-19-VARIATION-REPORTING",main,https://raw.githubusercontent.com/iwc-workflows/sars-cov-2-variation-reporting/main//variation-reporting.ga,2dce46106d669248d5858b56269c9cbc26057acfdb121f693dd6321d0350105c,false,24503,48,0.00618575,end_turn,"Filters and extracts variants from a VCF dataset, generates tabular lists of variants by Samples and by Variant, and creates an overview plot of variants and their allele-frequencies."
+"#workflow/github.com/iwc-workflows/sars-cov-2-ont-artic-variant-calling/COVID-19-ARTIC-ONT",main,https://raw.githubusercontent.com/iwc-workflows/sars-cov-2-ont-artic-variant-calling/main//ont-artic-variation.ga,03c9318d50df9a3c2d725da77f9a175ce7eb78e264cc4a40fa17d48a99dd124b,false,20290,57,0.00514375,end_turn,"Perform read filtering, mapping, primer trimming, variant calling, and annotation on ONT-sequenced ARTIC data using tools like fastp, minimap2, ivar, medaka, and SnpEff."
+"#workflow/github.com/iwc-workflows/sars-cov-2-pe-illumina-wgs-variant-calling/COVID-19-PE-WGS-ILLUMINA",main,https://raw.githubusercontent.com/iwc-workflows/sars-cov-2-pe-illumina-wgs-variant-calling/main//pe-wgs-variation.ga,f2357986dd72af73efb2b8110f0f08f6f715017518498edda61d50d23b2e560f,false,11352,48,0.0028980000000000004,end_turn,"Perform paired-end read mapping with bwa-mem, deduplicate and realign the reads, and then call and annotate variants using lofreq and SnpEff."
+"#workflow/github.com/iwc-workflows/sars-cov-2-se-illumina-wgs-variant-calling/COVID-19-SE-WGS-ILLUMINA",main,https://raw.githubusercontent.com/iwc-workflows/sars-cov-2-se-illumina-wgs-variant-calling/main//se-wgs-variation.ga,89993f9570eebd53983cdf4f8e4e0f44631f536e855b2e0741f4a5b907858e8d,false,9521,57,0.0024515000000000006,end_turn,"Perform single-end read mapping with Bowtie2, mark duplicates with Picard, realign reads with LoFreq, call variants with LoFreq, and annotate variants with SnpEff."
+"#workflow/github.com/iwc-workflows/sars-cov-2-consensus-from-variation/COVID-19-CONSENSUS-CONSTRUCTION",main,https://raw.githubusercontent.com/iwc-workflows/sars-cov-2-consensus-from-variation/main//consensus-from-variation.ga,cb8acce2a1b0d059b117ae3737307be82e81807dad3efcbfe6c8f6a09cbd2798,false,15395,45,0.003905,end_turn,"Build a consensus sequence from FILTER PASS variants with intrasample allele-frequency above a configurable consensus threshold, hard-mask regions with low coverage, and ambiguous sites." +"#workflow/github.com/iwc-workflows/sars-cov-2-pe-illumina-artic-ivar-analysis/SARS-COV-2-ILLUMINA-AMPLICON-IVAR-PANGOLIN-NEXTCLADE",main,https://raw.githubusercontent.com/iwc-workflows/sars-cov-2-pe-illumina-artic-ivar-analysis/main//pe-wgs-ivar-analysis.ga,11e9e375b01f1a5b33fb64b184798bc6e46181201a255cb23147a7f021007864,false,13894,49,0.00353475,end_turn,"Find and annotate variants in ampliconic SARS-CoV-2 Illumina sequencing data, classify samples with pangolin and nextclade, and generate a quality control report." +"#workflow/github.com/iwc-workflows/parallel-accession-download/main",main,https://raw.githubusercontent.com/iwc-workflows/parallel-accession-download/main//parallel-accession-download.ga,3e9aee6218674651a981b3437b9ca8ec294ff492984b153db0526989e716e674,false,3629,39,9.56E-4,end_turn,"Downloads fastq files for sequencing run accessions provided in a text file using fasterq-dump, creating one job per listed run accession." +"#workflow/github.com/nf-core/rnaseq",1.4.2,https://raw.githubusercontent.com/nf-core/rnaseq/1.4.2//main.nf,0d0db0adf13e907e33e44b1357486cf57747ad78c3a5e31f187919d844530fe9,false,24811,32,0.00624275,end_turn,"Trim the raw reads, align them to the reference genome, perform quality control analysis, and quantify gene expression." 
+"#workflow/github.com/nf-core/vipr",master,https://raw.githubusercontent.com/nf-core/vipr/master//main.nf,a8b50e8afa5730e6ede16e4588cec2f03c3ae881385daed22edb3e8376af5793,false,4734,78,0.001281,end_turn,"Execute the ViPR workflow, which performs viral amplicon/enrichment analysis and intrahost variant calling, starting with trimming and combining read pairs, followed by decontamination, metagenomics classification, assembly, polishing, mapping, variant calling, coverage computation, and finally plotting and preparing the final reference sequence." +"#workflow/github.com/nf-core/methylseq",1.4,https://raw.githubusercontent.com/nf-core/methylseq/1.4//main.nf,5123fc239af84cd0357ddfe1ca1f582d1f7412f08ecaa9bb2b8f19160bd75cfb,false,16091,46,0.0040802500000000005,end_turn,"Runs the nf-core/methylseq pipeline, which performs alignment, deduplication, methylation extraction, and quality control analysis on bisulfite-sequencing data." +"#workflow/github.com/sevenbridges-openworkflows/Broad-Best-Practice-Data-pre-processing-CWL1.0-workflow-GATK-4.1.0.0/GATK_4_1_0_0_data_pre_processing_workflow",master,https://raw.githubusercontent.com/sevenbridges-openworkflows/Broad-Best-Practice-Data-pre-processing-CWL1.0-workflow-GATK-4.1.0.0/master//broad-best-practice-data-pre-processing-workflow-4-1-0-0_decomposed.cwl,2f87eaf01d47acf0d70b41609d6faba135e11db5ca337f99ae18c614996e387e,false,7682,35,0.0019642500000000003,end_turn,"Perform data pre-processing by aligning to a reference genome, cleaning up the data, and preparing it for variant calling analysis." 
+"#workflow/github.com/DataBiosphere/topmed-workflows/UM_variant_caller_wdl",1.32.0,https://raw.githubusercontent.com/DataBiosphere/topmed-workflows/1.32.0/variant-caller/variant-caller-wdl/topmed_freeze3_calling.wdl,03daf00da2f90efd50368bd392f1c32f93b8f48e968dcc573598c9432df5ba21,false,18179,66,0.004627249999999999,end_turn,"Execute the variant caller workflow by creating symlinks for CRAM and CRAI files, configuring the reference files, running the variant detection and merging steps, and optionally performing variant filtering using pedigree information, and finally compressing the output directories into a tar.gz file." +"#workflow/github.com/DataBiosphere/analysis_pipeline_WDL/vcf-to-gds-wdl",v7.1.1,https://raw.githubusercontent.com/DataBiosphere/analysis_pipeline_WDL/v7.1.1/vcf-to-gds/vcf-to-gds.wdl,e92888af471f11793f2c5c4fa7f8da825dfb51603aa6b7f6ecb8d51c9a815fd7,false,3195,41,8.5E-4,end_turn,"Converts VCF files to GDS files, assigns unique variant IDs, and optionally checks the GDS files against the original VCF files." +"#workflow/github.com/DataBiosphere/analysis_pipeline_WDL/ld-pruning-wdl",v7.1.1,https://raw.githubusercontent.com/DataBiosphere/analysis_pipeline_WDL/v7.1.1/ld-pruning/ld-pruning.wdl,eabd52bf3f223c58ad220a418f87f2ec23d2dd91958b61f12066144a8a84a444,false,4460,41,0.0011662500000000002,end_turn,"Calculates linkage disequilibrium, subsets GDS files, merges the subsetted files, and checks the merged file against the inputs." +"#workflow/github.com/AnalysisCommons/genesis_wdl/genesis_GWAS",v1_5,https://raw.githubusercontent.com/AnalysisCommons/genesis_wdl/v1_5//genesis_GWAS.wdl,9c0c50df22bb95869dcc66d77693ecaf702b980878f38feca3ebd75e00eb9fbe,false,4265,43,0.00112,end_turn,"Execute the null model generation, association testing, and summarization tasks to perform a genome-wide association study (GWAS) using the GENESIS biostatistical package." 
+"#workflow/github.com/aofarrel/covstats-wdl",master,https://raw.githubusercontent.com/aofarrel/covstats-wdl/master/covstats/covstats.wdl,49dec26c695bbb88e0e1198e04c6857f497c80b8b59f8bd86377eaf76ee74a4a,false,2532,42,6.855E-4,end_turn,"Perform read length and coverage analysis on input BAM/CRAM files, generate a report summarizing the results, and handle various file types and runtime configurations." +"#workflow/github.com/broadinstitute/warp/Optimus",aa-PD2413,https://raw.githubusercontent.com/broadinstitute/warp/aa-PD2413/pipelines/skylab/optimus/Optimus.wdl,fb1b9fbd4be73e7210dec444d446c7405afcbcb11f9030391b5e63dd9defe4b6,false,3305,66,9.0875E-4,end_turn,"Imports necessary WDL workflows, defines input parameters, performs input checks, splits FASTQ files, aligns reads, merges BAM files, calculates gene and cell metrics, generates sparse count matrix, runs EmptyDrops, and produces an H5AD output file." +"#workflow/github.com/theiagen/terra_utilities/Concatenate_Column_Content",v1.4.1,https://raw.githubusercontent.com/theiagen/terra_utilities/v1.4.1/workflows/wf_cat_column.wdl,e080a455cb741c152c056e55af55cb42b2b3b46e24cb7185c2e1a6f1b74389bc,false,225,33,9.750000000000001E-5,end_turn,"Import task files, concatenate column content, capture versioning, and output the concatenated files and versioning details." +"#workflow/github.com/gatk-workflows/seq-format-conversion/BAM-to-Unmapped-BAM",3.0.0,https://raw.githubusercontent.com/gatk-workflows/seq-format-conversion/3.0.0//bam-to-unmapped-bams.wdl,d75de73b26fd49d71a29e4709f77c4decb2b51209d169c1b3fa6fa427f53dd04,false,1266,43,3.7025E-4,end_turn,"Converts a BAM file into unmapped BAMs by reverting the BAM, sorting the unmapped BAMs, and outputting the sorted unmapped BAMs." 
+"#notebook/github.com/denis-yuen/test-notebooks/ibm-tax-maps",0.2,https://raw.githubusercontent.com/denis-yuen/test-notebooks/0.2/ibm-et/jupyter-samples/tax-maps/Interactive_Data_Maps.ipynb,7ffefdbf8c4ab6333b9ad78c6b811365365e2ae433bca2acf261cc1209c59027,false,107540,42,0.026937500000000003,end_turn,"This notebook analyzes state tax data from the US Census Bureau, creates interactive maps to visualize the data, and provides insights into the tax revenue collected by different states." +quay.io/pancancer/pcawg-sanger-cgp-workflow,2.1.0,https://raw.githubusercontent.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/2.1.0//Dockstore.cwl,5eb9e182fc2e313606a9445604a4b47e0546ed17a641448da12fcc0371ead3d8,false,2573,57,7.145000000000001E-4,end_turn,Execute the Seqware-Sanger-Somatic-Workflow command-line tool to perform somatic variant calling on tumor and normal whole-genome sequencing data using the PCAWG Sanger variant calling workflow. +github.com/dockstore/dockstore-tool-bamstats/bamstats_sort_cwl,1.25-9,https://raw.githubusercontent.com/dockstore/dockstore-tool-bamstats/1.25-9//bamstats_sort.cwl,1fd7d8637cb91e031a6db604898f5728e51f4591bf4acfbb75ecefd0bfda3448,false,426,37,1.5275E-4,end_turn,Utilize the commandlinetool to execute the sort command within a Docker container and generate a sorted file based on the specified key positions. 
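As a sanity check on the `cost` column in the Claude results file above: its first record reports 24595 prompt tokens, 43 completion tokens, and a cost of 0.0062025, which is consistent with the linear per-1k-token formula of `AIModel.calculatePrice` elsewhere in this diff, assuming Anthropic's published Claude 3 Haiku rates of $0.00025 per 1k input tokens and $0.00125 per 1k output tokens (the rates are an assumption here; in the PR they live in `AIModelType`):

```java
public class CostCheck {
    // Mirrors AIModel.calculatePrice: linear per-1k-token pricing.
    static double calculatePrice(long inputTokens, long outputTokens,
            double pricePer1kInputTokens, double pricePer1kOutputTokens) {
        return ((double) inputTokens / 1000) * pricePer1kInputTokens
                + ((double) outputTokens / 1000) * pricePer1kOutputTokens;
    }

    public static void main(String[] args) {
        // Assumed Claude 3 Haiku rates in USD per 1k tokens (not taken from this diff).
        double cost = calculatePrice(24595, 43, 0.00025, 0.00125);
        // Agrees with the 0.0062025 of the file's first record.
        System.out.println(cost);
    }
}
```

The GPT_4O_MINI results file below is consistent in the same way under OpenAI's per-token rates.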
diff --git a/topicgenerator/results/generated-topics_GPT_4O_MINI_20240926T155659Z.csv b/topicgenerator/results/generated-topics_GPT_4O_MINI_20240926T155659Z.csv new file mode 100644 index 00000000..e9fa189a --- /dev/null +++ b/topicgenerator/results/generated-topics_GPT_4O_MINI_20240926T155659Z.csv @@ -0,0 +1,24 @@ +trsId,version,descriptorUrl,descriptorChecksum,isTruncated,promptTokens,completionTokens,cost,finishReason,aiTopic +"#workflow/github.com/iwc-workflows/sars-cov-2-pe-illumina-artic-variant-calling/COVID-19-PE-ARTIC-ILLUMINA",main,https://raw.githubusercontent.com/iwc-workflows/sars-cov-2-pe-illumina-artic-variant-calling/main//pe-artic-variation.ga,dcc2761eb35156d7d09479112daf089439774fc29938f02cb6ee8cda87906758,false,19558,28,0.0029504999999999996,stop,"Execute a bioinformatics workflow for analyzing ARTIC PE data by mapping, trimming, filtering, and variant calling. " +"#workflow/github.com/iwc-workflows/sars-cov-2-variation-reporting/COVID-19-VARIATION-REPORTING",main,https://raw.githubusercontent.com/iwc-workflows/sars-cov-2-variation-reporting/main//variation-reporting.ga,2dce46106d669248d5858b56269c9cbc26057acfdb121f693dd6321d0350105c,false,19211,26,0.0028972499999999997,stop,Generate tabular and graphical reports from VCF variant data for COVID-19 analysis based on specified filters. +"#workflow/github.com/iwc-workflows/sars-cov-2-ont-artic-variant-calling/COVID-19-ARTIC-ONT",main,https://raw.githubusercontent.com/iwc-workflows/sars-cov-2-ont-artic-variant-calling/main//ont-artic-variation.ga,03c9318d50df9a3c2d725da77f9a175ce7eb78e264cc4a40fa17d48a99dd124b,false,16089,31,0.0024319499999999996,stop,"Process ONT-sequenced ARTIC data through read filtering, mapping, variant calling, and annotation for COVID-19 analysis." 
+"#workflow/github.com/iwc-workflows/sars-cov-2-pe-illumina-wgs-variant-calling/COVID-19-PE-WGS-ILLUMINA",main,https://raw.githubusercontent.com/iwc-workflows/sars-cov-2-pe-illumina-wgs-variant-calling/main//pe-wgs-variation.ga,f2357986dd72af73efb2b8110f0f08f6f715017518498edda61d50d23b2e560f,false,8981,29,0.00136455,stop,Perform paired-end read mapping with bwa-mem and variant calling using lofreq for COVID-19 WGS data analysis. +"#workflow/github.com/iwc-workflows/sars-cov-2-se-illumina-wgs-variant-calling/COVID-19-SE-WGS-ILLUMINA",main,https://raw.githubusercontent.com/iwc-workflows/sars-cov-2-se-illumina-wgs-variant-calling/main//se-wgs-variation.ga,89993f9570eebd53983cdf4f8e4e0f44631f536e855b2e0741f4a5b907858e8d,false,7462,32,0.0011385,stop,"Perform single-end read mapping with Bowtie2, followed by variant calling using Lofreq for COVID-19 WGS SE data." +"#workflow/github.com/iwc-workflows/sars-cov-2-consensus-from-variation/COVID-19-CONSENSUS-CONSTRUCTION",main,https://raw.githubusercontent.com/iwc-workflows/sars-cov-2-consensus-from-variation/main//consensus-from-variation.ga,cb8acce2a1b0d059b117ae3737307be82e81807dad3efcbfe6c8f6a09cbd2798,false,12163,22,0.00183765,stop,Construct a consensus sequence from high-frequency variants while masking low-coverage and ambiguous sites. +"#workflow/github.com/iwc-workflows/sars-cov-2-pe-illumina-artic-ivar-analysis/SARS-COV-2-ILLUMINA-AMPLICON-IVAR-PANGOLIN-NEXTCLADE",main,https://raw.githubusercontent.com/iwc-workflows/sars-cov-2-pe-illumina-artic-ivar-analysis/main//pe-wgs-ivar-analysis.ga,11e9e375b01f1a5b33fb64b184798bc6e46181201a255cb23147a7f021007864,false,10929,33,0.0016591499999999999,stop,"Annotate SARS-CoV-2 variants using Illumina data, classify with pangolin and nextclade, while ensuring quality control." 
+"#workflow/github.com/iwc-workflows/parallel-accession-download/main",main,https://raw.githubusercontent.com/iwc-workflows/parallel-accession-download/main//parallel-accession-download.ga,3e9aee6218674651a981b3437b9ca8ec294ff492984b153db0526989e716e674,false,2882,30,4.503E-4,stop,"Download FASTQ files from sequencing run accessions listed in a text file using fasterq-dump, processing each accession separately." +"#workflow/github.com/nf-core/rnaseq",1.4.2,https://raw.githubusercontent.com/nf-core/rnaseq/1.4.2//main.nf,0d0db0adf13e907e33e44b1357486cf57747ad78c3a5e31f187919d844530fe9,false,19552,25,0.0029477999999999996,stop,"Execute a comprehensive RNA-seq analysis workflow, including read processing, alignment, and quantification steps." +"#workflow/github.com/nf-core/vipr",master,https://raw.githubusercontent.com/nf-core/vipr/master//main.nf,a8b50e8afa5730e6ede16e4588cec2f03c3ae881385daed22edb3e8376af5793,false,3706,28,5.726999999999999E-4,stop,Run the viPR pipeline for viral amplicon analysis and variant calling using Nextflow with specified input and configuration. +"#workflow/github.com/nf-core/methylseq",1.4,https://raw.githubusercontent.com/nf-core/methylseq/1.4//main.nf,5123fc239af84cd0357ddfe1ca1f582d1f7412f08ecaa9bb2b8f19160bd75cfb,false,12820,24,0.0019373999999999997,stop,Execute the nf-core/methylseq pipeline for methylation analysis using specified aligners and parameters. 
+"#workflow/github.com/sevenbridges-openworkflows/Broad-Best-Practice-Data-pre-processing-CWL1.0-workflow-GATK-4.1.0.0/GATK_4_1_0_0_data_pre_processing_workflow",master,https://raw.githubusercontent.com/sevenbridges-openworkflows/Broad-Best-Practice-Data-pre-processing-CWL1.0-workflow-GATK-4.1.0.0/master//broad-best-practice-data-pre-processing-workflow-4-1-0-0_decomposed.cwl,2f87eaf01d47acf0d70b41609d6faba135e11db5ca337f99ae18c614996e387e,false,6569,21,9.979499999999998E-4,stop,Prepare data for variant calling analysis through alignment to reference genome and data cleanup operations. +"#workflow/github.com/DataBiosphere/topmed-workflows/UM_variant_caller_wdl",1.32.0,https://raw.githubusercontent.com/DataBiosphere/topmed-workflows/1.32.0/variant-caller/variant-caller-wdl/topmed_freeze3_calling.wdl,03daf00da2f90efd50368bd392f1c32f93b8f48e968dcc573598c9432df5ba21,false,14310,27,0.0021627,stop,"Execute the TopMed Variant Caller workflow to process CRAM files, calculate contamination, and generate variant calling outputs." +"#workflow/github.com/DataBiosphere/analysis_pipeline_WDL/vcf-to-gds-wdl",v7.1.1,https://raw.githubusercontent.com/DataBiosphere/analysis_pipeline_WDL/v7.1.1/vcf-to-gds/vcf-to-gds.wdl,e92888af471f11793f2c5c4fa7f8da825dfb51603aa6b7f6ecb8d51c9a815fd7,false,2650,29,4.1489999999999995E-4,stop,"Convert VCF files to GDS format, generate unique variant IDs, and optionally check GDS against VCF input." +"#workflow/github.com/DataBiosphere/analysis_pipeline_WDL/ld-pruning-wdl",v7.1.1,https://raw.githubusercontent.com/DataBiosphere/analysis_pipeline_WDL/v7.1.1/ld-pruning/ld-pruning.wdl,eabd52bf3f223c58ad220a418f87f2ec23d2dd91958b61f12066144a8a84a444,false,3605,28,5.5755E-4,stop,"Execute linkage disequilibrium pruning, subset variants, merge GDS files, and check the merged output for accuracy." 
+"#workflow/github.com/AnalysisCommons/genesis_wdl/genesis_GWAS",v1_5,https://raw.githubusercontent.com/AnalysisCommons/genesis_wdl/v1_5//genesis_GWAS.wdl,9c0c50df22bb95869dcc66d77693ecaf702b980878f38feca3ebd75e00eb9fbe,false,3553,25,5.4795E-4,stop,"Execute a GWAS workflow that builds a null model, conducts genetic association tests, and summarizes results." +"#workflow/github.com/aofarrel/covstats-wdl",master,https://raw.githubusercontent.com/aofarrel/covstats-wdl/master/covstats/covstats.wdl,49dec26c695bbb88e0e1198e04c6857f497c80b8b59f8bd86377eaf76ee74a4a,false,2165,27,3.4094999999999997E-4,stop,"Execute a workflow to calculate read length and coverage from BAM or CRAM files, generating a report with statistics." +"#workflow/github.com/broadinstitute/warp/Optimus",aa-PD2413,https://raw.githubusercontent.com/broadinstitute/warp/aa-PD2413/pipelines/skylab/optimus/Optimus.wdl,fb1b9fbd4be73e7210dec444d446c7405afcbcb11f9030391b5e63dd9defe4b6,false,2461,26,3.8474999999999995E-4,stop,"Process 10x Genomics sequencing data by performing input checks, aligning reads, and generating output files." +"#workflow/github.com/theiagen/terra_utilities/Concatenate_Column_Content",v1.4.1,https://raw.githubusercontent.com/theiagen/terra_utilities/v1.4.1/workflows/wf_cat_column.wdl,e080a455cb741c152c056e55af55cb42b2b3b46e24cb7185c2e1a6f1b74389bc,false,174,17,3.63E-5,stop,Concatenate specified files and capture versioning information for analysis. +"#workflow/github.com/gatk-workflows/seq-format-conversion/BAM-to-Unmapped-BAM",3.0.0,https://raw.githubusercontent.com/gatk-workflows/seq-format-conversion/3.0.0//bam-to-unmapped-bams.wdl,d75de73b26fd49d71a29e4709f77c4decb2b51209d169c1b3fa6fa427f53dd04,false,1019,26,1.6844999999999997E-4,stop,Convert BAM files to sorted unmapped BAMs using GATK with specified runtime parameters and output configuration. 
+"#notebook/github.com/denis-yuen/test-notebooks/ibm-tax-maps",0.2,https://raw.githubusercontent.com/denis-yuen/test-notebooks/0.2/ibm-et/jupyter-samples/tax-maps/Interactive_Data_Maps.ipynb,7ffefdbf8c4ab6333b9ad78c6b811365365e2ae433bca2acf261cc1209c59027,false,86121,26,0.012933749999999997,stop,Create interactive data maps using folium to visualize U.S. state tax collections and their categories with minimal coding. +quay.io/pancancer/pcawg-sanger-cgp-workflow,2.1.0,https://raw.githubusercontent.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/2.1.0//Dockstore.cwl,5eb9e182fc2e313606a9445604a4b47e0546ed17a641448da12fcc0371ead3d8,false,2108,27,3.324E-4,stop,Execute the PCAWG Sanger variant calling workflow using aligned tumor and normal BAM files for somatic variant analysis. +github.com/dockstore/dockstore-tool-bamstats/bamstats_sort_cwl,1.25-9,https://raw.githubusercontent.com/dockstore/dockstore-tool-bamstats/1.25-9//bamstats_sort.cwl,1fd7d8637cb91e031a6db604898f5728e51f4591bf4acfbb75ecefd0bfda3448,false,360,22,6.72E-5,stop,Execute a sorting command on input files using the BAMStats tool in a Docker container. 
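A side note on the mixed number formats in the `cost` column of both results files (plain decimals like `0.0029504999999999996` next to scientific notation like `3.63E-5`): this is exactly the behavior of Java's default `Double.toString`, which switches to scientific notation for magnitudes below 10^-3. That is a plausible explanation for these files, though the diff does not show the formatting call itself:

```java
public class CostFormatting {
    public static void main(String[] args) {
        // Double.toString uses plain decimal for magnitudes in [1e-3, 1e7) ...
        System.out.println(0.0062025);  // prints 0.0062025
        // ... and computerized scientific notation below 1e-3, as seen in the CSVs.
        System.out.println(0.000956);   // prints 9.56E-4
        // Double.parseDouble reads both forms back to the same value.
        System.out.println(Double.parseDouble("9.56E-4") == 0.000956); // prints true
    }
}
```

Consumers of these CSVs should therefore parse `cost` with `Double.parseDouble` (or a numeric accessor) rather than assuming a fixed decimal layout.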
diff --git a/topicgenerator/src/main/java/io/dockstore/topicgenerator/client/cli/TopicGeneratorClient.java b/topicgenerator/src/main/java/io/dockstore/topicgenerator/client/cli/TopicGeneratorClient.java index 8d911314..c68d49a1 100644 --- a/topicgenerator/src/main/java/io/dockstore/topicgenerator/client/cli/TopicGeneratorClient.java +++ b/topicgenerator/src/main/java/io/dockstore/topicgenerator/client/cli/TopicGeneratorClient.java @@ -2,25 +2,16 @@ import static io.dockstore.utils.ConfigFileUtils.getConfiguration; import static io.dockstore.utils.DockstoreApiClientUtils.setupApiClient; +import static io.dockstore.utils.ExceptionHandler.CLIENT_ERROR; import static io.dockstore.utils.ExceptionHandler.GENERIC_ERROR; import static io.dockstore.utils.ExceptionHandler.IO_ERROR; +import static io.dockstore.utils.ExceptionHandler.errorMessage; import static io.dockstore.utils.ExceptionHandler.exceptionMessage; import com.beust.jcommander.JCommander; import com.beust.jcommander.MissingCommandException; import com.beust.jcommander.ParameterException; import com.google.common.collect.Lists; -import com.knuddels.jtokkit.Encodings; -import com.knuddels.jtokkit.api.Encoding; -import com.knuddels.jtokkit.api.EncodingRegistry; -import com.knuddels.jtokkit.api.EncodingResult; -import com.knuddels.jtokkit.api.ModelType; -import com.theokanning.openai.completion.chat.ChatCompletionChoice; -import com.theokanning.openai.completion.chat.ChatCompletionRequest; -import com.theokanning.openai.completion.chat.ChatCompletionResult; -import com.theokanning.openai.completion.chat.ChatMessage; -import com.theokanning.openai.completion.chat.ChatMessageRole; -import com.theokanning.openai.service.OpenAiService; import io.dockstore.common.NextflowUtilities; import io.dockstore.openapi.client.ApiClient; import io.dockstore.openapi.client.ApiException; @@ -31,37 +22,35 @@ import io.dockstore.openapi.client.model.ToolVersion.DescriptorTypeEnum; import 
io.dockstore.openapi.client.model.UpdateAITopicRequest; import io.dockstore.topicgenerator.client.cli.TopicGeneratorCommandLineArgs.GenerateTopicsCommand; +import io.dockstore.topicgenerator.client.cli.TopicGeneratorCommandLineArgs.GenerateTopicsCommand.InputCsvHeaders; import io.dockstore.topicgenerator.client.cli.TopicGeneratorCommandLineArgs.GenerateTopicsCommand.OutputCsvHeaders; import io.dockstore.topicgenerator.client.cli.TopicGeneratorCommandLineArgs.UploadTopicsCommand; +import io.dockstore.topicgenerator.helper.AIModel; +import io.dockstore.topicgenerator.helper.AIModel.AIResponseInfo; +import io.dockstore.topicgenerator.helper.AIModelType; +import io.dockstore.topicgenerator.helper.AnthropicClaudeModel; +import io.dockstore.topicgenerator.helper.CSVHelper; import io.dockstore.topicgenerator.helper.ChuckNorrisFilter; -import io.dockstore.topicgenerator.helper.OpenAIHelper; +import io.dockstore.topicgenerator.helper.OpenAIModel; import io.dockstore.topicgenerator.helper.StringFilter; -import java.io.FileReader; import java.io.FileWriter; import java.io.IOException; -import java.io.Reader; import java.nio.charset.StandardCharsets; import java.time.Instant; import java.time.temporal.ChronoUnit; -import java.util.HashMap; import java.util.List; import java.util.Optional; import org.apache.commons.configuration2.INIConfiguration; import org.apache.commons.csv.CSVFormat; import org.apache.commons.csv.CSVPrinter; import org.apache.commons.csv.CSVRecord; +import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; public class TopicGeneratorClient { public static final String OUTPUT_FILE_PREFIX = "generated-topics"; private static final Logger LOG = LoggerFactory.getLogger(TopicGeneratorClient.class); - // Using GPT 3.5-turbo because it's cheaper and faster than GPT4, and it takes more tokens (16k). GPT4 seems to cause a lot of timeouts. 
- // https://platform.openai.com/docs/models/gpt-3-5-turbo - private static final ModelType AI_MODEL = ModelType.GPT_3_5_TURBO; - private static final int MAX_CONTEXT_LENGTH = 16385; - private static final EncodingRegistry REGISTRY = Encodings.newDefaultEncodingRegistry(); - private static final Encoding ENCODING = REGISTRY.getEncodingForModel(AI_MODEL); private final List<StringFilter> stringFilters = Lists.newArrayList(new ChuckNorrisFilter("en"), new ChuckNorrisFilter("fr-CA-u-sd-caqc")); TopicGeneratorClient() { @@ -99,7 +88,7 @@ public static void main(String[] args) { if ("generate-topics".equals(jCommander.getParsedCommand())) { // Read CSV file - topicGeneratorClient.generateTopics(topicGeneratorConfig, generateTopicsCommand.getEntriesCsvFilePath()); + topicGeneratorClient.generateTopics(topicGeneratorConfig, generateTopicsCommand.getEntriesCsvFilePath(), generateTopicsCommand.getAiModel()); } else if ("upload-topics".equals(jCommander.getParsedCommand())) { // Read CSV file topicGeneratorClient.uploadTopics(topicGeneratorConfig, uploadTopicsCommand.getAiTopicsCsvFilePath()); @@ -108,21 +97,35 @@ } /** - * Generates a topic for public entries by asking the GPT-3.5-turbo-16k AI model to summarize the content of the entry's primary descriptor. + * Generates a topic for public entries by asking the AI model to summarize the content of the entry's primary descriptor.
* @param topicGeneratorConfig * @param inputCsvFilePath */ - private void generateTopics(TopicGeneratorConfig topicGeneratorConfig, String inputCsvFilePath) { + private void generateTopics(TopicGeneratorConfig topicGeneratorConfig, String inputCsvFilePath, AIModelType aiModelType) { final ApiClient apiClient = setupApiClient(topicGeneratorConfig.dockstoreServerUrl()); final Ga4Ghv20Api ga4Ghv20Api = new Ga4Ghv20Api(apiClient); - final OpenAiService openAiService = new OpenAiService(topicGeneratorConfig.openaiApiKey()); - final String outputFileName = OUTPUT_FILE_PREFIX + "_" + AI_MODEL + "_" + Instant.now().truncatedTo(ChronoUnit.SECONDS).toString().replace("-", "").replace(":", "") + ".csv"; - final Iterable<CSVRecord> entriesCsvRecords = readCsvFile(inputCsvFilePath, GenerateTopicsCommand.InputCsvHeaders.class); + + AIModel aiModel = null; + if (aiModelType == AIModelType.CLAUDE_3_HAIKU || aiModelType == AIModelType.CLAUDE_3_5_SONNET) { + aiModel = new AnthropicClaudeModel(aiModelType); + } else if (aiModelType == AIModelType.GPT_4O_MINI) { + if (StringUtils.isEmpty(topicGeneratorConfig.openaiApiKey())) { + errorMessage("OpenAI API key is required in the config file to use an OpenAI model", CLIENT_ERROR); + } + aiModel = new OpenAIModel(topicGeneratorConfig.openaiApiKey(), aiModelType); + } else { + errorMessage("Invalid AI model type", CLIENT_ERROR); + } + + LOG.info("Generating topics using {}", aiModelType.getModelId()); + + final String outputFileName = OUTPUT_FILE_PREFIX + "_" + aiModelType + "_" + Instant.now().truncatedTo(ChronoUnit.SECONDS).toString().replace("-", "").replace(":", "") + ".csv"; + final Iterable<CSVRecord> entriesCsvRecords = CSVHelper.readFile(inputCsvFilePath, InputCsvHeaders.class); try (CSVPrinter csvPrinter = new CSVPrinter(new FileWriter(outputFileName, StandardCharsets.UTF_8), CSVFormat.DEFAULT.builder().setHeader(OutputCsvHeaders.class).build())) { for (CSVRecord entry: entriesCsvRecords) { - final String trsId =
entry.get(GenerateTopicsCommand.InputCsvHeaders.trsId); - final String versionId = entry.get(GenerateTopicsCommand.InputCsvHeaders.version); + final String trsId = entry.get(InputCsvHeaders.trsId); + final String versionId = entry.get(InputCsvHeaders.version); // Get descriptor file content and entry type FileWrapper descriptorFile; @@ -146,74 +149,26 @@ private void generateTopics(TopicGeneratorConfig topicGeneratorConfig, String in continue; } - // Create ChatGPT request + // Create AI request try { - getAiGeneratedTopicAndRecordToCsv(openAiService, csvPrinter, trsId, versionId, entryType, descriptorFile); - LOG.info("Generated topic for entry with TRS ID {} and version {}", trsId, versionId); + String prompt = "Summarize the " + entryType + " in one sentence that starts with a verb in the <summary> tags. Use a maximum of 150 characters.\n" + descriptorFile.getContent() + ""; + Optional<AIResponseInfo> aiResponseInfo = aiModel.generateTopic(prompt); + if (aiResponseInfo.isPresent()) { + CSVHelper.writeRecord(csvPrinter, trsId, versionId, descriptorFile, aiResponseInfo.get()); + LOG.info("Generated topic for entry with TRS ID {} and version {}", trsId, versionId); + } else { + LOG.error("Unable to generate topic for entry with TRS ID {} and version {}, skipping", trsId, versionId); + } } catch (Exception ex) { LOG.error("Unable to generate topic for entry with TRS ID {} and version {}, skipping", trsId, versionId, ex); } } + LOG.info("View generated topics in file {}", outputFileName); } catch (IOException e) { exceptionMessage(e, "Unable to create new CSV output file", IO_ERROR); } } - /** - * Generates a topic for the entry by asking the GPT-3.5-turbo-16k AI model to summarize the contents of the entry's primary descriptor. - * Records the result in a CSV file.
- * @param openAiService - * @param csvPrinter - * @param trsId - * @param versionId - * @param entryType - * @param descriptorFile - */ - private void getAiGeneratedTopicAndRecordToCsv(OpenAiService openAiService, CSVPrinter csvPrinter, String trsId, String versionId, String entryType, FileWrapper descriptorFile) { - // A character limit is specified but ChatGPT doesn't follow it strictly - final String systemPrompt = "Summarize the " + entryType + " in one sentence that starts with a verb. Use a maximum of 150 characters."; - final ChatMessage systemMessage = new ChatMessage(ChatMessageRole.SYSTEM.value(), systemPrompt); - // The sum of the number of tokens in the request and response cannot exceed the model's maximum context length. - final int maxResponseTokens = 100; // One token is roughly 4 characters. Using 100 tokens because setting it too low might truncate the response - // Chat completion API calls include additional tokens for message-based formatting. Calculate how long the descriptor content can be and truncate if needed - final int maxUserMessageTokens = OpenAIHelper.getMaximumAmountOfTokensForUserMessageContent(REGISTRY, AI_MODEL, MAX_CONTEXT_LENGTH, systemMessage, maxResponseTokens); - final EncodingResult encoded = ENCODING.encode(descriptorFile.getContent(), maxUserMessageTokens); // Encodes the content up to the maximum number of tokens specified - final String truncatedDescriptorContent = ENCODING.decode(encoded.getTokens()); // Decode the tokens to get the truncated content string - - final ChatMessage userMessage = new ChatMessage(ChatMessageRole.USER.value(), truncatedDescriptorContent); - final List messages = List.of(systemMessage, userMessage); - - ChatCompletionRequest chatCompletionRequest = ChatCompletionRequest - .builder() - .model(AI_MODEL.getName()) - .messages(messages) - .n(1) - .maxTokens(maxResponseTokens) - .logitBias(new HashMap<>()) - .build(); - final ChatCompletionResult chatCompletionResult = 
openAiService.createChatCompletion(chatCompletionRequest); - - if (chatCompletionResult.getChoices().isEmpty()) { - // I don't think this should happen, but check anyway - LOG.error("There was no chat completion choices, skipping"); - return; - } - - final ChatCompletionChoice chatCompletionChoice = chatCompletionResult.getChoices().get(0); - final String aiGeneratedTopic = chatCompletionChoice.getMessage().getContent(); - final String finishReason = chatCompletionChoice.getFinishReason(); - final long promptTokens = chatCompletionResult.getUsage().getPromptTokens(); - final long completionTokens = chatCompletionResult.getUsage().getCompletionTokens(); - String descriptorFileChecksum = descriptorFile.getChecksum().isEmpty() ? "" : descriptorFile.getChecksum().get(0).getChecksum(); - - // Write response to new CSV file - try { - csvPrinter.printRecord(trsId, versionId, descriptorFile.getUrl(), descriptorFileChecksum, encoded.isTruncated(), promptTokens, completionTokens, finishReason, aiGeneratedTopic); - } catch (IOException e) { - LOG.error("Unable to write CSV record to file, skipping", e); - } - } - private Optional getNextflowMainScript(String nextflowConfigFileContent, Ga4Ghv20Api ga4Ghv20Api, String trsId, String versionId, DescriptorTypeEnum descriptorType) { final String mainScriptPath = NextflowUtilities.grabConfig(nextflowConfigFileContent).getString("manifest.mainScript", "main.nf"); try { @@ -227,12 +182,12 @@ private Optional getNextflowMainScript(String nextflowConfigFileCon private void uploadTopics(TopicGeneratorConfig topicGeneratorConfig, String inputCsvFilePath) { final ApiClient apiClient = setupApiClient(topicGeneratorConfig.dockstoreServerUrl(), topicGeneratorConfig.dockstoreToken()); final ExtendedGa4GhApi extendedGa4GhApi = new ExtendedGa4GhApi(apiClient); - final Iterable entriesWithAITopics = readCsvFile(inputCsvFilePath, GenerateTopicsCommand.OutputCsvHeaders.class); + final Iterable entriesWithAITopics = 
CSVHelper.readFile(inputCsvFilePath, OutputCsvHeaders.class); for (CSVRecord entryWithAITopic: entriesWithAITopics) { // This command's input CSV headers are the generate-topic command's output headers - final String trsId = entryWithAITopic.get(GenerateTopicsCommand.OutputCsvHeaders.trsId); - final String aiTopic = entryWithAITopic.get(GenerateTopicsCommand.OutputCsvHeaders.aiTopic); + final String trsId = entryWithAITopic.get(OutputCsvHeaders.trsId); + final String aiTopic = entryWithAITopic.get(OutputCsvHeaders.aiTopic); boolean caughtByFilter = assessTopic(aiTopic); if (caughtByFilter) { LOG.info("Topic for {} was deemed offensive, please review above", trsId); @@ -258,20 +213,8 @@ private boolean assessTopic(String aiTopic) { return false; } - private Iterable<CSVRecord> readCsvFile(String inputCsvFilePath, Class<? extends Enum<?>> csvHeaders) { - // Read CSV file - Iterable<CSVRecord> csvRecords = null; - try { - final Reader entriesCsv = new FileReader(inputCsvFilePath); - CSVFormat csvFormat = CSVFormat.DEFAULT.builder() - .setHeader(csvHeaders) - .setSkipHeaderRecord(true) - .setTrim(true) - .build(); - csvRecords = csvFormat.parse(entriesCsv); - } catch (IOException e) { - exceptionMessage(e, "Unable to read input CSV file", IO_ERROR); - } - return csvRecords; + public static String removeSummaryTagsFromTopic(String aiTopic) { + String cleanedTopic = StringUtils.removeStart(aiTopic, "<summary>"); + return StringUtils.removeEnd(cleanedTopic, "</summary>"); + } } diff --git a/topicgenerator/src/main/java/io/dockstore/topicgenerator/client/cli/TopicGeneratorCommandLineArgs.java b/topicgenerator/src/main/java/io/dockstore/topicgenerator/client/cli/TopicGeneratorCommandLineArgs.java index a8f4f29f..f05d6506 100644 --- a/topicgenerator/src/main/java/io/dockstore/topicgenerator/client/cli/TopicGeneratorCommandLineArgs.java +++ b/topicgenerator/src/main/java/io/dockstore/topicgenerator/client/cli/TopicGeneratorCommandLineArgs.java @@ -2,6 +2,7 @@ import com.beust.jcommander.Parameter; import
com.beust.jcommander.Parameters; +import io.dockstore.topicgenerator.helper.AIModelType; import java.io.File; public class TopicGeneratorCommandLineArgs { @@ -28,10 +29,17 @@ public static class GenerateTopicsCommand { @Parameter(names = {"-e", "--entries"}, description = "The file path to the CSV file containing the TRS ID, and version name of the entries to generate topics for. The first line of the file should contain the CSV fields: trsID,version") private String entriesCsvFilePath = "./" + DEFAULT_ENTRIES_FILE_NAME; + @Parameter(names = {"-m", "--model"}, description = "The AI model to use") + private AIModelType aiModelType = AIModelType.CLAUDE_3_HAIKU; + public String getEntriesCsvFilePath() { return entriesCsvFilePath; } + public AIModelType getAiModel() { + return aiModelType; + } + /** * Headers for the input data file of entries to generate AI topics for. */ @@ -50,6 +58,7 @@ public enum OutputCsvHeaders { isTruncated, // Whether the descriptor file content truncated because it exceeded the token maximum promptTokens, // Number of tokens in prompt completionTokens, // Number of tokens in response + cost, // Estimated cost of the prompt and completion tokens finishReason, // The reason that the response stopped aiTopic } diff --git a/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/AIModel.java b/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/AIModel.java new file mode 100644 index 00000000..a8d51fff --- /dev/null +++ b/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/AIModel.java @@ -0,0 +1,60 @@ +package io.dockstore.topicgenerator.helper; + +import java.util.Optional; + +/** + * An AI model that generates topics. + */ +public abstract class AIModel { + // The sum of the number of tokens in the request and response cannot exceed the model's maximum context length. + public static final int MAX_RESPONSE_TOKENS = 100; // One token is roughly 4 characters. 
Using 100 tokens because setting it too low might truncate the response + private final String modelName; + private final double pricePer1kInputTokens; + private final double pricePer1kOutputTokens; + private final int maxContextLength; + + public AIModel(AIModelType modelType) { + this.modelName = modelType.getModelId(); + this.pricePer1kInputTokens = modelType.getPricePer1kInputTokens(); + this.pricePer1kOutputTokens = modelType.getPricePer1kOutputTokens(); + this.maxContextLength = modelType.getMaxContextLength(); + } + + public String getModelName() { + return modelName; + } + + public double getPricePer1kInputTokens() { + return pricePer1kInputTokens; + } + + public double getPricePer1kOutputTokens() { + return pricePer1kOutputTokens; + } + + public int getMaxContextLength() { + return maxContextLength; + } + + @SuppressWarnings("checkstyle:magicnumber") + public double calculatePrice(long inputTokens, long outputTokens) { + return (((double)inputTokens / 1000) * pricePer1kInputTokens) + (((double)outputTokens / 1000) * pricePer1kOutputTokens); + } + + /** + * Generate an AI topic using the contents of the descriptor file. 
+ * + * @return the AI-generated topic and response metadata, or empty if generation failed + */ + public abstract Optional<AIResponseInfo> generateTopic(String prompt); + + public int estimateTokens(String prompt) { + // AWS Bedrock suggests using 6 characters per token as an estimate + // https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-prepare.html + final int estimatedCharactersPerToken = 6; + return prompt.length() / estimatedCharactersPerToken; + } + + public record AIResponseInfo(String aiTopic, boolean isTruncated, long inputTokens, long outputTokens, double cost, String stopReason) { + } +} diff --git a/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/AIModelType.java b/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/AIModelType.java new file mode 100644 index 00000000..4c45d8f7 --- /dev/null +++ b/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/AIModelType.java @@ -0,0 +1,37 @@ +package io.dockstore.topicgenerator.helper; + +import com.knuddels.jtokkit.api.ModelType; + +public enum AIModelType { + CLAUDE_3_5_SONNET("anthropic.claude-3-5-sonnet-20240620-v1:0", 0.003, 0.015, 200000), + CLAUDE_3_HAIKU("anthropic.claude-3-haiku-20240307-v1:0", 0.00025, 0.00125, 200000), + GPT_4O_MINI(ModelType.GPT_4O_MINI.getName(), 0.000150, 0.000600, ModelType.GPT_4O_MINI.getMaxContextLength()); + + private final String modelId; + private final double pricePer1kInputTokens; + private final double pricePer1kOutputTokens; + private final int maxContextLength; + + AIModelType(String modelId, double pricePer1kInputTokens, double pricePer1kOutputTokens, int maxContextLength) { + this.modelId = modelId; + this.pricePer1kInputTokens = pricePer1kInputTokens; + this.pricePer1kOutputTokens = pricePer1kOutputTokens; + this.maxContextLength = maxContextLength; + } + + public String getModelId() { + return modelId; + } + + public double getPricePer1kInputTokens() { + return pricePer1kInputTokens; + } + + public double getPricePer1kOutputTokens() { + return pricePer1kOutputTokens; + } + + public int
getMaxContextLength() { + return maxContextLength; + } +} diff --git a/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/AnthropicClaudeModel.java b/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/AnthropicClaudeModel.java new file mode 100644 index 00000000..ee17291c --- /dev/null +++ b/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/AnthropicClaudeModel.java @@ -0,0 +1,66 @@ +package io.dockstore.topicgenerator.helper; + +import static io.dockstore.topicgenerator.client.cli.TopicGeneratorClient.removeSummaryTagsFromTopic; + +import com.google.gson.Gson; +import io.dockstore.topicgenerator.helper.ClaudeRequest.Message; +import io.dockstore.topicgenerator.helper.ClaudeResponse.Content; +import java.util.List; +import java.util.Optional; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider; +import software.amazon.awssdk.core.SdkBytes; +import software.amazon.awssdk.core.exception.SdkClientException; +import software.amazon.awssdk.services.bedrockruntime.BedrockRuntimeClient; +import software.amazon.awssdk.services.bedrockruntime.model.InvokeModelResponse; + +public class AnthropicClaudeModel extends AIModel { + private static final Logger LOG = LoggerFactory.getLogger(AnthropicClaudeModel.class); + private static final Gson GSON = new Gson(); + + private final BedrockRuntimeClient bedrockRuntimeClient; + + public AnthropicClaudeModel(AIModelType anthropicModel) { + super(anthropicModel); + bedrockRuntimeClient = BedrockRuntimeClient.builder() + .credentialsProvider(DefaultCredentialsProvider.create()) + .build(); + } + + @Override + public Optional<AIResponseInfo> generateTopic(String prompt) { + boolean isPromptTruncated = false; + if (estimateTokens(prompt) > getMaxContextLength()) { + prompt = prompt.substring(0, getMaxContextLength()); + isPromptTruncated = true; + } + + final String nativeRequest = createClaudeRequest(prompt); + + try { + // Encode and send the request to the Bedrock Runtime. + InvokeModelResponse response = bedrockRuntimeClient.invokeModel(request -> request + .body(SdkBytes.fromUtf8String(nativeRequest)) + .modelId(this.getModelName()) + ); + + ClaudeResponse claudeResponse = GSON.fromJson(response.body().asUtf8String(), ClaudeResponse.class); + + final String aiGeneratedTopic = claudeResponse.content().get(0).text(); + final String stopReason = claudeResponse.stopReason(); + final long inputTokens = claudeResponse.usage().inputTokens(); + final long outputTokens = claudeResponse.usage().outputTokens(); + + return Optional.of(new AIResponseInfo(removeSummaryTagsFromTopic(aiGeneratedTopic), isPromptTruncated, inputTokens, outputTokens, this.calculatePrice(inputTokens, outputTokens), stopReason)); + } catch (SdkClientException e) { + LOG.error("Could not invoke model {}", this.getModelName(), e); + } + return Optional.empty(); + } + + private String createClaudeRequest(String prompt) { + final double temperature = 0.5; + final String anthropicVersion = "bedrock-2023-05-31"; // Must be this value + ClaudeRequest claudeRequest = new ClaudeRequest(anthropicVersion, MAX_RESPONSE_TOKENS, temperature, List.of(new Message("user", List.of(new Content("text", prompt))))); + return GSON.toJson(claudeRequest); + } +} diff --git a/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/CSVHelper.java b/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/CSVHelper.java new file mode 100644 index 00000000..305a7192 --- /dev/null +++ b/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/CSVHelper.java @@ -0,0 +1,49 @@ +package io.dockstore.topicgenerator.helper; + +import static io.dockstore.utils.ExceptionHandler.IO_ERROR; +import static io.dockstore.utils.ExceptionHandler.exceptionMessage; + +import io.dockstore.openapi.client.model.FileWrapper; +import io.dockstore.topicgenerator.helper.AIModel.AIResponseInfo; +import java.io.FileReader; +import java.io.IOException; +import java.io.Reader; +import
org.apache.commons.csv.CSVFormat; +import org.apache.commons.csv.CSVPrinter; +import org.apache.commons.csv.CSVRecord; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public final class CSVHelper { + private static final Logger LOG = LoggerFactory.getLogger(CSVHelper.class); + + private CSVHelper() { + // Intentionally empty + } + + public static Iterable<CSVRecord> readFile(String inputCsvFilePath, Class<? extends Enum<?>> csvHeaders) { + // Read CSV file + Iterable<CSVRecord> csvRecords = null; + try { + final Reader entriesCsv = new FileReader(inputCsvFilePath); + CSVFormat csvFormat = CSVFormat.DEFAULT.builder() + .setHeader(csvHeaders) + .setSkipHeaderRecord(true) + .setTrim(true) + .build(); + csvRecords = csvFormat.parse(entriesCsv); + } catch (IOException e) { + exceptionMessage(e, "Unable to read input CSV file", IO_ERROR); + } + return csvRecords; + } + + public static void writeRecord(CSVPrinter csvPrinter, String trsId, String versionId, FileWrapper descriptorFile, AIResponseInfo aiResponseInfo) { + String descriptorChecksum = descriptorFile.getChecksum().isEmpty() ?
"" : descriptorFile.getChecksum().get(0).getChecksum(); + try { + csvPrinter.printRecord(trsId, versionId, descriptorFile.getUrl(), descriptorChecksum, aiResponseInfo.isTruncated(), aiResponseInfo.inputTokens(), aiResponseInfo.outputTokens(), aiResponseInfo.cost(), aiResponseInfo.stopReason(), aiResponseInfo.aiTopic()); + } catch (IOException e) { + LOG.error("Unable to write CSV record to file, skipping", e); + } + } +} diff --git a/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/ClaudeRequest.java b/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/ClaudeRequest.java new file mode 100644 index 00000000..e5546d08 --- /dev/null +++ b/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/ClaudeRequest.java @@ -0,0 +1,11 @@ +package io.dockstore.topicgenerator.helper; + +import com.google.gson.annotations.SerializedName; +import io.dockstore.topicgenerator.helper.ClaudeResponse.Content; +import java.util.List; + +public record ClaudeRequest(@SerializedName(value = "anthropic_version") String anthropicVersion, @SerializedName(value = "max_tokens") int maxTokens, double temperature, List<Message> messages) { + public record Message(String role, List<Content> content) { + + } +} diff --git a/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/ClaudeResponse.java b/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/ClaudeResponse.java new file mode 100644 index 00000000..e69dff26 --- /dev/null +++ b/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/ClaudeResponse.java @@ -0,0 +1,13 @@ +package io.dockstore.topicgenerator.helper; + +import com.google.gson.annotations.SerializedName; +import java.util.List; + +public record ClaudeResponse(String id, String type, String role, String model, List<Content> content, @SerializedName(value = "stop_reason") String stopReason, @SerializedName(value = "stop_sequence") String stopSequence, Usage usage) { + + public record Content(String type, String text) { + } + + public record
Usage(@SerializedName(value = "input_tokens") long inputTokens, @SerializedName(value = "output_tokens") long outputTokens) { + } +} diff --git a/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/OpenAIHelper.java b/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/OpenAIHelper.java index 80ca3823..ee9ce400 100644 --- a/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/OpenAIHelper.java +++ b/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/OpenAIHelper.java @@ -55,11 +55,11 @@ public static int countMessageTokens(EncodingRegistry registry, String model, Li * @param maxResponseToken * @return */ - public static int getMaximumAmountOfTokensForUserMessageContent(EncodingRegistry registry, ModelType aiModel, int maxContextLength, ChatMessage systemMessage, int maxResponseToken) { + public static int getMaximumAmountOfTokensForUserMessageContent(EncodingRegistry registry, ModelType aiModel, ChatMessage systemMessage, int maxResponseToken) { ChatMessage userMessageWithoutContent = new ChatMessage(ChatMessageRole.USER.value()); List<ChatMessage> messages = List.of(systemMessage, userMessageWithoutContent); final int tokenCountWithoutUserContent = countMessageTokens(registry, aiModel.getName(), messages); - return maxContextLength - maxResponseToken - tokenCountWithoutUserContent; + return aiModel.getMaxContextLength() - maxResponseToken - tokenCountWithoutUserContent; } } diff --git a/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/OpenAIModel.java b/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/OpenAIModel.java new file mode 100644 index 00000000..a5b026a3 --- /dev/null +++ b/topicgenerator/src/main/java/io/dockstore/topicgenerator/helper/OpenAIModel.java @@ -0,0 +1,85 @@ +package io.dockstore.topicgenerator.helper; + +import static io.dockstore.topicgenerator.client.cli.TopicGeneratorClient.removeSummaryTagsFromTopic; + +import com.knuddels.jtokkit.Encodings; +import
com.knuddels.jtokkit.api.Encoding; +import com.knuddels.jtokkit.api.EncodingRegistry; +import com.knuddels.jtokkit.api.EncodingResult; +import com.knuddels.jtokkit.api.ModelType; +import com.theokanning.openai.completion.chat.ChatCompletionChoice; +import com.theokanning.openai.completion.chat.ChatCompletionRequest; +import com.theokanning.openai.completion.chat.ChatCompletionResult; +import com.theokanning.openai.completion.chat.ChatMessage; +import com.theokanning.openai.completion.chat.ChatMessageRole; +import com.theokanning.openai.service.OpenAiService; +import java.util.HashMap; +import java.util.List; +import java.util.Optional; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +@Deprecated +public class OpenAIModel extends AIModel { + private static final Logger LOG = LoggerFactory.getLogger(OpenAIModel.class); + private static final EncodingRegistry REGISTRY = Encodings.newDefaultEncodingRegistry(); + + private final OpenAiService openAiService; + private final ModelType aiModel; + private final Encoding encoding; + + public OpenAIModel(String openaiApiKey, AIModelType aiModelType) { + super(aiModelType); + openAiService = new OpenAiService(openaiApiKey); + aiModel = ModelType.fromName(aiModelType.getModelId()).orElseThrow(() -> new RuntimeException("Invalid OpenAI model type " + aiModelType.getModelId())); + encoding = REGISTRY.getEncodingForModel(aiModel); + } + + /** + * Generates a topic for the entry by asking the AI model to summarize the contents of the entry's primary descriptor. + */ + @Override + public Optional<AIResponseInfo> generateTopic(String prompt) { + // Chat completion API calls include additional tokens for message-based formatting.
Calculate how long the descriptor content can be and truncate if needed + ChatMessage userMessage = new ChatMessage(ChatMessageRole.USER.value(), prompt); + boolean isPromptTruncated = false; + if (estimateTokens(prompt) > getMaxContextLength()) { + final EncodingResult encoded = encoding.encode(prompt, getMaxContextLength()); // Encodes the prompt up to the maximum number of tokens specified + final String truncatedPrompt = encoding.decode(encoded.getTokens()); // Decode the tokens to get the truncated content string + userMessage.setContent(truncatedPrompt); + isPromptTruncated = encoded.isTruncated(); + } + + final List<ChatMessage> messages = List.of(userMessage); + + ChatCompletionRequest chatCompletionRequest = ChatCompletionRequest + .builder() + .model(aiModel.getName()) + .messages(messages) + .n(1) + .maxTokens(MAX_RESPONSE_TOKENS) + .logitBias(new HashMap<>()) + .build(); + final ChatCompletionResult chatCompletionResult = openAiService.createChatCompletion(chatCompletionRequest); + + if (chatCompletionResult.getChoices().isEmpty()) { + // This shouldn't happen, but check anyway + LOG.error("There were no chat completion choices, skipping"); + return Optional.empty(); + } + + final ChatCompletionChoice chatCompletionChoice = chatCompletionResult.getChoices().get(0); + final String aiGeneratedTopic = chatCompletionChoice.getMessage().getContent(); + final String finishReason = chatCompletionChoice.getFinishReason(); + final long promptTokens = chatCompletionResult.getUsage().getPromptTokens(); + final long completionTokens = chatCompletionResult.getUsage().getCompletionTokens(); + return Optional.of(new AIResponseInfo(removeSummaryTagsFromTopic(aiGeneratedTopic), isPromptTruncated, promptTokens, completionTokens, this.calculatePrice(promptTokens, completionTokens), finishReason)); + } + + @Override + public int estimateTokens(String prompt) { + ChatMessage userMessage = new ChatMessage(ChatMessageRole.USER.value(), prompt); + return
OpenAIHelper.countMessageTokens(REGISTRY, aiModel.getName(), List.of(userMessage)); + } +} diff --git a/topicgenerator/src/test/resources/generated-ai-topics.csv b/topicgenerator/src/test/resources/generated-ai-topics.csv index 40f63724..a6777f86 100644 --- a/topicgenerator/src/test/resources/generated-ai-topics.csv +++ b/topicgenerator/src/test/resources/generated-ai-topics.csv @@ -1,2 +1,2 @@ -trsId,version,descriptorUrl,descriptorChecksum,isTruncated,promptTokens,completionTokens,finishReason,aiTopic -"#workflow/github.com/dockstore-testing/testWorkflow",master,https://raw.githubusercontent.com/dockstore-testing/testWorkflow/master/Dockstore.cwl,07d68a2bce6118b31018c31a013325cda07030efc92750c036df4ab112849c02,true,16283,50,stop,"The workflow starts by inputting Illumina-sequenced ARTIC data and involves steps such as quality filtering, mapping, variant calling, and filtering mutations using tools like ivar, samtools, lofreq, snpsift, and multiqc." +trsId,version,descriptorUrl,descriptorChecksum,isTruncated,promptTokens,completionTokens,cost,finishReason,aiTopic +"#workflow/github.com/dockstore-testing/testWorkflow",master,https://raw.githubusercontent.com/dockstore-testing/testWorkflow/master/Dockstore.cwl,07d68a2bce6118b31018c31a013325cda07030efc92750c036df4ab112849c02,true,16283,50,0,stop,"The workflow starts by inputting Illumina-sequenced ARTIC data and involves steps such as quality filtering, mapping, variant calling, and filtering mutations using tools like ivar, samtools, lofreq, snpsift, and multiqc."
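The new `cost` column in the test fixture above comes from `AIModel.calculatePrice`, which prorates the per-1k-token rates declared in `AIModelType`. A standalone sketch of the same arithmetic (the class name is illustrative, not part of this PR; rates are the Claude 3 Haiku values from `AIModelType`):

```java
// Sketch of the per-request cost arithmetic mirrored from AIModel.calculatePrice.
public class CostSketch {

    // cost = (inputTokens / 1000) * inputRate + (outputTokens / 1000) * outputRate
    static double calculatePrice(long inputTokens, long outputTokens, double pricePer1kInputTokens, double pricePer1kOutputTokens) {
        return ((double) inputTokens / 1000) * pricePer1kInputTokens
            + ((double) outputTokens / 1000) * pricePer1kOutputTokens;
    }

    public static void main(String[] args) {
        // Claude 3 Haiku rates from AIModelType ($0.00025 / $0.00125 per 1k tokens),
        // applied to the fixture row's token counts: 16283 prompt, 50 completion.
        double cost = calculatePrice(16283, 50, 0.00025, 0.00125);
        System.out.println(cost); // roughly $0.00413 for the fixture row
    }
}
```

With numbers this small, writing `0` into the fixture's `cost` column loses information; printing the full double (as `writeRecord` does via `aiResponseInfo.cost()`) keeps sub-cent costs visible.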
diff --git a/topicgenerator/src/test/resources/offensive-generated-ai-topics.csv b/topicgenerator/src/test/resources/offensive-generated-ai-topics.csv index 4a6c66c9..e3a2201b 100644 --- a/topicgenerator/src/test/resources/offensive-generated-ai-topics.csv +++ b/topicgenerator/src/test/resources/offensive-generated-ai-topics.csv @@ -1,2 +1,2 @@ -trsId,version,descriptorUrl,descriptorChecksum,isTruncated,promptTokens,completionTokens,finishReason,aiTopic -"#workflow/github.com/dockstore-testing/testWorkflow",master,https://raw.githubusercontent.com/dockstore-testing/testWorkflow/master/Dockstore.cwl,07d68a2bce6118b31018c31a013325cda07030efc92750c036df4ab112849c02,true,16283,50,stop,"The workflow starts by inputting Illumina-sequenced ARTIC data and involves a voyeur." +trsId,version,descriptorUrl,descriptorChecksum,isTruncated,promptTokens,completionTokens,cost,finishReason,aiTopic +"#workflow/github.com/dockstore-testing/testWorkflow",master,https://raw.githubusercontent.com/dockstore-testing/testWorkflow/master/Dockstore.cwl,07d68a2bce6118b31018c31a013325cda07030efc92750c036df4ab112849c02,true,16283,50,0,stop,"The workflow starts by inputting Illumina-sequenced ARTIC data and involves a voyeur." diff --git a/topicgenerator/templates/topic-generator.config b/topicgenerator/templates/topic-generator.config index 3a87d52b..2f34fb2d 100644 --- a/topicgenerator/templates/topic-generator.config +++ b/topicgenerator/templates/topic-generator.config @@ -1,7 +1,4 @@ # This is a template topic generator config file for prod. Modify it for different environments. [dockstore] server-url: https://dockstore.org/api -token: YOUR_ADMIN_OR_CURATOR_DOCKSTORE_TOKEN - -[ai] -openai-api-key: YOUR_OPENAI_API_KEY \ No newline at end of file +token: YOUR_ADMIN_OR_CURATOR_DOCKSTORE_TOKEN \ No newline at end of file
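For reference, the 6-characters-per-token heuristic in `AIModel.estimateTokens` and the character-level truncation gate used on the Bedrock path can be sketched standalone (illustrative class and method names, not the PR's types):

```java
// Sketch of the token-estimation heuristic and truncation gate from this PR.
public class TokenEstimateSketch {
    // AWS Bedrock's documentation suggests ~6 characters per token as an estimate.
    static final int ESTIMATED_CHARACTERS_PER_TOKEN = 6;

    static int estimateTokens(String prompt) {
        return prompt.length() / ESTIMATED_CHARACTERS_PER_TOKEN;
    }

    // Trim the prompt when its estimated token count exceeds the model's context window.
    // Mirroring the PR, the cut is at maxContextLength *characters*, which is stricter
    // than a token-level cut and so stays safely under the window.
    static String truncateIfNeeded(String prompt, int maxContextLength) {
        if (estimateTokens(prompt) > maxContextLength) {
            return prompt.substring(0, maxContextLength);
        }
        return prompt;
    }

    public static void main(String[] args) {
        String prompt = "x".repeat(1200);
        System.out.println(estimateTokens(prompt)); // 1200 / 6 = 200
        System.out.println(truncateIfNeeded(prompt, 100).length()); // trimmed to 100 characters
    }
}
```

Note the contrast with the OpenAI path, which uses jtokkit's real encoder to truncate at an exact token boundary; the Bedrock path trades precision for not needing a Claude tokenizer.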