Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Anthropic Claude 3 Haiku to generate topics via AWS Bedrock #503

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

kathy-t
Copy link
Contributor

@kathy-t kathy-t commented Sep 26, 2024

Description
Corresponding webservice PR: dockstore/dockstore#6003

This PR updates the topics generator so that it uses AWS Bedrock to invoke the Anthropic Claude 3 Haiku model to generate topics. See the ticket comment for more details about why we want to do this and some benchmarking information.

Along with Claude 3 Haiku, the program can also invoke the Claude 3.5 Sonnet model, which is more intelligent and more expensive. This is mainly for experimentation or if we find that Claude 3 Haiku isn't doing so well for whatever reason.

I deprecated the OpenAI functionality in the program, but it is still accessible if we wish to experiment with it.

I've included a new CSV file in the results directory for the topics generated by Claude 3 Haiku. If you read the ticket, you'll see that these topics are cleaner than the ones I initially generated with the old prompt. So Claude 3 Haiku appears to be sufficient, but do take a look at the topics.

Review Instructions
(will fill in)

Issue
SEAB-6469

Security
If there are any concerns that require extra attention from the security team, highlight them here.

Please make sure that you've checked the following before submitting your pull request. Thanks!

  • Check that you pass the basic style checks and unit tests by running mvn clean install in the project that you have modified (until https://ucsc-cgl.atlassian.net/browse/SEAB-5300 adds multi-module support properly)
  • Ensure that the PR targets the correct branch. Check the milestone or fix version of the ticket.
  • If you are changing dependencies, check with dependabot to ensure you are not introducing new high/critical vulnerabilities
  • If this PR is for a user-facing feature, create and link a documentation ticket for this feature (usually in the same milestone as the linked issue). Style points if you create a documentation PR directly and link that instead.

@kathy-t kathy-t self-assigned this Sep 26, 2024
Copy link

codecov bot commented Sep 26, 2024

Codecov Report

Attention: Patch coverage is 18.30986% with 116 lines in your changes missing coverage. Please review.

Project coverage is 39.06%. Comparing base (0381ffd) to head (f59679e).
Report is 1 commits behind head on develop.

Files with missing lines Patch % Lines
...o/dockstore/topicgenerator/helper/OpenAIModel.java 0.00% 34 Missing ⚠️
...re/topicgenerator/helper/AnthropicClaudeModel.java 0.00% 26 Missing ⚠️
...opicgenerator/client/cli/TopicGeneratorClient.java 11.53% 23 Missing ⚠️
...va/io/dockstore/topicgenerator/helper/AIModel.java 0.00% 14 Missing ⚠️
.../io/dockstore/topicgenerator/helper/CSVHelper.java 57.89% 8 Missing ⚠️
...o/dockstore/topicgenerator/helper/AIModelType.java 71.42% 4 Missing ⚠️
...ockstore/topicgenerator/helper/ClaudeResponse.java 0.00% 3 Missing ⚠️
...dockstore/topicgenerator/helper/ClaudeRequest.java 0.00% 2 Missing ⚠️
...ator/client/cli/TopicGeneratorCommandLineArgs.java 66.66% 1 Missing ⚠️
.../dockstore/topicgenerator/helper/OpenAIHelper.java 0.00% 1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (0381ffd) and HEAD (f59679e). Click for more details.

HEAD has 6 uploads less than BASE
Flag BASE (0381ffd) HEAD (f59679e)
tooltester 3 1
toolbackup 2 0
topicgenerator 2 1
metricsaggregator 2 1
Additional details and impacted files
@@              Coverage Diff              @@
##             develop     #503      +/-   ##
=============================================
- Coverage      45.70%   39.06%   -6.64%     
+ Complexity       319      282      -37     
=============================================
  Files             50       57       +7     
  Lines           2700     2785      +85     
  Branches         210      214       +4     
=============================================
- Hits            1234     1088     -146     
- Misses          1375     1613     +238     
+ Partials          91       84       -7     
Flag Coverage Δ
metricsaggregator 35.58% <0.00%> (-1.13%) ⬇️
toolbackup ?
tooltester 16.55% <0.00%> (-0.53%) ⬇️
topicgenerator 20.03% <18.30%> (-0.27%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm any thoughts about keeping this file around? This was generated using OpenAI and I included it for comparison purposes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea to give an example

Copy link
Member

@denis-yuen denis-yuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor documentation questions

pom.xml Outdated
@@ -38,7 +38,7 @@

<github.url>scm:git:git@github.com:dockstore/dockstore-support.git</github.url>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<dockstore-core.version>1.16.0-alpha.16</dockstore-core.version>
<dockstore-core.version>1.16.0-SNAPSHOT</dockstore-core.version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should be versioned, I think you can use a dated snapshot in a pinch
Like

<dependency>
    <groupId>io.dockstore</groupId>
    <artifactId>bom-internal</artifactId>
    <version>1.16.0-20240926.155529-115</version>
    <type>pom</type>
</dependency>

but I haven't tried it yet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll update this once I create a new webservice tag 🙂

"#workflow/github.com/DataBiosphere/analysis_pipeline_WDL/ld-pruning-wdl",v7.1.1,https://raw.githubusercontent.com/DataBiosphere/analysis_pipeline_WDL/v7.1.1/ld-pruning/ld-pruning.wdl,eabd52bf3f223c58ad220a418f87f2ec23d2dd91958b61f12066144a8a84a444,false,4460,41,0.0011662500000000002,end_turn,"Calculates linkage disequilibrium, subsets GDS files, merges the subsetted files, and checks the merged file against the inputs."
"#workflow/github.com/AnalysisCommons/genesis_wdl/genesis_GWAS",v1_5,https://raw.githubusercontent.com/AnalysisCommons/genesis_wdl/v1_5//genesis_GWAS.wdl,9c0c50df22bb95869dcc66d77693ecaf702b980878f38feca3ebd75e00eb9fbe,false,4265,43,0.00112,end_turn,"Execute the null model generation, association testing, and summarization tasks to perform a genome-wide association study (GWAS) using the GENESIS biostatistical package."
"#workflow/github.com/aofarrel/covstats-wdl",master,https://raw.githubusercontent.com/aofarrel/covstats-wdl/master/covstats/covstats.wdl,49dec26c695bbb88e0e1198e04c6857f497c80b8b59f8bd86377eaf76ee74a4a,false,2532,42,6.855E-4,end_turn,"Perform read length and coverage analysis on input BAM/CRAM files, generate a report summarizing the results, and handle various file types and runtime configurations."
"#workflow/github.com/broadinstitute/warp/Optimus",aa-PD2413,https://raw.githubusercontent.com/broadinstitute/warp/aa-PD2413/pipelines/skylab/optimus/Optimus.wdl,fb1b9fbd4be73e7210dec444d446c7405afcbcb11f9030391b5e63dd9defe4b6,false,3305,66,9.0875E-4,end_turn,"Imports necessary WDL workflows, defines input parameters, performs input checks, splits FASTQ files, aligns reads, merges BAM files, calculates gene and cell metrics, generates sparse count matrix, runs EmptyDrops, and produces an H5AD output file."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Imports necessary WDL workflows"
Amusing but I guess it's not wrong

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea to give an example

}

private String createClaudeRequest(String prompt) {
final double temperature = 0.5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

javadoc or link if there was reasoning behind a temperature of 0.5 , guessing this makes it more predicable less creative in general (compared to 1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was copied from their example 😅 their default is 1, let me try that


private String createClaudeRequest(String prompt) {
final double temperature = 0.5;
final String anthropicVersion = "bedrock-2023-05-31"; // Must be this value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto for this, 2023 seems on the old side but I'm guessing it really means "includes data up to 2023" rather than "released in 2023"

token: YOUR_ADMIN_OR_CURATOR_DOCKSTORE_TOKEN

[ai]
openai-api-key: YOUR_OPENAI_API_KEY
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems since the openai code is present, could leave openai-api-key and just note that it is optional

@david4096
Copy link
Member

This is cool to see. I think it might be worth trying to do a benchmark of the performance of multiple models here and publishing on it.

@@ -0,0 +1,24 @@
trsId,version,descriptorUrl,descriptorChecksum,isTruncated,promptTokens,completionTokens,cost,finishReason,aiTopic
"#workflow/github.com/iwc-workflows/sars-cov-2-pe-illumina-artic-variant-calling/COVID-19-PE-ARTIC-ILLUMINA",main,https://raw.githubusercontent.com/iwc-workflows/sars-cov-2-pe-illumina-artic-variant-calling/main//pe-artic-variation.ga,dcc2761eb35156d7d09479112daf089439774fc29938f02cb6ee8cda87906758,false,24595,43,0.0062025,end_turn,"Trim ARTIC primer sequences, realign reads, call and filter variants, annotate variants, and apply a strand-bias soft filter to the final annotated variants."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come some of the verbs have s and some don't? Trim here, but Filters on the next line? I guess it depends on the content?

I'd like it if they were all the same, but that's probably just me and probably isn't worth the effort and/or may not even make sense.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be easy to fix by adding a description of the tense to the prompt, ala

starts with a first-person present tense verb

LOG.info("Generated topic for entry with TRS ID {} and version {}", trsId, versionId);
String prompt = "Summarize the " + entryType + " in one sentence that starts with a verb in the <summary> tags. Use a maximum of 150 characters.\n<content>" + descriptorFile.getContent() + "</content>";
Optional<AIResponseInfo> aiResponseInfo = aiModel.generateTopic(prompt);
if (aiResponseInfo.isPresent()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to change, but could use Optional.ifPresentOrElse to convert this conditional to a "safer" form.

@@ -0,0 +1,24 @@
trsId,version,descriptorUrl,descriptorChecksum,isTruncated,promptTokens,completionTokens,cost,finishReason,aiTopic
"#workflow/github.com/iwc-workflows/sars-cov-2-pe-illumina-artic-variant-calling/COVID-19-PE-ARTIC-ILLUMINA",main,https://raw.githubusercontent.com/iwc-workflows/sars-cov-2-pe-illumina-artic-variant-calling/main//pe-artic-variation.ga,dcc2761eb35156d7d09479112daf089439774fc29938f02cb6ee8cda87906758,false,24595,43,0.0062025,end_turn,"Trim ARTIC primer sequences, realign reads, call and filter variants, annotate variants, and apply a strand-bias soft filter to the final annotated variants."

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be easy to fix by adding a description of the tense to the prompt, ala

starts with a first-person present tense verb

@@ -6,7 +6,7 @@ token: 08932ab0c9ae39a880905666902f8659633ae0232e94ba9f3d2094cb928397e7

[s3]
bucketName: local-dockstore-metrics-data
endpointOverride: http://localhost:4566
endpointOverride: https://s3.localhost.localstack.cloud:4566

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this spurious or does it relate to the ai topics somehow?

Copy link
Member

@denis-yuen denis-yuen Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this was a side-effect of a necessary AWS SDK upgrade

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should've commented in this PR as well, but here's the reason why this changed: dockstore/dockstore#6003 (comment)

*
* @return
*/
public abstract Optional<AIResponseInfo> generateTopic(String prompt);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, we'll probably use AIModel to submit prompts for other [non-topic-related] purposes, and it looks like the code already would support that, so would it make sense to name generateTopic something more suggestive of the generalized purpose, like submitPrompt? (and similarly, the field with a name that contains topic in the response record)

/**
* An AI model that generates topics.
*/
public abstract class AIModel {
Copy link

@svonworl svonworl Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In hierarchies like this, it can be useful to define the core interface as an interface, ala:

public interface AIModel {
    String getModelName();
    double getPricePer1kInputTokens();
    double getPricePer1kOutputTokens();
    double getMaxContentLength();
    [...]
}

Then, you can define the "utility" base abstract class as follows:

public abstract class BaseAIModel implements AIModel {
     // implementations of basic functionality
} 

Model-specific classes can then extend `BaseAIModel:

public class AnthropicClaudeModel extends BaseAIModel {
    // claude-specific implementation
}

What does this buy you? Amongst other things, if you want to implement a new AIModel that works differently than the derivatives of BaseAIModel - say it wraps another AIModel, or combines several AIModels, or whatever - you can simply implement AIModel directly, and not have to deal with all the stuff in BaseAIModel that isn't needed or doesn't relate.

trsId,version,descriptorUrl,descriptorChecksum,isTruncated,promptTokens,completionTokens,finishReason,aiTopic
"#workflow/github.com/dockstore-testing/testWorkflow",master,https://raw.githubusercontent.com/dockstore-testing/testWorkflow/master/Dockstore.cwl,07d68a2bce6118b31018c31a013325cda07030efc92750c036df4ab112849c02,true,16283,50,stop,"The workflow starts by inputting Illumina-sequenced ARTIC data and involves a voyeur."
trsId,version,descriptorUrl,descriptorChecksum,isTruncated,promptTokens,completionTokens,cost,finishReason,aiTopic
"#workflow/github.com/dockstore-testing/testWorkflow",master,https://raw.githubusercontent.com/dockstore-testing/testWorkflow/master/Dockstore.cwl,07d68a2bce6118b31018c31a013325cda07030efc92750c036df4ab112849c02,true,16283,50,0,stop,"The workflow starts by inputting Illumina-sequenced ARTIC data and involves a voyeur."

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wut (lol)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#497
This was the least offensive way I could trigger one of the obscenity filters for training.

Amusingly. IIRC, I ruled out one of the filters because it found "breast cancer" too offensive which seemed like it could trip up some of our workflows.

@denis-yuen
Copy link
Member

This is cool to see. I think it might be worth trying to do a benchmark of the performance of multiple models here and publishing on it.

Could be a fun conference poster

Copy link

sonarcloud bot commented Sep 27, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants