Using LLMs to turn podcast transcripts into website pull requests

fourTheorem/episoder

episoder 🎙️🧠

Create pull requests for podcast websites using provided transcripts and GenAI.

Overview

This is an AWS SAM project that takes podcast transcripts generated by Podwhisperer and performs two steps:

  1. Use an Amazon Bedrock LLM to summarise the episode and create draft YouTube chapters
  2. Create a Pull Request for the podcast's website source code containing the summary and the transcript content

An example of a Pull Request generated by this project can be found here.
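
As an illustration of the chapter output from step 1, here is a minimal sketch of turning chapter start times into the "timestamp title" lines that YouTube reads from a video description. The `Chapter` shape and both function names are hypothetical and are not taken from this project's source.

```typescript
// Hypothetical shape for one draft chapter; not taken from the project source.
interface Chapter {
  startSeconds: number;
  title: string;
}

// Format a number of seconds as MM:SS, or H:MM:SS once past the hour.
function formatTimestamp(totalSeconds: number): string {
  const s = Math.max(0, Math.floor(totalSeconds));
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = s % 60;
  const mm = String(minutes).padStart(2, "0");
  const ss = String(seconds).padStart(2, "0");
  return hours > 0 ? `${hours}:${mm}:${ss}` : `${mm}:${ss}`;
}

// Sort chapters by start time and emit one "timestamp title" line per chapter.
function formatChapters(chapters: Chapter[]): string {
  const sorted = [...chapters].sort((a, b) => a.startSeconds - b.startSeconds);
  return sorted
    .map((c) => `${formatTimestamp(c.startSeconds)} ${c.title}`)
    .join("\n");
}
```

YouTube expects the first chapter to start at 00:00, so a draft produced this way may still need a manual check before it is pasted into a video description.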

GenAI / LLM use

We currently use the Anthropic Claude V2 Large Language Model (LLM) in Amazon Bedrock.
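
For readers unfamiliar with how Claude V2 is called through Bedrock, the sketch below builds a request body in the text-completion format that model expects: a prompt wrapped in `Human:` and `Assistant:` turns, plus `max_tokens_to_sample`. The function name, the `<transcript>` tags and the default values are assumptions for illustration and do not reflect the contents of lib/prompt-template.ts.

```typescript
// Bedrock model identifier for Claude V2.
const CLAUDE_V2_MODEL_ID = "anthropic.claude-v2";

// Request body fields used by the Claude V2 text-completion format.
interface ClaudeV2Request {
  prompt: string;
  max_tokens_to_sample: number;
  temperature: number;
}

// Hypothetical helper: wrap instructions and transcript text in the
// "\n\nHuman: ... \n\nAssistant:" framing that Claude V2 requires.
function buildClaudeV2Request(
  instructions: string,
  transcriptText: string,
  maxTokens = 2000, // assumed default, not the project's setting
): ClaudeV2Request {
  const prompt =
    `\n\nHuman: ${instructions}\n\n` +
    `<transcript>\n${transcriptText}\n</transcript>` +
    `\n\nAssistant:`;
  return {
    prompt,
    max_tokens_to_sample: maxTokens,
    temperature: 0, // assumed: low temperature for more repeatable summaries
  };
}
```

The serialised result (`JSON.stringify(request)`) would then be sent as the `body` of an `InvokeModelCommand` from `@aws-sdk/client-bedrock-runtime`, with `modelId` set to `CLAUDE_V2_MODEL_ID` and a content type of `application/json`.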

System Overview

[Diagram: system overview]

(Source for this diagram is in template.drawio in this directory)

Prerequisites

  1. The project assumes that JSON transcripts are created upstream in an S3 Bucket by Podwhisperer.
  2. You will need the following build tooling installed:
     • Node.js 18.x and NPM 8.x
     • AWS SAM, used to build and deploy most of the application
     • The AWS CLI
     • esbuild
  3. By default, the target AWS account should have the SLIC Watch SAR Application installed. It can be installed by going to this page in the AWS Console. SLIC Watch is used to create alarms and dashboards for our transcription application. If you want to skip this option, just remove the single line referring to the SlicWatch-v2 macro from template.yaml.
  4. You will need to go to the Amazon Bedrock Console and enable the Anthropic Claude V2 model in "Model Settings".
  5. Enable access to the website repository for your podcast with an SSM SecureString parameter in your AWS account:

Parameter | Description | Example Value
/episoder/gitHubUserCredentials | Personal Access Token (PAT) for the GitHub repository | username:github_pat_123AB...xyz
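
To make the `username:token` format of that parameter concrete, here is a minimal sketch of how such a value could be split and combined with an HTTPS repository URL for Git access. Both function names are hypothetical, and this is not necessarily how the project itself authenticates with GitHub.

```typescript
interface GitHubCredentials {
  username: string;
  token: string;
}

// Split a "username:token" string at the first colon.
function parseGitHubCredentials(value: string): GitHubCredentials {
  const idx = value.indexOf(":");
  if (idx <= 0 || idx === value.length - 1) {
    throw new Error("Expected credentials in the form username:token");
  }
  return { username: value.slice(0, idx), token: value.slice(idx + 1) };
}

// Embed the credentials in an HTTPS Git URL. The URL setters
// percent-encode any characters that are not safe in the userinfo part.
function withCredentials(repoUrl: string, creds: GitHubCredentials): string {
  const url = new URL(repoUrl);
  if (url.protocol !== "https:") {
    throw new Error("Only HTTPS repository URLs are supported in this sketch");
  }
  url.username = creds.username;
  url.password = creds.token;
  return url.toString();
}
```

A URL built this way contains the token in clear text, so it should not be written to logs or committed anywhere.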

Prompt Engineering

To test changes to the LLM prompt, you don't have to deploy. You can run summarise.ts with a path to a JSON transcript file. A sample transcript is provided. This script calls Bedrock, so you need AWS credentials configured for an account where the model is enabled.

./bin/summarise.ts ./sample-transcripts/aws-bites-101.json 

To tweak the prompt, edit lib/prompt-template.ts.

Deployment

Using AWS SAM:

sam build --parallel
sam deploy --guided

You will be prompted for:

  • The S3 Bucket where transcripts are expected to arrive
  • The region to use for Bedrock, since Bedrock is currently only available in a limited number of regions
  • The email address and name to use for Git commits
  • The HTTPS URL of your website GitHub repository, e.g., https://github.com/awsbites/aws-bites-site.git

Once deployment has completed, you can check the Step Function that orchestrates the whole process in the AWS Console. This state machine is automatically executed when transcripts are placed in the processed-transcripts/ prefix.

Price Monitoring

Bedrock pricing can be difficult to estimate. This repo comes with a CloudWatch pricing dashboard that shows the cost for a given period and the relationship between invocations, input tokens and output tokens. Costs are calculated from the published on-demand pricing for the Claude V2 model as of 28 October 2023, so they will drift if the pricing changes. A CloudWatch alarm is also created for the total cost per hour. By default, it breaches when the cost exceeds $1 per hour for three consecutive hours.
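
The arithmetic behind the dashboard and alarm can be sketched as follows. This is an illustration only: the type and function names are hypothetical, the real calculation lives in CloudWatch metric math rather than TypeScript, and the per-1,000-token prices are passed in as parameters because the actual values should be taken from the Bedrock pricing page.

```typescript
// Prices in USD per 1,000 tokens; supply the values from the Bedrock pricing page.
interface TokenPricing {
  inputPer1k: number;
  outputPer1k: number;
}

// Token counts observed in one hour.
interface HourlyUsage {
  inputTokens: number;
  outputTokens: number;
}

// On-demand cost for one hour of usage.
function hourlyCost(usage: HourlyUsage, pricing: TokenPricing): number {
  return (
    (usage.inputTokens / 1000) * pricing.inputPer1k +
    (usage.outputTokens / 1000) * pricing.outputPer1k
  );
}

// True if the cost exceeds the threshold for `periods` consecutive hours,
// mirroring the default alarm ($1 per hour, three hours in a row).
function alarmBreached(hourlyCosts: number[], threshold = 1, periods = 3): boolean {
  let run = 0;
  for (const cost of hourlyCosts) {
    run = cost > threshold ? run + 1 : 0;
    if (run >= periods) return true;
  }
  return false;
}
```

For example, with placeholder prices of $0.01 per 1,000 input tokens and $0.03 per 1,000 output tokens (not the real Claude V2 rates), an hour with 100,000 input tokens and 10,000 output tokens comes to about $1.30, and three such hours in a row would trip the alarm.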

[Image: pricing dashboard]

The pricing dashboard can be deployed with CDK:

cd price-monitor
npm install
npx cdk deploy -c bedrockRegion=us-east-1