Use transcription service for transcripting/translation of audio/video files #239
d33be98 to a9cb6d1
 * status: z.literal('SUCCESS'),
 * languageCode: z.string(),
 * outputBucketKeys: OutputBucketKeys,
 */
this could probably be removed - maybe replace with a comment linking to the relevant types.ts file in the transcription-service repo
if (completed > 0) {
  go()
} else {
  println(s"try again ExternalWorkerScheduler in ${interval}")
either remove or change to log
def go(): Unit = {
  try {
    println("running ExternalWorkerScheduler")
rogue println
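For reference, a minimal sketch of what swapping the two println calls for log statements could look like. It uses java.util.logging only so the snippet is self-contained; the object name, the helper and the log levels are assumptions for illustration, and the real code would presumably use whatever logger the rest of the backend already uses.

```scala
import java.util.logging.{Level, Logger}

// Hypothetical stand-in for the scheduler under review; names are illustrative only.
object SchedulerLoggingSketch {
  private val logger = Logger.getLogger("ExternalWorkerScheduler")

  // Returns true when the caller should run again straight away,
  // false when it should wait for the interval before retrying.
  def shouldRunAgain(completed: Int, intervalSeconds: Long): Boolean = {
    if (completed > 0) {
      // Work was done, so there may be more waiting: log quietly and go again.
      logger.log(Level.FINE, s"completed $completed item(s), running ExternalWorkerScheduler again")
      true
    } else {
      // Nothing to do: record the back-off instead of printing to stdout.
      logger.log(Level.INFO, s"no work completed, trying ExternalWorkerScheduler again in ${intervalSeconds}s")
      false
    }
  }
}
```

Logging at a low level rather than deleting the lines keeps the retry behaviour visible when debugging without adding noise to stdout.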
    "uri", uri,
    "extractorName", extractorName,
  )
)
congratulations on writing all this Cypher!
backend/app/AppComponents.scala
Outdated
@@ -83,6 +85,7 @@ class AppComponents(context: Context, config: Config)
// data storage services
val ingestStorage = S3IngestStorage(s3Client, config.s3.buckets.ingestion, config.s3.buckets.deadLetter).valueOr(failure => throw new Exception(failure.msg))
val blobStorage = S3ObjectStorage(s3Client, config.s3.buckets.collections).valueOr(failure => throw new Exception(failure.msg))
val transcriptionStorage = S3ObjectStorage(s3Client, config.s3.buckets.transcription).valueOr(failure => throw new Exception(failure.msg))
could this be transcriptStorage or transcriptionOutputStorage - just to make clear that it should only have transcripts in it, not source media
This is looking great - just a few minor comments above
logger.error(s"failed to process sqs message", failure.toThrowable)
if (messageAttributes.receiveCount > 2) {
  markAsFailure(new Uri(messageAttributes.messageGroupId), "ExternalTranscriptionExtractor", failure.msg)
}
I think this handler needs updating to deal with the case where the message was processed successfully but there was a failure message
Good spot 👏 I updated the failure cases to handle the error differently when the message is a failure one. This happens in the failure message scenario:
- move message to dead letter queue
- delete message from output queue
- mark blob/extractor relationship as failure
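Roughly, the two failure paths discussed in this thread could be modelled as below. This is only a sketch under stated assumptions: every type and method name is hypothetical and just mirrors the steps listed above and the `receiveCount > 2` check from the diff; it is not the code in this PR.

```scala
// All names here are hypothetical; they only model the steps described in this thread.
final case class FailureMessage(blobUri: String, reason: String)

sealed trait Action
final case class MoveToDeadLetterQueue(rawBody: String) extends Action
case object DeleteFromOutputQueue extends Action
final case class MarkAsFailure(blobUri: String, extractorName: String, reason: String) extends Action

object FailureHandlingSketch {
  val extractorName = "ExternalTranscriptionExtractor"

  // The message parsed fine, but it reports that transcription failed:
  // dead-letter it, remove it from the output queue, and mark the relationship as failed.
  def onFailureMessage(message: FailureMessage, rawBody: String): List[Action] =
    List(
      MoveToDeadLetterQueue(rawBody),
      DeleteFromOutputQueue,
      MarkAsFailure(message.blobUri, extractorName, message.reason)
    )

  // Processing the message itself went wrong: leave it on the queue to be retried,
  // and only mark the relationship as failed once it has been received more than twice.
  def onProcessingError(blobUri: String, receiveCount: Int, reason: String): List[Action] =
    if (receiveCount > 2) List(MarkAsFailure(blobUri, extractorName, reason))
    else Nil
}
```

Returning the actions as data rather than performing them keeps the branching easy to unit test; the real handler would perform the SQS and database calls directly.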
Fantastic work! Very excited to see this go into giant and try it out!
Paired on this with @philmcmahon
What does this change?
This PR integrates transcription service into Giant.
PROCESSING_EXTERNALLY: for when the message is sent to the external transcription service, until the transcript output is ready and the output message is delivered to the output queue.
The following SSM parameters were created for playground but should also be created for pfi-giant (prod):
TODO in upcoming PR
How to test
Tested locally and in code
The relevant PRs for this change, in the order they need to be released, are as follows:
1- https://github.com/guardian/investigations-platform/pull/521
2- guardian/transcription-service#103
3- Current PR