Batch transcription and evaluation of audio files with Whisper.
Developed by UCSD IT Services to explore more accurate captioning for UCSD course podcasts and its potential uses. Runs on the ITS Data Science and Machine Learning Platform.
- Evaluate transcription accuracy via word error rate (WER).
- Leverage transcriptions to generate meaningful learning aids for students.
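
For readers unfamiliar with WER, the sketch below shows one common way to compute it: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. This is an illustration only, not necessarily how this project computes WER; the function name `word_error_rate` and the lowercase-and-split normalization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration. Normalization here is only
    lowercasing and whitespace splitting (an assumption); real pipelines
    often also strip punctuation.
    """
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref_words)


# One dropped word out of six reference words.
score = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"WER: {score:.3f}")
```

A WER of 0 means the hypothesis matches the reference word for word; values above 1 are possible when the hypothesis contains many inserted words.
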
Note: If you are not affiliated with UCSD, you will need to modify the query function in utilities.py to use a different LLM, such as ChatGPT. Feel free to reach out if you run into issues or have additional questions.