Most of this codebase was ChatGPT generated! Documentation, code snippets, git commits, etc. It was heavily directed. Finished in about 3 hours total, by a sleepy dev.
The VTT Transcript Summarizer is an application that takes in a .vtt
file, particularly from Microsoft Teams, cleans and processes it, and then provides a concise recap using the OpenAI's GPT-3.5-turbo model. The application extracts the essence of a transcript, highlighting key points, action items, and unanswered questions. This is particularly useful for those looking to quickly understand the outcomes of meetings without having to go through the entire transcript.
- Reads and cleans
.vtt
files to remove timestamps and unnecessary metadata. - Summarizes transcripts into bullet points that capture key points, action items, or unanswered questions.
- Provides a final, organized summarization to present the findings.
-
Clone the repository:
git clone https://github.com/iUngerTime/VTT-Captions-to-Summary.git cd <repository_directory>
-
Install the necessary libraries:
pip install openai tiktoken python-dotenv
-
Setup OpenAI credentials:
- Copy example
.env.example
file to.env
in root of directory - Replace
<your_openai_api_key>
with your OpenAI secret generated in your profile settings on https://platform.openai.com/
- Copy example
-
Run the application:
python3 <script_name>.py
-
Ensure your
.vtt
file is ready. Update thefile_path
variable in the script to reflect your location. -
Run the application as instructed above. The script will process the
.vtt
file and provide a summarized recap in the console.
- Token usage and limits have been taken into consideration. Large transcripts are batch processed to fit within OpenAI's token limit.
- Always remember that the accuracy and quality of the summarization can vary. Always review and adjust the results/prompts as needed. I hold zero responsibility for the output of the program!
- Microsoft does not attach the people's names to the individual commenting in the transcript. If that functionality exists, or your VTT file has those, consider updating the cleaning function to properly check it.
- The default microsoftTeamsGeneratedTranscript.vtt has been added to .gitignore so you don't accidentally publish private corporate information into a public repository. :)
- OpenAI: For using the GPT-3.5-turbo model to generate summaries.
- Tiktoken: For counting how many tokens are in a text string without making an API call.
- python-dotenv: For loading environment variables from an
.env
file.
- For feedback, issues, or contributions, please open an issue or pull request on the repository.
- This project is open-source and available to everyone. Please ensure to credit the original author when sharing or using in your projects.