Whisper2Summarize is an application that uses Whisper for audio processing and GPT for summarization. It generates accurate summaries of audio transcripts, making it ideal for a variety of use cases such as note-taking, research, and content creation.
To get started with Google Colab, check out the Whisper2Summarize Notebook, which contains a modified version of the code that works in Google Colab.
Just add your API key, upload your audio file to the session storage, and select the Whisper model to use. (I don't suggest using `medium` or `large`, as they will be incredibly slow.)
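As a rough illustration, the setup in such a notebook boils down to something like the cell below. All names and defaults here are hypothetical, not the notebook's exact code:

```python
# Illustrative Colab setup cell -- the actual notebook may differ.
!pip install -q openai-whisper openai

API_KEY = "sk-..."                  # paste your OpenAI API key here (placeholder)
MODEL = "base"                      # tiny/base/small are the practical choices on Colab
AUDIO_PATH = "/content/audio.mp3"   # upload your file to session storage first
```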
To get started with this program right away, clone this repository and install the requirements:
```bash
git clone https://github.com/AndreDalwin/Whisper2Summarize.git
cd Whisper2Summarize
pip install -r requirements.txt
python w2sgui.py
```
I used Python 3.10.11 to build this application, but OpenAI's Whisper and GPT are expected to be compatible with Python 3.8-3.10. The code depends on a few Python packages, notably OpenAI's Whisper and GPT, their dependencies, a torch version that supports CUDA, and Rust. You can install all the requirements by cloning the repository and running:
```bash
pip install -r requirements.txt
```
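For reference, the packages that file pulls in correspond roughly to the steps below; an illustrative (not verbatim) requirements list might look like:

```text
# Illustrative only -- the requirements.txt in the repo is authoritative.
openai-whisper
openai
torch
python-dotenv
setuptools-rust
```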
If you have an NVIDIA GPU, follow this step; otherwise, skip it. You will want to install a version of torch built with CUDA support:
```bash
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
```
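To verify that the CUDA-enabled build actually sees your GPU, a quick check:

```python
# Quick sanity check for the CUDA-enabled torch install.
import torch

print(torch.cuda.is_available())  # True means Whisper can run on the GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 3060"
```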
Next, install OpenAI's Whisper and the OpenAI Python package:
```bash
pip install -U openai-whisper openai
```
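A quick way to confirm both packages installed correctly:

```bash
python -c "import whisper, openai; print('imports OK')"
```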
Additionally, it requires the command-line tool `ffmpeg` to be installed on your system, which is available from most package managers:
```bash
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
```
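Whichever route you take, you can confirm `ffmpeg` is on your PATH with:

```bash
ffmpeg -version
```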
NOTE: Installing Whisper may also require Rust, in case a pre-built wheel is not available for your platform:
```bash
pip install setuptools-rust
```
Lastly, you need to clone this repository:
```bash
git clone https://github.com/AndreDalwin/Whisper2Summarize.git
cd Whisper2Summarize
```
Ensure you create a `.env` file in the directory containing your OpenAI API key; you will need it to run this program.
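As a sketch, assuming the program reads the key from the conventional `OPENAI_API_KEY` variable (check the repository's code if in doubt), the file would look like:

```bash
# .env -- the variable name OPENAI_API_KEY is an assumption; verify against the code
OPENAI_API_KEY=sk-your-key-here
```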
The following command will transcribe audio files using Whisper's `medium` model:
```bash
python whisper2summarize.py audio.mp3 --model medium
```
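Under the hood, the transcription step corresponds roughly to the standard Whisper API (a sketch, not the script's exact code):

```python
# Rough equivalent of what the --model flag selects; not the script's exact code.
import whisper

model = whisper.load_model("medium")   # or tiny, small, base, large-v2
result = model.transcribe("audio.mp3")
print(result["text"])
```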
The default setting (which selects Whisper's `base` model) works well on CPU for transcribing English. I recommend using other models when trying out multilingual audio snippets.
Here is the full list of available Whisper models: `tiny`, `small`, `base`, `medium`, `large-v2`.
To see the requirements for running these different models, check out OpenAI's Whisper GitHub to learn more.
You may also start the GUI, which allows you to select the audio file, choose the model, and paste in your OpenAI API key:
```bash
python w2sgui.py
```
Running the program will output two files: `Transcript.txt`, the raw transcript of the audio recording, and `Summary.txt`, the summarized short form of the transcript.
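For a sense of what happens end to end, here is a minimal sketch of the transcribe-then-summarize flow. It assumes the `gpt-3.5-turbo` chat model, the `python-dotenv` package, and the post-1.0 `openai` client interface; the project's actual code may differ:

```python
# pipeline_sketch.py -- rough sketch only; the project's actual code may differ.
import os

import whisper
from dotenv import load_dotenv      # assumes the python-dotenv package
from openai import OpenAI           # assumes the openai>=1.0 client interface

load_dotenv()  # pick up OPENAI_API_KEY from the .env file created earlier
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# 1. Transcribe the audio with Whisper.
model = whisper.load_model("base")
transcript = model.transcribe("audio.mp3")["text"]

# 2. Summarize the transcript with GPT.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Summarize the following transcript concisely."},
        {"role": "user", "content": transcript},
    ],
)
summary = response.choices[0].message.content

# 3. Write the two output files described above.
with open("Transcript.txt", "w", encoding="utf-8") as f:
    f.write(transcript)
with open("Summary.txt", "w", encoding="utf-8") as f:
    f.write(summary)
```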
Whisper's model weights are released under the MIT License. See LICENSE for further details.
Feel free to fork this and experiment with it yourself. I actually made this for my girlfriend, since her class recordings are really long.
- Implement a "Translate" feature to translate transcripts to a different language
- Implement an option to change the OpenAI model (`gpt-3.5-turbo`, `text-davinci-003`, `gpt-4`)
- Surface errors in the GUI terminal when something goes wrong