Hark is your new favorite gadget for turning live audio into text, all while mingling with OpenAI’s GPT-4 for some extra brainpower! Whether you're capturing epic meetings or casual chats, Hark’s got you covered with its slick features and nerdy charm.
- Real-Time Speech-to-Text-to-LLM: Watch in awe as live audio transforms into text instantaneously thanks to cutting-edge speech recognition.
- Multi-Language Support: Speak in your language of choice! Hark supports a ton of languages, selectable from a dropdown.
- Interactive GPT-4 Integration: Chat with OpenAI’s GPT-4 for smart answers and insights that go beyond mere transcription.
- Meeting Summarization: Get concise summaries of your meetings that highlight all the important bits without the fluff.
- User-Friendly Interface: Big, friendly buttons for starting, stopping, and clearing recordings—perfect for all levels of tech wizardry.
Gear up and get ready to roll! Make sure you have Node.js and Yarn installed on your machine.
- Clone the repo:
git clone https://github.com/us/hark.git
cd hark
- Install dependencies:
npm install yarn
yarn
To capture system audio on macOS, grab BlackHole—a nifty virtual audio driver.
- Install BlackHole: Download and install BlackHole.
- Create a Multi-Output Device:
  - Open Audio MIDI Setup.
  - Hit the + button and choose "Create Multi-Output Device."
  - Add your speakers and BlackHole to this device.
  - Set it as your system audio output.
- Set BlackHole as Input:
  - In Hark, select BlackHole from the audio input device dropdown.
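Curious what that last step looks like from the browser's side? Here is a minimal sketch of picking BlackHole as the input using the standard `navigator.mediaDevices` API. The helper names `findAudioInput` and `openBlackHoleStream` are made up for illustration, and this is not necessarily how Hark does it internally.

```javascript
// Hypothetical helper: return the first audio input whose label contains
// the given fragment (case-insensitive), or null if there is none.
// `devices` is the array resolved by navigator.mediaDevices.enumerateDevices().
function findAudioInput(devices, labelFragment) {
  const wanted = labelFragment.toLowerCase();
  return (
    devices.find(
      (d) => d.kind === "audioinput" && d.label.toLowerCase().includes(wanted)
    ) || null
  );
}

// Browser-only usage. Device labels stay empty until the user has granted
// microphone permission, so we request a throwaway stream first.
async function openBlackHoleStream() {
  const probe = await navigator.mediaDevices.getUserMedia({ audio: true });
  probe.getTracks().forEach((track) => track.stop());

  const devices = await navigator.mediaDevices.enumerateDevices();
  const blackHole = findAudioInput(devices, "BlackHole");
  if (!blackHole) {
    throw new Error("BlackHole input not found; check Audio MIDI Setup");
  }

  // Open a stream that is pinned to the BlackHole device.
  return navigator.mediaDevices.getUserMedia({
    audio: { deviceId: { exact: blackHole.deviceId } },
  });
}
```

Matching on the label keeps the sketch short, but labels can vary between BlackHole versions (for example by channel count), so a real app would usually let the user confirm the choice in a dropdown.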
To achieve a similar setup on Windows, use Voicemeeter.
- Install Voicemeeter: Download and install Voicemeeter.
- Configure Voicemeeter:
  - Open Voicemeeter.
  - Set Hardware Input 1 as your default microphone and send it only to B.
  - Also, send the virtual input to both A and B (with A for hearing through your default speakers and B for virtual output).
  - Set Hardware Out A1 as your default output, typically your system speakers.
  - Double-check the Windows sound settings in the system tray to ensure Voicemeeter hasn’t changed your default speaker output. (Keep your usual speakers as the default output device, not Voicemeeter!)
- Configure Audio in Google Meet and Hark:
  - In Google Meet, set the input as your default mic and output as Voicemeeter Input.
  - In Hark, choose B1 as the input device.
Fire up your local server with:
yarn dev
Then check out your app at http://localhost:3000.
- Select Audio Device: Choose BlackHole (macOS) or the Voicemeeter B1 output (Windows) to capture system sound.
- Start Recording: Hit "Start Recording" to capture and transcribe audio in real-time.
- Language Selection: Pick your preferred language from the dropdown.
- Ask GPT-4: Use "Answer the Latest Question" to get smart responses from GPT-4.
- Summarize Meeting: Click "What’s This Meeting About?" for a quick summary of your discussion.
- Stop Recording: End the session with "Stop Recording."
- Clear Results: Hit "Clear" to reset and prep for the next session.
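For the curious: the two GPT-4 buttons above plausibly boil down to sending the transcript to OpenAI's Chat Completions endpoint with a different instruction each time. The sketch below shows one way to build that request body. The prompt wording, the `PROMPTS` table, and `buildChatRequest` are assumptions for illustration, not Hark's actual code.

```javascript
// Hypothetical instructions, one per button. Hark's real prompts may differ.
const PROMPTS = {
  answer: "Answer the most recent question asked in this transcript.",
  summarize: "Summarize what this meeting is about in a few sentences.",
};

// Build a Chat Completions request body from the transcript so far.
// `action` is "answer" or "summarize"; `model` defaults to "gpt-4".
function buildChatRequest(transcript, action, model = "gpt-4") {
  const instruction = PROMPTS[action];
  if (!instruction) {
    throw new Error(`Unknown action: ${action}`);
  }
  const text = transcript.trim();
  if (!text) {
    throw new Error("Transcript is empty; record something first");
  }
  return {
    model,
    messages: [
      { role: "system", content: instruction },
      { role: "user", content: text },
    ],
  };
}
```

The returned object is what you would `JSON.stringify` and POST to `https://api.openai.com/v1/chat/completions`. Keeping the instruction in the system message and the transcript in the user message means both buttons can share the same code path.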
- Whisper Integration: Planning to add Whisper API for even more accurate transcriptions. Note: It's heavy and slow, so our current system is still quicker.
- More Languages: Expanding language options to cover even more tongues.
- React UI Overhaul: A fresh, React-based UI to make the interface even more user-friendly.
- Local Speech-to-Text Models: Offline capabilities so you’re never left hanging.
- Expanded Model Support: Additional AI models for broader interaction possibilities.
Before you dive into using Hark, make sure you've completed these steps for a seamless experience:
- Audio Routing: Ensure that audio routing is correctly set up with BlackHole (or a similar virtual audio driver). BlackHole captures system audio, allowing Hark to process sound from other applications.
- Input Device Configuration: Verify that BlackHole is selected as the input device within Hark. This ensures the app captures all system sounds accurately.
- API Key Setup: Enter your OpenAI API key in app.js to enable GPT-4 interactions.
- Model Selection: Choose the appropriate GPT-4 model for your needs.
- Application Testing: Start listening with Hark, and test by asking questions to ensure everything works as expected.
By following these steps, you ensure that Hark is fully functional and ready to provide a smooth, real-time transcription and interaction experience.
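If you want a quick way to confirm the API key and model from the checklist are wired up, something like the sketch below can help. The `config` object, the placeholder key string, and the function names are assumptions, so adapt them to whatever app.js actually uses. The endpoint and the `Authorization: Bearer` header are the standard ones for OpenAI's Chat Completions API.

```javascript
const OPENAI_URL = "https://api.openai.com/v1/chat/completions";
const PLACEHOLDER_KEY = "YOUR_OPENAI_API_KEY";

// Hypothetical config shape; the real names in app.js may differ.
const config = { apiKey: PLACEHOLDER_KEY, model: "gpt-4" };

// Build the options object for fetch(), refusing to run with a missing
// or placeholder key so the failure is obvious before any network call.
function buildFetchOptions({ apiKey, model }, messages) {
  if (!apiKey || apiKey === PLACEHOLDER_KEY) {
    throw new Error("Set your OpenAI API key first");
  }
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages }),
  };
}

// One-off smoke test: sends a tiny prompt and returns the model's reply.
// Needs a runtime with a global fetch (browsers, Node.js 18 or newer).
async function smokeTest(cfg) {
  const messages = [{ role: "user", content: "Reply with the word ok." }];
  const res = await fetch(OPENAI_URL, buildFetchOptions(cfg, messages));
  if (!res.ok) {
    throw new Error(`OpenAI request failed with HTTP ${res.status}`);
  }
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Calling `smokeTest({ ...config, apiKey: "<your key>" })` should resolve with a short reply if the key and model are valid. An HTTP 401 usually points at the key, and a 404 at a model name your account cannot access.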
Got ideas or want to help out? We’re all ears! Submit a pull request or open an issue to join the fun.
“Sharing knowledge is the most fundamental act of friendship. Because it is a way you can give something without losing something.”
— Richard Stallman
This project is licensed under the GLWTPL (GOOD LUCK WITH THAT PUBLIC LICENSE).
It is built for educational purposes only. If you choose to use it otherwise, the developers will not be held responsible. Please, do not use it with evil intent.
Questions, suggestions, or just want to chat? Hit us up at rahmetsaritekin@gmail.com.