Voice entry demo #31010
Offline wake-word detection and speech-to-text/text-to-speech are pretty nice right now thanks to Home Assistant's Year of the Voice. I found one model that will run via ONNX, which should be accelerated on the comma 3/3X. I'd be interested to see how reliable it is over LTE!
It looks like you didn't use one of the Pull Request templates. Please check the contributing docs. Also make sure that you didn't modify any of the checkboxes or headings within the template.
… speechToText service
@jnewb1 can you try this out in a car?
system/micd.py
self.pm.send('microphone', msg)
self.rk.keep_time()
msg = messaging.new_message('microphoneRaw', valid=True)
whitespace
system/assistant/rev_speechd.py
print(f'Timeout reached. {loop_count=}, {time.time()-start_time=}')
break
elif self.stop_thread.is_set():
print(f'stop_thread.is_set()=')
fixme
clean this whole file
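The line flagged "fixme" prints the literal text `stop_thread.is_set()=`, because the f-string contains no `{...}` replacement field; the self-documenting form is `f'{self.stop_thread.is_set()=}'`. As a minimal sketch of how the loop could be cleaned up, the helper below keeps the same two exits (timeout and stop event) and reports why it ended. `wait_for_result` and the `result_ready` callback are hypothetical names, not code from this PR.

```python
import threading
import time
from typing import Callable


def wait_for_result(result_ready: Callable[[], bool], stop_thread: threading.Event,
                    timeout_s: float, poll_s: float = 0.01) -> str:
  """Poll result_ready() until it succeeds, the timeout expires, or stop_thread is set.

  Hypothetical helper: returns "done", "timeout" or "stopped" so the caller
  can tell why the loop ended instead of relying on print output.
  """
  start_time = time.monotonic()
  loop_count = 0
  while True:
    if result_ready():
      return "done"
    elapsed = time.monotonic() - start_time
    if elapsed > timeout_s:
      print(f"Timeout reached. {loop_count=}, {elapsed=:.2f}")
      return "timeout"
    if stop_thread.is_set():
      # The '=' specifier prints both the expression and its value.
      print(f"{stop_thread.is_set()=}")
      return "stopped"
    loop_count += 1
    stop_thread.wait(poll_s)  # sleep, but wake early if a stop is requested
```

Using `time.monotonic()` instead of `time.time()` keeps the timeout unaffected by system clock adjustments.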
selfdrive/ui/qt/window.cc
@@ -33,15 +35,18 @@ MainWindow::MainWindow(QWidget *parent) : QWidget(parent) {
main_layout->setCurrentWidget(onboardingWindow);
}

remove
selfdrive/ui/qt/window.h
class MainWindow : public QWidget {
  Q_OBJECT

public:
  explicit MainWindow(QWidget *parent = 0);

whitespace
You can also run rev_speechd.py, which will wait for the "WakeWordDetected" param to be set.
To set up the Rev.ai API, you need to install rev_ai:

pip install rev_ai
See my PR for rev_ai: revdotcom/revai-python-sdk#111
"download_url": "https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/weather_v0.1.tflite"
}
}
remove all these paths
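The README excerpt says rev_speechd.py waits for the "WakeWordDetected" param to be set; that wait can be sketched as a polling loop. This assumes openpilot's `Params` exposes `get_bool`/`put_bool`. `InMemoryParams` is a hypothetical stand-in so the sketch runs off-device, and clearing the param after reading it is a design choice of this sketch, not necessarily what the PR does.

```python
import threading
from typing import Protocol

WAKE_PARAM = "WakeWordDetected"


class ParamsLike(Protocol):
  # Assumed to mirror the get_bool/put_bool methods of openpilot's Params.
  def get_bool(self, key: str) -> bool: ...
  def put_bool(self, key: str, val: bool) -> None: ...


class InMemoryParams:
  """Hypothetical dict-backed stand-in for Params, for testing off-device."""
  def __init__(self) -> None:
    self._store = {}

  def get_bool(self, key: str) -> bool:
    return self._store.get(key, False)

  def put_bool(self, key: str, val: bool) -> None:
    self._store[key] = val


def wait_for_wake_word(params: ParamsLike, stop: threading.Event, poll_s: float = 0.05) -> bool:
  """Block until the wake-word param is set (True) or stop is set (False)."""
  while not stop.is_set():
    if params.get_bool(WAKE_PARAM):
      params.put_bool(WAKE_PARAM, False)  # consume it so the next wake word is detected
      return True
    stop.wait(poll_s)  # sleep, but return immediately if stop is set
  return False
```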
REFERENCE_SPL = 2e-5 # newtons/m^2
SAMPLE_RATE = 44100
SAMPLE_BUFFER = 4096 # (approx 100ms)
SAMPLE_RATE = 16000
This reduces the frequency response (at 16 kHz, nothing above the 8 kHz Nyquist limit is captured), so soundPressure is lower now. I tested that soundd still adjusts the volume for normal road noise and music.
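The effect of the SAMPLE_RATE change can be checked with a little arithmetic. The sketch assumes SAMPLE_BUFFER stays at 4096 samples (the excerpt does not show whether it changed). Under that assumption one buffer spans roughly 93 ms at 44.1 kHz but 256 ms at 16 kHz, so the "(approx 100ms)" comment only holds for the old rate, and the highest representable frequency (the Nyquist limit, half the sample rate) drops from 22.05 kHz to 8 kHz. The helper names are illustrative.

```python
SAMPLE_BUFFER = 4096  # samples per buffer, as in the excerpt above


def buffer_duration_ms(n_samples: int, sample_rate: int) -> float:
  """Length of one buffer in milliseconds."""
  return 1000.0 * n_samples / sample_rate


def nyquist_hz(sample_rate: int) -> float:
  """Highest frequency a signal sampled at sample_rate can represent."""
  return sample_rate / 2


for rate in (44100, 16000):
  print(f"{rate} Hz: {buffer_duration_ms(SAMPLE_BUFFER, rate):.1f} ms per buffer, "
        f"Nyquist limit {nyquist_hz(rate):.0f} Hz")
```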
msg.microphoneRaw.frameIndex = self.frame_index
if not (self.frame_index_last == self.frame_index or
        self.frame_index - self.frame_index_last == SAMPLE_BUFFER):
  cloudlog.info(f'skipped {(self.frame_index - self.frame_index_last)//SAMPLE_BUFFER-1} samples')
figure out how to stop it from skipping sometimes
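The skip check above can be pulled into a small pure function so it is easy to test. Judging from the condition, frameIndex advances by SAMPLE_BUFFER per callback (it counts samples), so the logged expression actually counts whole skipped buffers rather than samples. `skipped_buffers` is a hypothetical helper that keeps the original rule (equal indices, or a gap of exactly one buffer, means nothing was skipped) and additionally clamps short or negative gaps to zero, which the original expression does not.

```python
def skipped_buffers(frame_index_last: int, frame_index: int, buffer_size: int) -> int:
  """Number of whole buffers dropped between two consecutive frame indices.

  Mirrors the check in micd.py: equal indices or a gap of exactly one buffer
  means nothing was skipped. Gaps shorter than one buffer (or negative gaps,
  e.g. after an index reset) are clamped to 0.
  """
  gap = frame_index - frame_index_last
  if gap in (0, buffer_size):
    return 0
  return max(0, gap // buffer_size - 1)


# Usage: log skips in buffers and in samples.
SAMPLE_BUFFER = 4096
last = 0
for idx in (4096, 8192, 20480):
  n = skipped_buffers(last, idx, SAMPLE_BUFFER)
  if n:
    print(f"skipped {n} buffers ({n * SAMPLE_BUFFER} samples)")
  last = idx
```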
Tried it in a car, and it looks pretty good! openpilot seems to run fine with it running; I will check some CPU stats soon. A couple of things:
https://www.youtube.com/watch?v=TGMpwqr3r08
Either way, this certainly fulfills the bounty qualifications!
Thank you for testing it. I am glad you liked it!
That requires training a new model, which I am not familiar with, but the process is well documented here.
Yes, there are ways to decrease latency, but there are some tradeoffs. You can read the docs here, but these are the important parts: https://docs.rev.ai/api/streaming/requests/#raw-file-content-type
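For raw (headerless) audio, the linked Rev.ai page describes a content type that spells out layout, rate, sample format and channel count. The sketch below assumes that string has the form `audio/x-raw;layout=interleaved;rate=...;format=S16LE;channels=...` (verify it against the page) and that audio is captured as 16-bit mono at 16 kHz. It also converts the 80 ms and 250 ms chunk sizes mentioned in this thread into bytes. Both helper names are hypothetical.

```python
def raw_content_type(rate: int, channels: int = 1, sample_format: str = "S16LE") -> str:
  """Content-type string for raw interleaved PCM (format assumed from the Rev.ai docs)."""
  return f"audio/x-raw;layout=interleaved;rate={rate};format={sample_format};channels={channels}"


def chunk_bytes(chunk_ms: int, rate: int, channels: int = 1, bytes_per_sample: int = 2) -> int:
  """Bytes in one chunk of chunk_ms milliseconds of PCM audio (S16LE = 2 bytes/sample)."""
  samples = rate * chunk_ms // 1000
  return samples * channels * bytes_per_sample


print(raw_content_type(16000))
for ms in (80, 250):
  print(f"{ms} ms at 16 kHz -> {chunk_bytes(ms, 16000)} bytes")
```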
I originally used Google's API, and it was faster! You can see my Google API implementation here: f48cfbe. There may be optimizations available if you select a different model from Rev.ai, but I am not sure. You could change the API provider fairly easily, as long as it supports streaming. I moved away from the Google API because the authentication process was cumbersome, the Python library they provide was not documented well enough for me, and I wasn't able to end transcription sessions cleanly. There are many other API providers you could try, but Rev.ai was easy to get started with.
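Swapping providers stays easy if the daemon depends only on a small streaming interface. The sketch below is hypothetical: `StreamingSTT`, `FakeSTT` and `transcribe` are illustrative names, and a real adapter would wrap a provider SDK (Rev.ai, Google, or another) behind `stream()`. Tying the session's lifetime to the audio iterator, with cleanup in a `finally` block, is one way to make sure every transcription session ends cleanly.

```python
from typing import Iterable, Iterator, Protocol


class StreamingSTT(Protocol):
  """Hypothetical provider interface: consume audio chunks, yield transcript updates."""
  def stream(self, chunks: Iterable[bytes]) -> Iterator[str]: ...


class FakeSTT:
  """Hypothetical stand-in provider for tests: treats each chunk as one UTF-8 word."""
  def __init__(self) -> None:
    self.closed = False

  def stream(self, chunks: Iterable[bytes]) -> Iterator[str]:
    words = []
    try:
      for chunk in chunks:
        words.append(chunk.decode("utf-8"))
        yield " ".join(words)  # transcript so far, after each chunk
    finally:
      # A real adapter would close its connection/session here; this also
      # runs if the generator is closed before the audio is exhausted.
      self.closed = True


def transcribe(provider: StreamingSTT, chunks: Iterable[bytes]) -> str:
  """Drive a provider to completion and return its last transcript ('' if none)."""
  last = ""
  for text in provider.stream(chunks):
    last = text
  return last
```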
@jnewb1 Can you send Interac e-Transfers?
What?
Like for the bounty payout. Do you guys do PayPal or e-transfer or something else?
https://github.com/commaai/openpilot/blob/master/docs/BOUNTIES.md#rules
OK, if that's the case, I'd like to do a few more and save up for a comma body!
Closing since I don't want to merge this in this state, but this was still very valuable and we're good to pay out whenever you want.
Thank you, sounds good!
I emailed Rev.ai about the latency. They said they will fix the issue. Here is the email exchange:

My message: I am developing speech-to-text for openpilot, a self-driving add-on system. I have chosen to use Rev.ai because of its ease of authentication and simple API. One issue I am facing is the delay between the interim and final results. For example, if the user says "Navigate to home", the interim result returns "navigate to" immediately, and it takes about 8 seconds to return the full final result "Navigate to home". Your streaming demo on your website is very fast, and I would like the same performance. I am using the Python SDK. I am recording 80 ms chunks of audio. I tried sending 250 ms chunks, but it doesn't make a difference.

Reply from support: The team does have an idea of how they can improve your experience. The issue is that it is going to be weeks before they can implement it: at minimum 2 weeks, possibly longer if something more urgent comes up. As I understand it, they need to code the ability to disable overlaps and apply it to your account specifically. Given your responses will all likely be short, the overlap is what is holding things up. This is not something we have built yet, but they think it can be done. I am keeping your ticket open so I can circle back to you once I have confirmation this has happened.
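While the final result is slow, the interim text can still be shown to the user right away. The sketch below assumes Rev.ai-style streaming messages: JSON with a `type` of `"partial"` or `"final"` and an `elements` list of `{"type": "text" | "punct", "value": ...}` entries. Treat that message shape as an assumption to check against the streaming docs. `Transcript` and `elements_to_text` are hypothetical helpers; only final results are committed, so a command like "Navigate to home" would be acted on once it is final while the interim text is only displayed.

```python
import json
from typing import Dict, List


def elements_to_text(elements: List[Dict[str, str]]) -> str:
  """Join text elements with spaces; attach non-space punctuation directly."""
  out = ""
  for el in elements:
    value = el.get("value", "")
    if el.get("type") == "punct":
      if value.strip():
        out += value  # e.g. "." or ","; whitespace-only punct is skipped
    elif value:
      out += (" " if out else "") + value
  return out


class Transcript:
  """Hypothetical accumulator: partials replace each other, finals are committed."""
  def __init__(self) -> None:
    self.finals: List[str] = []
    self.partial = ""

  def update(self, message: str) -> str:
    data = json.loads(message)
    text = elements_to_text(data.get("elements", []))
    if data.get("type") == "final":
      self.finals.append(text)
      self.partial = ""
    else:
      self.partial = text
    return self.current()

  def current(self) -> str:
    """Committed text followed by the latest interim text, for display."""
    return " ".join(t for t in self.finals + [self.partial] if t)
```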
Basic demo starting point. Doesn't do anything useful yet.
Demo running on a comma 3 device:
20240117_183950.mp4