
Voice entry demo #31010

Closed · jakethesnake420 wants to merge 64 commits

Conversation

jakethesnake420 (Contributor) commented Jan 15, 2024

Basic demo starting point. Doesn't do anything useful yet

Demo running on Comma 3 device:

20240117_183950.mp4

jnewb1 linked an issue Jan 15, 2024 that may be closed by this pull request
cone-guy (Contributor)

Offline wake words and speech to text/text to speech are pretty nice right now due to Home Assistant's year of voice:
https://www.home-assistant.io/blog/2022/12/20/year-of-voice/

I found one model that'll run via ONNX, which should be accelerated by the comma 3/x

Would be interested to see how reliable it is over LTE!
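
For reference, here is a minimal sketch (not from this PR) of scoring 16 kHz microphone frames with openWakeWord through its ONNX backend; the model name, threshold, and frame source are placeholder assumptions.

import numpy as np
from openwakeword.model import Model

# "hey_jarvis" is a stock openWakeWord model standing in for a custom wake word;
# the weights must already be available locally.
oww = Model(wakeword_models=["hey_jarvis"], inference_framework="onnx")

def wake_word_detected(frame: np.ndarray, threshold: float = 0.5) -> bool:
    # frame: int16 PCM at 16 kHz, e.g. 1280 samples (~80 ms)
    scores = oww.predict(frame)
    return any(score > threshold for score in scores.values())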

jakethesnake420 marked this pull request as ready for review January 16, 2024 22:31
Contributor

It looks like you didn't use one of the Pull Request templates. Please check the contributing docs. Also make sure that you didn't modify any of the checkboxes or headings within the template.

adeebshihadeh (Contributor)

@jnewb1 can you try this out in a car?

system/micd.py Outdated
self.pm.send('microphone', msg)
self.rk.keep_time()
msg = messaging.new_message('microphoneRaw', valid=True)

Contributor Author

whitespace

print(f'Timeout reached. {loop_count=}, {time.time()-start_time=}')
break
elif self.stop_thread.is_set():
print(f'{self.stop_thread.is_set()=}')
Contributor Author

fixme

Contributor Author

clean this whole file

selfdrive/ui/qt/widgets/assistant.cc (resolved)
@@ -33,15 +35,18 @@ MainWindow::MainWindow(QWidget *parent) : QWidget(parent) {
main_layout->setCurrentWidget(onboardingWindow);
}


Contributor Author

remove


class MainWindow : public QWidget {
Q_OBJECT

public:
explicit MainWindow(QWidget *parent = 0);

Contributor Author

white space

You can also run rev_speechd.py which will wait for the "WakeWordDetected" param to be set.
To set up the Rev.ai API you need to install rev_ai:

pip install rev_ai
Contributor Author

See my PR for rev_ai revdotcom/revai-python-sdk#111
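
For context, streaming with the rev_ai SDK looks roughly like the sketch below; the access token, media config values, and the audio generator are placeholders rather than code from this PR.

from rev_ai.models import MediaConfig
from rev_ai.streamingclient import RevAiStreamingClient

def audio_chunks():
    # Placeholder generator: yields ~80 ms of silence per chunk; in the demo this
    # would be raw PCM pulled from the microphone queue instead.
    for _ in range(100):
        yield b"\x00" * 2560  # 1280 samples * 2 bytes (16 kHz, S16LE, mono)

config = MediaConfig("audio/x-raw", "interleaved", 16000, "S16LE", 1)
client = RevAiStreamingClient("YOUR_REVAI_ACCESS_TOKEN", config)

response_gen = client.start(audio_chunks())
for response in response_gen:
    print(response)  # interim and final hypotheses as JSON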

"download_url": "https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/weather_v0.1.tflite"
}
}

Contributor Author

remove all these paths

REFERENCE_SPL = 2e-5 # newtons/m^2
SAMPLE_RATE = 44100
SAMPLE_BUFFER = 4096 # (approx 100ms)
SAMPLE_RATE = 16000
Contributor Author

This reduces the frequency response, so soundPressure is lower now. I tested that soundd still adjusts the volume for normal road noise and music.
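
For context, the sound pressure level is derived from the RMS of the sample buffer against REFERENCE_SPL, roughly as in the sketch below (not necessarily the exact micd.py code), so capturing only the 0-8 kHz band at 16 kHz lowers the measured value.

import numpy as np

REFERENCE_SPL = 2e-5  # Pa, standard reference sound pressure

def calculate_spl(measurements: np.ndarray) -> tuple[float, float]:
    # measurements: float samples in [-1, 1]; the RMS approximates sound pressure
    sound_pressure = float(np.sqrt(np.mean(measurements ** 2)))
    spl_db = 20 * np.log10(sound_pressure / REFERENCE_SPL) if sound_pressure > 0 else 0.0
    return sound_pressure, spl_db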

msg.microphoneRaw.frameIndex = self.frame_index
if not (self.frame_index_last == self.frame_index or
self.frame_index - self.frame_index_last == SAMPLE_BUFFER):
cloudlog.info(f'skipped {(self.frame_index - self.frame_index_last)//SAMPLE_BUFFER-1} samples')
Contributor Author

figure out how to stop it from skipping sometimes

jnewb1 (Contributor) commented Jan 22, 2024

Tried it in a car, looks pretty good! openpilot seems to run fine with it running; I will check some CPU stats soon. A couple of things:

  • noticed the last word always takes a bit longer in the live view, is that just a bug in the display or something with the API?
  • have you experimented with making a custom wake word (like "hey comma" or just "comma"?)
  • final parsing to get the full phrase seems to take 5 seconds, can we reduce that?

https://www.youtube.com/watch?v=TGMpwqr3r08

either way, this certainly fulfills the bounty qualifications!

jakethesnake420 (Contributor Author) commented Jan 22, 2024

Thank you for testing it. I am glad you liked it!

  • have you experimented with making a custom wake word (like "hey comma" or just "comma"?)

That requires training a new model, which I am not familiar with doing, but it is well documented here.

  • noticed the last word always takes a bit longer in the live view, is that just a bug in the display or something with the API?
  • final parsing to get the full phrase seems to take 5 seconds, can we reduce that?

Yes, there are ways to decrease latency but there are some tradeoffs.
I am sending 80ms binary chunks instead of 250ms chunks. This should be fairly easy to change by concatenating chunks from the audio_queue. This could have a big effect on final transcript latency. You can also select different transcription models which will have an effect.

You can read the docs here, but these are the important parts: https://docs.rev.ai/api/streaming/requests/#raw-file-content-type

max_segment_duration_seconds: This parameter potentially changes the amount of context our engine has when creating final hypotheses and therefore has a minor effect on word error rate. Higher values correlate with fewer errors in transcription.

skip_postprocessing=true: Only available for English and Spanish languages. You can choose to skip post-processing operations, such as inverse text normalization (ITN), casing and punctuation, by adding skip_postprocessing=true to your request. Doing so will result in a small decrease in latency; however, your final hypotheses will no longer contain capitalization, punctuation, or inverse text normalization (for example, five hundred will not be normalized to 500).
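
To illustrate, here is a hedged sketch of attaching those two parameters to the streaming WebSocket request (endpoint and content_type per the linked docs; the token and the duration value are placeholders):

from urllib.parse import urlencode

params = {
    "access_token": "YOUR_REVAI_ACCESS_TOKEN",  # placeholder
    "content_type": "audio/x-raw;layout=interleaved;rate=16000;format=S16LE;channels=1",
    "max_segment_duration_seconds": 3,  # shorter segments -> faster finals, slightly higher word error rate
    "skip_postprocessing": "true",      # English/Spanish only; drops casing, punctuation, and ITN
}
url = "wss://api.rev.ai/speechtotext/v1/stream?" + urlencode(params)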

I originally used Google's API and it was faster! You can see my Google API implementation here: f48cfbe. There may be optimizations if you select a different model from Rev.ai, but I am not sure. You could change the API provider fairly easily as long as it supports streaming.

The reason I moved away from the Google API is that the authentication process was cumbersome, the Python library they provide is not documented well enough for me, and I wasn't able to end transcription sessions cleanly. There are many other API providers that you can try, but Rev.ai was easy to get started with.

jakethesnake420 (Contributor Author)

@jnewb1 Can you send interac e transfers?

jnewb1 (Contributor) commented Jan 26, 2024

@jnewb1 Can you send interac e transfers?

What?

jakethesnake420 (Contributor Author)

@jnewb1 Can you send interac e transfers?

What?

Like for the bounty payout. Do you guys do PayPal or e-transfer or something else?

jnewb1 (Contributor) commented Jan 27, 2024

Like for the bounty payout. Do you guys do PayPal or e-transfer or something else?

https://github.com/commaai/openpilot/blob/master/docs/BOUNTIES.md#rules

jakethesnake420 (Contributor Author)

https://github.com/commaai/openpilot/blob/master/docs/BOUNTIES.md#rules

Ok, if that's the case I'd like to do a few more and save up for a comma body!

adeebshihadeh (Contributor)

Closing since I don't want to merge this in this state, but this was still very valuable and we're good to pay out whenever you want.

jakethesnake420 (Contributor Author)

Thank you, sounds good!

jakethesnake420 (Contributor Author)

  • noticed the last word always takes a bit longer in the live view, is that just a bug in the display or something with the API?
  • final parsing to get the full phrase seems to take 5 seconds, can we reduce that?

I emailed Rev.ai about the latency. They said they will fix the issue.

Here is the email exchange:

I am developing speech to text for openpilot, a self-driving add-on system. I have chosen to use Rev.ai because of its ease of authentication and simple API. One issue I am facing is the delay between the interim and final results. For example, if the user says "Navigate to home", the interim result will return "navigate to" immediately and it will take about 8 seconds for it to return the full final result "Navigate to home".

Using your streaming demo on your website it's very fast, and I would like the same performance. I am using the Python SDK. I am recording 80ms chunks of audio. I tried sending 250ms chunks but it doesn't make a difference.

Reply from support:

The team does have an idea of how they can improve your experience. The issue is it is going to be weeks before they can implement it. At minimum, 2 weeks. Possibly longer if something more urgent comes up. As I understand it, they need to code the ability to disable overlaps and apply it to your account specifically. Given your responses will all likely be short, the overlap is what is holding things up. This is not something we have built yet, but they think it can be done. I am keeping your ticket open so I can circle back to you once I have confirmation this has happened. 


Successfully merging this pull request may close these issues.

[$200 bounty] Voice entry demo