Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AudioWorkletNode #385

Merged
merged 28 commits into from
Nov 18, 2023
Merged

AudioWorkletNode #385

merged 28 commits into from
Nov 18, 2023

Conversation

orottier
Copy link
Owner

@orottier orottier commented Nov 9, 2023

Hey, this is already getting out of hand so I wanted to show you the current progress.

I think mainly the examples/worklet.rs file would be interesting to inspect. It should resemble the spec way more than the AudioProcessor trait we used before.

Lots of work to do:

  • support multiple inputs/outputs
  • support outputChannelCount
  • construct the processor inside the render thread and support constructor options
  • remove the Send bound from the trait
  • support scope (current time, frame , sample rate)
  • message port
  • AudioParamDescriptor should contain name: String
  • Add bitcrusher example
  • decide mod layout
  • documentation
  • rename RenderScope to AudioWorkletGlobalScope ?
  • decide: hide the raw AudioProcessor trait?
  • benchmarking

@orottier orottier requested a review from b-ma November 9, 2023 07:35
outputs: &mut [AudioRenderQuantum],
params: AudioParamValues<'_>,
_scope: &RenderScope,
inputs: &'b [&'a [f32]],
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These different lifetimes are necessary unfortunately

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit odd/complex indeed, but not really shocking neither in my opinion as this is boilerplate code...

Do you think it is due to the fact that you need to deal with the existing AudioWorkletRenderer so this could potentially be removed once (and if) all nodes are ported to this new API, or is it something we will have to live with forever (with an additional depth when several inputs/outputs are supported) in any case?

test.rs Outdated Show resolved Hide resolved
@b-ma
Copy link
Collaborator

b-ma commented Nov 11, 2023

Hey, I'll try to have a close look in the next few days

@b-ma
Copy link
Collaborator

b-ma commented Nov 12, 2023

It looks quite nice indeed! Great work

I read the spec and the code quite a few time (...but probably not enough :) and here are a few questions / remarks:

  • rename AudioWorkletProcessor::construct to AudioWorkletProcessor::constructor?
  • rename ConstructorOptions or ProcessorOptions?

So if I understand well you consider the instantiation should occur in render thread (which is confirmed by the spec by the way) because you think processors should be only stack allocated and thus deallocation should be free? But I still don't really get how this new API would help to solve our issues with e.g. delay ring buffer (but maybe that's not the point yet...)?

If I'm not missing something the bitcrusher example could already be implemented no? I can try to do it if you want, would be an interesting test maybe.
The vumeter example would also be interesting to implement at some point to test message ports.

In any case, that's really interesting! Let me know if I can help in any way (maybe porting some existing nodes see were it gets us?)

@orottier
Copy link
Owner Author

Thanks for having a look. I added a bit more work so right now I think it would be very interesting if you could try to port the bitcrusher example and see if it works out. The VU-meter is not possible as of yet I think because it needs to extend the AudioWorkletNode which is not possible in rust and you should then still use the raw AudioProcessor trait

So if I understand well you consider the instantiation should occur in render thread (which is confirmed by the spec by the way)

Yes definitely

because you think processors should be only stack allocated and thus deallocation should be free?

I'm worried that this is a bridge to far, this will not help us get rid of the Box to support polymorphism of the processors

But I still don't really get how this new API would help to solve our issues with e.g. delay ring buffer (but maybe that's not the point yet...)?

I understand, I'm making things messier before I make them nicer ;) I will need to port the construct-on-render-thread code to the AudioProcessor trait as well. Then we can remove some unsafe snippets :)

maybe porting some existing nodes see were it gets us?

I think the current AudioWorkletProcessor trait has a performance hit so I'm not sure if we should do this at all. Provided I port the nice to have features from AudioWorkletProcessor to AudioProcessor

params: crate::render::AudioParamValues<'_>,
scope: &RenderScope,
) -> bool {
// Bear with me, to construct a &[&[&[f32]]] we first build a backing vector of all the
Copy link
Owner Author

@orottier orottier Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get rid of these allocations in safe rust is going to be tricky. But I will try a few more things

@orottier
Copy link
Owner Author

/bench

Copy link

Benchmark result:


bench_ctor
  Instructions:             4423187 (-0.303427%)
  L1 Accesses:              6719165 (-0.181953%)
  L2 Accesses:                54181 (No change)
  RAM Accesses:               61532 (+0.006501%)
  Estimated Cycles:         9143690 (-0.132244%)

bench_audio_buffer_decode
  Instructions:             7585215 (+0.009994%)
  L1 Accesses:             10489372 (+0.007980%)
  L2 Accesses:                28516 (+1.314574%)
  RAM Accesses:               32049 (-0.003120%)
  Estimated Cycles:        11753667 (+0.022568%)

bench_sine
  Instructions:            70252052 (+0.060680%)
  L1 Accesses:            102871945 (+0.554040%)
  L2 Accesses:               264599 (-0.656660%)
  RAM Accesses:               62408 (+0.022438%)
  Estimated Cycles:       106379220 (+0.527833%)

bench_sine_gain
  Instructions:            77885160 (+4.086911%)
  L1 Accesses:            114473467 (+4.612159%)
  L2 Accesses:               316761 (+3.721737%)
  RAM Accesses:               62522 (+0.046405%)
  Estimated Cycles:       118245542 (+4.511876%)

bench_sine_gain_delay
  Instructions:           148392722 (+0.031196%)
  L1 Accesses:            209202175 (+0.278761%)
  L2 Accesses:               529527 (+0.424435%)
  RAM Accesses:               64086 (+0.017167%)
  Estimated Cycles:       214092820 (+0.277812%)

bench_buffer_src
  Instructions:            16774792 (+0.169164%)
  L1 Accesses:             24723205 (+0.286896%)
  L2 Accesses:                89816 (+3.807124%)
  RAM Accesses:              100538 (+0.000995%)
  Estimated Cycles:        28691115 (+0.304966%)

bench_buffer_src_delay
  Instructions:            88934510 (+0.006420%)
  L1 Accesses:            122012147 (+0.035332%)
  L2 Accesses:               164184 (+0.511175%)
  RAM Accesses:              100679 (-0.003973%)
  Estimated Cycles:       126356832 (+0.037312%)

bench_buffer_src_iir
  Instructions:            40715650 (+0.060593%)
  L1 Accesses:             59685499 (+0.116067%)
  L2 Accesses:                89157 (+1.314773%)
  RAM Accesses:              100630 (+0.001988%)
  Estimated Cycles:        63653334 (+0.118043%)

bench_buffer_src_biquad
  Instructions:            36100897 (+0.401245%)
  L1 Accesses:             51299708 (+0.532244%)
  L2 Accesses:               136468 (+6.610628%)
  RAM Accesses:              100755 (+0.004963%)
  Estimated Cycles:        55508473 (+0.569041%)

bench_stereo_positional
  Instructions:            42142814 (+1.123034%)
  L1 Accesses:             64315298 (+1.302864%)
  L2 Accesses:               342083 (+4.400863%)
  RAM Accesses:              100856 (+0.001983%)
  Estimated Cycles:        69555673 (+1.309907%)

bench_stereo_panning_automation
  Instructions:            30958117 (+0.174763%)
  L1 Accesses:             46827419 (+0.253085%)
  L2 Accesses:               137572 (+0.576091%)
  RAM Accesses:              100656 (+0.011923%)
  Estimated Cycles:        51038239 (+0.240739%)

bench_analyser_node
  Instructions:            38693285 (+0.062996%)
  L1 Accesses:             54535453 (+0.132570%)
  L2 Accesses:               188360 (-2.742279%)
  RAM Accesses:              101173 (+0.011862%)
  Estimated Cycles:        59018308 (+0.078116%)

bench_hrtf_panners
  Instructions:           329119773 (-0.067702%)
  L1 Accesses:            577494548 (-0.016454%)
  L2 Accesses:             10916196 (+0.256321%)
  RAM Accesses:              120607 (+0.067205%)
  Estimated Cycles:       636296773 (+0.007441%)

bench_sine_gain_with_worklet
  Instructions:            91786457
  L1 Accesses:            132656206
  L2 Accesses:               351418
  RAM Accesses:               62706
  Estimated Cycles:       136608006


running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s


@orottier
Copy link
Owner Author

/bench

Copy link

Benchmark result:


bench_ctor
  Instructions:             4414444 (-0.500490%)
  L1 Accesses:              6705690 (-0.382134%)
  L2 Accesses:                54180 (-0.003691%)
  RAM Accesses:               61535 (+0.013002%)
  Estimated Cycles:         9130315 (-0.278000%)

bench_audio_buffer_decode
  Instructions:             7585241 (+0.010337%)
  L1 Accesses:             10489630 (+0.010411%)
  L2 Accesses:                28296 (+0.529364%)
  RAM Accesses:               32044 (-0.006241%)
  Estimated Cycles:        11752650 (+0.015037%)

bench_sine
  Instructions:            69951085 (-0.367990%)
  L1 Accesses:            102328318 (+0.022660%)
  L2 Accesses:               273865 (+2.822248%)
  RAM Accesses:               62419 (+0.043275%)
  Estimated Cycles:       105882308 (+0.058318%)

bench_sine_gain
  Instructions:            77498406 (+3.570047%)
  L1 Accesses:            113762526 (+3.962463%)
  L2 Accesses:               332813 (+8.977524%)
  RAM Accesses:               62533 (+0.064007%)
  Estimated Cycles:       117615246 (+3.954782%)

bench_sine_gain_delay
  Instructions:           147763672 (-0.392845%)
  L1 Accesses:            208086556 (-0.255995%)
  L2 Accesses:               512471 (-2.811145%)
  RAM Accesses:               64103 (+0.037454%)
  Estimated Cycles:       212892516 (-0.284465%)

bench_buffer_src
  Instructions:            16462964 (-1.692654%)
  L1 Accesses:             24171458 (-1.951020%)
  L2 Accesses:                88288 (+2.039920%)
  RAM Accesses:              100551 (+0.013925%)
  Estimated Cycles:        28132183 (-1.648936%)

bench_buffer_src_delay
  Instructions:            88286334 (-0.722451%)
  L1 Accesses:            120866348 (-0.904083%)
  L2 Accesses:               161156 (-1.343732%)
  RAM Accesses:              100691 (+0.005959%)
  Estimated Cycles:       125196313 (-0.881536%)

bench_buffer_src_iir
  Instructions:            40280404 (-1.008966%)
  L1 Accesses:             58916570 (-1.173661%)
  L2 Accesses:                89558 (+1.769298%)
  RAM Accesses:              100642 (+0.011925%)
  Estimated Cycles:        62886830 (-1.087616%)

bench_buffer_src_biquad
  Instructions:            35299684 (-1.827065%)
  L1 Accesses:             49872236 (-2.264815%)
  L2 Accesses:               151030 (+17.80168%)
  RAM Accesses:              100737 (-0.014888%)
  Estimated Cycles:        54153181 (-1.888020%)

bench_stereo_positional
  Instructions:            40249162 (-3.420826%)
  L1 Accesses:             61013199 (-3.897955%)
  L2 Accesses:               296663 (-9.514790%)
  RAM Accesses:              100876 (+0.024789%)
  Estimated Cycles:        66027174 (-3.830390%)

bench_stereo_panning_automation
  Instructions:            30612494 (-0.943487%)
  L1 Accesses:             46198675 (-1.092908%)
  L2 Accesses:               135067 (-1.254542%)
  RAM Accesses:              100674 (+0.026826%)
  Estimated Cycles:        50397600 (-1.017609%)

bench_analyser_node
  Instructions:            38444787 (-0.579654%)
  L1 Accesses:             54067226 (-0.727162%)
  L2 Accesses:               189892 (-1.949728%)
  RAM Accesses:              101186 (+0.025702%)
  Estimated Cycles:        58558196 (-0.702036%)

bench_hrtf_panners
  Instructions:           328923236 (-0.127379%)
  L1 Accesses:            577064572 (-0.090864%)
  L2 Accesses:             10907463 (+0.174405%)
  RAM Accesses:              120698 (+0.137723%)
  Estimated Cycles:       635826317 (-0.066650%)

bench_sine_gain_with_worklet
  Instructions:            78255322
  L1 Accesses:            114811376
  L2 Accesses:               319187
  RAM Accesses:               62719
  Estimated Cycles:       118602476


running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s


@orottier
Copy link
Owner Author

bench_sine_gain
  Instructions:            77498406
bench_sine_gain_with_worklet
  Instructions:            78255322

I'm happy to see the new worklet overhead is quite low compared to the processors that handle the RenderQuantum raw.
I have added unsafe code to prevent reallocations per render quantum. I might ask for an unsafe review on the rust user forum though, it's quite tricky.

@orottier orottier marked this pull request as ready for review November 16, 2023 16:44
@orottier
Copy link
Owner Author

Are you okay with this mod layout?
I'm leaving the message port for now, that can be added later.
How do you feel about rename RenderScope to AudioWorkletGlobalScope ?

@b-ma
Copy link
Collaborator

b-ma commented Nov 16, 2023

I'm happy to see the new worklet overhead is quite low compared to the processors that handle the RenderQuantum raw.

Yup quite impressive indeed, congrats! (I would be quite curious to test this with other nodes as well at some point)

Are you okay with this mod layout?

I think I would simply put it in web_audio_api::node, as finally that's just some kind of AudioNode too:

interface AudioWorkletNode : AudioNode 

But no strong opinion here

How do you feel about rename RenderScope to AudioWorkletGlobalScope ?

On the one side I don't feel it has exactly the same responsibility as in the spec. But on the other side I don't think we can really have such "scope" behavior in Rust (?). So, maybe that's the closest thing we can provide and somehow it makes sens...

Maybe that's just a matter of properly documenting the tradeoff with the spec? I think you read the spec more than me here, what your guts are telling you? :)

Another thing I'm a bit unsure is the argument order for:

   fn process<'a, 'b>(
        &mut self,
        scope: &'b RenderScope,
        inputs: &'b [&'a [&'a [f32]]],
        outputs: &'b mut [&'a mut [&'a mut [f32]]],
        params: AudioParamValues<'b>,
    ) -> bool {

Why did you decided to put score as first argument here rather than as last one? It would make more sens to me to have it at the end I guess, as it is kind of an "extension" of the AudioWorkletProcessCallback API

@orottier
Copy link
Owner Author

I think I would simply put it in web_audio_api::node, as finally that's just some kind of AudioNode too:

But it also contains AudioParamValues and the AudioWorkletProcessor interface

Why did you decided to put scope as first argument here rather than as last one?

I haven't put much thought in it, but it felt like an extension of self. I will move it to the end though, good suggestion

@b-ma
Copy link
Collaborator

b-ma commented Nov 18, 2023

I think I would simply put it in web_audio_api::node, as finally that's just some kind of AudioNode too:

But it also contains AudioParamValues and the AudioWorkletProcessor interface

Hum right, then let's go with web_audio_api::worklet, that makes sens

@orottier orottier merged commit 65ce47f into main Nov 18, 2023
3 checks passed
@orottier orottier deleted the feature/audioworkletnode branch November 18, 2023 11:48
@orottier
Copy link
Owner Author

I have decided to merge this current work and leave the leftovers for a new issue #391

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants