-
❓ Questions and Help
Hello, thanks for the great work. I would like to use the model as a feature extractor. How can I get the output of the encoder? Thank you!
Replies: 9 comments
-
Well, this is definitely a nice feature to have in a future V2. How are you planning to use the encoder?
-
Hi all, @wuxx1624 did you manage to get it to work? I also want to use this model as a feature extractor: basically I would like to take the features from the output of the encoder (N_FRAMES@25fps x 512) and use them for another task.

I can load the model and access the encoder using "model.encoder", but I cannot see what other preprocessing is done before running through the encoder, namely the STFT details. Basically I would like to take a look at the model's forward() function.

Please excuse my potential stupidity and overall ignorance of jit as a format, but is there a way for me to look at the actual code from the jit file? Thanks a lot in advance.
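For what it's worth, TorchScript modules expose their decompiled source via the `.code` property, so one way to look at the forward() of a jit file is something like the following (the `model.jit` path is a placeholder for the actual checkpoint):

```python
import torch

# Load the torchscript checkpoint; 'model.jit' is a placeholder path.
model = torch.jit.load('model.jit', map_location='cpu')

# TorchScript modules expose their decompiled forward() source via .code,
# and the lower-level IR via .graph.
print(model.code)          # forward() of the top-level module
print(model.encoder.code)  # forward() of the encoder submodule, if present
```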
-
@miraodasilva here is the audio normalization: mravanelli/SincNet#74 (comment)
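The linked comment describes amplitude normalization; a minimal sketch of that kind of peak normalization (the eps value is an assumption, so verify the exact recipe against the linked thread) might be:

```python
import torch

def normalize_audio(wav: torch.Tensor) -> torch.Tensor:
    """Scale a waveform to unit peak amplitude.

    A sketch of the amplitude normalization discussed in the linked
    SincNet comment; the eps constant is an assumption added here to
    avoid division by zero on silent inputs.
    """
    return wav / (wav.abs().max() + 1e-9)
```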
-
I see, so I should:

1. normalize the audio as described in the linked comment
2. compute the STFT of the normalized waveform
3. run the result through model.encoder

Is this correct? If so, should I use the hamming window for the STFT? Should I use torch.stft? Again, it would be great if I could just take a look at the forward function or the STFT implementation, as even small differences/imprecisions in the implementation can yield drastically different results in my experience. Thanks.
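A minimal sketch of what such a torch.stft call could look like, assuming a Hamming window and the hop_length of 160 mentioned later in this thread; n_fft=512 is a guess and should be checked against the actual model (which, per a later reply, uses the nvidia STFT implementation rather than torch.stft):

```python
import torch

wav = torch.randn(1, 16000)  # 1 second of 16 kHz audio

# n_fft=512 is an assumption; hop_length=160 matches the hop
# quoted later in this thread.
n_fft = 512
hop_length = 160
window = torch.hamming_window(n_fft)

spec = torch.stft(wav, n_fft=n_fft, hop_length=hop_length,
                  window=window, return_complex=True)
mag = spec.abs()   # magnitude spectrogram
print(mag.shape)   # (1, n_fft // 2 + 1, n_frames) -> (1, 257, 101)
```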
-
@miraodasilva you can use this auto normalisation and recover the encoder/decoder from the torchscript. I can get 1:1 output :) But I will not share the code, because @snakers4 doesn't want to release it. If you want, though, you can really recover the exact PyTorch code. I have even trained this model on LibriSpeech and reported the result somewhere in the tickets. If you use a ReZero-ed version of the encoder and transformer, you can also reach much faster convergence.
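For context, ReZero (Bachlechner et al., 2020) gates each residual branch with a learnable scalar initialized to zero, so every block starts out as the identity, which is what speeds up convergence. A minimal sketch of such a block, with a placeholder sublayer standing in for the encoder/transformer sublayers mentioned above:

```python
import torch
import torch.nn as nn

class ReZeroBlock(nn.Module):
    """Residual block with a ReZero gate: x + alpha * sublayer(x)."""

    def __init__(self, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        # Learnable scalar initialized to zero -> the block is the
        # identity at the start of training.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.alpha * self.sublayer(x)
```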
-
If I follow the steps above (using the nvidia STFT as mentioned in the thread), I get a 10x512x13 output for a 10x16000 input. Is this what you obtain as well, @tugstugi? I was under the impression I was supposed to get features at 25 fps, i.e. 10x512x25 for this input. Thanks a lot in advance.
-
16000 samples / 160 (hop_length) = 100 frames, and 100 / 8 (the encoder's 8x time reduction) = 12.5, which rounds up to 13, so 13 is correct. In other words, the encoder output is at 12.5 fps, not 25.
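Making the arithmetic explicit (the ceil rounding is an assumption about how the encoder's strided layers pad, but it matches the observed 13 frames):

```python
import math

# 16000 / 160 = 100 STFT frames, 100 / 8 = 12.5,
# which the encoder's strided layers round up to 13.
n_frames = math.ceil(16000 / 160 / 8)
print(n_frames)  # 13
```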
-
This is strange
I was reluctant to mention it, but yes, it is no big deal. I assume everyone knows about it.
-
This release solves this.