-
I was trying to implement your model but ran into a problem. The text encodings for each word are of length 773 (768+5) and the audio input is of length 64 (depending on which feature extractor is used). Concatenating these gives an input vector of 837 for each frame, whereas the paper (Section 4.4) states "..Speech-encoding dimensionality 124 for each ..". I went through the implementation too, but didn't find any form of dimensionality reduction (PCA or anything else) applied to the input vectors (text or audio). Please let me know how you arrived at an input dimensionality of 124, because taking it as 837 results in a huge network.
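For concreteness, here is a minimal sketch of the per-frame concatenation described above, assuming 773-dimensional word-aligned text encodings (768 + 5) and 64-dimensional audio features; the tensor names and shapes are illustrative assumptions, not taken from the repository:

```python
import torch

# Hypothetical features for a sequence of T frames (shapes are assumptions)
T = 100
text_feats = torch.randn(T, 768 + 5)   # per-frame text encodings, 773 dims
audio_feats = torch.randn(T, 64)       # per-frame audio features, 64 dims

# Concatenating along the feature axis gives the 837-dim input per frame
frames = torch.cat([text_feats, audio_feats], dim=-1)
print(frames.shape)                     # torch.Size([100, 837])
```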
-
Hi @Shrey-55,
My understanding is that the first thing that happens is that each frame (the high-dimensional text+audio vector) is passed through a feed-forward network that encodes it into a 124-dimensional vector. (If you are familiar with CNNs, this can alternatively be seen as a "1x1 convolution".) I don't know where in the code this happens, but the paper does include a description of this dimensionality reduction.
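For illustration, a minimal sketch of such a per-frame projection in PyTorch, assuming an 837-dimensional concatenated frame vector and a 124-dimensional target; the module name, layer choice, and activation are my assumptions, not the authors' actual code:

```python
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Hypothetical per-frame projection of the concatenated text+audio vector.

    Maps each 837-dim frame (773 text dims + 64 audio dims) to 124 dims with a
    position-wise linear layer. Because the same weights are applied to every
    frame independently, this is equivalent to a 1x1 convolution over time.
    """

    def __init__(self, in_dim: int = 773 + 64, out_dim: int = 124):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)
        self.act = nn.ReLU()

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 837) -> (batch, time, 124)
        return self.act(self.proj(frames))


encoder = FrameEncoder()
x = torch.randn(2, 100, 837)   # 2 sequences, 100 frames each
z = encoder(x)
print(z.shape)                 # torch.Size([2, 100, 124])
```

The same mapping can also be written as `nn.Conv1d(837, 124, kernel_size=1)` applied to a (batch, channels, time) tensor, which is why the feed-forward and "1x1 convolution" views coincide.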