Unable to perceive improvement. #17

Open
daikankan opened this issue Nov 1, 2023 · 3 comments

daikankan commented Nov 1, 2023

Thanks for sharing this work, it offers good insight and is inspiring.
However, I'm unable to perceive an improvement from the pretrained model.
My inference with E_expression:
For images: I stack the same image 5 times into a (1, 5, 224, 224) input; the output is FLAME parameters of shape (5, 53), from which I take the center frame's parameters, output[2, :].
For videos: I likewise stack the same frame 5 times and take the center output[2, :]. Or should I instead stack 5 consecutive frames as the E_expression input? (A sketch of both variants is below.)
My mean_shape for alignment is consistent with the author's.
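For reference, here is a minimal PyTorch sketch of what I mean; `model`, the exact input layout, and the border handling are my assumptions, not the repo's actual API (the (1, 5, 224, 224) shape above presumably omits the channel dimension):

```python
# Sketch of the windowed E_expression inference described above.
# Hypothetical API: `model` maps a (1, T, 3, 224, 224) crop batch
# to (T, 53) FLAME expression/jaw parameters.
import torch

T = 5  # temporal window size

def infer_single_image(model, crop):
    """crop: (3, 224, 224) aligned face crop."""
    window = crop.unsqueeze(0).repeat(T, 1, 1, 1)  # (5, 3, 224, 224): same image 5x
    params = model(window.unsqueeze(0))            # (5, 53) FLAME parameters
    return params[T // 2]                          # keep the center frame's output

def infer_video(model, crops):
    """crops: (N, 3, 224, 224) consecutive aligned frames, N >= T."""
    outputs = []
    for i in range(len(crops)):
        # Variant with 5 *consecutive* frames: a sliding window centered on
        # frame i, clamped at the video borders.
        lo = max(0, min(i - T // 2, len(crops) - T))
        window = crops[lo:lo + T]                  # (5, 3, 224, 224)
        params = model(window.unsqueeze(0))        # (5, 53)
        outputs.append(params[i - lo])             # parameters for frame i
    return torch.stack(outputs)                    # (N, 53)
```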
Comparison of the results (talking-head videos and single-image reconstruction) between E_flame_without_E_expression and E_flame_with_E_expression:

E_flame_without_E_expression:

[video: talkinghead_E_flame_without_E_expression.mp4]
[image: msk_E_flame_without_E_expression]
[image: obm_E_flame_without_E_expression]

E_flame_with_E_expression:

[video: talkinghead_E_flame_with_E_expression.mp4]
[image: msk_E_flame_with_E_expression]
[image: obm_E_flame_with_E_expression]

Sorry, my test may not be sufficient, and my preprocessing may not be accurate.

filby89 (Owner) commented Feb 6, 2024

Hey,
thanks for your interest and for bringing this up. Generally, SPECTRE is trained on videos of a person talking, using a perceptual lipreading loss between the original video and the rendered video. The lipreading loss improves how well speech can be perceived from the output 3D mesh.
Note, however, that the perception of speech is not captured by the ~18 2D mouth landmarks you show here. This is an important reason why methods that score a lower landmark-placement error are not necessarily better in terms of human perception (geometric errors do not correlate with human perception).

A better way to compare SPECTRE with another method would be to render each method's output 3D mesh into a video and compare the two visually (see the sketch at the end of this comment).

One final note: in some cases the lipreading loss will even exaggerate the mouth a bit (e.g., add more protrusion and roundedness than is actually visible) in order to better capture the perception of speech, which results in even worse landmark placement compared to other methods.
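For the visual comparison itself, something as simple as stacking the two rendered videos side by side works; here is a plain OpenCV sketch (nothing SPECTRE-specific, and the file names are placeholders):

```python
# Stack two rendered result videos side by side for visual comparison.
# Assumes both videos have the same frame rate and frame count.
import cv2
import numpy as np

cap_a = cv2.VideoCapture("render_method_a.mp4")  # placeholder file names
cap_b = cv2.VideoCapture("render_method_b.mp4")
fps = cap_a.get(cv2.CAP_PROP_FPS)

writer = None
while True:
    ok_a, frame_a = cap_a.read()
    ok_b, frame_b = cap_b.read()
    if not (ok_a and ok_b):
        break
    # Resize B to A's height so the frames can be concatenated horizontally.
    h = frame_a.shape[0]
    scale = h / frame_b.shape[0]
    frame_b = cv2.resize(frame_b, (int(frame_b.shape[1] * scale), h))
    both = np.hstack([frame_a, frame_b])
    if writer is None:
        writer = cv2.VideoWriter("side_by_side.mp4",
                                 cv2.VideoWriter_fourcc(*"mp4v"),
                                 fps, (both.shape[1], both.shape[0]))
    writer.write(both)

if writer is not None:
    writer.release()
cap_a.release()
cap_b.release()
```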

agupta54 commented Mar 7, 2024

Hi @daikankan, can you please explain how you are pasting the rendered avatar back into the video?

daikankan (Author) commented Mar 15, 2024

@agupta54

Just OpenCV: circle, rectangle, and putText. A rough sketch is below.
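Something along these lines; the paths, coordinates, and labels here are placeholders, not my actual values:

```python
# Rough sketch of pasting the rendered avatar back and annotating the frame
# with plain OpenCV primitives. Paths, coordinates, and labels are placeholders.
import cv2

frame = cv2.imread("frame.png")             # original video frame
avatar = cv2.imread("rendered_avatar.png")  # rendered mesh, already cropped

x, y = 50, 50                               # paste position (placeholder)
h, w = avatar.shape[:2]
frame[y:y + h, x:x + w] = avatar            # paste the rendered crop back

cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)      # box around it
cv2.circle(frame, (x + w // 2, y + h // 2), 3, (0, 0, 255), -1)   # center marker
cv2.putText(frame, "E_flame_with_E_expression", (x, y - 8),
            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
cv2.imwrite("frame_with_avatar.png", frame)
```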
