Unable to perceive improvement. #17

Open
daikankan opened this issue Nov 1, 2023 · 3 comments

daikankan commented Nov 1, 2023

Thanks for sharing this work, it offers good insight and is inspiring.
However, I'm unable to perceive an improvement from the pretrained model.
My inference with E_expression:
For images: I stack the same image 5 times into a (1, 5, 224, 224) input; the output is FLAME parameters of shape (5, 53), from which I take the center frame's parameters, output[2, :].
For videos: I likewise stack the same frame 5 times and take the center output[2, :]. Or should I instead stack 5 consecutive frames as the E_expression input? (A sketch of both variants is below.)
My mean_shape for alignment is consistent with the author's.
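For reference, here is a minimal PyTorch sketch of what I mean; `model`, the exact input layout, and the border handling are my assumptions, not the repo's actual API (the (1, 5, 224, 224) shape above presumably omits the channel dimension):

```python
# Sketch of the windowed E_expression inference described above.
# Hypothetical API: `model` maps a (1, T, 3, 224, 224) crop batch
# to (T, 53) FLAME expression/jaw parameters.
import torch

T = 5  # temporal window size

def infer_single_image(model, crop):
    """crop: (3, 224, 224) aligned face crop."""
    window = crop.unsqueeze(0).repeat(T, 1, 1, 1)  # (5, 3, 224, 224): same image 5x
    params = model(window.unsqueeze(0))            # (5, 53) FLAME parameters
    return params[T // 2]                          # keep the center frame's output

def infer_video(model, crops):
    """crops: (N, 3, 224, 224) consecutive aligned frames, N >= T."""
    outputs = []
    for i in range(len(crops)):
        # Variant with 5 *consecutive* frames: a sliding window centered on
        # frame i, clamped at the video borders.
        lo = max(0, min(i - T // 2, len(crops) - T))
        window = crops[lo:lo + T]                  # (5, 3, 224, 224)
        params = model(window.unsqueeze(0))        # (5, 53)
        outputs.append(params[i - lo])             # parameters for frame i
    return torch.stack(outputs)                    # (N, 53)
```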
Comparison of the results (talking-head videos and single-image reconstruction) between E_flame_without_E_expression and E_flame_with_E_expression:

E_flame_without_E_expression:

[video: talkinghead_E_flame_without_E_expression.mp4]
[image: msk_E_flame_without_E_expression]
[image: obm_E_flame_without_E_expression]

E_flame_with_E_expression:

[video: talkinghead_E_flame_with_E_expression.mp4]
[image: msk_E_flame_with_E_expression]
[image: obm_E_flame_with_E_expression]

Sorry, my test may not be sufficient, and my preprocessing may not be accurate.

filby89 (Owner) commented Feb 6, 2024

Hey,
thanks for your interest and for bringing this up. Generally, SPECTRE is trained on videos of a person talking, using a perceptual lipreading loss between the original video and the rendered video. The lipreading loss improves how well speech can be perceived from the output 3D mesh.
Note, however, that the perception of speech is not captured by the ~18 2D mouth landmarks you show here. This is an important reason why methods that score a lower landmark-placement error are not necessarily better in terms of human perception (geometric errors do not correlate with human perception).

A better way to compare SPECTRE with another method would be to render each method's output 3D mesh into a video and compare the two visually (see the sketch at the end of this comment).

One final note: in some cases the lipreading loss will even exaggerate the mouth a bit (e.g., add more protrusion and roundedness than is actually visible) in order to better capture the perception of speech, which results in even worse landmark placement compared to other methods.
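For the visual comparison itself, something as simple as stacking the two rendered videos side by side works; here is a plain OpenCV sketch (nothing SPECTRE-specific, and the file names are placeholders):

```python
# Stack two rendered result videos side by side for visual comparison.
# Assumes both videos have the same frame rate and frame count.
import cv2
import numpy as np

cap_a = cv2.VideoCapture("render_method_a.mp4")  # placeholder file names
cap_b = cv2.VideoCapture("render_method_b.mp4")
fps = cap_a.get(cv2.CAP_PROP_FPS)

writer = None
while True:
    ok_a, frame_a = cap_a.read()
    ok_b, frame_b = cap_b.read()
    if not (ok_a and ok_b):
        break
    # Resize B to A's height so the frames can be concatenated horizontally.
    h = frame_a.shape[0]
    scale = h / frame_b.shape[0]
    frame_b = cv2.resize(frame_b, (int(frame_b.shape[1] * scale), h))
    both = np.hstack([frame_a, frame_b])
    if writer is None:
        writer = cv2.VideoWriter("side_by_side.mp4",
                                 cv2.VideoWriter_fourcc(*"mp4v"),
                                 fps, (both.shape[1], both.shape[0]))
    writer.write(both)

if writer is not None:
    writer.release()
cap_a.release()
cap_b.release()
```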

agupta54 commented Mar 7, 2024

Hi @daikankan, can you please explain how you are pasting the rendered avatar back into the video?

daikankan (Author) commented Mar 15, 2024

@agupta54

Just OpenCV: circle, rectangle, and putText. A rough sketch is below.
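Something along these lines; the paths, coordinates, and labels here are placeholders, not my actual values:

```python
# Rough sketch of pasting the rendered avatar back and annotating the frame
# with plain OpenCV primitives. Paths, coordinates, and labels are placeholders.
import cv2

frame = cv2.imread("frame.png")             # original video frame
avatar = cv2.imread("rendered_avatar.png")  # rendered mesh, already cropped

x, y = 50, 50                               # paste position (placeholder)
h, w = avatar.shape[:2]
frame[y:y + h, x:x + w] = avatar            # paste the rendered crop back

cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)      # box around it
cv2.circle(frame, (x + w // 2, y + h // 2), 3, (0, 0, 255), -1)   # center marker
cv2.putText(frame, "E_flame_with_E_expression", (x, y - 8),
            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
cv2.imwrite("frame_with_avatar.png", frame)
```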
