Hi authors,

The work is truly inspiring! Could you share how you evaluated GPT-4o on ET-Bench, given that GPT-4o cannot directly extract time-related information from video inputs? Additionally, do you have plans to release the pipeline/prompts used for evaluating commercial MLLMs? Thanks a lot.
For GPT-4o and GPT-4V, we uniformly sample 8 frames and prepend the following prompt to each sample's instruction:
```
You are given 8 uniformly sampled frames from a {duration} seconds long video. Try your best to understand it and do not refuse to answer.
```
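For reference, here is a minimal sketch of this pipeline. It is not our released code: the OpenCV-based sampler, the `eval_sample` helper, and the `gpt-4o` model identifier are illustrative assumptions, and it expects `OPENAI_API_KEY` in the environment.

```python
# Hypothetical sketch: uniformly sample 8 frames, prepend the prompt above,
# and query GPT-4o with the frames as base64-encoded images.
import base64
import cv2
from openai import OpenAI

def sample_frames(video_path: str, num_frames: int = 8) -> tuple[list[str], float]:
    """Uniformly sample frames as base64 JPEGs; also return duration in seconds."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    duration = total / fps
    indices = [int(i * (total - 1) / (num_frames - 1)) for i in range(num_frames)]
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            continue
        ok, buf = cv2.imencode(".jpg", frame)
        if ok:
            frames.append(base64.b64encode(buf.tobytes()).decode())
    cap.release()
    return frames, duration

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def eval_sample(video_path: str, instruction: str) -> str:
    frames, duration = sample_frames(video_path)
    prefix = (f"You are given 8 uniformly sampled frames from a {duration:.0f} "
              "seconds long video. Try your best to understand it and do not "
              "refuse to answer.")
    content = [{"type": "text", "text": f"{prefix}\n{instruction}"}]
    content += [{"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{f}"}}
                for f in frames]
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": content}],
    )
    return resp.choices[0].message.content
```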
For Gemini-1.5-Pro and Gemini-1.5-Flash, we simply upload the raw video files and use exactly the same prompts (those in the annotation files) as for the open-source models.
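A minimal sketch of the Gemini path, assuming the `google-generativeai` SDK's File API; the model name, polling loop, and helper name are assumptions rather than our released pipeline:

```python
# Hypothetical sketch: upload the raw video via the Gemini File API and
# reuse the annotation-file instruction unchanged.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder

def eval_sample_gemini(video_path: str, instruction: str,
                       model_name: str = "gemini-1.5-pro") -> str:
    video = genai.upload_file(video_path)
    # The File API processes videos asynchronously; poll until ready.
    while video.state.name == "PROCESSING":
        time.sleep(5)
        video = genai.get_file(video.name)
    model = genai.GenerativeModel(model_name)
    resp = model.generate_content([video, instruction])
    return resp.text
```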
Please let me know if you have further questions.