Hi authors,

The work is truly inspiring! Could you share how you evaluated GPT-4o on ET-Bench, given that GPT-4o cannot directly extract time-related information from video inputs? Additionally, do you have plans to release the pipeline/prompts used for evaluating commercial MLLMs? Thanks a lot.
For GPT-4o and GPT-4V, we uniformly sample 8 frames and prepend the following prompt to each sample's instruction:
```
You are given 8 uniformly sampled frames from a {duration} seconds long video. Try your best to understand it and do not refuse to answer.
```
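For reference, here is a minimal sketch of this pipeline. It is not our released code: the OpenCV-based sampler, the `eval_sample` helper, and the `gpt-4o` model identifier are illustrative assumptions, and it expects `OPENAI_API_KEY` in the environment.

```python
# Hypothetical sketch: uniformly sample 8 frames, prepend the prompt above,
# and query GPT-4o with the frames as base64-encoded images.
import base64
import cv2
from openai import OpenAI

def sample_frames(video_path: str, num_frames: int = 8) -> tuple[list[str], float]:
    """Uniformly sample frames as base64 JPEGs; also return duration in seconds."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    duration = total / fps
    indices = [int(i * (total - 1) / (num_frames - 1)) for i in range(num_frames)]
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            continue
        ok, buf = cv2.imencode(".jpg", frame)
        if ok:
            frames.append(base64.b64encode(buf.tobytes()).decode())
    cap.release()
    return frames, duration

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def eval_sample(video_path: str, instruction: str) -> str:
    frames, duration = sample_frames(video_path)
    prefix = (f"You are given 8 uniformly sampled frames from a {duration:.0f} "
              "seconds long video. Try your best to understand it and do not "
              "refuse to answer.")
    content = [{"type": "text", "text": f"{prefix}\n{instruction}"}]
    content += [{"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{f}"}}
                for f in frames]
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": content}],
    )
    return resp.choices[0].message.content
```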
For Gemini-1.5-Pro and Gemini-1.5-Flash, we simply upload the raw video files and use exactly the same prompts (those in the annotation files) as for the open-source models.
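A minimal sketch of the Gemini path, assuming the `google-generativeai` SDK's File API; the model name, polling loop, and helper name are assumptions rather than our released pipeline:

```python
# Hypothetical sketch: upload the raw video via the Gemini File API and
# reuse the annotation-file instruction unchanged.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder

def eval_sample_gemini(video_path: str, instruction: str,
                       model_name: str = "gemini-1.5-pro") -> str:
    video = genai.upload_file(video_path)
    # The File API processes videos asynchronously; poll until ready.
    while video.state.name == "PROCESSING":
        time.sleep(5)
        video = genai.get_file(video.name)
    model = genai.GenerativeModel(model_name)
    resp = model.generate_content([video, instruction])
    return resp.text
```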
Please let me know if you have further questions.