Skip to content

Efficient End-to-End Video-Question Answering with Pyramidal Multimodal Transformer - AAAI23

License

Notifications You must be signed in to change notification settings

Trunpm/PMT-AAAI23

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PMT-AAAI23

Efficient End-to-End Video-Question Answering with Pyramidal Multimodal Transformer - AAAI23

PMT

This is the PyTorch Implementation of our paper "[Efficient End-to-End Video-Question Answering with Pyramidal Multimodal Transformer]". (accepted by AAAI’23)

alt text

Data Preparation

  • Download the dataset
    MSVD-QA: link
    MSRVTT-QA: link
    TGIF-QA: link
    ActivityNet-QA: link Youtube2Text-QA: please ref link For the text-to-video retrieval task in our ablation study, pleade ref link

  • Word Glove Embedding and Video Frames extraction

    1. To extract questions or answers Glove Embedding, please ref here.
      Take the action task in TGIF-QA dataset as an example, we have features at the path /inputdata: TGIF/word/Action/TGIF_Action_train_questions.pt TGIF/word/Action/TGIF_Action_test_questions.pt TGIF/word/Action/TGIF_Action_vocab.json
    2. To extract video frames, we use the skvideo.io module to eatract the images and then transfer it to .npz format. for Action task in the TGIF-QA dataset as example, we have .npz files at the path /inputdata: TGIF/video/Action/tumblr_no00ddSlG31t34v14o1_250.npz TGIF/video/Action/tumblr_nd24xaX8d11qkb1azo1_250.npz ... TGIF/video/Action/tumblr_no00ddSlG31t34v14o1_250.npz TGIF/video/Action/tumblr_nd24xaX8d11qkb1azo1_250.npz ...

Reference

@article{peng2022PMT,
     title={Efficient End-to-End Video-Question Answering with Pyramidal Multimodal Transformer},
     author={Peng Min, Wang Chongyang, Shi Yu, Zhou Xiang-Dong},
     journal={Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI)},
     year={2023}}

About

Efficient End-to-End Video-Question Answering with Pyramidal Multimodal Transformer - AAAI23

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages