Dependent pre-trained models

Our code used the following publicly available pre-trained models for visual processing, language modeling, and vision-and-language reasoning. The tables below list the license associated with each model and the download link that we used. Note that we do not use or modify the code behind these pre-trained models, so other implementations of these models may be substituted if needed.

Name	License	Link
BERT/HuggingFace	Apache-2.0	https://huggingface.co/docs/transformers/model_doc/bert
GPT2/HuggingFace	MIT	https://huggingface.co/gpt2
MAE/HuggingFace	Apache-2.0	https://huggingface.co/facebook/vit-mae-large
CrossTransformer	MIT	https://github.com/lucidrains/vit-pytorch
CLIP/OpenAI	MIT	https://github.com/openai/CLIP
FLAVA/HuggingFace	BSD-3-Clause	https://huggingface.co/facebook/flava-full

We also used the following models, which are part of the torchvision toolbox of PyTorch (released under BSD-3-Clause):

Name	License	Link
AlexNet/VGG	BSD-3-Clause	https://pytorch.org/vision/stable/models.html
ResNet-50	BSD-3-Clause	https://pytorch.org/vision/main/models/generated/torchvision.models.resnet50.html
ResNet-18	BSD-3-Clause	https://pytorch.org/vision/main/models/generated/torchvision.models.resnet18.html
ViT-16	BSD-3-Clause	https://pytorch.org/vision/main/models/vision_transformer.html
Swin_b, Swin_t	BSD-3-Clause	https://pytorch.org/vision/main/models/swin_transformer.html
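
For readers who want to obtain the torchvision models listed above, the following is a minimal sketch rather than the code used in this repository. The mapping `TORCHVISION_BUILDERS` and the helper `load_backbone` are hypothetical names introduced for illustration. Reading "ViT-16" as torchvision's `vit_b_16` is an assumption, and VGG is left out because the table does not say which variant was used. The `weights="DEFAULT"` argument requires torchvision 0.13 or later.

```python
import importlib

# Hypothetical mapping from the names in the table above to torchvision
# builder functions. "ViT-16" is assumed to mean vit_b_16; the table does
# not state the exact variant.
TORCHVISION_BUILDERS = {
    "AlexNet": "alexnet",
    "ResNet-50": "resnet50",
    "ResNet-18": "resnet18",
    "ViT-16": "vit_b_16",  # assumption, see note above
    "Swin_b": "swin_b",
    "Swin_t": "swin_t",
}


def load_backbone(name, pretrained=True):
    """Build a torchvision model, optionally with its default pre-trained weights."""
    # Imported lazily so the mapping can be inspected without torchvision installed.
    models = importlib.import_module("torchvision.models")
    builder = getattr(models, TORCHVISION_BUILDERS[name])
    # weights="DEFAULT" selects the default released checkpoint and downloads it
    # on first use; weights=None builds a randomly initialised model instead.
    return builder(weights="DEFAULT" if pretrained else None)
```

With torchvision installed, `load_backbone("ResNet-18", pretrained=False)` would build the architecture without downloading anything, which is a quick way to check the installation before fetching weights.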

We used GloVe language embeddings from the torchtext toolbox of PyTorch.

Name	License	Link
GloVe from TorchText	BSD-3-Clause	https://pypi.org/project/torchtext/
