Awesome-Text-only-Training

This project collects papers on text-only (image-free) training for multimodal tasks.

Include text-only training for:

Papers

[AAAI] | [ 1] Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training [paper] [code][⭐8]
[ACM] | [ 1] TOMGPT: Reliable Text-Only Training Approach for Cost-Effective Multi-modal Large Language Model[paper]
[IJCV] | [ 5] Learning to Prompt with Text Only Supervision for Vision-Language Models[paper] [code][⭐80]
[arxiv] | [ 0] Text Data-Centric Image Captioning with Interactive Prompts[paper]
[arxiv] | [ 2] MeaCap: Memory-Augmented Zero-shot Image Captioning[paper] [code][⭐27]
[arxiv] | [ 0] ArcSin: Adaptive ranged cosine Similarity injected noise for Language-Driven Visual Tasks [paper]
[arxiv] | [ 0] Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion [paper]
[arxiv] | [ 0] IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning [paper] [code][⭐4]
[arxiv] | [ 0] From Unimodal to Multimodal: Scaling up Projectors to Align Modalities [paper]
[arxiv] | [ 0] Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning [paper] [code][⭐0]
[arxiv] | [ 0] DRCap: Decoding CLAP Latents with Retrieval-augmented Generation for Zero-shot Audio Captioning [paper]

[arxiv] [ 1] Improved Factorized Neural Transducer Model For text-only Domain Adaptation[paper]
[AAAI] [ 2] Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning[paper]
[ACM] [ 0] Text-Only Training for Visual Storytelling[paper]
[ACM] [ 1] VLIS: Unimodal Language Models Guide Multimodal Language Generation[paper] [code][⭐24]
[NeurIPS] [ 7] LOVM: Language-Only Vision Model Selection[paper] [code][⭐18]
[IJCAI] [ 4] From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping[paper] [code][⭐11]
[ICLR] [ 47] Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training[paper] [code][⭐118]
[DCASE] [ 4] Weakly-supervised Automated Audio Captioning via text only training[paper] [code][⭐1]
[ACM] [ 3] CgT-GAN: CLIP-guided Text GAN for Image Captioning[paper] [code][⭐16]

[EMNLP] [ 64] Text-Only Training for Image Captioning using Noise-Injected CLIP[paper] [code][⭐179]
[ICCV] [ 11] I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision[paper][code][⭐55]
[arxiv] [ 28] Multimodal Knowledge Alignment with Reinforcement Learning[paper][code][⭐22]

[CVPR] [ 124] LAFITE: Towards Language-Free Training for Text-to-Image Generation[paper] [code][⭐180]
