Topic-wise


AGI

Paper Name Status Topic Category Year Conference Author Summary Link
0 ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning Pending AGI, Dataset, Text 2019 AAAI Maarten Sap, Noah A. Smith, Ronan Le Bras, Yejin Choi link
1 COMET: Commonsense Transformers for Automatic Knowledge Graph Construction Pending AGI, Text , Transformers 2019 ACL Antoine Bosselut, Hannah Rashkin, Yejin Choi link
2 VisualCOMET: Reasoning about the Dynamic Context of a Still Image Pending AGI, Dataset, Image , Text , Transformers 2020 ECCV Ali Farhadi, Chandra Bhagavatula, Jae Sung Park, Yejin Choi link

Activation Function

Paper Name Status Topic Category Year Conference Author Summary Link
0 Self-Normalizing Neural Networks Pending Activation Function, Tabular Optimizations, Tips & Tricks 2017 NIPS Andreas Mayr, Günter Klambauer, Thomas Unterthiner link
1 A Comprehensive Guide on Activation Functions This week Activation Function 2020 Blog Ygor Rebouças Serpa link

Attention

Paper Name Status Topic Category Year Conference Author Summary Link
0 Attention is All you Need Read Attention, Text , Transformers Architecture 2017 NIPS Ashish Vaswani, Illia Polosukhin, Noam Shazeer, Łukasz Kaiser Introduces the Transformer architecture, which achieves SOTA performance across a range of NLP tasks (see the sketch after this table). link
1 GPT-2 (Language Models are Unsupervised Multitask Learners) Pending Attention, Text , Transformers 2019 Alec Radford, Dario Amodei, Ilya Sutskever, Jeffrey Wu link
2 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Read Attention, Text , Transformers Embeddings 2018 NAACL Jacob Devlin, Kenton Lee, Kristina Toutanova, Ming-Wei Chang BERT is a Transformer-based architecture that introduces masked-word pretraining and a next-sentence-prediction task, pretraining the model for a wide variety of tasks. link
3 SAGAN: Self-Attention Generative Adversarial Networks Pending Attention, GANs, Image Architecture 2018 arXiv Augustus Odena, Dimitris Metaxas, Han Zhang, Ian Goodfellow link
4 Single Headed Attention RNN: Stop Thinking With Your Head Pending Attention, LSTMs, Text Optimization-No. of params 2019 arXiv Stephen Merity link
5 Reformer: The Efficient Transformer Read Attention, Text , Transformers Architecture, Optimization-Memory, Optimization-No. of params 2020 arXiv Anselm Levskaya, Lukasz Kaiser, Nikita Kitaev Overcomes the time and memory complexity of Transformers by bucketing queries and keys and by using reversible residual connections. link
6 Language-Agnostic BERT Sentence Embedding Read Attention, Siamese Network, Text , Transformers Embeddings 2020 arXiv Fangxiaoyu Feng, Yinfei Yang A BERT model with multilingual sentence embeddings learned over 112 languages, enabling zero-shot transfer to unseen languages. link
7 T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Read Attention, Text , Transformers 2020 JMLR Colin Raffel, Noam Shazeer, Peter J. Liu, Wei Liu, Yanqi Zhou Presents a Text-to-Text transformer model with multi-task learning capabilities, simultaneously solving problems such as machine translation, document summarization, question answering, and classification tasks. link
8 GPT-f: Generative Language Modeling for Automated Theorem Proving Pending Attention, Transformers 2020 arXiv Ilya Sutskever, Stanislas Polu link
9 Vision Transformer: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Pending Attention, Image , Transformers 2021 ICLR Alexey Dosovitskiy, Jakob Uszkoreit, Lucas Beyer, Neil Houlsby link
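
For readers skimming this section, a minimal sketch of the scaled dot-product attention at the core of "Attention is All you Need" follows; it is an illustrative single-head version (no masking, multi-head projections, or dropout), not the authors' implementation.

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
# Minimal single-head sketch; no masking, multi-head split, or dropout.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # pairwise similarity of queries and keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # attention-weighted sum of values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                             # 4 toy tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv).shape)  # (4, 8)
```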

CNNs

Paper Name Status Topic Category Year Conference Author Summary Link
0 ZF Net (Visualizing and Understanding Convolutional Networks) Read CNNs, CV , Image Visualization 2014 ECCV Matthew D. Zeiler, Rob Fergus Visualize CNN Filters / Kernels using De-Convolutions on CNN filter activations. link
1 Inception-v1 (Going Deeper With Convolutions) Read CNNs, CV , Image Architecture 2015 CVPR Christian Szegedy, Wei Liu Proposes 1x1 conv operations to reduce the number of parameters in a deep and wide CNN. link
2 ResNet (Deep Residual Learning for Image Recognition) Read CNNs, CV , Image Architecture 2016 CVPR Kaiming He, Xiangyu Zhang Introduces residual (skip) connections that allow the depth of a DNN to be increased (see the sketch after this table). link
3 MobileNet (Efficient Convolutional Neural Networks for Mobile Vision Applications) Pending CNNs, CV , Image Architecture, Optimization-No. of params 2017 arXiv Andrew G. Howard, Menglong Zhu link
4 Evaluation of neural network architectures for embedded systems Read CNNs, CV , Image Comparison 2017 IEEE ISCAS Adam Paszke, Alfredo Canziani, Eugenio Culurciello Compare CNN classification architectures on accuracy, memory footprint, parameters, operations count, inference time and power consumption. link
5 SqueezeNet Read CNNs, CV , Image Architecture, Optimization-No. of params 2016 arXiv Forrest N. Iandola, Song Han Explores model compression by using 1x1 convolutions called fire modules. link
6 Pruning Filters for Efficient ConvNets Pending CNNs, CV , Image Optimization-No. of params 2017 arXiv Asim Kadav, Hao Li link
7 Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet Reading CNNs, CV , Image 2019 arXiv Matthias Bethge, Wieland Brendel link
8 Breaking neural networks with adversarial attacks Pending CNNs, Image Adversarial 2019 Blog Anant Jain link
9 Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs Pending CNNs, Image 2020 arXiv Ari S. Morcos, David J. Schwab, Jonathan Frankle link
10 Arbitrary Style Transfer in Real-Time With Adaptive Instance Normalization Pending CNNs, Image 2017 ICCV Serge Belongie, Xun Huang link
11 Few-Shot Learning with Localization in Realistic Settings Pending CNNs, Image Few-shot-learning 2019 CVPR Bharath Hariharan, Davis Wertheimer link
12 Revisiting Pose-Normalization for Fine-Grained Few-Shot Recognition Pending CNNs, Image Few-shot-learning 2020 CVPR Bharath Hariharan, Davis Wertheimer, Luming Tang link
13 Occupancy Anticipation for Efficient Exploration and Navigation Pending CNNs, Image Reinforcement-Learning 2020 ECCV Kristen Grauman, Santhosh K. Ramakrishnan, Ziad Al-Halah link
14 VL-T5: Unifying Vision-and-Language Tasks via Text Generation Read CNNs, CV , Generative, Image , Large-Language-Models, Question-Answering, Text , Transformers Architecture, Embeddings, Multimodal, Pre-Training 2021 arXiv Hao Tan, Jaemin Cho, Jie Lei, Mohit Bansal Unifies the image and text modalities in a single transformer model that solves multiple tasks with one architecture, using text prefixes similar to T5. link
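
The Inception and ResNet rows above hinge on two simple ideas: 1x1 convolutions to cut channel counts and skip connections to keep deep networks trainable. A minimal PyTorch sketch combining the two is shown below; it is illustrative, not the exact block from either paper.

```python
# Residual bottleneck block: 1x1 convs shrink and restore channels around a 3x3 conv,
# and a skip connection adds the input back. Illustrative, not the papers' exact blocks.
import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, bottleneck, kernel_size=1),               # 1x1: reduce channels
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1),  # cheap 3x3 on fewer channels
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, kernel_size=1),               # 1x1: restore channels
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))                               # skip connection

x = torch.randn(1, 64, 32, 32)
print(BottleneckBlock(64, bottleneck=16)(x).shape)                        # torch.Size([1, 64, 32, 32])
```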

CV

Paper Name Status Topic Category Year Conference Author Summary Link
0 ZF Net (Visualizing and Understanding Convolutional Networks) Read CNNs, CV , Image Visualization 2014 ECCV Matthew D. Zeiler, Rob Fergus Visualize CNN Filters / Kernels using De-Convolutions on CNN filter activations. link
1 Inception-v1 (Going Deeper With Convolutions) Read CNNs, CV , Image Architecture 2015 CVPR Christian Szegedy, Wei Liu Proposes 1x1 conv operations to reduce the number of parameters in a deep and wide CNN. link
2 ResNet (Deep Residual Learning for Image Recognition) Read CNNs, CV , Image Architecture 2016 CVPR Kaiming He, Xiangyu Zhang Introduces residual (skip) connections that allow the depth of a DNN to be increased. link
3 MobileNet (Efficient Convolutional Neural Networks for Mobile Vision Applications) Pending CNNs, CV , Image Architecture, Optimization-No. of params 2017 arXiv Andrew G. Howard, Menglong Zhu link
4 Evaluation of neural network architectures for embedded systems Read CNNs, CV , Image Comparison 2017 IEEE ISCAS Adam Paszke, Alfredo Canziani, Eugenio Culurciello Compare CNN classification architectures on accuracy, memory footprint, parameters, operations count, inference time and power consumption. link
5 SqueezeNet Read CNNs, CV , Image Architecture, Optimization-No. of params 2016 arXiv Forrest N. Iandola, Song Han Explores model compression by using 1x1 convolutions called fire modules. link
6 Pruning Filters for Efficient ConvNets Pending CNNs, CV , Image Optimization-No. of params 2017 arXiv Asim Kadav, Hao Li link
7 A 2019 guide to Human Pose Estimation with Deep Learning Pending CV , Pose Estimation Comparison 2019 Blog Sudharshan Chandra Babu link
8 A Simple yet Effective Baseline for 3D Human Pose Estimation Pending CV , Pose Estimation 2017 ICCV James J. Little, Javier Romero, Julieta Martinez, Rayat Hossain link
9 Bag of Tricks for Image Classification with Convolutional Neural Networks Read CV , Image Optimizations, Tips & Tricks 2018 arXiv Tong He, Zhi Zhang Shows a dozen tricks (mixup, label smoothing, etc.) to improve CNN accuracy and training time (see the sketch after this table). link
10 Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet Reading CNNs, CV , Image 2019 arXiv Matthias Bethge, Wieland Brendel link
11 Capsule Networks: Dynamic Routing Between Capsules Pending CV , Image Architecture 2017 arXiv Geoffrey E Hinton, Nicholas Frosst, Sara Sabour link
12 Understanding Loss Functions in Computer Vision Pending CV , GANs, Image , Loss Function Comparison, Tips & Tricks 2020 Blog Sowmya Yellapragada link
13 VL-T5: Unifying Vision-and-Language Tasks via Text Generation Read CNNs, CV , Generative, Image , Large-Language-Models, Question-Answering, Text , Transformers Architecture, Embeddings, Multimodal, Pre-Training 2021 arXiv Hao Tan, Jaemin Cho, Jie Lei, Mohit Bansal Unifies the image and text modalities in a single transformer model that solves multiple tasks with one architecture, using text prefixes similar to T5. link
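
Two of the tricks named in the "Bag of Tricks" row above, label smoothing and mixup, are easy to sketch. The snippet below is a simplified illustration; the hyperparameter values (eps, alpha) are arbitrary rather than taken from the paper.

```python
# Label smoothing and mixup, two tricks from "Bag of Tricks for Image Classification".
# Simplified sketch; eps and alpha values are illustrative.
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, eps=0.1):
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)   # loss on the true class
    uniform = -log_probs.mean(dim=-1)                             # loss against a uniform target
    return ((1 - eps) * nll + eps * uniform).mean()

def mixup(x, y, num_classes, alpha=0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    y_onehot = F.one_hot(y, num_classes).float()
    x_mix = lam * x + (1 - lam) * x[perm]                         # blend pairs of images
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]           # blend their labels too
    return x_mix, y_mix
```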

Dataset

Paper Name Status Topic Category Year Conference Author Summary Link
0 ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning Pending AGI, Dataset, Text 2019 AAAI Maarten Sap, Noah A. Smith, Ronan Le Bras, Yejin Choi link
1 VisualCOMET: Reasoning about the Dynamic Context of a Still Image Pending AGI, Dataset, Image , Text , Transformers 2020 ECCV Ali Farhadi, Chandra Bhagavatula, Jae Sung Park, Yejin Choi link
2 Symbolic Knowledge Distillation: from General Language Models to Commonsense Models Pending Dataset, Text , Transformers Optimizations, Tips & Tricks 2021 arXiv Chandra Bhagavatula, Jack Hessel, Peter West, Yejin Choi link
3 Large Language Models for Data Annotation: A Survey This week Dataset, Generative, Large-Language-Models Prompting, Tips & Tricks 2024 arXiv Alimohammad Beigi, Zhen Tan link

GANs

Paper Name Status Topic Category Year Conference Author Summary Link
0 SAGAN: Self-Attention Generative Adversarial Networks Pending Attention, GANs, Image Architecture 2018 arXiv Augustus Odena, Dimitris Metaxas, Han Zhang, Ian Goodfellow link
1 Pix2Pix: Image-to-Image Translation with Conditional Adversarial Nets Read GANs, Image 2017 CVPR Alexei A. Efros, Jun-Yan Zhu, Phillip Isola, Tinghui Zhou Image-to-image translation using conditional GANs trained on a dataset of paired images from one domain to another (see the sketch after this table). link
2 CycleGAN: Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks Pending GANs, Image Architecture 2017 ICCV Alexei A. Efros, Jun-Yan Zhu, Phillip Isola, Taesung Park link
3 Unsupervised Machine Translation Using Monolingual Corpora Only Pending GANs, NMT, Text , Transformers Unsupervised 2017 arXiv Alexis Conneau, Guillaume Lample, Ludovic Denoyer, Marc'Aurelio Ranzato, Myle Ott link
4 WGAN: Wasserstein GAN Pending GANs, Loss Function 2017 arXiv Léon Bottou, Martin Arjovsky, Soumith Chintala link
5 Spectral Normalization for GANs Pending GANs, Normalization Optimizations 2018 arXiv Masanori Koyama, Takeru Miyato, Toshiki Kataoka, Yuichi Yoshida link
6 Understanding Loss Functions in Computer Vision Pending CV , GANs, Image , Loss Function Comparison, Tips & Tricks 2020 Blog Sowmya Yellapragada link
7 StyleGAN: A Style-Based Generator Architecture for Generative Adversarial Networks Pending GANs, Image 2019 CVPR Samuli Laine, Tero Karras, Timo Aila link
8 Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space? Pending GANs, Image 2019 ICCV Peter Wonka, Rameen Abdal, Yipeng Qin link
9 Improved Techniques for Training GANs Pending GANs, Image Semi-Supervised 2016 NIPS Alec Radford, Ian Goodfellow, Tim Salimans, Vicki Cheung, Wojciech Zaremba, Xi Chen link
10 AnimeGAN: Towards the Automatic Anime Characters Creation with Generative Adversarial Networks Pending GANs, Image 2017 NIPS Jiakai Zhang, Minjun Li, Yanghua Jin link
11 Progressive Growing of GANs for Improved Quality, Stability, and Variation Pending GANs, Image Tips & Tricks 2018 ICLR Jaakko Lehtinen, Samuli Laine, Tero Karras, Timo Aila link
12 BEGAN: Boundary Equilibrium Generative Adversarial Networks Pending GANs, Image 2017 arXiv David Berthelot, Luke Metz, Thomas Schumm link
13 StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation Pending GANs, Image 2018 CVPR Jaegul Choo, Jung-Woo Ha, Minje Choi, Munyoung Kim, Sunghun Kim, Yunjey Choi link
14 IMLE-GAN: Inclusive GAN: Improving Data and Minority Coverage in Generative Models Pending GANs 2020 arXiv Jitendra Malik, Ke Li, Larry Davis, Mario Fritz, Ning Yu, Peng Zhou link
15 TransGAN: Two Transformers Can Make One Strong GAN Pending GANs, Image , Transformers Architecture 2021 arXiv Shiyu Chang, Yifan Jiang, Zhangyang Wang link
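
The Pix2Pix row above relies on conditioning the discriminator on the input image. The sketch below shows only that conditioning step (channel-wise concatenation of the source image and the candidate translation); the tiny discriminator is a placeholder, not the paper's PatchGAN.

```python
# Pix2Pix-style conditioning: the discriminator judges (input image, output image) pairs,
# formed by concatenating them along the channel dimension. Toy discriminator, not the paper's.
import torch
import torch.nn as nn

disc = nn.Sequential(
    nn.Conv2d(3 + 3, 64, kernel_size=4, stride=2, padding=1),  # 6 input channels: source + candidate
    nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, kernel_size=4, stride=2, padding=1),      # per-patch real/fake scores
)

src = torch.randn(1, 3, 64, 64)       # input-domain image
fake = torch.randn(1, 3, 64, 64)      # generator output (placeholder tensor here)
score_map = disc(torch.cat([src, fake], dim=1))
print(score_map.shape)                # torch.Size([1, 1, 16, 16])
```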

Generative

Paper Name Status Topic Category Year Conference Author Summary Link
0 Transforming Sequence Tagging Into A Seq2Seq Task Pending Generative, Text Comparison, Tips & Tricks 2022 arXiv Iftekhar Naim, Karthik Raman, Krishna Srinivasan link
1 Large Language Models are Zero-Shot Reasoners Pending Generative, Question-Answering, Text Tips & Tricks, Zero-shot-learning 2022 arXiv Takeshi Kojima, Yusuke Iwasawa link
2 Flan-T5: Scaling Instruction-Finetuned Language Models Pending Generative, Text , Transformers Architecture, Pre-Training 2022 arXiv Hyung Won Chung, Le Hou link
3 VL-T5: Unifying Vision-and-Language Tasks via Text Generation Read CNNs, CV , Generative, Image , Large-Language-Models, Question-Answering, Text , Transformers Architecture, Embeddings, Multimodal, Pre-Training 2021 arXiv Hao Tan, Jaemin Cho, Jie Lei, Mohit Bansal Unifies the image and text modalities in a single transformer model that solves multiple tasks with one architecture, using text prefixes similar to T5 (see the formatting sketch after this table). link
4 Scaling Instruction-Finetuned Language Models (FLAN) Pending Generative, Large-Language-Models, Question-Answering, Text , Transformers Instruction-Finetuning 2022 arXiv Hyung Won Chung, Jason Wei, Jeffrey Dean, Le Hou, Quoc V. Le, Shayne Longpre Introduces FLAN (Fine-tuned LAnguage Net), an instruction-finetuning method, and presents the results of its application (https://arxiv.org/abs/2210.11416). The study demonstrates that by fine-tuning the 540B PaLM model on 1836 tasks while incorporating Chain-of-Thought reasoning data, FLAN achieves improvements in generalization, human usability, and zero-shot reasoning over the base model. The paper also provides detailed information on how each of these aspects was evaluated. link
5 ReAct: Synergizing Reasoning and Acting in Language Models Pending Generative, Large-Language-Models, Text Optimizations, Tips & Tricks 2023 ICLR Dian Yu, Izhak Shafran, Jeffrey Zhao, Karthik Narasimhan, Nan Du, Shunyu Yao, Yuan Cao This paper introduces ReAct, a novel approach that leverages Large Language Models (LLMs) to interleave reasoning traces and task-specific actions. ReAct outperforms existing methods on various language and decision-making tasks, addressing issues such as hallucination and error propagation while improving human interpretability and trustworthiness. link
6 Training language models to follow instructions with human feedback Pending Generative, Large-Language-Models, Training Method Instruction-Finetuning, Reinforcement-Learning, Semi-Supervised 2022 arXiv Carroll L. Wainwright, Diogo Almeida, Jan Leike, Jeff Wu, Long Ouyang, Pamela Mishkin, Paul Christiano, Ryan Lowe, Xu Jiang This paper presents InstructGPT, a model fine-tuned with human feedback to better align with user intent across various tasks. Despite having significantly fewer parameters than larger models, InstructGPT outperforms them in human evaluations, demonstrating improved truthfulness, reduced toxicity, and minimal performance regressions on public NLP datasets, highlighting the potential of fine-tuning with human feedback for enhancing language model alignment with human intent. link
7 Constitutional AI: Harmlessness from AI Feedback Pending Generative, Large-Language-Models, Training Method Instruction-Finetuning, Reinforcement-Learning, Unsupervised 2022 arXiv Jared Kaplan, Yuntao Bai The paper introduces Constitutional AI, a method for training a safe AI assistant without human-labeled data on harmful outputs. It combines supervised learning and reinforcement learning phases, enabling the AI to engage with harmful queries by explaining its objections, thus improving control, transparency, and human-judged performance with minimal human oversight. link
8 Self-Alignment with Instruction Backtranslation Pending Generative, Large-Language-Models, Training Method Instruction-Finetuning 2023 arXiv Jason Weston, Mike Lewis, Ping Yu, Xian Li The paper introduces a scalable method called "instruction backtranslation" to create a high-quality instruction-following language model. This method involves self-augmentation and self-curation of training examples generated from web documents, resulting in a model that outperforms others in its category without relying on distillation data, showcasing its effective self-alignment capability. link
9 Table-GPT: Table-tuned GPT for Diverse Table Tasks Pending Generative, Large-Language-Models, Training Method Instruction-Finetuning 2023 arXiv link
10 Large Language Models for Data Annotation: A Survey This week Dataset, Generative, Large-Language-Models Prompting, Tips & Tricks 2024 arXiv Alimohammad Beigi, Zhen Tan link
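
Several rows above (VL-T5, FLAN) cast every task as text-to-text with a task prefix or instruction. The snippet below is a hypothetical illustration of that formatting; the prefix wording and field names are made up for the example, not taken from any of the papers.

```python
# Text-to-text task formatting with prefixes (T5 / VL-T5 / FLAN style): every task becomes
# an (input string with task prefix, target string) pair. Prefixes and fields are illustrative.
def to_text_to_text(task: str, example: dict) -> tuple[str, str]:
    if task == "translation":
        return f"translate English to German: {example['en']}", example["de"]
    if task == "summarization":
        return f"summarize: {example['document']}", example["summary"]
    if task == "sentiment":
        return f"classify sentiment: {example['text']}", example["label"]
    raise ValueError(f"unknown task: {task}")

print(to_text_to_text("sentiment", {"text": "Great movie!", "label": "positive"}))
```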

GraphNN

Paper Name Status Topic Category Year Conference Author Summary Link
0 Graph Neural Network: Relational inductive biases, deep learning, and graph networks Pending GraphNN Architecture 2018 arXiv Jessica B. Hamrick, Oriol Vinyals, Peter W. Battaglia link

Image

Paper Name Status Topic Category Year Conference Author Summary Link
0 ZF Net (Visualizing and Understanding Convolutional Networks) Read CNNs, CV , Image Visualization 2014 ECCV Matthew D. Zeiler, Rob Fergus Visualize CNN Filters / Kernels using De-Convolutions on CNN filter activations. link
1 Inception-v1 (Going Deeper With Convolutions) Read CNNs, CV , Image Architecture 2015 CVPR Christian Szegedy, Wei Liu Proposes 1x1 conv operations to reduce the number of parameters in a deep and wide CNN. link
2 ResNet (Deep Residual Learning for Image Recognition) Read CNNs, CV , Image Architecture 2016 CVPR Kaiming He, Xiangyu Zhang Introduces residual (skip) connections that allow the depth of a DNN to be increased. link
3 MobileNet (Efficient Convolutional Neural Networks for Mobile Vision Applications) Pending CNNs, CV , Image Architecture, Optimization-No. of params 2017 arXiv Andrew G. Howard, Menglong Zhu link
4 Evaluation of neural network architectures for embedded systems Read CNNs, CV , Image Comparison 2017 IEEE ISCAS Adam Paszke, Alfredo Canziani, Eugenio Culurciello Compare CNN classification architectures on accuracy, memory footprint, parameters, operations count, inference time and power consumption. link
5 SqueezeNet Read CNNs, CV , Image Architecture, Optimization-No. of params 2016 arXiv Forrest N. Iandola, Song Han Explores model compression by using 1x1 convolutions called fire modules. link
6 Pruning Filters for Efficient ConvNets Pending CNNs, CV , Image Optimization-No. of params 2017 arXiv Asim Kadav, Hao Li link
7 SAGAN: Self-Attention Generative Adversarial Networks Pending Attention, GANs, Image Architecture 2018 arXiv Augustus Odena, Dimitris Metaxas, Han Zhang, Ian Goodfellow link
8 Bag of Tricks for Image Classification with Convolutional Neural Networks Read CV , Image Optimizations, Tips & Tricks 2018 arXiv Tong He, Zhi Zhang Shows a dozen tricks (mixup, label smoothing, etc.) to improve CNN accuracy and training time. link
9 Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet Reading CNNs, CV , Image 2019 arXiv Matthias Bethge, Wieland Brendel link
10 Breaking neural networks with adversarial attacks Pending CNNs, Image Adversarial 2019 Blog Anant Jain link
11 Pix2Pix: Image-to-Image Translation with Conditional Adversarial Nets Read GANs, Image 2017 CVPR Alexei A. Efros, Jun-Yan Zhu, Phillip Isola, Tinghui Zhou Image-to-image translation using conditional GANs trained on a dataset of paired images from one domain to another. link
12 CycleGAN: Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks Pending GANs, Image Architecture 2017 ICCV Alexei A. Efros, Jun-Yan Zhu, Phillip Isola, Taesung Park link
13 Capsule Networks: Dynamic Routing Between Capsules Pending CV , Image Architecture 2017 arXiv Geoffrey E Hinton, Nicholas Frosst, Sara Sabour link
14 Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs Pending CNNs, Image 2020 arXiv Ari S. Morcos, David J. Schwab, Jonathan Frankle link
15 Arbitrary Style Transfer in Real-Time With Adaptive Instance Normalization Pending CNNs, Image 2017 ICCV Serge Belongie, Xun Huang link
16 One-shot Text Field Labeling using Attention and Belief Propagation for Structure Information Extraction Pending Image , Text 2020 arXiv Jun Huang, Mengli Cheng, Minghui Qiu, Wei Lin, Xing Shi link
17 Topological Loss: Beyond the Pixel-Wise Loss for Topology-Aware Delineation Pending Image , Loss Function, Segmentation 2018 CVPR Agata Mosinska, Mateusz Koziński, Pablo Márquez-Neila, Pascal Fua link
18 Understanding Loss Functions in Computer Vision Pending CV , GANs, Image , Loss Function Comparison, Tips & Tricks 2020 Blog Sowmya Yellapragada link
19 StyleGAN: A Style-Based Generator Architecture for Generative Adversarial Networks Pending GANs, Image 2019 CVPR Samuli Laine, Tero Karras, Timo Aila link
20 Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space? Pending GANs, Image 2019 ICCV Peter Wonka, Rameen Abdal, Yipeng Qin link
21 Improved Techniques for Training GANs Pending GANs, Image Semi-Supervised 2016 NIPS Alec Radford, Ian Goodfellow, Tim Salimans, Vicki Cheung, Wojciech Zaremba, Xi Chen link
22 AnimeGAN: Towards the Automatic Anime Characters Creation with Generative Adversarial Networks Pending GANs, Image 2017 NIPS Jiakai Zhang, Minjun Li, Yanghua Jin link
23 Progressive Growing of GANs for Improved Quality, Stability, and Variation Pending GANs, Image Tips & Tricks 2018 ICLR Jaakko Lehtinen, Samuli Laine, Tero Karras, Timo Aila link
24 BEGAN: Boundary Equilibrium Generative Adversarial Networks Pending GANs, Image 2017 arXiv David Berthelot, Luke Metz, Thomas Schumm link
25 StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation Pending GANs, Image 2018 CVPR Jaegul Choo, Jung-Woo Ha, Minje Choi, Munyoung Kim, Sunghun Kim, Yunjey Choi link
26 Few-Shot Learning with Localization in Realistic Settings Pending CNNs, Image Few-shot-learning 2019 CVPR Bharath Hariharan, Davis Wertheimer link
27 Revisiting Pose-Normalization for Fine-Grained Few-Shot Recognition Pending CNNs, Image Few-shot-learning 2020 CVPR Bharath Hariharan, Davis Wertheimer, Luming Tang link
28 VisualCOMET: Reasoning about the Dynamic Context of a Still Image Pending AGI, Dataset, Image , Text , Transformers 2020 ECCV Ali Farhadi, Chandra Bhagavatula, Jae Sung Park, Yejin Choi link
29 Occupancy Anticipation for Efficient Exploration and Navigation Pending CNNs, Image Reinforcement-Learning 2020 ECCV Kristen Grauman, Santhosh K. Ramakrishnan, Ziad Al-Halah link
30 Vision Transformer: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Pending Attention, Image , Transformers 2021 ICLR Alexey Dosovitskiy, Jakob Uszkoreit, Lucas Beyer, Neil Houlsby link
31 DALL·E: Creating Images from Text Pending Image , Text , Transformers 2021 Blog Aditya Ramesh, Gabriel Goh, Ilya Sutskever, Mikhail Pavlov, Scott Gray link
32 CLIP: Connecting Text and Images Pending Image , Text , Transformers Multimodal, Pre-Training 2021 arXiv Alec Radford, Ilya Sutskever, Jong Wook Kim link
33 Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision This week Image , Text , Transformers Multimodal 2020 EMNLP Hao Tan, Mohit Bansal link
34 TransGAN: Two Transformers Can Make One Strong GAN Pending GANs, Image , Transformers Architecture 2021 arXiv Shiyu Chang, Yifan Jiang, Zhangyang Wang link
35 VL-T5: Unifying Vision-and-Language Tasks via Text Generation Read CNNs, CV , Generative, Image , Large-Language-Models, Question-Answering, Text , Transformers Architecture, Embeddings, Multimodal, Pre-Training 2021 arXiv Hao Tan, Jaemin Cho, Jie Lei, Mohit Bansal Unifies the image and text modalities in a single transformer model that solves multiple tasks with one architecture, using text prefixes similar to T5. link

LSTMs

Paper Name Status Topic Category Year Conference Author Summary Link
0 Single Headed Attention RNN: Stop Thinking With Your Head Pending Attention, LSTMs, Text Optimization-No. of params 2019 arXiv Stephen Merity link

Large-Language-Models

Paper Name Status Topic Category Year Conference Author Summary Link
0 Training Compute-Optimal Large Language Models Pending Large-Language-Models, Transformers Architecture, Optimization-No. of params, Pre-Training, Tips & Tricks 2022 arXiv Jordan Hoffmann, Laurent Sifre, Oriol Vinyals, Sebastian Borgeaud link
1 VL-T5: Unifying Vision-and-Language Tasks via Text Generation Read CNNs, CV , Generative, Image , Large-Language-Models, Question-Answering, Text , Transformers Architecture, Embeddings, Multimodal, Pre-Training 2021 arXiv Hao Tan, Jaemin Cho, Jie Lei, Mohit Bansal Unifies the image and text modalities in a single transformer model that solves multiple tasks with one architecture, using text prefixes similar to T5. link
2 Scaling Instruction-Finetuned Language Models (FLAN) Pending Generative, Large-Language-Models, Question-Answering, Text , Transformers Instruction-Finetuning 2022 arXiv Hyung Won Chung, Jason Wei, Jeffrey Dean, Le Hou, Quoc V. Le, Shayne Longpre Introduces FLAN (Fine-tuned LAnguage Net), an instruction-finetuning method, and presents the results of its application (https://arxiv.org/abs/2210.11416). The study demonstrates that by fine-tuning the 540B PaLM model on 1836 tasks while incorporating Chain-of-Thought reasoning data, FLAN achieves improvements in generalization, human usability, and zero-shot reasoning over the base model. The paper also provides detailed information on how each of these aspects was evaluated. link
3 ReAct: Synergizing Reasoning and Acting in Language Models Pending Generative, Large-Language-Models, Text Optimizations, Tips & Tricks 2023 ICLR Dian Yu, Izhak Shafran, Jeffrey Zhao, Karthik Narasimhan, Nan Du, Shunyu Yao, Yuan Cao This paper introduces ReAct, a novel approach that leverages Large Language Models (LLMs) to interleave reasoning traces and task-specific actions. ReAct outperforms existing methods on various language and decision-making tasks, addressing issues such as hallucination and error propagation while improving human interpretability and trustworthiness. link
4 Training language models to follow instructions with human feedback Pending Generative, Large-Language-Models, Training Method Instruction-Finetuning, Reinforcement-Learning, Semi-Supervised 2022 arXiv Carroll L. Wainwright, Diogo Almeida, Jan Leike, Jeff Wu, Long Ouyang, Pamela Mishkin, Paul Christiano, Ryan Lowe, Xu Jiang This paper presents InstructGPT, a model fine-tuned with human feedback to better align with user intent across various tasks. Despite having significantly fewer parameters than larger models, InstructGPT outperforms them in human evaluations, demonstrating improved truthfulness, reduced toxicity, and minimal performance regressions on public NLP datasets, highlighting the potential of fine-tuning with human feedback for enhancing language model alignment with human intent. link
5 Constitutional AI: Harmlessness from AI Feedback Pending Generative, Large-Language-Models, Training Method Instruction-Finetuning, Reinforcement-Learning, Unsupervised 2022 arXiv Jared Kaplan, Yuntao Bai The paper introduces Constitutional AI, a method for training a safe AI assistant without human-labeled data on harmful outputs. It combines supervised learning and reinforcement learning phases, enabling the AI to engage with harmful queries by explaining its objections, thus improving control, transparency, and human-judged performance with minimal human oversight. link
6 Self-Alignment with Instruction Backtranslation Pending Generative, Large-Language-Models, Training Method Instruction-Finetuning 2023 arXiv Jason Weston, Mike Lewis, Ping Yu, Xian Li The paper introduces a scalable method called "instruction backtranslation" to create a high-quality instruction-following language model. This method involves self-augmentation and self-curation of training examples generated from web documents, resulting in a model that outperforms others in its category without relying on distillation data, showcasing its effective self-alignment capability. link
7 Table-GPT: Table-tuned GPT for Diverse Table Tasks Pending Generative, Large-Language-Models, Training Method Instruction-Finetuning 2023 arXiv link
8 Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering Pending Large-Language-Models Prompting, Tips & Tricks 2024 arXiv Dedy Kredo, Itamar Friedman, Tal Ridnik This paper introduces AlphaCodium, a novel test-based, multi-stage, code-oriented iterative approach for improving the performance of Large Language Models (LLMs) on code-generation tasks (see the sketch after this table). link
9 Large Language Models for Data Annotation: A Survey This week Dataset, Generative, Large-Language-Models Prompting, Tips & Tricks 2024 arXiv Alimohammad Beigi, Zhen Tan link
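
The AlphaCodium row above describes a test-based, iterative code-generation flow. The sketch below is only a schematic of such a loop as the summary describes it; generate_code and run_test are hypothetical stubs standing in for an LLM call and a sandboxed test runner, not part of any real API.

```python
# Schematic generate -> test -> repair loop, in the spirit of the AlphaCodium summary above.
# generate_code and run_test are hypothetical stubs, not part of any real API.
def generate_code(problem: str, feedback: str) -> str:
    # Stub standing in for an LLM call that sees the problem plus feedback from failed tests.
    return "def solve(x):\n    return x * 2\n"

def run_test(code: str, test) -> bool:
    # Stub test runner: execute the candidate and check one (input, expected) pair.
    ns = {}
    exec(code, ns)
    inp, expected = test
    return ns["solve"](inp) == expected

def iterative_codegen(problem: str, tests: list, max_rounds: int = 5) -> str:
    feedback, code = "", ""
    for _ in range(max_rounds):
        code = generate_code(problem, feedback)
        failures = [t for t in tests if not run_test(code, t)]
        if not failures:
            return code                                   # all tests pass
        feedback = f"{len(failures)} tests failed, e.g. {failures[0]}"
    return code                                           # best effort after max_rounds

print(iterative_codegen("double the input", tests=[(2, 4), (5, 10)]))
```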

Loss Function

Paper Name Status Topic Category Year Conference Author Summary Link
0 Class-Balanced Loss Based on Effective Number of Samples Pending Loss Function Tips & Tricks 2019 CVPR Menglin Jia, Yin Cui link
1 WGAN: Wasserstein GAN Pending GANs, Loss Function 2017 arXiv Léon Bottou, Martin Arjovsky, Soumith Chintala link
2 Perceptual Losses for Real-Time Style Transfer and Super-Resolution Pending Loss Function, NNs 2016 ECCV Alexandre Alahi, Justin Johnson, Li Fei-Fei link
3 Topological Loss: Beyond the Pixel-Wise Loss for Topology-Aware Delineation Pending Image , Loss Function, Segmentation 2018 CVPR Agata Mosinska, Mateusz Koziński, Pablo Márquez-Neila, Pascal Fua link
4 Understanding Loss Functions in Computer Vision Pending CV , GANs, Image , Loss Function Comparison, Tips & Tricks 2020 Blog Sowmya Yellapragada link

NMT

Paper Name Status Topic Category Year Conference Author Summary Link
0 Phrase-Based & Neural Unsupervised Machine Translation Pending NMT, Text , Transformers Unsupervised 2018 arXiv Alexis Conneau, Guillaume Lample, Ludovic Denoyer, Marc'Aurelio Ranzato, Myle Ott link
1 Unsupervised Machine Translation Using Monolingual Corpora Only Pending GANs, NMT, Text , Transformers Unsupervised 2017 arXiv Alexis Conneau, Guillaume Lample, Ludovic Denoyer, Marc'Aurelio Ranzato, Myle Ott link
2 Cross-lingual Language Model Pretraining Pending NMT, Text , Transformers Unsupervised 2019 arXiv Alexis Conneau, Guillaume Lample link

NN Initialization

Paper Name Status Topic Category Year Conference Author Summary Link
0 The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks Read NN Initialization, NNs Optimization-No. of params, Tips & Tricks 2019 ICLR Jonathan Frankle, Michael Carbin Lottery ticket hypothesis: dense, randomly-initialized, feed-forward networks contain subnetworks (winning tickets) that—when trained in isolation—reach test accuracy comparable to the original network in a similar number of iterations (see the sketch after this table). link
1 All you need is a good init Pending NN Initialization Tips & Tricks 2015 arXiv Dmytro Mishkin, Jiri Matas link
2 Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask Read NN Initialization, NNs Comparison, Optimization-No. of params, Tips & Tricks 2019 NeurIPS Hattie Zhou, Janice Lan, Jason Yosinski, Rosanne Liu Follow up on Lottery Ticket Hypothesis exploring the effects of different Masking criteria as well as Mask-1 and Mask-0 actions. link
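
A minimal sketch of the lottery-ticket procedure summarized above (train, prune small-magnitude weights, rewind the survivors to their initial values, retrain) is shown below; it is illustrative and omits the iterative pruning schedule used in the paper.

```python
# Lottery-ticket sketch: train, prune the smallest-magnitude weights, rewind the surviving
# weights to their initial values, then retrain the sparse subnetwork. Illustrative only.
import copy
import torch
import torch.nn as nn

def find_winning_ticket(model: nn.Module, train_fn, prune_frac: float = 0.2):
    init_state = copy.deepcopy(model.state_dict())          # remember the original initialization
    train_fn(model)                                         # 1) train the dense network
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:                                     # prune weight matrices, not biases
            k = max(1, int(p.numel() * prune_frac))
            threshold = p.abs().flatten().kthvalue(k).values
            masks[name] = (p.abs() > threshold).float()     # 2) keep the larger-magnitude weights
    model.load_state_dict(init_state)                       # 3) rewind to the initial weights
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])                         # zero out the pruned weights
    return model, masks                                     # 4) retrain, keeping the masks fixed
```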

NNs

Paper Name Status Topic Category Year Conference Author Summary Link
0 The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks Read NN Initialization, NNs Optimization-No. of params, Tips & Tricks 2019 ICLR Jonathan Frankle, Michael Carbin Lottery ticket hypothesis: dense, randomly-initialized, feed-forward networks contain subnetworks (winning tickets) that—when trained in isolation—reach test accuracy comparable to the original network in a similar number of iterations. link
1 How Does Batch Normalization Help Optimization? Pending NNs, Normalization Optimizations 2018 arXiv Aleksander Madry, Andrew Ilyas, Dimitris Tsipras, Shibani Santurkar link
2 Group Normalization Pending NNs, Normalization Optimizations 2018 arXiv Kaiming He, Yuxin Wu link
3 Perceptual Losses for Real-Time Style Transfer and Super-Resolution Pending Loss Function, NNs 2016 ECCV Alexandre Alahi, Justin Johnson, Li Fei-Fei link
4 NADAM: Incorporating Nesterov Momentum into Adam Pending NNs, Optimizers Comparison 2016 Timothy Dozat link
5 Deep Double Descent: Where Bigger Models and More Data Hurt Pending NNs 2019 arXiv Boaz Barak, Gal Kaplun, Ilya Sutskever, Preetum Nakkiran, Tristan Yang, Yamini Bansal link
6 Adam: A Method for Stochastic Optimization Pending NNs, Optimizers 2015 ICLR Diederik P. Kingma, Jimmy Ba link
7 Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask Read NN Initialization, NNs Comparison, Optimization-No. of params, Tips & Tricks 2019 NeurIPS Hattie Zhou, Janice Lan, Jason Yosinski, Rosanne Liu Follow up on Lottery Ticket Hypothesis exploring the effects of different Masking criteria as well as Mask-1 and Mask-0 actions. link

Normalization

Paper Name Status Topic Category Year Conference Author Summary Link
0 How Does Batch Normalization Help Optimization? Pending NNs, Normalization Optimizations 2018 arXiv Aleksander Madry, Andrew Ilyas, Dimitris Tsipras, Shibani Santurkar link
1 Group Normalization Pending NNs, Normalization Optimizations 2018 arXiv Kaiming He, Yuxin Wu link
2 Spectral Normalization for GANs Pending GANs, Normalization Optimizations 2018 arXiv Masanori Koyama, Takeru Miyato, Toshiki Kataoka, Yuichi Yoshida link

Optimizers

Paper Name Status Topic Category Year Conference Author Summary Link
0 NADAM: Incorporating Nesterov Momentum into Adam Pending NNs, Optimizers Comparison 2016 Timothy Dozat link
1 Adam: A Method for Stochastic Optimization Pending NNs, Optimizers 2015 ICLR Diederik P. Kingma, Jimmy Ba link

Other

Paper Name Status Topic Category Year Conference Author Summary Link
0 MuZero: Mastering Go, chess, shogi and Atari without rules Pending Other Reinforcement-Learning 2020 Nature David Silver, Demis Hassabis, Ioannis Antonoglou, Julian Schrittwiese link

Pose Estimation

Paper Name Status Topic Category Year Conference Author Summary Link
0 A 2019 guide to Human Pose Estimation with Deep Learning Pending CV , Pose Estimation Comparison 2019 Blog Sudharshan Chandra Babu link
1 A Simple yet Effective Baseline for 3D Human Pose Estimation Pending CV , Pose Estimation 2017 ICCV James J. Little, Javier Romero, Julieta Martinez, Rayat Hossain link

Question-Answering

Paper Name Status Topic Category Year Conference Author Summary Link
0 SpanBERT: Improving Pre-training by Representing and Predicting Spans Read Question-Answering, Text , Transformers Pre-Training 2020 TACL Danqi Chen, Mandar Joshi A different pre-training strategy for the BERT model that improves performance on question answering (see the sketch after this table). link
1 Learning to Extract Attribute Value from Product via Question Answering: A Multi-task Approach Read Question-Answering, Text , Transformers Zero-shot-learning 2020 KDD Li Yang, Qifan Wang A question-answering BERT model used to extract attribute values from products; further introduces a no-answer loss and distillation to promote zero-shot learning. link
2 Chain of Thought Prompting Elicits Reasoning in Large Language Models Pending Question-Answering, Text , Transformers 2022 arXiv Denny Zhou, Jason Wei, Xuezhi Wang link
3 Large Language Models are Zero-Shot Reasoners Pending Generative, Question-Answering, Text Tips & Tricks, Zero-shot-learning 2022 arXiv Takeshi Kojima, Yusuke Iwasawa link
4 VL-T5: Unifying Vision-and-Language Tasks via Text Generation Read CNNs, CV , Generative, Image , Large-Language-Models, Question-Answering, Text , Transformers Architecture, Embeddings, Multimodal, Pre-Training 2021 arXiv Hao Tan, Jaemin Cho, Jie Lei, Mohit Bansal Unifies the image and text modalities in a single transformer model that solves multiple tasks with one architecture, using text prefixes similar to T5. link
5 Scaling Instruction-Finetuned Language Models (FLAN) Pending Generative, Large-Language-Models, Question-Answering, Text , Transformers Instruction-Finetuning 2022 arXiv Hyung Won Chung, Jason Wei, Jeffrey Dean, Le Hou, Quoc V. Le, Shayne Longpre Introduces FLAN (Fine-tuned LAnguage Net), an instruction-finetuning method, and presents the results of its application (https://arxiv.org/abs/2210.11416). The study demonstrates that by fine-tuning the 540B PaLM model on 1836 tasks while incorporating Chain-of-Thought reasoning data, FLAN achieves improvements in generalization, human usability, and zero-shot reasoning over the base model. The paper also provides detailed information on how each of these aspects was evaluated. link
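
The SpanBERT row above centers on masking contiguous spans rather than individual tokens. The snippet below sketches that idea in a heavily simplified form (one span, uniform length choice); the paper's actual sampling scheme and training objectives differ.

```python
# Contiguous-span masking, the pre-training idea behind SpanBERT, in a heavily simplified form:
# mask one whole span and keep it as the prediction target. Not the paper's exact sampling scheme.
import random

def mask_one_span(tokens, mask_token="[MASK]", max_span=5):
    span_len = random.randint(1, min(max_span, len(tokens)))
    start = random.randrange(0, len(tokens) - span_len + 1)
    masked = list(tokens)
    target = masked[start:start + span_len]                 # what the model must reconstruct
    masked[start:start + span_len] = [mask_token] * span_len
    return masked, target, (start, start + span_len)

print(mask_one_span("the quick brown fox jumps over the lazy dog".split()))
```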

Segmentation

Paper Name Status Topic Category Year Conference Author Summary Link
0 Topological Loss: Beyond the Pixel-Wise Loss for Topology-Aware Delineation Pending Image , Loss Function, Segmentation 2018 CVPR Agata Mosinska, Mateusz Koziński, Pablo Márquez-Neila, Pascal Fua link

Siamese Network

Paper Name Status Topic Category Year Conference Author Summary Link
0 Language-Agnostic BERT Sentence Embedding Read Attention, Siamese Network, Text , Transformers Embeddings 2020 arXiv Fangxiaoyu Feng, Yinfei Yang A BERT model with multilingual sentence embeddings learned over 112 languages, enabling zero-shot transfer to unseen languages. link

Tabular

Paper Name Status Topic Category Year Conference Author Summary Link
0 Self-Normalizing Neural Networks Pending Activation Function, Tabular Optimizations, Tips & Tricks 2017 NIPS Andreas Mayr, Günter Klambauer, Thomas Unterthiner link

Text

Paper Name Status Topic Category Year Conference Author Summary Link
0 Attention is All you Need Read Attention, Text , Transformers Architecture 2017 NIPS Ashish Vaswani, Illia Polosukhin, Noam Shazeer, Łukasz Kaiser Introduces the Transformer architecture, which achieves SOTA performance across a range of NLP tasks. link
1 GPT-2 (Language Models are Unsupervised Multitask Learners) Pending Attention, Text , Transformers 2019 Alec Radford, Dario Amodei, Ilya Sutskever, Jeffrey Wu link
2 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Read Attention, Text , Transformers Embeddings 2018 NAACL Jacob Devlin, Kenton Lee, Kristina Toutanova, Ming-Wei Chang BERT is a Transformer-based architecture that introduces masked-word pretraining and a next-sentence-prediction task, pretraining the model for a wide variety of tasks. link
3 Single Headed Attention RNN: Stop Thinking With Your Head Pending Attention, LSTMs, Text Optimization-No. of params 2019 arXiv Stephen Merity link
4 Reformer: The Efficient Transformer Read Attention, Text , Transformers Architecture, Optimization-Memory, Optimization-No. of params 2020 arXiv Anselm Levskaya, Lukasz Kaiser, Nikita Kitaev Overcomes the time and memory complexity of Transformers by bucketing queries and keys and by using reversible residual connections. link
5 Language-Agnostic BERT Sentence Embedding Read Attention, Siamese Network, Text , Transformers Embeddings 2020 arXiv Fangxiaoyu Feng, Yinfei Yang A BERT model with multilingual sentence embeddings learned over 112 languages, enabling zero-shot transfer to unseen languages. link
6 Phrase-Based & Neural Unsupervised Machine Translation Pending NMT, Text , Transformers Unsupervised 2018 arXiv Alexis Conneau, Guillaume Lample, Ludovic Denoyer, Marc'Aurelio Ranzato, Myle Ott link
7 Unsupervised Machine Translation Using Monolingual Corpora Only Pending GANs, NMT, Text , Transformers Unsupervised 2017 arXiv Alexis Conneau, Guillaume Lample, Ludovic Denoyer, Marc'Aurelio Ranzato, Myle Ott link
8 Cross-lingual Language Model Pretraining Pending NMT, Text , Transformers Unsupervised 2019 arXiv Alexis Conneau, Guillaume Lample link
9 Word2Vec: Efficient Estimation of Word Representations in Vector Space Pending Text Embeddings, Tips & Tricks 2013 arXiv Greg Corrado, Jeffrey Dean, Kai Chen, Tomas Mikolov link
10 One-shot Text Field Labeling using Attention and Belief Propagation for Structure Information Extraction Pending Image , Text 2020 arXiv Jun Huang, Mengli Cheng, Minghui Qiu, Wei Lin, Xing Shi link
11 ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning Pending AGI, Dataset, Text 2019 AAAI Maarten Sap, Noah A. Smith, Ronan Le Bras, Yejin Choi link
12 COMET: Commonsense Transformers for Automatic Knowledge Graph Construction Pending AGI, Text , Transformers 2019 ACL Antoine Bosselut, Hannah Rashkin, Yejin Choi link
13 VisualCOMET: Reasoning about the Dynamic Context of a Still Image Pending AGI, Dataset, Image , Text , Transformers 2020 ECCV Ali Farhadi, Chandra Bhagavatula, Jae Sung Park, Yejin Choi link
14 T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Read Attention, Text , Transformers 2020 JMLR Colin Raffel, Noam Shazeer, Peter J. Liu, Wei Liu, Yanqi Zhou Presents a Text-to-Text transformer model with multi-task learning capabilities, simultaneously solving problems such as machine translation, document summarization, question answering, and classification tasks. link
15 DALL·E: Creating Images from Text Pending Image , Text , Transformers 2021 Blog Aditya Ramesh, Gabriel Goh, Ilya Sutskever, Mikhail Pavlov, Scott Gray link
16 CLIP: Connecting Text and Images Pending Image , Text , Transformers Multimodal, Pre-Training 2021 arXiv Alec Radford, Ilya Sutskever, Jong Wook Kim link
17 Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision This week Image , Text , Transformers Multimodal 2020 EMNLP Hao Tan, Mohit Bansal link
18 SpanBERT: Improving Pre-training by Representing and Predicting Spans Read Question-Answering, Text , Transformers Pre-Training 2020 TACL Danqi Chen, Mandar Joshi A different pre-training strategy for the BERT model that improves performance on question answering. link
19 Learning to Extract Attribute Value from Product via Question Answering: A Multi-task Approach Read Question-Answering, Text , Transformers Zero-shot-learning 2020 KDD Li Yang, Qifan Wang A question-answering BERT model used to extract attribute values from products; further introduces a no-answer loss and distillation to promote zero-shot learning. link
20 Interpreting Deep Learning Models in Natural Language Processing: A Review Pending Text Comparison, Visualization 2021 arXiv Diyi Yang, Xiaofei Sun link
21 Symbolic Knowledge Distillation: from General Language Models to Commonsense Models Pending Dataset, Text , Transformers Optimizations, Tips & Tricks 2021 arXiv Chandra Bhagavatula, Jack Hessel, Peter West, Yejin Choi link
22 Chain of Thought Prompting Elicits Reasoning in Large Language Models Pending Question-Answering, Text , Transformers 2022 arXiv Denny Zhou, Jason Wei, Xuezhi Wang link
23 Transforming Sequence Tagging Into A Seq2Seq Task Pending Generative, Text Comparison, Tips & Tricks 2022 arXiv Iftekhar Naim, Karthik Raman, Krishna Srinivasan link
24 Large Language Models are Zero-Shot Reasoners Pending Generative, Question-Answering, Text Tips & Tricks, Zero-shot-learning 2022 arXiv Takeshi Kojima, Yusuke Iwasawa link
25 Flan-T5: Scaling Instruction-Finetuned Language Models Pending Generative, Text , Transformers Architecture, Pre-Training 2022 arXiv Hyung Won Chung, Le Hou link
26 Decoding a Neural Retriever’s Latent Space for Query Suggestion Pending Text Embeddings, Latent space 2022 arXiv Christian Buck, Leonard Adolphs, Michelle Chen Huebscher link
27 VL-T5: Unifying Vision-and-Language Tasks via Text Generation Read CNNs, CV , Generative, Image , Large-Language-Models, Question-Answering, Text , Transformers Architecture, Embeddings, Multimodal, Pre-Training 2021 arXiv Hao Tan, Jaemin Cho, Jie Lei, Mohit Bansal Unifies the image and text modalities in a single transformer model that solves multiple tasks with one architecture, using text prefixes similar to T5. link
28 Scaling Instruction-Finetuned Language Models (FLAN) Pending Generative, Large-Language-Models, Question-Answering, Text , Transformers Instruction-Finetuning 2022 arXiv Hyung Won Chung, Jason Wei, Jeffrey Dean, Le Hou, Quoc V. Le, Shayne Longpre Introduces FLAN (Fine-tuned LAnguage Net), an instruction-finetuning method, and presents the results of its application (https://arxiv.org/abs/2210.11416). The study demonstrates that by fine-tuning the 540B PaLM model on 1836 tasks while incorporating Chain-of-Thought reasoning data, FLAN achieves improvements in generalization, human usability, and zero-shot reasoning over the base model. The paper also provides detailed information on how each of these aspects was evaluated. link
29 ReAct: Synergizing Reasoning and Acting in Language Models Pending Generative, Large-Language-Models, Text Optimizations, Tips & Tricks 2023 ICLR Dian Yu, Izhak Shafran, Jeffrey Zhao, Karthik Narasimhan, Nan Du, Shunyu Yao, Yuan Cao This paper introduces ReAct, a novel approach that leverages Large Language Models (LLMs) to interleave reasoning traces and task-specific actions. ReAct outperforms existing methods on various language and decision-making tasks, addressing issues such as hallucination and error propagation while improving human interpretability and trustworthiness. link

Training Method

Paper Name Status Topic Category Year Conference Author Summary Link
0 Training language models to follow instructions with human feedback Pending Generative, Large-Language-Models, Training Method Instruction-Finetuning, Reinforcement-Learning, Semi-Supervised 2022 arXiv Carroll L. Wainwright, Diogo Almeida, Jan Leike, Jeff Wu, Long Ouyang, Pamela Mishkin, Paul Christiano, Ryan Lowe, Xu Jiang This paper presents InstructGPT, a model fine-tuned with human feedback to better align with user intent across various tasks. Despite having significantly fewer parameters than larger models, InstructGPT outperforms them in human evaluations, demonstrating improved truthfulness, reduced toxicity, and minimal performance regressions on public NLP datasets, highlighting the potential of fine-tuning with human feedback for enhancing language model alignment with human intent. link
1 Constitutional AI: Harmlessness from AI Feedback Pending Generative, Large-Language-Models, Training Method Instruction-Finetuning, Reinforcement-Learning, Unsupervised 2022 arXiv Jared Kaplan, Yuntao Bai The paper introduces Constitutional AI, a method for training a safe AI assistant without human-labeled data on harmful outputs. It combines supervised learning and reinforcement learning phases, enabling the AI to engage with harmful queries by explaining its objections, thus improving control, transparency, and human-judged performance with minimal human oversight. link
2 Self-Alignment with Instruction Backtranslation Pending Generative, Large-Language-Models, Training Method Instruction-Finetuning 2023 arXiv Jason Weston, Mike Lewis, Ping Yu, Xian Li The paper introduces a scalable method called "instruction backtranslation" to create a high-quality instruction-following language model. This method involves self-augmentation and self-curation of training examples generated from web documents, resulting in a model that outperforms others in its category without relying on distillation data, showcasing its effective self-alignment capability. link
3 Table-GPT: Table-tuned GPT for Diverse Table Tasks Pending Generative, Large-Language-Models, Training Method Instruction-Finetuning 2023 arXiv link

Transformers

Paper Name Status Topic Category Year Conference Author Summary Link
0 Attention is All you Need Read Attention, Text , Transformers Architecture 2017 NIPS Ashish Vaswani, Illia Polosukhin, Noam Shazeer, Łukasz Kaiser Introduces the Transformer architecture, which achieves SOTA performance across a range of NLP tasks. link
1 GPT-2 (Language Models are Unsupervised Multitask Learners) Pending Attention, Text , Transformers 2019 Alec Radford, Dario Amodei, Ilya Sutskever, Jeffrey Wu link
2 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Read Attention, Text , Transformers Embeddings 2018 NAACL Jacob Devlin, Kenton Lee, Kristina Toutanova, Ming-Wei Chang BERT is a Transformer-based architecture that introduces masked-word pretraining and a next-sentence-prediction task, pretraining the model for a wide variety of tasks. link
3 Reformer: The Efficient Transformer Read Attention, Text , Transformers Architecture, Optimization-Memory, Optimization-No. of params 2020 arXiv Anselm Levskaya, Lukasz Kaiser, Nikita Kitaev Overcomes the time and memory complexity of Transformers by bucketing queries and keys and by using reversible residual connections. link
4 Language-Agnostic BERT Sentence Embedding Read Attention, Siamese Network, Text , Transformers Embeddings 2020 arXiv Fangxiaoyu Feng, Yinfei Yang A BERT model with multilingual sentence embeddings learned over 112 languages, enabling zero-shot transfer to unseen languages. link
5 Phrase-Based & Neural Unsupervised Machine Translation Pending NMT, Text , Transformers Unsupervised 2018 arXiv Alexis Conneau, Guillaume Lample, Ludovic Denoyer, Marc'Aurelio Ranzato, Myle Ott link
6 Unsupervised Machine Translation Using Monolingual Corpora Only Pending GANs, NMT, Text , Transformers Unsupervised 2017 arXiv Alexis Conneau, Guillaume Lample, Ludovic Denoyer, Marc'Aurelio Ranzato, Myle Ott link
7 Cross-lingual Language Model Pretraining Pending NMT, Text , Transformers Unsupervised 2019 arXiv Alexis Conneau, Guillaume Lample link
8 COMET: Commonsense Transformers for Automatic Knowledge Graph Construction Pending AGI, Text , Transformers 2019 ACL Antoine Bosselut, Hannah Rashkin, Yejin Choi link
9 VisualCOMET: Reasoning about the Dynamic Context of a Still Image Pending AGI, Dataset, Image , Text , Transformers 2020 ECCV Ali Farhadi, Chandra Bhagavatula, Jae Sung Park, Yejin Choi link
10 T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Read Attention, Text , Transformers 2020 JMLR Colin Raffel, Noam Shazeer, Peter J. Liu, Wei Liu, Yanqi Zhou Presents a Text-to-Text transformer model with multi-task learning capabilities, simultaneously solving problems such as machine translation, document summarization, question answering, and classification tasks. link
11 GPT-f: Generative Language Modeling for Automated Theorem Proving Pending Attention, Transformers 2020 arXiv Ilya Sutskever, Stanislas Polu link
12 Vision Transformer: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Pending Attention, Image , Transformers 2021 ICLR Alexey Dosovitskiy, Jakob Uszkoreit, Lucas Beyer, Neil Houlsby link
13 DALL·E: Creating Images from Text Pending Image , Text , Transformers 2021 Blog Aditya Ramesh, Gabriel Goh, Ilya Sutskever, Mikhail Pavlov, Scott Gray link
14 CLIP: Connecting Text and Images Pending Image , Text , Transformers Multimodal, Pre-Training 2021 arXiv Alec Radford, Ilya Sutskever, Jong Wook Kim link
15 Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision This week Image , Text , Transformers Multimodal 2020 EMNLP Hao Tan, Mohit Bansal link
16 SpanBERT: Improving Pre-training by Representing and Predicting Spans Read Question-Answering, Text , Transformers Pre-Training 2020 TACL Danqi Chen, Mandar Joshi A different pre-training strategy for the BERT model that improves performance on question answering. link
17 Learning to Extract Attribute Value from Product via Question Answering: A Multi-task Approach Read Question-Answering, Text , Transformers Zero-shot-learning 2020 KDD Li Yang, Qifan Wang A question-answering BERT model used to extract attribute values from products; further introduces a no-answer loss and distillation to promote zero-shot learning. link
18 TransGAN: Two Transformers Can Make One Strong GAN Pending GANs, Image , Transformers Architecture 2021 arXiv Shiyu Chang, Yifan Jiang, Zhangyang Wang link
19 Symbolic Knowledge Distillation: from General Language Models to Commonsense Models Pending Dataset, Text , Transformers Optimizations, Tips & Tricks 2021 arXiv Chandra Bhagavatula, Jack Hessel, Peter West, Yejin Choi link
20 Chain of Thought Prompting Elicits Reasoning in Large Language Models Pending Question-Answering, Text , Transformers 2022 arXiv Denny Zhou, Jason Wei, Xuezhi Wang link
21 Flan-T5: Scaling Instruction-Finetuned Language Models Pending Generative, Text , Transformers Architecture, Pre-Training 2022 arXiv Hyung Won Chung, Le Hou link
22 Training Compute-Optimal Large Language Models Pending Large-Language-Models, Transformers Architecture, Optimization-No. of params, Pre-Training, Tips & Tricks 2022 arXiv Jordan Hoffmann, Laurent Sifre, Oriol Vinyals, Sebastian Borgeaud link
23 VL-T5: Unifying Vision-and-Language Tasks via Text Generation Read CNNs, CV , Generative, Image , Large-Language-Models, Question-Answering, Text , Transformers Architecture, Embeddings, Multimodal, Pre-Training 2021 arXiv Hao Tan, Jaemin Cho, Jie Lei, Mohit Bansal Unifies the image and text modalities in a single transformer model that solves multiple tasks with one architecture, using text prefixes similar to T5. link
24 Scaling Instruction-Finetuned Language Models (FLAN) Pending Generative, Large-Language-Models, Question-Answering, Text , Transformers Instruction-Finetuning 2022 arXiv Hyung Won Chung, Jason Wei, Jeffrey Dean, Le Hou, Quoc V. Le, Shayne Longpre Introduces FLAN (Fine-tuned LAnguage Net), an instruction-finetuning method, and presents the results of its application (https://arxiv.org/abs/2210.11416). The study demonstrates that by fine-tuning the 540B PaLM model on 1836 tasks while incorporating Chain-of-Thought reasoning data, FLAN achieves improvements in generalization, human usability, and zero-shot reasoning over the base model. The paper also provides detailed information on how each of these aspects was evaluated. link