Ilya30u30 Paper Research

Purpose

This repo, inspired by the influential reading list that Ilya Sutskever recommended to John Carmack in 2020, curates and reproduces the essential AI and deep learning papers that shaped the field. Dubbed the "ilya30u30," the collection comprises 27 papers and resources that provide both a comprehensive foundation and advanced insights into neural networks, generative models, optimization, and more. Each paper is researched and reproduced, with detailed notes and code implementations aimed at deepening the understanding of key concepts for researchers, students, and practitioners alike. Through reproduction, I aim to understand the implementation intricacies, validate the findings, and explore potential improvements. By sharing this work, I hope to provide useful insights and foster further exploration in the deep learning community.

How to Read

The collection is organized to guide readers from foundational theories to advanced topics. Each entry includes a brief summary and a link to the original paper or resource. The suggested reading order at the end can help you navigate through the materials effectively.

First, use the Reading Order section to understand the knowledge dependencies between the 27 papers. Then, explore the Reproduction Research section to delve deeper into the detailed studies, including reproductions and summaries of each paper.

Table of Contents

  • Reading Order
  • Reproduction Research
  • Full Paper List
  • References

Reading Order

Foundational Theory and Introduction to Neural Networks

  1. CS231n Convolutional Neural Networks for Visual Recognition
    • A comprehensive course from Stanford University that covers the basics of CNNs, architectures, and training techniques—a great starting point for learning visual recognition.
  2. The Unreasonable Effectiveness of Recurrent Neural Networks
    • Emphasizes the powerful capabilities of RNNs in handling sequential data, further solidifying your understanding of RNNs.
  3. Understanding LSTM Networks
    • Introduces LSTMs and their advantages in handling long-term dependencies, forming a foundation for understanding RNNs.
  4. Recurrent Neural Network Regularization
    • Learn how to optimize RNNs and LSTMs by reducing overfitting through regularization techniques.
  5. ImageNet Classification with Deep Convolutional Neural Networks
    • Understand the basics of Convolutional Neural Networks (CNNs) and their applications in image recognition.
  6. Deep Residual Learning for Image Recognition
    • Learn how ResNet addresses training challenges in deep networks and enhances accuracy.
  7. Multi-Scale Context Aggregation by Dilated Convolutions
    • Delve into using dilated convolutions in semantic segmentation tasks to achieve multi-scale context aggregation.

Machine Translation and Natural Language Processing

  1. Neural Machine Translation by Jointly Learning to Align and Translate
    • Study the foundational models and alignment mechanisms in neural machine translation.
  2. Attention Is All You Need and The Annotated Transformer
    • Deeply understand the Transformer model's implementation details and code through practical explanations.
  3. Pointer Networks
    • Learn about a new network structure that addresses variable-length output sequences, suitable for tasks like sorting and combinatorial optimization.
  4. Scaling Laws for Neural Language Models
    • Explore factors affecting the performance of large language models and their scaling laws.

Deep Learning and Optimization

  1. GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism
    • Learn how to scale neural network capacity through micro-batch pipeline parallelism and understand parallel training techniques for large models.
  2. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
    • Understand how to build and optimize end-to-end speech recognition systems, especially for handling different languages.

Graph Structures and Relational Reasoning

  1. Neural Message Passing for Quantum Chemistry
    • Study the application of graph neural networks in quantum chemistry, a key to understanding supervised learning on graph data structures.
  2. A Simple Neural Network Module for Relational Reasoning
    • Learn how to enhance existing neural networks with relational reasoning modules for unstructured input.
  3. Relational Recurrent Neural Networks
    • Explore new memory modules that perform relational reasoning in sequential data, combining memory and reasoning.

Generative Models and Complexity

  1. Variational Lossy Autoencoder
    • Explore the combination of autoregressive models and variational autoencoders to achieve complex generative tasks.
  2. The First Law of Complexodynamics
    • Understand the evolution of complexity in physical systems and its relation to Kolmogorov complexity.
  3. Kolmogorov Complexity and Algorithmic Randomness
    • Learn Kolmogorov complexity theory and its applications in algorithmic randomness, laying the foundation for unsupervised learning.
  4. Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton
    • Delve into the quantitative measurement of complexity in closed systems and understand the trend of complexity over time.

Advanced Topics

  1. Neural Turing Machines
    • Combine neural networks with the concept of Turing machines to expand your understanding of neural network architectures.
  2. Identity Mappings in Deep Residual Networks
    • Further study the internal mechanisms of ResNet and the importance of identity mappings in information propagation.
  3. Order Matters: Sequence to Sequence for Sets
    • Explore how to handle cases where inputs and outputs are not necessarily ordered sequences.
  4. Keeping Neural Networks Simple by Minimizing the Description Length of the Weights
    • Learn how the Minimum Description Length principle applies to simplifying neural networks and balancing weight information content.
  5. A Tutorial Introduction to the Minimum Description Length Principle
    • Deepen your understanding of the MDL principle and its applications in model selection and data compression.
  6. Machine Super Intelligence
    • Explore the concepts and research progress in machine superintelligence, understanding the definition of intelligence and its quantifiable standards.

Reproduction Research

Note: The code reproductions and detailed notes for each paper are works in progress and will be added to the main branch as they are finalized. Until then, each entry below lists the paper's basic information, and you can follow progress through ongoing updates.

| Field | Name | Summary | Note | Reproduction |
| --- | --- | --- | --- | --- |
| Foundational Theory and Neural Networks | CS231n Convolutional Neural Networks for Visual Recognition | A comprehensive course from Stanford University that covers the basics of CNNs, architectures, and training techniques—a great starting point for learning visual recognition. | note | code |
| Foundational Theory and Neural Networks | The Unreasonable Effectiveness of Recurrent Neural Networks | Emphasizes the powerful capabilities of RNNs in handling sequential data, further solidifying your understanding of RNNs. | note | code |
| Foundational Theory and Neural Networks | Understanding LSTM Networks | Introduces LSTMs and their advantages in handling long-term dependencies, forming a foundation for understanding RNNs. | note | code |
| Foundational Theory and Neural Networks | Recurrent Neural Network Regularization | Learn how to optimize RNNs and LSTMs by reducing overfitting through regularization techniques. | note | code |
| Foundational Theory and Neural Networks | ImageNet Classification with Deep Convolutional Neural Networks | Understand the basics of Convolutional Neural Networks (CNNs) and their applications in image recognition. | note | code |
| Foundational Theory and Neural Networks | Deep Residual Learning for Image Recognition | Learn how ResNet addresses training challenges in deep networks and enhances accuracy. | note | code |
| Foundational Theory and Neural Networks | Multi-Scale Context Aggregation by Dilated Convolutions | Delve into using dilated convolutions in semantic segmentation tasks to achieve multi-scale context aggregation. | note | code |
| Machine Translation and NLP | Neural Machine Translation by Jointly Learning to Align and Translate | Study the foundational models and alignment mechanisms in neural machine translation. | note | code |
| Machine Translation and NLP | Attention Is All You Need | No explanation needed—the seminal Transformer paper that is a must-read. | note | code |
| Machine Translation and NLP | The Annotated Transformer | Deeply understand the Transformer model's implementation details and code through practical explanations. | note | code |
| Machine Translation and NLP | Pointer Networks | Learn about a new network structure that addresses variable-length output sequences, suitable for tasks like sorting and combinatorial optimization. | note | code |
| Machine Translation and NLP | Scaling Laws for Neural Language Models | Explore factors affecting the performance of large language models and their scaling laws. | note | code |
| Deep Learning and Optimization | GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism | Learn how to scale neural network capacity through micro-batch pipeline parallelism and understand parallel training techniques for large models. | note | code |
| Deep Learning and Optimization | Deep Speech 2: End-to-End Speech Recognition in English and Mandarin | Understand how to build and optimize end-to-end speech recognition systems, especially for handling different languages. | note | code |
| Graph Structures and Relational Reasoning | Neural Message Passing for Quantum Chemistry | Study the application of graph neural networks in quantum chemistry, a key to understanding supervised learning on graph data structures. | note | code |
| Graph Structures and Relational Reasoning | A Simple Neural Network Module for Relational Reasoning | Learn how to enhance existing neural networks with relational reasoning modules for unstructured input. | note | code |
| Graph Structures and Relational Reasoning | Relational Recurrent Neural Networks | Explore new memory modules that perform relational reasoning in sequential data, combining memory and reasoning. | note | code |
| Generative Models and Complexity | Variational Lossy Autoencoder | Explore the combination of autoregressive models and variational autoencoders to achieve complex generative tasks. | note | code |
| Generative Models and Complexity | The First Law of Complexodynamics | Understand the evolution of complexity in physical systems and its relation to Kolmogorov complexity. | note | code |
| Generative Models and Complexity | Kolmogorov Complexity and Algorithmic Randomness | Learn Kolmogorov complexity theory and its applications in algorithmic randomness, laying the foundation for unsupervised learning. | note | code |
| Generative Models and Complexity | Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton | Delve into the quantitative measurement of complexity in closed systems and understand the trend of complexity over time. | note | code |
| Advanced Topics | Neural Turing Machines | Combine neural networks with the concept of Turing machines to expand your understanding of neural network architectures. | note | code |
| Advanced Topics | Identity Mappings in Deep Residual Networks | Further study the internal mechanisms of ResNet and the importance of identity mappings in information propagation. | note | code |
| Advanced Topics | Order Matters: Sequence to Sequence for Sets | Explore how to handle cases where inputs and outputs are not necessarily ordered sequences. | note | code |
| Advanced Topics | Keeping Neural Networks Simple by Minimizing the Description Length of the Weights | Learn how the Minimum Description Length principle applies to simplifying neural networks and balancing weight information content. | note | code |
| Advanced Topics | A Tutorial Introduction to the Minimum Description Length Principle | Deepen your understanding of the MDL principle and its applications in model selection and data compression. | note | code |
| Advanced Topics | Machine Super Intelligence | Explore the concepts and research progress in machine superintelligence, understanding the definition of intelligence and its quantifiable standards. | note | code |

Full Paper List

1) Attention Is All You Need

No explanation needed—the seminal Transformer paper that is a must-read.

Link: https://arxiv.org/pdf/1706.03762


2) The Annotated Transformer

This 2018 blog post by Alexander Rush (then at Harvard, now an associate professor at Cornell University) and colleagues provides a line-by-line explanation of the Transformer together with a complete Python implementation, helping readers understand the theory while deepening their knowledge through practice.

Article: https://nlp.seas.harvard.edu/2018/04/03/attention.html

Code: https://github.com/harvardnlp/annotated-transformer/
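
As a small companion to the annotated walk-through, here is a minimal sketch of the scaled dot-product attention at the heart of the Transformer, written in plain PyTorch; the tensor shapes in the usage example are illustrative assumptions, not values from the paper.

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)      # (batch, heads, len_q, len_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights

# Toy usage: batch of 2, 4 heads, sequence length 5, head dimension 8.
q = k = v = torch.randn(2, 4, 5, 8)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # torch.Size([2, 4, 5, 8]) torch.Size([2, 4, 5, 5])
```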


3) The First Law of Complexodynamics

This is an article titled "The First Law of Complexodynamics" by Scott Aaronson, discussing why the "complexity" or "interestingness" of physical systems seems to increase over time, reach a maximum, and then decrease, while entropy increases monotonically. Aaronson attempts to explain this phenomenon using Kolmogorov complexity and related concepts, pointing out several challenges and possible solutions in this field.

Article: https://scottaaronson.blog/?p=762


4) The Unreasonable Effectiveness of Recurrent Neural Networks

This article, written by Andrej Karpathy in 2015, demonstrates the surprising effectiveness of Recurrent Neural Networks (RNNs) at handling sequential data, using character-level language models as its running example.

Link: https://karpathy.github.io/2015/05/21/rnn-effectiveness/


5) Understanding LSTM Networks

Written in 2015 by Christopher Olah, this article introduces Long Short-Term Memory (LSTM) networks, a special kind of RNN capable of handling long-term dependencies. LSTMs have achieved great success in fields like speech recognition, language modeling, translation, and image captioning.

Link: https://colah.github.io/posts/2015-08-Understanding-LSTMs/


6) Recurrent Neural Network Regularization

Authored by Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals in 2014, this paper proposes a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. It demonstrates how to apply dropout correctly to LSTM networks (to the non-recurrent connections only), significantly reducing overfitting across various tasks, including language modeling, speech recognition, image caption generation, and machine translation.

Link: https://arxiv.org/pdf/1409.2329.pdf
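
In PyTorch, the dropout argument of nn.LSTM applies dropout only to the outputs passed between stacked layers, which mirrors the non-recurrent placement advocated in this paper (the paper additionally drops out the input and output connections). A minimal sketch, with sizes chosen purely for illustration:

```python
import torch
import torch.nn as nn

# Dropout is applied to the activations passed between stacked LSTM layers
# (non-recurrent connections); the recurrent state h_t -> h_{t+1} is untouched.
lstm = nn.LSTM(input_size=128, hidden_size=256, num_layers=2,
               dropout=0.5, batch_first=True)

x = torch.randn(8, 20, 128)           # (batch, time, features), illustrative sizes
output, (h_n, c_n) = lstm(x)
print(output.shape)                    # torch.Size([8, 20, 256])
```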


7) Keeping Neural Networks Simple by Minimizing the Description Length of the Weights

This paper by Geoffrey Hinton and Drew van Camp argues that the generalization ability of supervised neural networks improves when the weights contain less information than the output vectors of the training cases. The weights are kept simple by penalizing their information content during learning, which can be achieved by adding adjustable Gaussian noise to them.

Link: https://www.cs.toronto.edu/~hinton/absps/colt93.pdf
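
A minimal sketch of the core idea only: zero-mean Gaussian noise is added to the weights during training so that they cannot carry too much information. The NoisyLinear module, the fixed noise level, and the layer sizes are illustrative assumptions, not the paper's full variational scheme with learned noise variances.

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """Linear layer whose weights are perturbed by Gaussian noise at training time."""
    def __init__(self, in_features, out_features, noise_std=0.05):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.noise_std = noise_std

    def forward(self, x):
        if self.training:
            noisy_w = self.linear.weight + torch.randn_like(self.linear.weight) * self.noise_std
            return nn.functional.linear(x, noisy_w, self.linear.bias)
        return self.linear(x)

layer = NoisyLinear(16, 4)
print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 4])
```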


8) Pointer Networks

This paper introduces a new neural network architecture designed to learn the conditional probability of output sequences composed of discrete tokens that represent positions in an input sequence. Existing models like sequence-to-sequence and Neural Turing Machines struggle with problems where the target output dictionary size depends on the input length, such as sorting variable-length sequences and various combinatorial optimization problems.

Link: https://arxiv.org/pdf/1506.03134
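
The key mechanism is an additive attention scorer whose softmax over input positions is used directly as the output distribution, so the network "points" back into its input. Below is a minimal sketch of that pointer step; the dimensions and variable names are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointerAttention(nn.Module):
    """Scores each encoder position against the decoder state; the softmax over
    positions is itself the output distribution (it 'points' into the input)."""
    def __init__(self, hidden):
        super().__init__()
        self.W_enc = nn.Linear(hidden, hidden, bias=False)
        self.W_dec = nn.Linear(hidden, hidden, bias=False)
        self.v = nn.Linear(hidden, 1, bias=False)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, seq_len, hidden), dec_state: (batch, hidden)
        scores = self.v(torch.tanh(self.W_enc(enc_states) + self.W_dec(dec_state).unsqueeze(1)))
        return F.softmax(scores.squeeze(-1), dim=-1)   # (batch, seq_len)

ptr = PointerAttention(hidden=32)
probs = ptr(torch.randn(4, 10, 32), torch.randn(4, 32))
print(probs.shape, probs.sum(dim=-1))  # (4, 10); each row sums to 1
```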


9) ImageNet Classification with Deep Convolutional Neural Networks

Authored by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, this groundbreaking paper introduced AlexNet, revolutionizing image recognition and kickstarting the deep learning revolution. The authors trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images of the ImageNet LSVRC-2010 contest into 1000 different classes.

Link: https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf


10) Order Matters: Sequence to Sequence for Sets

The paper explores how the order in which data is organized affects the learning of underlying patterns. The authors investigate an extension of the sequence-to-sequence (seq2seq) framework to go beyond sequences and handle input sets in a principled way. They also propose a loss function to address the lack of structure in output sets by exploring different data sequences during training.

Link: https://arxiv.org/pdf/1511.06391


11) GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism

This paper introduces GPipe, a model parallelism library that allows scaling the capacity of large neural networks through micro-batch pipeline parallelism. The authors demonstrate its application in image classification and multilingual neural machine translation tasks.

Link: https://arxiv.org/pdf/1811.06965


12) Deep Residual Learning for Image Recognition

Authored by Kaiming He et al., this 2016 CVPR Best Paper describes the deep residual learning framework, which significantly reduces the difficulty of training very deep neural networks and improves accuracy.

Link: https://arxiv.org/pdf/1512.03385
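
The core idea is to learn a residual F(x) and add it back to the input, so each block computes x + F(x) through an identity shortcut. A minimal sketch of a basic block follows; the channel count is illustrative, and the original network also uses strided, downsampling shortcuts where shapes change.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """y = ReLU(x + F(x)), where F is two 3x3 convolutions with batch norm."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)          # identity shortcut

block = BasicBlock(64)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```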


13) Multi-Scale Context Aggregation by Dilated Convolutions

The authors develop a novel convolutional network module tailored for dense prediction tasks like semantic segmentation. This module uses dilated convolutions to effectively aggregate multi-scale contextual information without reducing image resolution.

Link: https://arxiv.org/pdf/1511.07122
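
A dilated (atrous) convolution inserts gaps between the kernel taps, enlarging the receptive field without pooling or losing resolution. A minimal sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 64, 64)

# Standard 3x3 convolution: receptive field 3x3.
conv = nn.Conv2d(16, 16, kernel_size=3, padding=1)
# Dilated 3x3 convolution with dilation=2: receptive field 5x5, same output size.
dilated = nn.Conv2d(16, 16, kernel_size=3, padding=2, dilation=2)

print(conv(x).shape, dilated(x).shape)  # both torch.Size([1, 16, 64, 64])
```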


14) Neural Message Passing for Quantum Chemistry

The paper summarizes and organizes existing neural network models for graph-structured data that the authors believe are most promising. It proposes a general framework for supervised learning on graphs called Message Passing Neural Networks (MPNNs).

Link: https://arxiv.org/pdf/1704.01212
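
One MPNN step aggregates messages from each node's neighbors and then updates the node state. The sketch below uses a dense adjacency matrix, sum aggregation, a single linear message function, and a GRU-style update; these are simplifying assumptions rather than the paper's full framework.

```python
import torch
import torch.nn as nn

class MessagePassingStep(nn.Module):
    """h_v <- U(h_v, sum over neighbors w of M(h_w))."""
    def __init__(self, dim):
        super().__init__()
        self.message = nn.Linear(dim, dim)
        self.update = nn.GRUCell(dim, dim)

    def forward(self, h, adj):
        # h: (num_nodes, dim), adj: (num_nodes, num_nodes) binary adjacency matrix
        messages = adj @ self.message(h)   # sum of transformed neighbor states
        return self.update(messages, h)     # GRU-style node update

h = torch.randn(5, 32)
adj = (torch.rand(5, 5) > 0.5).float()
step = MessagePassingStep(32)
print(step(h, adj).shape)  # torch.Size([5, 32])
```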


15) Neural Machine Translation by Jointly Learning to Align and Translate

Published in 2014 by Bahdanau et al., this paper is one of the pioneering works in neural machine translation. It introduces a novel model architecture and training method that allows neural networks to automatically search for parts of a source sentence that are relevant to predicting a target word.

Link: https://arxiv.org/pdf/1409.0473
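
The alignment model scores each source annotation against the current decoder state and turns the scores into a soft attention distribution; it is the same additive scorer that the Pointer Network sketch above reuses, except that here the weights form a context vector instead of being the output themselves. A minimal sketch with illustrative shapes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """e_j = v^T tanh(W s + U h_j), alpha_j = softmax_j(e_j)."""
    def __init__(self, dec_dim, enc_dim, attn_dim):
        super().__init__()
        self.W = nn.Linear(dec_dim, attn_dim, bias=False)
        self.U = nn.Linear(enc_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dec_dim), enc_states: (batch, src_len, enc_dim)
        scores = self.v(torch.tanh(self.W(dec_state).unsqueeze(1) + self.U(enc_states)))
        alpha = F.softmax(scores.squeeze(-1), dim=-1)              # (batch, src_len)
        context = (alpha.unsqueeze(-1) * enc_states).sum(dim=1)     # weighted sum of annotations
        return context, alpha

attn = AdditiveAttention(dec_dim=64, enc_dim=128, attn_dim=32)
context, alpha = attn(torch.randn(2, 64), torch.randn(2, 7, 128))
print(context.shape, alpha.shape)  # torch.Size([2, 128]) torch.Size([2, 7])
```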


16) Identity Mappings in Deep Residual Networks

This paper, also by Kaiming He et al., further analyzes the propagation mechanisms behind residual blocks. The authors propose a new, pre-activation residual unit that simplifies training and improves model generalization.

Link: https://arxiv.org/pdf/1603.05027
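
The proposed unit moves batch norm and ReLU before each convolution ("pre-activation"), leaving the shortcut as a pure identity path with no activation after the addition. A minimal sketch; the channel count is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreActBlock(nn.Module):
    """Pre-activation residual unit: x + conv(relu(bn(conv(relu(bn(x))))))."""
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        return x + out                   # no activation after the addition

print(PreActBlock(32)(torch.randn(1, 32, 16, 16)).shape)  # torch.Size([1, 32, 16, 16])
```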


17) A Simple Neural Network Module for Relational Reasoning

To explore relational reasoning further and test whether this capability can be easily added to existing systems, DeepMind researchers developed a simple, plug-and-play module called the Relation Network (RN). This module can be inserted into existing neural network architectures to equip them with the ability to reason about relationships between entities.

Link: https://arxiv.org/pdf/1706.01427
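
The RN applies a shared MLP g to every pair of object embeddings, sums the results, and feeds the sum to a second MLP f. The sketch below is a minimal version with illustrative sizes; the question embedding that the paper concatenates to each pair is omitted here.

```python
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """RN(O) = f( sum over ordered pairs (i, j) of g([o_i, o_j]) )."""
    def __init__(self, obj_dim, hidden, out_dim):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * obj_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, hidden), nn.ReLU())
        self.f = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                               nn.Linear(hidden, out_dim))

    def forward(self, objects):
        # objects: (batch, n, obj_dim); build all ordered pairs (o_i, o_j)
        b, n, d = objects.shape
        oi = objects.unsqueeze(2).expand(b, n, n, d)
        oj = objects.unsqueeze(1).expand(b, n, n, d)
        pairs = torch.cat([oi, oj], dim=-1).reshape(b, n * n, 2 * d)
        return self.f(self.g(pairs).sum(dim=1))

rn = RelationNetwork(obj_dim=24, hidden=64, out_dim=10)
print(rn(torch.randn(2, 8, 24)).shape)  # torch.Size([2, 10])
```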


18) Variational Lossy Autoencoder

This paper successfully combines autoregressive models with Variational Autoencoders (VAEs) to achieve generative tasks. It addresses the issue where VAEs tend to ignore some latent representations during training and introduces the Variational Lossy Autoencoder (VLAE).

Link: https://arxiv.org/pdf/1611.02731


19) Relational Recurrent Neural Networks

This paper from DeepMind and University College London introduces the Relational Memory Core (RMC), capable of performing relational reasoning in sequential information. It achieves state-of-the-art performance on the WikiText-103, Project Gutenberg, and GigaWord datasets.

Link: https://arxiv.org/pdf/1806.01822


20) Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton

This paper attempts to measure the pattern where the "complexity" or "interestingness" of closed systems increases over time, reaches a maximum, and then decreases, unlike entropy, which increases monotonically. The authors use a simple two-dimensional cellular automaton model to simulate the mixing of two liquids ("coffee" and "cream") and propose "structural complexity" as an approximate measure of Kolmogorov complexity.

Link: https://arxiv.org/pdf/1405.6903

Further Reading: Beauty and Structural Complexity (in Chinese)
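
One rough way to reproduce the qualitative rise-and-fall curve is to coarse-grain the grid and use compressed size as a stand-in for the "structural complexity" the authors define. The sketch below is a loose approximation with illustrative block size and quantization levels, not the paper's exact estimator.

```python
import zlib
import numpy as np

def apparent_complexity(grid, block=4):
    """Coarse-grain a binary grid into block averages, then measure the
    compressed size of the result as a crude complexity proxy."""
    h, w = grid.shape
    coarse = grid.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    # Quantize to a few levels so uniformly mixed regions compress well.
    levels = np.digitize(coarse, bins=[0.25, 0.5, 0.75]).astype(np.uint8)
    return len(zlib.compress(levels.tobytes()))

rng = np.random.default_rng(0)
separated = np.zeros((64, 64))
separated[:32] = 1                                              # cream on top of coffee
fully_mixed = rng.integers(0, 2, size=(64, 64)).astype(float)   # random fine-scale mixture
print(apparent_complexity(separated), apparent_complexity(fully_mixed))
```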


21) Neural Turing Machines

Neural Turing Machines (NTMs) are a deep learning algorithm that combines neural networks and the concept of Turing machines. The paper enhances the capabilities of neural networks by coupling them to external memory resources, with which they can interact using attention mechanisms.

Link: https://arxiv.org/pdf/1410.5401
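
The read and write heads address memory by content: cosine similarity between an emitted key and every memory row, sharpened by a softmax, gives the addressing weights. A minimal sketch of that content-based addressing plus a read, with illustrative memory size and sharpening factor:

```python
import torch
import torch.nn.functional as F

def content_addressing(memory, key, beta):
    """memory: (slots, width); key: (width,); beta: sharpening scalar."""
    similarity = F.cosine_similarity(memory, key.unsqueeze(0), dim=-1)  # (slots,)
    return F.softmax(beta * similarity, dim=-1)

memory = torch.randn(128, 20)             # 128 memory slots of width 20
key = torch.randn(20)
weights = content_addressing(memory, key, beta=5.0)
read_vector = weights @ memory             # attention-weighted read
print(weights.shape, read_vector.shape)    # torch.Size([128]) torch.Size([20])
```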


22) Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

Published by Baidu Research's Silicon Valley AI Lab, the authors demonstrate an end-to-end deep learning approach that can recognize English and Mandarin speech. They replace hand-engineered components with neural networks, handling various speech scenarios, including noisy environments and different accents.

Link: https://arxiv.org/pdf/1512.02595.pdf
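
Like Deep Speech 2, an end-to-end recognizer can be trained with CTC, which aligns frame-level network outputs to a shorter label sequence without hand-made alignments. A minimal usage sketch with PyTorch's built-in CTC loss; all sizes and the blank index are illustrative assumptions, and the random tensors stand in for real acoustic-model outputs.

```python
import torch
import torch.nn as nn

T, N, C = 50, 4, 28        # time steps, batch size, characters (incl. blank at index 0)
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)      # per-frame network output
targets = torch.randint(1, C, (N, 12))                     # label sequences (no blanks)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```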


23) Scaling Laws for Neural Language Models

A classic paper from OpenAI, the authors explore the factors that affect language model performance in terms of cross-entropy loss. They find that model size, dataset size, and training compute affect the loss and can be largely traded off against each other.

Link: https://arxiv.org/pdf/2001.08361
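
The paper's headline relationships are power laws of the form L(N) ≈ (N_c / N)^α, which become straight lines in log-log space. A minimal sketch of recovering such an exponent by linear regression; the (model size, loss) data below are synthetic and purely for illustration.

```python
import numpy as np

# Synthetic (model size, loss) pairs roughly following L(N) = (Nc / N) ** alpha.
N = np.array([1e6, 3e6, 1e7, 3e7, 1e8, 3e8])
loss = 8.0 * (1e6 / N) ** 0.076 * np.exp(np.random.default_rng(0).normal(0, 0.01, N.size))

# Fit log L = -alpha * log N + const, i.e. a straight line in log-log space.
slope, intercept = np.polyfit(np.log(N), np.log(loss), deg=1)
alpha = -slope
print(f"fitted exponent alpha ≈ {alpha:.3f}")  # close to the 0.076 used to generate the data
```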


24) A Tutorial Introduction to the Minimum Description Length Principle

This paper by Peter Grünwald provides a tutorial introduction to the Minimum Description Length (MDL) principle, a method for model selection and data compression.

Link: https://arxiv.org/pdf/math/0406077


25) Machine Super Intelligence

Authored by DeepMind co-founder and chief scientist Shane Legg, this 2008 doctoral thesis is considered one of the earliest academic works to systematically explore Artificial General Intelligence (AGI), laying the foundation for subsequent research in the field.

Link: https://www.vetta.org/documents/Machine_Super_Intelligence.pdf


26) Kolmogorov Complexity and Algorithmic Randomness

Published by the American Mathematical Society, this book by A. Shen, V. A. Uspenskii, and N. K. Vereshchagin introduces Kolmogorov complexity theory and its applications in algorithmic randomness, providing a theoretical foundation for understanding computational complexity and randomness.

Link: https://www.lirmm.fr/~ashen/kolmbook-eng-scan.pdf


27) CS231n Convolutional Neural Networks for Visual Recognition

CS231n is a Stanford University course on deep learning for computer vision, focusing on Convolutional Neural Networks for visual recognition. It comprehensively covers CNN architectures, training techniques, and recent research findings.

Link: https://cs231n.github.io/


References

Ref: Exclusive Q&A: John Carmack’s ‘Different Path’ to Artificial General Intelligence

"So I asked Ilya Sutskever, OpenAI’s chief scientist, for a reading list. He gave me a list of like 40 research papers and said, ‘If you really learn all of these, you’ll know 90% of what matters today.’ And I did. I plowed through all those things and it all started sorting out in my head."

Ref: https://x.com/ID_AA_Carmack/status/1622673143469858816

I rather expected @ilyasut to have made a public post by now after all the discussion of the AI reading list he gave me. A canonical list of references from a leading figure would be appreciated by many. I would be curious myself about what he would add from the last three years.
