Ilya30u30 Paper Research

Purpose

This repo, inspired by the influential reading list that Ilya Sutskever recommended to John Carmack in 2020, curates and reproduces the essential AI and deep learning papers that shaped the field. Dubbed the "ilya30u30," the collection comprises 27 papers and resources that provide both a comprehensive foundation and advanced insights into neural networks, generative models, optimization, and more. Each paper is researched and reproduced, with detailed notes and code implementations aimed at deepening the understanding of key concepts for researchers, students, and practitioners alike. Through reproduction, I aim to understand the implementation intricacies, validate the findings, and explore potential improvements. By sharing this work, I hope to provide useful insights and foster further exploration in the deep learning community.

How to Read

The collection is organized to guide readers from foundational theories to advanced topics. Each entry includes a brief summary and a link to the original paper or resource. The suggested reading order at the end can help you navigate through the materials effectively.

First, use the Reading Order section to understand the knowledge dependencies between the 27 papers. Then, explore the Reproduction Research section to delve deeper into the detailed studies, including reproductions and summaries of each paper.

Table of Contents

  • Reading Order
  • Reproduction Research
  • Full Paper List
  • References

Reading Order

Foundational Theory and Introduction to Neural Networks

  1. CS231n Convolutional Neural Networks for Visual Recognition
    • A comprehensive course from Stanford University that covers the basics of CNNs, architectures, and training techniques—a great starting point for learning visual recognition.
  2. The Unreasonable Effectiveness of Recurrent Neural Networks
    • Emphasizes the powerful capabilities of RNNs in handling sequential data, further solidifying your understanding of RNNs.
  3. Understanding LSTM Networks
    • Introduces LSTMs and their advantages in handling long-term dependencies, forming a foundation for understanding RNNs.
  4. Recurrent Neural Network Regularization
    • Learn how to optimize RNNs and LSTMs by reducing overfitting through regularization techniques.
  5. ImageNet Classification with Deep Convolutional Neural Networks
    • Understand the basics of Convolutional Neural Networks (CNNs) and their applications in image recognition.
  6. Deep Residual Learning for Image Recognition
    • Learn how ResNet addresses training challenges in deep networks and enhances accuracy.
  7. Multi-Scale Context Aggregation by Dilated Convolutions
    • Delve into using dilated convolutions in semantic segmentation tasks to achieve multi-scale context aggregation.

Machine Translation and Natural Language Processing

  1. Neural Machine Translation by Jointly Learning to Align and Translate
    • Study the foundational models and alignment mechanisms in neural machine translation.
  2. Attention Is All You Need and The Annotated Transformer
    • Deeply understand the Transformer model's implementation details and code through practical explanations.
  3. Pointer Networks
    • Learn about a new network structure that addresses variable-length output sequences, suitable for tasks like sorting and combinatorial optimization.
  4. Scaling Laws for Neural Language Models
    • Explore factors affecting the performance of large language models and their scaling laws.

Deep Learning and Optimization

  1. GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism
    • Learn how to scale neural network capacity through micro-batch pipeline parallelism and understand parallel training techniques for large models.
  2. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
    • Understand how to build and optimize end-to-end speech recognition systems, especially for handling different languages.

Graph Structures and Relational Reasoning

  1. Neural Message Passing for Quantum Chemistry
    • Study the application of graph neural networks in quantum chemistry, a key to understanding supervised learning on graph data structures.
  2. A Simple Neural Network Module for Relational Reasoning
    • Learn how to enhance existing neural networks with relational reasoning modules for unstructured input.
  3. Relational Recurrent Neural Networks
    • Explore new memory modules that perform relational reasoning in sequential data, combining memory and reasoning.

Generative Models and Complexity

  1. Variational Lossy Autoencoder
    • Explore the combination of autoregressive models and variational autoencoders to achieve complex generative tasks.
  2. The First Law of Complexodynamics
    • Understand the evolution of complexity in physical systems and its relation to Kolmogorov complexity.
  3. Kolmogorov Complexity and Algorithmic Randomness
    • Learn Kolmogorov complexity theory and its applications in algorithmic randomness, laying the foundation for unsupervised learning.
  4. Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton
    • Delve into the quantitative measurement of complexity in closed systems and understand the trend of complexity over time.

Advanced Topics

  1. Neural Turing Machines
    • Combine neural networks with the concept of Turing machines to expand your understanding of neural network architectures.
  2. Identity Mappings in Deep Residual Networks
    • Further study the internal mechanisms of ResNet and the importance of identity mappings in information propagation.
  3. Order Matters: Sequence to Sequence for Sets
    • Explore how to handle cases where inputs and outputs are not necessarily ordered sequences.
  4. Keeping Neural Networks Simple by Minimizing the Description Length of the Weights
    • Learn how the Minimum Description Length principle applies to simplifying neural networks and balancing weight information content.
  5. A Tutorial Introduction to the Minimum Description Length Principle
    • Deepen your understanding of the MDL principle and its applications in model selection and data compression.
  6. Machine Super Intelligence
    • Explore the concepts and research progress in machine superintelligence, understanding the definition of intelligence and its quantifiable standards.

Reproduction Research

Note: The code reproductions and detailed notes for each paper are works in progress and will be added to the main branch as they are finalized. Until then, each entry below lists the paper's basic information, and you can follow progress through ongoing updates.

| Field | Name | Summary | Note | Reproduction |
| --- | --- | --- | --- | --- |
| Foundational Theory and Neural Networks | CS231n Convolutional Neural Networks for Visual Recognition | A comprehensive course from Stanford University that covers the basics of CNNs, architectures, and training techniques—a great starting point for learning visual recognition. | note | code |
| Foundational Theory and Neural Networks | The Unreasonable Effectiveness of Recurrent Neural Networks | Emphasizes the powerful capabilities of RNNs in handling sequential data, further solidifying your understanding of RNNs. | note | code |
| Foundational Theory and Neural Networks | Understanding LSTM Networks | Introduces LSTMs and their advantages in handling long-term dependencies, forming a foundation for understanding RNNs. | note | code |
| Foundational Theory and Neural Networks | Recurrent Neural Network Regularization | Learn how to optimize RNNs and LSTMs by reducing overfitting through regularization techniques. | note | code |
| Foundational Theory and Neural Networks | ImageNet Classification with Deep Convolutional Neural Networks | Understand the basics of Convolutional Neural Networks (CNNs) and their applications in image recognition. | note | code |
| Foundational Theory and Neural Networks | Deep Residual Learning for Image Recognition | Learn how ResNet addresses training challenges in deep networks and enhances accuracy. | note | code |
| Foundational Theory and Neural Networks | Multi-Scale Context Aggregation by Dilated Convolutions | Delve into using dilated convolutions in semantic segmentation tasks to achieve multi-scale context aggregation. | note | code |
| Machine Translation and NLP | Neural Machine Translation by Jointly Learning to Align and Translate | Study the foundational models and alignment mechanisms in neural machine translation. | note | code |
| Machine Translation and NLP | Attention Is All You Need | No explanation needed—the seminal Transformer paper that is a must-read. | note | code |
| Machine Translation and NLP | The Annotated Transformer | Deeply understand the Transformer model's implementation details and code through practical explanations. | note | code |
| Machine Translation and NLP | Pointer Networks | Learn about a new network structure that addresses variable-length output sequences, suitable for tasks like sorting and combinatorial optimization. | note | code |
| Machine Translation and NLP | Scaling Laws for Neural Language Models | Explore factors affecting the performance of large language models and their scaling laws. | note | code |
| Deep Learning and Optimization | GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism | Learn how to scale neural network capacity through micro-batch pipeline parallelism and understand parallel training techniques for large models. | note | code |
| Deep Learning and Optimization | Deep Speech 2: End-to-End Speech Recognition in English and Mandarin | Understand how to build and optimize end-to-end speech recognition systems, especially for handling different languages. | note | code |
| Graph Structures and Relational Reasoning | Neural Message Passing for Quantum Chemistry | Study the application of graph neural networks in quantum chemistry, a key to understanding supervised learning on graph data structures. | note | code |
| Graph Structures and Relational Reasoning | A Simple Neural Network Module for Relational Reasoning | Learn how to enhance existing neural networks with relational reasoning modules for unstructured input. | note | code |
| Graph Structures and Relational Reasoning | Relational Recurrent Neural Networks | Explore new memory modules that perform relational reasoning in sequential data, combining memory and reasoning. | note | code |
| Generative Models and Complexity | Variational Lossy Autoencoder | Explore the combination of autoregressive models and variational autoencoders to achieve complex generative tasks. | note | code |
| Generative Models and Complexity | The First Law of Complexodynamics | Understand the evolution of complexity in physical systems and its relation to Kolmogorov complexity. | note | code |
| Generative Models and Complexity | Kolmogorov Complexity and Algorithmic Randomness | Learn Kolmogorov complexity theory and its applications in algorithmic randomness, laying the foundation for unsupervised learning. | note | code |
| Generative Models and Complexity | Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton | Delve into the quantitative measurement of complexity in closed systems and understand the trend of complexity over time. | note | code |
| Advanced Topics | Neural Turing Machines | Combine neural networks with the concept of Turing machines to expand your understanding of neural network architectures. | note | code |
| Advanced Topics | Identity Mappings in Deep Residual Networks | Further study the internal mechanisms of ResNet and the importance of identity mappings in information propagation. | note | code |
| Advanced Topics | Order Matters: Sequence to Sequence for Sets | Explore how to handle cases where inputs and outputs are not necessarily ordered sequences. | note | code |
| Advanced Topics | Keeping Neural Networks Simple by Minimizing the Description Length of the Weights | Learn how the Minimum Description Length principle applies to simplifying neural networks and balancing weight information content. | note | code |
| Advanced Topics | A Tutorial Introduction to the Minimum Description Length Principle | Deepen your understanding of the MDL principle and its applications in model selection and data compression. | note | code |
| Advanced Topics | Machine Super Intelligence | Explore the concepts and research progress in machine superintelligence, understanding the definition of intelligence and its quantifiable standards. | note | code |

Full Paper List

1) Attention Is All You Need

No explanation needed—the seminal Transformer paper that is a must-read.

Link: https://arxiv.org/pdf/1706.03762


2) The Annotated Transformer

This 2018 blog post by Alexander Rush (then at Harvard, now an associate professor at Cornell University) and colleagues provides a line-by-line explanation of the Transformer together with a complete Python implementation, helping readers understand the theory while deepening their knowledge through practice.

Article: https://nlp.seas.harvard.edu/2018/04/03/attention.html

Code: https://github.com/harvardnlp/annotated-transformer/
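
As a small companion to the annotated walk-through, here is a minimal sketch of the scaled dot-product attention at the heart of the Transformer, written in plain PyTorch; the tensor shapes in the usage example are illustrative assumptions, not values from the paper.

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)      # (batch, heads, len_q, len_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights

# Toy usage: batch of 2, 4 heads, sequence length 5, head dimension 8.
q = k = v = torch.randn(2, 4, 5, 8)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # torch.Size([2, 4, 5, 8]) torch.Size([2, 4, 5, 5])
```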


3) The First Law of Complexodynamics

This is an article titled "The First Law of Complexodynamics" by Scott Aaronson, discussing why the "complexity" or "interestingness" of physical systems seems to increase over time, reach a maximum, and then decrease, while entropy increases monotonically. Aaronson attempts to explain this phenomenon using Kolmogorov complexity and related concepts, pointing out several challenges and possible solutions in this field.

Article: https://scottaaronson.blog/?p=762


4) The Unreasonable Effectiveness of Recurrent Neural Networks

This article, written by Andrej Karpathy in 2015, demonstrates the surprising effectiveness of Recurrent Neural Networks (RNNs) at handling sequential data, using character-level language models as its running example.

Link: https://karpathy.github.io/2015/05/21/rnn-effectiveness/


5) Understanding LSTM Networks

Written in 2015 by Christopher Olah, this article introduces Long Short-Term Memory (LSTM) networks, a special kind of RNN capable of handling long-term dependencies. LSTMs have achieved great success in fields like speech recognition, language modeling, translation, and image captioning.

Link: https://colah.github.io/posts/2015-08-Understanding-LSTMs/


6) Recurrent Neural Network Regularization

Authored by Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals in 2014, this paper proposes a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. It demonstrates how to apply dropout correctly to LSTM networks (to the non-recurrent connections only), significantly reducing overfitting across various tasks, including language modeling, speech recognition, image caption generation, and machine translation.

Link: https://arxiv.org/pdf/1409.2329.pdf
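
In PyTorch, the dropout argument of nn.LSTM applies dropout only to the outputs passed between stacked layers, which mirrors the non-recurrent placement advocated in this paper (the paper additionally drops out the input and output connections). A minimal sketch, with sizes chosen purely for illustration:

```python
import torch
import torch.nn as nn

# Dropout is applied to the activations passed between stacked LSTM layers
# (non-recurrent connections); the recurrent state h_t -> h_{t+1} is untouched.
lstm = nn.LSTM(input_size=128, hidden_size=256, num_layers=2,
               dropout=0.5, batch_first=True)

x = torch.randn(8, 20, 128)           # (batch, time, features), illustrative sizes
output, (h_n, c_n) = lstm(x)
print(output.shape)                    # torch.Size([8, 20, 256])
```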


7) Keeping Neural Networks Simple by Minimizing the Description Length of the Weights

This paper by Geoffrey Hinton and Drew van Camp argues that the generalization ability of supervised neural networks improves when the weights contain less information than the output vectors of the training cases. The weights are kept simple by penalizing their information content during learning, which can be achieved by adding adjustable Gaussian noise to them.

Link: https://www.cs.toronto.edu/~hinton/absps/colt93.pdf
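
A minimal sketch of the core idea only: zero-mean Gaussian noise is added to the weights during training so that they cannot carry too much information. The NoisyLinear module, the fixed noise level, and the layer sizes are illustrative assumptions, not the paper's full variational scheme with learned noise variances.

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """Linear layer whose weights are perturbed by Gaussian noise at training time."""
    def __init__(self, in_features, out_features, noise_std=0.05):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.noise_std = noise_std

    def forward(self, x):
        if self.training:
            noisy_w = self.linear.weight + torch.randn_like(self.linear.weight) * self.noise_std
            return nn.functional.linear(x, noisy_w, self.linear.bias)
        return self.linear(x)

layer = NoisyLinear(16, 4)
print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 4])
```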


8) Pointer Networks

This paper introduces a new neural network architecture designed to learn the conditional probability of output sequences composed of discrete tokens that represent positions in an input sequence. Existing models like sequence-to-sequence and Neural Turing Machines struggle with problems where the target output dictionary size depends on the input length, such as sorting variable-length sequences and various combinatorial optimization problems.

Link: https://arxiv.org/pdf/1506.03134
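
The key mechanism is an additive attention scorer whose softmax over input positions is used directly as the output distribution, so the network "points" back into its input. Below is a minimal sketch of that pointer step; the dimensions and variable names are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointerAttention(nn.Module):
    """Scores each encoder position against the decoder state; the softmax over
    positions is itself the output distribution (it 'points' into the input)."""
    def __init__(self, hidden):
        super().__init__()
        self.W_enc = nn.Linear(hidden, hidden, bias=False)
        self.W_dec = nn.Linear(hidden, hidden, bias=False)
        self.v = nn.Linear(hidden, 1, bias=False)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, seq_len, hidden), dec_state: (batch, hidden)
        scores = self.v(torch.tanh(self.W_enc(enc_states) + self.W_dec(dec_state).unsqueeze(1)))
        return F.softmax(scores.squeeze(-1), dim=-1)   # (batch, seq_len)

ptr = PointerAttention(hidden=32)
probs = ptr(torch.randn(4, 10, 32), torch.randn(4, 32))
print(probs.shape, probs.sum(dim=-1))  # (4, 10); each row sums to 1
```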


9) ImageNet Classification with Deep Convolutional Neural Networks

Authored by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, this groundbreaking paper introduced AlexNet, revolutionizing image recognition and kickstarting the deep learning revolution. The authors trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images of the ImageNet LSVRC-2010 contest into 1000 different classes.

Link: https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf


10) Order Matters: Sequence to Sequence for Sets

The paper explores how the order in which data is organized affects the learning of underlying patterns. The authors investigate an extension of the sequence-to-sequence (seq2seq) framework to go beyond sequences and handle input sets in a principled way. They also propose a loss function to address the lack of structure in output sets by exploring different data sequences during training.

Link: https://arxiv.org/pdf/1511.06391


11) GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism

This paper introduces GPipe, a model parallelism library that allows scaling the capacity of large neural networks through micro-batch pipeline parallelism. The authors demonstrate its application in image classification and multilingual neural machine translation tasks.

Link: https://arxiv.org/pdf/1811.06965


12) Deep Residual Learning for Image Recognition

Authored by Kaiming He et al., this 2016 CVPR Best Paper describes the deep residual learning framework, which significantly reduces the difficulty of training very deep neural networks and improves accuracy.

Link: https://arxiv.org/pdf/1512.03385
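
The core idea is to learn a residual F(x) and add it back to the input, so each block computes x + F(x) through an identity shortcut. A minimal sketch of a basic block follows; the channel count is illustrative, and the original network also uses strided, downsampling shortcuts where shapes change.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """y = ReLU(x + F(x)), where F is two 3x3 convolutions with batch norm."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)          # identity shortcut

block = BasicBlock(64)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```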


13) Multi-Scale Context Aggregation by Dilated Convolutions

The authors develop a novel convolutional network module tailored for dense prediction tasks like semantic segmentation. This module uses dilated convolutions to effectively aggregate multi-scale contextual information without reducing image resolution.

Link: https://arxiv.org/pdf/1511.07122
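
A dilated (atrous) convolution inserts gaps between the kernel taps, enlarging the receptive field without pooling or losing resolution. A minimal sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 64, 64)

# Standard 3x3 convolution: receptive field 3x3.
conv = nn.Conv2d(16, 16, kernel_size=3, padding=1)
# Dilated 3x3 convolution with dilation=2: receptive field 5x5, same output size.
dilated = nn.Conv2d(16, 16, kernel_size=3, padding=2, dilation=2)

print(conv(x).shape, dilated(x).shape)  # both torch.Size([1, 16, 64, 64])
```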


14) Neural Message Passing for Quantum Chemistry

The paper summarizes and organizes existing neural network models for graph-structured data that the authors believe are most promising. It proposes a general framework for supervised learning on graphs called Message Passing Neural Networks (MPNNs).

Link: https://arxiv.org/pdf/1704.01212
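
One MPNN step aggregates messages from each node's neighbors and then updates the node state. The sketch below uses a dense adjacency matrix, sum aggregation, a single linear message function, and a GRU-style update; these are simplifying assumptions rather than the paper's full framework.

```python
import torch
import torch.nn as nn

class MessagePassingStep(nn.Module):
    """h_v <- U(h_v, sum over neighbors w of M(h_w))."""
    def __init__(self, dim):
        super().__init__()
        self.message = nn.Linear(dim, dim)
        self.update = nn.GRUCell(dim, dim)

    def forward(self, h, adj):
        # h: (num_nodes, dim), adj: (num_nodes, num_nodes) binary adjacency matrix
        messages = adj @ self.message(h)   # sum of transformed neighbor states
        return self.update(messages, h)     # GRU-style node update

h = torch.randn(5, 32)
adj = (torch.rand(5, 5) > 0.5).float()
step = MessagePassingStep(32)
print(step(h, adj).shape)  # torch.Size([5, 32])
```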


15) Neural Machine Translation by Jointly Learning to Align and Translate

Published in 2014 by Bahdanau et al., this paper is one of the pioneering works in neural machine translation. It introduces a novel model architecture and training method that allows neural networks to automatically search for parts of a source sentence that are relevant to predicting a target word.

Link: https://arxiv.org/pdf/1409.0473
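
The alignment model scores each source annotation against the current decoder state and turns the scores into a soft attention distribution; it is the same additive scorer that the Pointer Network sketch above reuses, except that here the weights form a context vector instead of being the output themselves. A minimal sketch with illustrative shapes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """e_j = v^T tanh(W s + U h_j), alpha_j = softmax_j(e_j)."""
    def __init__(self, dec_dim, enc_dim, attn_dim):
        super().__init__()
        self.W = nn.Linear(dec_dim, attn_dim, bias=False)
        self.U = nn.Linear(enc_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dec_dim), enc_states: (batch, src_len, enc_dim)
        scores = self.v(torch.tanh(self.W(dec_state).unsqueeze(1) + self.U(enc_states)))
        alpha = F.softmax(scores.squeeze(-1), dim=-1)              # (batch, src_len)
        context = (alpha.unsqueeze(-1) * enc_states).sum(dim=1)     # weighted sum of annotations
        return context, alpha

attn = AdditiveAttention(dec_dim=64, enc_dim=128, attn_dim=32)
context, alpha = attn(torch.randn(2, 64), torch.randn(2, 7, 128))
print(context.shape, alpha.shape)  # torch.Size([2, 128]) torch.Size([2, 7])
```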


16) Identity Mappings in Deep Residual Networks

This paper, also by Kaiming He et al., further analyzes the propagation mechanisms behind residual blocks. The authors propose a new, pre-activation residual unit that simplifies training and improves model generalization.

Link: https://arxiv.org/pdf/1603.05027
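
The proposed unit moves batch norm and ReLU before each convolution ("pre-activation"), leaving the shortcut as a pure identity path with no activation after the addition. A minimal sketch; the channel count is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreActBlock(nn.Module):
    """Pre-activation residual unit: x + conv(relu(bn(conv(relu(bn(x))))))."""
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        return x + out                   # no activation after the addition

print(PreActBlock(32)(torch.randn(1, 32, 16, 16)).shape)  # torch.Size([1, 32, 16, 16])
```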


17) A Simple Neural Network Module for Relational Reasoning

To explore relational reasoning further and test whether this capability can be easily added to existing systems, DeepMind researchers developed a simple, plug-and-play module called the Relation Network (RN). This module can be inserted into existing neural network architectures to equip them with the ability to reason about relationships between entities.

Link: https://arxiv.org/pdf/1706.01427
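
The RN applies a shared MLP g to every pair of object embeddings, sums the results, and feeds the sum to a second MLP f. The sketch below is a minimal version with illustrative sizes; the question embedding that the paper concatenates to each pair is omitted here.

```python
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """RN(O) = f( sum over ordered pairs (i, j) of g([o_i, o_j]) )."""
    def __init__(self, obj_dim, hidden, out_dim):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * obj_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, hidden), nn.ReLU())
        self.f = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                               nn.Linear(hidden, out_dim))

    def forward(self, objects):
        # objects: (batch, n, obj_dim); build all ordered pairs (o_i, o_j)
        b, n, d = objects.shape
        oi = objects.unsqueeze(2).expand(b, n, n, d)
        oj = objects.unsqueeze(1).expand(b, n, n, d)
        pairs = torch.cat([oi, oj], dim=-1).reshape(b, n * n, 2 * d)
        return self.f(self.g(pairs).sum(dim=1))

rn = RelationNetwork(obj_dim=24, hidden=64, out_dim=10)
print(rn(torch.randn(2, 8, 24)).shape)  # torch.Size([2, 10])
```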


18) Variational Lossy Autoencoder

This paper successfully combines autoregressive models with Variational Autoencoders (VAEs) to achieve generative tasks. It addresses the issue where VAEs tend to ignore some latent representations during training and introduces the Variational Lossy Autoencoder (VLAE).

Link: https://arxiv.org/pdf/1611.02731


19) Relational Recurrent Neural Networks

This paper from DeepMind and University College London introduces the Relational Memory Core (RMC), capable of performing relational reasoning in sequential information. It achieves state-of-the-art performance on the WikiText-103, Project Gutenberg, and GigaWord datasets.

Link: https://arxiv.org/pdf/1806.01822


20) Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton

This paper attempts to measure the pattern where the "complexity" or "interestingness" of closed systems increases over time, reaches a maximum, and then decreases, unlike entropy, which increases monotonically. The authors use a simple two-dimensional cellular automaton model to simulate the mixing of two liquids ("coffee" and "cream") and propose "structural complexity" as an approximate measure of Kolmogorov complexity.

Link: https://arxiv.org/pdf/1405.6903

Further Reading: Beauty and Structural Complexity (in Chinese)
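
One rough way to reproduce the qualitative rise-and-fall curve is to coarse-grain the grid and use compressed size as a stand-in for the "structural complexity" the authors define. The sketch below is a loose approximation with illustrative block size and quantization levels, not the paper's exact estimator.

```python
import zlib
import numpy as np

def apparent_complexity(grid, block=4):
    """Coarse-grain a binary grid into block averages, then measure the
    compressed size of the result as a crude complexity proxy."""
    h, w = grid.shape
    coarse = grid.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    # Quantize to a few levels so uniformly mixed regions compress well.
    levels = np.digitize(coarse, bins=[0.25, 0.5, 0.75]).astype(np.uint8)
    return len(zlib.compress(levels.tobytes()))

rng = np.random.default_rng(0)
separated = np.zeros((64, 64))
separated[:32] = 1                                              # cream on top of coffee
fully_mixed = rng.integers(0, 2, size=(64, 64)).astype(float)   # random fine-scale mixture
print(apparent_complexity(separated), apparent_complexity(fully_mixed))
```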


21) Neural Turing Machines

Neural Turing Machines (NTMs) are a deep learning algorithm that combines neural networks and the concept of Turing machines. The paper enhances the capabilities of neural networks by coupling them to external memory resources, with which they can interact using attention mechanisms.

Link: https://arxiv.org/pdf/1410.5401
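
The read and write heads address memory by content: cosine similarity between an emitted key and every memory row, sharpened by a softmax, gives the addressing weights. A minimal sketch of that content-based addressing plus a read, with illustrative memory size and sharpening factor:

```python
import torch
import torch.nn.functional as F

def content_addressing(memory, key, beta):
    """memory: (slots, width); key: (width,); beta: sharpening scalar."""
    similarity = F.cosine_similarity(memory, key.unsqueeze(0), dim=-1)  # (slots,)
    return F.softmax(beta * similarity, dim=-1)

memory = torch.randn(128, 20)             # 128 memory slots of width 20
key = torch.randn(20)
weights = content_addressing(memory, key, beta=5.0)
read_vector = weights @ memory             # attention-weighted read
print(weights.shape, read_vector.shape)    # torch.Size([128]) torch.Size([20])
```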


22) Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

Published by Baidu Research's Silicon Valley AI Lab, the authors demonstrate an end-to-end deep learning approach that can recognize English and Mandarin speech. They replace hand-engineered components with neural networks, handling various speech scenarios, including noisy environments and different accents.

Link: https://arxiv.org/pdf/1512.02595.pdf
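
Like Deep Speech 2, an end-to-end recognizer can be trained with CTC, which aligns frame-level network outputs to a shorter label sequence without hand-made alignments. A minimal usage sketch with PyTorch's built-in CTC loss; all sizes and the blank index are illustrative assumptions, and the random tensors stand in for real acoustic-model outputs.

```python
import torch
import torch.nn as nn

T, N, C = 50, 4, 28        # time steps, batch size, characters (incl. blank at index 0)
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)      # per-frame network output
targets = torch.randint(1, C, (N, 12))                     # label sequences (no blanks)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```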


23) Scaling Laws for Neural Language Models

A classic paper from OpenAI, the authors explore the factors that affect language model performance in terms of cross-entropy loss. They find that model size, dataset size, and training compute affect the loss and can be largely traded off against each other.

Link: https://arxiv.org/pdf/2001.08361
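
The paper's headline relationships are power laws of the form L(N) ≈ (N_c / N)^α, which become straight lines in log-log space. A minimal sketch of recovering such an exponent by linear regression; the (model size, loss) data below are synthetic and purely for illustration.

```python
import numpy as np

# Synthetic (model size, loss) pairs roughly following L(N) = (Nc / N) ** alpha.
N = np.array([1e6, 3e6, 1e7, 3e7, 1e8, 3e8])
loss = 8.0 * (1e6 / N) ** 0.076 * np.exp(np.random.default_rng(0).normal(0, 0.01, N.size))

# Fit log L = -alpha * log N + const, i.e. a straight line in log-log space.
slope, intercept = np.polyfit(np.log(N), np.log(loss), deg=1)
alpha = -slope
print(f"fitted exponent alpha ≈ {alpha:.3f}")  # close to the 0.076 used to generate the data
```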


24) A Tutorial Introduction to the Minimum Description Length Principle

This paper by Peter Grünwald provides a tutorial introduction to the Minimum Description Length (MDL) principle, a method for model selection and data compression.

Link: https://arxiv.org/pdf/math/0406077


25) Machine Super Intelligence

Authored by DeepMind co-founder and chief scientist Shane Legg, this 2008 doctoral thesis is considered one of the earliest academic works to systematically explore Artificial General Intelligence (AGI), laying the foundation for subsequent research in the field.

Link: https://www.vetta.org/documents/Machine_Super_Intelligence.pdf


26) Kolmogorov Complexity and Algorithmic Randomness

Published by the American Mathematical Society, this book by A. Shen, V. A. Uspenskii, and N. K. Vereshchagin introduces Kolmogorov complexity theory and its applications in algorithmic randomness, providing a theoretical foundation for understanding computational complexity and randomness.

Link: https://www.lirmm.fr/~ashen/kolmbook-eng-scan.pdf


27) CS231n Convolutional Neural Networks for Visual Recognition

CS231n is a Stanford University course on deep learning for computer vision, focusing on Convolutional Neural Networks for visual recognition. It comprehensively covers CNN architectures, training techniques, and recent research findings.

Link: https://cs231n.github.io/


References

Ref: Exclusive Q&A: John Carmack’s ‘Different Path’ to Artificial General Intelligence

"So I asked Ilya Sutskever, OpenAI’s chief scientist, for a reading list. He gave me a list of like 40 research papers and said, ‘If you really learn all of these, you’ll know 90% of what matters today.’ And I did. I plowed through all those things and it all started sorting out in my head."

Ref: https://x.com/ID_AA_Carmack/status/1622673143469858816

I rather expected @ilyasut to have made a public post by now after all the discussion of the AI reading list he gave me. A canonical list of references from a leading figure would be appreciated by many. I would be curious myself about what he would add from the last three years.
