NLP 101: a Resource Repository for Deep Learning and Natural Language Processing

This document is for those who are enthusiastic about Deep Learning in natural language processing. If you have good recommendations or suggestions, I will try to add them.

This document follows these rules:

  • Materials that cover the same ground are not listed more than once.
  • Among materials at a similar level of difficulty, only one is listed.
  • Materials at a different level of difficulty, which require prerequisites or additional learning, are listed.

Language: Korean | English


Mathematics

Statistics and Probabilities

| Source | Description |
| --- | --- |
| Statistics 110 | A lecture on probability that non-engineering majors can follow easily. |
| Brandon Foltz's Statistics | Brandon Foltz's probability and statistics lectures are posted on YouTube and are rather short, so they are easy to watch during a daily commute. |

Linear Algebra

| Source | Description |
| --- | --- |
| Essence of Linear Algebra | A linear algebra lecture series on the YouTube channel 3Blue1Brown. It can be a big help for those planning to take undergraduate-level linear algebra, since it gives an overall understanding of the subject. It provides intuitive visual aids for getting the picture of linear algebra. |
| Linear Algebra | A legendary lecture by Professor Gilbert Strang. |
| Matrix methods in Data Analysis and Machine Learning | Professor Gilbert Strang's lecture on applied linear algebra. Linear algebra is a prerequisite here, so it is quite difficult to follow, yet it is a great lecture for learning how linear algebra is actually applied in the field of Machine Learning. |

Basic mathematics & Overview

| Source | Description |
| --- | --- |
| Essence of calculus | A calculus lecture series by the 3Blue1Brown channel mentioned above, helpful for those who want a similar overview of calculus. |
| Calculus | A textbook on calculus written by Professor Gilbert Strang. There is no need to go through the whole book, but chapters 2-4, 11-13, and 15-16 are well worth studying. |
| Mathematics for Machine Learning | A book on the mathematical knowledge that underlies machine learning. College-level mathematics from the natural sciences or engineering is preferable here, as the explanations are mainly broad-brush. |

Deep Learning and Natural Language Processing

Deep Learning

| Source | Description |
| --- | --- |
| CS230 | A Deep Learning lecture by the renowned Professor Andrew Ng, who has recently founded a startup on AI education. |
| Deep Learning Book | A book written by Ian Goodfellow, the father of GAN, and other renowned professors. |
| Dive into Deep Learning | While the 'Deep Learning Book' above gives theoretical explanations, this book also includes code showing how each concept is actually implemented. |
| Grokking Deep Learning | Teaches readers how to write the basic elements of a neural network with NumPy, without using Deep Learning frameworks. Also good material for studying how high-level APIs work under the hood. |

Natural Language Processing

| Source | Description |
| --- | --- |
| Neural Network Methods for NLP | A book on NLP with Deep Learning written by Yoav Goldberg. Its witty explanations lead the reader to the fundamentals. |
| Eisenstein's NLP Note | An awesome book that deals not only with NLP using machine learning, but also with the basic linguistic knowledge needed to understand it. Eisenstein's book Introduction to Natural Language Processing was published based on this note. |
| CS224N | An awesome NLP lecture from Stanford. A 2019 version is available, dealing with the latest trends. |
| CS224U | An NLP lecture that has gained renewed attention since the advent of the GLUE benchmark. Recommended after CS224N; its merit is that it provides exercises in PyTorch. |
| Code-First Intro to Natural Language Processing | A code-first NLP lecture by Rachel Thomas, the co-founder of fast.ai. The motivation that Rachel Thomas gives is mind-blowing. |
| Natural Language Processing with PyTorch | An NLP book from O'Reilly, known for numerous high-quality data science books. It is PyTorch-friendly, as all the code is written in PyTorch. |
| Linguistic Fundamentals for Natural Language Processing | A linguistics book written by the linguist Emily Bender, known for the Bender Rule. Although not related to Deep Learning, it is a great beginner's book on linguistic domain knowledge. |

Libraries related to the Natural Language Processing

| Source | Description |
| --- | --- |
| NumPy | Stanford's CS231N lecture covers NumPy, which is fundamental to machine learning computations. |
| TensorFlow | A tutorial provided by TensorFlow. It explains the basics well, with visual aids. |
| PyTorch | An awesome, high-quality tutorial on PyTorch provided by Facebook. |
| tensor2tensor | A sequence-to-sequence toolkit by Google, written in TensorFlow. |
| fairseq | A sequence-to-sequence toolkit by Facebook, written in PyTorch. |
| Hugging Face Transformers | A Transformer-based library provided by Hugging Face that allows easy access to pre-trained models. One of the key NLP libraries for researchers as well as developers. |
| Hugging Face Tokenizers | A tokenizer library that Hugging Face maintains. It is fast because the key functions are written in Rust. Recent tokenization methods such as BPE can be tried out with Hugging Face Tokenizers. |
| spaCy | A tutorial written by Ines, a core developer of the noteworthy spaCy. |
| torchtext | A tutorial on torchtext, a package that makes data preprocessing handy. It has more details than the official documentation. |
| SentencePiece | Google's open source library that builds a BPE-based vocabulary using subword information. |
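
The BPE-based subword vocabularies mentioned for Hugging Face Tokenizers and SentencePiece can be illustrated with a minimal sketch in plain Python. This is not the API or the implementation of either library: the function names (`count_pairs`, `merge_pair`, `learn_bpe`), the `</w>` end-of-word marker, and the tie-breaking rule are assumptions made for illustration only. The idea is to start from single characters and repeatedly merge the most frequent adjacent pair of symbols.

```python
from collections import Counter


def count_pairs(words):
    """Count adjacent symbol pairs over a {tuple_of_symbols: frequency} dict."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, words):
    """Return a new dict in which every occurrence of `pair` is one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-separated corpus."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    words = dict(Counter(tuple(word) + ("</w>",) for word in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        # Most frequent pair; ties are broken by the pair itself (an arbitrary
        # but deterministic choice made for this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        words = merge_pair(best, words)
        merges.append(best)
    return merges, words


if __name__ == "__main__":
    merges, words = learn_bpe("low low low lower lowest", num_merges=3)
    print(merges)
    print(words)
```

Production libraries add much more on top of this loop (normalization, pre-tokenization, byte-level fallbacks, efficient pair-count updates), so the sketch is only meant to make the word "BPE" in the table above concrete.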

Useful materials


AWESOME blogs

| Blog | Article you should read |
| --- | --- |
| Christopher Olah's Blog | Understanding LSTM Networks |
| Jay Alammar's Blog | Illustrated Word2vec |
| Sebastian Ruder's Blog | Tracking Progress in Natural Language Processing |
| Chris McCormick's Blog | Word2Vec Tutorial - The Skip-Gram Model |
| The Gradient | Evaluation Metrics for Language Modeling |
| Distill.pub | Visualizing memorization in RNNs |
| Thomas Wolf's Blog | The Current Best of Universal Word Embeddings and Sentence Embeddings |
| dair.ai | A Light Introduction to Transfer Learning for NLP |
| Machine Learning Mastery | How to Develop a Neural Machine Translation System from Scratch |

NLP Specialists you should remember

(not listed by rank)

| Name | Description | Known for |
| --- | --- | --- |
| Kyunghyun Cho | Professor @NYU | GRU |
| Yejin Choi | Professor @Univ. of Washington | Grover |
| Yoon Kim | Ph.D. Candidate @Harvard Univ. | CNN for NLP |
| Minjoon Seo | Researcher @Clova AI, Allen AI | BiDAF |
| Kyubyong Park | Researcher @Kakao Brain | Paper implementation & NLP with Korean language |
| Tomas Mikolov | Researcher @FAIR | Word2vec |
| Omer Levy | Researcher @FAIR | Various Word Embedding techniques |
| Jason Weston | Researcher @FAIR | Memory Networks |
| Yinhan Liu | Researcher @FAIR | RoBERTa |
| Guillaume Lample | Researcher @FAIR | XLM |
| Alexis Conneau | Researcher @FAIR | XLM-R |
| Mike Lewis | Researcher @FAIR | BART |
| Ashish Vaswani | Researcher @Google | Transformer |
| Jacob Devlin | Researcher @Google | BERT |
| Kenton Lee | Researcher @Google | E2E Coref |
| Matthew Peters | Researcher @Allen AI | ELMo |
| Alec Radford | Researcher @OpenAI | GPT-2 |
| Sebastian Ruder | Researcher @DeepMind | NLP Progress |
| Richard Socher | Researcher @Salesforce | GloVe |
| Jeremy Howard | Co-founder @Fast.ai | ULMFiT |
| Thomas Wolf | Lead Engineer @Hugging Face | pytorch-transformers |
| Luke Zettlemoyer | Professor @Univ. of Washington | ELMo |
| Yoav Goldberg | Professor @Bar Ilan Univ. | Neural Net Methods for NLP |
| Chris Manning | Professor @Stanford Univ. | CS224N |
| Dan Jurafsky | Professor @Stanford Univ. | Speech and Language Processing |
| Graham Neubig | Professor @CMU | Neural Nets for NLP |
| Sam Bowman | Professor @NYU | NLI Benchmark |
| Nikita Kitaev | Ph.D. Candidate @UC Berkeley | Reformer |
| Zihang Dai | Ph.D. Candidate @CMU | Transformer-XL |
| Zhilin Yang | Ph.D. Candidate @CMU | XLNet |
| Abigail See | Ph.D. Candidate @Stanford Univ. | Pointer Generator |
| Kevin Clark | Ph.D. Candidate @Stanford Univ. | ELECTRA |
| Eric Wallace | Ph.D. Candidate @UC Berkeley | AllenNLP Interpret |

Research Conferences