This repository contains materials for the Natural Language Processing course.
Tip #1:
Downloading the entire repository can take considerable time. A single folder can be downloaded via DownGit.
Tip #2:
Sometimes GitHub fails to render a notebook. In that case use nbviewer — it works like a charm!
Tip #3:
If nbviewer fails to find a notebook that GitHub renders just fine, try appending ?flush_cache=true
to the end of the nbviewer link to force it to re-fetch the file.
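For example (the user, repository, and notebook path here are hypothetical placeholders):

```
https://nbviewer.org/github/<user>/<repo>/blob/master/week01/notebook.ipynb?flush_cache=true
```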
Week | What | When |
---|---|---|
1 | Tasks in NLP, text preprocessing (tokenization, normalization: stemming, lemmatization), feature extraction (Bag-of-Words, Bag-of-N-grams, TF-IDF; see the sketch after the table), word embeddings (one-hot, matrix factorization, word2vec, CBOW, Skip-gram, GloVe). | 10.03.2021 |
2 | Embeddings: recap (word2vec), usage in unsupervised translation; cosine distance; RNNs, CNNs, n-grams, and their usage examples. | 17.03.2021 |
3 | Recap: RNN; LSTM, gates in LSTM; RNNs as encoders for sequential data; vanishing and exploding gradient problems. | 24.03.2021 |
4 | Neural Machine Translation (NMT): problem statement, historical overview, statistical MT, beam search, BLEU/perplexity scores; encoder-decoder architecture, attention. | 31.03.2021 |
5 | Recap: attention in seq2seq; Transformer architecture, self-attention. | 07.04.2021 |
6 | Recap: self-attention; positional encoding, layer normalization, the decoder in the Transformer. | 14.04.2021 |
7 | OpenAI Transformer (pre-training a decoder for language modeling), ELMo (deep contextualized word representations), BERT. | 21.04.2021 |
8 | ULMFiT, Transformer-XL, Question Answering (SQuAD, SberQuAD, ODQA), GPT. | 28.04.2021 |
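A minimal sketch of the week 1 feature-extraction techniques (Bag-of-Words, Bag-of-N-grams, TF-IDF), assuming scikit-learn is installed; the toy corpus and parameter choices are illustrative and not part of the course materials:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Bag-of-Words: raw token counts per document.
bow = CountVectorizer()
print(bow.fit_transform(corpus).toarray())

# Bag-of-N-grams: count unigrams and bigrams together.
ngrams = CountVectorizer(ngram_range=(1, 2))
print(ngrams.fit_transform(corpus).toarray())

# TF-IDF: reweight counts so terms shared by every document matter less.
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(corpus).toarray().round(2))
```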
Additional materials:
- word embeddings:
  - Word Embeddings (by Lena Voita)
  - Word2vec tutorial
  - Illustrated word2vec (by Jay Alammar)
- CNNs:
- LSTM and PoS tagging:
- Transformers:
  - Illustrated Transformer (by Jay Alammar)
  - The Annotated Transformer (by Harvard NLP group)
- BERT:
  - The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) (by Jay Alammar)
  - Simple tutorial for distilling BERT (by Paul Gladkov)
  - Huggingface Transformers (see the loading sketch after this list)
- Question Answering and TTS:
  - SberQuAD — Russian Reading Comprehension Dataset: Description and Analysis
  - GPT-3 for Russian language
  - Voice cloning
  - Tacotron 2 Demo (by NVIDIA)
  - Voice datasets (by Mozilla)
  - Speech recognition and synthesis (ASR and TTS) (by DeepPavlov)
  - Russian Open Speech To Text (STT/ASR) Dataset
  - DeepSpeech 0.6: Mozilla’s Speech-to-Text Engine Gets Fast, Lean, and Ubiquitous (by Reuben Morais)
  - Open Domain Question Answering Skill on Wikipedia (by DeepPavlov)
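A minimal sketch of loading a pre-trained BERT with the Huggingface Transformers library linked above; the checkpoint name is the standard public one, and the example sentence is illustrative:

```python
from transformers import AutoModel, AutoTokenizer

# Load the standard public English BERT checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Tokenize one sentence and extract contextual token embeddings.
inputs = tokenizer("NLP course notebooks render nicely in nbviewer.", return_tensors="pt")
outputs = model(**inputs)

# Shape is (batch, tokens, hidden); hidden size is 768 for bert-base.
print(outputs.last_hidden_state.shape)
```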