
11. History of NN architectures in NLP: CNNs, LSTMs

Explanations, formulas, visualisations:

 

RNNs

  • RNNs are designed to process sequences in which the order of symbols matters
  • The hidden layers of an RNN model the whole sequence: each token in the input is represented by a hidden state
  • The hidden states can be seen as layers of a deep network (often called units) connected via the RNN abstraction
  • The RNN abstraction: current state = f(current input, previous state)
  • Gating was introduced to control how much history is allowed to pass to the next state; gated RNN units have more structure than a simple RNN unit
  • By design uni-directional; in NLP mostly used bi-directionally: the hidden states of a left-to-right and a right-to-left pass are concatenated
  • Can output a symbol at each step or only at the end of the sequence (see the sketch after this list)
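
As a concrete illustration, here is a minimal NumPy sketch of the RNN abstraction above: a simple (Elman-style) cell, a bi-directional pass built by concatenating a left-to-right and a right-to-left run, and the two ways of reading off the output (one vector per token, or one vector for the whole sequence). The function names (`rnn_step`, `run_rnn`) and the dimensions are illustrative assumptions, not taken from any particular library.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One simple (Elman) RNN step: current state = f(current input, previous state)."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

def run_rnn(X, W_xh, W_hh, b_h):
    """Run the cell over a sequence; returns one hidden state per token."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in X:                                  # X: (seq_len, input_dim)
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return np.stack(states)                        # (seq_len, hidden_dim)

rng = np.random.default_rng(0)
seq_len, d_in, d_h = 5, 8, 16                      # illustrative sizes
X = rng.normal(size=(seq_len, d_in))               # embedded input tokens
shapes = [(d_in, d_h), (d_h, d_h), (d_h,)]
params_fwd = [rng.normal(scale=0.1, size=s) for s in shapes]
params_bwd = [rng.normal(scale=0.1, size=s) for s in shapes]

# Bi-directional use: run left-to-right and right-to-left, then concatenate.
h_fwd = run_rnn(X, *params_fwd)                    # (seq_len, d_h)
h_bwd = run_rnn(X[::-1], *params_bwd)[::-1]        # reversed pass, re-aligned to token order

per_token = np.concatenate([h_fwd, h_bwd], axis=-1)   # (seq_len, 2*d_h): one output per step
whole_seq = np.concatenate([h_fwd[-1], h_bwd[0]])     # (2*d_h,): each pass has seen the full sequence
```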

 

LSTMs are gated RNNs

  • LSTM is one particular gating scheme: each state consists of a hidden and a memory (also called context) component
  • The hidden component is the main output of the unit; the memory component is an additional unit that supports gating
  • Gates: input, forget, output (the full update is written out below)
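
For reference, the standard LSTM update written out (notation assumed here: $x_t$ is the input, $h_t$ the hidden state, $c_t$ the memory/context state, $\sigma$ the logistic sigmoid, $\odot$ element-wise multiplication):

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate memory} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{memory (context) state} \\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state}
\end{aligned}
$$

The forget and input gates decide how much of the old memory is kept and how much new information is written; the output gate decides how much of the memory is exposed as the hidden state.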

 

CNNs as feature extractors

  • designed to identify relevant input features
  • specifically in NLP: relevant segments of text are found with 1D convolutions
  • convolution produces one vector representation for each n-gram
  • each filter is responsible for one cell of the vector; the dimensionality of an n-gram vector equals the number of filters
  • n-gram representations are then pooled into a single vector representing the whole text
  • this vector is input to a classifier
  • Note: n-gram size (the convolution window width) ≠ the number of filters (the n-gram vector dimensionality); they are independent hyperparameters (see the sketch below)
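
A minimal NumPy sketch of the pipeline above: a 1D convolution over token embeddings produces one vector per n-gram, and max-pooling collapses these into a single text vector for a classifier. The helper name `conv1d_ngrams` and all sizes are illustrative assumptions, not a specific library API.

```python
import numpy as np

def conv1d_ngrams(E, filters, b):
    """1D convolution over token embeddings.

    E:       (seq_len, emb_dim)         token embeddings
    filters: (num_filters, n, emb_dim)  each filter covers a window of n tokens
    Returns: (seq_len - n + 1, num_filters): one vector per n-gram;
             each filter fills one cell of that vector.
    """
    num_filters, n, emb_dim = filters.shape
    windows = np.stack([E[i:i + n] for i in range(E.shape[0] - n + 1)])  # (num_ngrams, n, emb_dim)
    return np.tanh(np.einsum("wne,fne->wf", windows, filters) + b)

rng = np.random.default_rng(0)
seq_len, emb_dim, n, num_filters = 10, 50, 3, 100   # trigram filters, 100 of them
E = rng.normal(size=(seq_len, emb_dim))
filters = rng.normal(scale=0.1, size=(num_filters, n, emb_dim))
b = np.zeros(num_filters)

ngram_vectors = conv1d_ngrams(E, filters, b)   # (8, 100): 8 trigrams, each a 100-dim vector
text_vector = ngram_vectors.max(axis=0)        # max-pooling over n-grams -> (100,)
# text_vector is then fed to a classifier (e.g. a softmax layer).
```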

 

NLP tasks: token level vs. sentence level

  • Typically CNNs are better suited for sentence-level tasks (e.g. spam filtering, sentiment analysis), while LSTMs suit token-level tasks (e.g. NER, POS tagging)
  • LSTMs are also often used for sentence-level tasks: the last state serves as a representation of the whole sequence
  • The use of CNNs for token-level tasks is possible but not common