Sequence-to-Sequence

File metadata and controls

25 lines (21 loc) · 1.51 KB

Key ideas

  • DNNs work well whenever large labeled training sets are available, but they cannot be used to map sequences to sequences
  • Our method maps the input sequence with an LSTM to a vector of fixed dimensionality, then uses another deep LSTM to decode the target sequence from that vector

Introduction

  • Despite their success, DNNs require inputs and targets to be encoded as vectors of fixed dimensionality
  • For many important problems, such as machine translation and speech recognition, the sequence lengths are not known a priori
  • A domain-independent method for mapping sequences to sequences would be useful
  • LSTM 1 obtains a large fixed-dimensional vector representation of the input
  • LSTM 2 extracts the output sequence from that vector
  • Reversing the order of the words in the source sentences (but not the targets) markedly improves accuracy; a small sketch of this preprocessing follows
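
A minimal illustration of the source-reversal preprocessing (the token lists and function name are hypothetical, only the reversal itself is from the paper):

```python
def reverse_source(pairs):
    """Reverse each source token sequence; leave the target untouched.

    `pairs` is a list of (source_tokens, target_tokens) tuples.
    """
    return [(list(reversed(src)), tgt) for src, tgt in pairs]

# Hypothetical example: source "a b c" with target "x y" becomes "c b a" -> "x y"
pairs = [(["a", "b", "c"], ["x", "y"])]
print(reverse_source(pairs))  # [(['c', 'b', 'a'], ['x', 'y'])]
```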

Model

  • RNN is a natural generalization of feedforward neural networks to sequences
  • Given a sequence of inputs x_1, ..., x_T, a standard RNN computes a sequence of outputs y_1, ..., y_T by iterating (a NumPy sketch follows this list):
    • h_t = sigm(W^{hx} x_t + W^{hh} h_{t-1})
    • y_t = W^{yh} h_t
  • A simple strategy for general sequence-to-sequence learning is to map the input sequence to a fixed-size vector with one RNN and then decode the target sequence from that vector with another RNN
  • The goal of the LSTM is to estimate the conditional probability p(y_1, ..., y_{T'} | x_1, ..., x_T), where the input length T and output length T' may differ; it factorizes as p(y_1, ..., y_{T'} | x_1, ..., x_T) = ∏_{t=1}^{T'} p(y_t | v, y_1, ..., y_{t-1}), with v the fixed-dimensional encoding of the input
  • Two different deep LSTMs (4 layers each) are used, one for "encoding" and one for "decoding" (see the sketch after this list)
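
A minimal NumPy sketch of the standard RNN recurrence above (weight names follow the notation in the bullets; the dimensions and random inputs are illustrative assumptions):

```python
import numpy as np

def rnn_forward(xs, W_hx, W_hh, W_yh):
    """Run h_t = sigm(W_hx x_t + W_hh h_{t-1}), y_t = W_yh h_t over a sequence."""
    sigm = lambda z: 1.0 / (1.0 + np.exp(-z))
    h = np.zeros(W_hh.shape[0])       # initial hidden state h_0 = 0
    ys = []
    for x in xs:                       # xs: input vectors x_1, ..., x_T
        h = sigm(W_hx @ x + W_hh @ h)  # hidden state update
        ys.append(W_yh @ h)            # output at step t
    return ys

# Illustrative sizes (assumed): input 3, hidden 4, output 2, T = 5
rng = np.random.default_rng(0)
W_hx = rng.normal(size=(4, 3))
W_hh = rng.normal(size=(4, 4))
W_yh = rng.normal(size=(2, 4))
xs = [rng.normal(size=3) for _ in range(5)]
print(len(rnn_forward(xs, W_hx, W_hh, W_yh)))  # 5 output vectors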

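A minimal PyTorch sketch of the encoder/decoder idea, assuming toy vocabulary and layer sizes (not the paper's configuration, which uses 4-layer LSTMs with 1000 cells per layer):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder LSTM compresses the source into a fixed-size state;
    the decoder LSTM is initialised from that state and predicts the target."""

    def __init__(self, src_vocab, tgt_vocab, emb=32, hidden=64, layers=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, num_layers=layers, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        # Encode: keep only the final (h, c) state -- the fixed-size "vector" v.
        _, state = self.encoder(self.src_emb(src))
        # Decode: condition on the encoder state, predict a distribution per step.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)  # logits over the target vocabulary

# Toy usage with assumed sizes: batch of 2, source length 7, target length 5
model = Seq2Seq(src_vocab=100, tgt_vocab=120)
src = torch.randint(0, 100, (2, 7))
tgt = torch.randint(0, 120, (2, 5))
print(model(src, tgt).shape)  # torch.Size([2, 5, 120])
```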
Experiments (skipped)