- DNNs work well whenever large labeled training sets are available, but they cannot be used to map sequences to sequences
- The method maps the input sequence to a vector of fixed dimensionality with a deep LSTM, then uses another deep LSTM to decode the target sequence from that vector
- Despite their success, DNNs require the inputs and targets to be encoded as vectors of fixed dimensionality
- For many important problems, such as machine translation and speech recognition, the sequence lengths are not known a priori
- A domain-independent method for mapping sequences to sequences would be useful
- One LSTM reads the input sequence to obtain a large fixed-dimensional vector representation
- A second LSTM extracts the output sequence from that vector
- Reversing the order of the words in the source sequences (but not the targets) improves accuracy
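A tiny sketch of the reversal trick (hypothetical helper; assumes whitespace-tokenized sentences): only the source side is reversed, the target is left in its original order.

```python
# Illustrative only: reverse the source sentence's word order, leave the target untouched.
def reverse_source(source: str, target: str) -> tuple[str, str]:
    reversed_source = " ".join(reversed(source.split()))
    return reversed_source, target

src, tgt = reverse_source("the cat sat on the mat", "le chat est assis sur le tapis")
print(src)  # "mat the on sat cat the"
print(tgt)  # target kept as-is
```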
- RNN is a natural generalization of feedforward neural networks to sequences
- Given a sequence of inputs x_1, ..., x_T, a standard RNN computes a sequence of outputs y_1, ..., y_T by iterating:
- h_t = sigm(W^{hx} x_t + W^{hh} h_{t-1})
- y_t = W^{yh} h_t
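A minimal NumPy sketch of this recurrence (weight names mirror the equations above and are not any library's API; shapes are illustrative):

```python
import numpy as np

def rnn_forward(xs, W_hx, W_hh, W_yh, h0):
    """Standard RNN forward pass following the recurrence above.
    xs: list of input vectors x_1..x_T; h0: initial hidden state."""
    sigm = lambda z: 1.0 / (1.0 + np.exp(-z))   # logistic sigmoid
    h, ys = h0, []
    for x in xs:
        h = sigm(W_hx @ x + W_hh @ h)           # h_t = sigm(W^{hx} x_t + W^{hh} h_{t-1})
        ys.append(W_yh @ h)                     # y_t = W^{yh} h_t
    return ys

# Usage with random weights: input dim 3, hidden dim 4, output dim 2, T = 5
rng = np.random.default_rng(0)
xs = [rng.standard_normal(3) for _ in range(5)]
ys = rnn_forward(xs, rng.standard_normal((4, 3)), rng.standard_normal((4, 4)),
                 rng.standard_normal((2, 4)), np.zeros(4))
```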
- A simple strategy for general sequence-to-sequence learning is to map the input sequence to a fixed-size vector using one RNN, then map that vector to the target sequence with another RNN
- The goal of the LSTM is to estimate the conditional probability p(y_1, ..., y_{T'} | x_1, ..., x_T), where (x_1, ..., x_T) is the input sequence and (y_1, ..., y_{T'}) is the output sequence, whose lengths T and T' may differ
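In the paper, this conditional probability is factorized over output timesteps, with each term conditioned on the fixed-dimensional encoding v of the input and the previously generated outputs, roughly:

```latex
p(y_1, \dots, y_{T'} \mid x_1, \dots, x_T) \;=\; \prod_{t=1}^{T'} p(y_t \mid v,\, y_1, \dots, y_{t-1})
```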
- Uses two different LSTMs, each deep (4 layers): one for "encoding" the input and one for "decoding" the output
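A minimal encoder-decoder sketch of this setup, assuming PyTorch; class name, layer sizes, and vocabulary sizes are placeholders, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=64, hidden=128, layers=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, num_layers=layers, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the (reversed) source; the final (h, c) states act as the
        # fixed-dimensional representation of the input sequence.
        _, state = self.encoder(self.src_emb(src_ids))
        # Decode the target conditioned on that state (teacher forcing).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)  # per-timestep logits over the target vocabulary

# Usage: batch of 2 source sequences (length 7) and target prefixes (length 5)
model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
logits = model(torch.randint(0, 1000, (2, 7)), torch.randint(0, 1200, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 1200])
```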