Release Version 0.2: Reversible tokenization, new word vector API, and more datasets · pytorch/text

Breaking changes:

By default, examples are now sorted within a batch by decreasing sequence length (#95, #139). This is required for use of PyTorch PackedSequences, and it can be flexibly overridden with a Dataset constructor flag.
The unknown token is now included as part of specials and can be overridden or removed in the Field constructor (part of #107).
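
As an illustration of the first change, here is a minimal pure-Python sketch of sorting a batch by decreasing sequence length before padding, which is the ordering the note says PackedSequences require. It does not use torchtext; `sort_and_pad` and the `<pad>` string are hypothetical names chosen for this example.

```python
def sort_and_pad(batch, pad_token="<pad>"):
    """Sort token lists by decreasing length, then pad to the longest one.

    Hypothetical helper for illustration; not part of torchtext.
    """
    # Longest example first, matching the new default ordering within a batch.
    ordered = sorted(batch, key=len, reverse=True)
    lengths = [len(seq) for seq in ordered]
    max_len = lengths[0] if lengths else 0
    # Right-pad every example so that all rows have the same length.
    padded = [seq + [pad_token] * (max_len - len(seq)) for seq in ordered]
    return padded, lengths


batch = [["a", "b"], ["c", "d", "e", "f"], ["g"]]
padded, lengths = sort_and_pad(batch)
print(lengths)  # [4, 2, 1]
```

The `lengths` list is the kind of per-example length information that a packing routine such as `torch.nn.utils.rnn.pack_padded_sequence` takes alongside the padded batch once it has been converted to a tensor.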

New features:

New word vector API with classes for GloVe and FastText; string descriptors are still accepted for backwards compatibility (#94, #102, #115, #120, thanks @nelson-liu and @bmccann!)
Reversible tokenization (#107). Introduces a new Field subclass, ReversibleField, with a .reverse method that detokenizes. All implementations of ReversibleField should guarantee that the tokenization+detokenization round-trip is idempotent; torchtext provides wrappers for the revtok tokenizer and subword segmenter that satisfy this property.
Skip header line in CSV/TSV loading (#146)
RawFields that represent any data type without processing (#147, thanks @kylegao91!)
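
The round-trip property described for reversible tokenization can be sketched with a toy tokenizer. This is not revtok and not the ReversibleField implementation; `tokenize` and `detokenize` are hypothetical functions that only show what it means for detokenization to recover the original string.

```python
import re


def tokenize(text):
    # Keep whitespace runs as tokens so that no character is discarded.
    # Every character matches exactly one alternative: whitespace,
    # a word character, or a single other symbol.
    return re.findall(r"\s+|\w+|[^\w\s]", text)


def detokenize(tokens):
    # Because nothing was dropped, plain concatenation restores the input.
    return "".join(tokens)


text = "Hello, world!  It's reversible."
tokens = tokenize(text)
assert detokenize(tokens) == text
```

A tokenizer that splits on whitespace and discards it would fail this check, since the number and kind of whitespace characters could not be recovered from the tokens alone.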

Bug fixes:

Fix pretrained word vector loading (#99, thanks @matt-peters!)
Fix JSON loader silently ignoring requested columns not present in the file (#105, thanks @nelson-liu!)
Many fixes for Python 2, especially surrounding Unicode (#105, #112, #135, #153, thanks @nelson-liu!)
Fix Pipeline.__call__ behavior (#113, thanks @nelson-liu!)
Fix README example (#134, thanks @czhang99!)
Fix WikiText2 loader (#138)
Fix typo in MT loader (#142, thanks @sivareddyg!)
Fix Example.fromlist behavior on non-strings (#145)
Update test set URL for Multi30k (#149)
Fix SNLI data loader (#150, thanks @sivareddyg!)
Fix language modeling iterator (#151)
Remove transpose as a side effect of Field.reverse (#155)