You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
By default, examples are now sorted within a batch by decreasing sequence length (#95, #139). This is required for use of PyTorch PackedSequences, and it can be flexibly overridden with a Dataset constructor flag.
The unknown token is now included as part of specials and can be overridden or removed in the Field constructor (part of #107).
New features:
New word vector API with classes for GloVe and FastText; string descriptors are still accepted for backwards compatibility (#94, #102, #115, #120, thanks @nelson-liu and @bmccann!)
Reversible tokenization (#107). Introduces a new Field subclass, ReversibleField, with a .reverse method that detokenizes. All implementations of ReversibleField should guarantee that the tokenization+detokenization round-trip is idempotent; torchtext provides wrappers for the revtok tokenizer and subword segmenter that satisfy this property.