Features

GluonNLP provides its users with easy access to:

- State-of-the-art models
- Pre-trained word embeddings
- Many public datasets for different tasks
- Examples friendly to users who are new to the task
- Reproducible training scripts

Models

Gluon NLP Toolkit supplies model definitions for common NLP tasks. These can be
adapted to the user's requirements or taken as a blueprint for new developments.
All of them are implemented as Gluon Blocks, which allows easy reuse as
plug-and-play neural network building blocks.

Language Models
- Standard RNN language model
- AWD language model by Salesforce
Attention Cells
Beam Search
- Beam Search Sampler
- Beam Search Scorer
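
To illustrate how a beam search sampler and a scorer fit together, here is a minimal pure-Python sketch. It does not use the Gluon NLP API: `length_normalized_score`, `beam_search`, and the toy model `toy_step` are hypothetical names chosen for this example, and the length normalization is only one possible scoring choice.

```python
import math


def length_normalized_score(log_prob, length):
    """Hypothetical scorer: average log-probability per token."""
    return log_prob / length


def beam_search(step_fn, start, eos, beam_size=2, max_len=5):
    """Hypothetical sampler: keep the `beam_size` best partial sequences per step.

    `step_fn(sequence)` must return a dict mapping next token -> probability.
    """
    beams = [([start], 0.0)]  # (sequence, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, log_prob in beams:
            for token, prob in step_fn(seq).items():
                candidates.append((seq + [token], log_prob + math.log(prob)))
        # Rank all expansions with the scorer and keep only the best ones.
        candidates.sort(
            key=lambda c: length_normalized_score(c[1], len(c[0])), reverse=True
        )
        beams = []
        for seq, log_prob in candidates[:beam_size]:
            if seq[-1] == eos:
                finished.append((seq, log_prob))
            else:
                beams.append((seq, log_prob))
        if not beams:
            break
    finished.extend(beams)  # sequences cut off at max_len still compete
    best_seq, _ = max(
        finished, key=lambda c: length_normalized_score(c[1], len(c[0]))
    )
    return best_seq


def toy_step(seq):
    """Toy next-token distribution that depends only on the last token."""
    table = {
        "<s>": {"a": 0.6, "b": 0.4},
        "a": {"</s>": 0.3, "b": 0.7},
        "b": {"</s>": 0.9, "a": 0.1},
    }
    return table[seq[-1]]


best = beam_search(toy_step, "<s>", "</s>")
print(best)  # ['<s>', 'a', 'b', '</s>']
```

Keeping the scorer separate from the search loop means the ranking rule can be swapped without touching the sampling logic.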

Data

Gluon NLP Toolkit provides tools for building efficient data pipelines for NLP
tasks by defining a Dataset class interface and utilities for transforming
datasets.
Several datasets are included by default and will be automatically downloaded
when used.
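
As a rough illustration of the idea of a dataset interface with chained transformations, the sketch below uses a small pure-Python stand-in. `ListDataset` and its `transform` method are hypothetical and only mimic the general shape of such an interface (indexable, sized, transformable); they are not the toolkit's classes.

```python
class ListDataset:
    """Hypothetical stand-in for a dataset: indexable and sized."""

    def __init__(self, samples):
        self._samples = list(samples)

    def __getitem__(self, idx):
        return self._samples[idx]

    def __len__(self):
        return len(self._samples)

    def transform(self, fn):
        """Return a new dataset with `fn` applied to every sample."""
        return ListDataset(fn(sample) for sample in self._samples)


raw = ListDataset(["the cat sat", "a dog barked loudly"])

# Step 1: split each sentence into tokens.
tokenized = raw.transform(str.split)

# Step 2: build a vocabulary and map tokens to integer ids.
tokens = {tok for i in range(len(tokenized)) for tok in tokenized[i]}
vocab = {tok: idx for idx, tok in enumerate(sorted(tokens))}
encoded = tokenized.transform(lambda toks: [vocab[t] for t in toks])

print(encoded[0])  # [6, 2, 5]
print(encoded[1])  # [0, 3, 1, 4]
```

Each transformation returns a new dataset, so the raw data stays available for other pipelines.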

Language modeling with WikiText
- WikiText is a popular language modeling dataset from Salesforce. It is a
  collection of over 100 million tokens extracted from the set of verified
  Good and Featured articles on Wikipedia.
Sentiment Analysis with IMDB
- IMDB is a popular dataset for binary sentiment classification. It
  provides a set of 25,000 highly polar movie reviews for training, 25,000 for
  testing, and additional unlabeled data.
CoNLL datasets
- These datasets include data for the shared tasks, such as part-of-speech
  (POS) tagging, chunking, named entity recognition (NER), semantic role
  labeling (SRL), etc.
- We provide built-in support for CoNLL 2000 – 2002 and 2004, as well as the
  Universal Dependencies dataset, which is used in the 2017 and 2018
  competitions.
Word embedding evaluation datasets
- Several datasets are commonly used for the intrinsic evaluation of word
  embeddings. We provide the common ones for the similarity and analogy
  evaluation tasks.
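
For readers new to intrinsic evaluation, the following pure-Python sketch shows what the similarity and analogy tasks compute. The two-dimensional vectors in `toy` are made up for illustration, and `cosine` and `solve_analogy` are hypothetical helpers rather than toolkit functions. The analogy solver uses the common vector-offset approach, which is an assumption here and not necessarily what the evaluation scripts do.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(emb, a, b, c):
    """Answer "a is to b as c is to ?" with the vector-offset method.

    Picks the word whose vector is closest to emb[b] - emb[a] + emb[c],
    excluding the three query words themselves.
    """
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


# Made-up embeddings: first dimension ~ "royalty", second ~ "maleness".
toy = {
    "man": [0.1, 0.9],
    "woman": [0.1, 0.1],
    "king": [0.9, 0.9],
    "queen": [0.9, 0.1],
    "apple": [0.1, 0.5],
}

# Similarity task: score a word pair, to be compared with human ratings.
pair_score = cosine(toy["king"], toy["queen"])

# Analogy task: man is to king as woman is to ...?
answer = solve_analogy(toy, "man", "king", "woman")
print(answer)  # queen
```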

Gluon NLP further ships with common dataset transformation functions,
dataset samplers that determine how to iterate through datasets, and
functions to generate data batches.
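
The division of labor between a sampler and a batch-generating function can be sketched in plain Python. The sampler below groups samples of similar length, in the spirit of bucketing, and the batch function pads each group to a common width. Both `length_sorted_batches` and `pad_batch` are hypothetical names for this example and not part of the toolkit.

```python
def length_sorted_batches(lengths, batch_size):
    """Hypothetical sampler: return batches of indices with similar lengths."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def pad_batch(sequences, pad_value=0):
    """Hypothetical batch function: right-pad to the longest sequence."""
    width = max(len(seq) for seq in sequences)
    return [seq + [pad_value] * (width - len(seq)) for seq in sequences]


data = [[1, 2, 3, 4], [5], [6, 7, 8], [9, 10]]

# The sampler decides which samples go together ...
index_batches = length_sorted_batches([len(seq) for seq in data], batch_size=2)

# ... and the batch function turns each group into a rectangular batch.
batches = [pad_batch([data[i] for i in idx]) for idx in index_batches]

print(index_batches)  # [[1, 3], [2, 0]]
print(batches[0])     # [[5, 0], [9, 10]]
print(batches[1])     # [[6, 7, 8, 0], [1, 2, 3, 4]]
```

Grouping by length keeps the amount of padding per batch small, which is the main motivation for bucketing.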

A complete and up-to-date list of supplied datasets and utilities is available
in the API documentation.

Other features

Examples and scripts

The Gluon NLP toolkit also provides scripts that use the functionality of the
toolkit for various tasks:

- Word Embedding Evaluation
- Beam Search Generator
- Word language modeling
- Sentiment Analysis through Fine-tuning, with Bucketing
- Machine Translation

v0.2.0
