This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

v0.2.0

@leezu leezu released this 04 May 18:24
· 769 commits to master since this release

Features

GluonNLP provides its users with easy access to

  • State of the art models
  • Pre-trained word embeddings
  • Many public datasets for different tasks
  • Examples friendly to users that are new to the task
  • Reproducible training scripts

Models

Gluon NLP Toolkit supplies model definitions for common NLP tasks. These can be
adapted to the user's requirements or taken as a blueprint for new developments.
All of them are implemented as Gluon Blocks, allowing easy reuse as
plug-and-play neural network building blocks.

Data

Gluon NLP Toolkit provides tools for building efficient data pipelines for NLP
tasks by defining a Dataset class interface and utilities for transforming
datasets.
Several datasets are included by default and will be automatically downloaded
when used.

  • Language modeling with WikiText
    • WikiText is a popular language modeling dataset from Salesforce. It is a
      collection of over 100 million tokens extracted from the set of verified
      Good and Featured articles on Wikipedia.
  • Sentiment Analysis with IMDB
    • IMDB is a popular dataset for binary sentiment classification. It
      provides a set of 25,000 highly polar movie reviews for training, 25,000 for
      testing, and additional unlabeled data.
  • CoNLL datasets
    • These datasets cover the CoNLL shared tasks, such as part-of-speech
      (POS) tagging, chunking, named entity recognition (NER), and semantic
      role labeling (SRL).
    • We provide built-in support for CoNLL 2000 – 2002, 2004, as well as the
      Universal Dependencies dataset, which is used in the 2017 and 2018
      competitions.
  • Word embedding evaluation datasets
    • Several datasets are commonly used for the intrinsic evaluation of
      word embeddings. We provide datasets for the similarity and analogy
      evaluation tasks.
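
As a rough illustration of what a similarity evaluation measures, the sketch
below scores word pairs by the cosine similarity of their vectors and compares
the resulting ranking with human ratings using Spearman rank correlation. It is
plain Python and does not use the toolkit's API. The toy embeddings, the word
pairs, the ratings, and the helper names (cosine, ranks, spearman) are all
made up for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical 2-dimensional embeddings and human similarity ratings.
embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "truck": [0.2, 0.8],
}
pairs = [
    ("cat", "dog", 9.0),
    ("car", "truck", 8.5),
    ("cat", "car", 1.0),
    ("dog", "truck", 2.0),
]

model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in pairs]
human_scores = [rating for _, _, rating in pairs]
rho = spearman(model_scores, human_scores)
```

A correlation close to 1 means the embedding ranks the pairs in the same order
as the human ratings.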

Gluon NLP further ships with common data transformation functions, dataset
samplers that determine how to iterate through datasets, and functions to
generate data batches.
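
The pieces described above (a dataset interface, a transformation, a sampler,
and a batching function) can be sketched in plain Python as follows. This is a
conceptual illustration only, not the toolkit's API: SimpleDataset,
bucket_batches, and pad_batch are hypothetical names, and the bucketing
strategy shown (sort by length, then slice) is just one simple option.

```python
import random

class SimpleDataset:
    """Hypothetical minimal dataset: indexable, sized, and transformable."""

    def __init__(self, samples):
        self._samples = list(samples)

    def __getitem__(self, idx):
        return self._samples[idx]

    def __len__(self):
        return len(self._samples)

    def transform(self, fn):
        """Return a new dataset with fn applied to every sample."""
        return SimpleDataset(fn(s) for s in self._samples)

def bucket_batches(lengths, batch_size, shuffle=False, seed=0):
    """Sampler: group indices of similar-length samples into batches."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    if shuffle:
        # Shuffle whole batches so similar lengths stay together.
        random.Random(seed).shuffle(batches)
    return batches

def pad_batch(sequences, pad_value=0):
    """Batching function: right-pad sequences to the longest one."""
    max_len = max(len(s) for s in sequences)
    return [list(s) + [pad_value] * (max_len - len(s)) for s in sequences]

# Toy data: whitespace-tokenized text mapped to integer ids (0 is padding).
raw = SimpleDataset(["a b c", "d", "e f", "g h i j"])
vocab = {tok: i + 1 for i, tok in enumerate("a b c d e f g h i j".split())}
encoded = raw.transform(lambda text: [vocab[t] for t in text.split()])

lengths = [len(encoded[i]) for i in range(len(encoded))]
batches = [
    pad_batch([encoded[i] for i in indices])
    for indices in bucket_batches(lengths, batch_size=2)
]
```

Grouping samples of similar length keeps the amount of padding per batch
small, which is the usual motivation for bucketing.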

A complete and up-to-date list of supplied datasets and utilities is available
in the API documentation.

Other features

Examples and scripts

The Gluon NLP toolkit also provides scripts that use the functionality of the
toolkit for various tasks:

  • Word Embedding Evaluation
  • Beam Search Generator
  • Word language modeling
  • Sentiment Analysis through Fine-tuning, with Bucketing
  • Machine Translation
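
For readers new to beam search, the following is a minimal, generic sketch of
the algorithm in plain Python. It is not the toolkit's script or API: the
beam_search function, the step_fn callback, and the toy transition table are
assumptions made for illustration, and refinements such as length
normalization are left out.

```python
import math

def beam_search(step_fn, start, eos, beam_size=2, max_len=5):
    """Return (sequence, log_prob) hypotheses, best first.

    step_fn(seq) must return a dict mapping each possible next token
    to its probability given the sequence so far.
    """
    beams = [([start], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, prob in step_fn(seq).items():
                if prob > 0:
                    candidates.append((seq + [tok], score + math.log(prob)))
        # Keep only the beam_size highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    # Hypotheses that never produced eos are returned as they are.
    finished.extend(beams)
    return sorted(finished, key=lambda c: c[1], reverse=True)

# Hypothetical toy model: the next-token distribution depends only on
# the last token of the sequence.
TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"</s>": 0.3, "b": 0.7},
    "b": {"</s>": 0.9, "a": 0.1},
}

def toy_step(seq):
    return TABLE[seq[-1]]

results = beam_search(toy_step, "<s>", "</s>", beam_size=2, max_len=5)
best_seq, best_score = results[0]
```

In a real generator, step_fn would call a trained model, such as a language
model or a translation decoder, instead of looking up a fixed table.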