Electra With Memory-Efficient Compositional Embeddings

Introduction

ELECTRA is a method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens from "fake" input tokens generated by another neural network, similar to the discriminator of a GAN. At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the SQuAD 2.0 dataset.

For a detailed description and experimental results, please refer to the ICLR 2020 paper ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators.
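
The core idea can be illustrated with a toy sketch of the replaced-token detection objective. This is a conceptual illustration only, not this repository's training loop; the token IDs and helper function below are made up for the example.

    import tensorflow as tf

    def discriminator_labels(original_ids, corrupted_ids):
        # 1 where the generator replaced the original token, 0 where it is unchanged
        return tf.cast(tf.not_equal(original_ids, corrupted_ids), tf.int32)

    original_ids = tf.constant([[101, 2023, 2003, 1037, 7953, 102]])
    corrupted_ids = tf.constant([[101, 2023, 2001, 1037, 7953, 102]])  # one token swapped by a small generator
    print(discriminator_labels(original_ids, corrupted_ids))  # [[0 0 1 0 0 0]]

The discriminator is then trained with a per-token binary loss against labels of this form.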

Compositional embeddings using complementary partitions are a relatively recent approach for reducing embedding size in an end-to-end fashion: complementary partitions of the category set are exploited to produce a unique embedding vector for each category without storing one explicitly. It is an effective memory-efficient technique for models with massive vocabularies or high-cardinality categorical features, which can otherwise become memory bottlenecks during training. The authors show that the information loss from the composed embeddings is minimal compared to full embeddings, and that the quotient-remainder trick used in the paper is more effective than the earlier hashing trick.

For a detailed description and experimental results, please refer to the paper Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems.
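
For intuition, here is a minimal TensorFlow sketch of the quotient-remainder trick, not the layer implemented in this repository. Each id indexes two small tables, one by id // m and one by id % m; because these two partitions are complementary, every id maps to a unique (quotient, remainder) pair and therefore a unique composed vector. The class name, bucket count m = 175, and the element-wise product used to combine the two vectors are illustrative choices.

    import tensorflow as tf

    class QuotientRemainderEmbedding(tf.keras.layers.Layer):
        """Illustrative compositional embedding built from quotient and remainder partitions."""
        def __init__(self, vocab_size, embed_dim, num_buckets):
            super().__init__()
            self.num_buckets = num_buckets
            quotient_size = (vocab_size + num_buckets - 1) // num_buckets  # ceil(vocab_size / m)
            self.quotient_table = tf.keras.layers.Embedding(quotient_size, embed_dim)
            self.remainder_table = tf.keras.layers.Embedding(num_buckets, embed_dim)

        def call(self, ids):
            q = ids // self.num_buckets   # quotient partition index
            r = ids % self.num_buckets    # remainder partition index
            return self.quotient_table(q) * self.remainder_table(r)

    # Two small tables, roughly (172 + 175) * 128 weights instead of 30000 * 128 for a full table.
    layer = QuotientRemainderEmbedding(vocab_size=30000, embed_dim=128, num_buckets=175)
    vectors = layer(tf.constant([[1, 42, 29999]]))  # shape (1, 3, 128)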

Motivation

This repository contains code to pre-train ELECTRA, with an option to use memory-efficient compositional embeddings for datasets with very large vocabularies. The repository currently only supports text data from CSV files on a single GPU. An example of fine-tuning ELECTRA on a sentiment classification task is provided. This code is well suited for researchers working with non-English datasets, recommendation engine data, MIDI files, etc.

Pretraining

Use pretraining.py to pre-train an ELECTRA model. It has the following arguments:

  • --raw_data_loc (optional): location of the raw CSV file containing the text sentences.
  • --col_name (optional): name of the text column in the dataset to use for pretraining.
  • --working_dir (optional): directory in which to store model weights, configs, and vocabulary tokens.
  • --hparams (optional): a dict containing model hyperparameters. See Pretraining_Config.py under the Configs folder for the supported default hyperparameters. To override any of the defaults, pass them as a dictionary, for example --hparams {"hparam1": value1, "hparam2": value2, ...}. A sample command is shown after this list.
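
A pretraining run might look like the following. The data path, column name, working directory, and the hyperparameter names inside the --hparams override are placeholders; check Pretraining_Config.py for the keys this repository actually supports. The dict is quoted so the shell passes it as a single argument.

    python pretraining.py \
        --raw_data_loc data/train.csv \
        --col_name text \
        --working_dir ./electra_working_dir \
        --hparams '{"embedding_size": 128, "train_batch_size": 32}'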

For example notebooks that pre-train the ELECTRA model, see Pretraining.ipynb and Pretraining_Compositional_Embeddings.ipynb.

Fine Tuning

Use FineTuning.py to fine-tune the pre-trained ELECTRA model for a sentiment classification task. It has the following arguments:

  • --raw_data_loc (optional): location of the raw CSV file containing the text sentences.
  • --working_dir (optional): directory from which to load the pretrained model weights, configs, and vocabulary tokens.
  • --hparams (optional): a dict containing model hyperparameters. See Finetuning_Config.py under the Configs folder for the supported default hyperparameters. To override any of the defaults, pass them as a dictionary, for example --hparams {"hparam1": value1, "hparam2": value2, ...}. A sample command is shown after this list.
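
The command mirrors the pretraining one. The data path, working directory, and the hyperparameter name below are placeholders; consult Finetuning_Config.py for the supported keys.

    python FineTuning.py \
        --raw_data_loc data/sentiment.csv \
        --working_dir ./electra_working_dir \
        --hparams '{"num_epochs": 3}'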

For example notebooks that fine-tune the ELECTRA model, see FineTuning.ipynb and FineTuning_Compositional_Embeddings.ipynb.

Setup

Fork this repository and run the pretraining example (instructions above) on the data provided to familiarize yourself with the repository and its parameters. To use it on your own data, create a CSV file with a column of text. Feel free to change the code as needed to support your own requirements.

Contact Info

For issues related to the repository, please raise a GitHub issue or contact me at keshavbhandari@gmail.com

Please star the repository if you find it useful! Thanks :)

About

A custom TensorFlow implementation of Google's ELECTRA NLP model with compositional embeddings using complementary partitions.
