Use cases:
- Text generation.
- Text classification/sentiment analysis.
- Text summarisation.
- Text rewriting/paraphrasing.
- Text clustering.
- Embeddings generation.
- Translation.
- Text generation with Transformer-XL:
python pytorch-transformers/examples/run_generation.py \
    --model_type=transfo-xl \
    --length=100 \
    --model_name_or_path=transfo-xl-wt103
- Text generation with XLNet:
python pytorch-transformers/examples/run_generation.py \
    --model_type=xlnet \
    --length=50 \
    --model_name_or_path=xlnet-base-cased
- Text completion steps:
- Tokenize and index the text as a sequence of numbers.
- Pass it to the pre-trained GPT-2 model, e.g. PyTorch's GPT2LMHeadModel (see the sketch after these steps).
- Get predictions.
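A minimal sketch of the completion steps above with pytorch-transformers; the prompt text and the greedy argmax decoding are illustrative assumptions, not the only option:

import torch
from pytorch_transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
model.eval()

# Step 1: tokenize and index the text as a sequence of numbers
indexed_tokens = tokenizer.encode("The weather today is")  # example prompt
tokens_tensor = torch.tensor([indexed_tokens])

# Step 2: pass it to the pre-trained GPT-2 model
with torch.no_grad():
    logits = model(tokens_tensor)[0]

# Step 3: get predictions - greedily pick the most likely next token
predicted_index = torch.argmax(logits[0, -1, :]).item()
print(tokenizer.decode(indexed_tokens + [predicted_index]))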
- Text generation with GPT-2:
python pytorch-transformers/examples/run_generation.py \
    --model_type=gpt2 \
    --length=100 \
    --model_name_or_path=gpt2
Universal Language Model Fine-Tuning - ULMFiT
Steps:
- Data prep.
- Creating an LM learner and fine-tuning it on top of the pre-trained language model.
- Get predictions with the fine-tuned model.
Implementations in spaCy and fastai (fastai sketch below).
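A minimal sketch of the ULMFiT steps with the fastai v1 text API; the data path, CSV name, and training schedule are assumptions for illustration:

from fastai.text import *

# Data prep: build a language-model DataBunch from a CSV of texts
# ('data/' and 'texts.csv' are placeholder names)
data_lm = TextLMDataBunch.from_csv('data/', 'texts.csv')

# Create the LM learner seeded with the pre-trained AWD-LSTM weights
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)

# Fine-tune: train the new head first, then unfreeze and train the full model
learn.fit_one_cycle(1, 1e-2)
learn.unfreeze()
learn.fit_one_cycle(1, 1e-3)

# Get predictions with the fine-tuned model
print(learn.predict("This movie was", n_words=10))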
- Masked language modeling steps:
- Text tokenisation.
- Convert tokens into a sequence of integers.
- Use BERT's masked language model, e.g. PyTorch's BertForMaskedLM (sketch below).
- Get predictions.
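A minimal sketch of these masked-LM steps with pytorch-transformers; the sentence and the bert-base-uncased checkpoint are example choices:

import torch
from pytorch_transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()

# Tokenise the text, then convert tokens into a sequence of integers
tokens = tokenizer.tokenize("[CLS] Paris is the capital of [MASK] . [SEP]")
masked_index = tokens.index('[MASK]')
tokens_tensor = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

# Get predictions for the masked position
with torch.no_grad():
    logits = model(tokens_tensor)[0]
predicted_index = torch.argmax(logits[0, masked_index]).item()
print(tokenizer.convert_ids_to_tokens([predicted_index])[0])  # expected: 'france'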
Embeddings from Language Model - ELMo
- NLP framework by AllenNLP. Word vectors are computed with a 2-layer bidirectional language model (biLM); each layer runs both a forward and a backward pass over the text.
- Represents word embeddings using the complete sentence, thus capturing the context in which a word is used, unlike GloVe and Word2Vec.
- Captures latent syntactic-semantic information from text.
- Gives a word's embedding based on its surrounding text (sketch below).
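A minimal sketch with AllenNLP's ElmoEmbedder (assumes allennlp 0.x; the default pre-trained biLM weights are downloaded on first use):

from allennlp.commands.elmo import ElmoEmbedder

elmo = ElmoEmbedder()  # loads the default pre-trained biLM

# One vector per token per biLM layer: shape (3, num_tokens, 1024).
# The same word gets different vectors in different sentences, capturing context.
vectors = elmo.embed_sentence(["I", "ate", "an", "apple"])
print(vectors.shape)  # (3, 4, 1024)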