Skip to content

RoBERTa finetuned on Enron email dataset & aeslc email dataset

Notifications You must be signed in to change notification settings

yonsei-sslab/EmailBERT

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

EmailBERT

Pretrained RoBERTa finetuned on Enron email dataset & aeslc email dataset with Masked Language Modeling train objective.

How to use

from transformers import AutoTokenizer, pipeline

model_checkpoint = "snoop2head/EmailBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
mask_filler = pipeline(
    "fill-mask", model=model_checkpoint
)
text = f"nPlease {tokenizer.mask_token} the following distribution list with updates:\nPhillip Allen (pallen@enron.com)\nMike Grigsby (mike.grigsby@enron.com)\nKeith Holst (kholst@enron.com)\nMonique Sanchez\nThank you for your help\nPhillip Allen'"

preds = mask_filler(text)

for pred in preds:
    print(f"Mask Filled: {pred['sequence']}")

References

About

RoBERTa finetuned on Enron email dataset & aeslc email dataset

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages