EmailBERT

A pretrained RoBERTa model fine-tuned on the Enron and AESLC email datasets with a masked language modeling (MLM) training objective.
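To give a feel for the MLM objective, here is a minimal sketch of the masking step in plain Python. It is not the code in train.py: the function name `mask_tokens`, the `<mask>` string, and the 0.15 masking rate are illustrative assumptions (0.15 is a common default, but this README does not state the value used). It also always substitutes the mask token, whereas BERT-style training usually mixes in random and unchanged tokens.

```python
import random


def mask_tokens(tokens, mask_token="<mask>", mlm_probability=0.15, seed=None):
    """Randomly hide tokens and keep the originals as prediction targets.

    Hypothetical helper for illustration only; all defaults are assumptions.
    Returns (masked_tokens, labels), where labels[i] is the original token
    at masked positions and None everywhere else.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mlm_probability:
            masked.append(mask_token)  # the model sees the mask token here
            labels.append(tok)         # and is trained to recover this token
        else:
            masked.append(tok)
            labels.append(None)        # position is ignored by the loss
    return masked, labels
```

During fine-tuning the model is scored only on the positions where a label is present, which is why the fill-mask usage below works on email-style text.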

How to use

from transformers import AutoTokenizer, pipeline

model_checkpoint = "snoop2head/EmailBERT-base"

# The tokenizer is loaded only to look up the model's mask token.
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
mask_filler = pipeline("fill-mask", model=model_checkpoint)

# Sample email body with one masked word for the model to fill in.
text = f"Please {tokenizer.mask_token} the following distribution list with updates:\nPhillip Allen (pallen@enron.com)\nMike Grigsby (mike.grigsby@enron.com)\nKeith Holst (kholst@enron.com)\nMonique Sanchez\nThank you for your help\nPhillip Allen"

# Each prediction is a dict; "sequence" holds the text with the mask filled in.
preds = mask_filler(text)

for pred in preds:
    print(f"Mask Filled: {pred['sequence']}")
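For a single masked position, the fill-mask pipeline returns a list of dicts that also carry a `token_str` and a `score` for each candidate. The sketch below shows one way to rank those candidates. The helper name `summarize_predictions` and its `min_score` parameter are hypothetical, not part of this repository or of transformers.

```python
def summarize_predictions(preds, min_score=0.0):
    """Return (token, score) pairs sorted from most to least likely.

    Hypothetical helper: `preds` is the list of dicts produced by a
    fill-mask pipeline call on text containing exactly one mask token.
    """
    rows = [
        (p["token_str"].strip(), float(p["score"]))
        for p in preds
        if p["score"] >= min_score  # drop low-confidence candidates
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Example usage with the objects defined above (downloads the model):
# for token, score in summarize_predictions(mask_filler(text, top_k=3)):
#     print(f"{token}: {score:.3f}")
```

Passing `top_k` to the pipeline call controls how many candidates come back, so it is the simplest knob for a shorter or longer list.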

References

Dataset: https://huggingface.co/datasets/snoop2head/enron_aeslc_emails
MLM Finetuning: https://huggingface.co/course/chapter7/3?fw=pt
