This repository contains links to the models pretrained in the paper *Downstream Datasets Make Surprisingly Good Pretraining Corpora*. Each model is pretrained from scratch on text from the train split of a popular downstream dataset. The models are hosted on the Hugging Face Hub, and the table below lists their links along with links to the datasets used for pretraining.
| Pretraining dataset | Corpus size (MB) | ELECTRA model | RoBERTa model |
|---|---|---|---|
| CoNLL-2012 | 6.4 | link | link |
| SQuAD-v1.1 | 19 | link | link |
| SWAG | 22 | link | link |
| AG News | 27 | link | link |
| HellaSwag | 30 | link | link |
| QQP | 43 | link | link |
| Jigsaw | 59 | link | link |
| MNLI | 65 | link | link |
| Sentiment140 | 114 | link | link |
| PAWS | 139 | link | link |
| DBPedia14 | 151 | link | link |
| Yahoo Answers Topics | 461 | link | link |
| Discovery | 293 | link | link |
| Amazon Polarity | 1427 | link | link |
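
The checkpoints can be loaded with the Hugging Face `transformers` library. The snippet below is only a minimal sketch: `MODEL_ID` is a placeholder for the repository id behind the corresponding link in the table above, not an actual model name.

```python
from transformers import AutoModel, AutoTokenizer

# Placeholder: substitute the Hub repository id from the table above.
MODEL_ID = "<hub-repo-id>"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)  # works for both the ELECTRA and RoBERTa checkpoints

# The checkpoints are pretrained-only encoders, so typical use is to fine-tune
# them on a downstream task or to extract contextual representations:
inputs = tokenizer("Downstream datasets make surprisingly good pretraining corpora.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch size, sequence length, hidden size)
```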
The pretraining hyperparameters for the two model types are given below:

| Hyperparameter | ELECTRA | RoBERTa |
|---|---|---|
| Size | Small | Base |
| Parameter count | 14M | 110M |
| Training steps | 1M | 100K |
| Warmup steps | 10K | 6K |
| Batch size | 128 | 512 |
| Peak learning rate | 5e-4 | 5e-4 |
| Sequence length | 128 | 512 |
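
As an illustration only, the RoBERTa column of the table can be expressed roughly with the `transformers` `Trainer` API as sketched below. The output directory, per-device batch size, and gradient-accumulation split are placeholders chosen to reach the effective batch size of 512; the data pipeline and the paper's actual training code are not shown and may differ.

```python
from transformers import RobertaConfig, RobertaForMaskedLM, TrainingArguments

# A base-sized RoBERTa initialized from scratch (no pretrained weights),
# matching the "Base" size in the table above.
config = RobertaConfig()
model = RobertaForMaskedLM(config)

training_args = TrainingArguments(
    output_dir="roberta-from-scratch",   # placeholder
    max_steps=100_000,                   # training steps
    warmup_steps=6_000,                  # warmup steps
    learning_rate=5e-4,                  # peak learning rate
    per_device_train_batch_size=64,      # illustrative split: 64 x 8 accumulation
    gradient_accumulation_steps=8,       #   = effective batch size of 512
)
```

A masked-language-modeling data collator and a tokenized corpus would then be passed to `Trainer` together with these arguments; the ELECTRA models use a different (generator–discriminator) pretraining objective and are not covered by this sketch.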
More details can be found in the paper: https://arxiv.org/abs/2209.14389
If you use these models, please cite the paper using the BibTeX entry below:
    @article{krishna2022downstream,
      title={Downstream datasets make surprisingly good pretraining corpora},
      author={Krishna, Kundan and Garg, Saurabh and Bigham, Jeffrey P and Lipton, Zachary C},
      journal={arXiv preprint arXiv:2209.14389},
      year={2022}
    }