This repository has been archived by the owner on Jan 23, 2024. It is now read-only.

hebrew-gpt_neo

Hebrew text generation models based on EleutherAI's gpt-neo. Each was trained on a TPUv3-8 made available to me through the TPU Research Cloud Program.

JS Colab notebook: Open in Google Colab

Gradio Colab notebook: Open in Google Colab

Datasets

  1. An assortment of various Hebrew corpora - I have made it available here

  2. oscar / unshuffled_deduplicated_he - Homepage | Dataset Permalink

The Open Super-large Crawled ALMAnaCH coRpus is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.

Models

hebrew-gpt_neo-xl

hebrew-gpt_neo-small

hebrew-gpt_neo-tiny
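
As a rough illustration of how one of these checkpoints might be used for generation, here is a minimal sketch with the Hugging Face `transformers` library. It assumes the checkpoint is loadable through `AutoModelForCausalLM`, which this README does not confirm. `MODEL_ID` is a hypothetical placeholder, not a real identifier, and the sampling values are illustrative defaults rather than settings recommended by the author.

```python
from typing import Any, Dict

# Hypothetical placeholder: this README does not state a hub identifier or
# local path, so replace it with wherever the checkpoint actually lives.
MODEL_ID = "<namespace>/hebrew-gpt_neo-small"


def generation_kwargs(
    max_new_tokens: int = 50,
    temperature: float = 0.9,
    top_p: float = 0.95,
) -> Dict[str, Any]:
    """Build sampling settings for generate(); the values are illustrative only."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    return {
        "do_sample": True,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }


def generate(prompt: str, model_id: str = MODEL_ID) -> str:
    """Load the tokenizer and model, then sample one continuation of `prompt`."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        pad_token_id=tokenizer.eos_token_id,  # avoids the missing-pad-token warning
        **generation_kwargs(),
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("שלום")` with a valid `model_id` would download the weights on first use. The smaller checkpoints are the more practical choice on CPU.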