From 2432efda47257828caa55d0844b4577218e02f4b Mon Sep 17 00:00:00 2001
From: Ali Safaya <alisafaya@gmail.com>
Date: Wed, 11 Aug 2021 17:42:49 +0300
Subject: [PATCH] Update README.md

---
 README.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/README.md b/README.md
index 3f3f734..e44fe46 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,7 @@
 # trnews-64 dataset
 
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5180654.svg)](https://doi.org/10.5281/zenodo.5180654)
+
 __trnews-64__ is a character language modeling dataset that contain 64 million words of news articles and columns.
 It can be utilized as a benchmark for different modeling long range dependencies in Turkish language.
 
@@ -9,6 +11,16 @@ This dataset contains a mix of news articles from different topics and journals
 
 This dataset was preprocessed and clean from infrequent characters. The main character set is shared in the file `tr.charset.json`, which contains 124 characters in total. This includes Turkish upper/lower case characters along with punctuations and some other common characters. 
 
+## Download
+
+The dataset is hosted on [Zenodo](https://zenodo.org/) and can be downloaded with the following commands:
+
+```bash
+wget -O trnews-64.tar.bz2 "https://zenodo.org/record/5180654/files/trnews-64.tar.bz2?download=1"
+tar -xf trnews-64.tar.bz2
+rm trnews-64.tar.bz2
+```
+
 ## Details
 
 Dataset splits are shared in raw text format and the articles are seperated by empty lines.
@@ -58,6 +70,10 @@ with open("trnews-64.test.raw") as fi:
 }
 ```
 
+## License
+
+This dataset is licensed under the [Creative Commons Attribution 4.0 International](./LICENSE) license.
+
 ## Contact
 
 Ali Safaya (alisafaya at gmail dot com).