Commit

Add Thai Dialect Corpus
wannaphong authored May 12, 2024
1 parent 68598c3 commit 80a500a
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion docs/tasks/speech-recognition.md
@@ -13,7 +13,8 @@
| Lotus Cell | Thai Speech corpus over the phone. (not full corpus) | 11 hours | CC BY-SA-NC 3.0 | NECTEC | [Mirror from @korakot: GitHub](https://github.com/korakot/corpus/releases/download/v1.0/LOTUS-cell-v1.0.zip) |
| Thai Elderly Speech dataset by Data Wow and VISAI | Thai Elderly Speech dataset, consisting of 17 hours 11 minutes (19,200 files). The files are divided into 2 categories: Health care (health issues and services) and Smart Home (using Smart Home devices in household contexts). | 17 hours 11 minutes | CC BY-SA 4.0 | VISAI AI Company Limited and Data Wow Company Limited | [VISAI AI Company Limited and Data Wow Company Limited](https://github.com/VISAI-DATAWOW/Thai-Elderly-Speech-dataset/releases/tag/v1.0.0) |
| FLEURS | Fleurs is the speech version of the FLoRes machine translation benchmark. We use 2009 n-way parallel sentences from the FLoRes dev and devtest publicly available sets, in 102 languages. | | CC BY | Google | [huggingface](https://huggingface.co/datasets/google/fleurs) |
-| XTREME-S | The Cross-lingual TRansfer Evaluation of Multilingual Encoders for Speech (XTREME-S) benchmark is a benchmark designed to evaluate speech representations across languages, tasks, domains and data regimes. It covers 102 languages from 10+ language families, 3 different domains and 4 task families: speech recognition, translation, classification and retrieval. | | CC BY | Google | [huggingface](https://huggingface.co/datasets/google/xtreme_s) |
+| XTREME-S | The Cross-lingual TRansfer Evaluation of Multilingual Encoders for Speech (XTREME-S) benchmark is a benchmark designed to evaluate speech representations across languages, tasks, domains and data regimes. It covers 102 languages from 10+ language families, 3 different domains and 4 task families: speech recognition, translation, classification and retrieval. | | CC-BY-SA 4.0 | Google | [huggingface](https://huggingface.co/datasets/google/xtreme_s) |
+| Thai Dialect Corpus | Corpus of Central Thai dialect and three other Thai dialects (Khummuang, Korat, and Pattani). | | CC BY | Chulalongkorn University | [GitHub](https://github.com/SLSCU/thai-dialect-corpus) |


### Software

