Update README.md
nickm197 authored Mar 28, 2021
1 parent 21b1084 commit 8a61687
Showing 1 changed file with 11 additions and 1 deletion.
12 changes: 11 additions & 1 deletion README.md
@@ -16,7 +16,17 @@ Each utterance has been first transcribed by an open-source ASR. The transcripti

For each human transcriber, a transcription pipeline is built by the transcription system. For quality-control purposes, 5% of the utterances were taken from an existing spoken corpus (Mozilla Common Voice).

Each utterance has been transcribed by two human transcribers. Where the relative WER between the two transcriptions exceeded 5%, a third transcriber resolved the conflict.
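
The agreement check described above could be sketched roughly as follows. This is an illustration only, not the dataset's actual tooling: the function names `word_error_rate` and `needs_third_transcriber` are hypothetical, and the sketch assumes that "relative WER" means the word-level edit distance between the two transcriptions divided by the length of the first one.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        # Assumption: an empty reference only matches an empty hypothesis.
        return 0.0 if not hyp else 1.0
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def needs_third_transcriber(first: str, second: str, threshold: float = 0.05) -> bool:
    """Flag a pair of transcriptions whose WER exceeds the threshold (5% here)."""
    # Assumption: the first transcription is treated as the reference.
    return word_error_rate(first, second) > threshold
```

Note that WER computed this way is not symmetric, so swapping the two transcriptions can change the result; how the original pipeline handled this is not stated.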

# Normalized Alphabets
The alphabets have been normalized as per the table below:
Language | Alphabet
---------|----------
French | azertyuiopqsdfghjklmùwxcvbné'èçàêôâûœ
Spanish | abcdefghijklmnñopqrstuvwxyzáéíóúüé
Arabic | أنت سيرإلىمحةاقثعهذفبئضودجصكخشزطءغظآؤ
Turkish | abcçdefgğhıijklmnoöprsştuüvyz
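
As a rough illustration of how the table above might be used, the sketch below reports any characters in a transcription that fall outside the alphabet for its language. It is not part of the dataset tooling: the names `ALPHABETS` and `out_of_alphabet` are hypothetical, the space character is assumed to be allowed as a word separator, and the Arabic row is left out of the snippet for brevity.

```python
# Alphabets copied from the table above (Arabic omitted for brevity).
ALPHABETS = {
    "French": "azertyuiopqsdfghjklmùwxcvbné'èçàêôâûœ",
    "Spanish": "abcdefghijklmnñopqrstuvwxyzáéíóúüé",
    "Turkish": "abcçdefgğhıijklmnoöprsştuüvyz",
}


def out_of_alphabet(text: str, language: str) -> list[str]:
    """Return the sorted characters of `text` not in the language's alphabet."""
    # Assumption: a plain space is permitted between words.
    allowed = set(ALPHABETS[language]) | {" "}
    return sorted(set(text) - allowed)
```

For example, `out_of_alphabet("hola, señor!", "Spanish")` returns the two punctuation marks, while `out_of_alphabet("hola señor", "Spanish")` returns an empty list.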


# License and copyright
The MediaSpeech dataset is distributed under the Creative Commons Attribution 4.0 International License. The copyright remains with the original owners of the video.