From 8a616873cd6d0b4abffdbcd9e9eaebe0dee22ddc Mon Sep 17 00:00:00 2001
From: Nick Mikhailovsky
Date: Sun, 28 Mar 2021 14:05:33 +0300
Subject: [PATCH] Update README.md

---
 README.md | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 083ec6c..097eafe 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,17 @@ Each utterance has been first transcribed by an open-source ASR. The transcripti
 For each human transcriber, a transcription pipeline is built by the transcription system.
 For the quality control purposes, 5% of the utterances were taken from an existing spoken corpus (Mozilla Common Voice)
-Each utterance has been transcribed by two human transcribers. In the case where the relative WER of transcriptions was over 5%, the third transcriber resolved the conflict.
+Each utterance has been transcribed by two human transcribers. In the case where the relative WER of transcriptions was over 5%, the third transcriber resolved the conflict.
+
+# Normalized Alphabets
+The alphabets have been normalized as per the table below:
+Language | Alphabet
+---------|----------
+French | azertyuiopqsdfghjklmùwxcvbné'èçàêôâûœ
+Spanish | abcdefghijklmnñopqrstuvwxyzáéíóúüé
+Arabic | أنت سيرإلىمحةاقثعهذفبئضودجصكخشزطءغظآؤ
+Turkish | abcçdefgğhıijklmnoöprsştuüvyz
+
 # License and copyright
 The MediaSpeech dataset is distributed under the Creative Commons Attribution 4.0 International License. The copyright remains with the original owners of the video.
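
The README text in this patch says that a third transcriber resolves the conflict when the relative WER between the two human transcriptions is over 5%. The patch does not define "relative WER", so the following is only a minimal sketch under one possible reading: the word-level edit distance between the two transcriptions, divided by the word count of the longer one. The function names (`word_edit_distance`, `relative_wer`, `needs_third_transcriber`) and the normalization are assumptions for illustration, not the dataset authors' actual pipeline.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(
                prev[j] + 1,         # delete a word from a
                curr[j - 1] + 1,     # insert a word from b
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def relative_wer(first, second):
    """Assumed definition: edit distance over the longer transcript's length."""
    a, b = first.split(), second.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty transcripts are treated as identical
    return word_edit_distance(a, b) / longest


def needs_third_transcriber(first, second, threshold=0.05):
    """True when the two transcriptions disagree by more than the threshold."""
    return relative_wer(first, second) > threshold
```

Under this reading, two transcriptions of a four-word utterance that differ in one word have a relative WER of 0.25 and would go to a third transcriber. Dividing by the longer transcript keeps the measure symmetric, which seems reasonable when neither human transcription is the reference.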