3,881 Hours - Mandarin Spontaneous Speech Data, covering multiple subject areas. All speech audio was manually transcribed into text; speaker identity, gender, and other attributes are also annotated. This dataset can be used for voiceprint recognition model training, corpus construction for machine translation, algorithm research, etc.
For more details, please refer to the link: https://www.nexdata.ai/datasets/speechrecog/1024?source=Github
16 kHz, 16-bit, WAV, mono channel
Interview; Sports; Variety; Course; Entertainment; Service, etc.
Annotation of transcription text, speaker identity, and gender
Mandarin
Sentence Accuracy Rate (SAR) of no less than 95%
Speech recognition, video caption generation, and video content review
Commercial License
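
As a minimal sketch of how a user might confirm that a delivered audio file matches the format stated above (16 kHz, 16-bit, mono WAV), the check below uses only Python's standard `wave` module. The function name `check_wav_format` and the constant names are illustrative assumptions, not part of the dataset or any vendor tooling.

```python
import wave

# Expected values taken from the format line above: 16 kHz, 16-bit, mono WAV.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
EXPECTED_CHANNELS = 1            # mono


def check_wav_format(path):
    """Return a list of format mismatches; an empty list means the file matches.

    Hypothetical helper for illustration only.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {wav.getsampwidth() * 8} bits")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems
```

Calling `check_wav_format("sample.wav")` on a placeholder path of your own returns an empty list when the file matches the stated format, and otherwise a list describing each mismatch.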