(Simplified Chinese|English)

Pretrained Models Released on ModelScope

Model License

You are free to use, copy, modify, and share FunASR models under the conditions of this agreement. When using, copying, modifying, or sharing FunASR models, you should credit the model source and authors, and you should keep the relevant model names in [FunASR software]. For the full terms, see the license.

Model Usage

Refer to the docs.

Model Zoo

We provide several pretrained models trained on different datasets. Details of the models and datasets can be found on ModelScope.

Speech Recognition

Paraformer

| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|---|---|---|---|---|---|---|
| Paraformer-large | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Input wav duration <= 20s |
| Paraformer-large-long | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Handles input wav of arbitrary length |
| Paraformer-large-en-long | EN | Alibaba Speech Data (50000 hours) | 10020 | 220M | Offline | Handles input wav of arbitrary length |
| Paraformer-large-Spk | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Adds speaker diarization to ASR results; based on Paraformer-large-long |
| Paraformer-large-contextual | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Offline | Supports hotword customization based on incentive enhancement, improving hotword recall and precision |
| Paraformer | CN & EN | Alibaba Speech Data (50000 hours) | 8358 | 68M | Offline | Input wav duration <= 20s |
| Paraformer-online | CN & EN | Alibaba Speech Data (50000 hours) | 8404 | 68M | Online | Handles streaming input |
| Paraformer-large-online | CN & EN | Alibaba Speech Data (60000 hours) | 8404 | 220M | Online | Handles streaming input |
| Paraformer-tiny | CN | Alibaba Speech Data (200 hours) | 544 | 5.2M | Offline | Lightweight Paraformer model for Mandarin command-word recognition |
| Paraformer-aishell | CN | AISHELL (178 hours) | 4234 | 43M | Offline | |
| ParaformerBert-aishell | CN | AISHELL (178 hours) | 4234 | 43M | Offline | |
| Paraformer-aishell2 | CN | AISHELL-2 (1000 hours) | 5212 | 64M | Offline | |
| ParaformerBert-aishell2 | CN | AISHELL-2 (1000 hours) | 5212 | 64M | Offline | |
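
The Notes column implies a simple rule for choosing among the large CN & EN variants: the base model expects clips of at most 20 seconds, the long model takes arbitrary-length input, and the online model takes streaming input. A minimal sketch of that rule follows. The `Variant` class and `pick_variant` function are hypothetical helpers, not part of FunASR, the names are the table labels rather than ModelScope model IDs, and treating the online model as having no length limit is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str                     # label from the table, not a ModelScope model ID
    streaming: bool               # True for "Online" rows
    max_seconds: Optional[float]  # None means no stated length limit


# Rows taken from the Paraformer table (220M, CN & EN models only).
VARIANTS = [
    Variant("Paraformer-large", streaming=False, max_seconds=20.0),
    Variant("Paraformer-large-long", streaming=False, max_seconds=None),
    # Assumption: the table states no duration limit for the online model.
    Variant("Paraformer-large-online", streaming=True, max_seconds=None),
]


def pick_variant(duration_s: float, streaming: bool) -> str:
    """Return the first variant whose mode and length limit fit the input."""
    for v in VARIANTS:
        if v.streaming != streaming:
            continue
        if v.max_seconds is None or duration_s <= v.max_seconds:
            return v.name
    raise ValueError("no variant matches the requested mode and duration")


print(pick_variant(12.0, streaming=False))  # Paraformer-large
print(pick_variant(95.0, streaming=False))  # Paraformer-large-long
print(pick_variant(3.0, streaming=True))    # Paraformer-large-online
```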

UniASR [Unify Streaming and Non-streaming]

| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|---|---|---|---|---|---|---|
| UniASR | CN & EN | Alibaba Speech Data (60000 hours) | 8358 | 100M | Online | Unified streaming and non-streaming model |
| UniASR-large | CN & EN | Alibaba Speech Data (60000 hours) | 8358 | 220M | Offline | Unified streaming and non-streaming model |
| UniASR English | EN | Alibaba Speech Data (10000 hours) | 1080 | 95M | Online | Unified streaming and non-streaming model |
| UniASR Russian | RU | Alibaba Speech Data (5000 hours) | 1664 | 95M | Online | Unified streaming and non-streaming model |
| UniASR Japanese | JA | Alibaba Speech Data (5000 hours) | 5977 | 95M | Online | Unified streaming and non-streaming model |
| UniASR Korean | KO | Alibaba Speech Data (2000 hours) | 6400 | 95M | Online | Unified streaming and non-streaming model |
| UniASR Cantonese (CHS) | Cantonese (CHS) | Alibaba Speech Data (5000 hours) | 1468 | 95M | Online | Unified streaming and non-streaming model |
| UniASR Indonesian | ID | Alibaba Speech Data (1000 hours) | 1067 | 95M | Online | Unified streaming and non-streaming model |
| UniASR Vietnamese | VI | Alibaba Speech Data (1000 hours) | 1001 | 95M | Online | Unified streaming and non-streaming model |
| UniASR Spanish | ES | Alibaba Speech Data (1000 hours) | 3445 | 95M | Online | Unified streaming and non-streaming model |
| UniASR Portuguese | PT | Alibaba Speech Data (1000 hours) | 1617 | 95M | Online | Unified streaming and non-streaming model |
| UniASR French | FR | Alibaba Speech Data (1000 hours) | 3472 | 95M | Online | Unified streaming and non-streaming model |
| UniASR German | GE | Alibaba Speech Data (1000 hours) | 3690 | 95M | Online | Unified streaming and non-streaming model |
| UniASR Persian | FA | Alibaba Speech Data (1000 hours) | 1257 | 95M | Online | Unified streaming and non-streaming model |
| UniASR Burmese | MY | Alibaba Speech Data (1000 hours) | 696 | 95M | Online | Unified streaming and non-streaming model |
| UniASR Hebrew | HE | Alibaba Speech Data (1000 hours) | 1085 | 95M | Online | Unified streaming and non-streaming model |
| UniASR Urdu | UR | Alibaba Speech Data (1000 hours) | 877 | 95M | Online | Unified streaming and non-streaming model |
| UniASR Turkish | TR | Alibaba Speech Data (1000 hours) | 1582 | 95M | Online | Unified streaming and non-streaming model |

Conformer

| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|---|---|---|---|---|---|---|
| Conformer | CN | AISHELL (178 hours) | 4234 | 44M | Offline | Input wav duration <= 20s |
| Conformer | CN | AISHELL-2 (1000 hours) | 5212 | 44M | Offline | Input wav duration <= 20s |
| Conformer | EN | Alibaba Speech Data (10000 hours) | 4199 | 220M | Offline | Input wav duration <= 20s |

Multi-talker Speech Recognition

| Model Name | Language | Training Data | Vocab Size | Parameters | Offline/Online | Notes |
|---|---|---|---|---|---|---|
| MFCCA | CN | AliMeeting, AISHELL-4, Simudata (917 hours) | 4950 | 45M | Offline | Input wav duration <= 20s; input wav channels <= 8 |
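
The two limits in the MFCCA Notes column (duration at most 20 seconds, at most 8 channels) can be checked before a file is sent for recognition. The sketch below does this with Python's standard `wave` module. `check_mfcca_input` is a hypothetical helper, not a FunASR function, and it assumes uncompressed PCM WAV input.

```python
import wave

MAX_SECONDS = 20.0   # "Input wav duration <= 20s" from the table
MAX_CHANNELS = 8     # "input wav channels <= 8" from the table


def check_mfcca_input(wav_file):
    """Return a list of violated limits (empty if the file is acceptable).

    wav_file may be a path or a binary file object, as accepted by wave.open.
    """
    with wave.open(wav_file, "rb") as w:
        channels = w.getnchannels()
        duration = w.getnframes() / w.getframerate()

    problems = []
    if duration > MAX_SECONDS:
        problems.append(f"duration {duration:.2f}s exceeds {MAX_SECONDS:.0f}s")
    if channels > MAX_CHANNELS:
        problems.append(f"{channels} channels exceeds {MAX_CHANNELS}")
    return problems
```

An empty list means the file satisfies both limits. Longer recordings would need to be split first, for example at the boundaries produced by a voice activity detector.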

Voice Activity Detection

| Model Name | Training Data | Parameters | Sampling Rate (Hz) | Notes |
|---|---|---|---|---|
| FSMN-VAD | Alibaba Speech Data (5000 hours) | 0.4M | 16000 | |
| FSMN-VAD | Alibaba Speech Data (5000 hours) | 0.4M | 8000 | |

Punctuation Restoration

| Model Name | Language | Training Data | Parameters | Vocab Size | Offline/Online | Notes |
|---|---|---|---|---|---|---|
| CT-Transformer-Large | CN & EN | Alibaba Text Data (100M) | 1.1G | 471067 | Offline | Large offline punctuation model |
| CT-Transformer | CN & EN | Alibaba Text Data (70M) | 291M | 272727 | Offline | Offline punctuation model |
| CT-Transformer-Realtime | CN & EN | Alibaba Text Data (70M) | 288M | 272727 | Online | Online punctuation model |

Language Models

| Model Name | Training Data | Parameters | Vocab Size | Notes |
|---|---|---|---|---|
| Transformer | Alibaba Speech Data (?hours) | 57M | 8404 | |

Speaker Verification

| Model Name | Training Data | Parameters | Number of Speakers | Notes |
|---|---|---|---|---|
| Xvector | CNCeleb (1,200 hours) | 17.5M | 3465 | Xvector, speaker verification, Chinese |
| Xvector | CallHome (60 hours) | 61M | 6135 | Xvector, speaker verification, English |

Speaker Diarization

| Model Name | Training Data | Parameters | Notes |
|---|---|---|---|
| SOND | AliMeeting (120 hours) | 40.5M | Speaker diarization, profiles and records, Chinese |
| SOND | CallHome (60 hours) | 12M | Speaker diarization, profiles and records, English |

Timestamp Prediction

| Model Name | Language | Training Data | Parameters | Notes |
|---|---|---|---|---|
| TP-Aligner | CN | Alibaba Speech Data (50000 hours) | 37.8M | Timestamp prediction, Mandarin, medium size |

Inverse Text Normalization (ITN)

| Model Name | Language | Parameters | Notes |
|---|---|---|---|
| English | EN | 1.54M | ITN, ASR post-processing |
| Russian | RU | 17.79M | ITN, ASR post-processing |
| Japanese | JA | 6.8M | ITN, ASR post-processing |
| Korean | KO | 1.28M | ITN, ASR post-processing |
| Indonesian | ID | 2.06M | ITN, ASR post-processing |
| Vietnamese | VI | 0.92M | ITN, ASR post-processing |
| Tagalog | TL | 0.65M | ITN, ASR post-processing |
| Spanish | ES | 1.32M | ITN, ASR post-processing |
| Portuguese | PT | 1.28M | ITN, ASR post-processing |
| French | FR | 4.39M | ITN, ASR post-processing |
| German | GE | 3.95M | ITN, ASR post-processing |