Week 04

Practice & homework

  • Seminar:

Additional Materials

  • General:

    • an explanation of depthwise separable convolutions, with a beautiful visualization of the conv2d case, here
    • comparing end-to-end speech recognition architectures in 2021 - a blog post comparing CTC, LAS, and RNN-t models
    • Whisper paper - a large-scale ASR model from OpenAI (released in 2022) with a sophisticated attention-based decoder, trained on 680k hours of weakly supervised multilingual and multitask data
  • LAS:

    • original LAS model paper
    • a brief overview of the LAS paper on Medium
    • see the Whisper paper in the General section
  • RNN-t:

  • RNN-t optimizations:

    • Fast Conformer paper - fast conv2d subsampling with depthwise separable convolutions, 8x time reduction, and smaller convolution kernel sizes
    • multi-blank transducers paper ("Multi-blank Transducers for Speech Recognition") - add a big blank token to the dictionary and predict it during long pauses, skipping frames and saving computation
    • token-and-duration transducer paper - instead of predicting blank or big-blank tokens, predict every token together with its duration (NVIDIA uses this tuned decoder in its biggest model, Parakeet-TDT 1.1B)
    • RNN-t with a stateless prediction network paper - replace the LSTM prediction network with embeddings from a simple lookup table (e.g. torch.nn.Embedding)
    • more about prediction network architectures here
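The parameter savings that make depthwise separable convolutions attractive for conv2d subsampling can be checked with quick arithmetic; the channel and kernel sizes below are illustrative assumptions, not taken from any of the papers above:

```python
# Parameter counts: standard conv2d vs. depthwise separable conv2d.
# Channel/kernel sizes are made-up illustrative values.
c_in, c_out, k = 64, 128, 3

# Standard conv2d: every output channel mixes all input channels.
standard = c_in * c_out * k * k   # 64 * 128 * 9 = 73728

# Depthwise separable: a per-channel k*k conv, then a 1x1 pointwise mix.
depthwise = c_in * k * k          # 64 * 9 = 576
pointwise = c_in * c_out          # 64 * 128 = 8192
separable = depthwise + pointwise # 8768

print(standard, separable, round(standard / separable, 1))  # 73728 8768 8.4
```

The ~8x reduction grows with kernel size and channel count, which is why subsampling front-ends benefit so much from it.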
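The frame-skipping idea behind multi-blank and token-and-duration transducers can be sketched as a toy greedy loop; the `preds` list below is a hard-coded stand-in for real joint-network output, with `None` playing the role of blank:

```python
# Toy sketch of duration-based frame skipping (TDT-style greedy decode).
# `preds[t]` is an assumed (token, duration) prediction at frame t;
# a real model would produce these from acoustic features.
def greedy_tdt_decode(preds):
    hypothesis, t, steps = [], 0, 0
    while t < len(preds):
        token, duration = preds[t]
        if token is not None:
            hypothesis.append(token)
        t += max(1, duration)  # jump `duration` frames instead of 1
        steps += 1
    return hypothesis, steps

# 8 frames; one big-duration blank covers the long pause (frames 2-5).
preds = [("h", 1), ("i", 1), (None, 4), (None, 1),
         (None, 1), (None, 1), ("!", 1), (None, 1)]
print(greedy_tdt_decode(preds))  # (['h', 'i', '!'], 5)
```

Five decoding steps instead of eight: the saved steps are exactly the pause frames that a plain frame-by-frame transducer would still have to visit.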
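The stateless prediction network reduces the predictor to a lookup on the previous token only. This pure-Python sketch (toy vocabulary and embedding values, standing in for `torch.nn.Embedding`) shows the defining property, that the prediction ignores everything in the history except the last token:

```python
# Toy stateless prediction network: output depends only on the previous
# token, so the LSTM can be replaced by a lookup table.
# Vocabulary and embedding values are made up for illustration.
embedding_table = {
    "<blank>": [0.0, 0.0],
    "a": [0.1, 0.2],
    "b": [0.3, 0.4],
}

def stateless_prednet(history):
    # No recurrent state: only the last emitted token matters, which is
    # what makes the predictor cheap and its outputs cacheable.
    return embedding_table[history[-1]]

# Two different histories ending in the same token give the same prediction.
print(stateless_prednet(["a", "b"]) == stateless_prednet(["b", "b"]))  # True
```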