Skip to content

Latest commit

 

History

History
38 lines (34 loc) · 1.67 KB

FEATURES.tmp.md

File metadata and controls

38 lines (34 loc) · 1.67 KB

Functionalities

This file is here just as a support for development.

AUDIO

[TODO]:

  • speech to text
    1. to transcribe speech into text

      • INPUT:
        1. a datasets object with the audio recordings in the "audio" column
        2. the audio column (default = "audio")
        3. the speech to text service to use (including the name, the version, the revision, and - for some services only and sometimes it's optional - the language of the transcription model we want to use)
      • PREPROCESSING:
        1. adapt the language to the service format
        2. organize the dataset into batches
      • PROCESSING:
        1. transcribe the dataset
      • POSTPROCESSING:
        1. formatting the transcripts to follow a standard organization
      • OUTPUT:
        1. a new dataset including only the transcripts of the audios in a standardized json format (plus an index?)
      • TESTS:
        1. test input errors (a field is missing, the audio column exists and contains audio objects, params missing)
        2. test the transcript of a test file is ok
        3. test the language is supported (and the tool handles errors)
    2. to compute word error rate

      • INPUT:
        1. a dataset object with the "transcript" and the "groundtruth" columns
        2. a service with a name (default is jitter)
      • PROCESSING:
        1. computing the per-row WER between the 2 columns
      • OUTPUT:
        1. a dataset with the "WER" column
      • TESTS:
        1. test input errors (a field is missing, fields missing, the 2 columns don't contain strings)
        2. test output is ok