Releases: lhotse-speech/lhotse
0.2 - Towards the Base Camp
New features:
- `K2SpeechRecognitionIterableDataset` that supports more efficient batching #116
- Support for `torchaudio.sox_effects` data augmentation alongside WavAugment #124
Breaking changes:
- the data augmentation APIs in Lhotse now expect an `augment_fn` argument instead of `augmenter`; it should have a signature like `def augment_fn(samples: np.ndarray, sampling_rate: int) -> np.ndarray` #124
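Any callable matching that signature can be supplied. A minimal sketch of a conforming `augment_fn` is shown below; the noise-injection logic is purely illustrative and not part of Lhotse's API:

```python
import numpy as np


def augment_fn(samples: np.ndarray, sampling_rate: int) -> np.ndarray:
    """Illustrative augmentation with the expected signature.

    Adds low-level Gaussian noise to the waveform; any other
    transform (speed perturbation, reverberation, etc.) could be
    substituted as long as the signature is preserved.
    """
    rng = np.random.default_rng(0)
    noise = rng.normal(scale=0.001, size=samples.shape).astype(samples.dtype)
    return samples + noise
```

The return value keeps the input's shape and dtype, so downstream feature extraction can treat augmented and original audio uniformly.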
New corpora:
- Mobvoi Hotwords #109
Enhancements:
- progress bars for corpus downloads and feature extraction #131
- re-using cached LibriSpeech manifests for faster data preparation #133
- `LilcomFilesWriter` and `NumpyFilesWriter` use sub-directories for storage to reduce the filesystem load #134
Several bug fixes and improved testing.
0.1 - First Steps
“The journey of a thousand miles begins with one step.” – Lao Tzu
The first official release of Lhotse! It provides a solid base to build speech research and applications upon, by treating speech and audio data as a first-class citizen in the ML world.
Lhotse is going to continue to evolve, and some API changes might still happen.
Highlights:
- audio-specific data model with Recording, Supervision, Features, and Cut manifests
- integration with PyTorch for task-specific Dataset classes and Torchaudio for feature extraction
- built-in data preparation for 8 speech corpora, including Librispeech, Switchboard, AMI, and TED-LIUM v3
- intuitive interfaces that work well with interactive environments such as Jupyter notebooks for data visualisation
- on-the-fly or pre-computed feature extraction and data augmentation