Commit

feat: add readme to pyproject

KevinEloff committed Sep 27, 2023
1 parent 180d1b3 commit 2f9d403
Showing 2 changed files with 48 additions and 46 deletions.
92 changes: 46 additions & 46 deletions README.md
@@ -29,36 +29,36 @@ The steps to perform ASR inference with the trained checkpoint are simple:

1. **Install pip dependencies**: ensure `torch`, `torchaudio`, `numpy`, `omegaconf`, `fairseq`, `fastprogress`, `jiwer`, and `pandas` are installed (for the full training dependencies, see `requirements.txt`). Make sure you are using **Python 3.10 or above**, as this repo uses certain new features of Python 3.10. A one-line install is sketched below.
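   A minimal sketch of that install (versions unpinned; `fairseq` in particular may require a `torch` version compatible with your setup):
   ```
   pip install torch torchaudio numpy omegaconf fairseq fastprogress jiwer pandas
   ```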
2. **Load models**: load the trained TransFusion model and frozen WavLM encoder:
```python
import torch
import torchaudio

device = 'cpu' # or 'cuda' if you have enough GPU memory.
wavlm = torch.hub.load('RF5/transfusion-asr', 'wavlm_large', device=device)
transfusion = torch.hub.load('RF5/transfusion-asr', 'transfusion_small_462k', device=device)
```
3. **Compute WavLM features**: load a 16kHz waveform and compute the WavLM features:

```python
path = '<path to arbitrary 16kHz waveform>.wav'
x, sr = torchaudio.load(path)
assert sr == 16000
# get weighted WavLM features:
features = wavlm.extract_transfusion_features(x.to(device), wavlm) # (seq_len, dim)
```
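   If your audio is not already at 16kHz, the `assert` above will fail; a minimal sketch of resampling first with `torchaudio` (assuming a reasonably recent version):
   ```python
   import torchaudio.functional as F

   if sr != 16000:
       x = F.resample(x, orig_freq=sr, new_freq=16000)  # resample to the 16kHz rate WavLM expects
       sr = 16000
   ```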
4. **Predict transcript**: Perform multinomial diffusion using all the additional techniques from the paper:

```python
pred_inds, pred_text = transfusion.perform_simple_inference(
    transfusion,          # pass in model to use in diffusion
    features[None],       # add batch dimension to features
    transfusion.diffuser, # diffuser containing diffusion parameters
    transfusion.vocab,    # vocab for converting indices to text / text to indices
    transfusion.cfg       # model/diffusion config dict
)
print(pred_text)
# prints out the predicted transcript of your utterance!
```

That's it, trivial!
You can modify the diffusion parameters using the `DSH` class in `transfusion/score.py` and in the diffuser config. By default it uses the optimal settings found in the paper.
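For example, a hypothetical sketch of overriding one of those settings (the attribute name `guidance_w` is illustrative only, not confirmed; check `transfusion/score.py` for the real fields):

```python
from transfusion.score import DSH  # diffusion sampling hyperparameters

DSH.guidance_w = 0.5  # hypothetical attribute name, shown only to illustrate overriding DSH defaults
```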
@@ -89,33 +89,33 @@ Before training, one needs to prepare the data. The steps to do that for the LibriSpeech dataset are:

2. Then extract the WavLM features with the `extract.py` script:

   ```
   usage: python -m wavlm.extract [--librispeech_path PATH/TO/LIBRISPEECH] [--ckpt_path PATH/TO/WAVLM_LARGE_CKPT] [--out_path PATH/TO/FEAT]

   required arguments:
     --librispeech_path    root path of librispeech dataset
     --out_path            target directory to save WavLM features into
     --ckpt_path           path to pretrained WavLM checkpoint

   optional arguments:
     --seed
     --device
   ```
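   For example, with placeholder paths (these exact paths are illustrative):
   ```
   python -m wavlm.extract --librispeech_path data/LibriSpeech --ckpt_path checkpoints/WavLM-Large.pt --out_path data/wavlm_features
   ```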

3. Split data into train, validation, and test splits using the `split_data.py` script:

   ```
   usage: split_data.py --librispeech_path LIBRISPEECH_PATH --ls_wavlm_path LS_WAVLM_PATH [--include_test]

   Generate train & valid csvs from dataset directories

   options:
     --librispeech_path LIBRISPEECH_PATH
                           path to root of librispeech dataset
     --ls_wavlm_path LS_WAVLM_PATH
                           path to root of WavLM features extracted using extract.py
     --include_test        include processing and saving test.csv for test subsets
   ```
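   For example (again with illustrative paths):
   ```
   python split_data.py --librispeech_path data/LibriSpeech --ls_wavlm_path data/wavlm_features --include_test
   ```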

Running this will save the train/valid/test csv files and a vocabulary dict as `vocab.pt` into a `./splits/` folder.

2 changes: 2 additions & 0 deletions pyproject.toml
@@ -1,6 +1,8 @@
[project]
name = "transfusion"
version = "0.1.0"
readme = "README.md"
description = "TransFusion: Transcribing Speech with Multinomial Diffusion"

[build-system]
requires = ["setuptools"]
