This repository lets you perform DNN-based speech enhancement (SE) in the frequency domain using various methods.
First, you create noisy data by mixing clean speech with noise. This dataset is used to train the network.
You can then adjust the network type and its configuration in various ways, as shown below.
The results of the network can be evaluated through various objective metrics (PESQ, STOI, CSIG, CBAK, COVL).
You can change
This repository was tested on Ubuntu 20.04 with:
- Python 3.7
- CUDA 11.1
- cuDNN 8.0.5
- PyTorch 1.9.0
- Install the necessary libraries
- Make a dataset for train and validation
# The shape of the dataset is [data_num, 2 (inputs and targets), sampling_frequency * data_length].
# For example, 1,000 three-second clips sampled at 16 kHz give the shape [1000, 2, 48000].
- Set dataloader.py
self.input_path = "DATASET_FILE_PATH"
- Set config.py
# If you need to adjust any settings, simply change this file.
# When you run this project for the first time, you need to set the path where the model and logs will be saved.
- Run train_interface.py
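The dataset layout described in the steps above can be sketched as follows. This is only an illustration under stated assumptions, not the repository's own data-preparation script: the helper names `mix_at_snr` and `build_dataset` are hypothetical, the noisy input is assumed to come first and the clean target second along the size-2 axis, and random arrays stand in for real recordings.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add it."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against silent noise clips
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

def build_dataset(clean_clips, noise_clips, snr_db=5.0):
    """Stack (noisy input, clean target) pairs into [data_num, 2, num_samples]."""
    pairs = [
        np.stack([mix_at_snr(c, n, snr_db), c])  # assumed order: input, then target
        for c, n in zip(clean_clips, noise_clips)
    ]
    return np.stack(pairs).astype(np.float32)

# Toy data standing in for real recordings: 4 clips, 3 seconds each at 16 kHz.
fs, seconds, data_num = 16000, 3, 4
rng = np.random.default_rng(0)
clean_clips = rng.standard_normal((data_num, fs * seconds))
noise_clips = rng.standard_normal((data_num, fs * seconds))

dataset = build_dataset(clean_clips, noise_clips, snr_db=5.0)
print(dataset.shape)  # (4, 2, 48000)
```

How the array is saved and loaded depends on dataloader.py, so check that file before choosing a file format.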
'SE_tutorials.ipynb' is provided as a tutorial.
With this Colab notebook you can train the CRN without any preparation.
The settings you can adjust in config.py are listed below:
- Real network
- convolutional recurrent network (CRN), a real-valued version of DCCRN
- FullSubNet [1]
- Complex network
- deep complex convolutional recurrent network (DCCRN) [2]
- T-F masking
- Spectral mapping
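The difference between the two options above lies in how the network output is interpreted. The sketch below is a minimal NumPy illustration, not the repository's code: the function `apply_network_output` and the mode strings `"masking"` and `"mapping"` are hypothetical, an oracle ratio mask stands in for a trained network, and magnitudes are assumed to add linearly for simplicity.

```python
import numpy as np

def apply_network_output(noisy_mag, net_output, mode):
    """Turn a network output into an enhanced magnitude spectrogram."""
    if mode == "masking":
        # T-F masking: the output is a mask that scales each time-frequency bin.
        return net_output * noisy_mag
    if mode == "mapping":
        # Spectral mapping: the output is used directly as the enhanced spectrogram.
        return net_output
    raise ValueError(f"unknown mode: {mode}")

rng = np.random.default_rng(0)
clean_mag = rng.random((257, 100))   # [freq_bins, frames], toy magnitudes
noise_mag = rng.random((257, 100))
noisy_mag = clean_mag + noise_mag    # simplification: additive magnitudes

# Oracle outputs standing in for what a trained network would predict.
oracle_mask = clean_mag / (noisy_mag + 1e-8)
masked = apply_network_output(noisy_mag, oracle_mask, "masking")
mapped = apply_network_output(noisy_mag, clean_mag, "mapping")

print(np.allclose(masked, clean_mag, atol=1e-6))  # True
print(np.allclose(mapped, clean_mag))             # True
```

With masking the enhanced spectrogram is tied to the noisy input through the mask, whereas with mapping the network has to produce the full spectrogram itself.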
- MSE
- SDR
- SI-SNR
- SI-SDR
You can also combine these loss functions with a perceptual loss:
- LMS
- PMSQE
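Of the loss functions listed above, SI-SNR is easy to write down explicitly. The following is a framework-agnostic NumPy sketch of the commonly used definition for 1-D signals. It is an assumption about what the metric computes, not a copy of this repository's implementation, and the names `si_snr` and `si_snr_loss` are hypothetical.

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB for 1-D signals (higher is better)."""
    # Remove the mean so the measure ignores any DC offset.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the target-aligned component.
    s_target = (np.dot(estimate, target) / (np.dot(target, target) + eps)) * target
    e_noise = estimate - s_target
    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return 10 * np.log10(ratio)

def si_snr_loss(estimate, target):
    """Negate SI-SNR so that minimizing the loss maximizes the metric."""
    return -si_snr(estimate, target)

rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
good = target + 0.1 * rng.standard_normal(16000)  # lightly corrupted estimate
bad = target + 1.0 * rng.standard_normal(16000)   # heavily corrupted estimate

print(si_snr(good, target) > si_snr(bad, target))  # True
```

Because the estimate is projected onto the target, rescaling the estimate leaves the value essentially unchanged, which is what "scale-invariant" refers to. For training, the same arithmetic would be written with tensor operations so that gradients can flow.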
As shown below, 'write_on_tensorboard.py' lets you check in real time whether the network is training well.
- loss
- pesq, stoi
- spectrogram
[1] FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement
Xiang Hao, Xiangdong Su, Radu Horaud, Xiaofei Li
[arXiv] [code]
[2] DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement
Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie
[arXiv] [code]
Other tools
https://github.com/usimarit/semetrics
https://ecs.utdallas.edu/loizou/speech/software.htm