DNN-based Speech Enhancement in the frequency domain

This repository lets you perform DNN-based speech enhancement (SE) in the frequency domain using various methods.
First, create noisy data by mixing clean speech with noise; this dataset is used to train the network.
You can then choose the network type and adjust its configuration in several ways, as listed below.
The network's output can be evaluated with several objective metrics (PESQ, STOI, CSIG, CBAK, COVL).

You can change:

Networks
Learning methods
Loss functions

Requirements

This repository was tested on Ubuntu 20.04 with:

Python 3.7
CUDA 11.1
cuDNN 8.0.5
PyTorch 1.9.0

Getting Started

Install the necessary libraries

Make a dataset for training and validation

# The shape of the dataset
[data_num, 2 (inputs and targets), sampling_frequency * data_length]   

# For example, to use 1,000 three-second clips at a sampling frequency of 16 kHz, the shape is:
[1000, 2, 48000]
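
To illustrate the shape above, here is a minimal sketch of how such an array could be assembled with NumPy. It is not necessarily how generate_noisy_data.py works: the function names `mix_at_snr` and `build_dataset`, the 5 dB SNR, and the synthetic Gaussian signals are all assumptions made for illustration. It also assumes that index 0 of the second axis holds the noisy input and index 1 the clean target.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested SNR in dB (hypothetical helper)."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that brings the noise power to clean_power / 10^(snr_db / 10).
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise

def build_dataset(clean_clips, noise_clips, snr_db):
    """Stack (noisy, clean) pairs into an array of shape [data_num, 2, samples]."""
    pairs = [
        np.stack([mix_at_snr(c, n, snr_db), c])
        for c, n in zip(clean_clips, noise_clips)
    ]
    return np.stack(pairs).astype(np.float32)

# Synthetic stand-ins for real recordings: 4 clips, 3 seconds at 16 kHz.
rng = np.random.default_rng(0)
fs, seconds, data_num = 16000, 3, 4
clean_clips = rng.standard_normal((data_num, fs * seconds))
noise_clips = rng.standard_normal((data_num, fs * seconds))

dataset = build_dataset(clean_clips, noise_clips, snr_db=5.0)
print(dataset.shape)  # (4, 2, 48000)
```

With real data, `clean_clips` and `noise_clips` would be equal-length segments cut from your speech and noise recordings.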

Set dataloader.py
```
self.input_path = "DATASET_FILE_PATH"
```

Set config.py

# If you need to adjust any settings, simply change this file.   
# When you run this project for the first time, you need to set the path where the model and logs will be saved.

Run train_interface.py

Tutorials

'SE_tutorials.ipynb' is provided as a tutorial.
You can train the CRN with this Colab notebook without any preparation.

Networks

You can select and adjust the network in config.py. The available networks are:

Real network
- convolutional recurrent network (CRN)
  a real-valued version of DCCRN
- FullSubNet [1]
Complex network
- deep complex convolutional recurrent network (DCCRN) [2]

Learning Methods

T-F masking
Spectral mapping
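
The difference between the two learning methods can be sketched as follows. This is a simplified NumPy illustration for the real-valued magnitude case, not the repository's implementation: the function names, the random stand-in spectrogram, and the reuse of the noisy phase are assumptions. A complex network such as DCCRN would operate on the real and imaginary parts rather than on the magnitude alone.

```python
import numpy as np

def enhance_by_masking(noisy_spec, mask):
    """T-F masking: the network output is a mask applied to the noisy magnitude."""
    magnitude = mask * np.abs(noisy_spec)
    # Reuse the noisy phase to rebuild a complex spectrogram.
    return magnitude * np.exp(1j * np.angle(noisy_spec))

def enhance_by_mapping(noisy_spec, predicted_magnitude):
    """Spectral mapping: the network output is taken as the clean magnitude itself."""
    return predicted_magnitude * np.exp(1j * np.angle(noisy_spec))

rng = np.random.default_rng(0)
# Stand-in complex spectrogram: 257 frequency bins x 100 frames.
noisy_spec = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))

# Stand-ins for network outputs (a real model would predict these).
mask = rng.uniform(0.0, 1.0, size=noisy_spec.shape)
predicted_magnitude = mask * np.abs(noisy_spec)

masked = enhance_by_masking(noisy_spec, mask)
mapped = enhance_by_mapping(noisy_spec, predicted_magnitude)
```

In masking, the network only has to learn how much of each time-frequency bin to keep; in mapping, it has to produce the clean magnitude directly.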

Loss Functions

MSE
SDR
SI-SNR
SI-SDR

You can also combine these loss functions with a perceptual loss:

LMS
PMSQE
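
As an illustration of one of the losses listed above, SI-SNR can be sketched as follows. This is a NumPy version written for readability, not the code in tools_for_loss.py; the function names and the `eps` value are assumptions, and training would need an equivalent written with PyTorch tensors so that gradients can flow.

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D waveforms."""
    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - np.mean(estimate)
    target = target - np.mean(target)
    # Project the estimate onto the target to get the scaled target component.
    s_target = np.dot(estimate, target) / (np.dot(target, target) + eps) * target
    e_noise = estimate - s_target
    return 10 * np.log10((np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps))

def si_snr_loss(estimate, target):
    """Negative SI-SNR, so that minimizing the loss maximizes SI-SNR."""
    return -si_snr(estimate, target)

rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
estimate = target + 0.1 * rng.standard_normal(16000)

print(si_snr(estimate, target))
print(si_snr(3.0 * estimate, target))  # rescaling the estimate leaves the score unchanged
```

Because the estimate is projected onto the target before the ratio is taken, the score does not depend on the overall gain of the estimate, which is what distinguishes it from a plain SNR or MSE.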

Tensorboard

You can monitor training in real time with TensorBoard through 'write_on_tensorboard.py'. The following are displayed:

loss
pesq, stoi
spectrogram

Reference

[1] FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement
Xiang Hao, Xiangdong Su, Radu Horaud, Xiaofei Li
[2] DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement
Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie

Other tools
https://github.com/usimarit/semetrics
https://ecs.utdallas.edu/loizou/speech/software.htm

