GitHub - bytedance/neurst: Neural end-to-end Speech Translation Toolkit

The primary motivation of NeurST is to facilitate NLP researchers to get started on end-to-end speech translation (ST) and build advanced neural machine translation (NMT) models.

See here for a full list of NeurST examples. And we present recent progress of end-to-end ST technology at https://st-benchmark.github.io/.

NeurST is based on TensorFlow2 and we are working on the pytorch version.

NeurST News

March 29, 2022: Release of GigaST dataset: a large-scale speech translation corpus.

Aug 16, 2021: Release of models and results for IWSLT 2021 offline ST and simultaneous translation task.

June 15, 2021: Integration of LightSeq for training speedup, see the experimental branch.

March 28, 2021: The v0.1.1 release includes the instructions of weight pruning and quantization aware training for transformer models, and several more features. See the release note for more details.

Dec. 25, 2020: The v0.1.0 release includes the overall design of the code structure and recipes for training end-to-end ST models. See the release note for more details.

Highlights

Production ready: The model trained by NeurST can be directly exported as TF savedmodel format and use TensorFlow-serving. There is no gap between the research model and production model. Additionally, one can use LightSeq for NeurST model serving with a much lower latency.
Light weight: NeurST is designed specifically for end-to-end ST and NMT models, with clean and simple code. It has no dependency on Kaldi, which simplifies installation and usage.
Extensibility and scalability: NeurST has the careful design for extensibility and scalability. It allows users to customize Model, Task, Dataset etc. and combine each other.
High computation efficiency: NeurST has high computation efficiency and can be further optimized by enabling mixed-precision and XLA. Fast distributed training using Byteps / Horovod is also supported for large-scale scenarios.
Reliable and reproducible benchmarks: NeurST reports strong baselines with well-designed hyper-parameters on several benchmark datasets (MT&ST). It provides a series of recipes to reproduce them.

Pretrained Models & Performance Benchmarks

NeurST provides reference implementations of various models and benchmarks. Please see examples for model links and NeurST benchmark on different datasets.

Text Translation
- Transformer on WMT14 en->de
Speech-to-Text Translation
- libri-trans
- MuST-C

Requirements and Installation

Python version >= 3.6
TensorFlow >= 2.3.0

Install NeurST from source:

git clone https://github.com/bytedance/neurst.git
cd neurst/
pip3 install -e .

If there exists ImportError during running, manually install the required packages at that time.

Citation

@InProceedings{zhao2021neurst,
  author       = {Chengqi Zhao and Mingxuan Wang and Qianqian Dong and Rong Ye and Lei Li},
  booktitle    = {the 59th Annual Meeting of the Association for Computational Linguistics (ACL): System Demonstrations},
  title        = {{NeurST}: Neural Speech Translation Toolkit},
  year         = {2021},
  month        = aug,
}

Contact

Any questions or suggestions, please feel free to contact us: zhaochengqi.d@bytedance.com, wangmingxuan.89@bytedance.com.

Acknowledgement

We thank Bairen Yi, Zherui Liu, Yulu Jia, Yibo Zhu, Jiaze Chen, Jiangtao Feng, Zewei Sun for their kind help.

Name		Name	Last commit message	Last commit date
Latest commit History 104 Commits
datasets/GigaST		datasets/GigaST
examples		examples
neurst		neurst
neurst_pt		neurst_pt
tests		tests
.editorconfig		.editorconfig
.flake8		.flake8
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
requirements.apt.txt		requirements.apt.txt
requirements.txt		requirements.txt
run.sh		run.sh
run_cli.sh		run_cli.sh
setup.py		setup.py
tox.ini		tox.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NeurST News

Highlights

Pretrained Models & Performance Benchmarks

Requirements and Installation

Citation

Contact

Acknowledgement

About

Releases 2

Packages

Contributors 5

Languages

License

bytedance/neurst

Folders and files

Latest commit

History

Repository files navigation

NeurST News

Highlights

Pretrained Models & Performance Benchmarks

Requirements and Installation

Citation

Contact

Acknowledgement

About

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Contributors 5

Languages

Packages