
Overview

wav2letter++ is a fast, open-source speech processing toolkit from the Speech team at Facebook AI Research, built to facilitate research in end-to-end models for speech recognition. It is written entirely in C++ and uses the ArrayFire tensor library and the flashlight machine learning library for maximum efficiency. Our approach is detailed in our arXiv paper (arXiv:1812.07625, cited below).

To get started with wav2letter++, check out the tutorials section of this wiki.

We also provide complete recipes for WSJ, TIMIT, and LibriSpeech that reproduce previously published papers: data preparation steps, model training and decoding recipes, and pre-trained models. All of them can be found in the recipes folder.

Finally, we provide Python bindings for a subset of wav2letter++: featurization, the beam search decoder, and the Auto Segmentation (ASG) criterion.

Citation

If you use wav2letter++ in your research, please cite it as:

@article{pratap2018w2l,
  author          = {Vineel Pratap and Awni Hannun and Qiantong Xu and Jeff Cai and Jacob Kahn and Gabriel Synnaeve and Vitaliy Liptchinsky and Ronan Collobert},
  title           = {wav2letter++: The Fastest Open-source Speech Recognition System},
  journal         = {CoRR},
  volume          = {abs/1812.07625},
  year            = {2018},
  url             = {https://arxiv.org/abs/1812.07625},
}