This page contains audio samples from https://github.com/bshall/UniversalVocoding,
a PyTorch implementation of "Robust Universal Neural Vocoding" (Jaime Lorenzo-Trueba et al., 2018).
Notable differences from the paper:
- Trained on 16kHz audio from 102 different speakers (ZeroSpeech 2019: TTS without T English dataset)
- The model generates 9-bit mu-law audio (a 10-bit model is planned)
- Uses an embedding layer instead of one-hot encoding
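The 9-bit mu-law output above means the vocoder predicts one of 2^9 = 512 quantized levels per sample, which the mu-law companding curve spaces more finely near zero amplitude. A minimal NumPy sketch of this quantization (not the repo's actual code; function names here are illustrative):

```python
import numpy as np

def mulaw_encode(x, bits=9):
    """Quantize audio in [-1, 1] to 2**bits mu-law levels (0 .. 2**bits - 1)."""
    mu = 2 ** bits - 1
    # Compress amplitude with the mu-law curve, then map [-1, 1] to integer bins.
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((y + 1) / 2 * mu + 0.5).astype(np.int64)

def mulaw_decode(q, bits=9):
    """Invert mulaw_encode: integer bins back to floats in [-1, 1]."""
    mu = 2 ** bits - 1
    y = 2 * q.astype(np.float64) / mu - 1
    return np.sign(y) / mu * ((1 + mu) ** np.abs(y) - 1)
```

Because the 512 levels are non-uniform, quantization noise is smallest for quiet samples, where speech spends most of its time; a 10-bit model would halve the step size again at the cost of a harder 1024-way classification.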
In-Domain Speakers
Speaker - V001
Speaker - V002
Out-of-Domain Speakers (English)
Speaker - S002
Speaker - S005
Speaker - S030
Out-of-Domain Speakers (Surprise Austronesian Language)