
Fig 1: VQ-VAE model architecture.
All audio samples are generated using the scripts and pretrained weights at https://github.com/bshall/ZeroSpeech.
English samples
Speaker - V001
| V001 | | | other conversions | | | |
|---|---|---|---|---|---|---|
| source | converted | target | S040 | S056 | S074 | S090 |
Speaker - V002
| V002 | | | other conversions | | | |
|---|---|---|---|---|---|---|
| source | converted | target | S040 | S056 | S074 | S090 |
Indonesian samples
Speaker - V001
| V001 | | | other conversions | | | |
|---|---|---|---|---|---|---|
| source | converted | target | S028 | S110 | S112 | S154 |