ZedDist codec samples
From zedwiki
Introduction
As part of ZedNext, the list of supported codecs is being revised. WAV, MP3, Speex and AMR-WB+ are on the shortlist for inclusion in the next revision of the specification. This page lets you compare those codecs when they are used to encode the same sample.
Methodology
A few different voices were selected for the review. These voices are meant to reproduce the variety found in digital talking books. Each voice is presented in uncompressed format as well as in three different codecs, each with two different encodings. Each compressed sample is presented in its native format and also converted back to WAV, for ease of listening for those who don't have access to a player that supports those formats.
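The sample matrix described above (one uncompressed original plus three codecs at two settings each) can be enumerated with a short script. This is only a sketch: the naming pattern is inferred from the native-format file lists further down this page, and the `VARIANTS` table and `native_files` helper are hypothetical names introduced for illustration.

```python
# Codec variants per voice: (codec label, rate tag used in the file name, extension).
# The naming pattern is an assumption inferred from the file lists on this page.
VARIANTS = [
    ("Speex", "17", "spx"),
    ("Speex", "28", "spx"),
    ("AMR-WB+", "10", "3gp"),
    ("AMR-WB+", "24", "3gp"),
    ("MP3", "32", "mp3"),
    ("MP3", "64", "mp3"),
]


def native_files(voice):
    """Return the uncompressed original followed by the six compressed files."""
    files = [f"{voice}.wav"]
    files += [f"{voice}{tag}.{ext}" for _, tag, ext in VARIANTS]
    return files


if __name__ == "__main__":
    for name in native_files("Allison"):
        print(name)
```

Each voice therefore has seven native files to compare: the original and six compressed versions.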
Speex
From the speex.org website: "Speex is an Open Source/Free Software patent-free audio compression format designed for speech. The Speex Project aims to lower the barrier of entry for voice applications by providing a free alternative to expensive proprietary speech codecs. Moreover, Speex is well-adapted to Internet applications and provides useful features that are not present in most other codecs. Finally, Speex is part of the GNU Project and is available under the revised BSD license."
To encode the samples with Speex, they were first down-sampled to a 16 kHz sampling rate in Sound Forge, and the levels were raised so that the highest peak is within 1 dB of clipping. The samples were then encoded in "wide-band" mode at 16.8 and 27.8 kbps. A typical command line for running the encoder is:
speexenc %1.wav %1.spx -w --bitrate %2 --comp 10 --qual 10 -V
where the bitrate was either 17000 or 28000 (these requested values correspond to the 16.8 and 27.8 kbps rates given above). Speex encoding and decoding were done with encoders and decoders compiled in July 2009 with Intel SSE extensions, available from the speex.org website.
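For batch work, the same command can be assembled from a script. The following is a minimal sketch, not the procedure actually used for these samples: the flags are copied unchanged from the command line quoted above, while the `speexenc_args` and `encode` helpers and the separate input and output paths are assumptions made for illustration.

```python
import subprocess


def speexenc_args(wav_path, spx_path, bitrate):
    """Build the argument list for one wide-band Speex encode.

    The flags mirror the command line quoted above; none are added.
    """
    return [
        "speexenc", wav_path, spx_path,
        "-w", "--bitrate", str(bitrate),
        "--comp", "10", "--qual", "10", "-V",
    ]


def encode(wav_path, spx_path, bitrate):
    """Run the encoder. Assumes speexenc is on PATH and wav_path exists."""
    subprocess.run(speexenc_args(wav_path, spx_path, bitrate), check=True)


if __name__ == "__main__":
    # Print the two commands instead of running them.
    for tag, rate in (("17", 17000), ("28", 28000)):
        print(" ".join(speexenc_args("Allison.wav", f"Allison{tag}.spx", rate)))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.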
Note: the links to the .spx files are not working yet and should be available shortly.
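The level adjustment mentioned above was done in Sound Forge, but the arithmetic behind raising the highest peak to within 1 dB of clipping can be sketched as follows. The assumptions here are that samples are floats with clipping at 1.0 and that the target is exactly -1 dB relative to full scale; `peak_gain` and `normalize` are hypothetical helper names.

```python
import math


def peak_gain(samples, target_dbfs=-1.0):
    """Linear gain that moves the highest absolute peak to target_dbfs.

    Samples are floats with full scale (clipping) at 1.0.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return 1.0  # silence: nothing to scale
    target_linear = 10.0 ** (target_dbfs / 20.0)
    return target_linear / peak


def normalize(samples, target_dbfs=-1.0):
    """Apply the peak gain to every sample."""
    gain = peak_gain(samples, target_dbfs)
    return [s * gain for s in samples]


if __name__ == "__main__":
    quiet = [0.0, 0.25, -0.5, 0.1]
    louder = normalize(quiet)
    peak_db = 20.0 * math.log10(max(abs(s) for s in louder))
    print(f"peak after normalization: {peak_db:.2f} dBFS")
```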
AMR-WB+
From the Nuance website: "Developed jointly by Ericsson, Nokia and VoiceAge, AMR-WB+ (Extended Adaptive Multi-Rate Wideband) speech and audio codec is a 3GPP-recommended hi-fi audio codec. AMR-WB+ scales to cover the full audio spectrum and uses high-efficiency parametric stereo (HE-PS) to maintain high-fidelity stereo image reproduction at even the lowest bit rates and excellent quality at higher rates."
All samples are encoded with an internal sampling frequency of 25.6 kHz, also known as ISF index 8. The parameter that varies between samples is the frame type (also called the mode index). For AMR-WB+ mono modes, it ranges from 16 to 23 and determines how many bits per frame are assigned to speech parameters.
The files with "10" in their names are encoded at a nominal rate of 10.4 kbps, using frame-type 16 (the lowest-quality mode). With header overhead, the bitrate for the file is probably near 10.5 kbps. By decreasing the internal sampling frequency, we could cut the bitrate down as low as 5.2 kbps. I have tested 7.8 kbps enough to say that it could work fairly well for some situations where bandwidth is at a premium.
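The lower bitrates mentioned above are consistent with the bitrate scaling linearly with the internal sampling frequency, relative to the 25.6 kHz used for these samples. Under that assumption, which is an inference from the figures on this page rather than a statement of the codec specification, the relationship can be sketched as follows. The function names are hypothetical.

```python
REFERENCE_ISF_KHZ = 25.6  # internal sampling frequency used for the samples on this page


def scaled_bitrate(nominal_kbps, isf_khz):
    """Bitrate at another internal sampling frequency, assuming linear scaling."""
    return nominal_kbps * isf_khz / REFERENCE_ISF_KHZ


def isf_for_bitrate(nominal_kbps, target_kbps):
    """Internal sampling frequency needed to reach target_kbps (same assumption)."""
    return REFERENCE_ISF_KHZ * target_kbps / nominal_kbps


if __name__ == "__main__":
    # Frame type 16 has a nominal rate of 10.4 kbps at the reference frequency.
    for target in (5.2, 7.8):
        isf = isf_for_bitrate(10.4, target)
        print(f"{target} kbps needs an internal sampling frequency of about {isf:.1f} kHz")
```

In this model, halving the internal sampling frequency halves the bitrate, which is how frame type 16 gets from 10.4 kbps down to 5.2 kbps.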
Samples with "24" in their filenames are encoded at a nominal rate of 24 kbps, using frame-type 23 (the highest-quality mode). This is the encoder setting that NLS has used for the past 3 years. The bitrate could be increased to 36 kbps by raising the internal sampling frequency to 38.4 kHz, but that setting is not included in the samples presented here.
Female Voices
Allison
Female voice with some amplitude limiting.
Native format
- Original uncompressed WAV Allison.wav
- Speex at 17 kbps Allison17.spx
- Speex at 28 kbps Allison28.spx
- AMR-WB+ at 10.4 kbps Allison10.3gp
- AMR-WB+ at 24 kbps Allison24.3gp
- MP3 at 32 kbps Allison32.mp3
- MP3 at 64 kbps Allison64.mp3
Re-encoded to WAV
- Original uncompressed WAV Allison.wav
- Speex at 17 kbps Allison spx17.wav
- Speex at 28 kbps Allison spx28.wav
- AMR-WB+ at 10.4 kbps Allison amr10.wav
- AMR-WB+ at 24 kbps Allison amr24.wav
- MP3 at 32 kbps Allison mp3 32.wav
- MP3 at 64 kbps Allison mp3 64.wav
Gianarrelli
Female voice with a wide pitch range.
Native format
- Original uncompressed WAV Gianarrelli.wav
- Speex at 17 kbps Gianarrelli17.spx
- Speex at 28 kbps Gianarrelli28.spx
- AMR-WB+ at 10.4 kbps Gianarrelli10.3gp
- AMR-WB+ at 24 kbps Gianarrelli24.3gp
- MP3 at 32 kbps Gianarrelli32.mp3
- MP3 at 64 kbps Gianarrelli64.mp3
Re-encoded to WAV
- Original uncompressed WAV Gianarrelli.wav
- Speex at 17 kbps Gianarrelli spx17.wav
- Speex at 28 kbps Gianarrelli spx28.wav
- AMR-WB+ at 10.4 kbps Gianarrelli amr10.wav
- AMR-WB+ at 24 kbps Gianarrelli amr24.wav
- MP3 at 32 kbps Gianarrelli mp3 32.wav
- MP3 at 64 kbps Gianarrelli mp3 64.wav
Med
Female voice with pronounced high-frequency S sounds.
Native format
- Original uncompressed WAV Med.wav
- Speex at 17 kbps Med17.spx
- Speex at 28 kbps Med28.spx
- AMR-WB+ at 10.4 kbps Med10.3gp
- AMR-WB+ at 24 kbps Med24.3gp
- MP3 at 32 kbps Med32.mp3
- MP3 at 64 kbps Med64.mp3
Re-encoded to WAV
- Original uncompressed WAV Med.wav
- Speex at 17 kbps Med_spx17.wav
- Speex at 28 kbps Med_spx28.wav
- AMR-WB+ at 10.4 kbps Med_amr10.wav
- AMR-WB+ at 24 kbps Med_amr24.wav
- MP3 at 32 kbps Med_mp3 32.wav
- MP3 at 64 kbps Med_mp3 64.wav
Male Voices
Childs
Really low-pitched male voice.
Native format
- Original uncompressed WAV Childs.wav
- Speex at 17 kbps Childs17.spx
- Speex at 28 kbps Childs28.spx
- AMR-WB+ at 10.4 kbps Childs10.3gp
- AMR-WB+ at 24 kbps Childs24.3gp
- MP3 at 32 kbps Childs32.mp3
- MP3 at 64 kbps Childs64.mp3
Re-encoded to WAV
- Original uncompressed WAV Childs.wav
- Speex at 17 kbps Childs spx17.wav
- Speex at 28 kbps Childs spx28.wav
- AMR-WB+ at 10.4 kbps Childs amr10.wav
- AMR-WB+ at 24 kbps Childs amr24.wav
- MP3 at 32 kbps Childs mp3 32.wav
- MP3 at 64 kbps Childs mp3 64.wav
Gil
Male voice with a higher pitch and a gravelly quality.
Native format
- Original uncompressed WAV Gil.wav
- Speex at 17 kbps Gil17.spx
- Speex at 28 kbps Gil28.spx
- AMR-WB+ at 10.4 kbps Gil10.3gp
- AMR-WB+ at 24 kbps Gil24.3gp
- MP3 at 32 kbps Gil32.mp3
- MP3 at 64 kbps Gil64.mp3
Re-encoded to WAV
- Original uncompressed WAV Gil.wav
- Speex at 17 kbps Gil_spx17.wav
- Speex at 28 kbps Gil_spx28.wav
- AMR-WB+ at 10.4 kbps Gil_amr10.wav
- AMR-WB+ at 24 kbps Gil_amr24.wav
- MP3 at 32 kbps Gil_mp3_32.wav
- MP3 at 64 kbps Gil_mp3_64.wav
Hagen
Male voice with rapid vowel changes, which can produce audible artifacts in some coders.
Native format
- Original uncompressed WAV Hagen.wav
- Speex at 17 kbps Hagen17.spx
- Speex at 28 kbps Hagen28.spx
- AMR-WB+ at 10.4 kbps Hagen10.3gp
- AMR-WB+ at 24 kbps Hagen24.3gp
- MP3 at 32 kbps Hagen32.mp3
- MP3 at 64 kbps Hagen64.mp3
Re-encoded to WAV
- Original uncompressed WAV Hagen_.wav
- Speex at 17 kbps Hagen_spx17.wav
- Speex at 28 kbps Hagen_spx28.wav
- AMR-WB+ at 10.4 kbps Hagen_amr10.wav
- AMR-WB+ at 24 kbps Hagen_amr24.wav
- MP3 at 32 kbps Hagen_mp3 32.wav
- MP3 at 64 kbps Hagen_mp3 64.wav
Archive format
You can download all the files as a single zip archive, as well as the native-format files sorted by folder for playback in a DTB player that supports those formats.
