ZedDist codec samples

Introduction

As part of ZedNext, the list of codecs is being revised. WAV, MP3, Speex and AMR-WB+ are on the shortlist for inclusion in the next revision of the standard. This page lets you compare those codecs when they are used to encode the same sample.

Methodology

A few different voices were selected for the review. They are meant to reproduce the variety found in digital talking books. Each voice is presented in uncompressed format as well as in three different codecs, each with two different encodings. Each compressed sample is presented in its native format and also converted back to WAV, for ease of listening for those who don't have access to a player that supports those formats.

Speex

From the Speex.org website: "Speex is an Open Source/Free Software patent-free audio compression format designed for speech. The Speex Project aims to lower the barrier of entry for voice applications by providing a free alternative to expensive proprietary speech codecs. Moreover, Speex is well-adapted to Internet applications and provides useful features that are not present in most other codecs. Finally, Speex is part of the GNU Project and is available under the revised BSD license."

To encode the samples using Speex, they were first down-sampled to a 16 kHz sampling rate in Sound Forge. After down-sampling and before encoding, the levels were also raised so that the highest peak is within 1 dB of clipping. The samples were then encoded at 16.8 and 27.8 kbps in "wide-band" mode. A typical command line for running the encoder is:

 speexenc %1.wav %1.spx -w --bitrate %2 --comp 10 --qual 10 -V

where the bitrate was either 17000 or 28000. Speex encoding and decoding were done with the encoder and decoder compiled in July 2009 with Intel SSE extensions, available from the speex.org website.

Note: the links to the .spx files are not working yet and should be available shortly.
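The level adjustment mentioned above (raising the highest peak to within 1 dB of clipping) was done in Sound Forge, but the arithmetic behind it can be sketched in a few lines of Python. This is an illustration only: it assumes 16-bit PCM samples, and the -0.5 dBFS target is an arbitrary choice inside the 1 dB window, not a setting taken from the original workflow.

```python
import math

FULL_SCALE_16BIT = 32768  # assumption: 16-bit PCM, clipping at full scale


def peak_dbfs(peak, full_scale=FULL_SCALE_16BIT):
    """Peak level in dB relative to full scale (0 dBFS is the clipping point)."""
    if peak == 0:
        raise ValueError("a silent signal has no defined peak level")
    return 20.0 * math.log10(abs(peak) / full_scale)


def gain_to_target(peak, target_dbfs=-0.5, full_scale=FULL_SCALE_16BIT):
    """Linear gain that moves the given peak to target_dbfs (assumed target)."""
    gain_db = target_dbfs - peak_dbfs(peak, full_scale)
    return 10.0 ** (gain_db / 20.0)


def within_1_db_of_clipping(peak, full_scale=FULL_SCALE_16BIT):
    """True when the peak lies between -1 dBFS and 0 dBFS."""
    return -1.0 <= peak_dbfs(peak, full_scale) <= 0.0
```

For example, a sample whose highest peak sits at half of full scale (about -6 dBFS) would need a linear gain of roughly 1.89 to reach the assumed -0.5 dBFS target.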

AMR-WB+

From the Nuance website: "Developed jointly by Ericsson, Nokia and VoiceAge, AMR-WB+ (Extended Adaptive Multi-Rate Wideband) speech and audio codec is a 3GPP-recommended hi-fi audio codec. AMR-WB+ scales to cover the full audio spectrum and uses high-efficiency parametric stereo (HE-PS) to maintain high-fidelity stereo image reproduction at even the lowest bit rates and excellent quality at higher rates."

All samples are encoded with an internal sampling frequency of 25.6 kHz, also known as ISF index 8. The parameter that varies between samples is the frame type (also called mode index). For AMR-WB+ mono modes, this can range from 16 to 23, and determines how many bits per frame are assigned to speech parameters.

The files with "10" in their names are encoded at a nominal rate of 10.4 kbps, using frame-type 16 (the lowest-quality mode). With header overhead, the bitrate for the file is probably near 10.5 kbps. By decreasing the internal sampling frequency, we could cut the bitrate down as low as 5.2 kbps. I have tested 7.8 kbps enough to say that it could work fairly well for some situations where bandwidth is at a premium.

Samples with "24" in their filenames are encoded at a nominal rate of 24 kbps, using frame-type 23 (the highest-quality mode). This is the encoder setting that NLS has used for the past 3 years. The bitrate could be increased to 36 kbps by raising the internal sampling frequency to 38.4 kHz, but that setting is not included among the samples here.
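The bitrate figures above are consistent with the nominal bitrate scaling in proportion to the internal sampling frequency, since the number of bits per frame is fixed by the frame type. The Python sketch below works through that arithmetic. The proportional-scaling rule and the 12.8 and 19.2 kHz frequencies are assumptions used for illustration; they are not stated on this page.

```python
REFERENCE_ISF_KHZ = 25.6  # ISF index 8, used for all samples on this page


def scaled_bitrate_kbps(nominal_kbps, isf_khz, reference_isf_khz=REFERENCE_ISF_KHZ):
    """Bitrate at another internal sampling frequency, assuming the bitrate
    scales linearly with the ISF for a fixed frame type."""
    return nominal_kbps * isf_khz / reference_isf_khz


def isf_for_bitrate_khz(nominal_kbps, target_kbps, reference_isf_khz=REFERENCE_ISF_KHZ):
    """Internal sampling frequency needed to reach target_kbps, under the
    same linear-scaling assumption."""
    return reference_isf_khz * target_kbps / nominal_kbps


if __name__ == "__main__":
    # Frame type 16 (10.4 kbps nominal) at two lower, assumed ISF values.
    print(round(scaled_bitrate_kbps(10.4, 12.8), 2))   # 5.2
    print(round(scaled_bitrate_kbps(10.4, 19.2), 2))   # 7.8
    # Frame type 23 (24 kbps nominal) pushed to 36 kbps.
    print(round(isf_for_bitrate_khz(24.0, 36.0), 2))   # 38.4
```

Under this assumption, halving the internal sampling frequency halves the 10.4 kbps rate to 5.2 kbps, and reaching 36 kbps from the 24 kbps mode takes an internal sampling frequency 1.5 times the 25.6 kHz used here.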

Female Voices

Allison

Female voice with some amplitude limiting.

Native format

Reencoded to Wav

Gianarrelli

Female voice with wide pitch range

Native format

Reencoded to Wav

Med

Female voice typical for high S sounds

Native format

Reencoded to Wav

Male Voices

Childs

Really low-pitched male voice.

Native format

Reencoded to Wav

Gil

Male with higher pitch and gravelly voice

Native format

Reencoded to Wav

Hagen

Male voice with rapid vowel changes that could be heard in some coders

Native format

Reencoded to Wav

Archive format

You can download all the files in a single zip archive, as well as all the native-format files sorted by folder for playback in a DTB player that supports those formats.
