Audio Overview

Original Author(s): Lynn Leith

Digital Audio Recording Basics


When an audio signal is digitized it is converted into discrete digital numbers (0's and 1's) that make up the digital file. A "Bit" is a single piece of information assigned a value of 0 or 1.

The most common method for encoding an analog signal into digital data is pulse-code modulation (PCM).

PCM (Pulse Code Modulation) is based on a series of measures ("samples") of the amplitude of the analog signal.


For many organizations involved in the production of DAISY books, the starting point of the digital audio recording process is an analog audio signal coming from a microphone, a tape player or any other analog source. Digitizing converts that signal into a sequence of numbers stored as zeros and ones, and these numbers are the actual digital audio file.

Virtually all digital audio systems, including CD, DAT, etc., use PCM.

The PCM encoding process is based on a series of measures - or "samples" - of the amplitude of the analog signal. The rate at which these measurements are taken, and the number of bits used to describe each measurement, are two central concepts in digital audio.
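The two steps described above, measuring the amplitude at a fixed rate and describing each measurement with a fixed number of bits, can be sketched in a few lines. This is a minimal illustration, not the method of any particular recording tool: the function name `pcm_encode`, the 440 Hz test tone and the convention that the analog signal ranges from -1.0 to 1.0 are all assumptions made for the example.

```python
import math

def pcm_encode(signal, duration_s, sample_rate, bit_depth):
    """Sample `signal` (a function of time in seconds returning -1.0..1.0)
    and round each sample to a signed integer code of `bit_depth` bits."""
    max_code = 2 ** (bit_depth - 1) - 1          # 32767 for 16 bits
    n_samples = round(duration_s * sample_rate)  # how many measurements to take
    codes = []
    for n in range(n_samples):
        t = n / sample_rate                          # time of the n-th sample
        amplitude = max(-1.0, min(1.0, signal(t)))   # keep within full scale
        codes.append(round(amplitude * max_code))    # quantize to an integer
    return codes

def tone(t):
    """Hypothetical analog source: a 440 Hz sine at half of full scale."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)

codes = pcm_encode(tone, duration_s=0.01, sample_rate=44100, bit_depth=16)
print(len(codes), "samples; first five codes:", codes[:5])
```

Ten milliseconds of audio at 44100 Hz yields 441 integer codes; written out in binary, those integers are the zeros and ones of the digital file.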

Sampling Rate

Sampling rate is the number of times per second that an analog audio signal is sampled or "measured", at discrete intervals of time, during processing (original recording and analog-to-digital conversion).

Sampling frequency is the rate at which an analog signal is sampled into a digital signal consisting of digital samples and is usually measured in hertz, or samples per second.

A higher sampling frequency generally means a more faithful recording: the higher the sampling frequency, the greater the frequency content of the source material that can be accurately represented.

Sampling rate    Typical use
8000 Hz          Telephone
22050 Hz         Better than FM radio
44100 Hz         CD (compact disc)
96000 Hz         DVD-Audio


With a sampling frequency of 44.1 kHz, the amplitude of the analog signal is measured 44100 times per second. This sampling frequency can accurately represent frequencies up to about 22 kHz, which covers the range that humans are able to perceive (roughly 20 Hz to 20 kHz). It is also the sampling frequency used on audio CDs, and is a widely used standard.

The higher the sampling frequency, the greater the frequency content of the source material that can be accurately represented in the recording. Audio sampled at 22.05 kHz contains half as many samples per second as audio sampled at 44.1 kHz, and its upper frequency limit is roughly 11 kHz rather than 22 kHz. However, higher sampling frequencies require more file storage space. A sampling frequency of 44.1 kHz may not be necessary for recording the spoken human voice, as most of the range of the human voice lies below 10 kHz. An uncompressed digital audio file produced at 44.1 kHz will be twice the size of one produced at 22.05 kHz with the same bit depth and number of channels; however, the perceived quality of human narration will not double.
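The trade-off between frequency range and storage can be checked with simple arithmetic. The sketch below assumes uncompressed PCM, where file size is sampling rate × bit depth × channels × duration, and uses the standard rule that the highest representable frequency is half the sampling rate (the Nyquist limit). The function names are illustrative only.

```python
def upper_frequency_hz(sample_rate_hz):
    """Highest frequency that can be represented: half the sampling rate."""
    return sample_rate_hz / 2

def pcm_size_bytes(sample_rate_hz, bit_depth, channels, duration_s):
    """Size of uncompressed PCM audio data, ignoring any file header."""
    return sample_rate_hz * bit_depth * channels * duration_s // 8

one_hour = 3600  # seconds
for rate in (22050, 44100):
    size = pcm_size_bytes(rate, bit_depth=16, channels=1, duration_s=one_hour)
    print(rate, "Hz:", upper_frequency_hz(rate), "Hz upper limit,", size, "bytes per hour")
```

For one hour of 16-bit mono audio this gives 158760000 bytes at 22050 Hz and 317520000 bytes at 44100 Hz: exactly double the storage for the same narration.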

Bit Depth

The number of bits used to describe (assign a value to) each of the measurements taken is called the resolution or bit depth. With 16-bit resolution, there are 65536 different value steps to which each sample can be assigned.

Regardless of the bit depth used, the representation is never completely accurate as some of the actual measurement values must be rounded to the nearest of the available value steps. This process is referred to as quantization.
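The rounding step can be illustrated with a small sketch. It assumes, purely for the example, that a measured amplitude is expressed as a fraction of full scale between -1.0 and 1.0; the function name `quantize` and the sample value 0.3 are hypothetical.

```python
def quantize(value, bit_depth):
    """Round a value in -1.0..1.0 to the nearest available step.
    Returns (integer code, value the code represents, rounding error)."""
    max_code = 2 ** (bit_depth - 1) - 1   # largest positive code, e.g. 32767
    code = round(value * max_code)        # nearest available value step
    represented = code / max_code         # what the code stands for on playback
    return code, represented, value - represented

print("value steps at 16 bits:", 2 ** 16)
for bits in (8, 16):
    code, represented, error = quantize(0.3, bits)
    print(bits, "bits: code", code, "rounding error", error)
```

The rounding error at 8 bits is a few hundred times larger than at 16 bits, which is why a greater bit depth gives a more accurate representation even though neither is exact.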

Care must also be taken not to exceed the maximum signal level.

Dynamic Range

The dynamic range (the difference between the highest and the lowest amplitude that can be represented) is about 98 dB (decibels) in a 16-bit system. Normally, sound cards use 16-bit resolution, and this is seldom a feature that the user can change. However, some professional-level sound cards use 18-, 20-, 24- or 32-bit resolution.
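The 98 dB figure for 16 bits matches a common textbook approximation for an ideal converter driven by a full-scale sine wave, roughly 6.02 dB per bit plus 1.76 dB. The sketch below applies that approximation to the bit depths mentioned above; it is a theoretical upper bound under that assumption, not a measurement of any real sound card, and the function name is illustrative.

```python
import math

def theoretical_dynamic_range_db(bit_depth):
    """Ideal dynamic range for a full-scale sine wave:
    20*log10(2**bits) + 10*log10(1.5), about 6.02 dB per bit + 1.76 dB."""
    return 20 * math.log10(2 ** bit_depth) + 10 * math.log10(1.5)

for bits in (16, 18, 20, 24):
    print(bits, "bits:", round(theoretical_dynamic_range_db(bits), 1), "dB")
```

Each additional bit adds about 6 dB, so the higher resolutions offered by professional equipment leave considerably more room between the noise floor and the maximum level.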

The difference between the actual measured amplitude and its binary representation is called quantization error. If the quantization error is large, there will be an audible degradation of sonic quality. A harsh sound quality, or noise, will be added to the audio. The effect gets more prominent if the amplitude of the recorded signal is low. Therefore, care should be taken to digitize with the input signal as close as possible to the maximum level.

The input signal level in digital recording must never exceed 0 dBFS. If it does, clipping will occur, resulting in distortion. In this respect, digital recording is much more sensitive than analog recording, and there is much less room for error.

Signal Levels

Within the digital domain 0 dB is the same as digital maximum (0 dBFS). No signal exists above that point. Input at a level greater than 0 dB will result in "clipping" and digital distortion.

  • The peak value of the audio should be no lower than -3 dB
  • A variation between peak and lowest signal level no larger than 12 dB is optimal for a digital talking book.
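The peak guideline above can be checked programmatically. The sketch below assumes 16-bit signed integer samples and takes full scale (0 dBFS) to be 32768; the function names and the example sample values are hypothetical. It only reports the peak level and does not attempt to measure the variation between peak and lowest signal level.

```python
import math

def peak_dbfs(samples, bit_depth=16):
    """Level of the largest sample relative to digital full scale, in dBFS."""
    full_scale = 2 ** (bit_depth - 1)            # 32768 for 16 bits
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")                     # digital silence
    return 20 * math.log10(peak / full_scale)

def meets_peak_guideline(samples, bit_depth=16, min_peak_db=-3.0):
    """True if the peak lies between min_peak_db and 0 dBFS."""
    return min_peak_db <= peak_dbfs(samples, bit_depth) <= 0.0

quiet = [0, 16384, -8000]      # peaks at half of full scale, about -6 dBFS
healthy = [0, 30000, -25000]   # peaks a little below full scale
print(round(peak_dbfs(quiet), 2), meets_peak_guideline(quiet))
print(round(peak_dbfs(healthy), 2), meets_peak_guideline(healthy))
```

A recording that peaks at half of full scale sits about 6 dB below maximum, so it fails the -3 dB guideline even though it may look reasonably loud on a waveform display.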

When audio compression algorithms are used, optimization becomes even more critical.

Signal-to-Noise Ratio (SNR or S/N)

Signal-to-noise ratio describes the ratio of useful information to false or irrelevant information. It is an engineering term for the ratio between the magnitude of a signal (meaningful information) and the magnitude of background noise. Because many signals have a very wide dynamic range, SNRs are often expressed in terms of the logarithmic decibel scale.

If there is a great deal of ambient sound (noise) within the recording environment, the noise floor will be high and the signal-to-noise ratio will be correspondingly low. Noise in the audio can also be introduced by the recording equipment or system itself.

Signal-to-noise ratios are closely related to the concept of dynamic range. Where dynamic range measures the ratio between noise and the greatest undistorted signal on a channel, SNR measures the ratio between noise and an arbitrary signal on the channel.
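Expressed in decibels, the ratio is 20 times the base-10 logarithm of the signal magnitude divided by the noise magnitude. The sketch below uses RMS (root mean square) amplitude as the measure of magnitude, which is a common choice but an assumption here; it also assumes that a stretch of narration and a stretch of background noise alone (for example, room tone between sentences) are available as separate lists of samples. The function names are illustrative.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(signal_samples, noise_samples):
    """Signal-to-noise ratio in decibels, from RMS amplitudes."""
    return 20 * math.log10(rms(signal_samples) / rms(noise_samples))

narration = [1000, -1000, 1000, -1000]   # hypothetical signal samples
room_tone = [10, -10, 10, -10]           # hypothetical noise-only samples
print(snr_db(narration, room_tone), "dB")
```

A signal 100 times larger in amplitude than the noise corresponds to 40 dB, which shows how the logarithmic scale compresses very wide ratios into manageable numbers.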

Commercial tools are available for audio file analysis.

Audio Signal Processing

The use of dynamics processing (limiting, leveling, etc.) may be required to achieve uniform and optimized signal levels. However, over-processing can result in audio that has an unpleasant and unnatural sound. Moderate use of dynamics processing with a reasonable quality processor can significantly improve audio output quality.
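As a deliberately simple illustration of level optimization, the sketch below applies peak normalization: one fixed gain that brings the loudest sample to a target level. This is not a limiter or leveler, which vary their gain over time, and it is not taken from any particular processor; the function name, the -3 dBFS target and the 16-bit full-scale value of 32768 are assumptions made for the example.

```python
def normalize_peak(samples, target_dbfs=-3.0, bit_depth=16):
    """Scale all samples by one gain so the peak lands at target_dbfs."""
    full_scale = 2 ** (bit_depth - 1)                     # 32768 for 16 bits
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)                              # silence: nothing to scale
    target_peak = full_scale * 10 ** (target_dbfs / 20)   # dBFS back to amplitude
    gain = target_peak / peak
    return [round(s * gain) for s in samples]

quiet_take = [0, 8000, -4000, 2000]   # hypothetical low-level recording
print(normalize_peak(quiet_take))
```

Because every sample is multiplied by the same gain, the relative dynamics of the recording are unchanged, and any noise present is raised by the same amount as the narration.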

This page was last edited by LLeith on Monday, August 9, 2010 15:06
Text is available under the terms of the DAISY Consortium Intellectual Property Policy, Licensing, and Working Group Process.