Digital Audio
introduction to theory

  1. Sound
    1. What is sound?
    2. How do we represent it?
    3. Converting sound energy
  2. Recording
    1. Analog methods
    2. Analog media
    3. Digital methods
    4. Digital media


What is sound?

Physically, sound is vibration of some medium. The word is also used to describe the sensation of this vibration when received by the ear.

Sound is created when some object vibrates. Consider a guitar string that has been plucked. The string is stretched in one direction and then the elasticity of the string forces it back to its original straight position. The momentum of the string carries it past the original position in the opposite direction. This back and forth motion continues until the energy has dissipated. As the string moves, it pushes air molecules in front of it and compresses them together, creating a high pressure area. Also, air molecules behind the string are drawn into the space vacated by the string, creating a low pressure area.

Air itself is elastic. The high pressure area pushes the molecules next to it and this sends a wave of compression outward from the string. As the string reverses direction, a low pressure area is sent out following the high pressure area. This flow of high and low pressure areas continues to move away from the vibrating string at a high velocity, spreading out in all directions. When these sound waves reach an object, that object is also forced to vibrate in a pattern closely resembling the vibration of the string that originally created the sound. Thus, the sound is transmitted from the source to the listener's ear.

How do we represent it?

Sound can be represented as a graph of the air pressure created by the vibrating object over time. By convention, high pressure is represented by positive numbers (above the center line) and low pressure by negative numbers (below the center line). The center line itself represents normal air pressure with no sound.

An object vibrating more rapidly produces waves that are shorter and closer together on the graph. Slower vibration produces longer waves spaced farther apart. The rate of vibration (frequency) is perceived as pitch; faster vibrations are heard as higher pitches and slower vibrations as lower pitches.

An object that vibrates more forcefully will produce more pressure and will result in waves that are "taller" on the graph. This is perceived as loudness.
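The graph described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the sound is an idealized pure tone (a sine wave), and the names pressure_wave and count_cycles, along with the 8000 points per second used to draw the curve, are hypothetical choices made for this example.

```python
import math

def pressure_wave(frequency_hz, amplitude, duration_s, points_per_second=8000):
    """Return (time, pressure) pairs for an idealized pure tone.

    Pressure is measured relative to normal air pressure: positive values
    are high pressure, negative values are low pressure, 0.0 is silence.
    """
    count = int(duration_s * points_per_second)
    return [
        (n / points_per_second,
         amplitude * math.sin(2 * math.pi * frequency_hz * n / points_per_second))
        for n in range(count)
    ]

def count_cycles(wave):
    """Count upward crossings of the center line (roughly one per cycle)."""
    values = [pressure for _, pressure in wave]
    return sum(1 for a, b in zip(values, values[1:]) if a < 0 <= b)

# A low, quiet tone and a higher, louder tone, one second of each.
low_quiet = pressure_wave(frequency_hz=220, amplitude=0.2, duration_s=1.0)
high_loud = pressure_wave(frequency_hz=440, amplitude=0.8, duration_s=1.0)

for name, wave in (("low/quiet", low_quiet), ("high/loud", high_loud)):
    peak = max(pressure for _, pressure in wave)
    print(name, "cycles:", count_cycles(wave), "peak pressure:", round(peak, 3))
```

The higher tone packs about twice as many cycles into the same second (pitch), and the louder tone reaches a taller peak (loudness).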

Converting sound energy

Energy can be converted from one form to another. Electricity can be converted to light, chemical energy can be converted to heat, and so forth. Sound waves are energy and they can be converted to different forms as well.

Consider a thin membrane attached to a coil of wire suspended in a magnetic field. When sound waves make contact with this membrane, it will vibrate. This vibration moves the coil of wire back and forth through the magnetic field and this produces a movement of electrons in the wire. This movement is electricity and the pressure (voltage) of the electricity will be proportional to the pressure of the sound wave. Such a device is called a microphone and it is commonly used to pick up sound waves and convert them to electrical energy.

A similar device can be used to convert this electrical energy back into sound by having the electricity flow through another coil and making this coil move in another magnetic field. The coil is attached to a membrane that will vibrate against the air and set up sound waves similar to the original sound. This device is called a loudspeaker.

Typically, the electrical energy put out by a microphone is insufficient to move a loudspeaker enough to be heard, so an additional device is used to amplify the level of the signal. These three devices (microphone, amplifier, and loudspeaker) can be used to make a quiet sound loud enough to be heard over a large room, or to carry sound to distant locations.


It is often desirable to preserve sound and recreate it later. Processes for recording sound waves for later playback were developed to accomplish this.

Analog methods

Early methods for preserving sound were analog. This means that the sound created some pattern whose form was similar to the sound wave. The electrical waveform from the microphone was used to vibrate a cutting device or create a magnetic pattern. The goal was to create a recording of the original sound in some medium that follows a pattern analogous to the original sound wave.

Analog media

The earliest device for recording and playing back sound was the phonograph. This device created a groove in the medium that had a shape modulated by the sound wave. Phonograph records are played back by having a needle follow the groove. The needle vibrates in the same pattern that was used to cut the groove, and this vibration can be amplified and output through loudspeakers.

Another common analog recording device is the tape recorder. A thin strip of plastic (tape) coated with a magnetic material is passed by an electromagnet that is modulated by the sound wave. This creates magnetic patterns on the tape that may be reproduced by reversing the process; the tape is drawn past a coil and the changing magnetic patterns induce an electric current, which is then amplified.

These recording techniques have several problems. At each step (sound to microphone, microphone to electricity, electricity to magnetism or groove, and then back to sound afterwards) errors can accumulate. The microphone diaphragm may not vibrate in exactly the same pattern as the sound wave. There may be outside interference in the cables. But the majority of the problems are in the recording medium itself. If the groove of the record is cut too slowly, then there is not enough room to accurately represent the detail of the higher frequencies. If the groove is cut too fast, then noise from the record rubbing against the needle becomes apparent. There may be spots in the plastic record that are malformed. Dust can accumulate and cause a hissing noise. Similar problems also exist for magnetic tape.

Even in the best possible circumstances, the quality of the sound degrades with each step, since the physical media used to preserve it contain flaws and imperfections. If the recording is copied to new media (such as for editing or reproduction/marketing) then these flaws accumulate.
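The way flaws accumulate from copy to copy can be illustrated with a small simulation. This is a rough sketch, not a model of any real tape or record: it assumes each analog copy adds a little independent random (Gaussian) noise to every point of the waveform, and the function names and the noise level of 0.01 are hypothetical values picked for the example.

```python
import math
import random

def analog_copy(signal, noise_level, rng):
    """Copy a waveform, adding a small random error to every point.

    The Gaussian noise is an assumed stand-in for medium imperfections.
    """
    return [value + rng.gauss(0.0, noise_level) for value in signal]

def rms_error(original, copy):
    """Root-mean-square difference between two equally long waveforms."""
    total = sum((a - b) ** 2 for a, b in zip(original, copy))
    return math.sqrt(total / len(original))

rng = random.Random(0)  # fixed seed so repeated runs behave the same
original = [math.sin(2 * math.pi * n / 50) for n in range(1000)]

generation = original
errors = []
for copy_number in range(1, 6):
    # Each new copy is made from the previous copy, not from the original.
    generation = analog_copy(generation, noise_level=0.01, rng=rng)
    errors.append(rms_error(original, generation))
    print("copy", copy_number, "error vs. original:", round(errors[-1], 4))
```

Because every generation inherits the errors of the one before it and adds its own, the error measured against the original tends to grow with each copy.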

Digital methods

Since most of the problems with recording sound accurately are due to the medium used for analog recording, methods were sought to prevent these problems. The single largest problem with analog recording is that the information being recorded must be represented as an analog to the original sound wave. What is needed is a different way to represent the sound; a way that doesn't suffer from the flaws of the recording media.

With the advent of the computer age, it became quite easy to represent waveform information as a series of numbers rather than as an analogous pattern. The voltage level of the waveform could be measured, and "samples" taken at regular intervals. These measurements were numerical (digital) and these numbers could be converted to pulses that could be more reliably recorded than analog waveforms. To play back the digitally recorded sound, the numbers are read back from the recording medium and the voltage of an electrical signal is varied in precisely the same way as the original signal.
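Taking samples can be sketched as follows. The example is a simplified illustration: the microphone signal is assumed to be a 1,000 Hertz tone with a peak of 1 volt, and the function names voltage and sample, as well as the rate of 8,000 samples per second, are hypothetical choices kept small so the output is easy to read.

```python
import math

def voltage(t):
    """Hypothetical microphone voltage at time t: a 1 kHz tone, 1 volt peak."""
    return math.sin(2 * math.pi * 1000 * t)

def sample(signal, sample_rate_hz, duration_s):
    """Measure the signal at regular intervals and return the numbers."""
    count = round(sample_rate_hz * duration_s)
    return [signal(n / sample_rate_hz) for n in range(count)]

# One millisecond (one full cycle of the tone) at 8,000 samples per second.
samples = sample(voltage, sample_rate_hz=8000, duration_s=0.001)
for n, value in enumerate(samples):
    print("sample", n, "voltage", round(value, 3))
```

The list of numbers is what gets recorded. Playback reads the same numbers in the same order, at the same rate, and sets the output voltage from each one.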

The numbers representing the strength of the waveform are set up on a scale from -32768 to 32767. This gives a fine enough gradation that most listeners can't tell the difference between digitally recorded sound and analog recordings. This range of numbers can be represented in binary (base 2) with 16 bits (a bit is 0 or 1, off or on). Since a bit is either on or off, it is much more reliable to read it from a tape than an analog signal. Even a large amount of noise or imperfections on the medium won't interfere with distinguishing between a 1 and a 0. This avoids the single biggest source of poor quality that had been present with analog recording.
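Both ideas in this paragraph, the 16-bit scale and the robustness of bits against noise, can be sketched together. Several details here are assumptions made for illustration and are not from the text: the voltage is taken to lie between -1.0 and 1.0 and is scaled by 32767, negative numbers are stored in two's complement, and each bit is imagined to be recorded as a level of 0.0 or 1.0 that is read back with up to 0.4 of random error.

```python
import random

def quantize(value):
    """Map a voltage in [-1.0, 1.0] to a whole number that fits in 16 bits."""
    clipped = max(-1.0, min(1.0, value))
    return round(clipped * 32767)

def to_bits(sample):
    """Encode a signed 16-bit number as a list of 16 bits (two's complement)."""
    unsigned = sample & 0xFFFF
    return [(unsigned >> shift) & 1 for shift in range(15, -1, -1)]

def from_bits(bits):
    """Decode a list of 16 bits back into a signed number."""
    unsigned = 0
    for bit in bits:
        unsigned = (unsigned << 1) | bit
    return unsigned - 0x10000 if unsigned >= 0x8000 else unsigned

def noisy_read(bits, noise, rng):
    """Record each bit as a level of 0.0 or 1.0, disturb it, then decide 0 or 1."""
    levels = [bit + rng.uniform(-noise, noise) for bit in bits]
    return [1 if level > 0.5 else 0 for level in levels]

rng = random.Random(1)
sample = quantize(-0.25)
recovered = from_bits(noisy_read(to_bits(sample), noise=0.4, rng=rng))
print("stored:", sample, "recovered:", recovered)
```

As long as the disturbance stays below half the distance between the two levels, every bit is read back correctly and the recovered number is identical to the stored one. The same amount of disturbance added directly to an analog waveform would be clearly audible.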

Since sound waves vibrate rapidly, the waveform must also be sampled very rapidly. The more often the waveform is sampled, the closer the reproduction will be to the original. Of course, as the waveform is sampled more often, more data must be stored. A sample rate must be chosen that is fast enough to accurately represent the sound without resulting in more data than necessary. Sampling theory (the Nyquist-Shannon sampling theorem) shows that sampling at just over twice the highest frequency to be reproduced is sufficient. Humans can hear a maximum frequency of about 20,000 cycles per second (20,000 Hertz), which calls for at least 40,000 samples per second. A standard sample rate of 44,100 Hertz was chosen.
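The trade-off between sample rate and data size is simple arithmetic, and the reason for the "twice the highest frequency" rule can be seen by sampling a tone that is too high for the chosen rate. The sketch below is illustrative only: the function names are hypothetical, a single (mono) channel of 16-bit samples is assumed, and the 30,000 Hertz and 14,100 Hertz tones are example values chosen to make the effect visible.

```python
import math

def minimum_sample_rate(highest_frequency_hz):
    """The sample rate must exceed twice the highest frequency to be kept."""
    return 2 * highest_frequency_hz

def bytes_per_second(sample_rate_hz, bits_per_sample=16):
    """Storage needed for one second of one channel of sound."""
    return sample_rate_hz * bits_per_sample // 8

def tone_samples(frequency_hz, sample_rate_hz, count):
    """Sample a pure tone of the given frequency."""
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(count)]

print("minimum rate for 20,000 Hz:", minimum_sample_rate(20000))
print("bytes per second at 44,100 Hz:", bytes_per_second(44100))

# A 30,000 Hz tone is above half of 44,100 Hz, so it is sampled too slowly.
too_high = tone_samples(30000, 44100, 100)
lower = tone_samples(14100, 44100, 100)

# Its samples match those of a 14,100 Hz tone (with the sign flipped),
# so after sampling the two tones can no longer be told apart.
indistinguishable = all(abs(h + l) < 1e-9 for h, l in zip(too_high, lower))
print("30,000 Hz looks like 14,100 Hz after sampling:", indistinguishable)
```

A tone above half the sample rate is not simply lost; it is recorded as a different, lower tone. Keeping the sample rate above twice the highest frequency of interest avoids this.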

Digital media