Sound is created when an object vibrates. Consider a guitar string that has been plucked. The string is stretched in one direction and then the elasticity of the string forces it back to its original straight position. The momentum of the string carries it past the original position in the opposite direction. This back and forth motion continues until the energy has dissipated. As the string moves, it pushes air molecules in front of it and compresses them together, creating a high pressure area. At the same time, air molecules behind the string are drawn into the space vacated by the string, creating a low pressure area.
Air itself is elastic. The high pressure area pushes the molecules next to
it and this sends a wave of compression outward from the string. As the
string reverses direction, a low pressure area is sent out following the
high pressure area. This flow of high and low pressure areas continues to
move away from the vibrating string at a high velocity, spreading out in all
directions. When these sound waves reach an object, that object is also
forced to vibrate in a pattern closely resembling the vibration of the
string that originally created the sound. Thus, the sound is transmitted from
the source to the listener's ear.
How do we represent it?
Sound can be represented as a graph of the air pressure created by the
vibrating object over time. By convention, high pressure is represented by
positive numbers (above the center line) and low pressure by negative
numbers. The center line itself represents normal air pressure with no
sound.
An object vibrating more rapidly produces waves that are shorter and closer together. Slower vibration results in longer waves spaced farther apart. This change in vibration speed is perceived as pitch; faster vibrations are higher pitches and slower vibrations are lower pitches.
An object that vibrates more forcefully will produce more pressure and will
result in waves that are "taller" on the graph. This is perceived as
loudness.
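
The relationship between vibration speed, vibration force, and the shape of the graph can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the vibration is treated as a pure sine wave, and the names pressure_wave, count_cycles, and peak, along with the figure of 8000 graph points per second, are made up for this example.

```python
import math

def pressure_wave(frequency_hz, amplitude, duration_s, points_per_second=8000):
    """Return pressure values (relative to normal air pressure) over time."""
    count = int(duration_s * points_per_second)
    return [amplitude * math.sin(2 * math.pi * frequency_hz * n / points_per_second)
            for n in range(count)]

def count_cycles(wave):
    """Count upward crossings of the center line (one per complete wave)."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a <= 0 < b)

def peak(wave):
    """Height of the tallest point on the graph, above or below the center line."""
    return max(abs(v) for v in wave)

# Faster vibration: more waves packed into the same second (higher pitch).
low_pitch = pressure_wave(frequency_hz=100, amplitude=1.0, duration_s=1.0)
high_pitch = pressure_wave(frequency_hz=200, amplitude=1.0, duration_s=1.0)

# More forceful vibration: taller waves (louder).
quiet = pressure_wave(frequency_hz=100, amplitude=0.2, duration_s=1.0)
loud = pressure_wave(frequency_hz=100, amplitude=0.8, duration_s=1.0)

print(count_cycles(low_pitch), count_cycles(high_pitch))
print(peak(quiet), peak(loud))
```

Doubling the frequency doubles the number of waves that fit in one second, while changing the amplitude changes only their height.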
Converting sound energy
Energy can be converted from one form to another. Electricity can be
converted to light, chemical energy can be converted to heat, and so forth.
Sound waves are energy and they can be converted to different forms as well.
Consider a thin membrane attached to a coil of wire suspended in a magnetic field. When sound waves make contact with this membrane, it will vibrate. This vibration moves the coil of wire back and forth through the magnetic field and this produces a movement of electrons in the wire. This movement is electricity and the pressure (voltage) of the electricity will be proportional to the pressure of the sound wave. Such a device is called a microphone and it is commonly used to pick up sound waves and convert them to electrical energy.
A similar device can be used to convert this electrical energy back into sound by having the electricity flow through another coil and making this coil move in another magnetic field. The coil is attached to a membrane that will vibrate against the air and set up sound waves similar to the original sound. This device is called a loudspeaker.
Typically, the electrical energy put out by a microphone is insufficient to
move a loudspeaker enough to be heard, so an additional device is used to
amplify the level of the signal. These three devices (microphone, amplifier,
and loudspeaker) can be used to make a quiet sound loud enough to be heard
over a large room, or to carry sound to distant locations.
Recording
It is often desirable to preserve sound and recreate it later. Processes for
recording sound waves for later playback were developed to accomplish this.
Analog methods
Early methods for preserving sound were analog. This means that the sound
created some pattern with a form similar to the sound wave. The electrical
waveform from the microphone was used to vibrate a cutting device or create
a magnetic pattern. The goal was to create a recording of the original sound
in some medium that follows a pattern analogous to the original sound wave.
Analog media
The earliest device used for recording sound was the phonograph. This device
created a groove in the medium that had a shape modulated by the sound wave.
Phonograph records are played back by having a needle follow the groove. The
needle vibrates in the same pattern that was used to cut the groove, and
this vibration can be amplified and output through loudspeakers.
Another common analog recording device is the tape recorder. A thin strip of
plastic (tape) coated with a magnetic material is passed by an electromagnet
that is modulated by the sound wave. This creates magnetic patterns on the
tape that may be reproduced by reversing the process; the tape is drawn past
a coil and the changing magnetic patterns induce an electric current, which is
then amplified.
These recording techniques have several problems. At each step, sound to
microphone, microphone to electricity, electricity to magnetism or groove,
and then back to sound afterwards, errors can accumulate. The microphone
diaphragm may not vibrate in exactly the same pattern as the sound wave.
There may be outside interference in the cables. But the majority of the
problems are in the recording medium itself. If the groove of the record is
cut too slowly, then there is not enough room to accurately represent the
detail of the higher frequencies. If the groove is cut too fast, then noise
from the record rubbing against the needle becomes apparent. There may be
spots in the plastic record that are malformed. Dust can accumulate and cause
a hissing noise. Similar problems also exist for magnetic tape.
Even in the best possible circumstances, the quality of the sound degrades
with each step since the physical media used to preserve it contain flaws
and imperfections. If the recording is copied to new media (such as for
editing or reproduction/marketing) then these flaws accumulate.
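
How flaws accumulate across copies can be illustrated with a small simulation. This is a rough sketch, not a model of any real medium: it assumes each copy adds a little independent random noise to every point of the waveform, and the names analog_copy and rms_error and the noise level of 0.01 are chosen only for this example.

```python
import math
import random

def analog_copy(signal, noise_level, rng):
    """Copy a waveform, adding small random flaws from the (assumed) medium."""
    return [v + rng.gauss(0.0, noise_level) for v in signal]

def rms_error(original, copy):
    """Root-mean-square difference between the original and a copy."""
    total = sum((a - b) ** 2 for a, b in zip(original, copy))
    return math.sqrt(total / len(original))

rng = random.Random(0)  # fixed seed so the run is repeatable
original = [math.sin(2 * math.pi * n / 50) for n in range(1000)]

errors = []
generation = original
for _ in range(5):
    # Each new copy is made from the previous copy, not from the original.
    generation = analog_copy(generation, noise_level=0.01, rng=rng)
    errors.append(rms_error(original, generation))

for number, error in enumerate(errors, start=1):
    print(f"copy generation {number}: error {error:.4f}")
```

Because every generation inherits the flaws of the one before it and adds its own, the error measured against the original grows as copies are made from copies.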
Digital methods
Since most of the problems with recording sound accurately are due to the
medium used for analog recording, methods were sought to prevent these
problems. The single largest problem with analog recording is that the
information being recorded must be represented as an analog to the original
sound wave. What is needed is a different way to represent the sound; a way
that doesn't suffer from the flaws of the recording media.
With the advent of the computer age, it became quite easy to represent waveform information as a series of numbers rather than as an analogous pattern. The voltage level of the waveform could be measured, and "samples" taken at regular intervals. These measurements were numerical (digital) and these numbers could be converted to pulses that could be more reliably recorded than analog waveforms. To play back the digitally recorded sound, the numbers are read back from the recording medium and the voltage of an electrical signal is varied in the same way as the original signal.
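
The idea of measuring the voltage at regular intervals and later rebuilding the signal from the stored numbers can be sketched as follows. This is a simplified illustration: the 440 Hz test tone and the names voltage, sample, and play_back are assumptions made for the example, and straight-line interpolation between samples stands in for the conversion hardware a real player would use.

```python
import math

def voltage(t):
    """Hypothetical microphone voltage at time t (seconds): a 440 Hz tone."""
    return math.sin(2 * math.pi * 440 * t)

def sample(signal, sample_rate_hz, count):
    """Measure the signal at evenly spaced instants and keep the numbers."""
    return [signal(n / sample_rate_hz) for n in range(count)]

def play_back(samples, sample_rate_hz, t):
    """Estimate the voltage at time t by drawing a straight line
    between the two stored samples on either side of t."""
    position = t * sample_rate_hz
    index = min(int(position), len(samples) - 2)
    fraction = position - index
    return samples[index] + fraction * (samples[index + 1] - samples[index])

rate = 44100
stored = sample(voltage, rate, count=441)  # about one hundredth of a second

t = 0.001
print("original voltage:  ", voltage(t))
print("played-back voltage:", play_back(stored, rate, t))
```

Only the list of numbers needs to be recorded; the playback step works from those numbers alone.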
The numbers representing the strength of the waveform are set up on a scale from -32768 to 32767. This gives a fine enough gradation that listeners can't tell the difference between digitally recorded sound and analog recordings. This range of numbers can be represented in binary (base 2) with 16 bits (a bit is 0 or 1, off or on). Since a bit is either on or off, it is much more reliable to read it from a tape than an analog signal. Even a large amount of noise or imperfections on the medium won't interfere with distinguishing between a 1 and a 0. This avoids the single biggest source of poor quality that had been present with analog recording.
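
The following sketch shows why bits survive a noisy medium. It rests on several assumptions not stated above: samples are stored in two's complement form, each bit is written to the medium as a level of 0.0 or 1.0, the noise never exceeds 0.4 in either direction, and the reader decides between 0 and 1 by comparing against a threshold of 0.5. All function names are invented for the example.

```python
import random

FULL_SCALE = 32767  # largest positive value of a signed 16-bit sample

def quantize(value):
    """Map a waveform value in [-1.0, 1.0] to a signed 16-bit integer."""
    clipped = max(-1.0, min(1.0, value))
    return round(clipped * FULL_SCALE)

def to_bits(sample):
    """Encode a signed 16-bit sample as 16 bits, most significant bit first."""
    unsigned = sample & 0xFFFF  # two's complement view of the sample
    return [(unsigned >> shift) & 1 for shift in range(15, -1, -1)]

def from_bits(bits):
    """Decode 16 bits back into a signed 16-bit sample."""
    unsigned = 0
    for bit in bits:
        unsigned = (unsigned << 1) | bit
    return unsigned - 0x10000 if unsigned >= 0x8000 else unsigned

def noisy_medium(bits, noise_level, rng):
    """Store each bit as a level of 0.0 or 1.0 and add random noise."""
    return [bit + rng.uniform(-noise_level, noise_level) for bit in bits]

def read_bits(levels):
    """Decide whether each noisy level was a 0 or a 1."""
    return [1 if level > 0.5 else 0 for level in levels]

rng = random.Random(1)
sample_value = quantize(-0.25)
levels = noisy_medium(to_bits(sample_value), noise_level=0.4, rng=rng)
recovered = from_bits(read_bits(levels))
print(sample_value, recovered)
```

Although every stored level is disturbed, each one stays on the correct side of the threshold, so the recovered number is identical to the one that was written. The same amount of noise added directly to an analog waveform would remain in the sound.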
Since sound waves vibrate rapidly, the waveform must also be sampled very
rapidly. The more often the waveform is sampled, the closer the reproduction
will be to the original. Of course, as the waveform is sampled more often,
more data must be stored. A sample rate must be chosen that is fast enough to
accurately represent the sound without resulting in more data than necessary.
The sampling theorem shows that sampling at just over twice the rate of the
highest frequency to be reproduced is sufficient. Humans can hear a maximum
frequency of about 20,000 cycles per second (20,000 Hertz), so a standard
sample rate of 44,100 Hertz was chosen.
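
The trade-off between sample rate and data volume comes down to simple arithmetic, sketched below. The function names are invented for this example, a single channel of 16-bit samples is assumed, and apparent_frequency uses the standard folding rule for a pure tone to show what happens when a tone is higher than half the sample rate.

```python
def minimum_sample_rate(highest_frequency_hz):
    """The sample rate must exceed twice the highest frequency to be kept."""
    return 2 * highest_frequency_hz

def bytes_per_second(sample_rate_hz, bits_per_sample=16, channels=1):
    """Storage needed for one second of sound."""
    return sample_rate_hz * bits_per_sample * channels // 8

def apparent_frequency(tone_hz, sample_rate_hz):
    """Frequency a pure tone appears to have once it has been sampled.
    Tones above half the sample rate fold back to a lower frequency."""
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)

print(minimum_sample_rate(20000))        # 40000
print(bytes_per_second(44100))           # 88200
print(bytes_per_second(44100) * 60)      # 5292000 bytes for one minute
print(apparent_frequency(20000, 44100))  # 20000, preserved
print(apparent_frequency(30000, 44100))  # 14100, misrepresented
```

A 20,000 Hertz tone falls below half of 44,100 and is captured correctly, while a 30,000 Hertz tone would be recorded as if it were 14,100 Hertz. This is why the sample rate has to be more than twice the highest frequency to be reproduced.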
Digital media