MP3s in the Car: Understanding Digital Audio Formats
By Jerry Flattum - 02/14/2006 - 05:17 PM EST
Digital audio files such as MP3s are no different from any other computer data file: a series of 1s and 0s. To turn an analog signal (such as one picked up by a standard microphone) into a digital stream, an ADC (analog-to-digital converter) measures the signal at regular intervals; the number of measurements taken per second is the sampling rate. These samples, if measured close enough together, form a near-exact representation of the analog signal. The signal is then stored and transmitted as 1s and 0s that computers and MP3 players can read.
What Does 44.1 kHz Mean?
When a soundcard is digitizing audio, it is measuring the voltage coming from the audio source many times per second. These measurements are called “samples,” and they are saved to the hard drive as a list of numbers. When it is time to play back the sound, the computer can then read the numbers from the hard drive and reproduce the original voltage levels.
The quality of digital audio depends on two factors -- the “sample rate” and the “sample size.” Sample rates usually lie between 8 kHz and 44.1 kHz; a sample rate of 44.1 kHz means that the sound is sampled, or measured, 44,100 times per second. A rate of 11.025 kHz is considered “phone quality,” 22.05 kHz is “radio quality,” and 44.1 kHz is “CD quality.” Digitizing with a high sample rate results in a closer approximation to the original sound with better capture of high-frequency sounds, but it requires more disk space.
Sample size, usually 8 or 16 bits, tells how accurately the voltage is measured. For an 8-bit sample, numbers between 0 and 255 are used to describe voltages from 0% to 100%. For a 16-bit sample, numbers between 0 and 65,535 cover the same range in much finer steps. Again, using a higher sample size gives a closer approximation to the original sound with smoother transitions between voltages.
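As a rough illustration of sample size, the sketch below maps a voltage level (expressed as a fraction of full scale) to the nearest 8-bit or 16-bit code and then measures how far the stored value is from the original. The function names `quantize` and `reconstruct` are made up for this example, and real converters differ in detail (for instance, 16-bit audio is normally stored as signed values).

```python
def quantize(level: float, bits: int) -> int:
    """Map a level between 0.0 and 1.0 to the nearest integer code."""
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be between 0.0 and 1.0")
    max_code = (1 << bits) - 1  # 255 for 8 bits, 65,535 for 16 bits
    return round(level * max_code)


def reconstruct(code: int, bits: int) -> float:
    """Turn a stored code back into a level between 0.0 and 1.0."""
    max_code = (1 << bits) - 1
    return code / max_code


level = 0.25  # hypothetical input: 25% of full-scale voltage
for bits in (8, 16):
    code = quantize(level, bits)
    error = abs(reconstruct(code, bits) - level)
    print(f"{bits}-bit: code {code}, rounding error {error:.7f}")
```

The 16-bit error comes out far smaller than the 8-bit error, which is the "closer approximation" the text describes.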
CDs use a 44.1-kHz sample rate and a 16-bit sample size; that is 44,100 samples per second, with each sample taking up 16 bits (two bytes). Therefore one second of CD-quality sound takes up 88,200 bytes, or about 86K of disk space. This is doubled, to 176,400 bytes, if the sound is in stereo.
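The arithmetic can be checked with a few lines of Python. This is only a sketch using the exact CD figure of 44,100 samples per second; the function name `pcm_bytes_per_second` is invented for the example.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Storage needed for one second of uncompressed audio."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels


mono = pcm_bytes_per_second(44_100, 16, 1)
stereo = pcm_bytes_per_second(44_100, 16, 2)
print(f"mono:   {mono:,} bytes per second")
print(f"stereo: {stereo:,} bytes per second")
print(f"one minute of stereo: {stereo * 60:,} bytes")
```

At 176,400 bytes per second, one minute of stereo CD audio needs a little over 10 million bytes.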
DVD-audio typically uses 96,000 samples per second and 24 bits per sample. The spec is frequently written as 24-bit/96 kHz or simply 24/96 (likewise, CD is written 16/44). DVD-audio is expected to become the new standard for audio quality.
Each second of true CD-quality stereo sound takes up about 1.4 megabits (roughly 172K) of disk space, or about 10MB per minute, which is why file-compression technology is essential to digital audio, especially portable audio. Using principles of psychoacoustics (how the brain perceives sound) and perceptual coding (eliminating imperceptible sounds), engineers develop algorithms, called codecs (short for compressor/decompressor), that compress songs into the smallest possible sizes with minimal loss of quality. The sound depends on two factors: the quality of this compression algorithm and the bit rate at which the song is encoded, measured in kilobits per second (Kbps).
When a digital file is played back, the analog-to-digital process is reversed. A digital audio device, such as an MP3 player or a computer soundcard, uses a DAC (digital-to-analog converter) to turn the 1s and 0s back into an analog signal that can then be amplified and played over headphones or speakers. When playing back a file that’s been compressed, software on the device, called firmware, uses the codec to decompress the file, then sends the decompressed 1s and 0s to the DAC. Sound quality depends on factors all along the audio reproduction chain: the recording, the digital file, the DAC chip in the player, the amplifier and the speakers.
Sampling can be a confusing term and is used in a variety of contexts.
Sampling frequency or sampling rate is the rate at which an analog signal is sampled, or converted into digital data. Sampling frequencies are expressed in hertz (cycles per second). For example, the compact disc sampling rate is 44,100 samples per second, or 44.1 kHz. Other rates exist in pro audio, common examples being 32 kHz, 48 kHz, and 50 kHz.
A sample is also a digital recording of a sound, which could be anything from a car horn to a note on a piano, beat of a drum or a person laughing. A sampler is a device that can record, store, edit, and play back samples at different pitches or volumes, and with different effects. It can also play samples repeatedly (as a loop), which is useful for creating sustained sounds or repeated rhythmic phrases. These sampled sounds can be mapped to different keys on a keyboard and played back (triggered) via MIDI (Musical Instrument Digital Interface).
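One simple way a sampler can play a recording at a different pitch is to read through the stored samples faster or slower. The sketch below assumes equal-tempered tuning (each semitone changes the speed by a factor of 2^(1/12)) and uses plain linear interpolation; the names `pitch_ratio` and `repitch` are hypothetical, and commercial samplers use more refined methods.

```python
def pitch_ratio(semitones: float) -> float:
    """Playback speed factor for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def repitch(samples: list[float], semitones: float) -> list[float]:
    """Re-read a recording at a new speed, interpolating between neighbors."""
    ratio = pitch_ratio(semitones)
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        # Blend the two stored samples on either side of the read position.
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        pos += ratio
    return out


recording = [float(n) for n in range(9)]   # stand-in for real sample data
octave_up = repitch(recording, 12)         # twice as fast, half as long
octave_down = repitch(recording, -12)      # half as fast, twice as long
print(len(recording), len(octave_up), len(octave_down))
```

Note that this approach changes the length of the sound along with its pitch, which is one reason realistic instruments are sampled at many different notes.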
Achieving realistic acoustic instrument emulation requires sophisticated sampling techniques. It requires multiple samples of different notes, velocities, timbres and other attributes of the instrument being sampled (plucked sound, sounds generated from different parts of the instrument, etc.). Sampling technology has become so sophisticated (as of 2005) that sampled sounds (digital reproductions) are virtually indistinguishable from the original. The birds heard in a movie soundtrack or the violins heard in a pop tune could very well be digital reproductions of the original. They are used more frequently in recordings and soundtracks than the public is aware of.
Samples come from a variety of sources. Sampling technology is available to the average musician and consumer. Musicians use samplers. Samplers can be stand-alone devices or part of a digital audio workstation (DAW), comprised of an electronic keyboard, sequencer and computer. There are many sampling device manufacturers with Akai, Emu, Roland and Kurzweil being the most popular.
Sampling can be a laborious process requiring excellent equipment and skills to produce high-quality samples. Kurzweil is well known for its excellent samples of pianos and other acoustic instruments, and an average musician or music listener is unlikely to match Kurzweil’s expertise.
Most musicians use professional sample libraries, available on CD-ROMs or downloadable from the Internet. Or, they use the libraries that ship with the sampler/keyboard, usually as part of the unit’s read-only memory. Most samples are playable only on the sampler that created them. But manufacturers like Akai, Roland and Emu are increasingly allowing their units to read other formats.
Third party designers provide additional samples, either in a proprietary format that can be played back by only one kind of device, or in a format that can be read by several devices. These samples are sold as collections on a CD or downloaded from the Internet (as collections or separately).
A CD or DVD might contain a collection of guitar sounds only, or a mix of sounds, i.e., orchestral, rock drums, woodwinds, sound effects, etc. If a device is CD-capable, it can extract samples directly from the CD. Or, if the samples are downloaded, they can be loaded onto a floppy or other storage device, depending on what kind of storage devices the sampler or workstation can read.
Musicians may not create their own samples but frequently like to tweak or reshape them into an entirely different sound, using a keyboard’s synthesizing capabilities. For instance, a real sounding car horn could be shaped into a sound that can be played like a violin or a musical sound that’s never been heard before.
The sequencer, mentioned earlier as part of a digital audio workstation, is a digital recording device (stand-alone, built into an electronic keyboard, or a software program) that can create, record, edit and process sound all within the digital domain. Frequently, what musicians do is play the sampled sounds on a digital keyboard and then record them directly to an on-board sequencer. An entire arrangement is built this way, with drums, guitar, keys, strings, or any other instrument so desired. Using special techniques, the human voice can also be sampled. This means that an entire recording can be made right in a musician’s home.
Recorded as WAV files, the song can then be mixed down to a 2-track stereo file, then converted into MP3. Sampled sounds or entire arrangements using sampled sounds can be converted into a wide range of audio files and formats (see Digital Audio Formats). Once converted, the song can then be uploaded to the Internet, and the musician never has to leave the computer.
Another form of sampling is when parts of a commercially recorded audio CD are sampled and then used as part of another recording. Sampling instrumental parts, vocal passages or entire sections of commercial recordings is common in dance music, electronica, and rap.
One popular recording might use a “sample” of another popular recording, for instance, a part of the song’s repeated chorus or an instrumental riff. As long as copyright permission is secured, this creates a sort of dialog for bands and artists, quoting each other much in the same way writers quote each other. It can also be considered a form of copyright infringement if permission was not secured beforehand.
Software samplers are now available. Tascam’s GigaStudio or Steinberg’s Halion sampling software programs can “stream” samples directly from a computer’s hard drive. This means samples can be many gigabytes in size. One company, EASTWEST, offers sampled pianos like the Steinway B and Bosendorfer that include over 700 stereo samples. Every key was sampled multiple times to capture the full range of dynamics.
Proprietary formats are a problem throughout the computing world and in part explain why understanding MP3 and other digital technologies can be so confusing. A prime example is how many software programs work only on IBM compatible machines and not Macintosh. Windows Media files (.wma) cannot be played by Realplayer and Realplayer files (.ra) cannot be played by Windows Media. Sampled sounds for the Kurzweil cannot be played on an Akai sampler.
However, a number of conversion software programs have the capability of converting almost any digital audio format into another. Plus, programs like Windows Media Player 10 (WMP 10) already play a wide range of media files. When installing a new player, most players ask if the user wishes to use the player as the default player. It then provides a list of media file types. From this list a user can select which files the player will play. Some files can be played by multiple players, like MP3.
In recording, one way to work around the proprietary problem is to record the audio output of the sound generating device directly into an external sequencer or sampler (hardware or software). Recording the output creates a track which is then mixed with other tracks (multi-track recording). All of these tracks are then saved as a single separate audio file, such as a .wav file or MP3 file.
From Synthesized Sound to Sampling
Synthesizers used to be analog, but are now mostly digital. Analog synthesizers are still used, but more often than not the sounds produced by those machines are now “sampled” using samplers. Many synthesizers now have built-in samplers, sequencers and other sound programming features, which is why they are now called digital audio workstations (DAW) instead of electronic keyboards or synthesizers. DAWs usually include both hardware and software. Classic synthesizers include the Moog, Prophet V, Arp Odyssey, Oberheim, and many more.
Analog vs. Digital
Many musicians, as well as music listeners, argue over analog versus digital sound. Digital music is often criticized as being too clean, losing the warmth generated by acoustic instruments. Country music, for instance, favors the use of acoustic instruments like the acoustic guitar, fiddle, and banjo.
However, country’s adherence to a “roots” sound is, in part, an illusion. The analog-to-digital chain is a bit confusing. An analog microphone is used to transmit the signal generated from the acoustic instrument to a digital recording console, which captures, stores and converts analog signals into digital ones. It creates a sort of paradox, since analog (acoustic) instruments are being recorded and then reproduced digitally, either as MP3 files or on audio CDs.
But, the digital realm has come a long way from the cold, synthesized sound it was once known for, and the loss of warmth or “human feel” in performance is becoming a non-issue. Digital sound has completely eliminated the weaknesses of tape and vinyl, like noise, scratches, hiss, etc. MP3 technology promises to eliminate CD technology as well.
As a reminder, in the analog vs. digital debate, the banjo sound heard in a particular country tune could very well be a sampled banjo sound played on a keyboard.
Consumers Turn Pro
The technology used by musicians and recording engineers is now readily available to the consumer. In fact, almost all of the hardware used to make recordings is available in software form and many computers are now being sold with full music production capability. Consequently, many music fans are now creating their own music as well as listening to purchased recordings of their favorite artists and bands. The quality of such amateur creations is often a far cry from the talent and production that goes into most commercially released recordings, but the gap is closing.
Virtually everything involved in music creation can be done on a computer. Even digital keyboards are unnecessary since software-based synthesizers entered the digital music scene. A piano-styled keyboard—one with black and white keys—is not necessary to generate music or even generate a piano sound. It can all be done with a computer keyboard and a mouse.
Plus, music software programs can be downloaded from the Internet making CD-ROM installation unnecessary. Music can be created (along with new synthesized sounds never heard in the acoustic realm), arranged, edited, mixed and mastered all within the digital domain.
The finished recording is mastered—converted into an audio file—and can then either be stored on a CD or DVD, or made available as a downloadable MP3 file from the Internet. If the end user downloads the MP3 file directly to a portable storage device, the MP3 file can then be transferred to and played back by an MP3 player installed in the car.
The music never touches the ground, so to speak.
Digital Audio Formats
There once was a time when borrowing a vinyl record or CD between friends could make or break a friendship. “He never returned my Pink Floyd ‘The Wall’ album,” and of course, that would be unforgivable. The Digital Age changed all that. No more friendships breaking up because of unreturned albums or CDs. Now friends can swap copies of computer files without fear of ever losing the original...unless of course the computer crashes.
Backing up audio files—as well as all computer programs and files—is a high priority. In the same way vinyl records, cassettes and CDs somehow magically disappear, the same goes for digital audio files.
Types of Digital Audio Formats
An audio file format is a file format for storing audio data on a computer. There are a number of different audio file formats, with MP3 being the most common.
The difference between CD and DVD formats and digital audio formats like .wav, MP3, .wma and others can be confusing. CDs and DVDs are physical media used to store digital audio files. Digital audio files do not exist in physical form.
Philips Electronics and Sony co-invented the CD and the technology was set as a standard in 1982. CDs were introduced in the United States in 1983. By 1988, more CDs were sold than vinyl or cassette. Because CD became a standard, consumers could rely on consistency across a number of CD players.
The battle for a digital audio format standard rages on and MP3 leads the way. However, MP3 does not have DRM, or Digital Rights Management. DRM software is technology that protects a piece of intellectual property, such as a song, from being illegally copied, usually by placing restrictions on how it is used.
Since MP3s have no DRM, users can easily convert their CD collections into MP3s and then make copies, transfer to portable devices and generally use them without restriction. No DRM also means that most MP3s on the Internet are available illegally on file-sharing networks, meaning that the copyright holders are not paid when a song is downloaded. Before the Napster controversy, consumers were used to “free” music. The shift to DRM and charging in the range of 99 cents per song has not sat well with those who’ve come to believe music should be free.
Consequently, MP3 is not the format used on major legal download sites. Even Napster uses WMA (Windows Media). Apple Computer’s iTunes Music Store uses the Advanced Audio Coding (AAC) format, which was designed as a successor to MP3. Apple has embedded its own DRM software into the AAC format to protect the copyright holder. Apple’s FairPlay DRM allows users to play the songs they’ve purchased from the iTunes store on five computers using Apple’s iTunes software. Another restriction is that a playlist of purchased songs can be burned to CD only seven times. In addition, Apple’s iPod is the only portable player compatible with tunes downloaded from iTunes. These restrictions have not slowed the growing popularity of the iPod and iTunes.
Napster, the former file-sharing network morphed into a legal download site, uses Windows Media Audio (WMA), Microsoft’s proprietary audio format combined with Microsoft’s own DRM software. Napster allows users to burn purchased songs to CD, play them on up to three PCs and transfer to a portable player that is WMA-compatible.
Napster also offers a subscription service in which users pay a $9.95 monthly fee to receive unlimited access to streaming music and downloading of songs to a hard drive. Once the subscription is discontinued, the user loses the rights to listen to those songs. Users can buy the songs as well as pay the monthly subscription fee.
Allegedly Apple makes more money on the sales of iPods than it does on the sale of music from the iTunes Music Store. This could change. But, because files from the iTunes Music Store are only AAC-formatted, with the iPod the only compatible portable player, Apple could face severe problems if the industry shifts to MP3 as the standard file format.
RealNetworks operates the RealPlayer Music Store, selling songs in the AAC format using Real’s proprietary Helix DRM software. Real’s format is supported directly only by one portable player, the Creative Nomad Jukebox Zen Xtra, and by a handful of personal digital assistant devices from PalmOne. Neither Apple nor Windows supports RealPlayer files. Three major music sites—Napster, WalMart and MusicMatch—all use the WMA format, which is compatible with most digital music players.
There are two major groups of audio file formats:
Lossless formats: WAV, PCM, TTA, FLAC, and AU
Lossy formats: MP3, Ogg Vorbis (OGG), Windows Media Audio (WMA), and AAC
Lossy file formats discard parts of the sound that listeners are unlikely to hear, so some of the original data is permanently lost. Lossless compressed formats such as FLAC and TTA have a compression ratio of about 2:1, but no data or quality is lost in the compression. Once decompressed, the data is identical to the original. WAV, PCM and AU files usually hold uncompressed audio, which can be quite large. Compressed files are easier to store simply because they take up less space. Lossless formats are primarily used in professional recording situations.
There are three main bit rate modes used in lossy encoding:
Constant Bit Rate (CBR) uses the same bit rate to encode the entire file and generally does not sound as good as Variable Bit Rate at the same file size.
Variable Bit Rate (VBR) uses different bit rates for different sections of an MP3 file (the sections are called frames). The encoder determines when and where to change the bit rate. For instance, a lower bit rate is used when there is little audio, like a soft instrumental passage or silent break.
Average Bit Rate (ABR) is similar to Variable Bit Rate. The bit rate still varies from frame to frame, but the encoder keeps the average for the whole file close to a chosen target.
With audio compression, the average amount of data required to store one second of music is expressed in kilobits per second (Kbps). Some codecs like MP3, WMA, and AAC allow files to be encoded at different bitrates. Generally, as bitrate decreases, so does sound quality, although the file size is smaller. Lost audio quality can never be recovered, so converting a 128 Kbps file to 320 Kbps will result in a larger file that sounds no better than the 128 Kbps version. Each time a file is re-encoded in a lossy format, a little more quality is lost. An MP3 file encoded at 128 Kbps is standard.
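To see how bit rate translates into file size, the sketch below estimates the size of a four-minute song at a few common bit rates. It assumes one kilobit is 1,000 bits and ignores tags and other overhead; the function name `encoded_size_bytes` is invented for the example.

```python
def encoded_size_bytes(bitrate_kbps: int, seconds: float) -> float:
    """Approximate size of an encoded file: bits per second times duration, in bytes."""
    return bitrate_kbps * 1000 * seconds / 8


song_seconds = 240  # a hypothetical four-minute song
for kbps in (128, 192, 320):
    megabytes = encoded_size_bytes(kbps, song_seconds) / 1_000_000
    print(f"{kbps} Kbps: about {megabytes:.2f} MB")
```

By comparison, the same four minutes of uncompressed CD-quality stereo (176,400 bytes per second) would take more than 42 MB.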
A codec is software that is used to compress or decompress a digital media file, such as a song or video. A codec consists of two components—an encoder and a decoder. The encoder performs the compression function and the decoder performs the decompression function.
Some codecs include both components and some include only one of them. For example, the video on a DVD-Video disc is compressed using the MPEG-2 codec. Most DVD playback programs will decode MPEG-2 video but will not encode it.
Windows Media Player, Windows Movie Maker, iTunes, Napster, and other programs use codecs to play and create digital audio and media files. For example, when a song is ripped from an audio CD to the computer, Windows Media Player (or other ripping software/hardware) uses a codec to compress the song into an audio file. By default, Windows Media Player uses the Windows Media Audio codec to compress the song into a compact WMA file.
When a song is played back, the Player uses a codec to decompress the audio file and output the music to the speakers. The same is true for nearly all music or video files.
It’s easy to confuse codecs with file formats and vice versa because sometimes the name of the codec and the name of the file format are the same.
A file format is a type of container. Inside the container is data that has been compressed by using a particular codec. A file format such as Windows Media Audio (.wma) contains data that is compressed by using the Windows Media Audio codec.
However, a file format such as Audio Video Interleaved (AVI) can contain data that is compressed by any of a number of different codecs, including the MPEG-2, DivX, or XVid codecs. AVI files can also contain data that is not compressed by any codec. Consequently, some AVI files might play while others won’t, depending on which codecs were used to compress the file and which codecs are installed on the computer.
For the same reason, audio portions of an AVI file might play but not the video portion. There are hundreds of audio and video codecs in use today. Some have been created by Microsoft, but the vast majority of codecs have been created by other companies, organizations, or individuals.
By default, a number of the most popular codecs are installed in the Windows operating system and with Windows Media Player, such as the Windows Media Audio, Windows Media Video, and MP3 codecs. If a codec is required that isn’t included by default, the Player searches the computer to see if it can use any of the codecs installed by other digital media playback and creation programs. If the necessary codec is not available from the Web or is not compatible with Windows Media Player, the file can’t be burned, played or synchronized.
If the Player doesn’t find the right codec on the computer, it tries to download the codec from a Microsoft server. If the codec is available, Windows Media Player installs it. If the codec is not available on the server (for example, because the missing codec was not created by Microsoft), Windows Media Player displays a message that indicates the computer is missing a codec.
There are two main reasons why compressing a digital media file is helpful: storage space and transfer time. A compressed file takes up less storage space than an uncompressed file. That means more files can be stored on a computer, portable music player (MP3 player), or data CD.
A compressed file can be transferred from one location to another more quickly than an uncompressed file. For example, Web site creators frequently compress audio and video files so that they can be streamed over the Internet. It’s usually not possible to stream an uncompressed file smoothly over the Internet.
The MPEG-2 and DivX video codecs and the ACELP.net and Ogg Vorbis audio codecs are examples of codecs that are not included in Windows operating systems or the Windows Player by default.
For non-Microsoft codecs go to the WMPlugins.com Codec page. Most of the codecs listed on WMPlugins.com are available at no cost. Some codecs, such as the MPEG-2 codec (also known as a DVD decoder), must be purchased from companies that have licensed them. For an extensive list of file types and the programs that use them, go to the FILExt.com website.
To ensure Windows Media Player can download codecs from the Microsoft server, the “Download codecs automatically” option must be enabled (it’s turned on by default). To check, right-click the Windows Media Player title bar, point to Tools, and then click Options. On the Player tab, verify the Download codecs automatically check box is selected.
Looking at a file’s extension (such as .wma, .wmv, .mp3, or .avi) is one way to tell what kind of file is being used, but it doesn’t necessarily tell what codec was used. Many programs create files with custom file extensions. Plus, anyone can rename a file without changing the file’s format. A file with the extension .mpg for instance, could be an AVI file that was compressed using some version of an MPEG video codec.
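Because the extension can be misleading, programs often look at the first few bytes of the file instead. The sketch below checks a handful of well-known signatures; it covers only these few formats, says nothing about which codec is inside a container, and the function names `sniff_format` and `sniff_file` are made up for the example.

```python
def sniff_format(header: bytes) -> str:
    """Guess a container or stream type from the first bytes of a file."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV"
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    if header[:4] == b"fLaC":
        return "FLAC"
    if header[:4] == b"OggS":
        return "Ogg"
    if header[:3] == b"ID3":
        return "MP3 with ID3v2 tag"
    # An MPEG audio frame starts with eleven 1-bits (the frame sync).
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "MPEG audio frame (possibly MP3)"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 12 bytes of a file and guess its type."""
    with open(path, "rb") as f:
        return sniff_format(f.read(12))


print(sniff_format(b"RIFF\x24\x00\x00\x00WAVE"))
print(sniff_format(b"RIFF\x24\x00\x00\x00AVI "))
```

A file named song.mpg that begins with RIFF....AVI would be reported as AVI here, regardless of its name.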
A file can be compressed by more than one codec. For example, one codec might be used to compress the audio portion of a file and another codec might be used to compress the video portion of a file.
Finding out what codec is used in an audio file is not always easy. In the Microsoft Windows Player, a user can right-click the file while it’s playing, then click Properties. On the File tab, look at the sections named Audio codec and Video codec. Also, GSpot Codec Information Appliance and Sherlock the Codec Detective are two non-Microsoft tools available for determining codecs.
Caution is necessary when installing codecs that aren’t listed on WMPlugins.com or other legitimate codec sites. Information on many of these sites might conflict. Tsunami, Nimo, and other codec packs are examples of codecs that frequently cause problems with Windows Media Player.
MP3 is the most popular digital audio encoding and lossy compression format. Its development began in 1987 at the Fraunhofer Institute for Integrated Circuits in Erlangen, Germany. Like many audio formats to follow, MP3 was designed to greatly reduce the amount of data required to represent audio, yet still sound like a faithful reproduction of the original uncompressed audio to most listeners.
In 1987, the Fraunhofer IIS started to work on perceptual audio coding in the framework of the EUREKA project EU147, Digital Audio Broadcasting (DAB). In a joint cooperation with the University of Erlangen (Prof. Dieter Seitzer), the Fraunhofer IIS finally devised a very powerful algorithm that is standardized as ISO-MPEG Audio Layer-3 (IS 11172-3 and IS 13818-3).
Without data reduction, digital audio signals typically consist of 16-bit samples recorded at a sampling rate more than twice the actual audio bandwidth, that is, 44.1 kHz for CDs. MPEG audio coding reduces the original sound data from a CD by a factor of 12 without a loss of sound quality that most listeners can perceive. Factors of 24 and even more still maintain a sound quality that is significantly better than just reducing the sampling rate and the resolution of samples. Data reduction is realized by perceptual coding techniques that address the perception of sound waves by the human ear.
The basis for the development of compressed files is the relationship between file size and computer hard drive storage space. The next few years will see continued dramatic changes in smaller file size and bigger computer hard drives, as well as portability.
The name MP3 is derived from “MPEG-1 Audio Layer 3” more formally known as ISO/IEC 11172-3 Layer 3. The files recorded in this format are saved with the .MP3 filename extension. MP3 is a lossy compression format.
A Brief PCM Primer
As a lossy compression format, MP3 provides a representation of pulse-code modulation-encoded (PCM) audio data in a much smaller size by discarding portions that are considered less important to human hearing (similar to JPEG, a lossy compression for images).
Pulse-code modulation (PCM) is a digital representation of an analog signal. The magnitude of the signal is sampled regularly at uniform intervals, and then quantized to a series of symbols in a digital (usually binary) code. PCM is used in digital telephone systems and is also the standard form for digital audio in computers and various CD formats.
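The two steps described above, sampling at uniform intervals and quantizing to binary codes, can be sketched with Python's standard library. The example generates a short 440 Hz test tone as signed 16-bit PCM and wraps it in a WAV container in memory. The tone, its length, and the function names `sine_pcm16` and `wav_bytes` are choices made for this illustration only.

```python
import io
import math
import struct
import wave


def sine_pcm16(freq_hz: float, seconds: float, sample_rate: int = 44_100) -> bytes:
    """Sample a sine wave at uniform intervals and quantize to signed 16-bit codes."""
    n_samples = round(seconds * sample_rate)
    frames = bytearray()
    for i in range(n_samples):
        value = math.sin(2 * math.pi * freq_hz * i / sample_rate)  # -1.0 to 1.0
        frames += struct.pack("<h", round(value * 32767))          # little-endian int16
    return bytes(frames)


def wav_bytes(pcm: bytes, sample_rate: int = 44_100) -> bytes:
    """Wrap mono 16-bit PCM data in a WAV container held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # two bytes per sample = 16 bits
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()


pcm = sine_pcm16(440.0, 0.01)  # one hundredth of a second
data = wav_bytes(pcm)
print(len(pcm), "bytes of PCM,", len(data), "bytes as a WAV file")
```

Writing `data` to a file with a .wav extension should give something an ordinary player can open, though a hundredth of a second is too short to hear as more than a click.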
Several PCM streams may be multiplexed into a larger aggregate data stream. This technique is called time-division multiplexing, or TDM. TDM was invented by the telephone industry, but today the technique is an integral part of many digital audio workstations such as Pro Tools (industry standard digital recording software comprised of both hardware and software).
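In its simplest form, time-division multiplexing just takes one sample from each stream in turn. The sketch below shows that idea with short lists of integers standing in for PCM streams; the function names are invented, and real TDM systems add framing and synchronization that are left out here.

```python
def tdm_multiplex(streams: list[list[int]]) -> list[int]:
    """Interleave equal-length streams: one sample from each stream per time slot."""
    if len({len(s) for s in streams}) > 1:
        raise ValueError("all streams must have the same length")
    return [sample for slot in zip(*streams) for sample in slot]


def tdm_demultiplex(aggregate: list[int], n_streams: int) -> list[list[int]]:
    """Recover the original streams by taking every n-th sample."""
    return [aggregate[i::n_streams] for i in range(n_streams)]


left = [1, 2, 3]
right = [10, 20, 30]
combined = tdm_multiplex([left, right])
print(combined)
print(tdm_demultiplex(combined, 2))
```

Stereo PCM audio is stored the same way, with left and right samples alternating.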
PCM was invented by the British engineer Alec Reeves in 1937 while working for the International Telephone and Telegraph in France.
A Google search on PCM (pulse code modulation) will yield a list of websites providing a more advanced understanding of the science behind PCM.
A number of techniques are employed in MP3 to determine which portions of the audio can be discarded, including psychoacoustics. MP3 audio can be compressed with different bit rates, providing a range of tradeoffs between data size and sound quality.
Despite the iPod’s reliance on the AAC format and the numerous other formats available, most reports indicate MP3 will remain the dominant digital audio format with a wide range of software and hardware manufacturer support.
MP3 has a development history of nearly 20 years, but it was WinAmp, pioneered by Nullsoft and first released in 1997, that became the most popular MP3 player. At the time, computer hard drives were far from the mega-gigabyte sizes currently enjoyed. Along with WinAmp, leading the way to MP3’s popularity was The Internet Underground Music Archive (IUMA) website.
IUMA is credited with starting the online music revolution before Napster. IUMA was the Internet’s first high-fidelity music website, featuring MP2 recordings before MP3 became popular. IUMA was started by Rob Lord (who later headed Nullsoft) and Jeff Patterson in 1993. Other founding members were involved.
For a detailed history of the technical development of MP3, see the Fraunhofer Institute website.
Quality of MP3
The standard bitrate for MP3 is 128 Kbps, although bitrates vary. Bit rate is the number of bits of encoded data used to represent each second of audio. By contrast, uncompressed CD audio has a bit rate of about 1,411 Kbps (1,378 Kbps if a kilobit is counted as 1,024 bits). Most listeners find that MP3s encoded at 128 Kbps are close enough to CD audio.
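The CD figure follows directly from the sample rate, sample size and number of channels. The short sketch below works it out and compares it with a 128 Kbps MP3; the function name `pcm_bits_per_second` is chosen only for this example.

```python
def pcm_bits_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed audio."""
    return sample_rate_hz * bits_per_sample * channels


cd_bps = pcm_bits_per_second(44_100, 16, 2)
print(f"CD audio: {cd_bps:,} bits per second")
print(f"  = {cd_bps / 1000:.1f} Kbps counting 1,000 bits per kilobit")
print(f"  = {cd_bps / 1024:.1f} Kbps counting 1,024 bits per kilobit")
print(f"Reduction at 128 Kbps: about {cd_bps / 128_000:.1f} to 1")
```

The ratio of roughly 11 to 1 is in line with the "factor of 12" usually quoted for MP3.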
Files encoded with a lower bit rate generally play back at a lower quality. With too low a bit rate, “compression artifacts” (sounds not present in the original recording) may appear in the reproduction. Because of their inherent properties, some sounds are more difficult to compress than others, with too much compression resulting in loss of audio quality.
The bit rate of an MP3 file is chosen when the file is encoded. The general rule is that more information is included from the original sound file when a higher bit rate is used, and thus the higher the quality during playback. In the early days of MP3 encoding, a fixed bit rate was used for the entire file.
Bit rates available in MPEG-1 Layer 3 are 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 Kbps. Available sample frequencies are 32, 44.1 and 48 kHz.
44.1 kHz is almost always used since it coincides with the sampling rate of CDs. 128 Kbps has become the standard for acceptable quality although 192 Kbps is becoming more popular.
Variable bit rates (VBR) are also possible. Audio in MP3 files is divided into frames (each of which has its own bit rate), so it is possible to change the bit rate dynamically as the file is encoded. VBR is increasingly being used. VBR allows more bits for parts of the sound with higher dynamics (more sound movement) and fewer bits for parts with lower dynamics, increasing quality and decreasing storage space.
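The frame structure can be sketched in a few lines of Python. An MPEG-1 Layer 3 frame holds 1,152 samples, and its length in bytes follows from its bit rate and the sample rate. The list of frame bit rates below is invented for illustration, and the helper names are not taken from any particular encoder.

```python
SAMPLES_PER_FRAME = 1152  # samples per MPEG-1 Layer 3 frame


def frame_bytes(bit_rate_kbps, sample_rate_hz=44_100, padding=0):
    """Length of one MPEG-1 Layer 3 frame in bytes."""
    return 144 * bit_rate_kbps * 1000 // sample_rate_hz + padding


def average_bit_rate_kbps(frame_bit_rates_kbps):
    """Every frame covers the same playing time, so a plain mean is enough."""
    return sum(frame_bit_rates_kbps) / len(frame_bit_rates_kbps)


# Hypothetical VBR stream: six frames of a quiet passage, two of a busy one.
frames = [96] * 6 + [192] * 2
vbr_average = average_bit_rate_kbps(frames)             # 120.0 Kbps
vbr_total_bytes = sum(frame_bytes(rate) for rate in frames)
cbr_total_bytes = frame_bytes(192) * len(frames)        # same audio at a fixed 192 Kbps
```

Here the busy frames still get 192 Kbps, yet the stream as a whole averages 120 Kbps, which is the saving VBR is after.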
In addition to bitrate, MP3 quality also depends on the quality of the encoder. Any digital audio format can be converted into MP3, although some are more difficult to convert than others. An MP3 encoded at 128 Kbps with a good encoder might sound better than a 192 Kbps MP3 produced with a bad encoder. Lossless formats produce the best possible result, at the expense of a lower compression ratio. There are many different MP3 encoders available, each producing files of differing quality.
LAME was first created by Mike Cheng in early 1998. LAME is an educational tool to be used for learning about MP3 encoding. The goal of the LAME project is to use the open source model to improve the psychoacoustics, noise shaping and speed of MP3. Many popular ripping and encoding programs include the LAME encoding engine.
For a comprehensive list of software that uses LAME, see the LAME website.
The iTunes-LAME Encoder combines the simple interface of iTunes with the high quality of the LAME encoder, and converts audio files to MP3 with outstanding results.
The Sound Game
In the playback of any MP3 file, audio quality depends on every step in the recording/reproduction process: how the music is recorded, how it is encoded, what device is used for playback, and what sound system it is played through.
Sound reproduction is as much a subjective experience as it is a science. Factors ranging from the quality of the instrument used in the recording to the environment where the music is reproduced can play a major role in the perception of sound.
The sound game gets tricky. There are cheap violins and there is the Stradivarius, the rarest of which cost hundreds of thousands of dollars. Who can tell the difference? In the conversion of raw audio data into MP3, how much of the sound of a Stradivarius is lost? What encoder was used? Was the recording played back through cheap computer speakers? Was it heard in a nightclub using the most expensive pro audio equipment available?
Because of the subjective experience in hearing music, playing MP3s in the car can mean experimenting with a number of solutions.
ID3 and other tags
A “tag” is metadata stored in an MP3 file (as well as in other formats), such as the title, artist, album, track number or other information about the file. The most widespread standard tag formats are currently ID3v1 and ID3v2, and the more recent APEv2 tag.
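ID3v1 is the simplest of these to illustrate: it is a fixed 128-byte block at the very end of the file that begins with the letters TAG, followed by fixed-width text fields. The Python sketch below reads such a block. The function name and the sample data are invented for the example, and it does not handle ID3v2 or APEv2, which are laid out differently.

```python
def parse_id3v1(data):
    """Return the ID3v1 fields from the last 128 bytes of `data`, or None."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None

    def text(raw):
        # Fields are fixed width, padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": tag[127],  # numeric index into the ID3v1 genre list
    }


# Build a made-up tag and append it to some dummy audio bytes.
fake_tag = (
    b"TAG"
    + b"Road Song".ljust(30, b"\x00")
    + b"Some Band".ljust(30, b"\x00")
    + b"Demo".ljust(30, b"\x00")
    + b"2006"
    + b"".ljust(30, b"\x00")
    + bytes([17])
)
fake_file = b"\xff\xfb" + b"\x00" * 400 + fake_tag
info = parse_id3v1(fake_file)
```

With a real file, the same function could be given the bytes read from disk, for example the result of open(path, "rb").read().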
CDs are recorded and mastered at different volumes. In fact, songs on the same CD are frequently recorded at different volumes as well. To overcome these mismatches so the volume stays the same from song to song, a process called “normalization” is used. Some hardware and software players offer normalization from disc to disc, and efforts are being made to standardize volume information within the tags of MP3s and other digital formats.
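The simplest form of normalization scales each track so that its loudest sample reaches the same level. The Python sketch below shows that idea only; the sample values and the target level are made up, and real players often work from perceived loudness rather than the peak, so treat this as an illustration rather than a description of any particular player.

```python
def peak_normalize(samples, target_peak=0.9):
    """Scale samples (floats in -1.0..1.0) so the loudest one hits target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


quiet_track = [0.10, -0.20, 0.15]   # mastered at a low level
loud_track = [0.50, -0.90, 0.70]    # mastered at a high level

quiet_out = peak_normalize(quiet_track)
loud_out = peak_normalize(loud_track)
```

After the call, both tracks peak at the same level, so the listener no longer has to reach for the volume knob between songs.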
Codecs (In alphabetical order)
NOTE: Some graphics codecs are included because of their relationship to digital audio and because of their popularity.
AAC - Advanced Audio Coding (.aac, a.k.a. MP4 or .mp4)
Advanced Audio Coding (AAC) was developed by a group of companies that includes Dolby, Fraunhofer, AT&T and Sony. It is more advanced than MP3 and is commonly referred to as MP4. AAC files encoded at lower bitrates (like 96 Kbps) rival MP3s encoded at higher bitrates (like 128 Kbps) despite their notably smaller size. Apple’s iTunes uses AAC stored in .m4a files. Files ending in .m4a are audio content only, .mp4 can contain both audio and video, and .m4v are video files. The .m4b format is the same as .m4a except it is used for audio books read by the iPod. Apple also uses a lossless format (ALAC) with the .m4a file extension. AAC itself is a lossy format. Another use of AAC is with MPEG-2 (home cinema). Dolby continues to update the AAC format.
AC-3 is used in Dolby Digital and DVD.
AIFF (Audio Interchange File Format; .aif, .aiff)
An audio format for Macintosh operating systems used for storing uncompressed, CD-quality sound (similar to WAV files for Windows-based PCs). The Amiga used a variation of AIFF, with the .iff extension. An iPod plays .aif files.
Apple Lossless (ALE)
Apple Lossless Encoding (or just Apple Lossless) was developed by Apple Computer. It preserves full CD-quality audio, with no loss of data, in about half the space of the original uncompressed file.
ATRAC was developed by Sony engineers in the early 1990s and generates near-CD quality. The MiniDisc format uses ATRAC to fit a whole CD’s worth of music on a 2-1/2” disc.
ATRAC3 is a smaller file size version of the ATRAC codec. It’s used for MDLP recording with some MiniDisc recorders, for music storage in some portable memory players, and in other Internet music applications like Liquid Audio and RealAudio.
A MiniDisc recorder with MDLP offers a range of compression options (in order of increasing compression):
Normal recording mode: standard ATRAC codec (292 Kbps)
LP2 mode: ATRAC3 codec (132 Kbps)
LP4 mode: ATRAC3 codec (66 Kbps)
ATRAC3plus is found on Sony’s Hi-MD portable recorders.
The AU codec is used for posting sound clips on the Internet. AU files can be played back on Windows, Macintosh, and other operating systems.
AVI (Audio/Video Interleaved; .avi)
AVI is a format for storing and playing back movie clips with sound on Windows-based PCs. An AVI file is organized into alternating (“interleaved”) chunks of audio data and video data. AVI is a “container format,” meaning that it specifies how the data will be organized, but it is not itself a form of audio or video compression. AVI is the file created when DV clips are imported from a digital camcorder to a PC (These clips are often referred to as “DV-AVIs” because they contain full-quality DV content).
Broadcast Wave Format (BWF)
BWF is a standard audio format created by the European Broadcasting Union as a successor to WAV. BWF allows metadata to be stored in the file.
CD Audio (.cda)
Audio CDs are based on the Red Book standard: 44.1 kHz, 16 Bit Stereo.
DivX is an advanced digital media format for compressing video to a convenient size without losing any noticeable quality and for playing those videos back on almost any device. For example, the DivX codec can compress an MPEG-2/DVD file to nearly one-tenth its original size or a home movie (DV) by a ratio of 25:1.
DivX products are combined into two different software bundles. The DivX Play Bundle contains everything needed to play a DivX file (DivX codec and DivX Player). The DivX Create Bundle contains everything needed to create and play a DivX file (DivX Player, DivX Pro and DivX Converter). DivX works with both PCs and Macs.
DV (Digital Video)
DV is the format used by most digital camcorders for movies and CD-quality sound, usually on Mini DV cassettes. The DV format employs a form of video compression (applied in real-time while recording with the camera) but still uses a lot of storage space. When transferred to a computer, a DV clip requires about 1 GB of storage per 5 minutes of video. Clips are usually stored on the computer as QuickTime or .AVI files.
Despite its use of compression, DV promises up to 520 lines of resolution. DV uses a type of compression known as “intraframe.” It encodes video at the full standard frame rate of 30 frames per second. This not only makes for high-quality video, but also allows frame-by-frame editing. In contrast, video codecs like MPEG1 or MPEG2, which can “squeeze” clips into smaller sizes, tend to handle a video sequence by reducing the number of full frames per second and encoding the differences between frames. These are known as “interframe” forms of compression.
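The difference between the two approaches can be shown with a toy example in Python. Each “frame” here is just a short list of pixel values, and the interframe encoder stores the first frame whole and every later frame as a list of differences from the one before. This is a deliberately simplified sketch with invented names; real MPEG encoders use far more elaborate techniques.

```python
def encode_interframe(frames):
    """Keep the first frame whole; store later frames as per-pixel differences."""
    encoded = [list(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        encoded.append([c - p for p, c in zip(prev, cur)])
    return encoded


def decode_interframe(encoded):
    """Rebuild every frame by adding each set of differences to the previous frame."""
    frames = [list(encoded[0])]
    for delta in encoded[1:]:
        frames.append([p + d for p, d in zip(frames[-1], delta)])
    return frames


# Three tiny 4-pixel frames in which only one pixel changes each time.
clip = [[10, 10, 10, 10], [10, 12, 10, 10], [10, 12, 10, 9]]
deltas = encode_interframe(clip)
restored = decode_interframe(deltas)
```

The difference lists are mostly zeros, which is why they compress so well. The cost is that the third frame cannot be rebuilt without the first two, whereas an intraframe format like DV keeps every frame complete and therefore easy to edit one at a time.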
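FLAC is a lossless compression format, so its files are large compared with MP3s. A 3 minute track that takes about 30MB as an uncompressed WAV file typically shrinks to somewhere around half to two-thirds of that size. FLAC is very similar to Monkey’s Audio and uses Vorbis comments (the tag format also used in Ogg files) to store tag information. FLAC is supported on the Rio Karma.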
GIF (Graphic Interchange Format, .gif)
GIF is a format for storing digital images, commonly used for bullets, icons, and other graphics on the Web. The GIF format is limited to 256 colors, so it’s not as commonly used as JPEG for storing digital photos. A single GIF file can combine several frames together, for basic animated motion.
QuickTime 7 Player takes advantage of the latest video compression technology, called H.264. Chosen as the industry-standard codec for 3GPP (mobile multimedia), MPEG-4, HD-DVD and Blu-ray, H.264 represents the next generation of video for everything from mobile multimedia to high-definition playback.
The H.264 codec compresses video into much smaller files without sacrificing quality. H.264 delivers the same quality as MPEG-2 at a third to half the data rate and up to four times the frame size of MPEG-4 Part 2 at the same data rate. H.264 achieves the best-ever compression efficiency for a broad range of applications, such as broadcast, DVD, video conferencing, video-on-demand, streaming and multimedia messaging. With H.264, an Apple Cinema HD Display and a dual Power Mac G5, for instance, can turn a computer into a full entertainment system with HD playback.
H.264 is now mandatory for the HD-DVD and Blu-ray specifications (the two formats for high-definition DVDs) and has been ratified in the latest versions of the DVB (Digital Video Broadcasting) and 3GPP (3rd Generation Partnership Project) standards. Numerous broadcast, cable, videoconferencing and consumer electronics companies consider H.264 the video codec of choice for new products and services.
JPEG (Joint Photographic Experts Group, .jpg, .jpeg)
JPEG is a codec for storing and transferring full-color digital images, often used to post photography and artwork on the Web. JPEG compression takes advantage of the human eye’s inability to see minute color changes, removing portions of data from the original picture file. Compression ratios vary depending on the desired file size and image quality. Motion JPEG is used by some digital cameras and camcorders for storing video clips of relatively small file size. With Motion JPEG, each frame of video is captured separately and reduced in size using JPEG compression.
Liquid Audio (.Lqt, .LA1)
Liquid Audio is more than a digital audio format. Liquid Audio is software used for music management and playback (Liquid Player), and is also a network of affiliated web sites (Liquid Music Network) that sell downloadable, copy-protected music. Liquid Audio uses different types of compression for streaming and for downloading.
The MED format was specifically developed to extend the basic MOD format to support the Amiga program called MED. Newer Amiga versions were renamed OctaMED, with other versions named as the MMD file format. MED became an advanced, Amiga-based software music editing tool. Website: Med.uk.com
MIDI (Musical Instrument Digital Interface, .mid, .midi)
MIDI files are usually very small because the instrument sounds are stored on the soundcard rather than in the file. A MIDI file tells the soundcard what instrument to play and which notes to play on it. MIDI is limited to 128 instrument sounds (extended by Yamaha’s XG), and different soundcards feature different sounding instruments. The sound quality of these instruments is poor compared to sounds that are sampled (using samplers). Roland’s Virtual Sound Canvas can save MIDI files as wave files.
MIDI is also a process where musical instruments and other audio components can communicate with each other, in the same way as computers can talk to each other. One device is generally used to control another device or chain of devices.
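The small size of MIDI data is easy to see from the messages themselves. Playing a note takes a three-byte Note On message and stopping it takes a three-byte Note Off message. The Python sketch below builds those two messages; the helper names are made up for the example, and timing information, which a real MIDI file also stores, is left out.

```python
def note_on(channel, note, velocity):
    """Build a 3-byte MIDI Note On message (channel 0-15, note/velocity 0-127)."""
    return bytes([0x90 | channel, note, velocity])


def note_off(channel, note):
    """Build a 3-byte MIDI Note Off message."""
    return bytes([0x80 | channel, note, 0])


middle_c = 60
message = note_on(0, middle_c, 100) + note_off(0, middle_c)

# For comparison: one second of CD-quality stereo audio.
cd_second_bytes = 44_100 * 2 * 2
```

A held note costs six bytes no matter how long it lasts, while one second of CD-quality audio takes 176,400 bytes.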
Monkey’s Audio (.ape)
Monkey’s Audio is a lossless compressor and uses the flexible APE tagging system to embed track information (title, artist, etc.) within the music file.
MPEG (Moving Picture Experts Group, .mpg, .mpeg)
The Moving Picture Experts Group is a committee that sets international standards for the digital encoding of movies and sound. There are several audio/video formats which bear this group’s name. In addition to their popularity on the Internet, several MPEG formats are used with different kinds of A/V gear:
MPEG1 is often used in digital cameras and camcorders to capture small, easily transferable motion video clips. It’s also the compression format used to create Video CDs and commonly used for posting clips on the Internet.
MPEG2 is used for commercially produced DVD movies, home-recorded DVD discs, and digital satellite TV broadcasts. MPEG2 is also the form of compression used by TiVo-based hard disk video recorders. MPEG2 rivals the DV format, although it uses more compression. Because MPEG2 allows for different compression ratios, DVD recorders and hard disk video recorders can offer a range of recording speeds.
MPEG4 is a newer codec used for both streaming and downloadable Web content. It is also the video format used by a growing number of portable video recorders.
The MP3 audio format is part of the MPEG1 standard (its full name is MPEG-1 Audio Layer 3).
MP2/MP1 (.mp2, .mp1)
MP2 and MP1 are predecessors to MP3 and are rarely used anymore.
MP3PRO is an enhancement of the existing MP3 format using an advanced coding process called Spectral Band Replication. Most MP3 players support MP3PRO, but play it as a normal MP3 file. An MP3PRO player must be used to take full advantage of the extra high frequency information embedded in the normal MP3 stream. Fraunhofer claims MP3PRO files are twice the quality of MP3 files but at half the file size (64 Kbps instead of 128 Kbps). It’s possible that MP3PRO will replace MP3.
Musepack (MPC, MP+)
Musepack is a derivative of MP2. Musepack is a lossy compression that allegedly offers superior audio quality at 192 Kbps. Website: Musepack.net
Music on Demand (MOD, .mod)
MOD is similar to MIDI, with the exception that instrument samples are stored in the track and not on a soundcard. Tracks are constructed using a sequencer (multi-track digital recording device). The Amiga computer uses MOD.
Ogg Vorbis (.ogg)
Vorbis is an open and free audio compression (codec) project from the Xiph.org Foundation. It is frequently called Ogg Vorbis. Vorbis was started in 1998 as a response to Fraunhofer Gesellschaft announcing plans to charge licensing fees for the MP3 format. Vorbis founder Christopher “Monty” Montgomery began work in conjunction with a growing list of other developers and released a stable version 1.0 of the codec in 2002.
Tip: Xiph.org, the official Ogg Vorbis website, provides a recent list of Vorbis-supported hardware and software, such as players, portables, PDAs, and microchips. For audio enthusiasts who double as computer programmers, source code for Ogg Vorbis is also available at the site.
Ogg Vorbis has gained popularity among open source communities, video games, music websites and online radio stations. It has also gained support from an increasing number of hardware players (Rio Karma, Samsung, Neuros and iRiver) and software players (conversion software is available for iTunes, and other software players need an external plug-in). It’s used for high quality (44.1 to 48 kHz, 16+ bit, polyphonic) audio at fixed and variable bit rates from 16 to 128 Kbps per channel. Ogg Vorbis is generally regarded as providing higher quality audio than MP3 at the same bit rate and also supports full ID tagging. Website: Xiph.org
QDesign is used in QuickTime at low bitrates.
QuickTime is used for storing and playing back movies with sound. Though developed and supported primarily by Apple, the format isn’t limited to Macintosh operating systems. It’s also used in Windows systems and other computing platforms.
Real Audio (.ra, .rm)
RealAudio from RealNetworks is a proprietary format used for streaming on websites. It is played back using RealPlayer, much in the same way as Windows Media Player plays .wma files. RealMedia includes the RealAudio codec for sound clips and the RealVideo codec for movies. It uses high compression ratios without much loss of audio quality. Website: Real.com
SDII (Sound Designer II)
SDII is an audio format for Macintosh operating systems and used by pro-audio editing software applications. SDII files, like AIFF and WAV files, are capable of storing uncompressed CD-quality audio.
SDMI (Secure Digital Music Initiative)
SDMI is not a file format but a copyright protection system for digital music files. SDMI-compliant hardware and software enables users to download and play both free MP3s and copy-protected music from major-label artists. Most companies like Apple, Microsoft and RealNetworks have created their own proprietary copy protection system.
Speex is a free, patent-free codec based on CELP, specifically designed for speech and VoIP. It has limited support and is used for audio books.
Shorten is a lossless codec created by SoftSound from Cambridge and is used in file sharing. A SHN file is about ½ the size of its original WAV or AIFF source, but takes up considerably more storage space than an MP3 file. eTree.com is a main resource center for Shorten.
TIFF (Tag Image File Format, .tiff)
TIFF is a flexible format for digital still images used in desktop publishing. TIFF images can incorporate various forms of compression (like JPEG), or can be uncompressed. Some digital still cameras offer a special TIFF mode for capturing uncompressed photos but these files require much more storage space than JPEGs.
TTA (True Audio)
The True Audio (TTA) codec is a free, simple, realtime lossless audio compressor. The codec was built to offer adequate compression levels while maintaining high operation speeds. The TTA performs lossless compression on multichannel 8, 16 and 24-bit data of WAV audio files. Compression ratios achieved by the TTA codec vary, depending on music type, but range from 30% - 70% of the original. TTA supports both ID3v1 and ID3v2 information tags. It allows for the storage of up to 20 audio CDs worth of music on a single DVD-R, retaining the original CD quality audio, plus detailed information in the popular ID3 tag format. All TTA source code and binaries are freely available and distributed under Open Source licenses.
VOC is a format by Creative Labs used for their Sound Blaster cards. VOC files tend to be 8 Bit, Mono, and 11.025 kHz. No compression is involved with VOC files. Website: Creative.com
VQF was developed by NTT Labs and used by Yamaha. VQF generates higher compression rates than MP3 but uses more processor time to encode and decode. It is not a commonly used format.
WAV is the most basic of all audio formats. PCM (Pulse Code Modulation) audio data is stored uncompressed. When an Audio CD is converted to a wave file, the resulting wave file is 16 Bit, Stereo with a sample frequency of 44.1 kHz, which equals about 10MB per minute of audio.
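The per-minute figure can be checked with Python’s standard wave module by writing one minute of silence with CD parameters into memory and measuring the result. This is a small sketch; the function name is invented, and silence is used only because the size of uncompressed PCM does not depend on what the audio contains.

```python
import io
import wave


def silent_wav(seconds, sample_rate=44_100, channels=2, sample_bytes=2):
    """Return the bytes of a PCM WAV file containing `seconds` of silence."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_bytes)      # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(seconds * sample_rate * channels * sample_bytes))
    return buffer.getvalue()


one_minute = silent_wav(60)
size_mb = len(one_minute) / 1_000_000       # a little over 10.5
```

The audio data alone comes to 10,584,000 bytes per minute; the file adds only a short header on top of that.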
Windows Media Audio (.wma)
Microsoft’s WMA offers more advanced compression than MP3. Like other codecs, WMA is continuously under development with newer additions like WMA Lossless, WMA Pro and WMA Voice. The normal WMA file is widely supported by portable players but the newer versions are not.
Most codecs are licensed and owned by major companies like the Fraunhofer Gesellschaft, Dolby Labs, Sony, Thomson Consumer Electronics, and AT&T. There is considerable argument over which codec (audio file format) is the best. MP3 is the most popular, but codecs like Ogg Vorbis, MP3PRO, AC-3, Windows Media Audio, MPC and RealAudio allegedly sound better. Again, different results are as much subjective as technical.
Stereo, in the simplest of terms, is based on our having two ears, so consequently there is a right channel and a left channel. Audio is not exactly divided in terms of right or left. More so, a stereo “field” is created. For instance, a guitar is heard in both channels, but is dominant in one channel. Meanwhile, the piano is also heard in both channels, but is dominant in the channel opposite the guitar. The drums are “mixed” in the middle, where they are perceived to be heard coming from the center between two speakers, instead of right or left.
Recording engineers play a very important role in mixing and blending instruments and vocals when recording music. Mixing is the art of finding a balance in volume levels of different instruments and recording tracks, stereo field placement, F/X processing (effects), and other processes and components used to record music.
A stereo field encompasses sound heard back and front as well as left and right. For instance, the violins are heard in the back of the mix while the vocals are heard in the front of the mix. This is not to be confused with 5.1 Surround, where speakers are physically situated in a 360 degree circle around the listener. Stereo is the most widely used channel configuration and matches most Audio CDs.
Mono is a recording with only a single channel of information. A 128 Kbps encoded mono file is the same size as a 128 Kbps encoded stereo file of the same length. The reason is that the mono file spends twice as many bits on its single channel as the stereo file spends on each of its two channels. The Kbps measurement is for the whole recording regardless of the number of channels. So, a more accurate measure of the bit rate per channel is Kbps divided by the number of channels.
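That division is simple enough to write down directly. The Python sketch below assumes the bits are shared evenly between the channels, which is a simplification, since an encoder is free to split them unevenly; the function name is illustrative.

```python
def per_channel_kbps(total_kbps, channels):
    """Average bit rate available to each channel, assuming an even split."""
    return total_kbps / channels


mono = per_channel_kbps(128, 1)      # 128.0 Kbps for the single channel
stereo = per_channel_kbps(128, 2)    # 64.0 Kbps for each of two channels
```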
Joint stereo coding takes advantage of the fact that both channels of a stereo channel pair contain largely the same information. These stereophonic irrelevancies and redundancies are exploited to reduce the total bitrate. Joint stereo is used in cases where only low bitrates are available but stereo signals are desired.
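One common way to exploit that redundancy is mid/side coding, in which the encoder stores the average of the two channels and the difference between them instead of left and right. The Python sketch below shows the transform and its inverse on a handful of made-up sample values. It illustrates the principle only and is not a description of how any particular MP3 encoder decides when to use it.

```python
def to_mid_side(left, right):
    """Convert left/right samples to mid (average) and side (half difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Recover the left/right samples from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Two channels that carry nearly the same signal.
left = [0.50, 0.40, -0.30, 0.10]
right = [0.50, 0.38, -0.30, 0.12]

mid, side = to_mid_side(left, right)
left_back, right_back = from_mid_side(mid, side)
```

Because the channels are so alike, the side signal stays close to zero and needs very few bits, leaving more of the bit budget for the mid signal.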
MP3 and Multichannel Surround (5.1 Surround)
A revolutionary change that still has yet to replace stereo is multichannel surround, or simply, 5.1 Surround. Quadraphonic stereo came and went without ever really catching on. The future of surround sound remains to be seen. But having more than one or two speakers in the car is common, although this is not true stereo. The standard 4 speakers in most cars attempt to simulate the stereo field, namely, front and back along with right and left.
A multichannel playback requires the right speaker setup. One pair of speakers functions as the left and right front speakers, and the other pair as the left and right surround speakers. A center channel speaker establishes a well-defined soundstage in front of the listener. A single full-range car speaker can be installed in the center of the dash or a portable powered speaker will work. A subwoofer adds essential low frequencies that give more depth and realism, usually installed in the trunk or rear of the vehicle. Subwoofers can be self-powered or powered by a separate amp.
The receiver must be able to decode surround information. Some models use a digital cable to connect with a surround sound signal processor. Some DVD receivers have the necessary five channels of amplification built in, requiring only an amplifier for the subwoofer to be complete. If the receiver doesn’t have the necessary amplification, separate amplification to power five full-range speakers (center, two fronts, two surrounds), plus the subwoofer, is needed.
Movie theater and live concert sound systems played a significant role in developing a more encompassing, immersive listening experience. Speakers were placed not just in front, but all around. The goal is to feel a part of the experience; to be in the middle of it. Multiple speakers could simulate this, but visually audiences were locked into a two-dimensional view. There has been some experimenting done with screens that surround a viewer.
The human aspects that drive technological change cannot be ignored. We have only two ears, one on each side of the head, and two eyes, in the front. We can hear sound from any direction, and all we have to do is turn our head to increase our visual field.
But beyond the limits of two ears and two eyes is the desire for technology to feed the imagination. It does this by replicating and simulating a live concert through mechanical and electronic means. With eyes closed, the sound of a home or car stereo system is supposed to make one feel as though they are in the middle of a live concert. With movies, the goal is to feel a part of the action.
Surround sound comes as close as anything to creating this all-immersive experience. With 5 channels to play with, musicians and recording engineers can design the listening experience with complete control. Studios are also beginning to reissue older recordings, “remixed” to multichannel formats.
Although a driver is not sitting in an ideal place for full stereo effect, the relatively small area of a car allows for a total sound experience regardless of the physical position of the listener. Balance and fader controls set how much sound is coming from the right, left, front, back, and any combination thereof. But these traditional controls affect only the volume, not what is heard. With surround sound, the drums can come from the left, the guitar from the right, the singer from the front and the bass from the back.
Dolby Digital 5.1 was originally designed for movie soundtracks heard in movie theaters. Dolby Digital 5.1 is a codec, so audio is compressed. Most DVD-Audio discs now contain Dolby Digital 5.1. Many car DVD receivers have Dolby Digital 5.1 decoding or can be connected to outboard 5.1 surround sound processors. DVD-Audio discs with Dolby Digital 5.1 are backwards compatible, meaning they can still be played on standard DVD players but without the Dolby effect.
Like Dolby Digital, DTS provides 5.1 channels of digital audio, but with less compression. And, like Dolby Digital, a receiver must have DTS decoding capability to take full advantage of the sound. More discs are being made available with DTS 5.1.
Dolby Pro-Logic II processes a stereo (2-channel) recording to create multichannel information, giving any two-channel recording (like a regular audio CD) a more spacious and enveloping feel. Receivers with built-in Dolby Pro-Logic II capabilities are available from manufacturers like Blaupunkt, Clarion, JVC, Kenwood, and Rockford Fosgate.
The ability to receive a digital signal on a car receiver holds great potential for the future of car audio, with 5.1 digital broadcasts just around the corner. XM Satellite Radio and Neural Audio have developed SEE (Spatial Environment Engine), introduced in 2004. This digital signal processing technology allows 2-channel broadcasts with special “watermark” information to reproduce a convincing 5.1 channel surround sound experience when used in conjunction with a decoder.
iBiquity Digital, the developer of HD Radio technology, and SRS Labs are collaborating to develop multichannel broadcast technology.
Fraunhofer IIS and Agere Systems developed a multichannel MP3 format that produces 5.1 sound at bitrates comparable to those used today to encode stereo sound in MP3 format. In addition to offering multichannel sound at low bitrates, the MP3 Surround format is also fully backwards compatible with all existing MP3 players.
MP3 Surround technology encodes multichannel sound by transmitting a stereo audio signal that carries a compatible stereo down mix of the multichannel material. The multichannel sound image is created by additional side information that characterizes the spatial distribution and attributes of the sound. Since the channel information is not discrete, the format cannot be compared directly to Dolby Digital multichannel audio.
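The idea of a stereo down mix can be sketched in Python. The gains used below (about 0.707, or minus 3 dB, for the center and surround channels) are a commonly used convention and are an assumption for this example; they are not taken from the MP3 Surround specification, and the side information that lets a decoder rebuild the surround image is not modeled here.

```python
import math

# Assumed mixing gains (about 0.707, i.e. -3 dB); not from the MP3 Surround spec.
CENTER_GAIN = 1 / math.sqrt(2)
SURROUND_GAIN = 1 / math.sqrt(2)


def downmix_to_stereo(front_l, front_r, center, surround_l, surround_r):
    """Fold five full-range channels into a left/right pair, one sample at a time."""
    left = front_l + CENTER_GAIN * center + SURROUND_GAIN * surround_l
    right = front_r + CENTER_GAIN * center + SURROUND_GAIN * surround_r
    return left, right


# One sample per channel: a voice in the center, ambience in the left surround.
stereo_l, stereo_r = downmix_to_stereo(0.0, 0.0, 0.5, 0.2, 0.0)
```

The center channel lands equally in both stereo channels, while the left surround ambience tilts the result toward the left, which is what an ordinary stereo player would reproduce.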
The most obvious use for multichannel, low-bitrate MP3 audio is in network distributed movies. MP3 Surround bit streams are played back as high-quality stereo sound on all current MP3 players. When decoded by MP3 Surround decoders, however, the new format produces full 5.1 channel surround sound. A user can play back either stereo or surround music from the same MP3 Surround files.