Freedom of Implementation
Interestingly enough, the
MP3 specification (ISO/IEC 11172-3) does not specify exactly how the
encoding is to be accomplished. Rather, it outlines techniques and
specifies a level of conformance; in other words, it tells developers that
their resulting MP3 files must meet certain structural criteria.[10]
This is necessary for the same reason that any standard exists: To allow
for the proliferation of MP3 encoders and players by various vendors and
developers. The specification only serves to guarantee a baseline
consensus in the community regarding how certain things will operate. An
encoder developed according to the MP3 specification will be capable of
outputting a "compliant bitstream" that can be played
successfully with any MP3-compliant decoder, just as you can create a JPEG
file in any image editor under any operating system and expect it to
display properly in any JPEG-compliant image viewer on any operating
system.
It's important to maintain the distinction
between the primary developer of the codec itself, the
Fraunhofer Institute, and the body that codified Fraunhofer's
work into the MPEG-1 Layer 3 specification, the
International Organization for Standardization (ISO). Standards are often created
this way: A company produces a technology, other companies apply to become
a part of the standards-creation process, and together they lay down the
laws of implementation so that all vendors can compete around that
technology. Note, however, that MP3's standardization by
ISO does not mean that Fraunhofer (and its partner, Thomson Multimedia)
no longer holds the patent on the technology itself. As you'll see in
Chapter 7, The Not-So-Fine-Print: Legal Bits and Pieces,
Fraunhofer's patent is being aggressively exercised, making it difficult
for small-time developers to affordably implement the ISO standard.
In any case, while the standard specifies
exactly how decoding is to be accomplished, it only provides sample
implementations (one simple and one complex) for encoding. As a
result, there's a certain degree of headroom available for developers to
make up some of the rules as they go along. In general, encoder developers
work toward two goals: speed and quality. While there is some difference
in the quality of audio files output by various encoders (as you'll see in
Chapter 5), there are vast differences in the speed at which encoders
operate. Sometimes, encoding speed comes at the expense of the
quality of the resulting bitstream, though this is not necessarily the
case.
A good example of the kind of freedom left
to developers is the fact that the MP3 standard does not specify exactly
how to treat the upper end of the spectrum, above 16kHz. Since human
auditory perception begins to diminish greatly (with age and exposure to
loud volumes) between 16kHz and 20kHz, some developers have historically
chosen to simply chop off frequencies above 16kHz, which can be beneficial
at low bitrates, since it leaves more bits available for encoding more
audible frequencies. Xing, for example, did this with the first versions
of their very fast codec. Later, they rewrote their codec to handle
frequencies up to 20kHz (probably at the behest of the audiophile MP3
community).
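To make the idea of "chopping off" the top of the spectrum concrete, here is a minimal sketch in Python. It is not taken from any real encoder: the function names are hypothetical, and it assumes the encoder holds its spectrum as a list of evenly spaced frequency lines running from 0Hz up to half the sampling rate (Layer 3 works with 576 such lines per granule).

```python
def lowpass_lines(coeffs, sample_rate_hz, cutoff_hz):
    """Zero every spectral line whose centre frequency lies above cutoff_hz.

    Assumption: coeffs holds evenly spaced frequency lines covering
    0 Hz up to the Nyquist frequency (sample_rate_hz / 2).
    """
    nyquist = sample_rate_hz / 2
    line_width = nyquist / len(coeffs)
    out = []
    for i, c in enumerate(coeffs):
        centre = (i + 0.5) * line_width  # centre frequency of line i
        out.append(c if centre <= cutoff_hz else 0.0)
    return out


def lines_kept(n_lines, sample_rate_hz, cutoff_hz):
    """Count how many of n_lines survive the cutoff (illustrative helper)."""
    kept = lowpass_lines([1.0] * n_lines, sample_rate_hz, cutoff_hz)
    return sum(1 for c in kept if c != 0.0)
```

Under these assumptions, `lines_kept(576, 44100, 16000)` reports that 418 of the 576 lines survive a 16kHz cutoff at a 44.1kHz sampling rate; the bits that would have gone to the remaining lines become available for the more audible part of the spectrum.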
NOTE
If you're curious about the upper and
lower thresholds of your own
hearing, download a sine wave generation program for your platform and
run some tests. If you find a graphical program, you can simply turn the
dial or drag the slider up the frequency spectrum until you can no
longer hear it. If the program works from the command line, you can
either generate sweep frequencies or generate a series of files at
different frequencies at the upper end of the range and play them in
sequence. BeOS and Linux users should check out a utility called sinus,
while users of any platform can generate pure tones through any of the
many simple synthesizer programs available at your favorite software
library. The potential problem with running this kind of test lies in
the fact that your playback hardware may itself not be capable of
reproducing frequencies above, say, 17kHz. A test like this is best
conducted on the highest quality equipment you can find.
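If you can't find a suitable tone generator, a few lines of Python will do the job. The following is a minimal sketch using only the standard library; the function name `write_tone`, the default amplitude, and the file naming are illustrative choices, not part of any existing tool.

```python
import math
import struct
import wave


def write_tone(path, freq_hz, seconds=2.0, sample_rate=44100, amplitude=0.3):
    """Write a mono, 16-bit PCM WAV file containing a pure sine tone.

    Returns the number of samples written. The amplitude is kept well
    below full scale (1.0) to protect ears and speakers.
    """
    n_samples = int(seconds * sample_rate)
    frames = bytearray()
    for n in range(n_samples):
        sample = amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        # Scale to the signed 16-bit range and pack as little-endian.
        frames += struct.pack("<h", int(sample * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)         # mono
        wav.setsampwidth(2)         # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return n_samples
```

Calling `write_tone("tone_16khz.wav", 16000)`, then repeating with 17000, 18000, and so on, produces a series of files you can play in sequence. Keep the frequency below half the sampling rate (22.05kHz at the default setting), or the tone will alias to a lower pitch.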
Other Considerations
In addition to the general principles
outlined in this chapter, the MP3 codec does a lot of additional work
maintaining frequency tables, storing and allocating bits optimally,
handling user options set at encode time, and the like. While we don't
cover everything the encoder is responsible for exhaustively, here are a
few of the more important additional chores the encoder must tackle.
Dipping into the reservoir
Because the bitrate is taken into
consideration at every time frame, there will inevitably be certain frames
of such complexity that they cannot be adequately coded to adhere to the
limitations imposed by the chosen bitrate. In such a case, the MP3 spec
allows for a "reservoir of bytes," which acts as a sort of
overflow buffer when the desired amount of data cannot be stored in the
given timeframe. In actual practice, this reservoir is not a separate
storage space in the file, but rather the "empty space" left
over in frames where the necessary information was encoded into the
available space with room to spare. In other words, the
byte reservoir is a portion of the algorithm designed to rob Peter to pay
Paul.
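The bookkeeping behind the reservoir can be sketched in a few lines of Python. This is a toy model, not the allocation logic of any real encoder: the function name and the byte counts are hypothetical, and the 511-byte cap is an assumed limit on how much unused space can be carried forward.

```python
def allocate_with_reservoir(demands, frame_budget, reservoir_cap=511):
    """Return (granted, shortfalls), each with one entry per frame.

    demands       -- bytes each frame would ideally use (hypothetical figures)
    frame_budget  -- bytes nominally available per frame at the chosen bitrate
    reservoir_cap -- maximum bytes the reservoir may hold (assumed limit)
    """
    reservoir = 0
    granted, shortfalls = [], []
    for need in demands:
        # A frame may use its own budget plus whatever earlier frames left over.
        available = frame_budget + reservoir
        used = min(need, available)
        granted.append(used)
        shortfalls.append(need - used)
        # Unused space carries forward to later frames, up to the cap.
        reservoir = min(available - used, reservoir_cap)
    return granted, shortfalls
```

With a budget of 400 bytes per frame and demands of `[300, 300, 600, 600]`, the two simple frames leave 200 bytes behind, the third frame uses them to get its full 600, and the fourth comes up 200 bytes short. Note that a frame can only borrow from space left by frames that came before it.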
While the CD and DAT audio formats
typically offer 16 bits of resolution, the processing of a very complex
musical passage may result in only four or six bits of resolution being
encoded into the final bitstream,[11]
since there isn't enough storage space allocated to handle the data needs
of each frame. Any shortfall that can't be made up from the reservoir simply results
in an audible degradation of the signal quality. Thus, the byte reservoir
is only a partial solution to the loss of signal quality in complex
passages. The only real solution to quality loss is to encode the signal
at a higher bitrate.
The joint stereo effect
Most people have had an opportunity at some point to listen to a stereo
system with a separate subwoofer attached (in fact, most better-quality
computer speaker systems consist of two or four satellite speakers and a
separate subwoofer). And as you may have noticed, the placement of
satellite speakers is critical to high-quality audio reproduction, whereas
the placement of the subwoofer is almost entirely irrelevant; people stuff
subwoofers under desks or behind couches, or integrate them with other
pieces of living room furniture. This is possible
without affecting sound quality because the human ear is largely
insensitive to the location of the source of sounds at the very low and
very high ends of the frequency spectrum.
The MP3 spec optionally exploits this
aspect of human psychoacoustics as well. A file being encoded in stereo is
by definition twice as large as a monophonic file. However, this file size
doubling effect can be somewhat mitigated by combining high frequencies
across the left and right tracks into a single track. This is done during
the encoding phase by selecting the "joint stereo" option in the
encoder's preferences, or by passing an appropriate command-line option to
the encoder (there are actually several subtle differences between the
various joint stereo encoding "modes"; more on that in Chapter
5). Since you might not be able to tell which speaker very high-frequency signals
are emanating from anyway, there may be no point in storing that data
twice.
NOTE
Some hard-core audiophiles claim
that bass sounds are not entirely nondirectional, only less directional
than mid- and high-frequency sounds. Listeners with ears trained this
well are probably not much interested in MP3 to begin with, but those
listeners might be able to tell the difference in
high-frequency spatialization when comparing MP3 to unencoded audio.
When the
joint stereo option is enabled, a certain amount of "steering
information" is added to the file so that these sounds can be placed
spatially with some approximation of accuracy during
playback. This becomes especially important at the upper edge of the bass
spectrum, where the ear becomes more sensitive to the spatial location of
bass signals. Joint stereo (in "Intensity" mode) really is a
low-fi solution best reserved for situations where you need to keep file
size at an absolute minimum.
WARNING
The joint stereo option can in some
instances introduce audible compression artifacts which can't be
removed by increasing the bitrate. The only way to find out whether this
is a problem for you is to experiment. If you don't like the results,
re-encode without joint stereo enabled. Remember: Your ears don't lie.
Side Information
If joint stereo is used in
M/S (middle/side) mode, the left and right channels aren't encoded
separately. Instead, a "middle" channel is encoded as the sum
of the left and right channels, while a "side"
channel is stored as the difference between the left and the
right. During the decoding process,
side information is read back out of the frame and applied to the
bitstream so that the original signal can be reconstructed as accurately
as possible. The side information is essentially a set of instructions
on how the whole puzzle should be re-assembled on the other end.
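The sum-and-difference idea is easy to see in code. The following Python sketch is a simplification: the function names are hypothetical, and it halves the sum and difference for readability, whereas a real Layer 3 codec applies its own scaling and then quantizes both channels, so the round trip is not lossless in practice.

```python
def ms_encode(left, right):
    """Convert left/right sample lists to middle/side lists.

    Simplifying assumption: plain halving of the sum and difference.
    """
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Rebuild left/right from middle/side: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

When the two channels are nearly identical, as they are in much recorded music, the side channel is close to silence and takes very few bits to encode, which is where the savings come from.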