The Anatomy of an MP3 File
Aside from being familiar with the basic options available to the MP3
encoder, the typical user doesn't need to know how MP3 files are structured
internally any more than she needs to know how JPEG images or Word documents
are structured behind the scenes. For the morbidly curious, however, here's
an x-ray view of the MP3 file format.
Inside the Header Frame
As mentioned earlier, MP3 files are segmented into zillions of frames, each
containing a fraction of a second's worth of audio data, ready to be
reconstructed by the decoder. Inserted at the beginning of every
data frame is a "header frame," which stores 32 bits of meta-data
related to the coming data frame (Figure 2-4). As illustrated in Figure 2-5,[13]
the MP3 header begins with a "sync" block, consisting of 11 bits. The sync block allows players to
search for and "lock onto" the first available occurrence of a
valid frame, which is useful in MP3 broadcasting, for moving around quickly
from one part of a track to another, and for skipping ID3 or other data that
may be living at the start of the file. However, note that it's not enough
for a player to simply find the sync block in any binary file and assume
that it's a valid MP3 file, since the same pattern of 11 bits could
theoretically be found in any random binary file. Thus, it's also necessary
for the decoder to check for the validity of other header data as well, or
for multiple valid frames in a row. Table 2-1 lists all 32 bits of header
data, which are spread over 13 header positions.
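As a rough illustration of the "find the sync block, then double-check the rest" idea, here is a minimal Python sketch. The function names are made up for this example, and the specific "reserved" values it rejects are assumptions taken from commonly published MPEG audio lookup tables rather than from this chapter.

```python
def looks_like_header(b: bytes) -> bool:
    """Return True if 4 bytes pass basic sanity checks for an MPEG audio header."""
    if len(b) < 4:
        return False
    # Position A: all 11 sync bits must be set (0xFF plus the top 3 bits of byte 2).
    if b[0] != 0xFF or (b[1] & 0xE0) != 0xE0:
        return False
    version = (b[1] >> 3) & 0x03            # position B
    layer = (b[1] >> 1) & 0x03              # position C
    bitrate_index = (b[2] >> 4) & 0x0F      # position E
    sample_rate_index = (b[2] >> 2) & 0x03  # position F
    # Assumed reserved/invalid values: version 01, layer 00,
    # bitrate index 1111, sampling rate index 11.
    return (version != 0b01 and layer != 0b00
            and bitrate_index != 0b1111 and sample_rate_index != 0b11)


def find_sync(data: bytes, start: int = 0) -> int:
    """Return the offset of the first plausible header at or after start, or -1."""
    for i in range(start, len(data) - 3):
        if looks_like_header(data[i:i + 4]):
            return i
    return -1
```

A real player would probably go one step further and confirm that another plausible header sits exactly one frame length later before trusting the match.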
Locking onto the Data Stream
One of the original design goals of MP3 was that it would be suitable for
broadcasting. As a result, it becomes important that MP3 receivers be able
to lock onto the signal at any point in the stream. This is one of the big
reasons why a header frame is placed prior to each data frame, so that a
receiver tuning in at any point in the broadcast can search for sync data
and start playing almost immediately. Interestingly, this fact
theoretically makes it possible to cut MPEG files into smaller pieces and
play the pieces individually. Unfortunately, this doesn't work reliably
with Layer III files (MP3), because frames often depend on
data contained in other frames (see "Dipping into the reservoir,"
earlier). Thus, you can't just open any old MP3 file in your favorite
audio editor for editing or tweaking.
Figure 2-4: Data describing the structural factors of that frame; this data is called the frame's "header"
Figure 2-5: The MP3 frame header represented visually
Table 2-1: The Thirteen Header Fields' Characteristics
| Position | Purpose | Length (in Bits) |
| A | Frame sync | 11 |
| B | MPEG audio version (MPEG-1, 2, etc.) | 2 |
| C | MPEG layer (Layer I, II, III, etc.) | 2 |
| D | Protection (if on, then checksum follows header) | 1 |
| E | Bitrate index (lookup table used to specify bitrate for this MPEG version and layer) | 4 |
| F | Sampling rate frequency (44.1kHz, etc., determined by lookup table) | 2 |
| G | Padding bit (on or off, compensates for unfilled frames) | 1 |
| H | Private bit (on or off, allows for application-specific triggers) | 1 |
| I | Channel mode (stereo, joint stereo, dual channel, single channel) | 2 |
| J | Mode extension (used only with joint stereo, to conjoin channel data) | 2 |
| K | Copyright (on or off) | 1 |
| L | Original (off if copy of original, on if original) | 1 |
| M | Emphasis (respects emphasis bit in the original recording; now largely obsolete) | 2 |
| 32 total header bits |
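To make the layout in Table 2-1 concrete, the following Python sketch slices a 4-byte header into its 13 fields using nothing but the bit lengths from the table. The field names and the `unpack_header` function are invented for this example, and the sample header bytes are simply a plausible value chosen for illustration.

```python
# (name, length in bits) for positions A through M, in the order of Table 2-1.
HEADER_FIELDS = [
    ("sync", 11), ("version", 2), ("layer", 2), ("protection", 1),
    ("bitrate_index", 4), ("sample_rate_index", 2), ("padding", 1),
    ("private", 1), ("channel_mode", 2), ("mode_extension", 2),
    ("copyright", 1), ("original", 1), ("emphasis", 2),
]


def unpack_header(raw: bytes) -> dict:
    """Split a 32-bit frame header into a dict of raw field values."""
    if len(raw) != 4:
        raise ValueError("an MPEG audio header is exactly 4 bytes")
    word = int.from_bytes(raw, "big")
    fields = {}
    remaining = 32
    for name, width in HEADER_FIELDS:
        remaining -= width  # bits still to the right of this field
        fields[name] = (word >> remaining) & ((1 << width) - 1)
    return fields


if __name__ == "__main__":
    # Hypothetical sample header, used only to exercise the function.
    for name, value in unpack_header(bytes([0xFF, 0xFB, 0x90, 0x64])).items():
        print(f"{name:18} {value}")
```

The values that come out are raw indexes, not kilobits or hertz; turning them into real numbers requires the lookup tables mentioned in the note at the end of this section.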
Following the sync block comes the version ID (two bits in the layout shown in Table 2-1),
which specifies whether the frame has been encoded in MPEG-1
or MPEG-2. Two layer bits follow, determining whether the frame is
Layer I, II, III, or not defined. If the
protection bit is not set (that is, its value is 0, which signals that protection is in use),
a 16-bit checksum will be inserted prior to
the beginning of the audio data.
The bitrate field, naturally, specifies the bitrate of the current frame
(e.g., 128 kbps), which is followed by a
specifier for the audio frequency (from 16,000Hz to 48,000Hz,
depending on whether MPEG-1 or MPEG-2 is currently in use). The
padding bit is used to make sure that each frame satisfies the bitrate
requirements exactly. For example, a 128 kbps Layer II bitstream at
44.1kHz may end up with some frames of 417 bytes and some of 418. The
418-byte frames will have the padding bit set to "on" (1), marking the extra byte that
compensates for the discrepancy.
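The 417-versus-418 arithmetic can be checked with a couple of lines of Python. This sketch assumes the commonly cited frame-length formula for MPEG-1 Layer II and Layer III, 144 × bitrate ÷ sampling rate plus one byte when the padding bit is on; the formula and the function name are not taken from this chapter.

```python
def frame_length_bytes(bitrate_bps: int, sample_rate_hz: int, padding: int) -> int:
    """Frame length in bytes, assuming the MPEG-1 Layer II/III formula."""
    # 144 * bitrate / sample rate, rounded down, plus the padding byte if present.
    return 144 * bitrate_bps // sample_rate_hz + padding


if __name__ == "__main__":
    # 128 kbps at 44.1kHz works out to about 417.96 bytes per frame on average,
    # so frames come in two sizes depending on the padding bit.
    print(frame_length_bytes(128_000, 44_100, padding=0))  # 417
    print(frame_length_bytes(128_000, 44_100, padding=1))  # 418
```

Because the exact figure is not a whole number of bytes, the encoder has to mix the two sizes to keep the stream at the requested bitrate over time.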
The mode field refers to the stereo/mono status of the frame, and allows
for the setting of stereo, joint stereo, dual channel, and mono encoding options. If
joint stereo effects have been enabled, the mode extension field tells
the decoder exactly how to handle it, i.e., whether high frequencies
have been combined across channels.
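Both of these fields live in the fourth byte of the header, so reading them is a matter of two shifts. The sketch below assumes the usual value order for the channel mode (0 through 3 in the order the modes are listed in Table 2-1); the function name is hypothetical, and the mode extension is returned as a raw number rather than interpreted, since its meaning varies by layer.

```python
# Assumed order of the four channel mode values (00, 01, 10, 11).
CHANNEL_MODES = ("stereo", "joint stereo", "dual channel", "single channel")


def describe_mode(last_byte: int):
    """Return (mode name, mode extension or None) from the header's fourth byte."""
    channel_mode = (last_byte >> 6) & 0x03    # position I, top two bits
    mode_extension = (last_byte >> 4) & 0x03  # position J, next two bits
    name = CHANNEL_MODES[channel_mode]
    # The mode extension only carries meaning for joint stereo frames.
    if name != "joint stereo":
        mode_extension = None
    return name, mode_extension
```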
The copyright bit does not hold copyright information per se (obviously,
since it's only one bit long), but rather mimics a similar copyright
bit used on CDs and DATs. If this bit is set, it's officially illegal
to copy the track (some ripping programs will report this information
back to you if the copyright bit is found to be set). If the data is
found on its original media, the original (or "home") bit will be set. The
"private" bit can be used by specific applications to trigger
custom events.
The emphasis field is used as a flag, in case a corresponding
emphasis bit was set in the original recording. The emphasis bit is
rarely used anymore, though some recordings do still use it.
Finally, the decoder moves on through the checksum (if it exists) and on to the actual
audio data frame, and the process begins all over again, with
thousands of frames per audio file.
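The header-to-header march described above can be sketched as a small Python generator that hops from one frame header to the next. To stay short, it handles only MPEG-1 Layer III frames and assumes the data begins exactly at a frame boundary; the lookup tables, the frame-length formula, and the function name are assumptions drawn from commonly published references, not from this chapter.

```python
# Assumed lookup tables for MPEG-1 Layer III only (kbps and Hz).
# None marks index values this sketch does not handle.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def iter_frame_offsets(data: bytes):
    """Yield the byte offset of each consecutive MPEG-1 Layer III frame."""
    pos = 0
    while pos + 4 <= len(data):
        b1, b2 = data[pos + 1], data[pos + 2]
        # Stop unless we see sync bits, MPEG-1 (11), and Layer III (01).
        if data[pos] != 0xFF or (b1 & 0xFE) != 0xFA:
            break
        bitrate = BITRATES_KBPS[b2 >> 4]
        sample_rate = SAMPLE_RATES_HZ[(b2 >> 2) & 0x03]
        if bitrate is None or sample_rate is None:
            break
        padding = (b2 >> 1) & 0x01
        yield pos
        # Skip the whole frame (header, optional checksum, and audio data).
        pos += 144 * bitrate * 1000 // sample_rate + padding
```

Counting the offsets this yields for a typical track is a quick way to see the "thousands of frames" for yourself.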
NOTE
For more details on the structure of MP3 header frames, including the actual lookup
tables necessary to derive certain details from the bit settings
previously listed, see the Programmer's Corner section at www.mp3-tech.org/.
If you want to go straight to the horse's mouth, start at www.iso.ch.