The Anatomy of an MP3 File
Aside from being familiar with the basic options available to the MP3 encoder, the typical user doesn't need to know how MP3 files are structured internally any more than she needs to know how JPEG images or Word documents are structured behind the scenes. For the morbidly curious, however, here's an x-ray view of the MP3 file format.
Inside the Header Frame
As mentioned earlier, MP3 files are segmented into zillions of frames, each containing a fraction of a second's worth of audio data, ready to be reconstructed by the decoder. Inserted at the beginning of every data frame is a "header frame," which stores 32 bits of meta-data related to the coming data frame (Figure 2-4). As illustrated in Figure 2-5, the MP3 header begins with a "sync" block, consisting of 11 bits. The sync block allows players to search for and "lock onto" the first available occurrence of a valid frame, which is useful in MP3 broadcasting, for moving around quickly from one part of a track to another, and for skipping ID3 or other data that may be living at the start of the file. However, note that it's not enough for a player to simply find the sync block in any binary file and assume that it's a valid MP3 file, since the same pattern of 11 bits could theoretically be found in any random binary file. Thus, the decoder must also check the validity of the other header data, or look for multiple valid frames in a row. Table 2-1 lists the total 32 bits of header data that are spread over 13 header positions.
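The search-and-validate idea can be sketched in a few lines of Python. This is only an illustration, not how any particular player does it: the function names are hypothetical, and the checks assume the 11-bit sync layout described here, in which certain values of the version, layer, bitrate, and sampling rate fields are reserved or invalid.

```python
def looks_like_header(b: bytes) -> bool:
    """Return True if 4 bytes pass basic sanity checks for a frame header."""
    if len(b) < 4:
        return False
    # Sync: all 8 bits of byte 0 plus the top 3 bits of byte 1 must be 1.
    if b[0] != 0xFF or (b[1] & 0xE0) != 0xE0:
        return False
    version = (b[1] >> 3) & 0x03            # 0b01 is reserved
    layer = (b[1] >> 1) & 0x03              # 0b00 is reserved
    bitrate_index = (b[2] >> 4) & 0x0F      # 0b1111 is invalid
    sample_rate_index = (b[2] >> 2) & 0x03  # 0b11 is reserved
    return (version != 0b01 and layer != 0b00
            and bitrate_index != 0b1111 and sample_rate_index != 0b11)


def find_sync_candidates(data: bytes):
    """Yield every offset whose next 4 bytes pass looks_like_header."""
    for offset in range(len(data) - 3):
        if looks_like_header(data[offset:offset + 4]):
            yield offset


# Hypothetical input: 8 bytes of leading junk, then a plausible header.
sample = b"ID3\x00junk" + bytes([0xFF, 0xFB, 0x90, 0x00]) + b"\x00" * 10
candidates = list(find_sync_candidates(sample))
```

A candidate offset is still only a candidate. A more careful decoder would compute the frame length from the header and confirm that another plausible header begins exactly where the next frame should start.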
Locking onto the Data Stream
One of the original design goals of MP3 was that it would be suitable for broadcasting. As a result, it is important that MP3 receivers be able to lock onto the signal at any point in the stream. This is one of the big reasons why a header frame is placed prior to each data frame, so that a receiver tuning in at any point in the broadcast can search for sync data and start playing almost immediately. Interestingly, this theoretically makes it possible to cut MPEG files into smaller pieces and play the pieces individually. Unfortunately, this is not possible with Layer III files (MP3), because frames often depend on data contained in other frames (see "Dipping into the reservoir," earlier). Thus, you can't just open any old MP3 file in your favorite audio editor for editing or tweaking.
Figure 2-4: Data describing the structural factors of that frame; this data is called the frame's "header"
Figure 2-5: The MP3 frame header represented visually
Table 2-1: The Thirteen Header Fields' Characteristics
|Purpose|Length (in Bits)|
|Frame sync (lets the decoder find the start of a frame)|11|
|MPEG audio version (MPEG-1, 2, etc.)|2|
|MPEG layer (Layer I, II, III, etc.)|2|
|Protection (if protection is on, signaled by a 0 bit, a 16-bit checksum follows the header)|1|
|Bitrate index (lookup table used to specify bitrate for this MPEG version and layer)|4|
|Sampling rate frequency (44.1kHz, etc., determined by lookup table)|2|
|Padding bit (on or off, compensates for unfilled frames)|1|
|Private bit (on or off, allows for application-specific triggers)|1|
|Channel mode (stereo, joint stereo, dual channel, single channel)|2|
|Mode extension (used only with joint stereo, to conjoin channel data)|2|
|Copyright (on or off)|1|
|Original (off if copy of original, on if original)|1|
|Emphasis (respects emphasis bit in the original recording; now largely obsolete)|2|
|Total header bits|32|
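To make the layout concrete, here is a minimal Python sketch that unpacks a 4-byte header into its thirteen positions. The field names and the `unpack_header` function are hypothetical, and the widths (11, 2, 2, 1, 4, 2, 1, 1, 2, 2, 1, 1, 2) assume the 11-bit sync layout described in this section.

```python
# (name, width in bits), listed from the most significant bit downward.
# Names are illustrative; widths assume an 11-bit sync block.
HEADER_FIELDS = [
    ("sync", 11), ("version", 2), ("layer", 2), ("protection", 1),
    ("bitrate_index", 4), ("sample_rate_index", 2), ("padding", 1),
    ("private", 1), ("channel_mode", 2), ("mode_extension", 2),
    ("copyright", 1), ("original", 1), ("emphasis", 2),
]


def unpack_header(header: bytes) -> dict:
    """Split a 4-byte frame header into its raw field values."""
    if len(header) != 4:
        raise ValueError("a frame header is exactly 4 bytes long")
    word = int.from_bytes(header, "big")
    remaining = 32
    fields = {}
    for name, width in HEADER_FIELDS:
        remaining -= width
        # Shift the field down to bit 0, then mask off everything above it.
        fields[name] = (word >> remaining) & ((1 << width) - 1)
    return fields


# Hypothetical header bytes, chosen only for illustration.
fields = unpack_header(bytes([0xFF, 0xFB, 0x90, 0x64]))
```

The values returned are raw indices, not finished answers. Turning a bitrate index or sampling rate index into kbps or Hz still requires the lookup tables for the MPEG version and layer in use.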
Following the sync block comes the version ID field (two bits when the sync block is 11 bits long, as here), which specifies whether the frame has been encoded in MPEG-1 or MPEG-2. Two layer bits follow, determining whether the frame is Layer I, II, III, or not defined. If the protection bit is not set (that is, it is 0), a 16-bit checksum will be inserted prior to the beginning of the audio data. The bitrate field, naturally, specifies the bitrate of the current frame (e.g., 128 kbps), which is followed by a specifier for the audio frequency (from 16,000Hz to 48,000Hz, depending on whether MPEG-1 or MPEG-2 is currently in use). The padding bit is used to make sure that each frame satisfies the bitrate requirements exactly. For example, a 128 kbps Layer II bitstream at 44.1kHz may end up with some frames of 417 bytes and some of 418. The 418-byte frames will have the padding bit set to "on" (1), marking the extra byte added to make up the discrepancy. The mode field refers to the stereo/mono status of the frame, and allows for the setting of stereo, joint stereo, dual channel, and mono encoding options. If joint stereo effects have been enabled, the mode extension field tells the decoder exactly how to handle it, i.e., whether high frequencies have been combined across channels. The copyright bit does not hold copyright information per se (obviously, since it's only one bit long), but rather mimics a similar copyright bit used on CDs and DATs. If this bit is set, it's officially illegal to copy the track (some ripping programs will report this information back to you if the copyright bit is found to be set). If the data is found on its original media, the original (or "home") bit will be set. The "private" bit can be used by specific applications to trigger custom events. The emphasis field is used as a flag, in case a corresponding emphasis bit was set in the original recording. The emphasis bit is rarely used anymore, though some recordings do still use it. Finally, the decoder moves on through the checksum (if it exists) and on to the actual audio data frame, and the process begins all over again, with thousands of frames per audio file.
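The padding arithmetic can be checked with a short Python sketch. It assumes the usual frame-length rule for MPEG-1 Layer II and Layer III, where a frame carries 1,152 samples, so the length in bytes is 144 × bitrate ÷ sampling rate, rounded down, plus one byte when the padding bit is set. Other version and layer combinations use different constants. The function names are hypothetical, and the two lookup tables cover MPEG-1 Layer III only.

```python
from fractions import Fraction

# Assumed lookup tables for MPEG-1 Layer III only.
# Bitrate index 0 means "free format"; index 15 is invalid.
MPEG1_LAYER3_BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96,
                              112, 128, 160, 192, 224, 256, 320]
MPEG1_SAMPLE_RATES_HZ = [44100, 48000, 32000]  # index 3 is reserved


def frame_length_bytes(bitrate_bps: int, sample_rate_hz: int, padding: int) -> int:
    """Frame length for MPEG-1 Layer II/III: 144 * bitrate / rate, plus padding."""
    return (144 * bitrate_bps) // sample_rate_hz + padding


def padded_frame_fraction(bitrate_bps: int, sample_rate_hz: int) -> Fraction:
    """Share of frames that need the extra byte to hit the bitrate exactly."""
    return Fraction((144 * bitrate_bps) % sample_rate_hz, sample_rate_hz)


bitrate = MPEG1_LAYER3_BITRATES_KBPS[9] * 1000   # index 9 -> 128 kbps
rate = MPEG1_SAMPLE_RATES_HZ[0]                  # index 0 -> 44,100 Hz

unpadded = frame_length_bytes(bitrate, rate, padding=0)   # 417
padded = frame_length_bytes(bitrate, rate, padding=1)     # 418
share = padded_frame_fraction(bitrate, rate)              # Fraction(47, 49)
```

At 128 kbps and 44.1kHz the exact frame length works out to about 417.96 bytes, so under these assumptions roughly 47 of every 49 frames carry the extra byte. At 48kHz the division is exact (384 bytes) and no padding is needed.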
For more details on the structure of MP3 header frames, including the actual lookup tables necessary to derive certain details from the bit settings previously listed, see the Programmer's Corner section at www.mp3-tech.org/. If you want to go straight to the horse's mouth, start at www.iso.ch.