Notes on "Lossiness"
Compression formats, whether they operate on audio, video, images, or
arbitrary collections of files, are either lossless or lossy. The distinction
is simple: data restored from a lossless format is identical to the original
after decompression, while data restored from a lossy format is not. A good
example of a lossless compression format is the ubiquitous .zip archiving
scheme. When you unpack a zip archive containing a backup of your system from
last month, losing even a single byte is unacceptable.
However, some types of data can withstand having information thrown away, on
the grounds that either you'll never notice what's missing, or you're willing
to make a compromise: smaller files in exchange for missing but unimportant
data.
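The lossless guarantee described above can be checked directly. The following
is a minimal sketch using Python's standard zlib module, which implements the
DEFLATE method commonly used inside zip archives. The sample data is made up
for illustration.

```python
import zlib

# Hypothetical sample data: highly repetitive, so it compresses well.
original = b"backup data " * 1000

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# The compressed form is smaller than the input...
print(len(original), len(compressed))

# ...yet every byte survives the round trip, which is what "lossless" means.
assert restored == original
```

A lossy format, by contrast, would fail the final assertion by design.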
A good example of a lossy compression format is JPEG, which banks on the fact
that image files often store more information than necessary to display an
image of acceptable quality. By throwing away some of the information, and by
encoding the remaining redundant information more compactly, excellent
compression ratios can be achieved for images that don't need to be
reproduced at full fidelity.
While the JPEG analogy doesn't depict the MP3 compression process accurately,
it does illustrate the concept of lossiness,
and it's important to understand that all MP3 files, no matter how
well-encoded, have discarded some of the information that was stored in the
original, uncompressed signal.
Many compression formats work by scanning for redundant data and reducing it
to a mathematical description which can be "unpacked" later on. Think for a
moment of a photograph depicting a clear blue sky, and below it a beach. If
you were to scan and store this image on your hard drive, you could end up
storing hundreds of thousands of pixels of perfect blue, all identical to one
another, and therefore redundant. The secret of an image compression method
like GIF is that this redundant information is reduced to a single
description. Rather than store all the pixels individually, they may be
represented as the mathematical equivalent of "repeat blue pixel 273,000
times." (GIF's actual LZW algorithm is more elaborate, but the principle is
the same.) When the part of the image depicting the sand is encountered, the
sand is analyzed for redundancy and similar reductions can be achieved. This
is why simple images can be stored as small files, while complex images don't
compress as well: they contain less redundancy. JPEG compression, on the
other hand, works in accord with user-defined "tolerance thresholds."
Determining how similar two adjacent pixels (or, more accurately,
frequencies) have to be before they're considered redundant with one another
is the key to determining the degree of lossiness. If JPEG compression is set
high, light blue and medium blue pixels may be treated as being redundant
with one another. If JPEG compression is set low, the codec will be more
fussy about determining which pixels are redundant. The end result will be a
clearer picture and a larger image file.
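The "repeat blue pixel" idea and the tolerance threshold can both be
illustrated with a toy run-length encoder. This is only a sketch of the
concept, not how GIF or JPEG actually work (GIF uses LZW, and JPEG quantizes
frequency coefficients rather than comparing pixels). The function names, the
tolerance parameter, and the sample pixel values are all made up for
illustration.

```python
def rle_encode(pixels, tolerance=0):
    """Run-length encode a sequence of integer pixel values.

    With tolerance 0, only identical neighbours are merged (lossless).
    With a larger tolerance, any value within `tolerance` of the first
    value in the current run is treated as redundant and merged (lossy).
    """
    runs = []  # each entry is [value, count]
    for p in pixels:
        if runs and abs(p - runs[-1][0]) <= tolerance:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, count) pairs back into a flat list of pixels."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


# Hypothetical scanline: "sky" values near 200, "sand" values near 90.
row = [200] * 5 + [201, 203, 200] + [90, 92, 90, 91]

exact = rle_encode(row)               # fussy: only identical pixels merge
loose = rle_encode(row, tolerance=4)  # tolerant: near-identical pixels merge

assert rle_decode(exact) == row       # lossless round trip
assert rle_decode(loose) != row       # detail has been thrown away
assert len(loose) < len(exact)        # but fewer runs need to be stored
```

Raising the tolerance shrinks the encoded data at the cost of fidelity, which
is the same trade-off the JPEG quality setting exposes.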