Yahoo Groups archive

Digital BW, The Print

Re: File compression - was [Digital BW] Computing power

2004-12-06 by Anthony G. Atkielski

ccolbertbw writes:

> It is not just an issue of speed. There is an information theory
> theorem upon which the Lempel-Ziv compression algorithm is based. It
> says that for arbitrary data, there is an optimal compression. Once
> files get pretty big - and just about anything that we would care to
> compress counts as pretty big - on average you can't do better than
> Lempel-Ziv coding. That's what is used in a standard zip and unix
> compress files.

Those coding methods are hardly optimal.  Optimal compression requires a
great deal of analysis of the data, usually more than the compression
achieved would be worth.  There's a tremendous amount of redundancy in
most image files, but removing it efficiently without losing anything
requires considerable time, and it's enormously dependent on the exact
image content.  Generalized compression schemes compromise to achieve
reasonable compression in reasonable time.
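
As a rough illustration of that compromise, here is a minimal Python
sketch that compresses the same data at several zlib effort levels and
reports size and time.  The synthetic gradient "image" and the helper
names (synthetic_image, measure) are assumptions made up for this
example, and zlib simply stands in for a generalized Lempel-Ziv style
coder; real photographs will behave differently.

```python
import time
import zlib


def synthetic_image(width=256, height=256):
    """Raw 8-bit grayscale bytes for a smooth diagonal gradient (made-up test data)."""
    return bytes(((x + y) // 2) % 256 for y in range(height) for x in range(width))


def measure(data, level):
    """Compress at the given zlib level; return (compressed size, seconds, round-trip ok)."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(packed), elapsed, zlib.decompress(packed) == data


if __name__ == "__main__":
    data = synthetic_image()
    # Level 1 favors speed, level 9 spends more time searching for matches.
    for level in (1, 6, 9):
        size, seconds, ok = measure(data, level)
        print(f"level {level}: {len(data)} -> {size} bytes "
              f"in {seconds:.4f} s, lossless={ok}")
```

Higher levels spend more time looking for repeated sequences, which is
the time-versus-compression trade-off in miniature.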

> The best you can do lossless is usually about 2-fold compression.

There is no general way to say with certainty how much lossless
compression is possible; it depends on the data.  Most generalized
lossless compression algorithms today reduce file size by no more than
about 50%.  Future algorithms may do better, but we are unlikely ever to
see 90% compression, simply because most images may not truly be 90%
redundant.
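
To make the dependence on redundancy concrete, here is a small Python
sketch comparing two extremes: a completely flat block of bytes and
pseudo-random noise.  The function names (byte_entropy, savings) are
invented for this example, and the entropy figure is only a zeroth-order
estimate that ignores ordering and context, so it is a rough indicator
rather than a true limit on what a compressor can achieve.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data):
    """Zeroth-order Shannon entropy in bits per byte (ignores ordering and context)."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def savings(data):
    """Fraction of the original size removed by zlib at its highest level."""
    return 1.0 - len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    n = 65536
    rng = random.Random(0)  # fixed seed so the run is repeatable
    samples = {
        "flat": bytes(n),  # every byte is zero: maximally redundant
        "noise": bytes(rng.getrandbits(8) for _ in range(n)),  # nothing to remove
    }
    for name, data in samples.items():
        print(f"{name}: {byte_entropy(data):.3f} bits/byte, "
              f"zlib saves {savings(data):.1%}")
```

Real images fall somewhere between these two cases, which is why no
single percentage applies to all of them.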

> That is why we accept lossy coding for images.

One reason we accept lossy compression is that images are very
redundant.  We don't have an efficient way of removing all this
redundancy losslessly, but we can remove it in lossy compression and
depend upon the human brain to fill in the blanks upon decompression,
with fairly good results.
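
A crude way to see the effect is to throw away the low-order bits of
each pixel before running a lossless coder.  The Python sketch below
does that on a made-up noisy gradient; it is not how JPEG or any real
image codec works, and the names (noisy_gradient, quantize) and the
choice of keeping 4 bits are assumptions for illustration only.

```python
import random
import zlib


def noisy_gradient(width=256, height=256, seed=0):
    """8-bit grayscale gradient with small random noise (made-up test data)."""
    rng = random.Random(seed)
    pixels = bytearray()
    for y in range(height):
        for x in range(width):
            value = (x + y) // 2 + rng.randint(-3, 3)
            pixels.append(min(255, max(0, value)))
    return bytes(pixels)


def quantize(data, keep_bits=4):
    """Lossy step: zero the low bits of every pixel, keeping keep_bits of precision."""
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    return bytes(p & mask for p in data)


if __name__ == "__main__":
    original = noisy_gradient()
    coarse = quantize(original)
    lossless_size = len(zlib.compress(original, 9))
    lossy_size = len(zlib.compress(coarse, 9))
    worst_error = max(a - b for a, b in zip(original, coarse))
    print(f"lossless: {len(original)} -> {lossless_size} bytes")
    print(f"quantized then compressed: {lossy_size} bytes, "
          f"worst per-pixel error {worst_error}")
```

The discarded bits cannot be recovered on decompression; the bet is
that the viewer will not miss them.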

In any case, I think the future is not in more efficient compression,
but instead in greater bandwidth and storage capacity.
