On Tue, 11 Jun 2019 08:11:11 -0700 (PDT),
[email protected] wrote:
I am trying to understand how LZ77 algorithm work. From what I read from va= >rious sources, I have come to following conclusion:
According to LZ77 compression algorithm, if I encode jump and length using = >4 bits, and character as 8 bits, I will use 16bits for each token. If the t= >ext I am compressing doesn't have any repetition, I will actually double th= >e size of my input.
I was wondering if I had arrived to correct conclusion, because it doesn't = >sound right.
It does sound odd at first, but that's more or less right. It's a
fundamental fact of information theory and applies to all (lossless) compression methods. Basically, over the set of all possible messages
of length N, the average length of the corresponding compressed
messages is also N.
So yes, every method will have certain inputs that produce larger
outputs. The trick to practical compression is to find algorithms that
work well with patterns that you find in certain useful inputs, which
AIUI is how LZ was designed. Then you check as you go, and if you have
a chunk that is anti-compressible, you just store it without
compression.
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)