From Harry Potter@21:1/5 to All on Mon May 25 16:45:27 2020
Hi! I just looked at the Wikipedia article "Prediction by partial matching" and like what I see. I believe I get the gist of it: encode each character according to how likely it is to occur after the previous character. Now, I have an idea to shorten the code:
1. Scan the whole input and count, for each character, how many times each other character occurs right after it.
2. Scan the counts and write, for each preceding character, the 16 most frequent following characters. The 16 is arbitrary; it could be any useful number.
3. Scan the input again and, for each character stored as likely to occur, write 1 followed by the shortened count. Otherwise, write 0 followed by a fraction: the character's position, skipping the stored values, over the total number of non-stored values. I have a way to shorten these values in bit streams.
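
A minimal sketch of the three steps above in Python, under assumptions the post does not pin down: the first character is predicted from a context of 0, the "shortened count" is read as the character's index in the stored list, and the output is a list of (flag, value, range) tokens rather than a packed bit stream. The names build_table, encode and decode are made up for illustration, and decode is included only to show that the tokens are reversible.

```python
from collections import Counter, defaultdict

TOP_N = 16  # the post says 16 could be any useful number


def build_table(data: bytes, top_n: int = TOP_N) -> dict:
    """Steps 1 and 2: count followers per preceding byte, keep the top_n."""
    counts = defaultdict(Counter)
    prev = 0  # assumption: the first byte is predicted from context 0
    for byte in data:
        counts[prev][byte] += 1
        prev = byte
    # Sort by descending count, then by byte value so ties are deterministic.
    return {
        ctx: [b for b, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        for ctx, c in counts.items()
    }


def encode(data: bytes, table: dict) -> list:
    """Step 3: emit (flag, value, range) tokens instead of packed bits."""
    tokens = []
    prev = 0
    for byte in data:
        stored = table.get(prev, [])
        if byte in stored:
            # Likely character: flag 1, then its index in the stored list.
            tokens.append((1, stored.index(byte), len(stored)))
        else:
            # Unlikely character: flag 0, then its rank among the byte
            # values that are NOT in the stored list, out of their total.
            rank = byte - sum(1 for s in stored if s < byte)
            tokens.append((0, rank, 256 - len(stored)))
        prev = byte
    return tokens


def decode(tokens: list, table: dict) -> bytes:
    """Inverse of encode, given the same table."""
    out = bytearray()
    prev = 0
    for flag, value, _ in tokens:
        stored = table.get(prev, [])
        if flag:
            byte = stored[value]
        else:
            # Walk the non-stored byte values in order to undo the rank.
            byte = [b for b in range(256) if b not in stored][value]
        out.append(byte)
        prev = byte
    return bytes(out)
```

Each token's third field is the size of the range the second field is drawn from, which is what a bit packer or arithmetic coder would need to write the value compactly. The table itself also has to be sent to the decoder, which this sketch leaves out.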
Now, this requires a lot of memory: the counts alone would require 256k and, therefore, I deem it a 32-bit technique. I am currently working on 8- and 16-bit compression techniques. I plan to do 32- and 64-bit compression at a later date, unless, of course, I can shrink the buffer to include only the often-occurring values. :)
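
For reference, the 256k figure is consistent with one 4-byte counter for each of the 256 x 256 (previous, next) byte pairs. The counter width is an assumption here, since the post does not state it. A small Python sketch of that arithmetic, next to the size of a table that keeps only the 16 stored followers per preceding byte (function names are hypothetical):

```python
ALPHABET = 256      # one context per possible byte value
COUNTER_BYTES = 4   # assumption: 32-bit counters
TOP_N = 16          # stored followers per preceding byte


def dense_count_bytes(alphabet: int = ALPHABET, counter_bytes: int = COUNTER_BYTES) -> int:
    """Memory for a full table of counts, one per (previous, next) pair."""
    return alphabet * alphabet * counter_bytes


def top_table_bytes(alphabet: int = ALPHABET, top_n: int = TOP_N) -> int:
    """Memory for the stored followers only, one byte per entry."""
    return alphabet * top_n


print(dense_count_bytes())   # 262144 bytes, i.e. 256 KiB
print(top_table_bytes())     # 4096 bytes, i.e. 4 KiB
```

The small table is only what remains after the counting pass. Finding the most frequent followers still needs counts of some kind while scanning, so this shows the size of the result, not a way around the counting memory.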