On 2017-07-09, Mok-Kong Shen <[email protected]> wrote:
An estimate of the entropy of English texts is 1.34 bits per letter [1]. This implies that, if the letters are coded into 5 bits, one needs to appropriately combine 4 text files in order to obtain bit sequences of full entropy, since 4*1.34 = 5.36 > 5. The method used in our software is to code the letters a-z as 5-bit values (mapped to 0-25) and to sum (mod 32) the coded values of the corresponding letters of the text files.
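For readers who want to see the combining step concretely, here is a minimal Python sketch of the description above. It is not taken from TEXTCOMBINE-SP, whose actual implementation may differ; the function names `letters_to_codes` and `combine` are made up for illustration, and dropping non-letters and truncating to the shortest text are assumptions.

```python
def letters_to_codes(text):
    """Map a-z (case-insensitive) to 0-25; every other character is dropped (assumption)."""
    return [ord(c) - ord("a") for c in text.lower() if "a" <= c <= "z"]


def combine(texts):
    """Sum the codes of corresponding letters mod 32, giving one 5-bit value per position.

    The output is as long as the shortest letter stream (assumption).
    """
    streams = [letters_to_codes(t) for t in texts]
    return [sum(column) % 32 for column in zip(*streams)]


if __name__ == "__main__":
    # Four toy "text files"; real inputs would be long, unrelated texts.
    print(combine(["abc", "bcd", "cde", "def"]))
```

Each output value lies in 0-31 and so fits in 5 bits. The sketch only shows the mechanics and says nothing about the statistical quality of the result.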
That is a very bad estimate. It is basically the estimate of the entropy if you pick one letter out at random from the text file. It does NOT take into account correlations between the letters, of which there are loads and loads. I.e., if you pick three letters in sequence, there is a high probability that they are correlated, which would be disastrous for a pseudo-random number generator. Also, text is an extremely biased source. E.g., in English the letter z occurs with a somewhat different frequency than e. Exactly why you would want to do what you do is entirely unclear, since there are lots of extremely good pseudo-random number generators out there, ones not based on a half-assed theory.
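Anyone who wants to check the bias for themselves can measure how far a sequence of values is from uniform. The following is a rough Python sketch; the function name `empirical_entropy` is hypothetical. It computes the Shannon entropy of the observed single-symbol frequencies only, so it can expose bias but is blind to correlations between successive symbols.

```python
import math
from collections import Counter


def empirical_entropy(values):
    """Shannon entropy, in bits per symbol, of the observed symbol frequencies.

    Only single-symbol frequencies are used; correlations between
    neighbouring symbols are not detected by this measure.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"
    letters = [c for c in sample if "a" <= c <= "z"]
    print(empirical_entropy(letters))
```

For 5-bit output values the maximum possible result is 5 bits per symbol. A value close to 5 is necessary but not sufficient, because a perfectly predictable sequence such as 0, 1, 2, ..., 31 repeated also scores 5.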
There are plenty of other schemes for obtaining high-quality pseudo-random sequences in practice, e.g. AES in counter mode. However, our scheme seems to be much simpler both in the underlying logic (understandability) and in implementation, and is thus a viable alternative that one could use/need under certain circumstances.
It is NOT viable, unless you want a complete cockup of a random number generator.
The software, TEXTCOMBINE-SP, is available at mok-kong-shen.de
[1] T. M. Cover, R. C. King, A Convergent Gambling Estimate of the Entropy of English, IEEE Trans. Inf. Theory, vol. 24, 1978, pp. 413-421.
M. K. Shen