Forum: >>> Magnum BBS <<<

A software for combining text files to obtain high quality pseudo-rando

From Mok-Kong Shen@21:1/5 to All on Sun Jul 9 12:08:50 2017

An estimate of the entropy of English text is 1.34 bits per letter [1].
This implies that, if the letters are coded into 5 bits, one needs to
appropriately combine 4 text files in order to obtain bit sequences of
full entropy, since 4*1.34 = 5.36 > 5. The method used in our software
is to sum (mod 32) the coded values of a-z (mapped to 0-25, held as
5 bits) of the corresponding letters of the text files.
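
As a rough illustration of the combining step described above, here is a minimal Python sketch. It is not the TEXTCOMBINE-SP source: the function names are hypothetical, and skipping non-letter characters and stopping at the shortest input are assumptions the description does not settle.

```python
def letter_values(text):
    """Map a-z (case-insensitive) to 0-25; every other character is skipped.

    Skipping non-letters is an assumption, not something the post specifies.
    """
    return [ord(c) - ord("a") for c in text.lower() if "a" <= c <= "z"]


def combine_texts(texts):
    """Sum the values of corresponding letters modulo 32.

    Each output value lies in 0..31, i.e. fits in 5 bits. Output stops at
    the end of the shortest letter stream (also an assumption).
    """
    streams = [letter_values(t) for t in texts]
    return [sum(column) % 32 for column in zip(*streams)]


if __name__ == "__main__":
    # Four toy "text files"; real use would read four long, unrelated texts.
    print(combine_texts(["abc", "bcd", "cde", "zzz"]))  # [28, 31, 2]
```

In the toy run the third value wraps around: 2 + 3 + 4 + 25 = 34, and 34 mod 32 = 2.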

There are plenty of other schemes for obtaining high quality
pseudo-random sequences in practice, e.g. AES in counter mode. However,
our scheme seems to be much simpler, both in the underlying logic
(understandability) and in implementation, and is thus a viable
alternative that one could use or need under certain circumstances.

The software, TEXTCOMBINE-SP, is available at mok-kong-shen.de

[1] T. M. Cover, R. C. King, A Convergent Gambling Estimate of the
Entropy of English, IEEE Trans. Inf. Theory, vol. 24, 1978, pp. 413-421.

M. K. Shen

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From William Unruh@21:1/5 to Mok-Kong Shen on Sun Jul 9 16:56:36 2017

On 2017-07-09, Mok-Kong Shen <[email protected]> wrote:

> An estimate of entropy of English texts is 1.34 bits per letter [1].
> This implies that, if the letters are coded into 5 bits, one needs to
> appropriately combine 4 text files in order to obtain bit sequences of
> full entropy, since 4*1.34 = 5.36 > 5. The method used in our software
> is to sum (mod 32) the coded values of a-z (mapped to 0-25) as 5 bits
> of the corresponding letters of the text files.

That is a very bad estimate -- it is basically the estimate of the
entropy if you pick one letter out at random from the text file. It does
NOT take into account correlations between the letters, of which there
are loads and loads. I.e., if you pick three letters in sequence, there
is a high probability that they are correlated, which would be
disastrous for a pseudo-random number generator. Also, text is an
extremely biased source. E.g., in English the letter z occurs with a
somewhat different frequency than e. Exactly why you would want to do
what you do is entirely unclear, since there are lots of extremely good
pseudo-random number generators out there -- ones not based on a
half-assed theory.
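
One way to put a number on the bias concern is to compute the exact distribution of a sum (mod 32) of k letters and then its Shannon entropy. The sketch below is only an illustration under a strong simplifying assumption: the k letters are treated as independent draws from one frequency table, so it says nothing about correlations between successive letters. All function names are hypothetical.

```python
import math
from collections import Counter


def letter_distribution(text):
    """Relative frequencies of a-z in text, as a list of 26 probabilities."""
    counts = Counter(c for c in text.lower() if "a" <= c <= "z")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text contains no letters a-z")
    return [counts.get(chr(ord("a") + i), 0) / total for i in range(26)]


def sum_mod32_distribution(letter_probs, k):
    """Exact distribution of the sum (mod 32) of k independent letters."""
    dist = [1.0] + [0.0] * 31  # the empty sum is 0 with certainty
    for _ in range(k):
        nxt = [0.0] * 32
        for s, ps in enumerate(dist):
            for v, pv in enumerate(letter_probs):
                nxt[(s + v) % 32] += ps * pv
        dist = nxt
    return dist


def entropy_bits(dist):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


if __name__ == "__main__":
    flat = [1 / 26] * 26  # idealised case: all 26 letters equally likely
    print(entropy_bits(sum_mod32_distribution(flat, 4)))
```

Even in the idealised flat case the result falls just short of 5 bits, because 26 values do not spread evenly over 32 residues. Passing `letter_distribution(sample)` for a real text sample in place of `flat` shows the effect of skewed letter frequencies.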

> There are plenty of other schemes for obtaining high quality
> pseudo-random sequences in practice, e.g. AES in counter mode. However
> our scheme seems to be much simpler both in the underlying logic
> (understandability) and in implementation and is thus a viable
> alternative that one could use/need under circumstances.

It is NOT viable, unless you want a complete cockup of a random number
generator.

> The software, TEXTCOMBINE-SP, is available at mok-kong-shen.de
>
> [1] T. M. Cover, R. C. King, A Convergent Gambling Estimate of the
> Entropy of English, IEEE Trans. Inf. Theory, vol. 24, 1978, pp. 413-421.
>
> M. K. Shen


From Mok-Kong Shen@21:1/5 to All on Mon Jul 10 06:56:27 2017

On 09.07.2017 at 18:56, William Unruh wrote:

> On 2017-07-09, Mok-Kong Shen <[email protected]> wrote:
>
>> An estimate of entropy of English texts is 1.34 bits per letter [1].
>> This implies that, if the letters are coded into 5 bits, one needs to
>> appropriately combine 4 text files in order to obtain bit sequences of
>> full entropy, since 4*1.34 = 5.36 > 5. The method used in our software
>> is to sum (mod 32) the coded values of a-z (mapped to 0-25) as 5 bits
>> of the corresponding letters of the text files.
>
> That is a very bad estimate -- it is basically the estimate of the
> entropy if you pick one letter out at random from the text file. It does
> NOT take into account correlations between the letters, of which there
> are loads and loads. I.e., if you pick three letters in sequence, there
> is a high probability that they are correlated, which would be
> disastrous for a pseudo-random number generator. Also, text is an
> extremely biased source. E.g., in English the letter z occurs with a
> somewhat different frequency than e. Exactly why you would want to do
> what you do is entirely unclear, since there are lots of extremely good
> pseudo-random number generators out there -- ones not based on a
> half-assed theory.

Note that Shannon, who introduced the concept of entropy, did similar
work. Cover and King's work only followed his. Cover wrote a book on
information theory; I suppose he knew what he was doing. Note, further,
that my example contains a test of the resulting byte sequence with
Maurer's test, and that test is ok. The other points you raised are
dealt with in my OP (and quoted by you here).

M. K. Shen

>> There are plenty of other schemes for obtaining high quality
>> pseudo-random sequences in practice, e.g. AES in counter mode. However
>> our scheme seems to be much simpler both in the underlying logic
>> (understandability) and in implementation and is thus a viable
>> alternative that one could use/need under circumstances.
>
> It is NOT viable, unless you want a complete cockup of a random number
> generator
>
>> The software, TEXTCOMBINE-SP, is available at mok-kong-shen.de
>>
>> [1] T. M. Cover, R. C. King, A Convergent Gambling Estimate of the
>> Entropy of English, IEEE Trans. Inf. Theory, vol. 24, 1978, pp. 413-421.
>>
>> M. K. Shen


From Mok-Kong Shen@21:1/5 to All on Fri Jul 14 17:40:12 2017

I am extremely sorry to say that I was unfortunately misled by some
erroneous computations in the design stage, such that I would like to
retract this software (instead of attempting a certain more complicated
redesign), and I sincerely ask for pardon from readers of this thread
for having wasted their precious time.

M. K. Shen


From William Unruh@21:1/5 to Mok-Kong Shen on Fri Jul 14 15:55:57 2017

On 2017-07-14, Mok-Kong Shen <[email protected]> wrote:

> I am extremely sorry to say that I was unfortunately misled by some
> erroneous computations in the design stage such that I like to retract
> this software (instead of attempting certain more complicated redesign)
> and sincerely ask for pardon from readers of this thread for having
> wasted their precious time.

Excellent. Thanks for admitting it.

> M. K. Shen

