Forum: >>> Magnum BBS <<<

More efficient data structure

From Cecil Westerhof@21:1/5 to All on Sat Mar 12 17:09:23 2022

I have a program where I simulate throwing 2 dices several times a
number of times.
In the code below nrOfTries is 25 million and nrOfRolls is 7. So I
simulate rolling 2 dices seven times and registering the value of each
roll 25 million times. The code is:
set diceRollDictSeq [dict create]
for {set j 0} {$j < ${nrOfTries}} {incr j} {
set diceRoll {}
for {set i 0} {$i < ${nrOfRolls}} {incr i} {
lappend diceRoll [expr {[::math::random 1 7] + [::math::random 1 7]}]
}
dict incr diceRollDictSeq ${diceRoll}
}

This takes almost 9 GB of RAM. Is there a way to reduce the amount of
RAM significantly by using another data-structure?

--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From [email protected]@21:1/5 to All on Sat Mar 12 18:43:08 2022

Each roll gives a value from 2-12 (inclusive), so there are 11 possible values. For a sequence of 7 rolls there are 11**7 = 19487175 possible sequences
Store the counts for a sequence of seven rolls in a list

#initalize list to 0

set rolls [lrepeat [expr 11**7] 0]

for {set i 0} {$i<25000000} {incr i} {
set id 0
for {set j 0} {$j<7} {incr j} {
# index into the list by (...(r1*11+r2)*11+...+r6)*11+r7
# subtract 2 from total of two dice for use as an index
set id [expr {$id*11 + [math::random 6]+[math::random 6]}]
}
lset rolls $id [expr {1+[lindex $rolls $id]}]
}

# count nonzero list entries
set nz 0
foreach r $rolls {
incr nz [expr {0 != $r}]
}
puts "$nz are not zero"

rbc/BLT or VecTcl vectors might use even less memory

Dave B

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Manuel Collado@21:1/5 to All on Sat Mar 12 19:27:34 2022

El 12/03/2022 a las 17:09, Cecil Westerhof escribi�:

I have a program where I simulate throwing 2 dices several times a
number of times.
In the code below nrOfTries is 25 million and nrOfRolls is 7. So I
simulate rolling 2 dices seven times and registering the value of each
roll 25 million times. The code is:
set diceRollDictSeq [dict create]
for {set j 0} {$j < ${nrOfTries}} {incr j} {
set diceRoll {}
for {set i 0} {$i < ${nrOfRolls}} {incr i} {
lappend diceRoll [expr {[::math::random 1 7] + [::math::random 1 7]}]
}
dict incr diceRollDictSeq ${diceRoll}
}

This takes almost 9 GB of RAM. Is there a way to reduce the amount of
RAM significantly by using another data-structure?

Each roll of 2 dices gives a value 1-12. You can consider each value as
an hexadecimal digit and pack the result of 7 rolls as a single 7 digit hexadecimal value. This would reduce the memory amount by a factor of 7.
At the cost of unpacking and retrieving the individual values when needed.

Just a naive idea.
--
Manuel Collado - http://mcollado.z15.es

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Cecil Westerhof@21:1/5 to Manuel Collado on Sat Mar 12 23:42:21 2022

Manuel Collado <[email protected]> writes:

El 12/03/2022 a las 17:09, Cecil Westerhof escribió:

I have a program where I simulate throwing 2 dices several times a
number of times.
In the code below nrOfTries is 25 million and nrOfRolls is 7. So I
simulate rolling 2 dices seven times and registering the value of each
roll 25 million times. The code is:
set diceRollDictSeq [dict create]
for {set j 0} {$j < ${nrOfTries}} {incr j} {
set diceRoll {}
for {set i 0} {$i < ${nrOfRolls}} {incr i} {
lappend diceRoll [expr {[::math::random 1 7] + [::math::random 1 7]}]
}
dict incr diceRollDictSeq ${diceRoll}
}

This takes almost 9 GB of RAM. Is there a way to reduce the amount of
RAM significantly by using another data-structure?

Each roll of 2 dices gives a value 1-12. You can consider each value as
an hexadecimal digit and pack the result of 7 rolls as a single 7 digit hexadecimal value. This would reduce the memory amount by a factor of 7.
At the cost of unpacking and retrieving the individual values when
needed.

My info was not correct. :'-( After the rolls 8 GB was used. I then
use the data structure and the memory usage grew to 9.3 GB.

I rewrote it to:
set diceRollDictSeq [dict create]
for {set j 0} {$j < ${nrOfTries}} {incr j} {
set diceRoll {}
for {set i 0} {$i < ${nrOfRolls}} {incr i} {
set roll [expr {[::math::random 1 7] + [::math::random 1 7]}]
switch $roll {
10 { set roll A }
11 { set roll B }
12 { set roll C }
}
append diceRoll $roll
}
dict incr diceRollDictSeq ${diceRoll}
}

Now the memory used after creating the data-structure is 2.75 GB. So
about a third (not a seventh). The needed time increases with a
quarter. That is acceptable I think.
When using the data-structure the memory usage becomes 4 GB. So the
increase is about the same. I would have expected it to also be a
third of what it was before.

The data-structure is used like this:
proc displayBest {diceRollDict} {
puts "Element count: [dict size ${diceRollDict}] distinct values"
set diceRollList {}
dict for {diceRoll count} ${diceRollDict} {
lappend diceRollList [list ${diceRoll} ${count}]
}
set diceRollList [lsort -index 1 -integer -decreasing ${diceRollList}]
set first [lindex [lindex ${diceRollList} 0] 1]
set second [lindex [lindex ${diceRollList} 1] 1]
set percFirst [expr {100. * ${first} / ${::nrOfTries} }]
set percGreater [expr {100. * ${first} / ${second} - 100}]
puts "First has ${percFirst}% chance"
puts "Chance first ${percGreater}% greater as second"
foreach diceRollElement [lrange ${diceRollList} 0 ${::nrOfResults}] {
set diceRoll [lindex ${diceRollElement} 0]
set count [lindex ${diceRollElement} 1]
puts "[split ${diceRoll} ""]: ${count}"
}
}

Why is the memory usage a third instead of a seventh?
Why does the usage of the smaller data-structure uses the same amount
of extra memory as the greater data-structure?

--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From [email protected]@21:1/5 to All on Sun Mar 13 05:29:45 2022

Take a look at the Tcl_NewObj man page (and any Tcl book that explains the C interface)

El 12/03/2022 a las 17:09, Cecil Westerhof escribió:

I have a program where I simulate throwing 2 dices several times a
number of times.
In the code below nrOfTries is 25 million and nrOfRolls is 7. So I
simulate rolling 2 dices seven times and registering the value of each
roll 25 million times. The code is:
set diceRollDictSeq [dict create]
for {set j 0} {$j < ${nrOfTries}} {incr j} {
set diceRoll {}
for {set i 0} {$i < ${nrOfRolls}} {incr i} {
lappend diceRoll [expr {[::math::random 1 7] + [::math::random 1 7]}]
}
dict incr diceRollDictSeq ${diceRoll}
}

This takes almost 9 GB of RAM. Is there a way to reduce the amount of
RAM significantly by using another data-structure?

Each roll of 2 dices gives a value 1-12. You can consider each value as
an hexadecimal digit and pack the result of 7 rolls as a single 7 digit
hexadecimal value. This would reduce the memory amount by a factor of 7.
At the cost of unpacking and retrieving the individual values when
needed.

My info was not correct. :'-( After the rolls 8 GB was used. I then
use the data structure and the memory usage grew to 9.3 GB.

I rewrote it to:
set diceRollDictSeq [dict create]
for {set j 0} {$j < ${nrOfTries}} {incr j} {
set diceRoll {}

consider using
set diceRoll 0

for {set i 0} {$i < ${nrOfRolls}} {incr i} {
set roll [expr {[::math::random 1 7] + [::math::random 1 7]}]
switch $roll {
10 { set roll A }
11 { set roll B }
12 { set roll C }
}
append diceRoll $roll

consider using:
set diceRoll [expr {$diceRoll<<4 + $roll}]

the dict key will now be an integer (saved directly in the dict key's Tcl_Obj)

}
dict incr diceRollDictSeq ${diceRoll}
}

Now the memory used after creating the data-structure is 2.75 GB. So
about a third (not a seventh). The needed time increases with a
quarter. That is acceptable I think.
When using the data-structure the memory usage becomes 4 GB. So the
increase is about the same. I would have expected it to also be a
third of what it was before.

The data-structure is used like this:
proc displayBest {diceRollDict} {
puts "Element count: [dict size ${diceRollDict}] distinct values"
set diceRollList {}
dict for {diceRoll count} ${diceRollDict} {
lappend diceRollList [list ${diceRoll} ${count}]
}
set diceRollList [lsort -index 1 -integer -decreasing ${diceRollList}]

The dict can appear as a list of keys and values.
consider replacing everything after puts "Element..." above with:

set diceRollList [lsort -stride 2 -index 1 -integer -decreasing $diceRollDict]

set first [lindex [lindex ${diceRollList} 0] 1]

set first [lindex $diceRollList 1]

set second [lindex [lindex ${diceRollList} 1] 1]

set second [lindex $diceRollList 3]

set percFirst [expr {100. * ${first} / ${::nrOfTries} }]
set percGreater [expr {100. * ${first} / ${second} - 100}]
puts "First has ${percFirst}% chance"
puts "Chance first ${percGreater}% greater as second"
foreach diceRollElement [lrange ${diceRollList} 0 ${::nrOfResults}] {
set diceRoll [lindex ${diceRollElement} 0]
set count [lindex ${diceRollElement} 1]

if the "lsort -stride 2" is used above, the foreach and lindexes become:
foreach {diceRoll count} $diceRollList {

puts "[split ${diceRoll} ""]: ${count}"
}
}

Why is the memory usage a third instead of a seventh?

Each entry in a dict has two Tcl_Obj structures (key and value)
The value is a number, which is stashed within the Tcl_Obj

The key in the first method was a list of 7 numbers using 8 Tcl_Obj (one for list, 7 for list entries which are numbers, plus memory allocated for an array of pointers to the list entries

The key in the second method is a string, which requires a Tcl_Obj and memory allocated for a 7 char string.

If the key is saved as an actual number instead of a string, the number will be stored directly in the key"s Tcl_Obj.

each dict entry (key and value) needs:
list of 7 numbers: 9 Tcl_Obj + memory allocated for pointer array, = 10 bits of memory
hex string: 2 Tcl_Obj + memory allocated for 7 char string = 3 bits of memory key is a number: 2 Tcl_Obj = 2 bits of memory

Looking at the Tcl_NewObj man page, a Tcl_Obj might take about 48 bytes on a 64 bit system (just my guess, I did not actually measure the size with sizeof). The point is the Tcl_Obj is larger than the values (strings and numbers), so their memory can
dominate the total memory use. I'm not sure how much memory Tcl allocates for list poiner arrays and strings. Perhaps the minimum allocation size is the size of a Tcl_Obj (this could reduce memory fragmentation?).If so this hints at the 3:1 reduction
instead of 7:1, and indicates you could get 4:1 or 5:1 using numeric keys (compared to the list key).

None of this includes the overhead for the dict hash table (which is at least one pointer per entry).

Why does the usage of the smaller data-structure uses the same amount
of extra memory as the greater data-structure?

perhaps this is the dict hash table overhead?

Dave B

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From heinrichmartin@21:1/5 to Cecil Westerhof on Sun Mar 13 03:06:53 2022

On Saturday, March 12, 2022 at 5:14:07 PM UTC+1, Cecil Westerhof wrote:

I have a program where I simulate throwing 2 dices several times a
number of times.
In the code below nrOfTries is 25 million and nrOfRolls is 7. So I
simulate rolling 2 dices seven times and registering the value of each
roll 25 million times. The code is:
set diceRollDictSeq [dict create]
for {set j 0} {$j < ${nrOfTries}} {incr j} {
set diceRoll {}
for {set i 0} {$i < ${nrOfRolls}} {incr i} {
lappend diceRoll [expr {[::math::random 1 7] + [::math::random 1 7]}]
}
dict incr diceRollDictSeq ${diceRoll}
}

This takes almost 9 GB of RAM. Is there a way to reduce the amount of
RAM significantly by using another data-structure?

Just an incomplete thought: do you really need all of them in memory? What will you be doing with the data?
Maybe you want to collect aggregates only. Or you could produce data to disk (file, pipe, db, ...) and post-process it from there.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Cecil Westerhof@21:1/5 to heinrichmartin on Sun Mar 13 13:28:39 2022

heinrichmartin <[email protected]> writes:

On Saturday, March 12, 2022 at 5:14:07 PM UTC+1, Cecil Westerhof wrote:

I have a program where I simulate throwing 2 dices several times a
number of times.
In the code below nrOfTries is 25 million and nrOfRolls is 7. So I
simulate rolling 2 dices seven times and registering the value of each
roll 25 million times. The code is:
set diceRollDictSeq [dict create]
for {set j 0} {$j < ${nrOfTries}} {incr j} {
set diceRoll {}
for {set i 0} {$i < ${nrOfRolls}} {incr i} {
lappend diceRoll [expr {[::math::random 1 7] + [::math::random 1 7]}]
}
dict incr diceRollDictSeq ${diceRoll}
}

This takes almost 9 GB of RAM. Is there a way to reduce the amount of
RAM significantly by using another data-structure?

Just an incomplete thought: do you really need all of them in memory?
What will you be doing with the data?
Maybe you want to collect aggregates only. Or you could produce data to
disk (file, pipe, db, ...) and post-process it from there.

I do not know what the next throw will become, so that is a bit
difficult I am afraid.
I would like to go to 10 sets of throws. But at the moment I cannot
go beyond 7. With a better data-structure I could go (I hope) to 8.
For 9 and 10 I have to do something like you suggest, but that will
make the performance go down.

--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From [email protected]@21:1/5 to Cecil Westerhof on Mon Mar 14 00:18:25 2022

On 3/12/22 11:09 AM, Cecil Westerhof wrote:

I have a program where I simulate throwing 2 dices several times a
number of times.
In the code below nrOfTries is 25 million and nrOfRolls is 7. So I
simulate rolling 2 dices seven times and registering the value of each
roll 25 million times. The code is:
set diceRollDictSeq [dict create]
for {set j 0} {$j < ${nrOfTries}} {incr j} {
set diceRoll {}
for {set i 0} {$i < ${nrOfRolls}} {incr i} {
lappend diceRoll [expr {[::math::random 1 7] + [::math::random 1 7]}]
}
dict incr diceRollDictSeq ${diceRoll}
}

This takes almost 9 GB of RAM. Is there a way to reduce the amount of
RAM significantly by using another data-structure?

Hello,

Interesting problem. My recommendation would be to use an array instead
of a dict which will save you in both in time and space. Here is a
slightly modified version of your code using arrays:

array set diceRollDictSeq {}
for {set j 0} {$j < ${nrOfTries}} {incr j} {
set temp {}
for {set i 0} {$i < ${nrOfRolls}} {incr i} {
## you can map 10/11/12 to A/B/C here or later when accessing the values
lappend temp [expr {[::math::random 1 7] + [::math::random 1 7]}]
}
incr diceRollDictSeq([join $temp ,])
}

I wasn't sure what to expect so I ran it for 10 million and 7
(inner/outer loop limits). On a Windows laptop, this ran for 6 minutes
and consumed 950MB of memory (while I was busy browsing). It resulted
in 5,346,039 unique rolls which is about a quarter of the total solution
space.

You can generate the same output report as your other post indicates
with minor changes.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Bezoar@21:1/5 to [email protected] on Thu Mar 17 18:12:31 2022

On Sunday, March 13, 2022 at 11:18:33 PM UTC-5, [email protected] wrote:

On 3/12/22 11:09 AM, Cecil Westerhof wrote:

I have a program where I simulate throwing 2 dices several times a
number of times.
In the code below nrOfTries is 25 million and nrOfRolls is 7. So I simulate rolling 2 dices seven times and registering the value of each roll 25 million times. The code is:
set diceRollDictSeq [dict create]
for {set j 0} {$j < ${nrOfTries}} {incr j} {
set diceRoll {}
for {set i 0} {$i < ${nrOfRolls}} {incr i} {
lappend diceRoll [expr {[::math::random 1 7] + [::math::random 1 7]}]
}
dict incr diceRollDictSeq ${diceRoll}
}

This takes almost 9 GB of RAM. Is there a way to reduce the amount of
RAM significantly by using another data-structure?

Hello,

Interesting problem. My recommendation would be to use an array instead
of a dict which will save you in both in time and space. Here is a
slightly modified version of your code using arrays:

array set diceRollDictSeq {}
for {set j 0} {$j < ${nrOfTries}} {incr j} {
set temp {}
for {set i 0} {$i < ${nrOfRolls}} {incr i} {
## you can map 10/11/12 to A/B/C here or later when accessing the values lappend temp [expr {[::math::random 1 7] + [::math::random 1 7]}]
}
incr diceRollDictSeq([join $temp ,])
}

I wasn't sure what to expect so I ran it for 10 million and 7
(inner/outer loop limits). On a Windows laptop, this ran for 6 minutes
and consumed 950MB of memory (while I was busy browsing). It resulted
in 5,346,039 unique rolls which is about a quarter of the total solution space.

You can generate the same output report as your other post indicates
with minor changes.

You can use a bitfield to hold the values. A bitfield basically holds a 1 or 0 at an index. So the string "001001" as an example would have a value at 2 and 5. So just using a string you can get ( assuming one byte per character and storing a
bit string for each value and each attempt , in your case : 2-12 or 10 possible dice values and 7 attempts we get 70 bit strings ) this gives us 25M x 70 or 1750M or 1.75 Gig . But wait we know that a byte is made up of 1's and 0's so each one byte
character can hold 8 values. So using bitwise operators we can cut the memory usage by 8 times or 1750 / 8 = 218.75M! which is significantly smaller than 9 G. But wait! we can also "run length encode" the string where any length of ones and zeros can
be compressed to a number and a value. For example, a string of all 1's 32 long can be compressed to "32 1" so we can do the same and depending on your data can reduce the memory significantly. In the implementation below ( I have objectified what I
borrowed from the tcl wiki ) we don't implement the run length encoding ; we leave that to the student as they say. I made timings for the implmentation below .

tries attempts memory real
-------- -------- -------- --------
1000 7 7k 0.078s
10000 7 70k 0.382s
100000 7 700k 3.770s
1000000 7 7M 79.58s

I would expect that for a 10 times increase in tries we'd get a 10 times increase in time spent but as we see after 100000 the time needed to seek to the correct spot causes significant slow down. To speed it up one may want to split up the string into
several smaller strings especially when you pre-allocate then its just math to get the correct index into the correct string to update the value.

oo::class create Bitfield {
# get and set taken from https://wiki.tcl-lang.org/page/bitstrings
variable data
variable maxposition
variable lastfree

constructor { { numbits -1 } } {
set data ""
set maxposition $numbits
set lastfree 0
if { $numbits >= 0 } {
# prealloc space
set numbytes [ expr { $numbits / 8 + (( $numbits % 8 ) ? 1 : 0 ) } ]
set data [ string repeat "\0" $numbytes ]
set maxposition [ expr $numbytes * 8 ]
}
}
method alloc { } {
set pos $lastfree
try {
while { [ my get $pos ] == 1 && ( $maxposition == -1 || $pos < $maxposition ) } {
incr pos
}
} trap BAD_INDEX { a } {
throw CAPACITY_ERROR " unable to allocate any more blocks!"
}

set lastfree [ expr $pos + 1 ]
while { [ my get $lastfree ] && $lastfree != $maxposition } { incr lastfree }
my set $pos 1
#puts "allocated block $pos"
return $pos
}

method free { pos } {
my set $pos 0
set lastfree $pos
}

method set { position bit } {
#puts "Bitfield: set $position -> $bit "
if { (! [string is integer $position]) || ($position < 0) } {
throw BAD_INDEX "position must be an integer >= 0"
}
if { ($bit ne "0") && ($bit ne "1") } {
throw BAD_VALUE "bit must be '0' or '1'"
}
set block [ expr { $position / 8 } ]
set bitnum [ expr { $position % 8 } ]
set byteval "00000000"
binary scan $data @${block}b8 byteval

# Compute new string of zeros and ones
set newbyte [string range $byteval 0 [expr $bitnum-1]]$bit[string range $byteval [expr $bitnum+1] end]

# Convert back into binary format
set data [binary format a*@${block}b8 $data $newbyte]
}

method toggle { position } {
set value [ my get $position ]
my set $position [expr { ( $value + 1 ) % 2 } ]
}
method get { position } {

if { (! [string is integer $position]) || ($position < 0) } {
throw BAD_INDEX error "position must be an integer >= 0"
}
if { $maxposition >= 0 && $position >= $maxposition } {
throw BAD_INDEX "position must be an integer < $maxposition"
}
# Compute byte and bit offsets
set block [ expr { $position / 8 } ]
set bitnum [expr $position % 8]

# Scan the btye from the string
# Default value is used if scan fails
set byteval "00000000"
binary scan $data @${block}b8 byteval
# return the interesting bit
set value [string range $byteval $bitnum $bitnum]
return $value
}
method setData { buffer } {
set data $buffer
}
method getData { } {
return $data
}
method length { } {
return [string length $data];
}
method capacity { } {
return [expr { [string length $data ] * 8 } ]
}
method toString { } {
set retval "[format "%10i " [my offset]]"
foreach c [split $data "" ] {
binary scan $c b8 x
append retval "$x"
}
return $retval
}
method offset {} {
return [ expr {[my length] + 11 }]
}
destructor {}
}
package provide Bitfield 1.0

if { 0 } {
set b [ Bitfield new ]
$b set 4 1
$b set 5 1
$b toggle 6
$b toggle 4
$b set 25 1
for { set i 0 } { $i < 30 } { incr i } {
puts "$i . '[$b get $i ]'"
}
puts "length : [$b length ]"
puts "capacity : [$b capacity ]"
}
# your code starts here
package require math ; # from tcllib
set nrOfTries 10000

set nrOfRolls 7
array set diceRoll {}
for {set j 0} {$j < $nrOfRolls} {incr j} {
for {set k 2 } { $k < 13 } { incr k } {
set diceRoll($j,$k) [Bitfield new ${nrOfTries} ]
}
}
proc rolls { trynum numRolls } {
for {set i 0} {$i < $numRolls} {incr i} {
set x [::math::random 1 $numRolls]
incr x [::math::random 1 $numRolls]
$::diceRoll($i,$x) set $trynum 1
}
}
for {set j 0 } {$j < $::nrOfTries} {incr j} {
rolls $j $::nrOfRolls
if { [ expr { $j % 10000 } ] == 0 } {
puts "$j"
}
}
proc getRolls { tryNum } {
set val {}
for {set j 0} {$j < $::nrOfRolls } {incr j} {
for {set k 2 } { $k < 13 } { incr k } {
if { [$::diceRoll($j,$k) get $tryNum ] eq "1" } {
lappend val $k
}
}
}
return $val
}
proc numTimes { value attempt } {
lassign [$::diceRoll($attempt,$value) toString] capacityLen bitStr
return [string length [string map { "0" "" } $bitStr ]]
}
puts "What were the rolls for 2500th try?"
puts "[join [getRolls 2499] \n]"
puts "How many times did 10 get rolled on the each attempt in $::nrOfTries tries?"
puts " Attempt Num Times "
puts " ======== =========="
for { set m 0 } { $m < 7 } { incr m } {
puts [format "% 7i % 5i" [expr { $m + 1 } ] [numTimes 10 $m ]]
}

- Carl

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Bezoar@21:1/5 to Bezoar on Thu Mar 17 18:16:12 2022

On Thursday, March 17, 2022 at 8:12:34 PM UTC-5, Bezoar wrote:

On Sunday, March 13, 2022 at 11:18:33 PM UTC-5, [email protected] wrote:

On 3/12/22 11:09 AM, Cecil Westerhof wrote:

I have a program where I simulate throwing 2 dices several times a number of times.
In the code below nrOfTries is 25 million and nrOfRolls is 7. So I simulate rolling 2 dices seven times and registering the value of each roll 25 million times. The code is:
set diceRollDictSeq [dict create]
for {set j 0} {$j < ${nrOfTries}} {incr j} {
set diceRoll {}
for {set i 0} {$i < ${nrOfRolls}} {incr i} {
lappend diceRoll [expr {[::math::random 1 7] + [::math::random 1 7]}]
}
dict incr diceRollDictSeq ${diceRoll}
}

This takes almost 9 GB of RAM. Is there a way to reduce the amount of RAM significantly by using another data-structure?

Hello,

Interesting problem. My recommendation would be to use an array instead
of a dict which will save you in both in time and space. Here is a slightly modified version of your code using arrays:

array set diceRollDictSeq {}
for {set j 0} {$j < ${nrOfTries}} {incr j} {
set temp {}
for {set i 0} {$i < ${nrOfRolls}} {incr i} {
## you can map 10/11/12 to A/B/C here or later when accessing the values lappend temp [expr {[::math::random 1 7] + [::math::random 1 7]}]
}
incr diceRollDictSeq([join $temp ,])
}

I wasn't sure what to expect so I ran it for 10 million and 7
(inner/outer loop limits). On a Windows laptop, this ran for 6 minutes
and consumed 950MB of memory (while I was busy browsing). It resulted
in 5,346,039 unique rolls which is about a quarter of the total solution space.

You can generate the same output report as your other post indicates
with minor changes.

You can use a bitfield to hold the values. A bitfield basically holds a 1 or 0 at an index. So the string "001001" as an example would have a value at 2 and 5. So just using a string you can get ( assuming one byte per character and storing a bit

string for each value and each attempt , in your case : 2-12 or 10 possible dice values and 7 attempts we get 70 bit strings ) this gives us 25M x 70 or 1750M or 1.75 Gig . But wait we know that a byte is made up of 1's and 0's so each one byte character
can hold 8 values. So using bitwise operators we can cut the memory usage by 8 times or 1750 / 8 = 218.75M! which is significantly smaller than 9 G. But wait! we can also "run length encode" the string where any length of ones and zeros can be compressed
to a number and a value. For example, a string of all 1's 32 long can be compressed to "32 1" so we can do the same and depending on your data can reduce the memory significantly. In the implementation below ( I have objectified what I borrowed from the
tcl wiki ) we don't implement the run length encoding ; we leave that to the student as they say. I made timings for the implmentation below .

tries attempts memory real
-------- -------- -------- --------
1000 7 7k 0.078s
10000 7 70k 0.382s
100000 7 700k 3.770s
1000000 7 7M 79.58s

I would expect that for a 10 times increase in tries we'd get a 10 times increase in time spent but as we see after 100000 the time needed to seek to the correct spot causes significant slow down. To speed it up one may want to split up the string into

several smaller strings especially when you pre-allocate then its just math to get the correct index into the correct string to update the value.

oo::class create Bitfield {
# get and set taken from https://wiki.tcl-lang.org/page/bitstrings
variable data
variable maxposition
variable lastfree

constructor { { numbits -1 } } {
set data ""
set maxposition $numbits
set lastfree 0
if { $numbits >= 0 } {
# prealloc space
set numbytes [ expr { $numbits / 8 + (( $numbits % 8 ) ? 1 : 0 ) } ]
set data [ string repeat "\0" $numbytes ]
set maxposition [ expr $numbytes * 8 ]
}
}
method alloc { } {
set pos $lastfree
try {
while { [ my get $pos ] == 1 && ( $maxposition == -1 || $pos < $maxposition ) } {
incr pos
}
} trap BAD_INDEX { a } {
throw CAPACITY_ERROR " unable to allocate any more blocks!"
}

set lastfree [ expr $pos + 1 ]
while { [ my get $lastfree ] && $lastfree != $maxposition } { incr lastfree }
my set $pos 1
#puts "allocated block $pos"
return $pos
}

method free { pos } {
my set $pos 0
set lastfree $pos
}

method set { position bit } {
#puts "Bitfield: set $position -> $bit "
if { (! [string is integer $position]) || ($position < 0) } {
throw BAD_INDEX "position must be an integer >= 0"
}
if { ($bit ne "0") && ($bit ne "1") } {
throw BAD_VALUE "bit must be '0' or '1'"
}
set block [ expr { $position / 8 } ]
set bitnum [ expr { $position % 8 } ]
set byteval "00000000"
binary scan $data @${block}b8 byteval

# Compute new string of zeros and ones
set newbyte [string range $byteval 0 [expr $bitnum-1]]$bit[string range $byteval [expr $bitnum+1] end]

# Convert back into binary format
set data [binary format a*@${block}b8 $data $newbyte]
}

method toggle { position } {
set value [ my get $position ]
my set $position [expr { ( $value + 1 ) % 2 } ]
}
method get { position } {

if { (! [string is integer $position]) || ($position < 0) } {
throw BAD_INDEX error "position must be an integer >= 0"
}
if { $maxposition >= 0 && $position >= $maxposition } {
throw BAD_INDEX "position must be an integer < $maxposition"
}
# Compute byte and bit offsets
set block [ expr { $position / 8 } ]
set bitnum [expr $position % 8]

# Scan the btye from the string
# Default value is used if scan fails
set byteval "00000000"
binary scan $data @${block}b8 byteval
# return the interesting bit
set value [string range $byteval $bitnum $bitnum]
return $value
}
method setData { buffer } {
set data $buffer
}
method getData { } {
return $data
}
method length { } {
return [string length $data];
}
method capacity { } {
return [expr { [string length $data ] * 8 } ]
}
method toString { } {
set retval "[format "%10i " [my offset]]"
foreach c [split $data "" ] {
binary scan $c b8 x
append retval "$x"
}
return $retval
}
method offset {} {
return [ expr {[my length] + 11 }]
}
destructor {}
}
package provide Bitfield 1.0

if { 0 } {
set b [ Bitfield new ]
$b set 4 1
$b set 5 1
$b toggle 6
$b toggle 4
$b set 25 1
for { set i 0 } { $i < 30 } { incr i } {
puts "$i . '[$b get $i ]'"
}
puts "length : [$b length ]"
puts "capacity : [$b capacity ]"
}
# your code starts here
package require math ; # from tcllib
set nrOfTries 10000

set nrOfRolls 7
array set diceRoll {}
for {set j 0} {$j < $nrOfRolls} {incr j} {
for {set k 2 } { $k < 13 } { incr k } {
set diceRoll($j,$k) [Bitfield new ${nrOfTries} ]
}
}
proc rolls { trynum numRolls } {
for {set i 0} {$i < $numRolls} {incr i} {
set x [::math::random 1 $numRolls]
incr x [::math::random 1 $numRolls]
$::diceRoll($i,$x) set $trynum 1
}
}
for {set j 0 } {$j < $::nrOfTries} {incr j} {
rolls $j $::nrOfRolls
if { [ expr { $j % 10000 } ] == 0 } {
puts "$j"
}
}
proc getRolls { tryNum } {
set val {}
for {set j 0} {$j < $::nrOfRolls } {incr j} {
for {set k 2 } { $k < 13 } { incr k } {
if { [$::diceRoll($j,$k) get $tryNum ] eq "1" } {
lappend val $k
}
}
}
return $val
}
proc numTimes { value attempt } {
lassign [$::diceRoll($attempt,$value) toString] capacityLen bitStr
return [string length [string map { "0" "" } $bitStr ]]
}
puts "What were the rolls for 2500th try?"
puts "[join [getRolls 2499] \n]"
puts "How many times did 10 get rolled on the each attempt in $::nrOfTries tries?"
puts " Attempt Num Times "
puts " ======== =========="
for { set m 0 } { $m < 7 } { incr m } {
puts [format "% 7i % 5i" [expr { $m + 1 } ] [numTimes 10 $m ]]
}

- Carl

Forgot the output from the program:
What were the rolls for 25 time?
4
5
10
6
8
11
2
How many times did 10 get rolled on the each attempt in 10000 tries?
Attempt Num Times
======== ==========
1 866
2 858
3 828
4 784
5 873
6 862
7 798

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

Who's Online
Recent Visitors
- Centurion
  Tue Jul 28 22:54:59 2026
  from Berea, Ohio via Telnet
- Bob Worm
  Tue Jul 28 16:01:18 2026
  from Wales, Uk via Telnet
- Rixter
  Tue Jul 28 13:42:46 2026
  from Madison, Nc via Telnet
- Krenn
  Tue Jul 28 11:59:57 2026
  from Sydney, Nsw via Telnet
- Rixter
  Tue Jul 28 01:23:48 2026
  from Madison, Nc via Telnet
- Centurion
  Mon Jul 27 22:50:42 2026
  from Berea, Ohio via Telnet
- Ataricrypt
  Mon Jul 27 19:19:17 2026
  from England via Telnet
- Bob Worm
  Mon Jul 27 15:19:55 2026
  from Wales, Uk via Telnet

System Info

Sysop:	Keyop
Location:	Huddersfield, West Yorkshire, UK
Users:	741
Nodes:	16 (2 / 14)
Uptime:	52:11:07
Calls:	12,445
Calls today:	5
Files:	15,192
Messages:	6,537,264

More efficient data structure

Who's Online

Recent Visitors

System Info