• More efficient data structure

    From Cecil Westerhof@21:1/5 to All on Sat Mar 12 17:09:23 2022
    I have a program where I simulate throwing 2 dices several times a
    number of times.
    In the code below nrOfTries is 25 million and nrOfRolls is 7. So I
    simulate rolling 2 dices seven times and registering the value of each
    roll 25 million times. The code is:
    set diceRollDictSeq [dict create]
    for {set j 0} {$j < ${nrOfTries}} {incr j} {
    set diceRoll {}
    for {set i 0} {$i < ${nrOfRolls}} {incr i} {
    lappend diceRoll [expr {[::math::random 1 7] + [::math::random 1 7]}]
    }
    dict incr diceRollDictSeq ${diceRoll}
    }

    This takes almost 9 GB of RAM. Is there a way to reduce the amount of
    RAM significantly by using another data-structure?

    --
    Cecil Westerhof
    Senior Software Engineer
    LinkedIn: http://www.linkedin.com/in/cecilwesterhof

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to All on Sat Mar 12 18:43:08 2022
    Each roll gives a value from 2-12 (inclusive), so there are 11 possible values. For a sequence of 7 rolls there are 11**7 = 19487175 possible sequences
    Store the counts for a sequence of seven rolls in a list

    #initalize list to 0

    set rolls [lrepeat [expr 11**7] 0]

    for {set i 0} {$i<25000000} {incr i} {
    set id 0
    for {set j 0} {$j<7} {incr j} {
    # index into the list by (...(r1*11+r2)*11+...+r6)*11+r7
    # subtract 2 from total of two dice for use as an index
    set id [expr {$id*11 + [math::random 6]+[math::random 6]}]
    }
    lset rolls $id [expr {1+[lindex $rolls $id]}]
    }

    # count nonzero list entries
    set nz 0
    foreach r $rolls {
    incr nz [expr {0 != $r}]
    }
    puts "$nz are not zero"

    rbc/BLT or VecTcl vectors might use even less memory

    Dave B

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Manuel Collado@21:1/5 to All on Sat Mar 12 19:27:34 2022
    El 12/03/2022 a las 17:09, Cecil Westerhof escribi�:
    I have a program where I simulate throwing 2 dices several times a
    number of times.
    In the code below nrOfTries is 25 million and nrOfRolls is 7. So I
    simulate rolling 2 dices seven times and registering the value of each
    roll 25 million times. The code is:
    set diceRollDictSeq [dict create]
    for {set j 0} {$j < ${nrOfTries}} {incr j} {
    set diceRoll {}
    for {set i 0} {$i < ${nrOfRolls}} {incr i} {
    lappend diceRoll [expr {[::math::random 1 7] + [::math::random 1 7]}]
    }
    dict incr diceRollDictSeq ${diceRoll}
    }

    This takes almost 9 GB of RAM. Is there a way to reduce the amount of
    RAM significantly by using another data-structure?


    Each roll of 2 dices gives a value 1-12. You can consider each value as
    an hexadecimal digit and pack the result of 7 rolls as a single 7 digit hexadecimal value. This would reduce the memory amount by a factor of 7.
    At the cost of unpacking and retrieving the individual values when needed.

    Just a naive idea.
    --
    Manuel Collado - http://mcollado.z15.es

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Cecil Westerhof@21:1/5 to Manuel Collado on Sat Mar 12 23:42:21 2022
    Manuel Collado <[email protected]> writes:

    El 12/03/2022 a las 17:09, Cecil Westerhof escribió:
    I have a program where I simulate throwing 2 dices several times a
    number of times.
    In the code below nrOfTries is 25 million and nrOfRolls is 7. So I
    simulate rolling 2 dices seven times and registering the value of each
    roll 25 million times. The code is:
    set diceRollDictSeq [dict create]
    for {set j 0} {$j < ${nrOfTries}} {incr j} {
    set diceRoll {}
    for {set i 0} {$i < ${nrOfRolls}} {incr i} {
    lappend diceRoll [expr {[::math::random 1 7] + [::math::random 1 7]}]
    }
    dict incr diceRollDictSeq ${diceRoll}
    }

    This takes almost 9 GB of RAM. Is there a way to reduce the amount of
    RAM significantly by using another data-structure?


    Each roll of 2 dices gives a value 1-12. You can consider each value as
    an hexadecimal digit and pack the result of 7 rolls as a single 7 digit hexadecimal value. This would reduce the memory amount by a factor of 7.
    At the cost of unpacking and retrieving the individual values when
    needed.

    My info was not correct. :'-( After the rolls 8 GB was used. I then
    use the data structure and the memory usage grew to 9.3 GB.

    I rewrote it to:
    set diceRollDictSeq [dict create]
    for {set j 0} {$j < ${nrOfTries}} {incr j} {
    set diceRoll {}
    for {set i 0} {$i < ${nrOfRolls}} {incr i} {
    set roll [expr {[::math::random 1 7] + [::math::random 1 7]}]
    switch $roll {
    10 { set roll A }
    11 { set roll B }
    12 { set roll C }
    }
    append diceRoll $roll
    }
    dict incr diceRollDictSeq ${diceRoll}
    }

    Now the memory used after creating the data-structure is 2.75 GB. So
    about a third (not a seventh). The needed time increases with a
    quarter. That is acceptable I think.
    When using the data-structure the memory usage becomes 4 GB. So the
    increase is about the same. I would have expected it to also be a
    third of what it was before.

    The data-structure is used like this:
    proc displayBest {diceRollDict} {
    puts "Element count: [dict size ${diceRollDict}] distinct values"
    set diceRollList {}
    dict for {diceRoll count} ${diceRollDict} {
    lappend diceRollList [list ${diceRoll} ${count}]
    }
    set diceRollList [lsort -index 1 -integer -decreasing ${diceRollList}]
    set first [lindex [lindex ${diceRollList} 0] 1]
    set second [lindex [lindex ${diceRollList} 1] 1]
    set percFirst [expr {100. * ${first} / ${::nrOfTries} }]
    set percGreater [expr {100. * ${first} / ${second} - 100}]
    puts "First has ${percFirst}% chance"
    puts "Chance first ${percGreater}% greater as second"
    foreach diceRollElement [lrange ${diceRollList} 0 ${::nrOfResults}] {
    set diceRoll [lindex ${diceRollElement} 0]
    set count [lindex ${diceRollElement} 1]
    puts "[split ${diceRoll} ""]: ${count}"
    }
    }

    Why is the memory usage a third instead of a seventh?
    Why does the usage of the smaller data-structure uses the same amount
    of extra memory as the greater data-structure?

    --
    Cecil Westerhof
    Senior Software Engineer
    LinkedIn: http://www.linkedin.com/in/cecilwesterhof

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to All on Sun Mar 13 05:29:45 2022
    Take a look at the Tcl_NewObj man page (and any Tcl book that explains the C interface)


    El 12/03/2022 a las 17:09, Cecil Westerhof escribió:
    I have a program where I simulate throwing 2 dices several times a
    number of times.
    In the code below nrOfTries is 25 million and nrOfRolls is 7. So I
    simulate rolling 2 dices seven times and registering the value of each
    roll 25 million times. The code is:
    set diceRollDictSeq [dict create]
    for {set j 0} {$j < ${nrOfTries}} {incr j} {
    set diceRoll {}
    for {set i 0} {$i < ${nrOfRolls}} {incr i} {
    lappend diceRoll [expr {[::math::random 1 7] + [::math::random 1 7]}]
    }
    dict incr diceRollDictSeq ${diceRoll}
    }

    This takes almost 9 GB of RAM. Is there a way to reduce the amount of
    RAM significantly by using another data-structure?


    Each roll of 2 dices gives a value 1-12. You can consider each value as
    an hexadecimal digit and pack the result of 7 rolls as a single 7 digit
    hexadecimal value. This would reduce the memory amount by a factor of 7.
    At the cost of unpacking and retrieving the individual values when
    needed.

    My info was not correct. :'-( After the rolls 8 GB was used. I then
    use the data structure and the memory usage grew to 9.3 GB.

    I rewrote it to:
    set diceRollDictSeq [dict create]
    for {set j 0} {$j < ${nrOfTries}} {incr j} {
    set diceRoll {}

    consider using
    set diceRoll 0

    for {set i 0} {$i < ${nrOfRolls}} {incr i} {
    set roll [expr {[::math::random 1 7] + [::math::random 1 7]}]
    switch $roll {
    10 { set roll A }
    11 { set roll B }
    12 { set roll C }
    }
    append diceRoll $roll

    consider using:
    set diceRoll [expr {$diceRoll<<4 + $roll}]

    the dict key will now be an integer (saved directly in the dict key's Tcl_Obj)

    }
    dict incr diceRollDictSeq ${diceRoll}
    }

    Now the memory used after creating the data-structure is 2.75 GB. So
    about a third (not a seventh). The needed time increases with a
    quarter. That is acceptable I think.
    When using the data-structure the memory usage becomes 4 GB. So the
    increase is about the same. I would have expected it to also be a
    third of what it was before.

    The data-structure is used like this:
    proc displayBest {diceRollDict} {
    puts "Element count: [dict size ${diceRollDict}] distinct values"
    set diceRollList {}
    dict for {diceRoll count} ${diceRollDict} {
    lappend diceRollList [list ${diceRoll} ${count}]
    }
    set diceRollList [lsort -index 1 -integer -decreasing ${diceRollList}]

    The dict can appear as a list of keys and values.
    consider replacing everything after puts "Element..." above with:

    set diceRollList [lsort -stride 2 -index 1 -integer -decreasing $diceRollDict]


    set first [lindex [lindex ${diceRollList} 0] 1]

    set first [lindex $diceRollList 1]

    set second [lindex [lindex ${diceRollList} 1] 1]

    set second [lindex $diceRollList 3]

    set percFirst [expr {100. * ${first} / ${::nrOfTries} }]
    set percGreater [expr {100. * ${first} / ${second} - 100}]
    puts "First has ${percFirst}% chance"
    puts "Chance first ${percGreater}% greater as second"
    foreach diceRollElement [lrange ${diceRollList} 0 ${::nrOfResults}] {
    set diceRoll [lindex ${diceRollElement} 0]
    set count [lindex ${diceRollElement} 1]

    if the "lsort -stride 2" is used above, the foreach and lindexes become:
    foreach {diceRoll count} $diceRollList {

    puts "[split ${diceRoll} ""]: ${count}"
    }
    }

    Why is the memory usage a third instead of a seventh?

    Each entry in a dict has two Tcl_Obj structures (key and value)
    The value is a number, which is stashed within the Tcl_Obj

    The key in the first method was a list of 7 numbers using 8 Tcl_Obj (one for list, 7 for list entries which are numbers, plus memory allocated for an array of pointers to the list entries

    The key in the second method is a string, which requires a Tcl_Obj and memory allocated for a 7 char string.

    If the key is saved as an actual number instead of a string, the number will be stored directly in the key"s Tcl_Obj.

    each dict entry (key and value) needs:
    list of 7 numbers: 9 Tcl_Obj + memory allocated for pointer array, = 10 bits of memory
    hex string: 2 Tcl_Obj + memory allocated for 7 char string = 3 bits of memory key is a number: 2 Tcl_Obj = 2 bits of memory

    Looking at the Tcl_NewObj man page, a Tcl_Obj might take about 48 bytes on a 64 bit system (just my guess, I did not actually measure the size with sizeof). The point is the Tcl_Obj is larger than the values (strings and numbers), so their memory can
    dominate the total memory use. I'm not sure how much memory Tcl allocates for list poiner arrays and strings. Perhaps the minimum allocation size is the size of a Tcl_Obj (this could reduce memory fragmentation?).If so this hints at the 3:1 reduction
    instead of 7:1, and indicates you could get 4:1 or 5:1 using numeric keys (compared to the list key).

    None of this includes the overhead for the dict hash table (which is at least one pointer per entry).

    Why does the usage of the smaller data-structure uses the same amount
    of extra memory as the greater data-structure?

    perhaps this is the dict hash table overhead?


    Dave B

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From heinrichmartin@21:1/5 to Cecil Westerhof on Sun Mar 13 03:06:53 2022
    On Saturday, March 12, 2022 at 5:14:07 PM UTC+1, Cecil Westerhof wrote:
    I have a program where I simulate throwing 2 dices several times a
    number of times.
    In the code below nrOfTries is 25 million and nrOfRolls is 7. So I
    simulate rolling 2 dices seven times and registering the value of each
    roll 25 million times. The code is:
    set diceRollDictSeq [dict create]
    for {set j 0} {$j < ${nrOfTries}} {incr j} {
    set diceRoll {}
    for {set i 0} {$i < ${nrOfRolls}} {incr i} {
    lappend diceRoll [expr {[::math::random 1 7] + [::math::random 1 7]}]
    }
    dict incr diceRollDictSeq ${diceRoll}
    }

    This takes almost 9 GB of RAM. Is there a way to reduce the amount of
    RAM significantly by using another data-structure?

    Just an incomplete thought: do you really need all of them in memory? What will you be doing with the data?
    Maybe you want to collect aggregates only. Or you could produce data to disk (file, pipe, db, ...) and post-process it from there.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Cecil Westerhof@21:1/5 to heinrichmartin on Sun Mar 13 13:28:39 2022
    heinrichmartin <[email protected]> writes:

    On Saturday, March 12, 2022 at 5:14:07 PM UTC+1, Cecil Westerhof wrote:
    I have a program where I simulate throwing 2 dices several times a
    number of times.
    In the code below nrOfTries is 25 million and nrOfRolls is 7. So I
    simulate rolling 2 dices seven times and registering the value of each
    roll 25 million times. The code is:
    set diceRollDictSeq [dict create]
    for {set j 0} {$j < ${nrOfTries}} {incr j} {
    set diceRoll {}
    for {set i 0} {$i < ${nrOfRolls}} {incr i} {
    lappend diceRoll [expr {[::math::random 1 7] + [::math::random 1 7]}]
    }
    dict incr diceRollDictSeq ${diceRoll}
    }

    This takes almost 9 GB of RAM. Is there a way to reduce the amount of
    RAM significantly by using another data-structure?

    Just an incomplete thought: do you really need all of them in memory?
    What will you be doing with the data?
    Maybe you want to collect aggregates only. Or you could produce data to
    disk (file, pipe, db, ...) and post-process it from there.

    I do not know what the next throw will become, so that is a bit
    difficult I am afraid.
    I would like to go to 10 sets of throws. But at the moment I cannot
    go beyond 7. With a better data-structure I could go (I hope) to 8.
    For 9 and 10 I have to do something like you suggest, but that will
    make the performance go down.

    --
    Cecil Westerhof
    Senior Software Engineer
    LinkedIn: http://www.linkedin.com/in/cecilwesterhof

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to Cecil Westerhof on Mon Mar 14 00:18:25 2022
    On 3/12/22 11:09 AM, Cecil Westerhof wrote:
    I have a program where I simulate throwing 2 dices several times a
    number of times.
    In the code below nrOfTries is 25 million and nrOfRolls is 7. So I
    simulate rolling 2 dices seven times and registering the value of each
    roll 25 million times. The code is:
    set diceRollDictSeq [dict create]
    for {set j 0} {$j < ${nrOfTries}} {incr j} {
    set diceRoll {}
    for {set i 0} {$i < ${nrOfRolls}} {incr i} {
    lappend diceRoll [expr {[::math::random 1 7] + [::math::random 1 7]}]
    }
    dict incr diceRollDictSeq ${diceRoll}
    }

    This takes almost 9 GB of RAM. Is there a way to reduce the amount of
    RAM significantly by using another data-structure?


    Hello,

    Interesting problem. My recommendation would be to use an array instead
    of a dict which will save you in both in time and space. Here is a
    slightly modified version of your code using arrays:


    array set diceRollDictSeq {}
    for {set j 0} {$j < ${nrOfTries}} {incr j} {
    set temp {}
    for {set i 0} {$i < ${nrOfRolls}} {incr i} {
    ## you can map 10/11/12 to A/B/C here or later when accessing the values
    lappend temp [expr {[::math::random 1 7] + [::math::random 1 7]}]
    }
    incr diceRollDictSeq([join $temp ,])
    }


    I wasn't sure what to expect so I ran it for 10 million and 7
    (inner/outer loop limits). On a Windows laptop, this ran for 6 minutes
    and consumed 950MB of memory (while I was busy browsing). It resulted
    in 5,346,039 unique rolls which is about a quarter of the total solution
    space.

    You can generate the same output report as your other post indicates
    with minor changes.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bezoar@21:1/5 to [email protected] on Thu Mar 17 18:12:31 2022
    On Sunday, March 13, 2022 at 11:18:33 PM UTC-5, [email protected] wrote:
    On 3/12/22 11:09 AM, Cecil Westerhof wrote:
    I have a program where I simulate throwing 2 dices several times a
    number of times.
    In the code below nrOfTries is 25 million and nrOfRolls is 7. So I simulate rolling 2 dices seven times and registering the value of each roll 25 million times. The code is:
    set diceRollDictSeq [dict create]
    for {set j 0} {$j < ${nrOfTries}} {incr j} {
    set diceRoll {}
    for {set i 0} {$i < ${nrOfRolls}} {incr i} {
    lappend diceRoll [expr {[::math::random 1 7] + [::math::random 1 7]}]
    }
    dict incr diceRollDictSeq ${diceRoll}
    }

    This takes almost 9 GB of RAM. Is there a way to reduce the amount of
    RAM significantly by using another data-structure?

    Hello,

    Interesting problem. My recommendation would be to use an array instead
    of a dict which will save you in both in time and space. Here is a
    slightly modified version of your code using arrays:


    array set diceRollDictSeq {}
    for {set j 0} {$j < ${nrOfTries}} {incr j} {
    set temp {}
    for {set i 0} {$i < ${nrOfRolls}} {incr i} {
    ## you can map 10/11/12 to A/B/C here or later when accessing the values lappend temp [expr {[::math::random 1 7] + [::math::random 1 7]}]
    }
    incr diceRollDictSeq([join $temp ,])
    }


    I wasn't sure what to expect so I ran it for 10 million and 7
    (inner/outer loop limits). On a Windows laptop, this ran for 6 minutes
    and consumed 950MB of memory (while I was busy browsing). It resulted
    in 5,346,039 unique rolls which is about a quarter of the total solution space.

    You can generate the same output report as your other post indicates
    with minor changes.
    You can use a bitfield to hold the values. A bitfield basically holds a 1 or 0 at an index. So the string "001001" as an example would have a value at 2 and 5. So just using a string you can get ( assuming one byte per character and storing a
    bit string for each value and each attempt , in your case : 2-12 or 10 possible dice values and 7 attempts we get 70 bit strings ) this gives us 25M x 70 or 1750M or 1.75 Gig . But wait we know that a byte is made up of 1's and 0's so each one byte
    character can hold 8 values. So using bitwise operators we can cut the memory usage by 8 times or 1750 / 8 = 218.75M! which is significantly smaller than 9 G. But wait! we can also "run length encode" the string where any length of ones and zeros can
    be compressed to a number and a value. For example, a string of all 1's 32 long can be compressed to "32 1" so we can do the same and depending on your data can reduce the memory significantly. In the implementation below ( I have objectified what I
    borrowed from the tcl wiki ) we don't implement the run length encoding ; we leave that to the student as they say. I made timings for the implmentation below .

    tries attempts memory real
    -------- -------- -------- --------
    1000 7 7k 0.078s
    10000 7 70k 0.382s
    100000 7 700k 3.770s
    1000000 7 7M 79.58s

    I would expect that for a 10 times increase in tries we'd get a 10 times increase in time spent but as we see after 100000 the time needed to seek to the correct spot causes significant slow down. To speed it up one may want to split up the string into
    several smaller strings especially when you pre-allocate then its just math to get the correct index into the correct string to update the value.

    oo::class create Bitfield {
    # get and set taken from https://wiki.tcl-lang.org/page/bitstrings
    variable data
    variable maxposition
    variable lastfree

    constructor { { numbits -1 } } {
    set data ""
    set maxposition $numbits
    set lastfree 0
    if { $numbits >= 0 } {
    # prealloc space
    set numbytes [ expr { $numbits / 8 + (( $numbits % 8 ) ? 1 : 0 ) } ]
    set data [ string repeat "\0" $numbytes ]
    set maxposition [ expr $numbytes * 8 ]
    }
    }
    method alloc { } {
    set pos $lastfree
    try {
    while { [ my get $pos ] == 1 && ( $maxposition == -1 || $pos < $maxposition ) } {
    incr pos
    }
    } trap BAD_INDEX { a } {
    throw CAPACITY_ERROR " unable to allocate any more blocks!"
    }

    set lastfree [ expr $pos + 1 ]
    while { [ my get $lastfree ] && $lastfree != $maxposition } { incr lastfree }
    my set $pos 1
    #puts "allocated block $pos"
    return $pos
    }

    method free { pos } {
    my set $pos 0
    set lastfree $pos
    }

    method set { position bit } {
    #puts "Bitfield: set $position -> $bit "
    if { (! [string is integer $position]) || ($position < 0) } {
    throw BAD_INDEX "position must be an integer >= 0"
    }
    if { ($bit ne "0") && ($bit ne "1") } {
    throw BAD_VALUE "bit must be '0' or '1'"
    }
    set block [ expr { $position / 8 } ]
    set bitnum [ expr { $position % 8 } ]
    set byteval "00000000"
    binary scan $data @${block}b8 byteval

    # Compute new string of zeros and ones
    set newbyte [string range $byteval 0 [expr $bitnum-1]]$bit[string range $byteval [expr $bitnum+1] end]

    # Convert back into binary format
    set data [binary format a*@${block}b8 $data $newbyte]
    }

    method toggle { position } {
    set value [ my get $position ]
    my set $position [expr { ( $value + 1 ) % 2 } ]
    }
    method get { position } {

    if { (! [string is integer $position]) || ($position < 0) } {
    throw BAD_INDEX error "position must be an integer >= 0"
    }
    if { $maxposition >= 0 && $position >= $maxposition } {
    throw BAD_INDEX "position must be an integer < $maxposition"
    }
    # Compute byte and bit offsets
    set block [ expr { $position / 8 } ]
    set bitnum [expr $position % 8]

    # Scan the btye from the string
    # Default value is used if scan fails
    set byteval "00000000"
    binary scan $data @${block}b8 byteval
    # return the interesting bit
    set value [string range $byteval $bitnum $bitnum]
    return $value
    }
    method setData { buffer } {
    set data $buffer
    }
    method getData { } {
    return $data
    }
    method length { } {
    return [string length $data];
    }
    method capacity { } {
    return [expr { [string length $data ] * 8 } ]
    }
    method toString { } {
    set retval "[format "%10i " [my offset]]"
    foreach c [split $data "" ] {
    binary scan $c b8 x
    append retval "$x"
    }
    return $retval
    }
    method offset {} {
    return [ expr {[my length] + 11 }]
    }
    destructor {}
    }
    package provide Bitfield 1.0

    if { 0 } {
    set b [ Bitfield new ]
    $b set 4 1
    $b set 5 1
    $b toggle 6
    $b toggle 4
    $b set 25 1
    for { set i 0 } { $i < 30 } { incr i } {
    puts "$i . '[$b get $i ]'"
    }
    puts "length : [$b length ]"
    puts "capacity : [$b capacity ]"
    }
    # your code starts here
    package require math ; # from tcllib
    set nrOfTries 10000

    set nrOfRolls 7
    array set diceRoll {}
    for {set j 0} {$j < $nrOfRolls} {incr j} {
    for {set k 2 } { $k < 13 } { incr k } {
    set diceRoll($j,$k) [Bitfield new ${nrOfTries} ]
    }
    }
    proc rolls { trynum numRolls } {
    for {set i 0} {$i < $numRolls} {incr i} {
    set x [::math::random 1 $numRolls]
    incr x [::math::random 1 $numRolls]
    $::diceRoll($i,$x) set $trynum 1
    }
    }
    for {set j 0 } {$j < $::nrOfTries} {incr j} {
    rolls $j $::nrOfRolls
    if { [ expr { $j % 10000 } ] == 0 } {
    puts "$j"
    }
    }
    proc getRolls { tryNum } {
    set val {}
    for {set j 0} {$j < $::nrOfRolls } {incr j} {
    for {set k 2 } { $k < 13 } { incr k } {
    if { [$::diceRoll($j,$k) get $tryNum ] eq "1" } {
    lappend val $k
    }
    }
    }
    return $val
    }
    proc numTimes { value attempt } {
    lassign [$::diceRoll($attempt,$value) toString] capacityLen bitStr
    return [string length [string map { "0" "" } $bitStr ]]
    }
    puts "What were the rolls for 2500th try?"
    puts "[join [getRolls 2499] \n]"
    puts "How many times did 10 get rolled on the each attempt in $::nrOfTries tries?"
    puts " Attempt Num Times "
    puts " ======== =========="
    for { set m 0 } { $m < 7 } { incr m } {
    puts [format "% 7i % 5i" [expr { $m + 1 } ] [numTimes 10 $m ]]
    }

    - Carl

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bezoar@21:1/5 to Bezoar on Thu Mar 17 18:16:12 2022
    On Thursday, March 17, 2022 at 8:12:34 PM UTC-5, Bezoar wrote:
    On Sunday, March 13, 2022 at 11:18:33 PM UTC-5, [email protected] wrote:
    On 3/12/22 11:09 AM, Cecil Westerhof wrote:
    I have a program where I simulate throwing 2 dices several times a number of times.
    In the code below nrOfTries is 25 million and nrOfRolls is 7. So I simulate rolling 2 dices seven times and registering the value of each roll 25 million times. The code is:
    set diceRollDictSeq [dict create]
    for {set j 0} {$j < ${nrOfTries}} {incr j} {
    set diceRoll {}
    for {set i 0} {$i < ${nrOfRolls}} {incr i} {
    lappend diceRoll [expr {[::math::random 1 7] + [::math::random 1 7]}]
    }
    dict incr diceRollDictSeq ${diceRoll}
    }

    This takes almost 9 GB of RAM. Is there a way to reduce the amount of RAM significantly by using another data-structure?

    Hello,

    Interesting problem. My recommendation would be to use an array instead
    of a dict which will save you in both in time and space. Here is a slightly modified version of your code using arrays:


    array set diceRollDictSeq {}
    for {set j 0} {$j < ${nrOfTries}} {incr j} {
    set temp {}
    for {set i 0} {$i < ${nrOfRolls}} {incr i} {
    ## you can map 10/11/12 to A/B/C here or later when accessing the values lappend temp [expr {[::math::random 1 7] + [::math::random 1 7]}]
    }
    incr diceRollDictSeq([join $temp ,])
    }


    I wasn't sure what to expect so I ran it for 10 million and 7
    (inner/outer loop limits). On a Windows laptop, this ran for 6 minutes
    and consumed 950MB of memory (while I was busy browsing). It resulted
    in 5,346,039 unique rolls which is about a quarter of the total solution space.

    You can generate the same output report as your other post indicates
    with minor changes.
    You can use a bitfield to hold the values. A bitfield basically holds a 1 or 0 at an index. So the string "001001" as an example would have a value at 2 and 5. So just using a string you can get ( assuming one byte per character and storing a bit
    string for each value and each attempt , in your case : 2-12 or 10 possible dice values and 7 attempts we get 70 bit strings ) this gives us 25M x 70 or 1750M or 1.75 Gig . But wait we know that a byte is made up of 1's and 0's so each one byte character
    can hold 8 values. So using bitwise operators we can cut the memory usage by 8 times or 1750 / 8 = 218.75M! which is significantly smaller than 9 G. But wait! we can also "run length encode" the string where any length of ones and zeros can be compressed
    to a number and a value. For example, a string of all 1's 32 long can be compressed to "32 1" so we can do the same and depending on your data can reduce the memory significantly. In the implementation below ( I have objectified what I borrowed from the
    tcl wiki ) we don't implement the run length encoding ; we leave that to the student as they say. I made timings for the implmentation below .

    tries attempts memory real
    -------- -------- -------- --------
    1000 7 7k 0.078s
    10000 7 70k 0.382s
    100000 7 700k 3.770s
    1000000 7 7M 79.58s

    I would expect that for a 10 times increase in tries we'd get a 10 times increase in time spent but as we see after 100000 the time needed to seek to the correct spot causes significant slow down. To speed it up one may want to split up the string into
    several smaller strings especially when you pre-allocate then its just math to get the correct index into the correct string to update the value.

    oo::class create Bitfield {
    # get and set taken from https://wiki.tcl-lang.org/page/bitstrings
    variable data
    variable maxposition
    variable lastfree

    constructor { { numbits -1 } } {
    set data ""
    set maxposition $numbits
    set lastfree 0
    if { $numbits >= 0 } {
    # prealloc space
    set numbytes [ expr { $numbits / 8 + (( $numbits % 8 ) ? 1 : 0 ) } ]
    set data [ string repeat "\0" $numbytes ]
    set maxposition [ expr $numbytes * 8 ]
    }
    }
    method alloc { } {
    set pos $lastfree
    try {
    while { [ my get $pos ] == 1 && ( $maxposition == -1 || $pos < $maxposition ) } {
    incr pos
    }
    } trap BAD_INDEX { a } {
    throw CAPACITY_ERROR " unable to allocate any more blocks!"
    }

    set lastfree [ expr $pos + 1 ]
    while { [ my get $lastfree ] && $lastfree != $maxposition } { incr lastfree }
    my set $pos 1
    #puts "allocated block $pos"
    return $pos
    }

    method free { pos } {
    my set $pos 0
    set lastfree $pos
    }

    method set { position bit } {
    #puts "Bitfield: set $position -> $bit "
    if { (! [string is integer $position]) || ($position < 0) } {
    throw BAD_INDEX "position must be an integer >= 0"
    }
    if { ($bit ne "0") && ($bit ne "1") } {
    throw BAD_VALUE "bit must be '0' or '1'"
    }
    set block [ expr { $position / 8 } ]
    set bitnum [ expr { $position % 8 } ]
    set byteval "00000000"
    binary scan $data @${block}b8 byteval

    # Compute new string of zeros and ones
    set newbyte [string range $byteval 0 [expr $bitnum-1]]$bit[string range $byteval [expr $bitnum+1] end]

    # Convert back into binary format
    set data [binary format a*@${block}b8 $data $newbyte]
    }

    method toggle { position } {
    set value [ my get $position ]
    my set $position [expr { ( $value + 1 ) % 2 } ]
    }
    method get { position } {

    if { (! [string is integer $position]) || ($position < 0) } {
    throw BAD_INDEX error "position must be an integer >= 0"
    }
    if { $maxposition >= 0 && $position >= $maxposition } {
    throw BAD_INDEX "position must be an integer < $maxposition"
    }
    # Compute byte and bit offsets
    set block [ expr { $position / 8 } ]
    set bitnum [expr $position % 8]

    # Scan the btye from the string
    # Default value is used if scan fails
    set byteval "00000000"
    binary scan $data @${block}b8 byteval
    # return the interesting bit
    set value [string range $byteval $bitnum $bitnum]
    return $value
    }
    method setData { buffer } {
    set data $buffer
    }
    method getData { } {
    return $data
    }
    method length { } {
    return [string length $data];
    }
    method capacity { } {
    return [expr { [string length $data ] * 8 } ]
    }
    method toString { } {
    set retval "[format "%10i " [my offset]]"
    foreach c [split $data "" ] {
    binary scan $c b8 x
    append retval "$x"
    }
    return $retval
    }
    method offset {} {
    return [ expr {[my length] + 11 }]
    }
    destructor {}
    }
    package provide Bitfield 1.0

    if { 0 } {
    set b [ Bitfield new ]
    $b set 4 1
    $b set 5 1
    $b toggle 6
    $b toggle 4
    $b set 25 1
    for { set i 0 } { $i < 30 } { incr i } {
    puts "$i . '[$b get $i ]'"
    }
    puts "length : [$b length ]"
    puts "capacity : [$b capacity ]"
    }
    # your code starts here
    package require math ; # from tcllib
    set nrOfTries 10000

    set nrOfRolls 7
    array set diceRoll {}
    for {set j 0} {$j < $nrOfRolls} {incr j} {
    for {set k 2 } { $k < 13 } { incr k } {
    set diceRoll($j,$k) [Bitfield new ${nrOfTries} ]
    }
    }
    proc rolls { trynum numRolls } {
    for {set i 0} {$i < $numRolls} {incr i} {
    set x [::math::random 1 $numRolls]
    incr x [::math::random 1 $numRolls]
    $::diceRoll($i,$x) set $trynum 1
    }
    }
    for {set j 0 } {$j < $::nrOfTries} {incr j} {
    rolls $j $::nrOfRolls
    if { [ expr { $j % 10000 } ] == 0 } {
    puts "$j"
    }
    }
    proc getRolls { tryNum } {
    set val {}
    for {set j 0} {$j < $::nrOfRolls } {incr j} {
    for {set k 2 } { $k < 13 } { incr k } {
    if { [$::diceRoll($j,$k) get $tryNum ] eq "1" } {
    lappend val $k
    }
    }
    }
    return $val
    }
    proc numTimes { value attempt } {
    lassign [$::diceRoll($attempt,$value) toString] capacityLen bitStr
    return [string length [string map { "0" "" } $bitStr ]]
    }
    puts "What were the rolls for 2500th try?"
    puts "[join [getRolls 2499] \n]"
    puts "How many times did 10 get rolled on the each attempt in $::nrOfTries tries?"
    puts " Attempt Num Times "
    puts " ======== =========="
    for { set m 0 } { $m < 7 } { incr m } {
    puts [format "% 7i % 5i" [expr { $m + 1 } ] [numTimes 10 $m ]]
    }

    - Carl

    Forgot the output from the program:
    What were the rolls for 25 time?
    4
    5
    10
    6
    8
    11
    2
    How many times did 10 get rolled on the each attempt in 10000 tries?
    Attempt Num Times
    ======== ==========
    1 866
    2 858
    3 828
    4 784
    5 873
    6 862
    7 798

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)