• speed improve wanted

    From [email protected]@21:1/5 to All on Fri Apr 21 01:11:26 2023
    Hi all

    Could some one give me some hit or direction to improve the speed?

    same code by different language in same machine....
    the program find the minimum target 1 appear times in 100 row by 3125 column

    ex matrix: 0 1 0 0 1 0 1 0 1...
    1 0 1 0 0 0 0 1 1...
    1 1 1 0 0 1 1 0 1..
    ....
    ...

    the speed gap almost 4 times


    tcl(8.6.12) :
    total_time: 266766 microseconds 266ms 0.266766s
    Average execution time per iteration: 85 microseconds

    python(3.9):
    total_time: 59829800 nanoseconds 59.8298ms 0.0598298s
    Average execution time per iteration: 19145.536 nanoseconds



    ###tcl code ###

    set num_columns 3125
    set num_rows 100
    set num_hits 25

    # Generate random matrix
    for {set i 0} {$i < $num_rows} {incr i} {
    for {set j 0} {$j < $num_columns} {incr j} {
    set matrix([expr {$i + 1}],$j) [expr {int(rand()*2)}]
    }
    }

    # Find shortest sequence of 1's with given number of hits
    set shortest_seq ""
    set shortest_len $num_rows
    set shortest_pos {}
    set total_time 0

    for {set j 0} {$j < $num_columns} {incr j} {
    set hits 0
    set seq ""
    set pos {}
    set start_time [clock microseconds]
    for {set i 0} {$i < $num_rows} {incr i} {
    if {$matrix([expr {$i + 1}],$j) == 1} {
    incr hits
    append seq "1"
    lappend pos $i
    } else {
    append seq "0"
    }
    if {$hits >= $num_hits} {
    set seq_len [string length $seq]
    if {$seq_len < $shortest_len} {
    set shortest_len $seq_len
    set shortest_seq $seq
    set shortest_pos $pos
    }
    break
    }
    }
    set end_time [clock microseconds]
    set iteration_time [expr {$end_time - $start_time}]
    set total_time [expr {$total_time + $iteration_time}]
    }
    set avg_time [expr {$total_time / $num_columns}]
    puts "total_time: $total_time microseconds [expr $total_time/1000]ms [expr $total_time/1000000.0]s \nAverage execution time per iteration: $avg_time microseconds"

    ####python code ###
    import random
    import time

    num_iterations = 1000
    num_columns = 3125
    num_rows = 100
    num_hits = 25

    # Generate random matrix
    matrix = []
    for i in range(num_rows):
    row = []
    for j in range(num_columns):
    row.append(random.randint(0, 1))
    matrix.append(row)

    # Find shortest sequence of 1's with given number of hits
    shortest_seq = ""
    shortest_len = num_rows
    shortest_pos = []
    total_time = 0

    for j in range(num_columns):
    hits = 0
    seq = ""
    pos = []
    start_time = time.perf_counter_ns()
    for i in range(num_rows):
    if matrix[i+1][j] == 1:
    hits += 1
    seq += "1"
    pos.append(i)
    else:
    seq += "0"
    if hits >= num_hits:
    seq_len = len(seq)
    if seq_len < shortest_len:
    shortest_len = seq_len
    shortest_seq = seq
    shortest_pos = pos
    break
    end_time = time.perf_counter_ns()
    iteration_time = end_time - start_time
    total_time += iteration_time

    avg_time = total_time / num_columns
    print(f"total_time: {total_time} nanoseconds {total_time/1000000.0}ms {total_time/1000000000.0}s")
    print(f"Average execution time per iteration: {avg_time} nanoseconds")


    BR
    Rolance

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ralf Fassel@21:1/5 to All on Fri Apr 21 10:52:27 2023
    * "[email protected]" <[email protected]>
    | Could some one give me some hit or direction to improve the speed?

    Putting the TCL code in a proc so it gets byte-compiled would be my
    first attempt on speedup.

    R'

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Cecil Westerhof@21:1/5 to Ralf Fassel on Fri Apr 21 11:29:39 2023
    Ralf Fassel <[email protected]> writes:

    * "[email protected]" <[email protected]>
    | Could some one give me some hit or direction to improve the speed?

    Putting the TCL code in a proc so it gets byte-compiled would be my
    first attempt on speedup.

    I did not know that. Good to know.

    --
    Cecil Westerhof
    Senior Software Engineer
    LinkedIn: http://www.linkedin.com/in/cecilwesterhof

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to All on Fri Apr 21 04:13:59 2023
    Cecil Westerhof 在 2023年4月21日 星期五晚上9:44:06 [UTC+12] 的信中寫道:
    Ralf Fassel <[email protected]> writes:

    * "[email protected]" <[email protected]>
    | Could some one give me some hit or direction to improve the speed?

    Putting the TCL code in a proc so it gets byte-compiled would be my
    first attempt on speedup.
    I did not know that. Good to know.

    --
    Cecil Westerhof
    Senior Software Engineer
    LinkedIn: http://www.linkedin.com/in/cecilwesterhof

    Hi Ralf
    thanks for your advice , in proc the consume time
    163413 microseconds per iteration reduce to half ,
    due to the proc is main analysis program key code , need exetcute x3125x3125x100 times in target period
    have some extension or method to speed up without change to other lang?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to All on Fri Apr 21 05:11:24 2023
    Harald Oehlmann 在 2023年4月21日 星期五晚上11:18:37 [UTC+12] 的信中寫道:
    Am 21.04.2023 um 10:11 schrieb [email protected]:
    Hi all

    Could some one give me some hit or direction to improve the speed?

    same code by different language in same machine....
    the program find the minimum target 1 appear times in 100 row by 3125 column

    ex matrix: 0 1 0 0 1 0 1 0 1...
    1 0 1 0 0 0 0 1 1...
    1 1 1 0 0 1 1 0 1..
    ....
    ...

    the speed gap almost 4 times


    tcl(8.6.12) :
    total_time: 266766 microseconds 266ms 0.266766s
    Average execution time per iteration: 85 microseconds

    python(3.9):
    total_time: 59829800 nanoseconds 59.8298ms 0.0598298s
    Average execution time per iteration: 19145.536 nanoseconds



    ###tcl code ###

    set num_columns 3125
    set num_rows 100
    set num_hits 25

    # Generate random matrix
    for {set i 0} {$i < $num_rows} {incr i} {
    for {set j 0} {$j < $num_columns} {incr j} {
    set matrix([expr {$i + 1}],$j) [expr {int(rand()*2)}]
    }
    }

    # Find shortest sequence of 1's with given number of hits
    set shortest_seq ""
    set shortest_len $num_rows
    set shortest_pos {}
    set total_time 0

    for {set j 0} {$j < $num_columns} {incr j} {
    set hits 0
    set seq ""
    set pos {}
    set start_time [clock microseconds]
    for {set i 0} {$i < $num_rows} {incr i} {
    if {$matrix([expr {$i + 1}],$j) == 1} {
    incr hits
    append seq "1"
    lappend pos $i
    } else {
    append seq "0"
    }
    if {$hits >= $num_hits} {
    set seq_len [string length $seq]
    if {$seq_len < $shortest_len} {
    set shortest_len $seq_len
    set shortest_seq $seq
    set shortest_pos $pos
    }
    break
    }
    }
    set end_time [clock microseconds]
    set iteration_time [expr {$end_time - $start_time}]
    set total_time [expr {$total_time + $iteration_time}]
    }
    set avg_time [expr {$total_time / $num_columns}]
    puts "total_time: $total_time microseconds [expr $total_time/1000]ms [expr $total_time/1000000.0]s \nAverage execution time per iteration: $avg_time microseconds"

    ####python code ###
    import random
    import time

    num_iterations = 1000
    num_columns = 3125
    num_rows = 100
    num_hits = 25

    # Generate random matrix
    matrix = []
    for i in range(num_rows):
    row = []
    for j in range(num_columns):
    row.append(random.randint(0, 1))
    matrix.append(row)

    # Find shortest sequence of 1's with given number of hits
    shortest_seq = ""
    shortest_len = num_rows
    shortest_pos = []
    total_time = 0

    for j in range(num_columns):
    hits = 0
    seq = ""
    pos = []
    start_time = time.perf_counter_ns()
    for i in range(num_rows):
    if matrix[i+1][j] == 1:
    hits += 1
    seq += "1"
    pos.append(i)
    else:
    seq += "0"
    if hits >= num_hits:
    seq_len = len(seq)
    if seq_len < shortest_len:
    shortest_len = seq_len
    shortest_seq = seq
    shortest_pos = pos
    break
    end_time = time.perf_counter_ns()
    iteration_time = end_time - start_time
    total_time += iteration_time

    avg_time = total_time / num_columns
    print(f"total_time: {total_time} nanoseconds {total_time/1000000.0}ms {total_time/1000000000.0}s")
    print(f"Average execution time per iteration: {avg_time} nanoseconds")


    BR
    Rolance
    Rolance,
    thank you for the question. The principle difference I see between the programs is the way, the matrix is done:
    TCL: matrix([expr {$i + 1}],$j)
    Python: matrix[i+1][j]

    So, on TCL, a hash is used to access the elements. I suppose, on Python,
    you have index access.
    So, it might be advisable to use a list of lists on TCL to also have
    index access.

    But I am sure, numeric wizards will answer here and give solutions using VecTCL, which has a native matrix type.

    Harald
    Hi Harald
    thanks for your advice , will try it and report later.
    BR
    Rolance

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Harald Oehlmann@21:1/5 to All on Fri Apr 21 13:18:33 2023
    Am 21.04.2023 um 10:11 schrieb [email protected]:
    Hi all

    Could some one give me some hit or direction to improve the speed?

    same code by different language in same machine....
    the program find the minimum target 1 appear times in 100 row by 3125 column

    ex matrix: 0 1 0 0 1 0 1 0 1...
    1 0 1 0 0 0 0 1 1...
    1 1 1 0 0 1 1 0 1..
    ....
    ...

    the speed gap almost 4 times


    tcl(8.6.12) :
    total_time: 266766 microseconds 266ms 0.266766s
    Average execution time per iteration: 85 microseconds

    python(3.9):
    total_time: 59829800 nanoseconds 59.8298ms 0.0598298s
    Average execution time per iteration: 19145.536 nanoseconds



    ###tcl code ###

    set num_columns 3125
    set num_rows 100
    set num_hits 25

    # Generate random matrix
    for {set i 0} {$i < $num_rows} {incr i} {
    for {set j 0} {$j < $num_columns} {incr j} {
    set matrix([expr {$i + 1}],$j) [expr {int(rand()*2)}]
    }
    }

    # Find shortest sequence of 1's with given number of hits
    set shortest_seq ""
    set shortest_len $num_rows
    set shortest_pos {}
    set total_time 0

    for {set j 0} {$j < $num_columns} {incr j} {
    set hits 0
    set seq ""
    set pos {}
    set start_time [clock microseconds]
    for {set i 0} {$i < $num_rows} {incr i} {
    if {$matrix([expr {$i + 1}],$j) == 1} {
    incr hits
    append seq "1"
    lappend pos $i
    } else {
    append seq "0"
    }
    if {$hits >= $num_hits} {
    set seq_len [string length $seq]
    if {$seq_len < $shortest_len} {
    set shortest_len $seq_len
    set shortest_seq $seq
    set shortest_pos $pos
    }
    break
    }
    }
    set end_time [clock microseconds]
    set iteration_time [expr {$end_time - $start_time}]
    set total_time [expr {$total_time + $iteration_time}]
    }
    set avg_time [expr {$total_time / $num_columns}]
    puts "total_time: $total_time microseconds [expr $total_time/1000]ms [expr $total_time/1000000.0]s \nAverage execution time per iteration: $avg_time microseconds"

    ####python code ###
    import random
    import time

    num_iterations = 1000
    num_columns = 3125
    num_rows = 100
    num_hits = 25

    # Generate random matrix
    matrix = []
    for i in range(num_rows):
    row = []
    for j in range(num_columns):
    row.append(random.randint(0, 1))
    matrix.append(row)

    # Find shortest sequence of 1's with given number of hits
    shortest_seq = ""
    shortest_len = num_rows
    shortest_pos = []
    total_time = 0

    for j in range(num_columns):
    hits = 0
    seq = ""
    pos = []
    start_time = time.perf_counter_ns()
    for i in range(num_rows):
    if matrix[i+1][j] == 1:
    hits += 1
    seq += "1"
    pos.append(i)
    else:
    seq += "0"
    if hits >= num_hits:
    seq_len = len(seq)
    if seq_len < shortest_len:
    shortest_len = seq_len
    shortest_seq = seq
    shortest_pos = pos
    break
    end_time = time.perf_counter_ns()
    iteration_time = end_time - start_time
    total_time += iteration_time

    avg_time = total_time / num_columns
    print(f"total_time: {total_time} nanoseconds {total_time/1000000.0}ms {total_time/1000000000.0}s")
    print(f"Average execution time per iteration: {avg_time} nanoseconds")


    BR
    Rolance

    Rolance,
    thank you for the question. The principle difference I see between the
    programs is the way, the matrix is done:
    TCL: matrix([expr {$i + 1}],$j)
    Python: matrix[i+1][j]

    So, on TCL, a hash is used to access the elements. I suppose, on Python,
    you have index access.
    So, it might be advisable to use a list of lists on TCL to also have
    index access.

    But I am sure, numeric wizards will answer here and give solutions using VecTCL, which has a native matrix type.

    Harald

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ralf Fassel@21:1/5 to All on Fri Apr 21 16:22:23 2023
    * "[email protected]" <[email protected]>
    | Cecil Westerhof 在 2023年4月21日 星期五晚上9:44:06 [UTC+12] 的信中寫道:
    | > Ralf Fassel <[email protected]> writes:
    | >
    | > > * "[email protected]" <[email protected]>
    | > > | Could some one give me some hit or direction to improve the speed?
    | > >
    | > > Putting the TCL code in a proc so it gets byte-compiled would be my
    | > > first attempt on speedup.
    --<snip-snip>--
    | Hi Ralf
    | thanks for your advice , in proc the consume time
    | 163413 microseconds per iteration reduce to half ,

    On my machine, using a list-based approach as Harald pointed out (and
    inside a proc) I get comparable speeds for the TCL and python code:

    your original code
    total_time: 193136 microseconds 193ms 0.193136s
    Average execution time per iteration: 61 microseconds

    your original code inside proc
    total_time: 94736 microseconds 94ms 0.094736s
    Average execution time per iteration: 30 microseconds

    using list ("lappend rows $row" instead of set matrix([expr]) and
    "lindex $row $column" instead of $matrix([]))
    total_time: 69328 microseconds 69ms 0.069328s
    Average execution time per iteration: 22 microseconds

    python as-is (had to change 'perf_counter_ns' to 'perf_counter' since my
    python did not know perf_counter_ns):
    python3 t.py
    total_time: 0.0770318127470091 seconds [...]
    Average execution time per iteration: 2.465018007904291e-05 seconds

    HTH
    R'

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to All on Fri Apr 21 08:23:20 2023
    Ralf Fassel 在 2023年4月22日 星期六凌晨2:22:29 [UTC+12] 的信中寫道:
    * "[email protected]" <[email protected]>
    | Cecil Westerhof 在 2023年4月21日 星期五晚上9:44:06 [UTC+12] 的信中寫道:
    | > Ralf Fassel <[email protected]> writes:
    | >
    | > > * "[email protected]" <[email protected]>
    | > > | Could some one give me some hit or direction to improve the speed?
    | > >
    | > > Putting the TCL code in a proc so it gets byte-compiled would be my
    | > > first attempt on speedup.
    --<snip-snip>--
    | Hi Ralf
    | thanks for your advice , in proc the consume time
    | 163413 microseconds per iteration reduce to half ,
    On my machine, using a list-based approach as Harald pointed out (and
    inside a proc) I get comparable speeds for the TCL and python code:

    your original code
    total_time: 193136 microseconds 193ms 0.193136s
    Average execution time per iteration: 61 microseconds

    your original code inside proc
    total_time: 94736 microseconds 94ms 0.094736s
    Average execution time per iteration: 30 microseconds

    using list ("lappend rows $row" instead of set matrix([expr]) and
    "lindex $row $column" instead of $matrix([]))
    total_time: 69328 microseconds 69ms 0.069328s
    Average execution time per iteration: 22 microseconds

    python as-is (had to change 'perf_counter_ns' to 'perf_counter' since my python did not know perf_counter_ns):
    python3 t.py
    total_time: 0.0770318127470091 seconds [...]
    Average execution time per iteration: 2.465018007904291e-05 seconds

    HTH
    R'

    Hi Ralf
    thanks for your hint and test result
    change to "lappend rows $row" method can be
    total_time: 66394 microseconds per iteration

    in multi proc implement in same time , the tcl still cost twice time than python (generate same result)
    will do more analysis , which part will affect most .. data passing , file reading , array transfer etc..

    BR
    Rolance

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From saitology9@21:1/5 to [email protected] on Fri Apr 21 13:02:54 2023
    On 4/21/2023 4:11 AM, [email protected] wrote:
    Hi all

    Could some one give me some hit or direction to improve the speed?

    same code by different language in same machine....
    the program find the minimum target 1 appear times in 100 row by 3125 column


    How does this version compare?



    set num_columns 3125
    set num_rows 100
    set num_hits 25

    # Generate random matrix
    for {set i 0} {$i < $num_rows} {incr i} {
    for {set j 0} {$j < $num_columns} {incr j} {
    set matrix([expr {$i + 1}],$j) [expr {int(rand()*2)}]
    }
    }

    # Find shortest sequence of 1's with given number of hits
    set shortest_seq ""
    set shortest_len $num_rows
    set shortest_pos {}
    set total_time 0

    # put the main calculation in a proc
    proc calculate {num_rows num_columns num_hits shortest_len} {
    global matrix
    global timings

    for {set j 0} {$j < $num_columns} {incr j} {
    set hits 0
    set seq ""
    set pos {}
    for {set i 0} {$i < $num_rows} {incr i} {
    if {$matrix([expr {$i + 1}],$j) == 1} {
    incr hits
    append seq "1"
    lappend pos $i
    } else {
    append seq "0"
    }
    if {$hits >= $num_hits} {
    set seq_len [string length $seq]
    if {$seq_len < $shortest_len} {
    set shortest_len $seq_len
    set shortest_seq $seq
    set shortest_pos $pos
    }
    break
    }
    }
    # no need to do time calculations here
    lappend timings [clock microseconds]
    }
    }


    # put the timer in a proc so we can run multiple tests
    proc time_it {num_rows num_columns num_hits shortest_len} {
    global matrix
    global timings


    set timings [list [clock microseconds]]

    calculate $num_rows $num_columns $num_hits $shortest_len


    # add up the timing differentials
    set total_time [lindex $timings 0]
    for {set i 1} {$i < [llength $timings]} {incr i} {
    set delta [expr {[lindex $timings $i] - [lindex $timings [expr {$i -
    1}]]}]
    set total_time [expr {$total_time + $delta}]
    }

    # print it
    set avg_time [expr {$total_time / $num_columns}]
    puts "total_time: $total_time microseconds [expr $total_time/1000]ms
    [expr $total_time/1000000.0]s \nAverage execution time per iteration:
    $avg_time microseconds"

    }




    puts "time_it $num_rows $num_columns $num_hits $shortest_len"
    puts "RUN 1: "
    puts "================"
    time_it $num_rows $num_columns $num_hits $shortest_len
    puts "\n\n"
    puts "RUN 2:"
    puts "================"
    time_it $num_rows $num_columns $num_hits $shortest_len

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From saitology9@21:1/5 to All on Fri Apr 21 13:06:41 2023
    On 4/21/2023 1:02 PM, saitology9 wrote:
      # add up the timing differentials
      set total_time [lindex $timings 0]


    There is a typo in this line which is in time_it.
    Please change it as follows:

    set total_time 0

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Alex Bochannek@21:1/5 to [email protected] on Fri Apr 21 10:52:27 2023
    "[email protected]" <[email protected]> writes:

    Could some one give me some hit or direction to improve the speed?

    same code by different language in same machine....
    the program find the minimum target 1 appear times in 100 row by 3125 column

    ex matrix: 0 1 0 0 1 0 1 0 1...
    1 0 1 0 0 0 0 1 1...
    1 1 1 0 0 1 1 0 1..
    ....
    ...

    I don't know if this is a synthetic problem or something you are going
    to have to maintain. If it is the latter, it seems like an array
    programming language would be a better fit (hope that's OK to point out
    on c.l.tcl) I did not measure performance, which was your original
    question, but maybe this is still a useful perspective.

    In APL, finding the lowest numbers of 1s (no less than 25) in your
    random array you can do like this: 25+⌊/(+/1-⍨?3125 100⍴2)-25
    I am using Dyalog APL in these examples and the default ⎕IO of 1. I am
    sure there are better, more efficient ways.

    If you need to know the column, an easy to maintain solution is this
    (comments inline):

    ⍝ 3125 by 100 array of random 0s and 1s
    array←1-⍨?3125 100⍴2
    ⍝ Sum up columns
    ones←+/array
    ⍝ Minimum number above 25
    lowest←25+⌊/ones-25
    ⍝ Index of lowest numbers
    ⍸lowest⍷ones

    As a one-liner it could look like this:

    ⍸({⍵+⌊/A-⍵}25)⍷A←+/1-⍨?3125 100⍴2

    Again, apologies for a non-Tcl answer! I haven't written Tcl in many
    years, but continue to follow this group, because I am curious about
    progress of the Tcl 9.0 release.

    --
    Alex.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to All on Fri Apr 21 14:18:57 2023
    saitology9 在 2023年4月22日 星期六清晨5:06:47 [UTC+12] 的信中寫道:
    On 4/21/2023 1:02 PM, saitology9 wrote:
    # add up the timing differentials
    set total_time [lindex $timings 0]
    There is a typo in this line which is in time_it.
    Please change it as follows:

    set total_time 0
    Hi saitology9

    thanks for your advice
    test result :
    time_it 100 3125 25 100
    RUN 1:
    ================
    total_time: 115791 microseconds 115ms
    0.115791s
    Average execution time per iteration:
    37 microseconds


    RUN 2:
    ================
    total_time: 105553 microseconds 105ms
    0.105553s
    Average execution time per iteration:
    33 microseconds

    as now below method still be the best solution:
    change to "lappend rows $row" method can be
    total_time: 66394 microseconds per iteration

    BR
    Rolance

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to All on Fri Apr 21 16:04:38 2023
    [email protected] 在 2023年4月22日 星期六上午9:19:00 [UTC+12] 的信中寫道:
    saitology9 在 2023年4月22日 星期六清晨5:06:47 [UTC+12] 的信中寫道:
    On 4/21/2023 1:02 PM, saitology9 wrote:
    # add up the timing differentials
    set total_time [lindex $timings 0]
    There is a typo in this line which is in time_it.
    Please change it as follows:

    set total_time 0
    Hi saitology9

    thanks for your advice
    test result :
    time_it 100 3125 25 100
    RUN 1:
    ================
    total_time: 115791 microseconds 115ms
    0.115791s
    Average execution time per iteration:
    37 microseconds


    RUN 2:
    ================
    total_time: 105553 microseconds 105ms
    0.105553s
    Average execution time per iteration:
    33 microseconds

    as now below method still be the best solution:
    change to "lappend rows $row" method can be
    total_time: 66394 microseconds per iteration
    BR
    Rolance

    hi all

    already transfer to proc in two lang.

    tcl : total time 0.0968s (twice....)
    python: total time: 0.0496496

    Could someone point me the key affect factor ? data passing issue ?


    #### tcl ####

    #set num_iterations 1000
    set num_columns 3125
    set num_rows 100
    set num_hits 25

    # Generate random matrix
    for {set i 0} {$i < $num_rows} {incr i} {
    for {set j 0} {$j < $num_columns} {incr j} {
    lappend matrix([expr {$i + 1}]) [expr {int(rand()*2)}]
    }
    }

    proc find_shortest_hit_org {num_hits mx} {
    array set matrix $mx

    set num_columns [llength $matrix(1)]
    set num_rows [llength [array names matrix]]


    # Find shortest sequence of 1's with given number of hits
    set shortest_seq ""
    set shortest_len $num_rows
    set shortest_pos ""

    set seq_lst ""

    for {set j 0} {$j < $num_columns} {incr j} {
    set hits 0
    set seq ""
    set pos_list ""
    set act_hit 0
    for {set i 0} {$i < $num_rows} {incr i} {
    if {[lindex $matrix([expr {$i + 1}]) $j] == 1} {
    incr hits
    append seq "1"
    lappend pos_list $i
    } else {
    append seq "0"
    }
    if {$hits >= $num_hits} {
    set seq_len [string length $seq]
    if {$seq_len < $shortest_len} {
    set shortest_len $seq_len
    set shortest_seq $seq
    set shortest_pos [join $pos_list ","]
    }
    if {[string index $seq end] == 0} {set act_hit 1 ; break}

    }
    }
    if {$act_hit == 1} {
    lappend seq_lst "$j [string length $seq]"
    }
    }


    return [lrange [lsort -index 1 -increasing -integer $seq_lst] 0 9]

    }

    set be [clock microseconds]
    find_shortest_hit_org 25 [array get matrix]
    puts "total time [expr {([clock microseconds] -$be)/1000000.0}]s"


    ### python ####

    import random
    import time

    num_iterations = 1000
    num_columns = 3125
    num_rows = 100
    num_hits = 25

    # Generate random matrix
    matrix = []
    for i in range(num_rows):
    row = []
    for j in range(num_columns):
    row.append(random.randint(0, 1))
    matrix.append(row)


    def find_shortest_hit(num_hits, mx):
    #matrix = dict(mx)
    matrix = mx
    #print(matrix[1])
    num_columns = len(matrix[1])
    num_rows = len(matrix)

    shortest_seq = ''
    shortest_len = num_rows
    shortest_pos = ''

    seq_lst = []

    for j in range(num_columns):
    hits = 0
    seq = ''
    pos_list = []
    act_hit = 0

    for i in range(num_rows):
    #print(matrix[i+1][j])
    if matrix[i+1][j] == 1:
    hits += 1
    seq += '1'
    pos_list.append(str(i))
    else:
    seq += '0'

    if hits >= num_hits:
    seq_len = len(seq)
    if seq_len < shortest_len:
    shortest_len = seq_len
    shortest_seq = seq
    shortest_pos = ','.join(pos_list)

    if seq[-1] == '0':
    act_hit = 1
    break

    if act_hit == 1:
    seq_lst.append((j, len(seq)))

    return sorted(seq_lst, key=lambda x: x[1])[:10]




    start_time = time.perf_counter_ns()

    find_shortest_hit(25, matrix)

    end_time = time.perf_counter_ns()
    iteration_time = end_time - start_time
    print(f"total time: {iteration_time/1000000000}")


    BR
    Rolance

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From et99@21:1/5 to [email protected] on Fri Apr 21 17:44:44 2023
    On 4/21/2023 4:04 PM, [email protected] wrote:
    [email protected] 在 2023年4月22日 星期六上午9:19:00 [UTC+12] 的信中寫道:
    saitology9 在 2023年4月22日 星期六清晨5:06:47 [UTC+12] 的信中寫道:
    On 4/21/2023 1:02 PM, saitology9 wrote:
    # add up the timing differentials
    set total_time [lindex $timings 0]
    There is a typo in this line which is in time_it.
    Please change it as follows:

    set total_time 0
    Hi saitology9

    thanks for your advice
    test result :
    time_it 100 3125 25 100
    RUN 1:
    ================
    total_time: 115791 microseconds 115ms
    0.115791s
    Average execution time per iteration:
    37 microseconds


    RUN 2:
    ================
    total_time: 105553 microseconds 105ms
    0.105553s
    Average execution time per iteration:
    33 microseconds

    as now below method still be the best solution:
    change to "lappend rows $row" method can be
    total_time: 66394 microseconds per iteration
    BR
    Rolance

    hi all

    already transfer to proc in two lang.

    tcl : total time 0.0968s (twice....)
    python: total time: 0.0496496

    Could someone point me the key affect factor ? data passing issue ?


    #### tcl ####

    #set num_iterations 1000
    set num_columns 3125
    set num_rows 100
    set num_hits 25

    # Generate random matrix
    for {set i 0} {$i < $num_rows} {incr i} {
    for {set j 0} {$j < $num_columns} {incr j} {
    lappend matrix([expr {$i + 1}]) [expr {int(rand()*2)}]
    }
    }

    proc find_shortest_hit_org {num_hits mx} {
    array set matrix $mx

    set num_columns [llength $matrix(1)]
    set num_rows [llength [array names matrix]]


    # Find shortest sequence of 1's with given number of hits
    set shortest_seq ""
    set shortest_len $num_rows
    set shortest_pos ""

    set seq_lst ""

    for {set j 0} {$j < $num_columns} {incr j} {
    set hits 0
    set seq ""
    set pos_list ""
    set act_hit 0
    for {set i 0} {$i < $num_rows} {incr i} {
    if {[lindex $matrix([expr {$i + 1}]) $j] == 1} {
    incr hits
    append seq "1"
    lappend pos_list $i
    } else {
    append seq "0"
    }
    if {$hits >= $num_hits} {
    set seq_len [string length $seq]
    if {$seq_len < $shortest_len} {
    set shortest_len $seq_len
    set shortest_seq $seq
    set shortest_pos [join $pos_list ","]
    }
    if {[string index $seq end] == 0} {set act_hit 1 ; break}

    }
    }
    if {$act_hit == 1} {
    lappend seq_lst "$j [string length $seq]"
    }
    }


    return [lrange [lsort -index 1 -increasing -integer $seq_lst] 0 9]

    }

    set be [clock microseconds]
    find_shortest_hit_org 25 [array get matrix]
    puts "total time [expr {([clock microseconds] -$be)/1000000.0}]s"


    ### python ####

    import random
    import time

    num_iterations = 1000
    num_columns = 3125
    num_rows = 100
    num_hits = 25

    # Generate random matrix
    matrix = []
    for i in range(num_rows):
    row = []
    for j in range(num_columns):
    row.append(random.randint(0, 1))
    matrix.append(row)


    def find_shortest_hit(num_hits, mx):
    #matrix = dict(mx)
    matrix = mx
    #print(matrix[1])
    num_columns = len(matrix[1])
    num_rows = len(matrix)

    shortest_seq = ''
    shortest_len = num_rows
    shortest_pos = ''

    seq_lst = []

    for j in range(num_columns):
    hits = 0
    seq = ''
    pos_list = []
    act_hit = 0

    for i in range(num_rows):
    #print(matrix[i+1][j])
    if matrix[i+1][j] == 1:
    hits += 1
    seq += '1'
    pos_list.append(str(i))
    else:
    seq += '0'

    if hits >= num_hits:
    seq_len = len(seq)
    if seq_len < shortest_len:
    shortest_len = seq_len
    shortest_seq = seq
    shortest_pos = ','.join(pos_list)

    if seq[-1] == '0':
    act_hit = 1
    break

    if act_hit == 1:
    seq_lst.append((j, len(seq)))

    return sorted(seq_lst, key=lambda x: x[1])[:10]




    start_time = time.perf_counter_ns()

    find_shortest_hit(25, matrix)

    end_time = time.perf_counter_ns()
    iteration_time = end_time - start_time
    print(f"total time: {iteration_time/1000000000}")


    BR
    Rolance


    You could consider using tcl threads.

    It would appear that the calculation can be factored
    into several parts that could be done concurrently.

    You could setup the matrix using tsv::set one time
    and call on several threads to each search some
    set of rows or columns, and then collate the results.

    Even my 2 year old intel computer has 12 cores.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From saitology9@21:1/5 to [email protected] on Fri Apr 21 21:53:50 2023
    On 4/21/2023 5:18 PM, [email protected] wrote:

    as now below method still be the best solution:
    change to "lappend rows $row" method can be
    total_time: 66394 microseconds per iteration


    You can definitely combine the two suggestions. Switching from the
    matrix as an array to a lists/sub-lists version, we get a slight
    improvement. See three sample runs below:


    % time_it 100 3125 25 100
    total_time: 30661 microseconds 30ms 0.030661s
    Average execution time per iteration: 9 microseconds


    % time_it 100 3125 25 100
    total_time: 35541 microseconds 35ms 0.035541s
    Average execution time per iteration: 11 microseconds


    % time_it 100 3125 25 100
    total_time: 31111 microseconds 31ms 0.031111s
    Average execution time per iteration: 9 microseconds

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From saitology9@21:1/5 to [email protected] on Fri Apr 21 21:37:44 2023
    On 4/21/2023 5:18 PM, [email protected] wrote:
    Hi saitology9

    thanks for your advice
    test result :
    time_it 100 3125 25 100
    RUN 1:
    ================
    total_time: 115791 microseconds 115ms
    0.115791s
    Average execution time per iteration:
    37 microseconds


    RUN 2:
    ================
    total_time: 105553 microseconds 105ms
    0.105553s
    Average execution time per iteration:
    33 microseconds

    as now below method still be the best solution:
    change to "lappend rows $row" method can be
    total_time: 66394 microseconds per iteration


    Are you sure you ran the version that I posted? Because on a mid range
    laptop, I am getting number that are about 1/3rd of what you posted.


    time_it 100 3125 25 100
    RUN 1:
    ================
    total_time: 43239 microseconds 43ms 0.043239s
    Average execution time per iteration: 13 microseconds



    RUN 2:
    ================
    total_time: 41028 microseconds 41ms 0.041028s
    Average execution time per iteration: 13 microseconds

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From saitology9@21:1/5 to All on Fri Apr 21 22:05:31 2023
    On 4/21/2023 8:44 PM, et99 wrote:
    You could consider using tcl threads.

    It would appear that the calculation can be factored
    into several parts that could be done concurrently.

    You could setup the matrix using tsv::set one time
    and call on several threads to each search some
    set of rows or columns, and then collate the results.

    Even my 2 year old intel computer has 12 cores.



    The OP is concerned about a run-time difference of less than 0.05 of a
    second. No reason to time it now but I would suspect that the thread
    setup would eat a good portion of any savings it may generate at this threshold.


    The question could be why does this difference matter?


    @Rolance:
    Another potential reason for the difference: Tcl's random number
    generator may be better than Python's so the algorithm spends more time
    on each row/column, whereas the Python version may be hitting the break
    quite early in each loop. I would suggest keeping the data set
    identical for both programs. You can save the matrix to a file and have
    both programs read it in before doing their calculations.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From et99@21:1/5 to All on Fri Apr 21 22:57:05 2023
    On 4/21/2023 7:05 PM, saitology9 wrote:
    On 4/21/2023 8:44 PM, et99 wrote:
    You could consider using tcl threads.

    It would appear that the calculation can be factored
    into several parts that could be done concurrently.

    You could setup the matrix using tsv::set one time
    and call on several threads to each search some
    set of rows or columns, and then collate the results.

    Even my 2 year old intel computer has 12 cores.



    The OP is concerned about a run-time difference of less than 0.05 of a second.  No reason to time it now but I would suspect that the thread setup would eat a good portion of any savings it may generate at this threshold.


    The question could be why does this difference matter?


    @Rolance:
    Another potential reason for the difference: Tcl's random number generator may be better than Python's so the algorithm spends more time on each row/column, whereas the Python version may be hitting the break quite early in each loop.  I would suggest
    keeping the data set identical for both programs.  You can save the matrix to a file and have both programs read it in before doing their calculations.



    Yes, you are right, it depends on whether the OP's example was a
    small test case or not.

    I guessed if OP was concerned with mere 100s of microseconds,
    it was small and/or one of many to be computed.

    I substituted in a tsv shared matrix to see what overhead was there,
    which was only about 15% when I bumped up the rows/columns.

    This is also an example of where an efficient bit/bitfield array
    mechanism would be useful. Maybe Brian's lseq technology might
    be adapted in the future.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to All on Fri Apr 21 23:26:47 2023
    saitology9 在 2023年4月22日 星期六下午2:05:36 [UTC+12] 的信中寫道:
    On 4/21/2023 8:44 PM, et99 wrote:
    You could consider using tcl threads.

    It would appear that the calculation can be factored
    into several parts that could be done concurrently.

    You could setup the matrix using tsv::set one time
    and call on several threads to each search some
    set of rows or columns, and then collate the results.

    Even my 2 year old intel computer has 12 cores.

    The OP is concerned about a run-time difference of less than 0.05 of a second. No reason to time it now but I would suspect that the thread
    setup would eat a good portion of any savings it may generate at this threshold.


    The question could be why does this difference matter?


    @Rolance:
    Another potential reason for the difference: Tcl's random number
    generator may be better than Python's so the algorithm spends more time
    on each row/column, whereas the Python version may be hitting the break quite early in each loop. I would suggest keeping the data set
    identical for both programs. You can save the matrix to a file and have
    both programs read it in before doing their calculations.

    Hi saitology9
    thanks for your advice , in single run test program ,
    the result can be ---> total_time: 46189 microseconds
    but transfer to proc run load data form file in real situation , python more stable and faster (twice ) than tcl
    this is why post real two program , and search for help ,
    change several different data structure still not ideal ...

    the random data for test proc running , and the result speed same as real data Could point me the last proc I post , where can improve.

    BR
    Rolance

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to All on Fri Apr 21 23:34:49 2023
    et99 在 2023年4月22日 星期六下午5:57:12 [UTC+12] 的信中寫道:
    On 4/21/2023 7:05 PM, saitology9 wrote:
    On 4/21/2023 8:44 PM, et99 wrote:
    You could consider using tcl threads.

    It would appear that the calculation can be factored
    into several parts that could be done concurrently.

    You could setup the matrix using tsv::set one time
    and call on several threads to each search some
    set of rows or columns, and then collate the results.

    Even my 2 year old intel computer has 12 cores.



    The OP is concerned about a run-time difference of less than 0.05 of a second. No reason to time it now but I would suspect that the thread setup would eat a good portion of any savings it may generate at this threshold.


    The question could be why does this difference matter?


    @Rolance:
    Another potential reason for the difference: Tcl's random number generator may be better than Python's so the algorithm spends more time on each row/column, whereas the Python version may be hitting the break quite early in each loop. I would
    suggest keeping the data set identical for both programs. You can save the matrix to a file and have both programs read it in before doing their calculations.

    Yes, you are right, it depends on whether the OP's example was a
    small test case or not.

    I guessed if OP was concerned with mere 100s of microseconds,
    it was small and/or one of many to be computed.

    I substituted in a tsv shared matrix to see what overhead was there,
    which was only about 15% when I bumped up the rows/columns.

    This is also an example of where an efficient bit/bitfield array
    mechanism would be useful. Maybe Brian's lseq technology might
    be adapted in the future.

    Hi et99
    thanks for your advice ,
    in real situation , already run multi-thread to accreate the calculate time the last post proc is the base unit need implement in single thread
    if still not find the faster way , may replace to python in the base unit

    BR
    Rolance

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to All on Sat Apr 22 08:04:20 2023
    You can improve speed in TCL (and probably python as well) by:

    stop searching down any column once the position is past the current shortest sequence found.

    when a new short sequence is found, save only enough information to generate the result. Only when all columns are searched, do the work to generate the result. Saving copies of even large structures is very fast in TCL (the C code just copies of a
    pointer, and increments a reference count).

    In TCL if the data is organized as a list of sequences (which are lists of 0 and 1) the foreach command can be used to process the list of columns and the list of bits in each column.

    I'm seeing about a 10x speed up over the original code by using these suggestions, where both are procedures (to get them compiled).

    Dave B

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to All on Sat Apr 22 02:28:03 2023
    [email protected] 在 2023年4月22日 星期六晚上8:04:24 [UTC+12] 的信中寫道:
    You can improve speed in TCL (and probably python as well) by:

    stop searching down any column once the position is past the current shortest sequence found.

    when a new short sequence is found, save only enough information to generate the result. Only when all columns are searched, do the work to generate the result. Saving copies of even large structures is very fast in TCL (the C code just copies of a
    pointer, and increments a reference count).

    In TCL if the data is organized as a list of sequences (which are lists of 0 and 1) the foreach command can be used to process the list of columns and the list of bits in each column.

    I'm seeing about a 10x speed up over the original code by using these suggestions, where both are procedures (to get them compiled).

    Dave B

    Hi Dave

    thanks for your advice , already change the search method ,as you mention get the ideal speed
    what I post the topic is search for more help , if the search method no room to improve , how to improve tcl performance ...
    different lang have performance gap..
    Could you point out my code where to improve with same search method.

    BR
    Rolance

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From saitology9@21:1/5 to [email protected] on Sat Apr 22 10:59:31 2023
    On 4/22/2023 2:26 AM, [email protected] wrote:

    the random data for test proc running , and the result speed same as real data
    Could point me the last proc I post , where can improve.


    You can squeeze out a few microseconds from these changes:

    - instead of sending the whole matrix to find_shortest_hit_org, you can
    use a global or upvar statement.

    - you can improve the looping over the rows and elimintate the expr on
    the row index. Your statement would look like this:

    ---
    for {set i 1} {$i <= $num_rows} {incr i} {
    if {[lindex $matrix($i) $j] == 1} {
    ---


    Other than that, I am not sure what the code does. Others have asked
    for a description and have gotten none. So, don't know what else to say.

    Finally, you may want to check out a solution based on regexp. Maybe
    someone with more regexp chops can help.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From et99@21:1/5 to [email protected] on Sat Apr 22 09:39:50 2023
    On 4/22/2023 2:28 AM, [email protected] wrote:
    [email protected] 在 2023年4月22日 星期六晚上8:04:24 [UTC+12] 的信中寫道:
    You can improve speed in TCL (and probably python as well) by:

    stop searching down any column once the position is past the current shortest sequence found.

    when a new short sequence is found, save only enough information to generate the result. Only when all columns are searched, do the work to generate the result. Saving copies of even large structures is very fast in TCL (the C code just copies of a
    pointer, and increments a reference count).

    In TCL if the data is organized as a list of sequences (which are lists of 0 and 1) the foreach command can be used to process the list of columns and the list of bits in each column.

    I'm seeing about a 10x speed up over the original code by using these suggestions, where both are procedures (to get them compiled).

    Dave B

    Hi Dave

    thanks for your advice , already change the search method ,as you mention get the ideal speed
    what I post the topic is search for more help , if the search method no room to improve , how to improve tcl performance ...
    different lang have performance gap..
    Could you point out my code where to improve with same search method.

    BR
    Rolance


    As Dave suggested, using a list of lists is likely the best bet. Then
    you could use [lindex $matrix $i $j] for the test, and don't test against
    being = to 1, just do a boolean test.

    you can see the bytecode with

    tcl::unsupported::disassemble proc find_shortest_hit_org

    or

    tcl::unsupported::disassemble script {if {[lindex $lst 1 1]} {set x 1}}

    where one would see (partially shown):
    ...
    Command 1: "if {[lindex $lst 1 1]} {set x 1}"
    Command 2: "lindex $lst 1 1..."
    (0) push1 0 # "lst"
    (2) loadStk
    (3) push1 1 # "1"
    (5) push1 1 # "1"
    (7) lindexMulti 3
    (12) nop
    (13) jumpFalse1 +9 # pc 22


    so there is a specific byte code for an lindex with more than 1 index
    and you can use the [time] command to just test small script code, say
    to compare an array lookup vs. a list lookup.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to All on Sat Apr 22 14:50:39 2023
    saitology9 在 2023年4月23日 星期日凌晨2:59:37 [UTC+12] 的信中寫道:
    On 4/22/2023 2:26 AM, [email protected] wrote:

    the random data for test proc running , and the result speed same as real data
    Could point me the last proc I post , where can improve.

    You can squeeze out a few microseconds from these changes:

    - instead of sending the whole matrix to find_shortest_hit_org, you can
    use a global or upvar statement.

    - you can improve the looping over the rows and elimintate the expr on
    the row index. Your statement would look like this:

    ---
    for {set i 1} {$i <= $num_rows} {incr i} {
    if {[lindex $matrix($i) $j] == 1} {
    ---


    Other than that, I am not sure what the code does. Others have asked
    for a description and have gotten none. So, don't know what else to say.

    Finally, you may want to check out a solution based on regexp. Maybe
    someone with more regexp chops can help.

    Hi saitology9

    thanks for advice , this is a big data analysis program
    the data already pre-scan by customer 's rule , 1 : hit , 0 : non-hit
    already separate several part to save time..
    orginal program develop by Tcl , the search speed may less than other langurage (C++) customer tell me...
    by the budget , no value to rewrite all program to other to improve
    as i post before , same data structure by diferent lang. have obviously speed gap...

    see the ram usage tcl may use more than others , and less machine overload .... may the org code have room to improve.... if have other extension to speed up ...

    BR
    Rolance

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to All on Sat Apr 22 14:54:30 2023
    et99 在 2023年4月23日 星期日凌晨4:39:55 [UTC+12] 的信中寫道:
    On 4/22/2023 2:28 AM, [email protected] wrote:
    [email protected] 在 2023年4月22日 星期六晚上8:04:24 [UTC+12] 的信中寫道:
    You can improve speed in TCL (and probably python as well) by:

    stop searching down any column once the position is past the current shortest sequence found.

    when a new short sequence is found, save only enough information to generate the result. Only when all columns are searched, do the work to generate the result. Saving copies of even large structures is very fast in TCL (the C code just copies of a
    pointer, and increments a reference count).

    In TCL if the data is organized as a list of sequences (which are lists of 0 and 1) the foreach command can be used to process the list of columns and the list of bits in each column.

    I'm seeing about a 10x speed up over the original code by using these suggestions, where both are procedures (to get them compiled).

    Dave B

    Hi Dave

    thanks for your advice , already change the search method ,as you mention get the ideal speed
    what I post the topic is search for more help , if the search method no room to improve , how to improve tcl performance ...
    different lang have performance gap..
    Could you point out my code where to improve with same search method.

    BR
    Rolance
    As Dave suggested, using a list of lists is likely the best bet. Then
    you could use [lindex $matrix $i $j] for the test, and don't test against being = to 1, just do a boolean test.

    you can see the bytecode with

    tcl::unsupported::disassemble proc find_shortest_hit_org

    or

    tcl::unsupported::disassemble script {if {[lindex $lst 1 1]} {set x 1}}

    where one would see (partially shown):
    ...
    Command 1: "if {[lindex $lst 1 1]} {set x 1}"
    Command 2: "lindex $lst 1 1..."
    (0) push1 0 # "lst"
    (2) loadStk
    (3) push1 1 # "1"
    (5) push1 1 # "1"
    (7) lindexMulti 3
    (12) nop
    (13) jumpFalse1 +9 # pc 22


    so there is a specific byte code for an lindex with more than 1 index
    and you can use the [time] command to just test small script code, say
    to compare an array lookup vs. a list lookup.

    hi et99

    thanks your suggestion , will try and report later.

    BR
    Rolance

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From et99@21:1/5 to [email protected] on Sat Apr 22 15:23:26 2023
    On 4/22/2023 2:54 PM, [email protected] wrote:
    et99 在 2023年4月23日 星期日凌晨4:39:55 [UTC+12] 的信中寫道:
    On 4/22/2023 2:28 AM, [email protected] wrote:
    [email protected] 在 2023年4月22日 星期六晚上8:04:24 [UTC+12] 的信中寫道:
    You can improve speed in TCL (and probably python as well) by:

    stop searching down any column once the position is past the current shortest sequence found.

    when a new short sequence is found, save only enough information to generate the result. Only when all columns are searched, do the work to generate the result. Saving copies of even large structures is very fast in TCL (the C code just copies of a
    pointer, and increments a reference count).

    In TCL if the data is organized as a list of sequences (which are lists of 0 and 1) the foreach command can be used to process the list of columns and the list of bits in each column.

    I'm seeing about a 10x speed up over the original code by using these suggestions, where both are procedures (to get them compiled).

    Dave B

    Hi Dave

    thanks for your advice , already change the search method ,as you mention get the ideal speed
    what I post the topic is search for more help , if the search method no room to improve , how to improve tcl performance ...
    different lang have performance gap..
    Could you point out my code where to improve with same search method.

    BR
    Rolance
    As Dave suggested, using a list of lists is likely the best bet. Then
    you could use [lindex $matrix $i $j] for the test, and don't test against
    being = to 1, just do a boolean test.

    you can see the bytecode with

    tcl::unsupported::disassemble proc find_shortest_hit_org

    or

    tcl::unsupported::disassemble script {if {[lindex $lst 1 1]} {set x 1}}

    where one would see (partially shown):
    ...
    Command 1: "if {[lindex $lst 1 1]} {set x 1}"
    Command 2: "lindex $lst 1 1..."
    (0) push1 0 # "lst"
    (2) loadStk
    (3) push1 1 # "1"
    (5) push1 1 # "1"
    (7) lindexMulti 3
    (12) nop
    (13) jumpFalse1 +9 # pc 22


    so there is a specific byte code for an lindex with more than 1 index
    and you can use the [time] command to just test small script code, say
    to compare an array lookup vs. a list lookup.

    hi et99

    thanks your suggestion , will try and report later.

    BR
    Rolance


    I did a test using the [lindex $matrix $i $j] and found about a 40% reduction in time.

    It appears that the overhead is beginning to get down to the "if test" and the "for loop".

    This is why Dave is suggesting you use foreach, which has less to do on each iteration, since the test for the end and incrementing the index is effectively done in the C code whereas a for loop does that in script code:



    % tcl::unsupported::disassemble script {for {set i 1} {$i < $num_rows} {incr i} {doit}}
    ByteCode 0x05902750, refCt 1, epoch 17, interp 0x00939060 (epoch 17)
    Source "for {set i 1} {$i < $num_rows} {incr i} {doit}"
    Cmds 4, src 46, inst 39, litObjs 5, aux 0, stkDepth 2, code/src 0.00
    Exception ranges 2, depth 1:
    0: level 0, loop, pc 8-11, continue 13, break 36
    1: level 0, loop, pc 13-25, continue -1, break 36
    Commands 4:
    1: pc 0-37, src 0-45 2: pc 0-4, src 5-11
    3: pc 8-11, src 41-44 4: pc 13-25, src 32-37
    Command 1: "for {set i 1} {$i < $num_rows} {incr i} {doit}"
    Command 2: "set i 1..."
    (0) push1 0 # "i"
    (2) push1 1 # "1"
    (4) storeStk
    (5) pop
    (6) jump1 +21 # pc 27
    Command 3: "doit..."
    (8) push1 2 # "doit"
    (10) invokeStk1 1
    (12) pop
    Command 4: "incr i..."
    (13) startCommand +13 1 # next cmd at pc 26, 1 cmds start here
    (22) push1 0 # "i"
    (24) incrStkImm +1
    (26) pop
    (27) push1 0 # "i"
    (29) loadStk
    (30) push1 3 # "num_rows"
    (32) loadStk
    (33) lt
    (34) jumpTrue1 -26 # pc 8
    (36) push1 4 # ""
    (38) done

    % tcl::unsupported::disassemble script {foreach i $list {doit}}
    ByteCode 0x05902650, refCt 1, epoch 17, interp 0x00939060 (epoch 17)
    Source "foreach i $list {doit}"
    Cmds 1, src 22, inst 12, litObjs 4, aux 0, stkDepth 4, code/src 0.00
    Commands 1:
    1: pc 0-10, src 0-21
    Command 1: "foreach i $list {doit}"
    (0) push1 0 # "foreach"
    (2) push1 1 # "i"
    (4) push1 2 # "list"
    (6) loadStk
    (7) push1 3 # "doit"
    (9) invokeStk1 4
    (11) done

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From et99@21:1/5 to All on Sat Apr 22 18:05:22 2023
    On 4/22/2023 3:23 PM, et99 wrote:
    On 4/22/2023 2:54 PM, [email protected] wrote:
    et99 在 2023年4月23日 星期日凌晨4:39:55 [UTC+12] 的信中寫道: >>> On 4/22/2023 2:28 AM, [email protected] wrote:
    [email protected] 在 2023年4月22日 星期六晚上8:04:24 [UTC+12] 的信中寫道:
    You can improve speed in TCL (and probably python as well) by:

    stop searching down any column once the position is past the current shortest sequence found.

    when a new short sequence is found, save only enough information to generate the result. Only when all columns are searched, do the work to generate the result. Saving copies of even large structures is very fast in TCL (the C code just copies of a
    pointer, and increments a reference count).

    In TCL if the data is organized as a list of sequences (which are lists of 0 and 1) the foreach command can be used to process the list of columns and the list of bits in each column.

    I'm seeing about a 10x speed up over the original code by using these suggestions, where both are procedures (to get them compiled).

    Dave B

    Hi Dave

    thanks for your advice , already change the search method ,as you mention get the ideal speed
    what I post the topic is search for more help , if the search method no room to improve , how to improve tcl performance ...
    different lang have performance gap..
    Could you point out my code where to improve with same search method.

    BR
    Rolance
    As Dave suggested, using a list of lists is likely the best bet. Then
    you could use [lindex $matrix $i $j] for the test, and don't test against >>> being = to 1, just do a boolean test.

    you can see the bytecode with

    tcl::unsupported::disassemble proc find_shortest_hit_org

    or

    tcl::unsupported::disassemble script {if {[lindex $lst 1 1]} {set x 1}}

    where one would see (partially shown):
    ...
    Command 1: "if {[lindex $lst 1 1]} {set x 1}"
    Command 2: "lindex $lst 1 1..."
    (0) push1 0 # "lst"
    (2) loadStk
    (3) push1 1 # "1"
    (5) push1 1 # "1"
    (7) lindexMulti 3
    (12) nop
    (13) jumpFalse1 +9 # pc 22


    so there is a specific byte code for an lindex with more than 1 index
    and you can use the [time] command to just test small script code, say
    to compare an array lookup vs. a list lookup.

    hi et99

    thanks your suggestion , will try and report later.

    BR
    Rolance


    I did a test using the [lindex $matrix $i $j] and found about a 40% reduction in time.

    It appears that the overhead is beginning to get down to the "if test" and the
    "for loop".

    This is why Dave is suggesting you use foreach, which has less to do on each iteration, since the test for the end and incrementing the index is effectively
    done in the C code whereas a for loop does that in script code:



    % tcl::unsupported::disassemble script {for {set i 1} {$i < $num_rows} {incr i} {doit}}
    ByteCode 0x05902750, refCt 1, epoch 17, interp 0x00939060 (epoch 17)
      Source "for {set i 1} {$i < $num_rows} {incr i} {doit}"
      Cmds 4, src 46, inst 39, litObjs 5, aux 0, stkDepth 2, code/src 0.00
      Exception ranges 2, depth 1:
          0: level 0, loop, pc 8-11, continue 13, break 36
          1: level 0, loop, pc 13-25, continue -1, break 36
      Commands 4:
          1: pc 0-37, src 0-45        2: pc 0-4, src 5-11
          3: pc 8-11, src 41-44        4: pc 13-25, src 32-37
      Command 1: "for {set i 1} {$i < $num_rows} {incr i} {doit}"
      Command 2: "set i 1..."
        (0) push1 0     # "i"
        (2) push1 1     # "1"
        (4) storeStk
        (5) pop
        (6) jump1 +21     # pc 27
      Command 3: "doit..."
        (8) push1 2     # "doit"
        (10) invokeStk1 1
        (12) pop
      Command 4: "incr i..."
        (13) startCommand +13 1     # next cmd at pc 26, 1 cmds start here
        (22) push1 0     # "i"
        (24) incrStkImm +1
        (26) pop
        (27) push1 0     # "i"
        (29) loadStk
        (30) push1 3     # "num_rows"
        (32) loadStk
        (33) lt
        (34) jumpTrue1 -26     # pc 8
        (36) push1 4     # ""
        (38) done

    % tcl::unsupported::disassemble script {foreach i $list {doit}}
    ByteCode 0x05902650, refCt 1, epoch 17, interp 0x00939060 (epoch 17)
      Source "foreach i $list {doit}"
      Cmds 1, src 22, inst 12, litObjs 4, aux 0, stkDepth 4, code/src 0.00
      Commands 1:
          1: pc 0-10, src 0-21
      Command 1: "foreach i $list {doit}"
        (0) push1 0     # "foreach"
        (2) push1 1     # "i"
        (4) push1 2     # "list"
        (6) loadStk
        (7) push1 3     # "doit"
        (9) invokeStk1 4
        (11) done




    On last thought on using foreach, the index is the item itself, 0 or 1, and so the if test doesn't have to do an array or lindex list lookup.
    Something like this:

    foreach row {list of the rows}
    foreach column $row
    if {$column} ....

    Actually, I couldn't really tell if you searching a row or a column,
    so adjust as needed.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to All on Sun Apr 23 05:48:10 2023
    I've timed the execution from entry to the proc to just before writing the result to the screen.

    On my machine for the original code as a proc:
    proc ex time 184 ms
    inner loop time 58 ms
    overhead time 126 ms

    For my optimized case (list of lists, foreach ans early out)
    proc ex time 87 ms
    inner loop time 8 ms
    overhead time 79 ms

    The overhead is creating the data to be analyzed, and the outer loops.
    In the optimized case about 90% of the run time is overhead, formatting the data.
    This likely applies to python as well

    I suspect this is the case in your actual application too.
    If you want more speed, you need to minimize the work done to process
    the raw data into what you analyze.
    The test code should be modified to use that raw data
    (or at least minimize the processing of the raw data before analysis).

    Tcl has very good string processing commands, and can handle byte strings as well.
    Perhaps using those data formats directly in the analysis code would increase overall speed.

    Trying to speed up the inner loop at this point is likely wasted effort.

    Dave B

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to All on Sun Apr 23 02:47:09 2023
    [email protected] 在 2023年4月23日 星期日下午5:48:13 [UTC+12] 的信中寫道:
    I've timed the execution from entry to the proc to just before writing the result to the screen.

    On my machine for the original code as a proc:
    proc ex time 184 ms
    inner loop time 58 ms
    overhead time 126 ms

    For my optimized case (list of lists, foreach ans early out)
    proc ex time 87 ms
    inner loop time 8 ms
    overhead time 79 ms

    The overhead is creating the data to be analyzed, and the outer loops.
    In the optimized case about 90% of the run time is overhead, formatting the data.
    This likely applies to python as well

    I suspect this is the case in your actual application too.
    If you want more speed, you need to minimize the work done to process
    the raw data into what you analyze.
    The test code should be modified to use that raw data
    (or at least minimize the processing of the raw data before analysis).

    Tcl has very good string processing commands, and can handle byte strings as well.
    Perhaps using those data formats directly in the analysis code would increase overall speed.

    Trying to speed up the inner loop at this point is likely wasted effort.

    Dave B

    Hi Dave
    thanks for your advice, will optimize my code by your suggestion

    BR
    Rolance

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From saitology9@21:1/5 to [email protected] on Sun Apr 23 07:54:04 2023
    On 4/22/2023 5:50 PM, [email protected] wrote:

    Hi saitology9

    thanks for advice , this is a big data analysis program
    the data already pre-scan by customer 's rule , 1 : hit , 0 : non-hit already separate several part to save time..
    orginal program develop by Tcl , the search speed may less than other langurage (C++) customer tell me...
    by the budget , no value to rewrite all program to other to improve
    as i post before , same data structure by diferent lang. have obviously speed gap...

    see the ram usage tcl may use more than others , and less machine overload ....
    may the org code have room to improve.... if have other extension to speed up ...



    Thanks. Alas, you are not describing what the program is trying to do.
    Also the improvements we can suggest, we can only test againts the base version, and it looks like you are testing it in the real app which
    introduces a lot of unknowns for us so we can't see what helps and what doesn't.

    A few suggestions as the code you posted can use improvements:
    1) See if you can switch the loop order to rows first. You could go from
    3125 loops to 100.
    2) Modify your loop indexes to start from 1 and avoid extra "expr"
    statements.
    3) No reason for "append seq 0/1". "seq" is simply the last character,
    which is the value of the cell in the if-statement.
    4) Likewise, no reason to do seq_len (twice!), as it is simply the row
    index, which is "i".
    5) No reason to calculate pos_list and shortest_pos.
    6) No reason to do "string length $seq" after breaking out of the loop.
    Just put the "lappend ..." right before the "break".

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to All on Sun Apr 23 05:47:11 2023
    saitology9 在 2023年4月23日 星期日晚上11:54:10 [UTC+12] 的信中寫道:
    On 4/22/2023 5:50 PM, [email protected] wrote:

    Hi saitology9

    thanks for advice , this is a big data analysis program
    the data already pre-scan by customer 's rule , 1 : hit , 0 : non-hit already separate several part to save time..
    orginal program develop by Tcl , the search speed may less than other langurage (C++) customer tell me...
    by the budget , no value to rewrite all program to other to improve
    as i post before , same data structure by diferent lang. have obviously speed gap...

    see the ram usage tcl may use more than others , and less machine overload ....
    may the org code have room to improve.... if have other extension to speed up ...


    Hi saitology9
    thanks for your continue support

    Thanks. Alas, you are not describing what the program is trying to do.
    Also the improvements we can suggest, we can only test againts the base version, and it looks like you are testing it in the real app which introduces a lot of unknowns for us so we can't see what helps and what doesn't.

    the real problem is analysis speed , the program i post is the finial key part to generate result
    already follow your advice , and get more improve for speed
    in this structure , no more other effort "Python >> twice than tcl ...." (i can transfer the part individual use other lang. but i intend use tcl )
    look for root cause why the speed gap appear , if do same effort the Python performance more than 4 times?
    list is a best data structure for tcl , do have some extension to faster analysis search " 0 1 0 1 0 1 1...." list ?


    A few suggestions as the code you posted can use improvements:
    1) See if you can switch the loop order to rows first. You could go from 3125 loops to 100.
    2) Modify your loop indexes to start from 1 and avoid extra "expr" statements.
    3) No reason for "append seq 0/1". "seq" is simply the last character,
    which is the value of the cell in the if-statement.

    this part is for customer verify

    4) Likewise, no reason to do seq_len (twice!), as it is simply the row index, which is "i".
    5) No reason to calculate pos_list and shortest_pos.

    this part is for customer verify

    6) No reason to do "string length $seq" after breaking out of the loop.
    Just put the "lappend ..." right before the "break".

    the part for customer rule , need record and check next item with non-hit

    BR
    Rolance

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From saitology9@21:1/5 to [email protected] on Sun Apr 23 09:35:40 2023
    On 4/23/2023 8:47 AM, [email protected] wrote:

    this part is for customer verify


    :-)



    the part for customer rule , need record and check next item with non-hit


    The info is already available at that point. You can put it in the
    result you return.

    Final suggestion:

    See if you can put your data in a list indexed by column.
    So your data will look like this:
    matrix(0) {0 1 0 0 ....}
    matrix(1) {1 0 0 1 ....}

    Here is the code to generate this:

    for {set j 0} {$j < $num_columns} {incr j} {
    for {set i 0} {$i < $num_rows} {incr i} {
    lappend matrix($j) [expr {int(rand()*2)}]
    }
    }



    Then you prepare a "key" - this is what you are looking for:

    set key [string trim [string repeat "1 " $shortest_len]]


    Now, we can simplify your search. This should give you the best results:


    for {set j 0} {$j < $num_columns} {incr j} {
    set hits 0
    set seq ""
    set pos {}

    set pos [string first $matrix($j) $key]
    if {$pos >= 0} {
    ## we found the answer
    ## it is at: row=$pos, col=$j
    ## do whatever you need with it
    break
    }
    }

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From et99@21:1/5 to All on Sun Apr 23 08:02:15 2023
    On 4/23/2023 6:35 AM, saitology9 wrote:
    On 4/23/2023 8:47 AM, [email protected] wrote:

    this part is for customer verify


    :-)



    the part for customer rule , need record and check next item with non-hit


    The info is already available at that point.  You can put it in the result you return.

    Final suggestion:

    See if you can put your data in a list indexed by column.
    So your data will look like this:
    matrix(0) {0 1 0 0 ....}
    matrix(1) {1 0 0 1 ....}

    Here is the code to generate this:

    for {set j 0} {$j < $num_columns} {incr j} {
    for {set i 0} {$i < $num_rows} {incr i} {
        lappend matrix($j) [expr {int(rand()*2)}]
        }
    }



    Then you prepare a "key" - this is what you are looking for:

    set key [string trim [string repeat "1 " $shortest_len]]


    Now, we can simplify your search.  This should give you the best results:


     for {set j 0} {$j < $num_columns} {incr j} {
       set hits 0
       set seq ""
       set pos {}

       set pos [string first $matrix($j) $key]
       if {$pos >= 0} {
            ## we found the answer
            ## it is at: row=$pos, col=$j
            ## do whatever you need with it
            break
       }
    }







    The below is my final version. I too don't really know what
    the code is trying to do. My guess was that it is looking for
    the tightest cluster of 1 bits in the rows. But all my results begin at position 0 or 1, so there's some bias to clusters beginning
    earlier in the rows. I also swapped row/column at some point
    so I don't know if it's still what was wanted, and I did
    all 0..n-1 for the numerical indices.

    This is a mod of saitology9's version using Daves list of lists idea
    with results (the puts were few enough so didn't change the timing much)


    matrix start 07:45:15.067
    matrix done 07:45:15.442
    time_it 100 3125 25 3125
    RUN 1:
    ================
    j= |0| i= |51| seq_len= |52| seq= |0100000011100000000110111010011100010101110110111011| pos= |1|
    j= |1| i= |39| seq_len= |40| seq= |1101010110000111100111111011100101011011| pos= |0|
    j= |20| i= |38| seq_len= |39| seq= |101101011001101101110111100000111101111| pos= |0|
    j= |94| i= |35| seq_len= |36| seq= |111101010010010111010111111111011011| pos= |0|
    j= |389| i= |34| seq_len= |35| seq= |11010111111101110011011100111111001| pos= |0|
    j= |974| i= |33| seq_len= |34| seq= |1111111110010111111001011111001101| pos= |0|
    j= |1000| i= |30| seq_len= |31| seq= |1100111111111111111111010010111| pos= |0| j= |2344| i= |29| seq_len= |30| seq= |111101111110111100110111111111| pos= |0| calc2
    total_time: 35133 microseconds 35ms 0.035133s
    Average execution time per iteration: 11 microseconds
    shortest_len= |30| shortest_seq= |111101111110111100110111111111| shortest_pos= |0|










    set num_columns 3125
    set num_rows 100
    set num_hits 25
    puts "matrix start [Time]" ; update

    # Generate random matrix but same each time
    expr srand(1)
    set lmatrix {{}}
    for {set i 0} {$i < $num_rows} {incr i} {
    for {set j 0} {$j < $num_columns} {incr j} {
    set v [expr {int(rand()*2)}]
    lset lmatrix $j $i $v
    }
    }
    puts "matrix done [Time]" ; update
    # Find shortest sequence of 1's with given number of hits
    set shortest_seq ""
    set shortest_len $num_columns
    set shortest_pos {}
    set total_time 0

    # put the main calculation in a proc



    proc calculate2 {num_rows num_columns num_hits shortest_len} {
    global matrix lmatrix
    global timings shortest_seq shortest_pos

    set j -1
    foreach row $lmatrix {
    incr j
    set hits 0
    set seq ""
    set pos {}
    set i -1
    foreach column $row {
    incr i
    if {$column} {
    incr hits
    append seq "1"
    lappend pos $i
    } else {
    append seq "0"
    }
    if {$hits >= $num_hits} {
    set seq_len [string length $seq]
    if {$seq_len < $shortest_len} {
    set shortest_len $seq_len
    set ::shortest_len $seq_len
    set shortest_seq $seq
    set shortest_pos $pos
    puts "j= |$j| i= |$i| seq_len= |$seq_len| seq= |$seq| pos= |[lindex $pos 0 ]| "
    }
    break
    }
    }
    # no need to do time calculations here
    lappend timings [clock microseconds]
    }
    puts calc2
    }


    # put the timer in a proc so we can run multiple tests
    proc time_it {num_rows num_columns num_hits shortest_len} {
    global matrix
    global timings


    set timings [list [clock microseconds]]

    # calculate $num_rows $num_columns $num_hits $shortest_len
    calculate2 $num_rows $num_columns $num_hits $shortest_len


    # add up the timing differentials
    set total_time 0
    for {set i 1} {$i < [llength $timings]} {incr i} {
    set delta [expr {[lindex $timings $i] - [lindex $timings [expr {$i - 1}]]}]
    set total_time [expr {$total_time + $delta}]
    }

    # print it
    set avg_time [expr {$total_time / $num_columns}]
    puts "total_time: $total_time microseconds [expr $total_time/1000]ms [expr $total_time/1000000.0]s \nAverage execution time per iteration: $avg_time microseconds"

    }




    puts "time_it $num_rows $num_columns $num_hits $shortest_len"
    puts "RUN 1: "
    puts "================"
    time_it $num_rows $num_columns $num_hits $shortest_len

    puts "shortest_len= |$shortest_len| shortest_seq= |$shortest_seq| shortest_pos= |[lindex $shortest_pos 0 ]| "

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ralf Fassel@21:1/5 to All on Mon Apr 24 12:07:26 2023
    * "[email protected]" <[email protected]>
    | already transfer to proc in two lang.

    | tcl : total time 0.0968s (twice....)
    | python: total time: 0.0496496

    | Could someone point me the key affect factor ? data passing issue ? --<snip-snip>--

    | # Generate random matrix
    | for {set i 0} {$i < $num_rows} {incr i} {
    | for {set j 0} {$j < $num_columns} {incr j} {
    | lappend matrix([expr {$i + 1}]) [expr {int(rand()*2)}] --<snip-snip>--
    | for {set i 0} {$i < $num_rows} {incr i} {
    | if {[lindex $matrix([expr {$i + 1}]) $j] == 1} {

    You are still using a hash-based data struct here (matrix()), this will
    slow things down compared to plain list-of-lists (variable 'rows' in
    below code):


    proc do_it_2 {} {
    set num_columns 3125
    set num_rows 100
    set num_hits 25

    # Generate random matrix
    for {set i 0} {$i < $num_rows} {incr i} {
    set row [list]
    for {set j 0} {$j < $num_columns} {incr j} {
    lappend row [expr {int(rand()*2)}]
    }
    lappend rows $row
    }

    # Find shortest sequence of 1's with given number of hits
    set shortest_seq ""
    set shortest_len $num_rows
    set shortest_pos {}
    set total_time 0

    for {set j 0} {$j < $num_columns} {incr j} {
    set hits 0
    set seq ""
    set pos {}
    set start_time [clock microseconds]
    for {set i 0} {$i < $num_rows} {incr i} {
    set elt [lindex $rows $i $j]
    if {$elt == 1} {
    incr hits
    append seq "1"
    lappend pos $i
    } else {
    append seq "0"
    }
    if {$hits >= $num_hits} {
    set seq_len [string length $seq]
    if {$seq_len < $shortest_len} {
    set shortest_len $seq_len
    set shortest_seq $seq
    set shortest_pos $pos
    }
    break
    }
    }
    set end_time [clock microseconds]
    set iteration_time [expr {$end_time - $start_time}]
    set total_time [expr {$total_time + $iteration_time}]
    }
    set avg_time [expr {$total_time / $num_columns}]
    puts "total_time: $total_time microseconds [expr $total_time/1000]ms [expr $total_time/1000000.0]s \nAverage execution time per iteration: $avg_time microseconds"
    }


    R'

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From [email protected]@21:1/5 to All on Mon Apr 24 05:02:25 2023
    Ralf Fassel 在 2023年4月24日 星期一晚上10:07:32 [UTC+12] 的信中寫道:
    * "[email protected]" <[email protected]>
    | already transfer to proc in two lang.

    | tcl : total time 0.0968s (twice....)
    | python: total time: 0.0496496

    | Could someone point me the key affect factor ? data passing issue ? --<snip-snip>--

    | # Generate random matrix
    | for {set i 0} {$i < $num_rows} {incr i} {
    | for {set j 0} {$j < $num_columns} {incr j} {
    | lappend matrix([expr {$i + 1}]) [expr {int(rand()*2)}]
    --<snip-snip>--
    | for {set i 0} {$i < $num_rows} {incr i} {
    | if {[lindex $matrix([expr {$i + 1}]) $j] == 1} {
    You are still using a hash-based data struct here (matrix()), this will
    slow things down compared to plain list-of-lists (variable 'rows' in
    below code):


    proc do_it_2 {} {
    set num_columns 3125
    set num_rows 100
    set num_hits 25

    # Generate random matrix
    for {set i 0} {$i < $num_rows} {incr i} {
    set row [list]
    for {set j 0} {$j < $num_columns} {incr j} {
    lappend row [expr {int(rand()*2)}]
    }
    lappend rows $row
    }

    # Find shortest sequence of 1's with given number of hits
    set shortest_seq ""
    set shortest_len $num_rows
    set shortest_pos {}
    set total_time 0
    for {set j 0} {$j < $num_columns} {incr j} {
    set hits 0
    set seq ""
    set pos {}
    set start_time [clock microseconds]
    for {set i 0} {$i < $num_rows} {incr i} {
    set elt [lindex $rows $i $j]
    if {$elt == 1} {
    incr hits
    append seq "1"
    lappend pos $i
    } else {
    append seq "0"
    }
    if {$hits >= $num_hits} {
    set seq_len [string length $seq]
    if {$seq_len < $shortest_len} {
    set shortest_len $seq_len
    set shortest_seq $seq
    set shortest_pos $pos
    }
    break
    }
    }
    set end_time [clock microseconds]
    set iteration_time [expr {$end_time - $start_time}]
    set total_time [expr {$total_time + $iteration_time}]
    }
    set avg_time [expr {$total_time / $num_columns}]
    puts "total_time: $total_time microseconds [expr $total_time/1000]ms [expr $total_time/1000000.0]s \nAverage execution time per iteration: $avg_time microseconds"
    }
    R'

    Hi Ralf

    thanks for the code
    it is the most efficient way , I will change the data structure to fit this the speed loss may in the raw data pass between multi proc
    in real program still twice low than python , will do more research ...

    @ saitology9 , et99
    thanks for your detail explain and value suggestion
    will optimize each proc

    by the way the python version gererate by chatgpt and do some small modification
    let me easy to compare different langure perforamce and resource usage

    BR
    Rolance

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From et99@21:1/5 to [email protected] on Mon Apr 24 09:10:34 2023
    On 4/24/2023 5:02 AM, [email protected] wrote:
    Ralf Fassel 在 2023年4月24日 星期一晚上10:07:32 [UTC+12] 的信中寫道:
    * "[email protected]" <[email protected]>
    | already transfer to proc in two lang.

    | tcl : total time 0.0968s (twice....)
    | python: total time: 0.0496496

    | Could someone point me the key affect factor ? data passing issue ?
    --<snip-snip>--

    | # Generate random matrix
    | for {set i 0} {$i < $num_rows} {incr i} {
    | for {set j 0} {$j < $num_columns} {incr j} {
    | lappend matrix([expr {$i + 1}]) [expr {int(rand()*2)}]
    --<snip-snip>--
    | for {set i 0} {$i < $num_rows} {incr i} {
    | if {[lindex $matrix([expr {$i + 1}]) $j] == 1} {
    You are still using a hash-based data struct here (matrix()), this will
    slow things down compared to plain list-of-lists (variable 'rows' in
    below code):


    proc do_it_2 {} {
    set num_columns 3125
    set num_rows 100
    set num_hits 25

    # Generate random matrix
    for {set i 0} {$i < $num_rows} {incr i} {
    set row [list]
    for {set j 0} {$j < $num_columns} {incr j} {
    lappend row [expr {int(rand()*2)}]
    }
    lappend rows $row
    }

    # Find shortest sequence of 1's with given number of hits
    set shortest_seq ""
    set shortest_len $num_rows
    set shortest_pos {}
    set total_time 0
    for {set j 0} {$j < $num_columns} {incr j} {
    set hits 0
    set seq ""
    set pos {}
    set start_time [clock microseconds]
    for {set i 0} {$i < $num_rows} {incr i} {
    set elt [lindex $rows $i $j]
    if {$elt == 1} {
    incr hits
    append seq "1"
    lappend pos $i
    } else {
    append seq "0"
    }
    if {$hits >= $num_hits} {
    set seq_len [string length $seq]
    if {$seq_len < $shortest_len} {
    set shortest_len $seq_len
    set shortest_seq $seq
    set shortest_pos $pos
    }
    break
    }
    }
    set end_time [clock microseconds]
    set iteration_time [expr {$end_time - $start_time}]
    set total_time [expr {$total_time + $iteration_time}]
    }
    set avg_time [expr {$total_time / $num_columns}]
    puts "total_time: $total_time microseconds [expr $total_time/1000]ms [expr $total_time/1000000.0]s \nAverage execution time per iteration: $avg_time microseconds"
    }
    R'

    Hi Ralf

    thanks for the code
    it is the most efficient way , I will change the data structure to fit this the speed loss may in the raw data pass between multi proc
    in real program still twice low than python , will do more research ...

    @ saitology9 , et99
    thanks for your detail explain and value suggestion
    will optimize each proc

    by the way the python version gererate by chatgpt and do some small modification
    let me easy to compare different langure perforamce and resource usage

    BR
    Rolance


    chatgpt, now that's interesting. If it can write a python
    program, I wonder if it could write a C program, in particular
    a tcl C extension. That would likely be the best solution
    for performance and still stay within tcl and a single process.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)