* "[email protected]" <[email protected]>
| Could some one give me some hit or direction to improve the speed?
Putting the TCL code in a proc so it gets byte-compiled would be my
first attempt on speedup.
Ralf Fassel <[email protected]> writes:
* "[email protected]" <[email protected]>
| Could some one give me some hit or direction to improve the speed?
Putting the TCL code in a proc so it gets byte-compiled would be myI did not know that. Good to know.
first attempt on speedup.
--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
Am 21.04.2023 um 10:11 schrieb [email protected]:
Hi all
Could some one give me some hit or direction to improve the speed?
same code by different language in same machine....
the program find the minimum target 1 appear times in 100 row by 3125 column
ex matrix: 0 1 0 0 1 0 1 0 1...
1 0 1 0 0 0 0 1 1...
1 1 1 0 0 1 1 0 1..
....
...
the speed gap almost 4 times
tcl(8.6.12) :
total_time: 266766 microseconds 266ms 0.266766s
Average execution time per iteration: 85 microseconds
python(3.9):
total_time: 59829800 nanoseconds 59.8298ms 0.0598298s
Average execution time per iteration: 19145.536 nanoseconds
###tcl code ###
set num_columns 3125
set num_rows 100
set num_hits 25
# Generate random matrix
for {set i 0} {$i < $num_rows} {incr i} {
for {set j 0} {$j < $num_columns} {incr j} {
set matrix([expr {$i + 1}],$j) [expr {int(rand()*2)}]
}
}
# Find shortest sequence of 1's with given number of hits
set shortest_seq ""
set shortest_len $num_rows
set shortest_pos {}
set total_time 0
for {set j 0} {$j < $num_columns} {incr j} {
set hits 0
set seq ""
set pos {}
set start_time [clock microseconds]
for {set i 0} {$i < $num_rows} {incr i} {
if {$matrix([expr {$i + 1}],$j) == 1} {
incr hits
append seq "1"
lappend pos $i
} else {
append seq "0"
}
if {$hits >= $num_hits} {
set seq_len [string length $seq]
if {$seq_len < $shortest_len} {
set shortest_len $seq_len
set shortest_seq $seq
set shortest_pos $pos
}
break
}
}
set end_time [clock microseconds]
set iteration_time [expr {$end_time - $start_time}]
set total_time [expr {$total_time + $iteration_time}]
}
set avg_time [expr {$total_time / $num_columns}]
puts "total_time: $total_time microseconds [expr $total_time/1000]ms [expr $total_time/1000000.0]s \nAverage execution time per iteration: $avg_time microseconds"
####python code ###
import random
import time
num_iterations = 1000
num_columns = 3125
num_rows = 100
num_hits = 25
# Generate random matrix
matrix = []
for i in range(num_rows):
row = []
for j in range(num_columns):
row.append(random.randint(0, 1))
matrix.append(row)
# Find shortest sequence of 1's with given number of hits
shortest_seq = ""
shortest_len = num_rows
shortest_pos = []
total_time = 0
for j in range(num_columns):
hits = 0
seq = ""
pos = []
start_time = time.perf_counter_ns()
for i in range(num_rows):
if matrix[i+1][j] == 1:
hits += 1
seq += "1"
pos.append(i)
else:
seq += "0"
if hits >= num_hits:
seq_len = len(seq)
if seq_len < shortest_len:
shortest_len = seq_len
shortest_seq = seq
shortest_pos = pos
break
end_time = time.perf_counter_ns()
iteration_time = end_time - start_time
total_time += iteration_time
avg_time = total_time / num_columns
print(f"total_time: {total_time} nanoseconds {total_time/1000000.0}ms {total_time/1000000000.0}s")
print(f"Average execution time per iteration: {avg_time} nanoseconds")
Hi HaraldBRRolance,
Rolance
thank you for the question. The principle difference I see between the programs is the way, the matrix is done:
TCL: matrix([expr {$i + 1}],$j)
Python: matrix[i+1][j]
So, on TCL, a hash is used to access the elements. I suppose, on Python,
you have index access.
So, it might be advisable to use a list of lists on TCL to also have
index access.
But I am sure, numeric wizards will answer here and give solutions using VecTCL, which has a native matrix type.
Harald
Hi all
Could some one give me some hit or direction to improve the speed?
same code by different language in same machine....
the program find the minimum target 1 appear times in 100 row by 3125 column
ex matrix: 0 1 0 0 1 0 1 0 1...
1 0 1 0 0 0 0 1 1...
1 1 1 0 0 1 1 0 1..
....
...
the speed gap almost 4 times
tcl(8.6.12) :
total_time: 266766 microseconds 266ms 0.266766s
Average execution time per iteration: 85 microseconds
python(3.9):
total_time: 59829800 nanoseconds 59.8298ms 0.0598298s
Average execution time per iteration: 19145.536 nanoseconds
###tcl code ###
set num_columns 3125
set num_rows 100
set num_hits 25
# Generate random matrix
for {set i 0} {$i < $num_rows} {incr i} {
for {set j 0} {$j < $num_columns} {incr j} {
set matrix([expr {$i + 1}],$j) [expr {int(rand()*2)}]
}
}
# Find shortest sequence of 1's with given number of hits
set shortest_seq ""
set shortest_len $num_rows
set shortest_pos {}
set total_time 0
for {set j 0} {$j < $num_columns} {incr j} {
set hits 0
set seq ""
set pos {}
set start_time [clock microseconds]
for {set i 0} {$i < $num_rows} {incr i} {
if {$matrix([expr {$i + 1}],$j) == 1} {
incr hits
append seq "1"
lappend pos $i
} else {
append seq "0"
}
if {$hits >= $num_hits} {
set seq_len [string length $seq]
if {$seq_len < $shortest_len} {
set shortest_len $seq_len
set shortest_seq $seq
set shortest_pos $pos
}
break
}
}
set end_time [clock microseconds]
set iteration_time [expr {$end_time - $start_time}]
set total_time [expr {$total_time + $iteration_time}]
}
set avg_time [expr {$total_time / $num_columns}]
puts "total_time: $total_time microseconds [expr $total_time/1000]ms [expr $total_time/1000000.0]s \nAverage execution time per iteration: $avg_time microseconds"
####python code ###
import random
import time
num_iterations = 1000
num_columns = 3125
num_rows = 100
num_hits = 25
# Generate random matrix
matrix = []
for i in range(num_rows):
row = []
for j in range(num_columns):
row.append(random.randint(0, 1))
matrix.append(row)
# Find shortest sequence of 1's with given number of hits
shortest_seq = ""
shortest_len = num_rows
shortest_pos = []
total_time = 0
for j in range(num_columns):
hits = 0
seq = ""
pos = []
start_time = time.perf_counter_ns()
for i in range(num_rows):
if matrix[i+1][j] == 1:
hits += 1
seq += "1"
pos.append(i)
else:
seq += "0"
if hits >= num_hits:
seq_len = len(seq)
if seq_len < shortest_len:
shortest_len = seq_len
shortest_seq = seq
shortest_pos = pos
break
end_time = time.perf_counter_ns()
iteration_time = end_time - start_time
total_time += iteration_time
avg_time = total_time / num_columns
print(f"total_time: {total_time} nanoseconds {total_time/1000000.0}ms {total_time/1000000000.0}s")
print(f"Average execution time per iteration: {avg_time} nanoseconds")
BR
Rolance
* "[email protected]" <[email protected]>
| Cecil Westerhof 在 2023年4月21日 星期五晚上9:44:06 [UTC+12] 的信中寫道:
| > Ralf Fassel <[email protected]> writes:
| >
| > > * "[email protected]" <[email protected]>
| > > | Could some one give me some hit or direction to improve the speed?
| > >
| > > Putting the TCL code in a proc so it gets byte-compiled would be my
| > > first attempt on speedup.
--<snip-snip>--
| Hi Ralf
| thanks for your advice , in proc the consume time
| 163413 microseconds per iteration reduce to half ,
On my machine, using a list-based approach as Harald pointed out (and
inside a proc) I get comparable speeds for the TCL and python code:
your original code
total_time: 193136 microseconds 193ms 0.193136s
Average execution time per iteration: 61 microseconds
your original code inside proc
total_time: 94736 microseconds 94ms 0.094736s
Average execution time per iteration: 30 microseconds
using list ("lappend rows $row" instead of set matrix([expr]) and
"lindex $row $column" instead of $matrix([]))
total_time: 69328 microseconds 69ms 0.069328s
Average execution time per iteration: 22 microseconds
python as-is (had to change 'perf_counter_ns' to 'perf_counter' since my python did not know perf_counter_ns):
python3 t.py
total_time: 0.0770318127470091 seconds [...]
Average execution time per iteration: 2.465018007904291e-05 seconds
HTH
R'
Hi all
Could some one give me some hit or direction to improve the speed?
same code by different language in same machine....
the program find the minimum target 1 appear times in 100 row by 3125 column
# add up the timing differentials
set total_time [lindex $timings 0]
Could some one give me some hit or direction to improve the speed?
same code by different language in same machine....
the program find the minimum target 1 appear times in 100 row by 3125 column
ex matrix: 0 1 0 0 1 0 1 0 1...
1 0 1 0 0 0 0 1 1...
1 1 1 0 0 1 1 0 1..
....
...
On 4/21/2023 1:02 PM, saitology9 wrote:Hi saitology9
# add up the timing differentialsThere is a typo in this line which is in time_it.
set total_time [lindex $timings 0]
Please change it as follows:
set total_time 0
saitology9 在 2023年4月22日 星期六清晨5:06:47 [UTC+12] 的信中寫道:
On 4/21/2023 1:02 PM, saitology9 wrote:
# add up the timing differentialsThere is a typo in this line which is in time_it.
set total_time [lindex $timings 0]
Please change it as follows:
set total_time 0Hi saitology9
thanks for your advice
test result :
time_it 100 3125 25 100
RUN 1:
================
total_time: 115791 microseconds 115ms
0.115791s
Average execution time per iteration:
37 microseconds
RUN 2:
================
total_time: 105553 microseconds 105ms
0.105553s
Average execution time per iteration:
33 microseconds
as now below method still be the best solution:
change to "lappend rows $row" method can be
total_time: 66394 microseconds per iteration
BR
Rolance
[email protected] 在 2023年4月22日 星期六上午9:19:00 [UTC+12] 的信中寫道:
saitology9 在 2023年4月22日 星期六清晨5:06:47 [UTC+12] 的信中寫道:
On 4/21/2023 1:02 PM, saitology9 wrote:Hi saitology9
# add up the timing differentialsThere is a typo in this line which is in time_it.
set total_time [lindex $timings 0]
Please change it as follows:
set total_time 0
thanks for your advice
test result :
time_it 100 3125 25 100
RUN 1:
================
total_time: 115791 microseconds 115ms
0.115791s
Average execution time per iteration:
37 microseconds
RUN 2:
================
total_time: 105553 microseconds 105ms
0.105553s
Average execution time per iteration:
33 microseconds
as now below method still be the best solution:
change to "lappend rows $row" method can be
total_time: 66394 microseconds per iteration
BR
Rolance
hi all
already transfer to proc in two lang.
tcl : total time 0.0968s (twice....)
python: total time: 0.0496496
Could someone point me the key affect factor ? data passing issue ?
#### tcl ####
#set num_iterations 1000
set num_columns 3125
set num_rows 100
set num_hits 25
# Generate random matrix
for {set i 0} {$i < $num_rows} {incr i} {
for {set j 0} {$j < $num_columns} {incr j} {
lappend matrix([expr {$i + 1}]) [expr {int(rand()*2)}]
}
}
proc find_shortest_hit_org {num_hits mx} {
array set matrix $mx
set num_columns [llength $matrix(1)]
set num_rows [llength [array names matrix]]
# Find shortest sequence of 1's with given number of hits
set shortest_seq ""
set shortest_len $num_rows
set shortest_pos ""
set seq_lst ""
for {set j 0} {$j < $num_columns} {incr j} {
set hits 0
set seq ""
set pos_list ""
set act_hit 0
for {set i 0} {$i < $num_rows} {incr i} {
if {[lindex $matrix([expr {$i + 1}]) $j] == 1} {
incr hits
append seq "1"
lappend pos_list $i
} else {
append seq "0"
}
if {$hits >= $num_hits} {
set seq_len [string length $seq]
if {$seq_len < $shortest_len} {
set shortest_len $seq_len
set shortest_seq $seq
set shortest_pos [join $pos_list ","]
}
if {[string index $seq end] == 0} {set act_hit 1 ; break}
}
}
if {$act_hit == 1} {
lappend seq_lst "$j [string length $seq]"
}
}
return [lrange [lsort -index 1 -increasing -integer $seq_lst] 0 9]
}
set be [clock microseconds]
find_shortest_hit_org 25 [array get matrix]
puts "total time [expr {([clock microseconds] -$be)/1000000.0}]s"
### python ####
import random
import time
num_iterations = 1000
num_columns = 3125
num_rows = 100
num_hits = 25
# Generate random matrix
matrix = []
for i in range(num_rows):
row = []
for j in range(num_columns):
row.append(random.randint(0, 1))
matrix.append(row)
def find_shortest_hit(num_hits, mx):
#matrix = dict(mx)
matrix = mx
#print(matrix[1])
num_columns = len(matrix[1])
num_rows = len(matrix)
shortest_seq = ''
shortest_len = num_rows
shortest_pos = ''
seq_lst = []
for j in range(num_columns):
hits = 0
seq = ''
pos_list = []
act_hit = 0
for i in range(num_rows):
#print(matrix[i+1][j])
if matrix[i+1][j] == 1:
hits += 1
seq += '1'
pos_list.append(str(i))
else:
seq += '0'
if hits >= num_hits:
seq_len = len(seq)
if seq_len < shortest_len:
shortest_len = seq_len
shortest_seq = seq
shortest_pos = ','.join(pos_list)
if seq[-1] == '0':
act_hit = 1
break
if act_hit == 1:
seq_lst.append((j, len(seq)))
return sorted(seq_lst, key=lambda x: x[1])[:10]
start_time = time.perf_counter_ns()
find_shortest_hit(25, matrix)
end_time = time.perf_counter_ns()
iteration_time = end_time - start_time
print(f"total time: {iteration_time/1000000000}")
BR
Rolance
as now below method still be the best solution:
change to "lappend rows $row" method can be
total_time: 66394 microseconds per iteration
Hi saitology9
thanks for your advice
test result :
time_it 100 3125 25 100
RUN 1:
================
total_time: 115791 microseconds 115ms
0.115791s
Average execution time per iteration:
37 microseconds
RUN 2:
================
total_time: 105553 microseconds 105ms
0.105553s
Average execution time per iteration:
33 microseconds
as now below method still be the best solution:
change to "lappend rows $row" method can be
total_time: 66394 microseconds per iteration
You could consider using tcl threads.
It would appear that the calculation can be factored
into several parts that could be done concurrently.
You could setup the matrix using tsv::set one time
and call on several threads to each search some
set of rows or columns, and then collate the results.
Even my 2 year old intel computer has 12 cores.
On 4/21/2023 8:44 PM, et99 wrote:
You could consider using tcl threads.
It would appear that the calculation can be factored
into several parts that could be done concurrently.
You could setup the matrix using tsv::set one time
and call on several threads to each search some
set of rows or columns, and then collate the results.
Even my 2 year old intel computer has 12 cores.
The OP is concerned about a run-time difference of less than 0.05 of a second. No reason to time it now but I would suspect that the thread setup would eat a good portion of any savings it may generate at this threshold.
The question could be why does this difference matter?
@Rolance:keeping the data set identical for both programs. You can save the matrix to a file and have both programs read it in before doing their calculations.
Another potential reason for the difference: Tcl's random number generator may be better than Python's so the algorithm spends more time on each row/column, whereas the Python version may be hitting the break quite early in each loop. I would suggest
On 4/21/2023 8:44 PM, et99 wrote:
You could consider using tcl threads.
It would appear that the calculation can be factored
into several parts that could be done concurrently.
You could setup the matrix using tsv::set one time
and call on several threads to each search some
set of rows or columns, and then collate the results.
Even my 2 year old intel computer has 12 cores.
The OP is concerned about a run-time difference of less than 0.05 of a second. No reason to time it now but I would suspect that the thread
setup would eat a good portion of any savings it may generate at this threshold.
The question could be why does this difference matter?
@Rolance:
Another potential reason for the difference: Tcl's random number
generator may be better than Python's so the algorithm spends more time
on each row/column, whereas the Python version may be hitting the break quite early in each loop. I would suggest keeping the data set
identical for both programs. You can save the matrix to a file and have
both programs read it in before doing their calculations.
On 4/21/2023 7:05 PM, saitology9 wrote:
On 4/21/2023 8:44 PM, et99 wrote:
You could consider using tcl threads.
It would appear that the calculation can be factored
into several parts that could be done concurrently.
You could setup the matrix using tsv::set one time
and call on several threads to each search some
set of rows or columns, and then collate the results.
Even my 2 year old intel computer has 12 cores.
The OP is concerned about a run-time difference of less than 0.05 of a second. No reason to time it now but I would suspect that the thread setup would eat a good portion of any savings it may generate at this threshold.
The question could be why does this difference matter?
suggest keeping the data set identical for both programs. You can save the matrix to a file and have both programs read it in before doing their calculations.@Rolance:
Another potential reason for the difference: Tcl's random number generator may be better than Python's so the algorithm spends more time on each row/column, whereas the Python version may be hitting the break quite early in each loop. I would
Yes, you are right, it depends on whether the OP's example was a
small test case or not.
I guessed if OP was concerned with mere 100s of microseconds,
it was small and/or one of many to be computed.
I substituted in a tsv shared matrix to see what overhead was there,
which was only about 15% when I bumped up the rows/columns.
This is also an example of where an efficient bit/bitfield array
mechanism would be useful. Maybe Brian's lseq technology might
be adapted in the future.
You can improve speed in TCL (and probably python as well) by:pointer, and increments a reference count).
stop searching down any column once the position is past the current shortest sequence found.
when a new short sequence is found, save only enough information to generate the result. Only when all columns are searched, do the work to generate the result. Saving copies of even large structures is very fast in TCL (the C code just copies of a
In TCL if the data is organized as a list of sequences (which are lists of 0 and 1) the foreach command can be used to process the list of columns and the list of bits in each column.
I'm seeing about a 10x speed up over the original code by using these suggestions, where both are procedures (to get them compiled).
Dave B
the random data for test proc running , and the result speed same as real data
Could point me the last proc I post , where can improve.
[email protected] 在 2023年4月22日 星期六晚上8:04:24 [UTC+12] 的信中寫道:pointer, and increments a reference count).
You can improve speed in TCL (and probably python as well) by:
stop searching down any column once the position is past the current shortest sequence found.
when a new short sequence is found, save only enough information to generate the result. Only when all columns are searched, do the work to generate the result. Saving copies of even large structures is very fast in TCL (the C code just copies of a
In TCL if the data is organized as a list of sequences (which are lists of 0 and 1) the foreach command can be used to process the list of columns and the list of bits in each column.
I'm seeing about a 10x speed up over the original code by using these suggestions, where both are procedures (to get them compiled).
Dave B
Hi Dave
thanks for your advice , already change the search method ,as you mention get the ideal speed
what I post the topic is search for more help , if the search method no room to improve , how to improve tcl performance ...
different lang have performance gap..
Could you point out my code where to improve with same search method.
BR
Rolance
On 4/22/2023 2:26 AM, [email protected] wrote:
the random data for test proc running , and the result speed same as real data
Could point me the last proc I post , where can improve.
You can squeeze out a few microseconds from these changes:
- instead of sending the whole matrix to find_shortest_hit_org, you can
use a global or upvar statement.
- you can improve the looping over the rows and elimintate the expr on
the row index. Your statement would look like this:
---
for {set i 1} {$i <= $num_rows} {incr i} {
if {[lindex $matrix($i) $j] == 1} {
---
Other than that, I am not sure what the code does. Others have asked
for a description and have gotten none. So, don't know what else to say.
Finally, you may want to check out a solution based on regexp. Maybe
someone with more regexp chops can help.
On 4/22/2023 2:28 AM, [email protected] wrote:pointer, and increments a reference count).
[email protected] 在 2023年4月22日 星期六晚上8:04:24 [UTC+12] 的信中寫道:
You can improve speed in TCL (and probably python as well) by:
stop searching down any column once the position is past the current shortest sequence found.
when a new short sequence is found, save only enough information to generate the result. Only when all columns are searched, do the work to generate the result. Saving copies of even large structures is very fast in TCL (the C code just copies of a
In TCL if the data is organized as a list of sequences (which are lists of 0 and 1) the foreach command can be used to process the list of columns and the list of bits in each column.
I'm seeing about a 10x speed up over the original code by using these suggestions, where both are procedures (to get them compiled).
Dave B
Hi Dave
thanks for your advice , already change the search method ,as you mention get the ideal speed
what I post the topic is search for more help , if the search method no room to improve , how to improve tcl performance ...
different lang have performance gap..
Could you point out my code where to improve with same search method.
BRAs Dave suggested, using a list of lists is likely the best bet. Then
Rolance
you could use [lindex $matrix $i $j] for the test, and don't test against being = to 1, just do a boolean test.
you can see the bytecode with
tcl::unsupported::disassemble proc find_shortest_hit_org
or
tcl::unsupported::disassemble script {if {[lindex $lst 1 1]} {set x 1}}
where one would see (partially shown):
...
Command 1: "if {[lindex $lst 1 1]} {set x 1}"
Command 2: "lindex $lst 1 1..."
(0) push1 0 # "lst"
(2) loadStk
(3) push1 1 # "1"
(5) push1 1 # "1"
(7) lindexMulti 3
(12) nop
(13) jumpFalse1 +9 # pc 22
so there is a specific byte code for an lindex with more than 1 index
and you can use the [time] command to just test small script code, say
to compare an array lookup vs. a list lookup.
et99 在 2023年4月23日 星期日凌晨4:39:55 [UTC+12] 的信中寫道:pointer, and increments a reference count).
On 4/22/2023 2:28 AM, [email protected] wrote:
[email protected] 在 2023年4月22日 星期六晚上8:04:24 [UTC+12] 的信中寫道:
You can improve speed in TCL (and probably python as well) by:
stop searching down any column once the position is past the current shortest sequence found.
when a new short sequence is found, save only enough information to generate the result. Only when all columns are searched, do the work to generate the result. Saving copies of even large structures is very fast in TCL (the C code just copies of a
As Dave suggested, using a list of lists is likely the best bet. Then
In TCL if the data is organized as a list of sequences (which are lists of 0 and 1) the foreach command can be used to process the list of columns and the list of bits in each column.
I'm seeing about a 10x speed up over the original code by using these suggestions, where both are procedures (to get them compiled).
Dave B
Hi Dave
thanks for your advice , already change the search method ,as you mention get the ideal speed
what I post the topic is search for more help , if the search method no room to improve , how to improve tcl performance ...
different lang have performance gap..
Could you point out my code where to improve with same search method.
BR
Rolance
you could use [lindex $matrix $i $j] for the test, and don't test against
being = to 1, just do a boolean test.
you can see the bytecode with
tcl::unsupported::disassemble proc find_shortest_hit_org
or
tcl::unsupported::disassemble script {if {[lindex $lst 1 1]} {set x 1}}
where one would see (partially shown):
...
Command 1: "if {[lindex $lst 1 1]} {set x 1}"
Command 2: "lindex $lst 1 1..."
(0) push1 0 # "lst"
(2) loadStk
(3) push1 1 # "1"
(5) push1 1 # "1"
(7) lindexMulti 3
(12) nop
(13) jumpFalse1 +9 # pc 22
so there is a specific byte code for an lindex with more than 1 index
and you can use the [time] command to just test small script code, say
to compare an array lookup vs. a list lookup.
hi et99
thanks your suggestion , will try and report later.
BR
Rolance
On 4/22/2023 2:54 PM, [email protected] wrote:pointer, and increments a reference count).
et99 在 2023年4月23日 星期日凌晨4:39:55 [UTC+12] 的信中寫道: >>> On 4/22/2023 2:28 AM, [email protected] wrote:
[email protected] 在 2023年4月22日 星期六晚上8:04:24 [UTC+12] 的信中寫道:
You can improve speed in TCL (and probably python as well) by:
stop searching down any column once the position is past the current shortest sequence found.
when a new short sequence is found, save only enough information to generate the result. Only when all columns are searched, do the work to generate the result. Saving copies of even large structures is very fast in TCL (the C code just copies of a
As Dave suggested, using a list of lists is likely the best bet. Then
In TCL if the data is organized as a list of sequences (which are lists of 0 and 1) the foreach command can be used to process the list of columns and the list of bits in each column.
I'm seeing about a 10x speed up over the original code by using these suggestions, where both are procedures (to get them compiled).
Dave B
Hi Dave
thanks for your advice , already change the search method ,as you mention get the ideal speed
what I post the topic is search for more help , if the search method no room to improve , how to improve tcl performance ...
different lang have performance gap..
Could you point out my code where to improve with same search method.
BR
Rolance
you could use [lindex $matrix $i $j] for the test, and don't test against >>> being = to 1, just do a boolean test.
you can see the bytecode with
tcl::unsupported::disassemble proc find_shortest_hit_org
or
tcl::unsupported::disassemble script {if {[lindex $lst 1 1]} {set x 1}}
where one would see (partially shown):
...
Command 1: "if {[lindex $lst 1 1]} {set x 1}"
Command 2: "lindex $lst 1 1..."
(0) push1 0 # "lst"
(2) loadStk
(3) push1 1 # "1"
(5) push1 1 # "1"
(7) lindexMulti 3
(12) nop
(13) jumpFalse1 +9 # pc 22
so there is a specific byte code for an lindex with more than 1 index
and you can use the [time] command to just test small script code, say
to compare an array lookup vs. a list lookup.
hi et99
thanks your suggestion , will try and report later.
BR
Rolance
I did a test using the [lindex $matrix $i $j] and found about a 40% reduction in time.
It appears that the overhead is beginning to get down to the "if test" and the
"for loop".
This is why Dave is suggesting you use foreach, which has less to do on each iteration, since the test for the end and incrementing the index is effectively
done in the C code whereas a for loop does that in script code:
% tcl::unsupported::disassemble script {for {set i 1} {$i < $num_rows} {incr i} {doit}}
ByteCode 0x05902750, refCt 1, epoch 17, interp 0x00939060 (epoch 17)
Source "for {set i 1} {$i < $num_rows} {incr i} {doit}"
Cmds 4, src 46, inst 39, litObjs 5, aux 0, stkDepth 2, code/src 0.00
Exception ranges 2, depth 1:
0: level 0, loop, pc 8-11, continue 13, break 36
1: level 0, loop, pc 13-25, continue -1, break 36
Commands 4:
1: pc 0-37, src 0-45 2: pc 0-4, src 5-11
3: pc 8-11, src 41-44 4: pc 13-25, src 32-37
Command 1: "for {set i 1} {$i < $num_rows} {incr i} {doit}"
Command 2: "set i 1..."
(0) push1 0 # "i"
(2) push1 1 # "1"
(4) storeStk
(5) pop
(6) jump1 +21 # pc 27
Command 3: "doit..."
(8) push1 2 # "doit"
(10) invokeStk1 1
(12) pop
Command 4: "incr i..."
(13) startCommand +13 1 # next cmd at pc 26, 1 cmds start here
(22) push1 0 # "i"
(24) incrStkImm +1
(26) pop
(27) push1 0 # "i"
(29) loadStk
(30) push1 3 # "num_rows"
(32) loadStk
(33) lt
(34) jumpTrue1 -26 # pc 8
(36) push1 4 # ""
(38) done
% tcl::unsupported::disassemble script {foreach i $list {doit}}
ByteCode 0x05902650, refCt 1, epoch 17, interp 0x00939060 (epoch 17)
Source "foreach i $list {doit}"
Cmds 1, src 22, inst 12, litObjs 4, aux 0, stkDepth 4, code/src 0.00
Commands 1:
1: pc 0-10, src 0-21
Command 1: "foreach i $list {doit}"
(0) push1 0 # "foreach"
(2) push1 1 # "i"
(4) push1 2 # "list"
(6) loadStk
(7) push1 3 # "doit"
(9) invokeStk1 4
(11) done
I've timed the execution from entry to the proc to just before writing the result to the screen.
On my machine for the original code as a proc:
proc ex time 184 ms
inner loop time 58 ms
overhead time 126 ms
For my optimized case (list of lists, foreach ans early out)
proc ex time 87 ms
inner loop time 8 ms
overhead time 79 ms
The overhead is creating the data to be analyzed, and the outer loops.
In the optimized case about 90% of the run time is overhead, formatting the data.
This likely applies to python as well
I suspect this is the case in your actual application too.
If you want more speed, you need to minimize the work done to process
the raw data into what you analyze.
The test code should be modified to use that raw data
(or at least minimize the processing of the raw data before analysis).
Tcl has very good string processing commands, and can handle byte strings as well.
Perhaps using those data formats directly in the analysis code would increase overall speed.
Trying to speed up the inner loop at this point is likely wasted effort.
Dave B
Hi saitology9
thanks for advice , this is a big data analysis program
the data already pre-scan by customer 's rule , 1 : hit , 0 : non-hit already separate several part to save time..
orginal program develop by Tcl , the search speed may less than other langurage (C++) customer tell me...
by the budget , no value to rewrite all program to other to improve
as i post before , same data structure by diferent lang. have obviously speed gap...
see the ram usage tcl may use more than others , and less machine overload ....
may the org code have room to improve.... if have other extension to speed up ...
On 4/22/2023 5:50 PM, [email protected] wrote:
Hi saitology9
thanks for advice , this is a big data analysis program
the data already pre-scan by customer 's rule , 1 : hit , 0 : non-hit already separate several part to save time..
orginal program develop by Tcl , the search speed may less than other langurage (C++) customer tell me...
by the budget , no value to rewrite all program to other to improve
as i post before , same data structure by diferent lang. have obviously speed gap...
see the ram usage tcl may use more than others , and less machine overload ....
may the org code have room to improve.... if have other extension to speed up ...
Thanks. Alas, you are not describing what the program is trying to do.
Also the improvements we can suggest, we can only test againts the base version, and it looks like you are testing it in the real app which introduces a lot of unknowns for us so we can't see what helps and what doesn't.
A few suggestions as the code you posted can use improvements:
1) See if you can switch the loop order to rows first. You could go from 3125 loops to 100.
2) Modify your loop indexes to start from 1 and avoid extra "expr" statements.
3) No reason for "append seq 0/1". "seq" is simply the last character,
which is the value of the cell in the if-statement.
4) Likewise, no reason to do seq_len (twice!), as it is simply the row index, which is "i".
5) No reason to calculate pos_list and shortest_pos.
6) No reason to do "string length $seq" after breaking out of the loop.
Just put the "lappend ..." right before the "break".
this part is for customer verify
the part for customer rule , need record and check next item with non-hit
On 4/23/2023 8:47 AM, [email protected] wrote:
this part is for customer verify
:-)
the part for customer rule , need record and check next item with non-hit
The info is already available at that point. You can put it in the result you return.
Final suggestion:
See if you can put your data in a list indexed by column.
So your data will look like this:
matrix(0) {0 1 0 0 ....}
matrix(1) {1 0 0 1 ....}
Here is the code to generate this:
for {set j 0} {$j < $num_columns} {incr j} {
for {set i 0} {$i < $num_rows} {incr i} {
lappend matrix($j) [expr {int(rand()*2)}]
}
}
Then you prepare a "key" - this is what you are looking for:
set key [string trim [string repeat "1 " $shortest_len]]
Now, we can simplify your search. This should give you the best results:
for {set j 0} {$j < $num_columns} {incr j} {
set hits 0
set seq ""
set pos {}
set pos [string first $matrix($j) $key]
if {$pos >= 0} {
## we found the answer
## it is at: row=$pos, col=$j
## do whatever you need with it
break
}
}
* "[email protected]" <[email protected]>
| already transfer to proc in two lang.
| tcl : total time 0.0968s (twice....)
| python: total time: 0.0496496
| Could someone point me the key affect factor ? data passing issue ? --<snip-snip>--
| # Generate random matrix
| for {set i 0} {$i < $num_rows} {incr i} {
| for {set j 0} {$j < $num_columns} {incr j} {
| lappend matrix([expr {$i + 1}]) [expr {int(rand()*2)}]
--<snip-snip>--
| for {set i 0} {$i < $num_rows} {incr i} {
| if {[lindex $matrix([expr {$i + 1}]) $j] == 1} {
You are still using a hash-based data struct here (matrix()), this will
slow things down compared to plain list-of-lists (variable 'rows' in
below code):
proc do_it_2 {} {
set num_columns 3125
set num_rows 100
set num_hits 25
# Generate random matrix
for {set i 0} {$i < $num_rows} {incr i} {
set row [list]
for {set j 0} {$j < $num_columns} {incr j} {
lappend row [expr {int(rand()*2)}]
}
lappend rows $row
}
# Find shortest sequence of 1's with given number of hits
set shortest_seq ""
set shortest_len $num_rows
set shortest_pos {}
set total_time 0
for {set j 0} {$j < $num_columns} {incr j} {
set hits 0
set seq ""
set pos {}
set start_time [clock microseconds]
for {set i 0} {$i < $num_rows} {incr i} {
set elt [lindex $rows $i $j]
if {$elt == 1} {
incr hits
append seq "1"
lappend pos $i
} else {
append seq "0"
}
if {$hits >= $num_hits} {
set seq_len [string length $seq]
if {$seq_len < $shortest_len} {
set shortest_len $seq_len
set shortest_seq $seq
set shortest_pos $pos
}
break
}
}
set end_time [clock microseconds]
set iteration_time [expr {$end_time - $start_time}]
set total_time [expr {$total_time + $iteration_time}]
}
set avg_time [expr {$total_time / $num_columns}]
puts "total_time: $total_time microseconds [expr $total_time/1000]ms [expr $total_time/1000000.0]s \nAverage execution time per iteration: $avg_time microseconds"
}
R'
Ralf Fassel 在 2023年4月24日 星期一晚上10:07:32 [UTC+12] 的信中寫道:
* "[email protected]" <[email protected]>
| already transfer to proc in two lang.
| tcl : total time 0.0968s (twice....)
| python: total time: 0.0496496
| Could someone point me the key affect factor ? data passing issue ?
--<snip-snip>--
| # Generate random matrix
| for {set i 0} {$i < $num_rows} {incr i} {
| for {set j 0} {$j < $num_columns} {incr j} {
| lappend matrix([expr {$i + 1}]) [expr {int(rand()*2)}]
--<snip-snip>--
| for {set i 0} {$i < $num_rows} {incr i} {
| if {[lindex $matrix([expr {$i + 1}]) $j] == 1} {
You are still using a hash-based data struct here (matrix()), this will
slow things down compared to plain list-of-lists (variable 'rows' in
below code):
proc do_it_2 {} {
set num_columns 3125
set num_rows 100
set num_hits 25
# Generate random matrix
for {set i 0} {$i < $num_rows} {incr i} {
set row [list]
for {set j 0} {$j < $num_columns} {incr j} {
lappend row [expr {int(rand()*2)}]
}
lappend rows $row
}
# Find shortest sequence of 1's with given number of hits
set shortest_seq ""
set shortest_len $num_rows
set shortest_pos {}
set total_time 0
for {set j 0} {$j < $num_columns} {incr j} {
set hits 0
set seq ""
set pos {}
set start_time [clock microseconds]
for {set i 0} {$i < $num_rows} {incr i} {
set elt [lindex $rows $i $j]
if {$elt == 1} {
incr hits
append seq "1"
lappend pos $i
} else {
append seq "0"
}
if {$hits >= $num_hits} {
set seq_len [string length $seq]
if {$seq_len < $shortest_len} {
set shortest_len $seq_len
set shortest_seq $seq
set shortest_pos $pos
}
break
}
}
set end_time [clock microseconds]
set iteration_time [expr {$end_time - $start_time}]
set total_time [expr {$total_time + $iteration_time}]
}
set avg_time [expr {$total_time / $num_columns}]
puts "total_time: $total_time microseconds [expr $total_time/1000]ms [expr $total_time/1000000.0]s \nAverage execution time per iteration: $avg_time microseconds"
}
R'
Hi Ralf
thanks for the code
it is the most efficient way , I will change the data structure to fit this the speed loss may in the raw data pass between multi proc
in real program still twice low than python , will do more research ...
@ saitology9 , et99
thanks for your detail explain and value suggestion
will optimize each proc
by the way the python version gererate by chatgpt and do some small modification
let me easy to compare different langure perforamce and resource usage
BR
Rolance
| Sysop: | Keyop |
|---|---|
| Location: | Huddersfield, West Yorkshire, UK |
| Users: | 715 |
| Nodes: | 16 (0 / 16) |
| Uptime: | 162:58:05 |
| Calls: | 12,095 |
| Calls today: | 3 |
| Files: | 15,000 |
| Messages: | 6,517,783 |