Forum: >>> Magnum BBS <<<

Re: Thread with -async exits prematurely

From Rich@21:1/5 to Luis Mendes on Wed Jun 19 15:02:28 2024

Luis Mendes <[email protected]> wrote:

Hi all!

My program works fine when thread::send doesn't use the -async
option. When it does, all of the created threads exit prematurely.

The pseudo-code I have is this:

Working code that you've tested to exhibit the bug you see is
preferable, and your code was *very* close....

===== main file

while 1 {
...
while {nr_live_threads < nr_max_threads} {

This will error out as a syntax error. You want to both initialize
these variables before you use them, and to interpolate them using $
above.
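A minimal sketch of the difference, using made-up starting values (0 and 4) purely for illustration: the bareword form raises an error inside expr, while the $-substituted form evaluates normally.

```tcl
# Hypothetical starting values, chosen only for this illustration
set nr_live_threads 0
set nr_max_threads 4

# Without $, expr sees barewords rather than variable values and errors out
set failed [catch {expr {nr_live_threads < nr_max_threads}} msg]
puts "bareword form failed: $failed"

# With $, the variables are substituted and the comparison works
set ok [expr {$nr_live_threads < $nr_max_threads}]
puts "substituted form result: $ok"
```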

set tid [thread::create $init_script]
thread::send -async $tid [list sourceFiles ....]
}
after 10000
}

You never increment nr_live_threads, so the loop above will (assuming
the variables were initialized, and referenced, correctly) simply loop
forever, creating new threads. At least until the whole process is
killed for using up all free memory.

===== oo.tcl
namespace eval ns0 {
proc runAnsible {...} {
Parse new ...
vwait ::exit_flag
}
}

You never signal to the master thread that this thread has exited, so
the master (as written here) will never launch a new thread when an
existing one finishes.

This comprises the important parts of the script, I think.
When thread::send does not use `-async`, the `vwait ::exit_flag` works and the thread is run until the end.
With `-async`, the thread exits shortly after the `thread::send` command.

Something must be different in the "pseudo" code vs. your real code
then.

I've read about `thread::preserve` and `thread::release`, but interpreted
it as necessary when threads have to be orchestrated and some may be dependent on the results of others.

No, those are to do reference counting for thread cleanup.
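As a rough illustration of that reference counting (a sketch, assuming a Thread package version that supports the -preserved and -wait options): a thread created with -preserved starts with a count of 1, and it is torn down once thread::release drops the last reference.

```tcl
package require Thread

# -preserved starts the thread with its reference count already at 1
set tid [thread::create -preserved]
set before [thread::exists $tid]

# Dropping the last reference marks the thread for termination;
# -wait blocks until the thread has actually exited
thread::release -wait $tid
set after_release [thread::exists $tid]

puts "exists before release: $before, after release: $after_release"
```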

What I want is really to have several threads launched in the same moment,
at each run of the while loop that checks if the number of active threads
is less than the nr_max_threads.
How can that be accomplished?

Well, first, you have to communicate the exit of a child thread back to
the main thread, and have that comm path decrement "nr_active" (and you
also need to increment nr_active when you launch a new thread).

A syntax-cleaned and simplified version of your original code that *actually runs*:

thread-test:
#!/usr/bin/tclsh

package require Thread

set nr_max_threads 4
set nr_live_threads 0

set init_script {
puts stderr "Thread: [thread::id] Init: creating sourceFiles"
proc sourceFiles {args} {
source oo.tl
ns0::runAnsible $args
}
puts stderr "Thread: [thread::id] waiting"
thread::wait
puts stderr "Thread: [thread::id] out of wait"
}

while 1 {
while {$nr_live_threads < $nr_max_threads} {
puts stderr "Main: live=$nr_live_threads max=$nr_max_threads"
set tid [thread::create $init_script]
incr nr_live_threads
thread::send -async $tid [list sourceFiles ....]
}
puts stderr "Main: sleeping for 10s"
after 10000
puts stderr "Main: awake"
}

oo.tl:
namespace eval ns0 {
proc runAnsible {...} {
puts stderr "Thread [thread::id]: executing parse new"
Parse new ...
puts stderr "Thread [thread::id]: vwait begin"
vwait ::exit_flag
puts stderr "Thread [thread::id]: vwait complete"
}
}
oo::class create Parse {
constructor {...} {
set random [expr {int(rand()*20000)}]
puts stderr "Thread [thread::id]: object constructor - sleeping for $random"
after $random [list set ::exit_flag 1]
}
}

Sample run of the above:

$ ./thread-test
Main: live=0 max=4
Thread: tid0x7fbfa7b6c640 Init: creating sourceFiles
Thread: tid0x7fbfa7b6c640 waiting
Main: live=1 max=4
Thread tid0x7fbfa7b6c640: executing parse new
Thread tid0x7fbfa7b6c640: object constructor - sleeping for 8476
Thread tid0x7fbfa7b6c640: vwait begin
Thread: tid0x7fbfa6b6a640 Init: creating sourceFiles
Thread: tid0x7fbfa6b6a640 waiting
Main: live=2 max=4
Thread tid0x7fbfa6b6a640: executing parse new
Thread tid0x7fbfa6b6a640: object constructor - sleeping for 16806
Thread tid0x7fbfa6b6a640: vwait begin
Thread: tid0x7fbfa6369640 Init: creating sourceFiles
Thread: tid0x7fbfa6369640 waiting
Main: live=3 max=4
Thread tid0x7fbfa6369640: executing parse new
Thread tid0x7fbfa6369640: object constructor - sleeping for 11225
Thread tid0x7fbfa6369640: vwait begin
Thread: tid0x7fbfa5b68640 Init: creating sourceFiles
Thread: tid0x7fbfa5b68640 waiting
Main: sleeping for 10s
Thread tid0x7fbfa5b68640: executing parse new
Thread tid0x7fbfa5b68640: object constructor - sleeping for 5573
Thread tid0x7fbfa5b68640: vwait begin
Thread tid0x7fbfa5b68640: vwait complete
Thread tid0x7fbfa7b6c640: vwait complete
Main: awake
Main: sleeping for 10s
Thread tid0x7fbfa6369640: vwait complete
Thread tid0x7fbfa6b6a640: vwait complete
Main: awake
Main: sleeping for 10s
Main: awake
Main: sleeping for 10s

And, it will continue to loop saying 'awake' and 'sleeping' since the
exit of the children is never communicated to the master.

You need the master to become aware that one of the children has exited,
so it knows to launch another child. One way is to use the
optional result variable of thread::send -async and vwait on that
variable in the master.

Here is the 'diff' necessary to have the master monitor children
exiting and to launch a new child when that happens:

--- thread-test.v1 2024-06-19 10:42:34.359605931 -0400
+++ thread-test 2024-06-19 11:00:01.433949725 -0400
@@ -4,6 +4,7 @@

set nr_max_threads 4
set nr_live_threads 0
+set sync 0

set init_script {
puts stderr "Thread: [thread::id] Init: creating sourceFiles"
@@ -21,10 +22,10 @@
puts stderr "Main: live=$nr_live_threads max=$nr_max_threads"
set tid [thread::create $init_script]
incr nr_live_threads
- thread::send -async $tid [list sourceFiles ....]
+ thread::send -async $tid [list sourceFiles ....] sync
}
- puts stderr "Main: sleeping for 10s"
- after 10000
- puts stderr "Main: awake"
+ puts stderr "Main: waiting for a child to exit"
+ vwait sync
+ puts stderr "Main: a child exited"
+ incr nr_live_threads -1
}

And a sample run:

$ ./thread-test
Main: live=0 max=4
Thread: tid0x7f2d21a4b640 Init: creating sourceFiles
Thread: tid0x7f2d21a4b640 waiting
Main: live=1 max=4
Thread tid0x7f2d21a4b640: executing parse new
Thread tid0x7f2d21a4b640: object constructor - sleeping for 19992
Thread tid0x7f2d21a4b640: vwait begin
Thread: tid0x7f2d20a49640 Init: creating sourceFiles
Thread: tid0x7f2d20a49640 waiting
Main: live=2 max=4
Thread tid0x7f2d20a49640: executing parse new
Thread tid0x7f2d20a49640: object constructor - sleeping for 8316
Thread tid0x7f2d20a49640: vwait begin
Thread: tid0x7f2d1bfff640 Init: creating sourceFiles
Thread: tid0x7f2d1bfff640 waiting
Main: live=3 max=4
Thread tid0x7f2d1bfff640: executing parse new
Thread tid0x7f2d1bfff640: object constructor - sleeping for 17902
Thread tid0x7f2d1bfff640: vwait begin
Thread: tid0x7f2d1b7fe640 Init: creating sourceFiles
Thread: tid0x7f2d1b7fe640 waiting
Main: waiting for a child to exit
Thread tid0x7f2d1b7fe640: executing parse new
Thread tid0x7f2d1b7fe640: object constructor - sleeping for 12322
Thread tid0x7f2d1b7fe640: vwait begin
Thread tid0x7f2d20a49640: vwait complete
Main: a child exited
Main: live=3 max=4
Thread: tid0x7f2d1affd640 Init: creating sourceFiles
Thread: tid0x7f2d1affd640 waiting
Main: waiting for a child to exit
Thread tid0x7f2d1affd640: executing parse new
Thread tid0x7f2d1affd640: object constructor - sleeping for 7521
Thread tid0x7f2d1affd640: vwait begin
Thread tid0x7f2d1b7fe640: vwait complete
Main: a child exited
Main: live=3 max=4
Thread: tid0x7f2d1a7fc640 Init: creating sourceFiles
Thread: tid0x7f2d1a7fc640 waiting
Main: waiting for a child to exit
Thread tid0x7f2d1a7fc640: executing parse new
Thread tid0x7f2d1a7fc640: object constructor - sleeping for 9508
Thread tid0x7f2d1a7fc640: vwait begin
Thread tid0x7f2d1affd640: vwait complete
Main: a child exited
Main: live=3 max=4
Thread: tid0x7f2d19ffb640 Init: creating sourceFiles
Thread: tid0x7f2d19ffb640 waiting
Main: waiting for a child to exit
Thread tid0x7f2d19ffb640: executing parse new
Thread tid0x7f2d19ffb640: object constructor - sleeping for 13232
Thread tid0x7f2d19ffb640: vwait begin

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From et99@21:1/5 to Luis Mendes on Wed Jun 19 15:37:08 2024

On 6/19/2024 3:49 AM, Luis Mendes wrote:

Hi all!

My program works fine when thread::send doesn't use the -async option. When it does, all of the created threads exit prematurely.

The pseudo-code I have is this:

-snip-

What I want is really to have several threads launched in the same moment,
at each run of the while loop that checks if the number of active threads
is less than the nr_max_threads.
How can that be accomplished?

Thanks,

Luís

I can't tell your true intent from the pseudo code alone. As Rich said, non-working but real code is preferred here, then people can run the code and help you debug it.

Are you re-using threads or creating new ones and are you using threads as workers to process jobs? Is this going to eventually be a multi-server, single-queue implementation? Do all threads run the same code?

I have a way of doing those kinds of threads you might find interesting; it's sort of object oriented threads.

------------------------------

set script {
package require... ;# one time setups
source ....

global var1 var2 ...
proc init {args} { ;# like a constructor, one time to init the thread's vars
lassign $args ::var1 ::var2 ....
}
proc work {arg1 arg2 ...} { ;# like a method
...
return result
}
... including any oo code ...

thread::wait ;# don't exit, re-use

}

# now to create a worker thread(s) and init it:

set tid(1) [thread::create $script] ;# 2,3, ...
thread::send $tid(1) [list init value1 value2 ...] ;# similar to an OO constructor

#Then sync or async job requests:

set result($N) [thread::send $tid($N) [list work ...arglist...]] ;# sync call to thread N

# async call
unset -nocomplain result(1)
set status [thread::send -async $tid(1) [list work ...arglist...] result(1)]
# .... other stuff before waiting including other work and update or vwait calls ....
if {![info exist result(1)]} {vwait result(1)} ;#conditional vwait

------------------------------

Some notes:

Each thread has its own interpreter, so global data in a thread is not "global" to the program nor visible to other threads (or the main thread). So, unless you really need multiple namespaces in a thread, global variables might be simpler to use.

Re-use threads; you can just "call" it (like calling a method) for each new job you want done. No need to package require or "source" code more than once (per thread).

End the script with a thread::wait, with nothing after that (see manual's warning).

The unset/if not exist technique protects against any entering of the event loop before vwait-ing. Can wait for 1 or all or any combo. Can also set a write trace on a result variable in lieu of doing a vwait.

Each thread needs its own result variable (which resides in the main thread) where it both signals a job is done, and also can return the job's value (scalars, array elements, lists, dicts, etc).

You will need to keep track of the tid's and the result's by using your own thread index. You might also need to create a job queue.

This is essentially how my Tasks module works, except it hides all the above details. You can also use the tpool and ttrace packages, which have many of the same features and lots more.
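To make the per-thread result variable and the conditional vwait concrete, here is a small sketch. The `work` proc, its times-ten "job", and the choice of three threads are made up for illustration; only the thread::create, thread::send -async, vwait, and thread::release calls come from the Thread package and core Tcl.

```tcl
package require Thread

# Worker script: define a hypothetical job, then wait for requests
set script {
    proc work {n} { after 50; return [expr {$n * 10}] }
    thread::wait
}

# Start three workers, each reporting into its own array element
foreach i {1 2 3} {
    set tid($i) [thread::create $script]
    unset -nocomplain ::result($i)
    thread::send -async $tid($i) [list work $i] ::result($i)
}

# Conditional vwait: only wait if that result has not arrived yet
foreach i {1 2 3} {
    if {![info exists ::result($i)]} { vwait ::result($i) }
    puts "job $i -> $::result($i)"
    thread::release $tid($i)
}
```

Because each job writes to a different variable, a result that arrives while the main thread is waiting on another one is simply found already set on the next pass.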


From Rich@21:1/5 to Luis Mendes on Wed Jun 26 22:12:48 2024

Luis Mendes <[email protected]> wrote:

Hi Rich,

Once again, thank you very much for your help.
I managed to get the application to run in a multi-threaded way.

Still, there are a couple of things that I haven't yet understood, maybe
you (or other person) can help me figure this out.

1. Regarding vwait
As stated in https://www.tcl-lang.org/man/tcl/TclCmd/vwait.htm
"""It continues processing events until some event handler sets the value
of the global variable varName. Once varName has been set, the vwait
command will return as soon as the event handler that modified varName completes."""
This was a difficulty I had before, maybe because English is not my main language.
I thought that varName would have to change for every event handler that signaled the end of some operation, like:

It is not the "value" that triggers vwait, it is the action of
"writing" (any write of anything) to the variable.

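A tiny sketch of that point, using a made-up variable name `::done`: the handler writes the same value the variable already holds, and vwait still returns because a write happened.

```tcl
# ::done already holds 1 before we start waiting
set ::done 1

# The timer handler writes the same value again
after 50 {set ::done 1}

# vwait returns once the handler has written the variable,
# even though the value did not change
vwait ::done
puts "vwait returned, done=$::done"
```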
2. Regarding the script that you modified, I changed it a bit as well to
show what I don't understand.
The thread::names command shows all the threads that have been created
from the start of execution and not only the ones that were created in the last cycle.
And this is confirmed by thread::exists that show 1 for all of them.
I was expecting that threads would no longer exist after an event handler sets a varName for vwait.
Otherwise, we can end up with millions of threads existing at the same
time.

That is because that version does not "delete" the threads that finish
(preferably you'd reuse threads rather than delete and recreate them
over and over, so I did not try to 'release' them when they finished).

Creation of a thread takes some non-zero amount of time, so if you
reuse existing threads you amortize the creation time across each
usage. If you recreate them anew each time, you pay the cost (in time)
to create them each time you use them.
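A small sketch of the reuse pattern, with a made-up `work` proc (squaring a number) standing in for the real job: the thread is created once, called several times, and released only at the end.

```tcl
package require Thread

# Create the worker once; thread::wait keeps it alive between jobs
set worker [thread::create {
    proc work {n} { return [expr {$n * $n}] }
    thread::wait
}]

# Send several jobs to the same thread (synchronous sends, for simplicity)
set results {}
foreach n {1 2 3} {
    lappend results [thread::send $worker [list work $n]]
}

# Release the worker only when no more jobs are expected
thread::release $worker
puts $results
```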

Below is a patch for the code you posted in the last message that
actually "deletes" the threads when they 'complete':

--- thread-test2.orig 2024-06-26 17:55:57.564273473 -0400
+++ thread-test2 2024-06-26 18:04:58.037504557 -0400
@@ -21,6 +21,7 @@
proc sourceFiles {args} {
source oo.tcl
ns0::runAnsible $args
+ return [thread::id]
}
puts stderr "Thread: [thread::id] waiting"
thread::wait
@@ -33,13 +34,15 @@
puts "++++++++ cycle [incr cycle]"
while {$nr_live_threads < $nr_max_threads} {
puts stderr "Main: live=$nr_live_threads max=$nr_max_threads"
- set tid [thread::create $init_script]
+ set tid [thread::create -preserved $init_script]
incr nr_live_threads
thread::send -async $tid [list sourceFiles ....] sync
}
puts stderr "Main: waiting for a child to exit."
vwait sync
- puts stderr "Main: a child exited."
+ set exited $sync
+ puts stderr "Main: child '$exited' exited."
+ thread::release $exited
foreach tn [thread::names] {
puts "$tn\t\t[thread::exists $tn]"

From et99@21:1/5 to All on Wed Jun 26 16:56:10 2024

Luis Mendes <[email protected]> wrote:

1. Regarding vwait

-snip-

Care must be taken to -NOT- do any [update] calls or a [vwait] on another variable any time between the thread::send calls and the vwait on the variable sync - since all threads are setting the same variable. If the event loop is entered with a few queued up events to set the variable sync, then some of them will not be processed and the threads will not be killed off.

Here's an example of that happening, where the timing is such that the threads return and set sync before they are vwait'd on:

------------------------

set sync 0
package require Thread

for {set n 0} {$n < 5} {incr n} {
set tid [thread::create]
puts "created tid $tid"
thread::send -async $tid {after 120; set foo [thread::id]} ::sync
}

puts "before waiting with sync = $::sync"

set ::avar 0
after 100 {set ::avar 1}
vwait ::avar

for {set m 0} {$m < 5} {incr m} {
vwait ::sync
puts "m=$m after waiting for sync with sync now = $::sync"
}

------------------------

And here is the output of two runs:

created tid tid0000578C
created tid tid00001A68
created tid tid0000555C
created tid tid00005B14
created tid tid00000810
before waiting with sync = 0
m=0 after waiting for sync with sync now = tid00005B14
m=1 after waiting for sync with sync now = tid00000810

--------

created tid tid00003464
created tid tid0000558C
created tid tid00002A90
created tid tid000045F0
created tid tid00003418
before waiting with sync = 0
m=0 after waiting for sync with sync now = tid0000558C
m=1 after waiting for sync with sync now = tid00002A90
m=2 after waiting for sync with sync now = tid000045F0
m=3 after waiting for sync with sync now = tid00003418

--------


From Rich@21:1/5 to Luis Mendes on Fri Jun 28 16:14:07 2024

Luis Mendes <[email protected]> wrote:

Hi et99,

Thank you for your help.

Please, read below.

On Wed, 26 Jun 2024 16:56:10 -0700, et99 wrote:

Luis Mendes <[email protected]> wrote:

1. Regarding vwait

-snip-

Care must be taken to -NOT- do any [update] calls or a [vwait] on
another variable any time between the thread::send calls and the vwait
on the variable sync - since all threads are setting the same variable.

Can you please elaborate on this?
any calls?
any update calls?

[update] is the Tcl command to explicitly reenter the event loop from
Tcl code. vwait is an event loop wait command, and 'reentering' the
event loop in the wrong place (per et99's info) may mess up the
handling of the return events from the threads.

the vwait I understand.

vwaits nest, so a second vwait (if called) while an existing vwait is outstanding must itself first complete before the outer one can
complete. This too might mess up the handling of the return events
from the threads.

I have a vwait inside the sourced file that is running under some child thread, it should be fine, right?

If the vwait is in a separate thread, then it has no bearing on a vwait
in "this" thread. Each thread in Tcl is more similar to a "process" in Linux/Windows than to a true "thread". The term used on the wiki is
the "apartment model" of threading. Each thread is an independent
interpreter that by default shares nothing with other interpreters.

But, I placed some 'after xxxx' commands in between the thread::send and
the vwait sync.
Is it a mistake?

after with just a number does not reenter the event loop, so there
should be no problem there.
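A quick way to see this for yourself (a sketch with a made-up `::fired` flag): schedule a timer event, sleep past its due time with a plain `after`, and check whether the handler ran.

```tcl
set ::fired 0

# Timer event due in 10 ms
after 10 {set ::fired 1}

# Plain sleep: blocks for 100 ms without entering the event loop
after 100
set during_sleep $::fired
puts "after sleep: fired=$during_sleep"

# update does enter the event loop, so the overdue timer handler runs now
update
puts "after update: fired=$::fired"
```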

The funny thing is that when I first tried this, it worked as you show
above.
But now, every time I run it, all five threads finish their job.

for f in {1..50}; do echo -n "$f -> "; ./et_thread1.tcl | grep 'm=4'; done

The last line 'm=4' shows up every time.

Threads (as well as cooperating processes) introduce nondeterminism
to your code. Absent explicit locking to control
execution order (and if done, esp. incorrectly, this would often erase
the parallelism available via threads) you have no control over the
order that any thread executes with respect to others.

So some runs, with a given order, finish off properly.

Other runs, with a different ordering of execution, can produce other
results.

Which means that there might need to be some small amount of
"synchronizing" that needs to be done to eliminate the orders that
produce the unwanted results.

Note that et99 and my examples are just that, examples, and don't take
into account all the possibilities nor try to sand off any rough
edges.


From et99@21:1/5 to Luis Mendes on Fri Jun 28 09:16:58 2024

On 6/28/2024 6:52 AM, Luis Mendes wrote:

Hi et99,

Thank you for your help.

Please, read below.

On Wed, 26 Jun 2024 16:56:10 -0700, et99 wrote:

Luis Mendes <[email protected]> wrote:

1. Regarding vwait

-snip-

Care must be taken to -NOT- do any [update] calls or a [vwait] on
another variable any time between the thread::send calls and the vwait
on the variable sync - since all threads are setting the same variable.

Can you please elaborate on this?
any calls?
any update calls?
the vwait I understand.

I have a vwait inside the sourced file that is running under some child thread, it should be fine, right?

It depends on the timing. I can't say what will happen here. The timing issue I mentioned occurs in the main thread, not the child threads.

But, I placed some 'after xxxx' commands in between the thread::send and
the vwait sync.
Is it a mistake?

after xxxx

alone (i.e. no script), does not enter the event loop, it merely causes the thread to sleep for xxxx ms.

snip

The funny thing is that when I first tried this, it worked as you show
above.
But now, every time I run it, all five threads finish their job.

for f in {1..50}; do echo -n "$f -> "; ./et_thread1.tcl | grep 'm=4'; done

The last line 'm=4' shows up every time.

It depends on the timing. I am assuming you are on linux and that is some shell script. I don't know what would occur in that case. However, if you change the time it waits in the main thread, I suspect it will never come back at all from the first vwait on sync. Try this, instead of the after 100 in my original posting.

after 1000 {set ::avar 1}

I don't know what your program is doing in each thread and what you are doing in the main thread while the child threads are processing. So, I can't help you further. I just know that if there's a race condition, your program can run fine for a long time and then might just deadlock.

If it does ever deadlock, then it is likely for the reason I have given.

To be certain that it won't deadlock, you should not do anything in the main thread that can cause the event loop to process more than one setting of the variable sync per time you wake to use the value placed in that variable.

If it is set more than once, before you vwait, you will miss killing off a thread. Eventually, you will have many zombie threads. That may not be fatal, if say you are on a 64 bit system, where you can have lots of threads. If your program doesn't run for a very long time, you may not have any problems.

I just wanted you to be aware of the potential problem here.
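One possible way to avoid missing a write, sketched here under the assumption that a write trace on the shared variable is acceptable in your program: record every write to sync in a queue, so results that arrive during some other vwait or update are still seen. The `::finished` list and the trace body are made up for this illustration.

```tcl
package require Thread

set ::finished {}   ;# queue of thread ids that have reported completion
set ::sync 0

# Record every write to ::sync, so a result that arrives while some
# other vwait or update is running is not lost
trace add variable ::sync write {apply {{args} {
    lappend ::finished $::sync
}}}

for {set n 0} {$n < 5} {incr n} {
    set tid [thread::create]
    thread::send -async $tid {after 120; thread::id} ::sync
}

# An unrelated trip through the event loop, as in the earlier five-thread example
set ::avar 0
after 200 {set ::avar 1}
vwait ::avar

# Wait until all five results have been queued, then clean up
while {[llength $::finished] < 5} {
    vwait ::finished
}
foreach t $::finished {
    thread::release $t
}
puts "collected [llength $::finished] results"
```

The main loop waits on the queue rather than on sync itself, so it does not matter how many writes to sync happen per trip through the event loop.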

