• Re: Thread with -async exits prematurely

    From Rich@21:1/5 to Luis Mendes on Wed Jun 19 15:02:28 2024
    Luis Mendes <[email protected]> wrote:
    Hi all!


    My program is working fine when thread::send don't use the -async
    option. When it does, all of those created threads exit prematurely.

    The pseudo-code I have is this:

    Working code that you've tested to exhibit the bug you see is
    preferable, and your code was *very* close....

    ===== main file

    while 1 {
    ...
    while {nr_live_threads < nr_max_threads} {

    This will error out as a syntax error. You want to both initialize
    these variables before you use them, and to interpolate them using $
    above.

    set tid [thread::create $init_script]
    thread::send -async $tid [list sourceFiles ....]
    }
    after 10000
    }

    You never increment nr_live_threads, so this loop above will (assuming
    the variables were initialized, and referenced, correctly) simply loop
    forever, creating new threads. At least until the whole process is
    killed for using all free memory up.

    ===== oo.tcl
    namespace eval ns0 {
    proc runAnsible {...} {
    Parse new ...
    vwait ::exit_flag
    }
    }

    You never signal to the master thread that this thread has exited, so
    the master (as written here) will never launch a new thread when an
    existing one finishes.

    This comprises the important parts of the script, I think.
    When thread::send does not use `-async`, the `vwait ::exit_flag` works and the thread is run until the end.
    With `-async`, the thread exits shortly after the `thread::send` command.

    Something must be different in the "pseudo" code vs. your real code
    then.

    I've read about `thread::preserve` and `thread::release`, but interpreted
    it as necessary when threads have to be orchestrated and some may be dependent on the results of others.

    No, those are to do reference counting for thread cleanup.
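
    A minimal sketch of that reference counting, assuming the Thread
    package's documented behaviour (-preserved starts the counter at 1,
    and a thread is asked to exit once thread::release brings the counter
    down to 0). The script and its output labels are illustrative only:

```tcl
package require Thread

# Create a worker that idles in its event loop; -preserved sets its
# reference counter to 1 instead of the default 0.
set tid [thread::create -preserved {thread::wait}]
puts "exists after create:  [thread::exists $tid]"

# A second "owner" takes a reference, then drops it again.
thread::preserve $tid
thread::release $tid
puts "exists with one ref:  [thread::exists $tid]"

# Dropping the last reference marks the thread for termination;
# -wait blocks until the thread has actually exited.
thread::release -wait $tid
puts "exists after release: [thread::exists $tid]"
```

    So preserve/release is about who still "owns" a thread, not about
    ordering work between threads.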

    What I want is really to have several threads launched in the same moment,
    at each run of the while loop that checks if the number of active threads
    is less than the nr_max_threads.
    How can that be accomplished?

    Well, first, you have to communicate the exit of a child thread back to
    the main thread, and have that comm path decrement "nr_active" (and you
    also need to increment nr_active when you launch a new thread).

    A syntax-corrected and simplified version of your original code, that *actually runs*:

    thread-test:
    #!/usr/bin/tclsh

    package require Thread

    set nr_max_threads 4
    set nr_live_threads 0

    set init_script {
    puts stderr "Thread: [thread::id] Init: creating sourceFiles"
    proc sourceFiles {args} {
    source oo.tl
    ns0::runAnsible $args
    }
    puts stderr "Thread: [thread::id] waiting"
    thread::wait
    puts stderr "Thread: [thread::id] out of wait"
    }

    while 1 {
    while {$nr_live_threads < $nr_max_threads} {
    puts stderr "Main: live=$nr_live_threads max=$nr_max_threads"
    set tid [thread::create $init_script]
    incr nr_live_threads
    thread::send -async $tid [list sourceFiles ....]
    }
    puts stderr "Main: sleeping for 10s"
    after 10000
    puts stderr "Main: awake"
    }

    oo.tl:
    namespace eval ns0 {
    proc runAnsible {...} {
    puts stderr "Thread [thread::id]: executing parse new"
    Parse new ...
    puts stderr "Thread [thread::id]: vwait begin"
    vwait ::exit_flag
    puts stderr "Thread [thread::id]: vwait complete"
    }
    }
    oo::class create Parse {
    constructor {...} {
    set random [expr {int(rand()*20000)}]
    puts stderr "Thread [thread::id]: object constructor - sleeping for $random"
    after $random [list set ::exit_flag 1]
    }
    }

    Sample run of the above:

    $ ./thread-test
    Main: live=0 max=4
    Thread: tid0x7fbfa7b6c640 Init: creating sourceFiles
    Thread: tid0x7fbfa7b6c640 waiting
    Main: live=1 max=4
    Thread tid0x7fbfa7b6c640: executing parse new
    Thread tid0x7fbfa7b6c640: object constructor - sleeping for 8476
    Thread tid0x7fbfa7b6c640: vwait begin
    Thread: tid0x7fbfa6b6a640 Init: creating sourceFiles
    Thread: tid0x7fbfa6b6a640 waiting
    Main: live=2 max=4
    Thread tid0x7fbfa6b6a640: executing parse new
    Thread tid0x7fbfa6b6a640: object constructor - sleeping for 16806
    Thread tid0x7fbfa6b6a640: vwait begin
    Thread: tid0x7fbfa6369640 Init: creating sourceFiles
    Thread: tid0x7fbfa6369640 waiting
    Main: live=3 max=4
    Thread tid0x7fbfa6369640: executing parse new
    Thread tid0x7fbfa6369640: object constructor - sleeping for 11225
    Thread tid0x7fbfa6369640: vwait begin
    Thread: tid0x7fbfa5b68640 Init: creating sourceFiles
    Thread: tid0x7fbfa5b68640 waiting
    Main: sleeping for 10s
    Thread tid0x7fbfa5b68640: executing parse new
    Thread tid0x7fbfa5b68640: object constructor - sleeping for 5573
    Thread tid0x7fbfa5b68640: vwait begin
    Thread tid0x7fbfa5b68640: vwait complete
    Thread tid0x7fbfa7b6c640: vwait complete
    Main: awake
    Main: sleeping for 10s
    Thread tid0x7fbfa6369640: vwait complete
    Thread tid0x7fbfa6b6a640: vwait complete
    Main: awake
    Main: sleeping for 10s
    Main: awake
    Main: sleeping for 10s

    And, it will continue to loop saying 'awake' and 'sleeping' since the
    exit of the children is never communicated to the master.

    You need the master to become aware that one of the children has exited,
    so it knows to relaunch another child. One way is to use the
    additional result variable for -async threads and vwait on that
    variable in the master.

    Here is the 'diff' necessary to have the master monitor children
    exiting and to launch a new child when that happens:

    --- thread-test.v1 2024-06-19 10:42:34.359605931 -0400
    +++ thread-test 2024-06-19 11:00:01.433949725 -0400
    @@ -4,6 +4,7 @@

    set nr_max_threads 4
    set nr_live_threads 0
    +set sync 0

    set init_script {
    puts stderr "Thread: [thread::id] Init: creating sourceFiles"
    @@ -21,10 +22,10 @@
    puts stderr "Main: live=$nr_live_threads max=$nr_max_threads"
    set tid [thread::create $init_script]
    incr nr_live_threads
    - thread::send -async $tid [list sourceFiles ....]
    + thread::send -async $tid [list sourceFiles ....] sync
    }
    - puts stderr "Main: sleeping for 10s"
    - after 10000
    - puts stderr "Main: awake"
    + puts stderr "Main: waiting for a child to exit"
    + vwait sync
    + puts stderr "Main: a child exited"
    + incr nr_live_threads -1
    }


    And a sample run:

    $ ./thread-test
    Main: live=0 max=4
    Thread: tid0x7f2d21a4b640 Init: creating sourceFiles
    Thread: tid0x7f2d21a4b640 waiting
    Main: live=1 max=4
    Thread tid0x7f2d21a4b640: executing parse new
    Thread tid0x7f2d21a4b640: object constructor - sleeping for 19992
    Thread tid0x7f2d21a4b640: vwait begin
    Thread: tid0x7f2d20a49640 Init: creating sourceFiles
    Thread: tid0x7f2d20a49640 waiting
    Main: live=2 max=4
    Thread tid0x7f2d20a49640: executing parse new
    Thread tid0x7f2d20a49640: object constructor - sleeping for 8316
    Thread tid0x7f2d20a49640: vwait begin
    Thread: tid0x7f2d1bfff640 Init: creating sourceFiles
    Thread: tid0x7f2d1bfff640 waiting
    Main: live=3 max=4
    Thread tid0x7f2d1bfff640: executing parse new
    Thread tid0x7f2d1bfff640: object constructor - sleeping for 17902
    Thread tid0x7f2d1bfff640: vwait begin
    Thread: tid0x7f2d1b7fe640 Init: creating sourceFiles
    Thread: tid0x7f2d1b7fe640 waiting
    Main: waiting for a child to exit
    Thread tid0x7f2d1b7fe640: executing parse new
    Thread tid0x7f2d1b7fe640: object constructor - sleeping for 12322
    Thread tid0x7f2d1b7fe640: vwait begin
    Thread tid0x7f2d20a49640: vwait complete
    Main: a child exited
    Main: live=3 max=4
    Thread: tid0x7f2d1affd640 Init: creating sourceFiles
    Thread: tid0x7f2d1affd640 waiting
    Main: waiting for a child to exit
    Thread tid0x7f2d1affd640: executing parse new
    Thread tid0x7f2d1affd640: object constructor - sleeping for 7521
    Thread tid0x7f2d1affd640: vwait begin
    Thread tid0x7f2d1b7fe640: vwait complete
    Main: a child exited
    Main: live=3 max=4
    Thread: tid0x7f2d1a7fc640 Init: creating sourceFiles
    Thread: tid0x7f2d1a7fc640 waiting
    Main: waiting for a child to exit
    Thread tid0x7f2d1a7fc640: executing parse new
    Thread tid0x7f2d1a7fc640: object constructor - sleeping for 9508
    Thread tid0x7f2d1a7fc640: vwait begin
    Thread tid0x7f2d1affd640: vwait complete
    Main: a child exited
    Main: live=3 max=4
    Thread: tid0x7f2d19ffb640 Init: creating sourceFiles
    Thread: tid0x7f2d19ffb640 waiting
    Main: waiting for a child to exit
    Thread tid0x7f2d19ffb640: executing parse new
    Thread tid0x7f2d19ffb640: object constructor - sleeping for 13232
    Thread tid0x7f2d19ffb640: vwait begin

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From et99@21:1/5 to Luis Mendes on Wed Jun 19 15:37:08 2024
    On 6/19/2024 3:49 AM, Luis Mendes wrote:
    Hi all!


    My program is working fine when thread::send don't use the -async option. When it does, all of those created threads exit prematurely.

    The pseudo-code I have is this:


    -snip-


    What I want is really to have several threads launched in the same moment,
    at each run of the while loop that checks if the number of active threads
    is less than the nr_max_threads.
    How can that be accomplished?

    Thanks,


    Luís


    I can't tell your true intent from the pseudo code alone. As Rich said, non-working but real code is preferred here, then people can run the code and help you debug it.

    Are you re-using threads or creating new ones and are you using threads as workers to process jobs? Is this going to eventually be a multi-server, single-queue implementation? Do all threads run the same code?

    I have a way of doing those kinds of threads you might find interesting; it's sort of object oriented threads.

    ------------------------------

    set script {
    package require... ;# one time setups
    source ....

    global var1 var2 ...
    proc init {args} { ;# like a constructor, one time to init the thread's vars
    lassign $args ::var1 ::var2 ....
    }
    proc work {arg1 arg2 ...} { ;# like a method
    ...
    return result
    }
    ... including any oo code ...

    thread::wait ;# don't exit, re-use

    }

    # now to create a worker thread(s) and init it:

    set tid(1) [thread::create $script] ;# 2,3, ...
    thread::send $tid(1) [list init value1 value2 ...] ;# similar to an OO constructor

    #Then sync or async job requests:

    set result($N) [thread::send $tid($N) [list work ...arglist...]] ;# sync call to thread N

    # async call
    unset -nocomplain result(1)
    set status [thread::send -async $tid(1) [list work ...arglist...] result(1)]
    # .... other stuff before waiting including other work and update or vwait calls ....
    if {![info exists result(1)]} {vwait result(1)} ;# conditional vwait

    ------------------------------


    Some notes:

    Each thread has its own interpreter, so global data in a thread is not "global" to the program nor visible to other threads (or the main thread). So, unless you really need multiple namespaces in a thread, global variables might be simpler to use.

    Re-use threads; you can just "call" it (like calling a method) for each new job you want done. No need to package require or "source" code more than once (per thread).

    End the script with a thread::wait, with nothing after that (see manual's warning).

    The unset/if not exist technique protects against any entering of the event loop before vwait-ing. Can wait for 1 or all or any combo. Can also set a write trace on a result variable in lieu of doing a vwait.

    Each thread needs its own result variable (which resides in the main thread) where it both signals a job is done, and also can return the job's value (scalars, array element, lists, dicts, etc).

    You will need to keep track of the tid's and the result's by using your own thread index. You might also need to create a job queue.

    This is essentially how my Tasks module works, except it hides all the above details. You can also use the tpool and ttrace packages, which have many of the same and lots more features.
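
    For comparison, a minimal sketch of the tpool route, which manages the
    worker threads and the job queue for you. The job here (squaring a
    number) and the pool size of 4 are made up for illustration; a real
    job would call your own procs, loaded via the pool's -initcmd option:

```tcl
package require Thread

# A pool that grows to at most 4 worker threads.
set pool [tpool::create -maxworkers 4]

# Queue six jobs; each post returns a job id.
set pending {}
foreach n {1 2 3 4 5 6} {
    lappend pending [tpool::post $pool [list expr $n * $n]]
}

# tpool::wait returns the job ids that have completed and stores the
# ones still outstanding back into 'pending'.
set results {}
while {[llength $pending] > 0} {
    foreach job [tpool::wait $pool $pending pending] {
        lappend results [tpool::get $pool $job]
    }
}

tpool::release $pool
puts [lsort -integer $results]
```

    Completion order is not deterministic, hence the sort before printing.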

  • From Rich@21:1/5 to Luis Mendes on Wed Jun 26 22:12:48 2024
    Luis Mendes <[email protected]> wrote:
    Hi Rich,

    Once again, thank you very much for your help.
    I managed to get the application running in a multi-threaded way.

    Still, there are a couple of things that I haven't yet understood, maybe
    you (or other person) can help me figure this out.

    1. Regarding vwait
    As stated in https://www.tcl-lang.org/man/tcl/TclCmd/vwait.htm
    """It continues processing events until some event handler sets the value
    of the global variable varName. Once varName has been set, the vwait
    command will return as soon as the event handler that modified varName completes."""
    This was a difficulty I had before, maybe because English is not my main language.
    I thought that varName would have to change for every event handler that signaled the end of some operation, like:

    It is not the "value" that triggers vwait, it is the action of
    "writing" (any write of anything) to the variable.
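
    A small sketch that shows this with plain Tcl (no threads needed): the
    event handler writes the same value the variable already holds, and
    vwait still returns. The variable name ::flag is just for illustration:

```tcl
set ::flag 1

# The handler writes the *same* value that is already there.
after 100 {set ::flag 1}

# Returns once the handler has run, even though the value never changed.
vwait ::flag
puts "vwait returned; flag is still $::flag"
```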


    2. Regarding the script that you modified, I changed it a bit as well to
    show what I don't understand.
    The thread::names command shows all the threads that have been created
    from the start of execution and not only the ones that were created in the last cycle.
    And this is confirmed by thread::exists that show 1 for all of them.
    I was expecting that threads would no longer exist after an event handler sets a varName for vwait.
    Otherwise, we can end up with millions of threads existing at the same
    time.

    That is because that version does not "delete" the threads that finish
    (preferably you'd reuse threads rather than delete and recreate them
    over and over, so I did not try to 'release' them when they finished).

    Creation of a thread takes some non-zero amount of time, so if you
    reuse existing threads you amortize the creation time across each
    usage. If you recreate them anew each time, you pay the cost (in time)
    to create them each time you use them.

    Below is a patch for the code you posted in the last message that
    actually "deletes" the threads when they 'complete':

    --- thread-test2.orig 2024-06-26 17:55:57.564273473 -0400
    +++ thread-test2 2024-06-26 18:04:58.037504557 -0400
    @@ -21,6 +21,7 @@
    proc sourceFiles {args} {
    source oo.tcl
    ns0::runAnsible $args
    + return [thread::id]
    }
    puts stderr "Thread: [thread::id] waiting"
    thread::wait
    @@ -33,13 +34,15 @@
    puts "++++++++ cycle [incr cycle]"
    while {$nr_live_threads < $nr_max_threads} {
    puts stderr "Main: live=$nr_live_threads max=$nr_max_threads"
    - set tid [thread::create $init_script]
    + set tid [thread::create -preserved $init_script]
    incr nr_live_threads
    thread::send -async $tid [list sourceFiles ....] sync
    }
    puts stderr "Main: waiting for a child to exit."
    vwait sync
    - puts stderr "Main: a child exited."
    + set exited $sync
    + puts stderr "Main: child '$exited' exited."
    + thread::release $exited
    foreach tn [thread::names] {
    puts "$tn\t\t[thread::exists $tn]"
  • From et99@21:1/5 to All on Wed Jun 26 16:56:10 2024
    Luis Mendes <[email protected]> wrote:

    1. Regarding vwait

    -snip-

    Care must be taken to -NOT- do any [update] calls or a [vwait] on another variable any time between the thread::send calls and the vwait on the variable sync - since all threads are setting the same variable. If the event loop is entered with a few
    queued up events to set the variable sync, then some of them will not be processed and the threads will not be killed off.

    Here's an example of that happening, where the timing is such that the threads return and set sync before they are vwait'd on:

    ------------------------

    set sync 0
    package require Thread

    for {set n 0} {$n < 5} {incr n} {
    set tid [thread::create]
    puts "created tid $tid"
    thread::send -async $tid {after 120; set foo [thread::id]} ::sync
    }

    puts "before waiting with sync = $::sync"

    set ::avar 0
    after 100 {set ::avar 1}
    vwait ::avar

    for {set m 0} {$m < 5} {incr m} {
    vwait ::sync
    puts "m=$m after waiting for sync with sync now = $::sync"
    }

    ------------------------

    And here is the output of two runs:

    created tid tid0000578C
    created tid tid00001A68
    created tid tid0000555C
    created tid tid00005B14
    created tid tid00000810
    before waiting with sync = 0
    m=0 after waiting for sync with sync now = tid00005B14
    m=1 after waiting for sync with sync now = tid00000810


    --------

    created tid tid00003464
    created tid tid0000558C
    created tid tid00002A90
    created tid tid000045F0
    created tid tid00003418
    before waiting with sync = 0
    m=0 after waiting for sync with sync now = tid0000558C
    m=1 after waiting for sync with sync now = tid00002A90
    m=2 after waiting for sync with sync now = tid000045F0
    m=3 after waiting for sync with sync now = tid00003418

    --------

  • From Rich@21:1/5 to Luis Mendes on Fri Jun 28 16:14:07 2024
    Luis Mendes <[email protected]> wrote:
    Hi et99,

    Thank you for your help.

    Please, read below.


    On Wed, 26 Jun 2024 16:56:10 -0700, et99 wrote:
    Luis Mendes <[email protected]> wrote:

    1. Regarding vwait

    -snip-

    Care must be taken to -NOT- do any [update] calls or a [vwait] on
    another variable any time between the thread::send calls and the vwait
    on the variable sync - since all threads are setting the same variable.
    Can you please elaborate on this?
    any calls?
    any update calls?

    [update] is the Tcl command to explicitly reenter the event loop from
    Tcl code. vwait is an event loop wait command, and 'reentering' the
    event loop in the wrong place (per et99's info) may mess up the
    handling of the return events from the threads.

    the vwait I understand.

    vwaits nest, so a second vwait (if called) while an existing vwait is
    outstanding must itself first complete before the outer one can
    complete. This too might mess up the handling of the return events
    from the threads.

    I have a vwait inside the sourced file that is running under some child thread, it should be fine, right?

    If the vwait is in a separate thread, then it has no bearing on a vwait
    in "this" thread. Each thread in Tcl is more similar to a "process" in Linux/Windows than to a true "thread". The term used on the wiki is
    the "apartment model" of threading. Each thread is an independent
    interpreter that by default shares nothing with other interpreters.

    But, I placed some 'after xxxx' commands in between the thread::send and
    the vwait sync.
    Is it a mistake?

    after with just a number does not reenter the event loop, so there
    should be no problem there.
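
    A short sketch of the difference, using only core Tcl. The variable
    name ::fired and the delays are arbitrary:

```tcl
set ::fired 0

# Queue a timer event that becomes due after 50 ms.
after 50 {set ::fired 1}

# Plain sleep: the event loop is NOT entered, so the timer handler
# cannot run even though it is already due when the sleep ends.
after 200
puts "after sleep:  fired=$::fired"

# update enters the event loop and runs the pending handler.
update
puts "after update: fired=$::fired"
```

    The first line printed shows fired=0 and the second fired=1, which is
    why a bare 'after ms' between thread::send and vwait is harmless while
    an [update] there is not.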

    The funny thing is that when I first tried this, it worked as you show
    above.
    But now, everytime I run it, all five threads finish their job.

    for f in {1..50}; do echo -n "$f -> "; ./et_thread1.tcl | grep 'm=4'; done

    The last line 'm=4' shows up every time.

    Threads (as well as cooperating processes) introduce an aspect of
    nondeterminism to your code. Absent explicit locking to control
    execution order (and if done, esp. incorrectly, this would often erase
    the parallelism available via threads) you have no control over the
    order that any thread executes with respect to others.

    So some runs, with a given order, finish off properly.

    Other runs, with a different ordering of execution, can produce other
    results.

    Which means that there might need to be some small amount of
    "synchronizing" that needs to be done to eliminate the orders that
    produce the unwanted results.

    Note that et99's and my examples are just that, examples, and don't take
    into account all the possibilities nor try to sand off any rough
    edges.

  • From et99@21:1/5 to Luis Mendes on Fri Jun 28 09:16:58 2024
    On 6/28/2024 6:52 AM, Luis Mendes wrote:
    Hi et99,

    Thank you for your help.

    Please, read below.


    On Wed, 26 Jun 2024 16:56:10 -0700, et99 wrote:
    Luis Mendes <[email protected]> wrote:

    1. Regarding vwait

    -snip-

    Care must be taken to -NOT- do any [update] calls or a [vwait] on
    another variable any time between the thread::send calls and the vwait
    on the variable sync - since all threads are setting the same variable.
    Can you please elaborate on this?
    any calls?
    any update calls?
    the vwait I understand.

    I have a vwait inside the sourced file that is running under some child thread, it should be fine, right?

    It depends on the timing. I can't say what will happen here. The timing issue I mentioned occurs in the main thread, not the child threads.



    But, I placed some 'after xxxx' commands in between the thread::send and
    the vwait sync.
    Is it a mistake?

    after xxxx

    alone (i.e. no script), does not enter the event loop, it merely causes the thread to sleep for xxxx ms.




    snip


    The funny thing is that when I first tried this, it worked as you show
    above.
    But now, everytime I run it, all five threads finish their job.

    for f in {1..50}; do echo -n "$f -> "; ./et_thread1.tcl | grep 'm=4'; done

    The last line 'm=4' shows up every time.

    It depends on the timing. I am assuming you are on Linux and that is some shell script. I don't know what would occur in that case. However, if you change the time it waits in the main thread, I suspect it will never come back at all from the first vwait on sync. Try this, instead of the after 100 in my original posting.

    after 1000 {set ::avar 1}

    I don't know what your program is doing in each thread and what you are doing in the main thread while the child threads are processing. So, I can't help you further. I just know that if there's a race condition, your program can run fine for a long time
    and then might just deadlock.

    If it does ever deadlock, then it is likely for the reason I have given.

    To be certain that it won't deadlock, you should not do anything in the main thread that can cause the event loop to process more than one setting of the variable sync per time you wake to use the value placed in that variable.

    If it is set more than once, before you vwait, you will miss killing off a thread. Eventually, you will have many zombie threads. That may not be fatal, if say you are on a 64 bit system, where you can have lots of threads. If your program doesn't run
    for a very long time, you may not have any problems.

    I just wanted you to be aware of the potential problem here.
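
    One way to sidestep the race, sketched under the assumption that each
    thread gets its own result slot (an array element keyed by thread id)
    combined with the unset / info exists check described earlier in the
    thread. The array name 'result' and the delays are illustrative:

```tcl
package require Thread

unset -nocomplain result
set tids {}
for {set n 0} {$n < 5} {incr n} {
    # -preserved so that one thread::release later ends the thread.
    set tid [thread::create -preserved]
    lappend tids $tid
    # Each completion lands in its own array element in the main thread.
    thread::send -async $tid {after 120; thread::id} ::result($tid)
}

# Deliberately enter the event loop long enough for every thread to
# report back before we start waiting on individual results.
set ::avar 0
after 300 {set ::avar 1}
vwait ::avar

foreach tid $tids {
    # Only wait if this thread's result has not arrived yet.
    if {![info exists ::result($tid)]} { vwait ::result($tid) }
    puts "thread $tid finished, result=$::result($tid)"
    thread::release $tid
}
```

    Because no two threads write the same variable, a completion that
    arrives early is simply found already set instead of being lost.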
