On Tue, 2025-07-15 at 22:08 +0100, Rebecca N. Palmer wrote:
> > I could get test_nanny to never throw an error in the loop by adding
> > await asyncio.sleep(0.1) into the function before it started trying to
> > use the cluster
> That's plausibly a better idea, but I haven't tried it.
Unfortunately it doesn't work on the ppc64el porterbox
platti.debian.org. I still get a fair number of timeout failures when
running test_nanny in a loop.
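Running a test in a loop and counting the failures can be sketched as
below. This is only an illustration: the helper name count_failures is
made up, and the pytest invocation shown in the comment is an assumption,
not the exact command used on the porterbox.

```python
import subprocess
import sys


def count_failures(cmd, runs):
    """Run cmd `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        # Discard output here; a real debugging loop would keep the logs.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures


# Assumed usage (adjust the test selection to the actual source tree):
#   count_failures([sys.executable, "-m", "pytest", "-k", "test_nanny"], 50)
```

Counting exit codes rather than parsing pytest output keeps the loop
independent of how the test reports its timeout.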
I did manage to capture the log from a little before the point where a
failing run starts to look different.
When it's about to fail I get something like the following. The first
error is the worker-handle-scheduler-connection-broken. It then waits
for the nanny to shut down, and once that times out it starts to spew
stack traces and complain about 0-byte TLS responses.
2025-07-15 22:25:26,920 - distributed.core - INFO - Connection to tls://127.0.0.1:51942 has been closed.
2025-07-15 22:25:26,920 - distributed.scheduler - INFO - Remove worker addr: tls://127.0.0.1:46743 name: 0 (stimulus_id='handle-worker-cleanup-1752618326.9205813')
2025-07-15 22:25:26,921 - distributed.core - INFO - Starting established connection to tls://127.0.0.1:42399
2025-07-15 22:25:26,922 - distributed.core - INFO - Connection to tls://127.0.0.1:42399 has been closed.
2025-07-15 22:25:26,922 - distributed.worker - INFO - Stopping worker at tls://127.0.0.1:46743. Reason: worker-handle-scheduler-connection-broken
2025-07-15 22:25:26,969 - distributed.nanny - INFO - Closing Nanny gracefully at 'tls://127.0.0.1:35207'. Reason: worker-handle-scheduler-connection-broken
2025-07-15 22:25:26,970 - distributed.worker - INFO - Removing Worker plugin shuffle
2025-07-15 22:25:26,971 - distributed.nanny - INFO - Worker closed
2025-07-15 22:25:28,974 - distributed.nanny - ERROR - Worker process died unexpectedly
2025-07-15 22:25:29,076 - distributed.nanny - INFO - Closing Nanny at 'tls://127.0.0.1:35207'. Reason: nanny-close-gracefully
2025-07-15 22:25:29,077 - distributed.nanny - INFO - Nanny at 'tls://127.0.0.1:35207' closed.
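To find where a failing run diverges, the log can be scanned for the
first entry carrying a given reason string. The following is a rough
sketch, assuming the "timestamp - logger - LEVEL - message" layout seen
above with one entry per line; the names LOG_LINE and first_match are
made up for illustration.

```python
import re

# Matches lines such as:
#   2025-07-15 22:25:26,922 - distributed.worker - INFO - Stopping worker ...
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<logger>[\w.]+) - (?P<level>[A-Z]+) - (?P<msg>.*)$"
)


def first_match(lines, needle):
    """Return (timestamp, logger, message) of the first entry whose
    message contains needle, or None if no entry matches."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and needle in m["msg"]:
            return m["ts"], m["logger"], m["msg"]
    return None


sample = [
    "2025-07-15 22:25:26,920 - distributed.core - INFO - "
    "Connection to tls://127.0.0.1:51942 has been closed.",
    "2025-07-15 22:25:26,922 - distributed.worker - INFO - "
    "Stopping worker at tls://127.0.0.1:46743. "
    "Reason: worker-handle-scheduler-connection-broken",
    "2025-07-15 22:25:28,974 - distributed.nanny - ERROR - "
    "Worker process died unexpectedly",
]
hit = first_match(sample, "worker-handle-scheduler-connection-broken")
```

Lines that do not match the pattern, such as stack-trace continuation
lines, are simply skipped.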