• Bug#1109107: libsoup3: intermittent test failure: memory corruption in

    From Simon McVittie@21:1/5 to All on Fri Jul 11 16:00:04 2025
    Source: libsoup3
    Version: 3.6.4-2
    Severity: important
    Tags: ftbfs help moreinfo
    Control: block 1035983 by -1

    In a previous build of libsoup3 on the official buildds,
    multithread-test failed with evidence of memory corruption:

    https://buildd.debian.org/status/fetch.php?pkg=libsoup3&arch=amd64&ver=3.6.4-2&stamp=1737574120&raw=0
    17/38 multithread-test RUNNING
    MALLOC_PERTURB_=181 G_TEST_SRCDIR=/build/reproducible-path/libsoup3-3.6.4/tests MESON_TEST_ITERATION=1 LD_LIBRARY_PATH=/build/reproducible-path/libsoup3-3.6.4/obj-x86_64-linux-gnu/tests:/build/reproducible-path/libsoup3-3.6.4/obj-x86_64-linux-gnu/libsoup MALLOC_CHECK_=2 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 G_TEST_BUILDDIR=/build/reproducible-path/libsoup3-3.6.4/obj-x86_64-linux-gnu/tests G_DEBUG=gc-friendly UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 /build/reproducible-path/libsoup3-3.6.4/obj-x86_64-linux-gnu/tests/multithread-test --debug
    ▶ 15/38 /misc/cancel-while-reading/msg OK
    ▶ 15/38 /misc/cancel-while-reading/req/immediate OK
    ▶ 17/38 /multithread/no-main-context OK
    ▶ 17/38 /multithread/basic/async OK
    ▶ 17/38 /multithread/basic/sync OK
    ▶ 17/38 /multithread/basic-ssl/async OK
    ▶ 17/38 /multithread/basic-ssl/sync OK
    ▶ 17/38 /multithread/basic-proxy/async OK
    ▶ 17/38 /multithread/basic-proxy/sync OK
    ▶ 17/38 /multithread/basic-no-main-thread/async OK
    ▶ 17/38 /multithread/basic-no-main-thread/sync OK
    ▶ 17/38 /multithread/basic-ssl-proxy/async OK
    ▶ 17/38 /multithread/basic-ssl-proxy/sync OK
    ▶ 17/38 /multithread/basic-http2/async OK
    17/38 multithread-test ERROR 0.09s killed by signal 6 SIGABRT
    ――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
    stderr:
    malloc(): unsorted double linked list corrupted

    (test program exited with status code -6)

    When the build was retried, all tests succeeded, so this is presumably intermittent or otherwise unreproducible.

    This is **not** the same as the failure mode that has been the most common
    in the past, where tests that use Apache fail with "Address already in
    use: AH00072: make_sock: could not bind to address 127.0.0.1:xxx".

    Similarly, when I tried to add Salsa-CI to this package, my first attempt
    failed with a different indication of memory corruption:

    https://salsa.debian.org/gnome-team/libsoup3/-/jobs/7814730
    17/38 multithread-test ERROR 15.76s killed by signal 6 SIGABRT
    ――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
    stderr:
    tcache_thread_shutdown(): unaligned tcache chunk detected
    (test program exited with status code -6)
    TAP parsing error: Too few tests run (expected 21, got 11)

    I think we can probably treat any evidence of memory corruption in this
    test as being essentially equivalent - if we corrupt the heap, then
    glibc can fail in several different ways as a result, none of which are meaningfully different.
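
    To make "treat them as equivalent" concrete when triaging logs, here is
    a minimal sketch that recognises both glibc messages quoted in this
    report as the same class of failure. The function name
    is_heap_corruption and the pattern list are assumptions for
    illustration only; glibc has other heap-consistency messages that this
    pattern does not cover.

```shell
# Succeed (exit 0) if the text on stdin contains one of the glibc
# heap-consistency messages quoted in this bug report.
# Hypothetical helper: the name and the pattern list are illustrative.
is_heap_corruption() {
    grep -Eq 'malloc\(\): .*corrupted|tcache_thread_shutdown\(\): unaligned tcache chunk detected'
}
```

    For example, piping a test's captured stderr into is_heap_corruption
    lets a script count both failures above under a single heading.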

    There seems to be a second failure mode where multithread-test times
    out (the default timeout is 60 seconds, but we use a 6x multiplier in
    the Debian packaging to accommodate slower architectures). That
    failure mode should be treated as a separate bug and is out of scope for
    this particular bug report, although it's possible that it has the same
    root cause. I will report that failure mode as a separate bug.

    To get an idea of how frequent this is, I tried these steps on the amd64 porterbox, barriere:

    1. build libsoup3 (from unstable):

    schroot -c $chroot -r -- \
    env DEB_BUILD_PROFILES=noudeb \
    debuild -e CCACHE_DIR=$HOME/.ccache -e PATH=/usr/lib/ccache:$PATH -us -uc -B

    2. run multithread-test repeatedly:

    schroot -c $chroot -r -- \
    env -C obj-x86_64-linux-gnu \
    DEB_BUILD_PROFILES=noudeb CCACHE_DIR=$HOME/.ccache PATH=/usr/lib/ccache:$PATH \
    DEB_PYTHON_INSTALL_LAYOUT=deb LC_ALL=C.UTF-8 \
    meson test --repeat 100 -j1 multithread-test

    (I tried this 3 times; optionally add --timeout-multiplier=6 to the
    `meson test` command-line to emulate the original package build more
    accurately)

    3. read obj-x86_64-linux-gnu/meson-logs/testlog.txt for details of the
    failures, if any

    and my results were as follows:

    - 7 successes, 1 timeout, 1 failure with memory corruption
    - 19 successes, 1 timeout, 6 more successes, 1 more timeout, I cancelled
    the run at this point
    - 10 successes, 1 timeout, 15 more successes, 1 failure with
    memory corruption
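
    The tallies above distinguish successes, timeouts and aborts. A
    minimal sketch of a plain shell loop that produces the same three
    categories is shown here; it is not what was used for the runs above
    (those used `meson test --repeat`). The function names classify_status
    and run_until_failure are hypothetical, and the sketch assumes the test
    binary is run directly, with timeout(1) from coreutils enforcing the
    per-run limit.

```shell
# Map a shell exit status to an outcome name.
# 124 is what timeout(1) returns when it had to kill the command;
# 134 is 128 + 6, i.e. the command died from SIGABRT.
classify_status() {
    case "$1" in
        0)   echo success ;;
        124) echo timeout ;;
        134) echo abort ;;
        *)   echo other ;;
    esac
}

# Run a command up to $1 times with a per-run limit of $2 seconds,
# printing one outcome per run and stopping at the first non-success.
run_until_failure() {
    runs=$1
    limit=$2
    shift 2
    i=1
    while [ "$i" -le "$runs" ]; do
        status=0
        timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
        outcome=$(classify_status "$status")
        echo "run $i: $outcome"
        [ "$outcome" = success ] || return 1
        i=$((i + 1))
    done
}
```

    Something like `run_until_failure 100 360 ./tests/multithread-test`
    would correspond to 100 attempts with the 6x timeout, assuming the
    environment variables shown in the build log are set first. Running
    `meson test` through this loop would not work as intended, because
    meson reports failures through its own exit status rather than the
    signal that killed the test.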

    Anyone who wants libsoup3 tests to pass more often is invited to help to
    debug and fix this. If the failure is reproducible under valgrind,
    probably the easiest way is to build it in an environment that is
    suitable for interactive debugging, then run multithread-test repeatedly
    under valgrind, using something like

    meson test --repeat 100 --wrapper=./valgrind.sh multithread-test

    to get a backtrace for the memory corruption and figure out how it is
    happening. But this might not be possible if using valgrind perturbs the
    timing enough that the failure mode never actually happens.
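
    The contents of valgrind.sh are not given above. A minimal sketch of
    what such a wrapper could contain follows, assuming meson passes the
    test command and its arguments to the wrapper. The function name
    run_under_valgrind, the VALGRIND override variable and the particular
    option values are assumptions, not something taken from the libsoup3
    packaging.

```shell
# Run the given command under valgrind so that any reported memory error
# makes the run fail and comes with a reasonably deep backtrace.
# VALGRIND can be set to substitute a different valgrind binary
# (hypothetical convenience, not a valgrind feature).
run_under_valgrind() {
    "${VALGRIND:-valgrind}" \
        --error-exitcode=99 \
        --track-origins=yes \
        --num-callers=40 \
        "$@"
}
```

    In a standalone wrapper script, the last line would be
    `run_under_valgrind "$@"`, so that whatever meson appends after the
    wrapper is executed under valgrind.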

    Or it might be possible to build libsoup3 (and ideally GLib too) with
    -fsanitize=address,undefined, and then run multithread-test repeatedly,
    as above; but, again, AddressSanitizer slows down the binaries, which
    could perturb the timing enough that the failure mode never actually
    happens.
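
    For an upstream-style build outside the Debian packaging, Meson's
    built-in b_sanitize option is one way to get those compiler flags. The
    following is a sketch under that assumption; the function name
    configure_sanitized_build, the MESON override variable and the build
    directory name are hypothetical, and it has not been checked against
    the libsoup3 build.

```shell
# Configure a separate build directory ($1) with AddressSanitizer and
# UndefinedBehaviorSanitizer enabled via Meson's base options.
# b_lundef=false avoids link failures from undefined-symbol checks that
# some toolchains hit when sanitizers are enabled.
configure_sanitized_build() {
    "${MESON:-meson}" setup "$1" \
        -Db_sanitize=address,undefined \
        -Db_lundef=false
}
```

    Running `configure_sanitized_build build-asan` from the source tree,
    followed by `meson test -C build-asan --repeat 100 multithread-test`,
    would then mirror the repeated runs described earlier.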

    Annoyingly, it is not possible to run two or more copies of this test in
    parallel, so that cannot be used to get to a failure sooner (this is
    because each run of this test uses the same fixed filenames and port
    numbers).

    I am a member of the GNOME team, but not an Uploader of this particular
    package. I am aware that some project members believe that, because I
    have solved test issues in the past, I should be held personally
    responsible for every test failure that occurs in GNOME. As per the
    Debian Constitution §2.1.1, I decline that responsibility: I am not
    able to fix everything all of the time, and I'm sorry if the project
    considers my contributions to be inadequate.

    smcv
