• Bug#1035983: libsoup3 (and libsoup2) autopkgtests are flaky: Address al

    From Santiago Vila@21:1/5 to All on Fri May 16 12:50:01 2025
    found 1035983 3.6.5-1
    tags 1035983 ftbfs
    thanks

    Hi. For some reason the BTS thinks this is only a problem in stable.
    The above is my attempt at fixing that.

    I'm also tagging the bug as ftbfs because I'm getting build failures
    due to failing tests.

    Regarding the flakiness itself, I get a failure rate around 20%
    on machines with 1 CPU and 30% on machines with 2 CPUs. This is
    greater than the reference thresholds given by Paul in one
    of the gcr bugs.

    I'd like to propose a patch, but the tests which fail are
    different every time. On a sample of 200 build attempts on
    different machines, each test failed the following number of times:

    26 multithread-test
    23 proxy-test
    22 range-test
    22 connection-test
    22 auth-test
    6 server-test
    1 timeout-test
    1 hsts-test

    If somebody wants to debug this (maybe Simon?), please contact me
    privately, as I can provide a VM.

    Thanks.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Simon McVittie@21:1/5 to Santiago Vila on Mon May 19 16:50:04 2025
    On Fri, 16 May 2025 at 12:45:43 +0200, Santiago Vila wrote:
    > Regarding the flakiness itself, I get a failure rate around 20%
    > on machines with 1 CPU and 30% on machines with 2 CPUs. This is
    > greater than the reference thresholds given by Paul in one
    > of the gcr bugs.
    >
    > I'd like to propose a patch, but the tests which fail are
    > different every time. On a sample of 200 build attempts on
    > different machines, each test failed the following number of times:
    >
    > 26 multithread-test
    > 23 proxy-test
    > 22 range-test
    > 22 connection-test
    > 22 auth-test
    > 6 server-test
    > 1 timeout-test
    > 1 hsts-test

    Is this still the same failure mode described in the bug title, with
    "Address already in use" and "could not bind to address ..." being
    reported by Apache?

    Last time I looked at the libsoup* test suite, the actual tests were
    each reasonably reliable, but the reliability issue was with their
    setup/teardown. They run a temporary Apache web server, in order to
    have a realistic server to test against. I think what's happening is
    that sometimes, the web server port from one test (let's say test number
    5) is still considered by the kernel to be in use by the time we reach
    the setup stage of the next test (let's say test number 6).
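
    The error that Apache reports here is what bind() returns when the
    kernel considers an address and port to be taken. A minimal sketch of
    that condition, in which second_bind_errno() is a hypothetical name
    and the ports are kernel-chosen loopback ports rather than anything
    taken from the libsoup test suite:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Illustrative only (hypothetical helper): bind one socket to a
 * kernel-chosen loopback port, then try to bind a second socket to the
 * same port while the first is still open.
 * Returns the errno of the second bind(), 0 if it unexpectedly
 * succeeded, or -1 if the setup itself failed. */
int
second_bind_errno (void)
{
	struct sockaddr_in addr = { 0 };
	socklen_t len = sizeof addr;
	int first, second, result = -1;

	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	first = socket (AF_INET, SOCK_STREAM, 0);
	if (first < 0)
		return -1;

	if (bind (first, (struct sockaddr *) &addr, sizeof addr) == 0 &&
	    getsockname (first, (struct sockaddr *) &addr, &len) == 0) {
		/* addr now names the port that the first socket holds */
		assert (addr.sin_port != 0);

		second = socket (AF_INET, SOCK_STREAM, 0);
		if (second >= 0) {
			if (bind (second, (struct sockaddr *) &addr, sizeof addr) < 0)
				result = errno;
			else
				result = 0;
			close (second);
		}
	}

	close (first);
	return result;
}
```

    On Linux the second bind() is expected to fail with EADDRINUSE. This
    only demonstrates the error for a port that is still held by an open
    socket; it does not reproduce whatever is holding the port in the
    test suite.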

    As a result, the Apache for test number 6 can't listen on the port it
    has been configured to use, and testing fails at that point. This is
    rare on a per-test basis, therefore difficult to reproduce on-demand -
    but running the whole test suite involves several setup/teardown
    cycles, resulting in a higher failure rate for the test suite as a
    whole. For example if you're seeing a 30% failure rate, that might be
    more like a 2% failure rate for each of 15 test executables, or perhaps
    even a 0.2% failure rate for each of 150 smaller test-cases.
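
    The arithmetic behind that estimate can be sketched as follows,
    assuming (this is an assumption, not something measured) that each
    setup/teardown cycle fails independently with the same probability p,
    so that a run of n cycles fails with probability 1 - (1 - p)^n. The
    function name is made up for this example.

```c
#include <assert.h>

/* Probability that at least one of n independent steps fails, when each
 * step fails with probability p. Independence and a uniform p are
 * simplifying assumptions for this sketch. */
double
suite_failure_rate (double p, unsigned int n)
{
	double all_pass = 1.0;
	unsigned int i;

	assert (p >= 0.0 && p <= 1.0);

	/* Probability that every one of the n steps passes */
	for (i = 0; i < n; i++)
		all_pass *= 1.0 - p;

	return 1.0 - all_pass;
}
```

    Under those assumptions, suite_failure_rate (0.02, 15) and
    suite_failure_rate (0.002, 150) both come out at roughly 26%, which
    is in the same range as the observed 20% to 30%.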

    If that's still what is happening, then it's expected that you will see
    failures in different tests (and even in different test-cases within
    those larger tests) on different occasions.

    Unfortunately, if that's the case, then skipping any specific test-case
    is not going to be a viable workaround, because it's the common
    setup/teardown done for each test-case that is the problem.

    If it's possible to configure Apache to set options like SO_REUSEADDR
    and/or SO_REUSEPORT then that might help (but I don't know whether
    that's possible).
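
    For reference, this is what setting SO_REUSEADDR looks like at the
    socket level. It is a sketch of the option itself, not of Apache's
    configuration or source code, and listen_reuseaddr() and
    reuseaddr_enabled() are hypothetical names. On Linux the option lets
    bind() succeed while old connections on the port are in TIME_WAIT; it
    does not let two sockets listen on the same address and port at once.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a listening TCP socket on 127.0.0.1:port
 * with SO_REUSEADDR set, or -1 on error. Port 0 lets the kernel choose. */
int
listen_reuseaddr (uint16_t port)
{
	struct sockaddr_in addr = { 0 };
	int one = 1;
	int fd = socket (AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	/* The option has to be set before bind() to affect it */
	if (setsockopt (fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one) < 0)
		goto fail;

	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
	addr.sin_port = htons (port);

	if (bind (fd, (struct sockaddr *) &addr, sizeof addr) < 0 ||
	    listen (fd, SOMAXCONN) < 0)
		goto fail;

	return fd;

fail:
	close (fd);
	return -1;
}

/* Hypothetical helper: returns 1 if SO_REUSEADDR is set on fd, 0 if it
 * is not, or -1 if the option could not be queried. */
int
reuseaddr_enabled (int fd)
{
	int value = 0;
	socklen_t len = sizeof value;

	assert (fd >= 0);

	if (getsockopt (fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) < 0)
		return -1;

	return value != 0;
}
```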

    Or if it's possible to make the test suite use a different port for each
    test then that might help (but I don't know whether that will be
    feasible).
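
    One common way to get a different port for each test is to bind to
    port 0 and ask the kernel which port it assigned. A minimal sketch,
    in which pick_free_port() is a hypothetical helper and not part of
    the libsoup test suite:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a loopback TCP port that was free at the
 * time of the call, or -1 on error. */
int
pick_free_port (void)
{
	struct sockaddr_in addr = { 0 };
	socklen_t len = sizeof addr;
	int port = -1;
	int fd = socket (AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
	addr.sin_port = 0;	/* 0 asks the kernel to choose a port */

	if (bind (fd, (struct sockaddr *) &addr, sizeof addr) == 0 &&
	    getsockname (fd, (struct sockaddr *) &addr, &len) == 0) {
		assert (addr.sin_family == AF_INET);
		port = ntohs (addr.sin_port);
	}

	/* The port is released here, so another process could in
	 * principle claim it before the web server binds to it. */
	close (fd);
	return port;
}
```

    The chosen port would still have to be written into the Apache
    configuration used by the test, and the gap between close() and the
    server's own bind() leaves a small race, so this reduces collisions
    rather than ruling them out.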

    smcv

  • From Santiago Vila@21:1/5 to All on Mon May 19 18:10:01 2025
    On 19/5/25 at 16:43, Simon McVittie wrote:
    > Is this still the same failure mode described in the bug title, with
    > "Address already in use" and "could not bind to address ..." being
    > reported by Apache?

    That's a very good question and I'm glad that you asked :-)

    In some cases, yes, but not always. I've put a collection
    of failed build logs here:

    https://people.debian.org/~sanvila/build-logs/libsoup3/

    I usually try not to report FTBFS bugs when there is already another
    open bug about flaky tests that I can also reproduce, as such
    duplication is not very useful, but in this case you are right
    that there might be more than one issue, so feel free to
    clone if required.

    Thanks.

  • From Simon McVittie@21:1/5 to Santiago Vila on Sat Jul 12 19:20:01 2025
    Control: clone 1035983 -2
    Control: retitle 1035983 libsoup3: intermittent test failures: Address already in use: AH00072: make_sock: could not bind to address 127.0.0.1:xxx
    Control: retitle -2 libsoup3: [metabug] several intermittent test failures resulting in flaky autopkgtests and FTBFS
    Control: unblock 1035983 by 1109107 1109108
    Control: block -2 by 1035983

    On Mon, 19 May 2025 at 17:57:50 +0200, Santiago Vila wrote:
    > On 19/5/25 at 16:43, Simon McVittie wrote:
    >> Is this still the same failure mode described in the bug title, with
    >> "Address already in use" and "could not bind to address ..." being
    >> reported by Apache?
    >
    > That's a very good question and I'm glad that you asked :-)
    >
    > In some cases, yes, but not always.

    Bug #1035983 has always mentioned the AH00072 issue in its title, so I
    think it's probably best if we consider any other sources of FTBFS or
    autopkgtest failures as out-of-scope for #1035983.

    Regarding the topic of flaky tests in general:

    Unfortunately I suspect that what's happening here is that we have a
    series of different test failures, each of them individually quite rare
    (therefore hard to reproduce or debug), which add up to a significant
    probability that at least one of the rare failures will happen at least
    once in any given test run and therefore the overall test suite fails.

    I've cloned a "metabug" (-2 above) to be blocked by #1035983 and other
    concrete and potentially actionable causes of test failures, but that
    metabug is not going to be directly actionable, because issues that
    can't be identified can't be fixed: the only way it can be solved is to
    chip away at its actionable dependencies until the failure rate becomes
    sufficiently low. I am not an expert on this package and I cannot commit
    to being able to achieve that.

    Individual tests that are sufficiently flaky can be worked around by
    disabling or ignoring the test if necessary (as was done for the
    tls_interaction test already), but the cost of disabling tests is that
    we can no longer use them to detect RC-severity regressions
    (particularly on architectures with few users where the buildds and
    autopkgtest are basically the only tools we have), so there's a
    trade-off here between breakage caused by false-positive failures and
    breakage caused by regressions that could have been caught by running
    the tests. As a non-expert trying to keep this package afloat, I don't
    feel that I am able to make high-quality uploads without automated tests
    to detect my inevitable mistakes. I'm sorry that this is disappointing,
    and I would be delighted to stop contributing to libsoup when someone
    can do a better job, but until then all I can do is to try to have a
    net-positive impact to the best of my limited ability.

    As mentioned previously, the AH00072 issue, #1035983, is particularly bad
    for this because it affects several tests equally, and disabling all of
    them would lose a lot of the overall test coverage.

    > I've put a collection
    > of failed build logs here:
    >
    > https://people.debian.org/~sanvila/build-logs/libsoup3/

    Thanks, hopefully someone can analyze those at some point and pick out
    the actionable equivalence classes. I cannot commit to being able to do
    this myself.

    I've reported some other sources of intermittent test failures as
    #1109107 (no solution known, help welcome), #1109108 (no solution known,
    help welcome) and #1109120 (fixed in the latest upload to unstable by an
    upstream change). None of these has, individually, a high probability
    of failure, but they add up.

    When I tried running the test suite repeatedly on barriere, the failure
    modes I saw intermittently were #1109107 and #1109108. I don't think I
    saw #1109120 or #1035983, so those might be less common, at least on
    that particular machine (if the failures are timing-dependent then they
    might behave differently elsewhere).

    Regarding #1035983 (the AH00072 issue) specifically:

    > Last time I looked at the libsoup* test suite, the actual tests were
    > each reasonably reliable, but the reliability issue was with their
    > setup/teardown. They run a temporary Apache web server, in order to
    > have a realistic server to test against. I think what's happening is
    > that sometimes, the web server port from one test (let's say test
    > number 5) is still considered by the kernel to be in use by the time
    > we reach the setup stage of the next test (let's say test number 6).
    >
    > As a result, the Apache for test number 6 can't listen on the port it
    > has been configured to use, and testing fails at that point.

    I tried applying the attached patch as a brute-force attempt to solve
    the port-still-in-use problem (#1035983). (FYI this will not apply
    cleanly to upstream code, it requires other changes already in
    debian/patches to add more debug info, which I added last time I spent
    time on trying to figure this out.)

    Unfortunately it didn't work: the test made multiple attempts to start
    Apache, but they all failed with the same error message shown in the
    Subject, until the overall test timed out. That suggests that my theory
    about the web server port being in TIME_WAIT state might not have been
    correct. I don't know what else to try there.
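
    The general shape of a "wait a bit and try again" loop can be
    sketched as below. This is a self-contained illustration under stated
    assumptions, not the attached patch: retry_with_delay() and
    fails_twice_then_succeeds() are hypothetical names, and the callback
    merely stands in for a real start command such as apache_cmd ("start").

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <time.h>

typedef bool (*attempt_fn) (void *user_data);

/* Hypothetical helper: calls attempt() up to max_tries times, sleeping
 * delay_ms milliseconds after each failed attempt except the last.
 * Returns the 1-based number of the attempt that succeeded, or 0 if
 * every attempt failed. */
unsigned int
retry_with_delay (attempt_fn attempt, void *user_data,
                  unsigned int max_tries, unsigned int delay_ms)
{
	struct timespec delay;
	unsigned int i;

	assert (attempt != NULL);

	delay.tv_sec = delay_ms / 1000;
	delay.tv_nsec = (long) (delay_ms % 1000) * 1000000L;

	for (i = 1; i <= max_tries; i++) {
		if (attempt (user_data))
			return i;

		if (i < max_tries)
			nanosleep (&delay, NULL);
	}

	return 0;
}

/* Stand-in for a real start command in this sketch: fails on the first
 * two calls and succeeds from the third call onwards. user_data points
 * to a call counter owned by the caller. */
bool
fails_twice_then_succeeds (void *user_data)
{
	unsigned int *calls = user_data;

	return ++(*calls) >= 3;
}
```

    A loop like this only helps if the underlying condition clears by
    itself within the retry window, which is consistent with the result
    described above: every retry hit the same error, so the port was not
    simply waiting to time out.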

    In 3.6.5-2 I added a patch fixing an upstream issue where one of the
    tests that used Apache was not marked "don't run in parallel", so it
    could end up being run in parallel with other tests - that could have
    resulted in a similar failure mode. We can see whether that helps. I
    think I've still seen the AH00072 error occasionally even after making
    that change, though, so it can't be the whole story.

    smcv

    From: Simon McVittie <[email protected]>
    Date: Fri, 11 Jul 2025 13:27:41 +0100
    Subject: tests: If we can't start Apache, wait a bit and try again

    Maybe helps: #1035983
    ---
    tests/test-utils.c | 23 +++++++++++++++++++----
    1 file changed, 19 insertions(+), 4 deletions(-)

    diff --git a/tests/test-utils.c b/tests/test-utils.c
    index 37cd00b..0de446c 100644
    --- a/tests/test-utils.c
    +++ b/tests/test-utils.c
    @@ -234,9 +234,13 @@ apache_cmd (const char *cmd)
     	return ok;
     }
     
    +static const unsigned int MAX_START_APACHE_TRIES = 10;
    +
     void
     apache_init (void)
     {
    +	unsigned int i = 0;
    +
     	g_test_message ("[%f] enter %s", g_get_monotonic_time () / 1e6, G_STRFUNC);
     
     	/* Set this environment variable if you are already running a
    @@ -246,11 +250,22 @@ apache_init (void)
     
     	server_root = soup_test_build_filename_abs (G_TEST_BUILT, "", NULL);
     
    -	if (!apache_cmd ("start")) {
    -		g_printerr ("Could not start apache\n");
    -		exit (1);
    +	while (TRUE) {
    +		if (apache_cmd ("start")) {
    +			apache_running = TRUE;
    +			goto out;
    +		} else {
    +			g_test_message ("[%f] Could not start Apache", g_get_monotonic_time () / 1e6);
    +		}
    +
    +		if (++i > MAX_START_APACHE_TRIES)