• Re: debugging gsasl autopkg test error on armhf

    From Mate Kukri@21:1/5 to Andreas Metzler on Sun Dec 3 19:10:01 2023
    Hello,

    That is the same issue we had in Ubuntu, and is caused by the recent
    addition of the -fstack-clash-protection compiler flag.

    There is an earlier thread discussing exactly this on debian-arm.

    Mate Kukri

    On 12/3/23 17:20, Andreas Metzler wrote:
    Hello,

    gnutls28 is currently blocked from testing because gsasl's autopkg test fails. I have played around on abel:

    Taking a trixie chroot and pulling in newer gnutls via LD_LIBRARY_PATH
    makes most of the testsuite fail, including this trivial test:

    8X------------------------------
    LD_LIBRARY_PATH=~/BU/gnutls28-3.8.2/debian/tmp/usr/lib/arm-linux-gnueabihf valgrind --error-exitcode=1 /usr/bin/gsasl --mkpasswd --password password --mechanism SCRAM-SHA-1 ; echo $?
    ==16979==
    ==16979== Invalid write of size 4
    ==16979== at 0x48B9A5A: lib_init (global.c:499)
    ==16979== by 0x400354B: call_init (dl-init.c:90)
    ==16979== by 0x400354B: call_init (dl-init.c:27)
    ==16979== by 0x40035F9: _dl_init (dl-init.c:137)
    ==16979== by 0x400F3DF: ??? (in /usr/lib/arm-linux-gnueabihf/ld-linux-armhf.so.3)
    ==16979== Address 0xbde0f468 is on thread 1's stack
    ==16979== 16 bytes below stack pointer
    ==16979==
    ==16979== Invalid write of size 4
    ==16979== at 0x48B9A5A: lib_init (global.c:499)
    ==16979== by 0x400354B: call_init (dl-init.c:90)
    ==16979== by 0x400354B: call_init (dl-init.c:27)
    ==16979== by 0x40035F9: _dl_init (dl-init.c:137)
    ==16979== by 0x400F3DF: ??? (in /usr/lib/arm-linux-gnueabihf/ld-linux-armhf.so.3)
    ==16979== Address 0xbde0f468 is on thread 1's stack
    ==16979== 16 bytes below stack pointer
    ==16979==
    ==16979== Invalid write of size 4
    ==16979== at 0x48B9A5A: lib_init (global.c:499)
    ==16979== by 0x400354B: call_init (dl-init.c:90)
    ==16979== by 0x400354B: call_init (dl-init.c:27)
    ==16979== by 0x40035F9: _dl_init (dl-init.c:137)
    ==16979== by 0x400F3DF: ??? (in /usr/lib/arm-linux-gnueabihf/ld-linux-armhf.so.3)
    ==16979== Address 0xbde0f468 is on thread 1's stack
    ==16979== 16 bytes below stack pointer
    [...]
    (Full log attached)
    8X------------------------------

    OTOH the test succeeds on sid.[1] I have checked the differences trixie/sid and tried pulling in the other newer libraries into the trixie chroot in vain. The only thing I could not test was valgrind, sid has 1:3.20.0-2, trixie 1:3.19.0-1. So I *suspect* valgrind/trixie to be slightly broken. Could this be true? Any better ideas?

    TIA, cu Andreas


    [1]
    ==17768== Memcheck, a memory error detector
    ==17768== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
    ==17768== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
    ==17768== Command: /usr/bin/gsasl --mkpasswd --password password --mechanism SCRAM-SHA-1
    ==17768==
    {SCRAM-SHA-1}65536,GiFRM7gH+lxu1r64,cGXqaDs3AxdxGl/Ia36IpYwHFrA=,pYycqZHy09aKZ9UK3hEIaT9XSls=
    ==17768==
    ==17768== HEAP SUMMARY:
    ==17768== in use at exit: 42 bytes in 4 blocks
    ==17768== total heap usage: 1,326 allocs, 1,322 frees, 99,720 bytes allocated
    ==17768==
    ==17768== LEAK SUMMARY:
    ==17768== definitely lost: 0 bytes in 0 blocks
    ==17768== indirectly lost: 0 bytes in 0 blocks
    ==17768== possibly lost: 0 bytes in 0 blocks
    ==17768== still reachable: 42 bytes in 4 blocks
    ==17768== suppressed: 0 bytes in 0 blocks
    ==17768== Rerun with --leak-check=full to see details of leaked memory
    ==17768==
    ==17768== For lists of detected and suppressed errors, rerun with: -s
    ==17768== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2733 from 34)
    0
    -------------

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andreas Metzler@21:1/5 to All on Sun Dec 3 18:40:02 2023
    Hello,

    gnutls28 is currently blocked from testing because gsasl's autopkg test
    fails. I have played around on abel:

    Taking a trixie chroot and pulling in newer gnutls via LD_LIBRARY_PATH
    makes most of the testsuite fail, including this trivial test:

    8X------------------------------
    LD_LIBRARY_PATH=~/BU/gnutls28-3.8.2/debian/tmp/usr/lib/arm-linux-gnueabihf valgrind --error-exitcode=1 /usr/bin/gsasl --mkpasswd --password password --mechanism SCRAM-SHA-1 ; echo $?
    ==16979==
    ==16979== Invalid write of size 4
    ==16979== at 0x48B9A5A: lib_init (global.c:499)
    ==16979== by 0x400354B: call_init (dl-init.c:90)
    ==16979== by 0x400354B: call_init (dl-init.c:27)
    ==16979== by 0x40035F9: _dl_init (dl-init.c:137)
    ==16979== by 0x400F3DF: ??? (in /usr/lib/arm-linux-gnueabihf/ld-linux-armhf.so.3)
    ==16979== Address 0xbde0f468 is on thread 1's stack
    ==16979== 16 bytes below stack pointer
    ==16979==
    ==16979== Invalid write of size 4
    ==16979== at 0x48B9A5A: lib_init (global.c:499)
    ==16979== by 0x400354B: call_init (dl-init.c:90)
    ==16979== by 0x400354B: call_init (dl-init.c:27)
    ==16979== by 0x40035F9: _dl_init (dl-init.c:137)
    ==16979== by 0x400F3DF: ??? (in /usr/lib/arm-linux-gnueabihf/ld-linux-armhf.so.3)
    ==16979== Address 0xbde0f468 is on thread 1's stack
    ==16979== 16 bytes below stack pointer
    ==16979==
    ==16979== Invalid write of size 4
    ==16979== at 0x48B9A5A: lib_init (global.c:499)
    ==16979== by 0x400354B: call_init (dl-init.c:90)
    ==16979== by 0x400354B: call_init (dl-init.c:27)
    ==16979== by 0x40035F9: _dl_init (dl-init.c:137)
    ==16979== by 0x400F3DF: ??? (in /usr/lib/arm-linux-gnueabihf/ld-linux-armhf.so.3)
    ==16979== Address 0xbde0f468 is on thread 1's stack
    ==16979== 16 bytes below stack pointer
    [...]
    (Full log attached)
    8X------------------------------

    OTOH the test succeeds on sid.[1] I have checked the differences trixie/sid
    and tried pulling in the other newer libraries into the trixie chroot in
    vain. The only thing I could not test was valgrind, sid has 1:3.20.0-2,
    trixie 1:3.19.0-1. So I *suspect* valgrind/trixie to be slightly broken.
    Could this be true? Any better ideas?

    TIA, cu Andreas


    [1]
    ==17768== Memcheck, a memory error detector
    ==17768== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
    ==17768== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
    ==17768== Command: /usr/bin/gsasl --mkpasswd --password password --mechanism SCRAM-SHA-1
    ==17768==
    {SCRAM-SHA-1}65536,GiFRM7gH+lxu1r64,cGXqaDs3AxdxGl/Ia36IpYwHFrA=,pYycqZHy09aKZ9UK3hEIaT9XSls=
    ==17768==
    ==17768== HEAP SUMMARY:
    ==17768== in use at exit: 42 bytes in 4 blocks
    ==17768== total heap usage: 1,326 allocs, 1,322 frees, 99,720 bytes allocated
    ==17768==
    ==17768== LEAK SUMMARY:
    ==17768== definitely lost: 0 bytes in 0 blocks
    ==17768== indirectly lost: 0 bytes in 0 blocks
    ==17768== possibly lost: 0 bytes in 0 blocks
    ==17768== still reachable: 42 bytes in 4 blocks
    ==17768== suppressed: 0 bytes in 0 blocks
    ==17768== Rerun with --leak-check=full to see details of leaked memory
    ==17768==
    ==17768== For lists of detected and suppressed errors, rerun with: -s
    ==17768== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2733 from 34)

    0
    -------------
    --
    `What a good friend you are to him, Dr. Maturin. His other friends are
    so grateful to you.'
    `I sew his ears on from time to time, sure'

    (Attachment: armhf.log)


    { LD_LIBRARY_PATH=~/BU/gnutls28-3.8.2/debian/tmp/usr/lib/arm-linux-gnueabihf valgrind --error-exitcode=1 /usr/bin/gsasl --mkpasswd --password password --mechanism SCRAM-SHA-1 ; echo $? ; } > armhf.log 2>&1

    ==16979== Memcheck, a memory error detector
    ==16979== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
    ==16979== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
    ==16979== Command: /usr/bin/gsasl --mkpasswd --password password --mechanism SCRAM-SHA-1
    ==16979==
    ==16979== Invalid write of size 4
    ==16979== at 0x48B9A5A: lib_init (global.c:499)
    ==16979== by 0x400354B: call_init (dl-init.c:90)
    ==16979== by 0x400354B: call_init (dl-init.c:27)
    ==16979
  • From Adrien Nader@21:1/5 to Andreas Metzler on Sun Dec 3 19:20:01 2023
    Hi Andreas,

    On Sun, Dec 03, 2023, Andreas Metzler wrote:
    Hello,

    gnutls28 is currently blocked from testing because gsasl's autopkg test fails. I have played around on abel:

    Taking a trixie chroot and pulling in newer gnutls via LD_LIBRARY_PATH
    makes most of the testsuite fail, including this trivial test:

    This is due to -fstack-clash-protection now being enabled through
    buildflags (it was actually enabled on armhf after amd64 and arm64);
    gnutls was rebuilt with it and this reliably causes issues under
    valgrind.

    The actual cause is a bit unclear at the moment but it definitely
    triggers under valgrind.

    You can look at the "Really enable -fstack-clash-protection on
    armhf/armel?" thread on debian-arm@ (archive at
    https://lists.debian.org/debian-arm/2023/11/msg00015.html ).

    (I still need to catch up on that thread, since I am falling behind on
    diving deeper into the issue.)

    On Ubuntu we disabled the flag on armhf at least as a stop-gap.
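
For anyone who wants to check this locally, here is a minimal sketch that reports whether -fstack-clash-protection is in the effective CFLAGS, with and without a per-package opt-out. It assumes dpkg-dev 1.22.0 or later, where the hardening area has a "stackclash" feature. The helper names has_stack_clash and report are made up for illustration. This is not necessarily how the flag was disabled in Ubuntu.

```shell
#!/bin/sh
# Sketch only. Assumption: dpkg-dev >= 1.22.0, which knows the
# "stackclash" hardening feature. Helper names are hypothetical.

has_stack_clash() {
    # Succeeds if -fstack-clash-protection is one of the words in "$1".
    case " $1 " in
        *" -fstack-clash-protection "*) return 0 ;;
        *) return 1 ;;
    esac
}

report() {
    # $1 = label, $2 = flags string to inspect
    if has_stack_clash "$2"; then
        echo "$1: -fstack-clash-protection present"
    else
        echo "$1: -fstack-clash-protection absent"
    fi
}

if command -v dpkg-buildflags >/dev/null 2>&1; then
    # Flags a package build would get by default on this host.
    report "default" "$(dpkg-buildflags --get CFLAGS)"
    # Flags with the feature switched off for one build.
    report "opt-out" "$(DEB_BUILD_MAINT_OPTIONS=hardening=-stackclash dpkg-buildflags --get CFLAGS)"
else
    echo "dpkg-buildflags not found; run this on a Debian or Ubuntu build host"
fi
```

A package could apply the same opt-out by exporting DEB_BUILD_MAINT_OPTIONS with that value from debian/rules, preferably restricted to the affected architecture.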

    Best,

    --
    Adrien

  • From Emanuele Rocca@21:1/5 to Andreas Metzler on Mon Dec 4 10:20:02 2023
    Hi Andreas!

    On 2023-12-03 06:20, Andreas Metzler wrote:
    gnutls28 is currently blocked from testing because gsasl's autopkg test fails.

    We recently enabled stack-clash-protection on all arm ports. On 32 bit
    arm the feature is implemented using stack probes, which valgrind flags
    as illegal accesses because they occur below the stack pointer address.
    However, stack probes don't actually care about the contents - just that
    the address is valid.

    As for armel, valgrind is not supported there at all.

    I have added a suppression to valgrind 1:3.20.0-2 on armhf, which has
    now migrated to testing. gsasl tests pass on all arm ports (but fail on
    i386): https://ci.debian.net/packages/g/gsasl/
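
For systems that still have an older valgrind, a local suppression can serve as a workaround. The sketch below is not the suppression shipped in valgrind 1:3.20.0-2. Its name is invented, and its frames are simply copied from the backtrace earlier in this thread (lib_init called from call_init). Memcheck:Addr4 is the suppression kind for invalid 4-byte reads and writes.

```shell
#!/bin/sh
# Sketch only. Assumption: this is NOT the suppression from the Debian
# valgrind package; the entry name is hypothetical and the frames are
# taken from the backtrace quoted in this thread.

supp_file=$(mktemp)

# A suppression lists the error kind followed by the innermost frames.
# Frames below the listed ones are matched implicitly.
cat > "$supp_file" <<'EOF'
{
   hypothetical-stack-probe-in-gnutls-lib_init
   Memcheck:Addr4
   fun:lib_init
   fun:call_init
}
EOF

echo "suppression written to $supp_file"

# Only run the check where both tools are installed.
if command -v valgrind >/dev/null 2>&1 && [ -x /usr/bin/gsasl ]; then
    valgrind --error-exitcode=1 --suppressions="$supp_file" \
        /usr/bin/gsasl --mkpasswd --password password --mechanism SCRAM-SHA-1
    echo "exit status: $?"
fi
```

A suppression this broad hides every 4-byte invalid access in lib_init, including real bugs, so it is better kept local to the test run than installed system-wide.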

    The i386 failures are mostly valgrind reports of "Conditional jump or
    move depends on uninitialised value(s)". The tests did pass, however,
    with gsasl 2.2.0-1, which at first glance does not differ significantly
    from 2.2.0-2. The main difference is that 2.2.0-1 was built with GCC 12
    and 2.2.0-2 with GCC 13. Checking whether the failures can be
    reproduced with GCC 12 may be worth a shot.

    Emanuele

  • From Andreas Metzler@21:1/5 to [email protected] on Mon Dec 4 18:30:01 2023
    On 2023-12-04 Emanuele Rocca <[email protected]> wrote:
    Hi Andreas!

    On 2023-12-03 06:20, Andreas Metzler wrote:
    gnutls28 is currently blocked from testing because gsasl's autopkg test fails.

    We recently enabled stack-clash-protection on all arm ports. On 32 bit
    arm the feature is implemented using stack probes, which valgrind flags
    as illegal accesses because they occur below the stack pointer address.
    However, stack probes don't actually care about the contents - just that
    the address is valid.

    Hello,

    thank you for the explanation and for already having a handle on the
    issue.

    cu Andreas
