• core dump analysis, was Re: stack smashing detected

    From Finn Thain@21:1/5 to All on Tue Mar 28 06:00:01 2023
    On Sat, 18 Feb 2023, I wrote:

    On Fri, 17 Feb 2023, Stan Johnson wrote:


    That's not to say a SIGABRT is ignored, it just doesn't kill PID 1.


    I doubt that /sbin/init is generating the "stack smashing detected"
    error but you may need to modify it to find out. If you can't figure out which userland binary is involved, you'll have to focus on your custom
    kernel binary, just as I proposed in my message dated 8 Feb 2023.


    Using the core dump I generated on my Mac LC III, and using a workaround
    for the gdb regression, I was able to get the backtrace below.

    root@(none):/root# gdb
    GNU gdb (Debian 13.1-2) 13.1
    Copyright (C) 2023 Free Software Foundation, Inc.
    ...
    (gdb) set osabi GNU/Linux
    (gdb) exec /bin/dash
    (gdb) core /root/core.0
    warning: core file may not match specified executable file.
    [New LWP 366]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/m68k-linux-gnu/libthread_db.so.1".
    Core was generated by `/bin/sh /etc/init.d/mountkernfs.sh reload'.
    Program terminated with signal SIGABRT, Aborted.
    #0 __pthread_kill_implementation (threadid=3222954656, signo=6, no_tid=0)
    at pthread_kill.c:44
    44 pthread_kill.c: No such file or directory.
    (gdb) bt
    #0 __pthread_kill_implementation (threadid=3222954656, signo=6, no_tid=0)
    at pthread_kill.c:44
    #1 0xc00a7080 in __pthread_kill_internal (signo=6, threadid=3222954656)
    at pthread_kill.c:78
    #2 __GI___pthread_kill (threadid=3222954656, signo=6) at pthread_kill.c:89
    #3 0xc0064c22 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
    #4 0xc0052faa in __GI_abort () at abort.c:79
    #5 0xc009b328 in __libc_message (action=<optimized out>, fmt=<optimized out>)
    at ../sysdeps/posix/libc_fatal.c:155
    #6 0xc012a3c2 in __GI___fortify_fail (
    msg=0xc0182c5e "stack smashing detected") at fortify_fail.c:26
    #7 0xc012a3a0 in __stack_chk_fail () at stack_chk_fail.c:24
    #8 0xc00e0172 in __wait3 (stat_loc=<optimized out>, options=<optimized out>,
    usage=<optimized out>) at ../sysdeps/unix/sysv/linux/wait3.c:41
    #9 0xd000c38e in ?? ()
    #10 0xefee111e in ?? ()
    #11 0x00000000 in ?? ()
    (gdb)

    It appears that the failure was in glibc (though I guess the root cause
    may lie elsewhere). I have two more core files generated by dash (actually
    by `/bin/sh /etc/rcS.d/S08mountall.sh start') that give the same
    backtrace. So even though the failure is intermittent, the site of the
    buffer overrun seems to be consistent.

    Looking at sysdeps/unix/sysv/linux/wait3.c, I guess the only possible
    place for a buffer overrun would be struct __rusage64 usage64. https://sources.debian.org/src/glibc/2.36-8/sysdeps/unix/sysv/linux/wait3.c/?hl=41#L41

    (gdb) select-frame 8
    (gdb) print usage64
    $3 = {ru_utime = {tv_sec = 6481621047248640, tv_usec = 91671782025504},
    ru_stime = {tv_sec = 25769811968, tv_usec = 8591449888}, {
    ru_maxrss = 1515296, __ru_maxrss_word = 1515296}, {ru_ixrss = 1515296,
    __ru_ixrss_word = 1515296}, {ru_idrss = 224, __ru_idrss_word = 224}, {
    ru_isrss = 224, __ru_isrss_word = 224}, {ru_minflt = 6,
    __ru_minflt_word = 6}, {ru_majflt = 4, __ru_majflt_word = 4}, {
    ru_nswap = 4, __ru_nswap_word = 4}, {ru_inblock = 372,
    __ru_inblock_word = 372}, {ru_oublock = 0, __ru_oublock_word = 0}, {
    ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 8,
    __ru_msgrcv_word = 8}, {ru_nsignals = 367, __ru_nsignals_word = 367}, {
    ru_nvcsw = 10, __ru_nvcsw_word = 10}, {ru_nivcsw = 0,
    __ru_nivcsw_word = 0}}
    (gdb)

    Of course, at this point the damage has already been done and the culprit
    has gone. I guess there was a buffer overrun during the call to __wait4_time64(). https://sources.debian.org/src/glibc/2.36-8/sysdeps/unix/sysv/linux/wait4.c/?hl=26#L26

    It's hard to read glibc source code without knowing what all the macros
    were set to (such as __KERNEL_OLD_TIMEVAL_MATCHES_TIMEVAL64 and
    __TIMESIZE).

    It would be disappointing if rusage64_to_rusage() in __wait3() was being applied to the result of rusage32_to_rusage64() from __wait4_time64().
    Perhaps the ifdefs are arranged in such a way that it doesn't happen...

    Anyway, does anyone know how to get a hex dump of the whole stack frame including the canary, in case there is something to be learned from that?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to All on Fri Mar 31 05:50:01 2023
    On 28 March 2023, I wrote,


    Looking at sysdeps/unix/sysv/linux/wait3.c, I guess the only possible
    place for a buffer overrun would be struct __rusage64 usage64. https://sources.debian.org/src/glibc/2.36-8/sysdeps/unix/sysv/linux/wait3.c/?hl=41#L41


    ... but now I see the usage64 variable is not involved at all because
    __wait3() was passed a NULL pointer: https://sources.debian.org/src/dash/0.5.12-2/src/jobs.c/?hl=1179#L1179

    So NULL (rather than &usage64) was passed to the wait4() syscall which
    means the kernel didn't invoke copy_to_user() at all. AFAICS there's no possible buffer overflow in __wait3(), __wait4_time64() etc.

    That suggests to a problem with GCC's SSP detector.

    Here's a more complete backtrace and some disassembly.

    # gdb
    GNU gdb (Debian 13.1-2) 13.1
    ...
    (gdb) set osabi GNU/Linux
    (gdb) file /bin/dash
    Reading symbols from /bin/dash...
    Reading symbols from /usr/lib/debug/.build-id/aa/4160f84f3eeee809c554cb9f3e1ef0686b8dcc.debug...
    (gdb)
    (gdb) core /root/core.0
    warning: core file may not match specified executable file.
    [New LWP 366]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/m68k-linux-gnu/libthread_db.so.1".
    Core was generated by `/bin/sh /etc/init.d/mountkernfs.sh reload'.
    Program terminated with signal SIGABRT, Aborted.
    #0 __pthread_kill_implementation (threadid=3222954656, signo=6, no_tid=0)
    at pthread_kill.c:44
    44 pthread_kill.c: No such file or directory.
    (gdb) bt
    #0 __pthread_kill_implementation (threadid=3222954656, signo=6, no_tid=0)
    at pthread_kill.c:44
    #1 0xc00a7080 in __pthread_kill_internal (signo=6, threadid=3222954656)
    at pthread_kill.c:78
    #2 __GI___pthread_kill (threadid=3222954656, signo=6) at pthread_kill.c:89
    #3 0xc0064c22 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
    #4 0xc0052faa in __GI_abort () at abort.c:79
    #5 0xc009b328 in __libc_message (action=<optimized out>, fmt=<optimized out>)
    at ../sysdeps/posix/libc_fatal.c:155
    #6 0xc012a3c2 in __GI___fortify_fail (
    msg=0xc0182c5e "stack smashing detected") at fortify_fail.c:26
    #7 0xc012a3a0 in __stack_chk_fail () at stack_chk_fail.c:24
    #8 0xc00e0172 in __wait3 (stat_loc=<optimized out>, options=<optimized out>,
    usage=<optimized out>) at ../sysdeps/unix/sysv/linux/wait3.c:41
    #9 0xd000c38e in waitproc (status=0xefee110e, block=1) at jobs.c:1179
    #10 waitone (block=1, job=0xd0021930) at jobs.c:1055
    #11 0xd000c5b8 in dowait (block=1, jp=0xd0021930) at jobs.c:1137
    #12 0xd000ddb0 in waitforjob (jp=0xd0021930) at jobs.c:1014
    #13 0xd000aade in expbackq (flag=324, cmd=0xd00222c4) at expand.c:520
    #14 argstr (p=<optimized out>, flag=68) at expand.c:335
    #15 0xd000b5ce in expandarg (arg=0xd00222ac, arglist=0xefee13bc, flag=4)
    at expand.c:192
    #16 0xd0007e2a in evalcommand (cmd=<optimized out>, flags=<optimized out>)
    at eval.c:855
    #17 0xd0006ffc in evaltree (n=0xd0022294, flags=0) at eval.c:300
    #18 0xd0006e96 in evaltree (n=0xd0022294, flags=0) at eval.c:300
    #19 0xd0006e6a in evaltree (n=0xd002224c, flags=0) at eval.c:292
    #20 0xd0006e6a in evaltree (n=0xd00220d4, flags=0) at eval.c:292
    #21 0xd0006e6a in evaltree (n=0xd002208c, flags=0) at eval.c:292
    #22 0xd000746a in evalfun (func=0xd0022078, argc=<optimized out>,
    argv=0xd001e61c <stackbase+376>, flags=<optimized out>) at eval.c:1009
    #23 0xd0008176 in evalcommand (cmd=<optimized out>, flags=<optimized out>)
    at eval.c:921
    #24 0xd0006ffc in evaltree (n=0xd001e588 <stackbase+228>, flags=1)
    at eval.c:300
    #25 0xd00084c8 in evaltreenr (flags=1, n=0xd001e588 <stackbase+228>)
    at eval.c:347
    #26 evalbackcmd (n=<optimized out>, result=0xefee17d4) at eval.c:650
    #27 0xd000a984 in expbackq (flag=324, cmd=0xd001e588 <stackbase+228>)
    at expand.c:495
    #28 argstr (p=<optimized out>, flag=68) at expand.c:335
    #29 0xd000b5ce in expandarg (arg=0xd001e5b0 <stackbase+268>,
    arglist=0xefee191c, flag=4) at expand.c:192
    #30 0xd0007e2a in evalcommand (cmd=<optimized out>, flags=<optimized out>)
    at eval.c:855
    #31 0xd0006ffc in evaltree (n=0xd001e5c0 <stackbase+284>, flags=0)
    at eval.c:300
    #32 0xd000e3c0 in cmdloop (top=0) at main.c:246
    #33 0xd000e588 in dotcmd (argc=2, argv=<optimized out>) at main.c:341
    #34 0xd0007a12 in evalbltin (cmd=0xd001b598 <builtincmd>,
    argc=<optimized out>, argv=<optimized out>, flags=<optimized out>)
    at eval.c:967
    #35 0xd00080ca in evalcommand (cmd=<optimized out>, flags=<optimized out>)
    at eval.c:910
    #36 0xd0006ffc in evaltree (n=0xd001e4e8 <stackbase+68>, flags=0) at eval.c:300 #37 0xd000e3c0 in cmdloop (top=1) at main.c:246
    #38 0xd0005018 in main (argc=<optimized out>, argv=<optimized out>)
    at main.c:181
    (gdb) frame 8
    #8 0xc00e0172 in __wait3 (stat_loc=<optimized out>, options=<optimized out>,
    usage=<optimized out>) at ../sysdeps/unix/sysv/linux/wait3.c:41
    41 ../sysdeps/unix/sysv/linux/wait3.c: No such file or directory.
    (gdb) info frame
    Stack level 8, frame at 0xefee10e0:
    pc = 0xc00e0172 in __wait3 (../sysdeps/unix/sysv/linux/wait3.c:41);
    saved pc = 0xd000c38e
    called by frame at 0xefee11dc, caller of frame at 0xefee106c
    source language c.
    Arglist at 0xefee10d8, args: stat_loc=<optimized out>,
    options=<optimized out>, usage=<optimized out>
    Locals at 0xefee10d8, Previous frame's sp is 0xefee10e0
    Saved registers:
    a2 at 0xefee106c, a3 at 0xefee1070, a5 at 0xefee1074, fp at 0xefee10d8,
    pc at 0xefee10dc
    (gdb) x/32z 0xefee1060
    0xefee1060: 0xc0182c5e 0xc0198000 0xc00e0172 0xd001e718 0xefee1070: 0xd001e498 0xd001b874 0x00170700 0x00170700 0xefee1080: 0x00170700 0x00005360 0x0000e920 0x00000006 0xefee1090: 0x00002000 0x00000002 0x00171f20 0x00171f20 0xefee10a0: 0x00171f20 0x000000e0 0x000000e0 0x00000006 0xefee10b0: 0x00000004 0x00000004 0x00000174 0x00000000 0xefee10c0: 0x00000000 0x00000008 0x0000016f 0x0000000a 0xefee10d0: 0x00000000 0x00ac3dbe 0xd001c1ec 0xd000c38e (gdb)
    0xefee10e0: 0xefee111e 0x00000000 0x00000000 0x00000001 0xefee10f0: 0x00000001 0xefee1284 0x00000044 0xd0017714 0xefee1100: 0x00000100 0xd0021930 0xd001c1ec 0xd001e498 0xefee1110: 0xd001b874 0xefee1308 0xc0023e8c 0xefee0000 0xefee1120: 0x00000044 0xd0017714 0x00000100 0xefee1274 0xefee1130: 0xc0023e8c 0xd001c028 0xd001b874 0xefee1208 0xefee1140: 0x00000000 0xc0023e8c 0x00000000 0x00000000 0xefee1150: 0x00000000 0x00000000 0x00000000 0x00000000 (gdb) print &usage64
    $1 = (struct __rusage64 *) 0xefee107c
    (gdb) disass
    Dump of assembler code for function __wait3:
    0xc00e0070 <+0>: linkw %fp,#-96
    0xc00e0074 <+4>: moveml %a2-%a3/%a5,%sp@-
    0xc00e0078 <+8>: lea %pc@(0xc0198000),%a5
    0xc00e0080 <+16>: movel %fp@(8),%d0
    0xc00e0084 <+20>: moveal %fp@(16),%a2
    0xc00e0088 <+24>: moveal %a5@(108),%a3
    0xc00e008c <+28>: movel %a3@,%fp@(-4)
    0xc00e0090 <+32>: tstl %a2
    0xc00e0092 <+34>: beqw 0xc00e0152 <__wait3+226>
    0xc00e0096 <+38>: pea %fp@(-92)
    0xc00e009a <+42>: movel %fp@(12),%sp@-
    0xc00e009e <+46>: movel %d0,%sp@-
    0xc00e00a0 <+48>: pea 0xffffffff
    0xc00e00a4 <+52>: bsrl 0xc00e0174 <__GI___wait4_time64>
    0xc00e00aa <+58>: lea %sp@(16),%sp
    0xc00e00ae <+62>: tstl %d0
    0xc00e00b0 <+64>: bgts 0xc00e00c8 <__wait3+88>
    0xc00e00b2 <+66>: moveal %fp@(-4),%a0
    0xc00e00b6 <+70>: movel %a3@,%d1
    0xc00e00b8 <+72>: cmpl %a0,%d1
    0xc00e00ba <+74>: bnew 0xc00e016c <__wait3+252>
    0xc00e00be <+78>: moveml %fp@(-108),%a2-%a3/%a5
    0xc00e00c4 <+84>: unlk %fp
    0xc00e00c6 <+86>: rts
    0xc00e00c8 <+88>: pea 0x44
    0xc00e00cc <+92>: clrl %sp@-
    0xc00e00ce <+94>: pea %a2@(4)
    0xc00e00d2 <+98>: movel %d0,%fp@(-96)
    0xc00e00d6 <+102>: bsrl 0xc00b8850 <__GI_memset>
    0xc00e00dc <+108>: movel %fp@(-88),%a2@
    0xc00e00e0 <+112>: movel %fp@(-80),%a2@(4)
    0xc00e00e6 <+118>: movel %fp@(-72),%a2@(8)
    0xc00e00ec <+124>: movel %fp@(-64),%a2@(12)
    0xc00e00f2 <+130>: movel %fp@(-60),%a2@(16)
    0xc00e00f8 <+136>: movel %fp@(-56),%a2@(20)
    0xc00e00fe <+142>: movel %fp@(-52),%a2@(24)
    0xc00e0104 <+148>: movel %fp@(-48),%a2@(28)
    0xc00e010a <+154>: movel %fp@(-44),%a2@(32)
    0xc00e0110 <+160>: movel %fp@(-40),%a2@(36)
    0xc00e0116 <+166>: movel %fp@(-36),%a2@(40)
    0xc00e011c <+172>: movel %fp@(-32),%a2@(44)
    0xc00e0122 <+178>: movel %fp@(-28),%a2@(48)
    0xc00e0128 <+184>: movel %fp@(-24),%a2@(52)
    0xc00e012e <+190>: movel %fp@(-20),%a2@(56)
    0xc00e0134 <+196>: movel %fp@(-16),%a2@(60)
    0xc00e013a <+202>: movel %fp@(-12),%a2@(64)
    0xc00e0140 <+208>: movel %fp@(-8),%a2@(68)
    0xc00e0146 <+214>: lea %sp@(12),%sp
    0xc00e014a <+218>: movel %fp@(-96),%d0
    0xc00e014e <+222>: braw 0xc00e00b2 <__wait3+66>
    0xc00e0152 <+226>: clrl %sp@-
    0xc00e0154 <+228>: movel %fp@(12),%sp@-
    0xc00e0158 <+232>: movel %d0,%sp@-
    0xc00e015a <+234>: pea 0xffffffff
    0xc00e015e <+238>: bsrl 0xc00e0174 <__GI___wait4_time64>
    0xc00e0164 <+244>: lea %sp@(16),%sp
    0xc00e0168 <+248>: braw 0xc00e00b2 <__wait3+66>
    0xc00e016c <+252>: bsrl 0xc012a38c <__stack_chk_fail>
    End of assembler dump.
    (gdb)

    Note that __wait3(stat_loc, options, NULL) reduces to,

    return __wait4_time64(-1, stat_loc, options, NULL);

    So I think the branch at __wait3+34 was taken, and after bsr __GI___wait4_time64, the branch at __wait3+248 would have been taken. Then
    the canary located at %fp@(-4) was compared with %a3@. From the hex dump
    above, %fp@(-4) is 0xd000c38e.

    As for %a3, we know its value when SIGABRT was caught, and if I'm not
    mistaken, %a3 was not altered by __stack_chk_fail or
    __GI___fortify_fail...

    (gdb) frame 7
    #7 0xc012a3a0 in __stack_chk_fail () at stack_chk_fail.c:24
    24 stack_chk_fail.c: No such file or directory.
    (gdb) info frame
    Stack level 7, frame at 0xefee106c:
    pc = 0xc012a3a0 in __stack_chk_fail (stack_chk_fail.c:24);
    saved pc = 0xc00e0172
    called by frame at 0xefee10e0, caller of frame at 0xefee1060
    source language c.
    Arglist at 0xefee105c, args:
    Locals at 0xefee105c, Previous frame's sp is 0xefee106c
    Saved registers:
    a5 at 0xefee1064, pc at 0xefee1068
    (gdb) disass
    Dump of assembler code for function __stack_chk_fail:
    0xc012a38c <+0>: movel %a5,%sp@-
    0xc012a38e <+2>: lea %pc@(0xc0198000),%a5
    0xc012a396 <+10>: movel %a5@(10696),%sp@-
    0xc012a39a <+14>: bsrl 0xc012a3a0 <__GI___fortify_fail>
    End of assembler dump.
    (gdb) frame 6
    #6 0xc012a3c2 in __GI___fortify_fail (
    msg=0xc0182c5e "stack smashing detected") at fortify_fail.c:26
    26 fortify_fail.c: No such file or directory.
    (gdb) info frame
    Stack level 6, frame at 0xefee1060:
    pc = 0xc012a3c2 in __GI___fortify_fail (fortify_fail.c:26);
    saved pc = 0xc012a3a0
    called by frame at 0xefee106c, caller of frame at 0xefee1044
    source language c.
    Arglist at 0xefee1040, args: msg=0xc0182c5e "stack smashing detected"
    Locals at 0xefee1040, Previous frame's sp is 0xefee1060
    Saved registers:
    d2 at 0xefee1050, d3 at 0xefee1054, a5 at 0xefee1058, pc at 0xefee105c
    (gdb) disass
    Dump of assembler code for function __GI___fortify_fail:
    0xc012a3a0 <+0>: moveml %d2-%d3/%a5,%sp@-
    0xc012a3a4 <+4>: lea %pc@(0xc0198000),%a5
    0xc012a3ac <+12>: movel %sp@(16),%d3
    0xc012a3b0 <+16>: movel %a5@(10700),%d2
    0xc012a3b4 <+20>: movel %d3,%sp@-
    0xc012a3b6 <+22>: movel %d2,%sp@-
    0xc012a3b8 <+24>: pea 0x1
    0xc012a3bc <+28>: bsrl 0xc009b1d8 <__libc_message>
    0xc012a3c2 <+34>: lea %sp@(12),%sp
    0xc012a3c6 <+38>: movel %d3,%sp@-
    0xc012a3c8 <+40>: movel %d2,%sp@-
    0xc012a3ca <+42>: pea 0x1
    0xc012a3ce <+46>: bsrl 0xc009b1d8 <__libc_message>
    0xc012a3d4 <+52>: lea %sp@(12),%sp
    0xc012a3d8 <+56>: bras 0xc012a3b4 <__GI___fortify_fail+20>
    End of assembler dump.
    (gdb) info reg
    d0 0x0 0
    d1 0x16e 366
    d2 0xc0182c76 -1072157578
    d3 0xc0182c5e -1072157602
    d4 0xefee1122 -269610718
    d5 0x1 1
    d6 0xd0021930 -805168848
    d7 0x100 256
    a0 0xc01a62a0 0xc01a62a0
    a1 0xffffffe6 0xffffffe6
    a2 0x0 0x0
    a3 0xefee1068 0xefee1068
    a4 0xd001e71c 0xd001e71c <pending_sig>
    a5 0xc0198000 0xc0198000
    fp 0xefee10d8 0xefee10d8
    sp 0xefee1044 0xefee1044
    ps 0x10 [ X ]
    pc 0xc012a3c2 0xc012a3c2 <__GI___fortify_fail+34> fpcontrol 0x0 0
    fpstatus 0x0 0
    fpiaddr 0x0 0x0

    So %a3 was a pointer into stack frame 6??

    (gdb) x/z $a3
    0xefee1068: 0xc00e0172

    Clearly 0xd000c38e != 0xc00e0172 (that is, %fp@(-4) != %a3@) but did the
    canary value change? It rather looks like the canary pointer is wrong...

    Another way to find the value of %a3 during __wait3() execution is to look
    at its initialization: moveal %a5@(108),%a3. And we can see from 'info
    frame' above that __stack_chk_fail() saved %a5 at 0xefee1064.

    (gdb) x/4z 0xefee1060
    0xefee1060: 0xc0182c5e 0xc0198000 0xc00e0172 0xd001e718 (gdb) x/z *0xefee1064+108
    0xc019806c: Cannot access memory at address 0xc019806c

    Anyway, if the analysis is right (hopefully someone can confirm that)
    this looks like a GCC bug.

    I'm not sure why it only shows up during (sysvinit) init script execution.
    The canary value is derived from /dev/urandom so I guess the failure is intermittent because it is connected to kernel PRNG state during early
    startup.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to All on Sat Apr 1 11:50:01 2023
    On 31 Mar 2023, I wrote,

    ...

    So %a3 was a pointer into stack frame 6??

    (gdb) x/z $a3
    0xefee1068: 0xc00e0172

    Clearly 0xd000c38e != 0xc00e0172 (that is, %fp@(-4) != %a3@) but did the canary value change? It rather looks like the canary pointer is wrong...


    So the question is, why is %a3 wrong? We know it was wrong because %a3 and
    %fp got saved on the stack when __libc_message was called:

    (gdb) info frame 5
    Stack frame at 0xefee1044:
    pc = 0xc009b328 in __libc_message (../sysdeps/posix/libc_fatal.c:155);
    saved pc = 0xc012a3c2
    called by frame at 0xefee1060, caller of frame at 0xefee0fd0
    source language c.
    Arglist at 0xefee103c, args: action=<optimized out>, fmt=<optimized out>
    Locals at 0xefee103c, Previous frame's sp is 0xefee1044
    Saved registers:
    d2 at 0xefee100c, d3 at 0xefee1010, d4 at 0xefee1014, d5 at 0xefee1018,
    d6 at 0xefee101c, d7 at 0xefee1020, a2 at 0xefee1024, a3 at 0xefee1028,
    a4 at 0xefee102c, a5 at 0xefee1030, fp at 0xefee103c, pc at 0xefee1040
    (gdb) x/z 0xefee1028
    0xefee1028: 0xefee1068
    (gdb) x/z 0xefee1068
    0xefee1068: 0xc00e0172
    (gdb) x/z 0xefee103c
    0xefee103c: 0xefee10d8
    (gdb) x/z 0xefee10d8-4
    0xefee10d4: 0x00ac3dbe

    That value for %fp@(-4) is the correct canary value (see below) showing
    that there's no buffer overrun in __wait3, and also showing that %a3 was
    wrong when the canary was to be validated.

    Another way to find the value of %a3 during __wait3() execution is to look
    at its initialization: moveal %a5@(108),%a3. And we can see from 'info
    frame' above that __stack_chk_fail() saved %a5 at 0xefee1064.

    (gdb) x/4z 0xefee1060
    0xefee1060: 0xc0182c5e 0xc0198000 0xc00e0172 0xd001e718 (gdb) x/z *0xefee1064+108
    0xc019806c: Cannot access memory at address 0xc019806c


    That was a mistake on my part. It should have been,

    (gdb) x/z 0xefee1064
    0xefee1064: 0xc0198000
    (gdb) x/z 0xc0198000+108
    0xc019806c: 0xc0023e8c

    So that's the value %a3 was given at __wait3+24, and it looks fine:

    (gdb) x/z 0xc0023e8c
    0xc0023e8c <__stack_chk_guard>: 0x00ac3dbe

    ... the expected value for %a3 at __wait3+70 is <stack_chk_guard> as this
    is used to validate the canary at %fp-4.

    So, in summary, the canary validation failed in this case not because the canary got clobbered but because %a3 got clobbered, somewhere between __wait3+24 and __wait3+70 (below).

    The call to __GI___wait4_time64 causes %a3 to be saved to and restored
    from the stack, so stack corruption seems to be a strong possibility to
    explain the change in %a3.

    But if that's what happened, I'd expect __GI___wait4_time64 to report
    stack smashing, not __wait3... And it just begs the question, what then
    caused the corruption? Was it the wait4 syscall? Was it another thread?
    And why is it so rare?

    (gdb) disass __wait3
    Dump of assembler code for function __wait3:
    0xc00e0070 <+0>: linkw %fp,#-96
    0xc00e0074 <+4>: moveml %a2-%a3/%a5,%sp@-
    0xc00e0078 <+8>: lea %pc@(0xc0198000),%a5
    0xc00e0080 <+16>: movel %fp@(8),%d0
    0xc00e0084 <+20>: moveal %fp@(16),%a2
    0xc00e0088 <+24>: moveal %a5@(108),%a3
    0xc00e008c <+28>: movel %a3@,%fp@(-4)
    0xc00e0090 <+32>: tstl %a2
    0xc00e0092 <+34>: beqw 0xc00e0152 <__wait3+226>
    0xc00e0096 <+38>: pea %fp@(-92)
    0xc00e009a <+42>: movel %fp@(12),%sp@-
    0xc00e009e <+46>: movel %d0,%sp@-
    0xc00e00a0 <+48>: pea 0xffffffff
    0xc00e00a4 <+52>: bsrl 0xc00e0174 <__GI___wait4_time64>
    0xc00e00aa <+58>: lea %sp@(16),%sp
    0xc00e00ae <+62>: tstl %d0
    0xc00e00b0 <+64>: bgts 0xc00e00c8 <__wait3+88>
    0xc00e00b2 <+66>: moveal %fp@(-4),%a0
    0xc00e00b6 <+70>: movel %a3@,%d1
    0xc00e00b8 <+72>: cmpl %a0,%d1
    0xc00e00ba <+74>: bnew 0xc00e016c <__wait3+252>
    0xc00e00be <+78>: moveml %fp@(-108),%a2-%a3/%a5
    0xc00e00c4 <+84>: unlk %fp
    0xc00e00c6 <+86>: rts
    0xc00e00c8 <+88>: pea 0x44
    0xc00e00cc <+92>: clrl %sp@-
    0xc00e00ce <+94>: pea %a2@(4)
    0xc00e00d2 <+98>: movel %d0,%fp@(-96)
    0xc00e00d6 <+102>: bsrl 0xc00b8850 <__GI_memset>
    0xc00e00dc <+108>: movel %fp@(-88),%a2@
    0xc00e00e0 <+112>: movel %fp@(-80),%a2@(4)
    0xc00e00e6 <+118>: movel %fp@(-72),%a2@(8)
    0xc00e00ec <+124>: movel %fp@(-64),%a2@(12)
    0xc00e00f2 <+130>: movel %fp@(-60),%a2@(16)
    0xc00e00f8 <+136>: movel %fp@(-56),%a2@(20)
    0xc00e00fe <+142>: movel %fp@(-52),%a2@(24)
    0xc00e0104 <+148>: movel %fp@(-48),%a2@(28)
    0xc00e010a <+154>: movel %fp@(-44),%a2@(32)
    0xc00e0110 <+160>: movel %fp@(-40),%a2@(36)
    0xc00e0116 <+166>: movel %fp@(-36),%a2@(40)
    0xc00e011c <+172>: movel %fp@(-32),%a2@(44)
    0xc00e0122 <+178>: movel %fp@(-28),%a2@(48)
    0xc00e0128 <+184>: movel %fp@(-24),%a2@(52)
    0xc00e012e <+190>: movel %fp@(-20),%a2@(56)
    0xc00e0134 <+196>: movel %fp@(-16),%a2@(60)
    0xc00e013a <+202>: movel %fp@(-12),%a2@(64)
    0xc00e0140 <+208>: movel %fp@(-8),%a2@(68)
    0xc00e0146 <+214>: lea %sp@(12),%sp
    0xc00e014a <+218>: movel %fp@(-96),%d0
    0xc00e014e <+222>: braw 0xc00e00b2 <__wait3+66>
    0xc00e0152 <+226>: clrl %sp@-
    0xc00e0154 <+228>: movel %fp@(12),%sp@-
    0xc00e0158 <+232>: movel %d0,%sp@-
    0xc00e015a <+234>: pea 0xffffffff
    0xc00e015e <+238>: bsrl 0xc00e0174 <__GI___wait4_time64>
    0xc00e0164 <+244>: lea %sp@(16),%sp
    0xc00e0168 <+248>: braw 0xc00e00b2 <__wait3+66>
    0xc00e016c <+252>: bsrl 0xc012a38c <__stack_chk_fail>
    End of assembler dump.
    (gdb) disass __GI___wait4_time64
    Dump of assembler code for function __GI___wait4_time64:
    0xc00e0174 <+0>: lea %sp@(-80),%sp
    0xc00e0178 <+4>: moveml %d2-%d5/%a2-%a3/%a5,%sp@-
    0xc00e017c <+8>: lea %pc@(0xc0198000),%a5
    0xc00e0184 <+16>: movel %sp@(116),%d2
    0xc00e0188 <+20>: moveal %sp@(124),%a2
    0xc00e018c <+24>: moveal %a5@(108),%a3
    0xc00e0190 <+28>: movel %a3@,%sp@(104)
    0xc00e0194 <+32>: bsrl 0xc0052e2c <__m68k_read_tp@plt>
    0xc00e019a <+38>: movel %a0@(-29920),%d4
    0xc00e019e <+42>: bnew 0xc00e026c <__GI___wait4_time64+248>
    0xc00e01a2 <+46>: tstl %a2
    0xc00e01a4 <+48>: beqs 0xc00e01aa <__GI___wait4_time64+54>
    0xc00e01a6 <+50>: moveq #32,%d4
    0xc00e01a8 <+52>: addl %sp,%d4
    0xc00e01aa <+54>: movel %sp@(120),%d3
    0xc00e01ae <+58>: movel %sp@(112),%d1
    0xc00e01b2 <+62>: moveq #114,%d0
    0xc00e01b4 <+64>: trap #0
    0xc00e01b6 <+66>: cmpil #-4096,%d0
    0xc00e01bc <+72>: bhiw 0xc00e02a6 <__GI___wait4_time64+306>
    0xc00e01c0 <+76>: tstl %d0
    0xc00e01c2 <+78>: blew 0xc00e0256 <__GI___wait4_time64+226>
    0xc00e01c6 <+82>: tstl %a2
    0xc00e01c8 <+84>: beqw 0xc00e0256 <__GI___wait4_time64+226>
    0xc00e01cc <+88>: movel %sp@(32),%a2@(4)
    0xc00e01d2 <+94>: smi %d1
    0xc00e01d4 <+96>: extbl %d1
    0xc00e01d6 <+98>: movel %d1,%a2@
    0xc00e01d8 <+100>: movel %sp@(36),%a2@(12)
    0xc00e01de <+106>: smi %d1
    0xc00e01e0 <+108>: extbl %d1
    0xc00e01e2 <+110>: movel %d1,%a2@(8)
    0xc00e01e6 <+114>: movel %sp@(40),%a2@(20)
    0xc00e01ec <+120>: smi %d1
    0xc00e01ee <+122>: extbl %d1
    0xc00e01f0 <+124>: movel %d1,%a2@(16)
    0xc00e01f4 <+128>: movel %sp@(44),%a2@(28)
    0xc00e01fa <+134>: smi %d1
    0xc00e01fc <+136>: extbl %d1
    0xc00e01fe <+138>: movel %d1,%a2@(24)
    0xc00e0202 <+142>: movel %sp@(48),%a2@(32)
    0xc00e0208 <+148>: movel %sp@(52),%a2@(36)
    0xc00e020e <+154>: movel %sp@(56),%a2@(40)
    0xc00e0214 <+160>: movel %sp@(60),%a2@(44)
    0xc00e021a <+166>: movel %sp@(64),%a2@(48)
    0xc00e0220 <+172>: movel %sp@(68),%a2@(52)
    0xc00e0226 <+178>: movel %sp@(72),%a2@(56)
    0xc00e022c <+184>: movel %sp@(76),%a2@(60)
    0xc00e0232 <+190>: movel %sp@(80),%a2@(64)
    0xc00e0238 <+196>: movel %sp@(84),%a2@(68)
    0xc00e023e <+202>: movel %sp@(88),%a2@(72)
    0xc00e0244 <+208>: movel %sp@(92),%a2@(76)
    0xc00e024a <+214>: movel %sp@(96),%a2@(80)
    0xc00e0250 <+220>: movel %sp@(100),%a2@(84)
    0xc00e0256 <+226>: moveal %sp@(104),%a0
    0xc00e025a <+230>: movel %a3@,%d1
    0xc00e025c <+232>: cmpl %a0,%d1
    0xc00e025e <+234>: bnew 0xc00e02f2 <__GI___wait4_time64+382>
    0xc00e0262 <+238>: moveml %sp@+,%d2-%d5/%a2-%a3/%a5
    0xc00e0266 <+242>: lea %sp@(80),%sp
    0xc00e026a <+246>: rts
    0xc00e026c <+248>: bsrl 0xc00a1b88 <__GI___pthread_enable_asynccancel>
    0xc00e0272 <+254>: movel %d0,%d5
    0xc00e0274 <+256>: tstl %a2
    0xc00e0276 <+258>: beqs 0xc00e02c4 <__GI___wait4_time64+336>
    0xc00e0278 <+260>: moveq #32,%d4
    0xc00e027a <+262>: addl %sp,%d4
    0xc00e027c <+264>: movel %sp@(120),%d3
    0xc00e0280 <+268>: movel %sp@(112),%d1
    0xc00e0284 <+272>: moveq #114,%d0
    0xc00e0286 <+274>: trap #0
    0xc00e0288 <+276>: cmpil #-4096,%d0
    0xc00e028e <+282>: bhis 0xc00e02c8 <__GI___wait4_time64+340>
    0xc00e0290 <+284>: movel %d5,%sp@-
    0xc00e0292 <+286>: movel %d0,%sp@(32)
    0xc00e0296 <+290>: bsrl 0xc00a1bea <__GI___pthread_disable_asynccancel>
    0xc00e029c <+296>: addql #4,%sp
    0xc00e029e <+298>: movel %sp@(28),%d0
    0xc00e02a2 <+302>: braw 0xc00e01c0 <__GI___wait4_time64+76>
    0xc00e02a6 <+306>: movel %d0,%sp@(28)
    0xc00e02aa <+310>: bsrl 0xc0052e2c <__m68k_read_tp@plt>
    0xc00e02b0 <+316>: addal %a5@(2cf8),%a0
    0xc00e02b8 <+324>: movel %sp@(28),%d0
    0xc00e02bc <+328>: negl %d0
    0xc00e02be <+330>: movel %d0,%a0@
    0xc00e02c0 <+332>: moveq #-1,%d0
    0xc00e02c2 <+334>: bras 0xc00e0256 <__GI___wait4_time64+226>
    0xc00e02c4 <+336>: clrl %d4
    0xc00e02c6 <+338>: bras 0xc00e027c <__GI___wait4_time64+264>
    0xc00e02c8 <+340>: movel %d0,%sp@(28)
    0xc00e02cc <+344>: bsrl 0xc0052e2c <__m68k_read_tp@plt>
    0xc00e02d2 <+350>: addal %a5@(2cf8),%a0
    0xc00e02da <+358>: movel %sp@(28),%d0
    0xc00e02de <+362>: negl %d0
    0xc00e02e0 <+364>: movel %d0,%a0@
    0xc00e02e2 <+366>: movel %d5,%sp@-
    0xc00e02e4 <+368>: bsrl 0xc00a1bea <__GI___pthread_disable_asynccancel>
    0xc00e02ea <+374>: addql #4,%sp
    0xc00e02ec <+376>: moveq #-1,%d0
    0xc00e02ee <+378>: braw 0xc00e0256 <__GI___wait4_time64+226>
    0xc00e02f2 <+382>: bsrl 0xc012a38c <__stack_chk_fail>
    End of assembler dump.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andreas Schwab@21:1/5 to Finn Thain on Sat Apr 1 12:40:01 2023
    On Apr 01 2023, Finn Thain wrote:

    So, in summary, the canary validation failed in this case not because the canary got clobbered but because %a3 got clobbered, somewhere between __wait3+24 and __wait3+70 (below).

    The call to __GI___wait4_time64 causes %a3 to be saved to and restored
    from the stack, so stack corruption seems to be a strong possibility to explain the change in %a3.

    But if that's what happened, I'd expect __GI___wait4_time64 to report
    stack smashing, not __wait3...

    The stask smashing probably didn't fire in __wait4_time64, because it
    hit the saved register area, not the canary (which reside on the
    opposite ends of the stack frame).

    --
    Andreas Schwab, [email protected]
    GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
    "And now for something completely different."

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Sun Apr 2 06:20:01 2023
    Hi Finn,

    nice work!

    Am 01.04.2023 um 22:27 schrieb Finn Thain:
    So, in summary, the canary validation failed in this case not because the canary got clobbered but because %a3 got clobbered, somewhere between __wait3+24 and __wait3+70 (below).

    The call to __GI___wait4_time64 causes %a3 to be saved to and restored
    from the stack, so stack corruption seems to be a strong possibility to explain the change in %a3.

    But if that's what happened, I'd expect __GI___wait4_time64 to report
    stack smashing, not __wait3... And it just begs the question, what then caused the corruption? Was it the wait4 syscall? Was it another thread?

    Saved registers are restored from the stack before return from __GI___wait4_time64 but we don't know which of the two wait4 call sites
    was used, do we?

    What registers does __m68k_read_tp@plt clobber?

    And why is it so rare?

    Maybe an interaction between (multiple?) signals and syscall return...
    depends on how long we sleep in wait4, and whether a signal happens just
    during that time.

    %a3 is the first register saved to the switch stack BTW.

    That kernel does contain Al Viro's patch that corrected our switch stack handling in the signal return path? I wonder whether there's a potential
    race lurking in there?

    And I just notice that we had had trouble with a copy_to_user in
    setup_frame() earlier (reason for my buserr handler patch). I wonder
    whether something's gone wrong there. Do you get a segfault instead of
    the abort signal if you drop my patch?

    Cheers,

    Michael




    (gdb) disass __wait3
    Dump of assembler code for function __wait3:
    0xc00e0070 <+0>: linkw %fp,#-96
    0xc00e0074 <+4>: moveml %a2-%a3/%a5,%sp@-
    0xc00e0078 <+8>: lea %pc@(0xc0198000),%a5
    0xc00e0080 <+16>: movel %fp@(8),%d0
    0xc00e0084 <+20>: moveal %fp@(16),%a2
    0xc00e0088 <+24>: moveal %a5@(108),%a3
    0xc00e008c <+28>: movel %a3@,%fp@(-4)
    0xc00e0090 <+32>: tstl %a2
    0xc00e0092 <+34>: beqw 0xc00e0152 <__wait3+226>
    0xc00e0096 <+38>: pea %fp@(-92)
    0xc00e009a <+42>: movel %fp@(12),%sp@-
    0xc00e009e <+46>: movel %d0,%sp@-
    0xc00e00a0 <+48>: pea 0xffffffff
    0xc00e00a4 <+52>: bsrl 0xc00e0174 <__GI___wait4_time64>
    0xc00e00aa <+58>: lea %sp@(16),%sp
    0xc00e00ae <+62>: tstl %d0
    0xc00e00b0 <+64>: bgts 0xc00e00c8 <__wait3+88>
    0xc00e00b2 <+66>: moveal %fp@(-4),%a0
    0xc00e00b6 <+70>: movel %a3@,%d1
    0xc00e00b8 <+72>: cmpl %a0,%d1
    0xc00e00ba <+74>: bnew 0xc00e016c <__wait3+252>
    0xc00e00be <+78>: moveml %fp@(-108),%a2-%a3/%a5
    0xc00e00c4 <+84>: unlk %fp
    0xc00e00c6 <+86>: rts
    0xc00e00c8 <+88>: pea 0x44
    0xc00e00cc <+92>: clrl %sp@-
    0xc00e00ce <+94>: pea %a2@(4)
    0xc00e00d2 <+98>: movel %d0,%fp@(-96)
    0xc00e00d6 <+102>: bsrl 0xc00b8850 <__GI_memset>
    0xc00e00dc <+108>: movel %fp@(-88),%a2@
    0xc00e00e0 <+112>: movel %fp@(-80),%a2@(4)
    0xc00e00e6 <+118>: movel %fp@(-72),%a2@(8)
    0xc00e00ec <+124>: movel %fp@(-64),%a2@(12)
    0xc00e00f2 <+130>: movel %fp@(-60),%a2@(16)
    0xc00e00f8 <+136>: movel %fp@(-56),%a2@(20)
    0xc00e00fe <+142>: movel %fp@(-52),%a2@(24)
    0xc00e0104 <+148>: movel %fp@(-48),%a2@(28)
    0xc00e010a <+154>: movel %fp@(-44),%a2@(32)
    0xc00e0110 <+160>: movel %fp@(-40),%a2@(36)
    0xc00e0116 <+166>: movel %fp@(-36),%a2@(40)
    0xc00e011c <+172>: movel %fp@(-32),%a2@(44)
    0xc00e0122 <+178>: movel %fp@(-28),%a2@(48)
    0xc00e0128 <+184>: movel %fp@(-24),%a2@(52)
    0xc00e012e <+190>: movel %fp@(-20),%a2@(56)
    0xc00e0134 <+196>: movel %fp@(-16),%a2@(60)
    0xc00e013a <+202>: movel %fp@(-12),%a2@(64)
    0xc00e0140 <+208>: movel %fp@(-8),%a2@(68)
    0xc00e0146 <+214>: lea %sp@(12),%sp
    0xc00e014a <+218>: movel %fp@(-96),%d0
    0xc00e014e <+222>: braw 0xc00e00b2 <__wait3+66>
    0xc00e0152 <+226>: clrl %sp@-
    0xc00e0154 <+228>: movel %fp@(12),%sp@-
    0xc00e0158 <+232>: movel %d0,%sp@-
    0xc00e015a <+234>: pea 0xffffffff
    0xc00e015e <+238>: bsrl 0xc00e0174 <__GI___wait4_time64>
    0xc00e0164 <+244>: lea %sp@(16),%sp
    0xc00e0168 <+248>: braw 0xc00e00b2 <__wait3+66>
    0xc00e016c <+252>: bsrl 0xc012a38c <__stack_chk_fail>
    End of assembler dump.
    (gdb) disass __GI___wait4_time64
    Dump of assembler code for function __GI___wait4_time64:
    0xc00e0174 <+0>: lea %sp@(-80),%sp
    0xc00e0178 <+4>: moveml %d2-%d5/%a2-%a3/%a5,%sp@-
    0xc00e017c <+8>: lea %pc@(0xc0198000),%a5
    0xc00e0184 <+16>: movel %sp@(116),%d2
    0xc00e0188 <+20>: moveal %sp@(124),%a2
    0xc00e018c <+24>: moveal %a5@(108),%a3
    0xc00e0190 <+28>: movel %a3@,%sp@(104)
    0xc00e0194 <+32>: bsrl 0xc0052e2c <__m68k_read_tp@plt>
    0xc00e019a <+38>: movel %a0@(-29920),%d4
    0xc00e019e <+42>: bnew 0xc00e026c <__GI___wait4_time64+248>
    0xc00e01a2 <+46>: tstl %a2
    0xc00e01a4 <+48>: beqs 0xc00e01aa <__GI___wait4_time64+54>
    0xc00e01a6 <+50>: moveq #32,%d4
    0xc00e01a8 <+52>: addl %sp,%d4
    0xc00e01aa <+54>: movel %sp@(120),%d3
    0xc00e01ae <+58>: movel %sp@(112),%d1
    0xc00e01b2 <+62>: moveq #114,%d0
    0xc00e01b4 <+64>: trap #0
    0xc00e01b6 <+66>: cmpil #-4096,%d0
    0xc00e01bc <+72>: bhiw 0xc00e02a6 <__GI___wait4_time64+306>
    0xc00e01c0 <+76>: tstl %d0
    0xc00e01c2 <+78>: blew 0xc00e0256 <__GI___wait4_time64+226>
    0xc00e01c6 <+82>: tstl %a2
    0xc00e01c8 <+84>: beqw 0xc00e0256 <__GI___wait4_time64+226>
    0xc00e01cc <+88>: movel %sp@(32),%a2@(4)
    0xc00e01d2 <+94>: smi %d1
    0xc00e01d4 <+96>: extbl %d1
    0xc00e01d6 <+98>: movel %d1,%a2@
    0xc00e01d8 <+100>: movel %sp@(36),%a2@(12)
    0xc00e01de <+106>: smi %d1
    0xc00e01e0 <+108>: extbl %d1
    0xc00e01e2 <+110>: movel %d1,%a2@(8)
    0xc00e01e6 <+114>: movel %sp@(40),%a2@(20)
    0xc00e01ec <+120>: smi %d1
    0xc00e01ee <+122>: extbl %d1
    0xc00e01f0 <+124>: movel %d1,%a2@(16)
    0xc00e01f4 <+128>: movel %sp@(44),%a2@(28)
    0xc00e01fa <+134>: smi %d1
    0xc00e01fc <+136>: extbl %d1
    0xc00e01fe <+138>: movel %d1,%a2@(24)
    0xc00e0202 <+142>: movel %sp@(48),%a2@(32)
    0xc00e0208 <+148>: movel %sp@(52),%a2@(36)
    0xc00e020e <+154>: movel %sp@(56),%a2@(40)
    0xc00e0214 <+160>: movel %sp@(60),%a2@(44)
    0xc00e021a <+166>: movel %sp@(64),%a2@(48)
    0xc00e0220 <+172>: movel %sp@(68),%a2@(52)
    0xc00e0226 <+178>: movel %sp@(72),%a2@(56)
    0xc00e022c <+184>: movel %sp@(76),%a2@(60)
    0xc00e0232 <+190>: movel %sp@(80),%a2@(64)
    0xc00e0238 <+196>: movel %sp@(84),%a2@(68)
    0xc00e023e <+202>: movel %sp@(88),%a2@(72)
    0xc00e0244 <+208>: movel %sp@(92),%a2@(76)
    0xc00e024a <+214>: movel %sp@(96),%a2@(80)
    0xc00e0250 <+220>: movel %sp@(100),%a2@(84)
    0xc00e0256 <+226>: moveal %sp@(104),%a0
    0xc00e025a <+230>: movel %a3@,%d1
    0xc00e025c <+232>: cmpl %a0,%d1
    0xc00e025e <+234>: bnew 0xc00e02f2 <__GI___wait4_time64+382>
    0xc00e0262 <+238>: moveml %sp@+,%d2-%d5/%a2-%a3/%a5
    0xc00e0266 <+242>: lea %sp@(80),%sp
    0xc00e026a <+246>: rts
    0xc00e026c <+248>: bsrl 0xc00a1b88 <__GI___pthread_enable_asynccancel>
    0xc00e0272 <+254>: movel %d0,%d5
    0xc00e0274 <+256>: tstl %a2
    0xc00e0276 <+258>: beqs 0xc00e02c4 <__GI___wait4_time64+336>
    0xc00e0278 <+260>: moveq #32,%d4
    0xc00e027a <+262>: addl %sp,%d4
    0xc00e027c <+264>: movel %sp@(120),%d3
    0xc00e0280 <+268>: movel %sp@(112),%d1
    0xc00e0284 <+272>: moveq #114,%d0
    0xc00e0286 <+274>: trap #0
    0xc00e0288 <+276>: cmpil #-4096,%d0
    0xc00e028e <+282>: bhis 0xc00e02c8 <__GI___wait4_time64+340>
    0xc00e0290 <+284>: movel %d5,%sp@-
    0xc00e0292 <+286>: movel %d0,%sp@(32)
    0xc00e0296 <+290>: bsrl 0xc00a1bea <__GI___pthread_disable_asynccancel>
    0xc00e029c <+296>: addql #4,%sp
    0xc00e029e <+298>: movel %sp@(28),%d0
    0xc00e02a2 <+302>: braw 0xc00e01c0 <__GI___wait4_time64+76>
    0xc00e02a6 <+306>: movel %d0,%sp@(28)
    0xc00e02aa <+310>: bsrl 0xc0052e2c <__m68k_read_tp@plt>
    0xc00e02b0 <+316>: addal %a5@(2cf8),%a0
    0xc00e02b8 <+324>: movel %sp@(28),%d0
    0xc00e02bc <+328>: negl %d0
    0xc00e02be <+330>: movel %d0,%a0@
    0xc00e02c0 <+332>: moveq #-1,%d0
    0xc00e02c2 <+334>: bras 0xc00e0256 <__GI___wait4_time64+226>
    0xc00e02c4 <+336>: clrl %d4
    0xc00e02c6 <+338>: bras 0xc00e027c <__GI___wait4_time64+264>
    0xc00e02c8 <+340>: movel %d0,%sp@(28)
    0xc00e02cc <+344>: bsrl 0xc0052e2c <__m68k_read_tp@plt>
    0xc00e02d2 <+350>: addal %a5@(2cf8),%a0
    0xc00e02da <+358>: movel %sp@(28),%d0
    0xc00e02de <+362>: negl %d0
    0xc00e02e0 <+364>: movel %d0,%a0@
    0xc00e02e2 <+366>: movel %d5,%sp@-
    0xc00e02e4 <+368>: bsrl 0xc00a1bea <__GI___pthread_disable_asynccancel>
    0xc00e02ea <+374>: addql #4,%sp
    0xc00e02ec <+376>: moveq #-1,%d0
    0xc00e02ee <+378>: braw 0xc00e0256 <__GI___wait4_time64+226>
    0xc00e02f2 <+382>: bsrl 0xc012a38c <__stack_chk_fail>
    End of assembler dump.


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Sun Apr 2 11:50:01 2023
    On Sun, 2 Apr 2023, Michael Schmitz wrote:

    Saved registers are restored from the stack before return from __GI___wait4_time64 but we don't know which of the two wait4 call sites
    was used, do we?

    What registers does __m68k_read_tp@plt clobber?


    But that won't matter to the caller, __wait3, right?

    I did check that %a3 was saved on entry, before any wait4 syscall or __m68k_read_tp call etc. I also looked at the rts and %a3 did get restored there. Is it worth the effort to trace every branch, in case there's some
    way to reach an rts without having first restored the saved registers?


    Maybe an interaction between (multiple?) signals and syscall return...

    When running dash from gdb in QEMU, there's only one signal (SIGCHLD) and
    it gets handled before __wait3() returns. (Of course, the "stack smashing detected" failure never shows up in QEMU.)

    depends on how long we sleep in wait4, and whether a signal happens just during that time.


    I agree, there seems to be a race condition there. (And dash's waitproc()
    seems to take pains to reap the child and handle the signal in any order.)
    I wouldn't be surprised if this race somehow makes the failure rare.

    I don't want to recompile any userland binaries at this stage, so it would
    be nice if we could modify the kernel to keep track of exactly how that
    race gets won and lost. Or perhaps there's an easy way to rig the outcome
    one way or the other.

    %a3 is the first register saved to the switch stack BTW.

    That kernel does contain Al Viro's patch that corrected our switch stack handling in the signal return path? I wonder whether there's a potential
    race lurking in there?


    I'm not sure which patch you're referring to, but I think Al's signal
    handling work appeared in v5.15-rc4. I have reproduced the "stack smashing detected" failure with v5.14.0 and with recent mainline (62bad54b26db from March 30th).

    And I just notice that we had had trouble with a copy_to_user in setup_frame() earlier (reason for my buserr handler patch). I wonder
    whether something's gone wrong there. Do you get a segfault instead of
    the abort signal if you drop my patch?


    Are you referring to e36a82bebbf7? I doubt that it's related. I believe
    that copy_to_user is not involved here for the reason already given i.e. wait3(status, flags, NULL) means wait4 gets a NULL pointer for the struct rusage * parameter. Also, Stan first reported this failure in December
    with v6.0.9.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Andreas Schwab on Sun Apr 2 13:10:01 2023
    On Sat, 1 Apr 2023, Andreas Schwab wrote:

    On Apr 01 2023, Finn Thain wrote:

    So, in summary, the canary validation failed in this case not because
    the canary got clobbered but because %a3 got clobbered, somewhere
    between __wait3+24 and __wait3+70 (below).

    The call to __GI___wait4_time64 causes %a3 to be saved to and restored
    from the stack, so stack corruption seems to be a strong possibility
    to explain the change in %a3.

    But if that's what happened, I'd expect __GI___wait4_time64 to report
    stack smashing, not __wait3...

    The stask smashing probably didn't fire in __wait4_time64, because it
    hit the saved register area, not the canary (which reside on the
    opposite ends of the stack frame).


    OK.

    This is odd:

    https://sources.debian.org/src/dash/0.5.12-2/src/jobs.c/?hl=1165#L1165

    1176 do {
    1177 gotsigchld = 0;
    1178 do
    1179 err = wait3(status, flags, NULL);
    1180 while (err < 0 && errno == EINTR);
    1181
    1182 if (err || (err = -!block))
    1183 break;
    1184
    1185 sigblockall(&oldmask);
    1186
    1187 while (!gotsigchld && !pending_sig)
    1188 sigsuspend(&oldmask);
    1189
    1190 sigclearmask();
    1191 } while (gotsigchld);
    1192
    1193 return err;

    Execution of dash under gdb doesn't seem to agree with the source code
    above.

    If wait3() returns the child pid then the break should execute. And it
    does return the pid (4107) but the while loop was not terminated. Hence
    wait3() was called again and the same breakpoint was hit again. Also, the
    while loop should have ended after the first iteration because gotsigchild should have been set by the signal handler which executed before wait3()
    even returned...

    ...
    (gdb) c
    Continuing.
    #
    #
    # x=$(:)
    [Detaching after fork from child process 4107]

    Program received signal SIGCHLD, Child status changed.
    0xc00e81b6 in __GI___wait4_time64 (pid=-1, stat_loc=0xeffff87a, options=2,
    usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:35
    35 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
    (gdb) c
    Continuing.

    Breakpoint 3, waitproc (status=0xeffff86a, block=1) at jobs.c:1180
    1180 jobs.c: No such file or directory.
    (gdb) info locals
    oldmask = {__val = {1101825, 3844132865, 2072969216, 192511, 4190371840,
    4509697, 3836788738, 1049415681, 3837317121, 3094671359, 4184080384,
    536870943, 717475840, 3485913089, 3836792833, 2072969216, 184321,
    3844141055, 4190425089, 4127248385, 3094659084, 597610497, 4137734145,
    3844079616, 131072, 269156352, 184320, 3878473729, 3844132865, 3094663168,
    3549089793, 3844132865}}
    flags = 2
    err = 4107
    oldmask = <optimized out>
    flags = <optimized out>
    err = <optimized out>
    (gdb) print errno
    $6 = 2
    (gdb) c
    Continuing.

    Breakpoint 3, waitproc (status=0xeffff86a, block=0) at jobs.c:1180
    1180 in jobs.c
    (gdb) info locals
    oldmask = {__val = {1101825, 3844132865, 2072969216, 192511, 4190371840,
    4509697, 3836788738, 1049415681, 3837317121, 3094671359, 4184080384,
    536870943, 717475840, 3485913089, 3836792833, 2072969216, 184321,
    3844141055, 4190425089, 4127248385, 3094659084, 597610497, 4137734145,
    3844079616, 131072, 269156352, 184320, 3878473729, 3844132865, 3094663168,
    3549089793, 3844132865}}
    flags = 3
    err = -1
    oldmask = <optimized out>
    flags = <optimized out>
    err = <optimized out>
    (gdb) print errno
    $7 = 10
    (gdb)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to Finn Thain on Mon Apr 3 00:10:01 2023
    Hi Finn,

    On 2/04/23 22:46, Finn Thain wrote:
    On Sat, 1 Apr 2023, Andreas Schwab wrote:

    On Apr 01 2023, Finn Thain wrote:

    So, in summary, the canary validation failed in this case not because
    the canary got clobbered but because %a3 got clobbered, somewhere
    between __wait3+24 and __wait3+70 (below).

    The call to __GI___wait4_time64 causes %a3 to be saved to and restored
    from the stack, so stack corruption seems to be a strong possibility
    to explain the change in %a3.

    But if that's what happened, I'd expect __GI___wait4_time64 to report
    stack smashing, not __wait3...
    The stask smashing probably didn't fire in __wait4_time64, because it
    hit the saved register area, not the canary (which reside on the
    opposite ends of the stack frame).

    OK.

    This is odd:

    https://sources.debian.org/src/dash/0.5.12-2/src/jobs.c/?hl=1165#L1165

    1176 do {
    1177 gotsigchld = 0;
    1178 do
    1179 err = wait3(status, flags, NULL);
    1180 while (err < 0 && errno == EINTR);
    1181
    1182 if (err || (err = -!block))
    1183 break;
    1184
    1185 sigblockall(&oldmask);
    1186
    1187 while (!gotsigchld && !pending_sig)
    1188 sigsuspend(&oldmask);
    1189
    1190 sigclearmask();
    1191 } while (gotsigchld);
    1192
    1193 return err;

    Execution of dash under gdb doesn't seem to agree with the source code
    above.

    If wait3() returns the child pid then the break should execute. And it
    does return the pid (4107) but the while loop was not terminated. Hence wait3() was called again and the same breakpoint was hit again. Also, the

    I wonder whether line 1182 got miscompiled by gcc. As err == 4107 it's >
    0 and the break clearly ought to have been taken, and the second
    condition (which changes err) does not need to be examined.  Do the same ordering constraints apply to '||' as to '&&' ?

    What does the disassembly of this section look like?

    while loop should have ended after the first iteration because gotsigchild should have been set by the signal handler which executed before wait3()
    even returned...

    Setting gotsigchild > 0 would cause the while loop to continue, no?

    The second wait3 call then returns an error (errno ECHILD), leaves
    gotsigchild clear and exits the loop (returning from waitproc with
    error, which it shouldn't have).


    ...
    (gdb) c
    Continuing.
    #
    #
    # x=$(:)
    [Detaching after fork from child process 4107]

    Program received signal SIGCHLD, Child status changed.
    0xc00e81b6 in __GI___wait4_time64 (pid=-1, stat_loc=0xeffff87a, options=2,
    usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:35
    35 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
    (gdb) c
    Continuing.

    Breakpoint 3, waitproc (status=0xeffff86a, block=1) at jobs.c:1180
    1180 jobs.c: No such file or directory.
    (gdb) info locals
    oldmask = {__val = {1101825, 3844132865, 2072969216, 192511, 4190371840,
    4509697, 3836788738, 1049415681, 3837317121, 3094671359, 4184080384,
    536870943, 717475840, 3485913089, 3836792833, 2072969216, 184321,
    3844141055, 4190425089, 4127248385, 3094659084, 597610497, 4137734145,
    3844079616, 131072, 269156352, 184320, 3878473729, 3844132865, 3094663168,
    3549089793, 3844132865}}
    flags = 2
    err = 4107
    oldmask = <optimized out>
    flags = <optimized out>
    err = <optimized out>
    (gdb) print errno
    $6 = 2

    Is that ENOENT the returned status from the child process?

    Cheers,

        Michael

    (gdb) c
    Continuing.

    Breakpoint 3, waitproc (status=0xeffff86a, block=0) at jobs.c:1180
    1180 in jobs.c
    (gdb) info locals
    oldmask = {__val = {1101825, 3844132865, 2072969216, 192511, 4190371840,
    4509697, 3836788738, 1049415681, 3837317121, 3094671359, 4184080384,
    536870943, 717475840, 3485913089, 3836792833, 2072969216, 184321,
    3844141055, 4190425089, 4127248385, 3094659084, 597610497, 4137734145,
    3844079616, 131072, 269156352, 184320, 3878473729, 3844132865, 3094663168,
    3549089793, 3844132865}}
    flags = 3
    err = -1
    oldmask = <optimized out>
    flags = <optimized out>
    err = <optimized out>
    (gdb) print errno
    $7 = 10
    (gdb)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Mon Apr 3 10:30:01 2023
    Hi Finn,

    Am 02.04.2023 um 21:31 schrieb Finn Thain:
    On Sun, 2 Apr 2023, Michael Schmitz wrote:

    Saved registers are restored from the stack before return from
    __GI___wait4_time64 but we don't know which of the two wait4 call sites
    was used, do we?

    What registers does __m68k_read_tp@plt clobber?


    But that won't matter to the caller, __wait3, right?

    Not if they are properly restored ... best not go there.

    I did check that %a3 was saved on entry, before any wait4 syscall or __m68k_read_tp call etc. I also looked at the rts and %a3 did get restored there. Is it worth the effort to trace every branch, in case there's some
    way to reach an rts without having first restored the saved registers?

    No, I dont think that's possible - from inspection, I now see __GI___wait4_time64 does not allow that, and I think the same is true
    for wait3 (haven't spent quite long enough on that).



    Maybe an interaction between (multiple?) signals and syscall return...

    When running dash from gdb in QEMU, there's only one signal (SIGCHLD) and
    it gets handled before __wait3() returns. (Of course, the "stack smashing detected" failure never shows up in QEMU.)

    Might be a clue that we need multiple signals to force the stack
    smashing error. And we might not get that in QEMU, due to the faster
    execution in emulating on a modern processor.

    Thinking a bit more about interactions between signal delivery and
    syscall return, it turns out that we don't check for pending signals
    when returning from a syscall. That's OK on SMP systems, because we
    don't have another process running while we execute the syscall (and we
    _do_ run signal handling when scheduling, i.e. when wait4 sleeps or is
    woken up)?

    Seems we can forget about that interaction then.


    depends on how long we sleep in wait4, and whether a signal happens just
    during that time.


    I agree, there seems to be a race condition there. (And dash's waitproc() seems to take pains to reap the child and handle the signal in any order.)

    Yes, it makes sure the SIGCHLD is seen no matter in what order the
    signals are delivered ...

    I wouldn't be surprised if this race somehow makes the failure rare.

    I don't want to recompile any userland binaries at this stage, so it would
    be nice if we could modify the kernel to keep track of exactly how that
    race gets won and lost. Or perhaps there's an easy way to rig the outcome
    one way or the other.

    A race between syscall return due to child exit and signal delivery
    seems unlikely, but maybe there is a race between syscall return due to
    a timer firing and signal delivery. Are there any timers set to
    periodically interrupt wait3?


    %a3 is the first register saved to the switch stack BTW.

    That kernel does contain Al Viro's patch that corrected our switch stack
    handling in the signal return path? I wonder whether there's a potential
    race lurking in there?


    I'm not sure which patch you're referring to, but I think Al's signal handling work appeared in v5.15-rc4. I have reproduced the "stack smashing

    I have it in 5.15-rc2 in my tree but that's probably from my running
    tests on that patch series.

    detected" failure with v5.14.0 and with recent mainline (62bad54b26db from March 30th).

    OK, so it's not related (or the patch did not fix all the problems with multiple signals, but a) that's unlikely and b) signals during wait4
    should not matter, see above).

    So the fact that %a3 is involved here is probably just coincidence.


    And I just notice that we had had trouble with a copy_to_user in
    setup_frame() earlier (reason for my buserr handler patch). I wonder
    whether something's gone wrong there. Do you get a segfault instead of
    the abort signal if you drop my patch?


    Are you referring to e36a82bebbf7? I doubt that it's related. I believe
    that copy_to_user is not involved here for the reason already given i.e. wait3(status, flags, NULL) means wait4 gets a NULL pointer for the struct rusage * parameter. Also, Stan first reported this failure in December
    with v6.0.9.

    Can't be related then.

    Still no nearer to a solution - something smashes the stack near %sp,
    causes the %a3 register restore after __GI___wait4_time64 to return a
    wrong pointer to the stack canary, and triggers a stack smashing warning
    in this indirect way. But what??

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Tue Apr 4 02:30:01 2023
    On Mon, 3 Apr 2023, Michael Schmitz wrote:

    On 2/04/23 22:46, Finn Thain wrote:

    This is odd:

    https://sources.debian.org/src/dash/0.5.12-2/src/jobs.c/?hl=1165#L1165

    1176 do {
    1177 gotsigchld = 0;
    1178 do
    1179 err = wait3(status, flags, NULL);
    1180 while (err < 0 && errno == EINTR);
    1181
    1182 if (err || (err = -!block))
    1183 break;
    1184
    1185 sigblockall(&oldmask);
    1186
    1187 while (!gotsigchld && !pending_sig)
    1188 sigsuspend(&oldmask);
    1189
    1190 sigclearmask();
    1191 } while (gotsigchld);
    1192
    1193 return err;

    Execution of dash under gdb doesn't seem to agree with the source code above.

    If wait3() returns the child pid then the break should execute. And it
    does return the pid (4107) but the while loop was not terminated. Hence wait3() was called again and the same breakpoint was hit again. Also, the

    I wonder whether line 1182 got miscompiled by gcc. As err == 4107 it's >
    0 and the break clearly ought to have been taken, and the second
    condition (which changes err) does not need to be examined. Do the same ordering constraints apply to '||' as to '&&' ?


    AFAICT, the source code is valid. This article has some information: https://stackoverflow.com/questions/628526/is-short-circuiting-logical-operators-mandated-and-evaluation-order

    It looks like I messed up. waitproc() appears to have been invoked
    twice, which is why wait3 was invoked twice...

    GNU gdb (Debian 13.1-2) 13.1
    ...
    (gdb) set osabi GNU/Linux
    (gdb) file /bin/dash
    Reading symbols from /bin/dash...
    Reading symbols from /usr/lib/debug/.build-id/aa/4160f84f3eeee809c554cb9f3e1ef0686b8dcc.debug...
    (gdb) b waitproc
    Breakpoint 1 at 0xc346: file jobs.c, line 1168.
    (gdb) b jobs.c:1180
    Breakpoint 2 at 0xc390: file jobs.c, line 1180.
    (gdb) run
    Starting program: /usr/bin/dash
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/m68k-linux-gnu/libthread_db.so.1".
    # x=$(:)
    [Detaching after fork from child process 570]

    Breakpoint 1, waitproc (status=0xeffff86a, block=1) at jobs.c:1168
    1168 jobs.c: No such file or directory.
    (gdb) c
    Continuing.

    Breakpoint 2, waitproc (status=0xeffff86a, block=1) at jobs.c:1180
    1180 in jobs.c
    (gdb) info locals
    oldmask = {__val = {1997799424, 49154, 396623872, 184321, 3223896090, 53249,
    3836788738, 1049411610, 867225601, 3094609920, 0, 1048580, 2857693183,
    4184129547, 3435708442, 863764480, 184321, 3844141055, 4190425089,
    4127248385, 3094659084, 597610497, 4135112705, 3844079616, 131072,
    37355520, 184320, 3878473729, 3844132865, 3094663168, 3549089793,
    3844132865}}
    flags = 2
    err = 570
    oldmask = <optimized out>
    flags = <optimized out>
    err = <optimized out>
    (gdb) c
    Continuing.

    Breakpoint 1, waitproc (status=0xeffff86a, block=0) at jobs.c:1168
    1168 in jobs.c
    (gdb) c
    Continuing.

    Breakpoint 2, waitproc (status=0xeffff86a, block=0) at jobs.c:1180
    1180 in jobs.c
    (gdb) info locals
    oldmask = {__val = {1997799424, 49154, 396623872, 184321, 3223896090, 53249,
    3836788738, 1049411610, 867225601, 3094609920, 0, 1048580, 2857693183,
    4184129547, 3435708442, 863764480, 184321, 3844141055, 4190425089,
    4127248385, 3094659084, 597610497, 4135112705, 3844079616, 131072,
    37355520, 184320, 3878473729, 3844132865, 3094663168, 3549089793,
    3844132865}}
    flags = 3
    err = -1
    oldmask = <optimized out>
    flags = <optimized out>
    err = <optimized out>
    (gdb) c
    Continuing.
    #

    What does the disassembly of this section look like?

    while loop should have ended after the first iteration because gotsigchild should have been set by the signal handler which executed before wait3() even returned...

    Setting gotsigchild > 0 would cause the while loop to continue, no?


    Right. Sorry for the noise.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Tue Apr 4 06:20:01 2023
    On Mon, 3 Apr 2023, Michael Schmitz wrote:

    Am 02.04.2023 um 21:31 schrieb Finn Thain:


    Maybe an interaction between (multiple?) signals and syscall
    return...

    When running dash from gdb in QEMU, there's only one signal (SIGCHLD)
    and it gets handled before __wait3() returns. (Of course, the "stack smashing detected" failure never shows up in QEMU.)

    Might be a clue that we need multiple signals to force the stack
    smashing error. And we might not get that in QEMU, due to the faster execution in emulating on a modern processor.


    Right -- being that the failure is intermittent on real hardware, it's not surprising that I can't make it show up in QEMU or Aranym.

    But no-one has reproduced it on Atari or Amiga hardware yet so I guess it
    could be a driver issue...

    I wonder whether anyone else is actually running recent Debian/SID with sysvinit and without a Debian initrd on a Motorola 68030 system.

    Thinking a bit more about interactions between signal delivery and
    syscall return, it turns out that we don't check for pending signals
    when returning from a syscall. That's OK on SMP systems, because we
    don't have another process running while we execute the syscall (and we
    _do_ run signal handling when scheduling, i.e. when wait4 sleeps or is
    woken up)?

    Seems we can forget about that interaction then.


    depends on how long we sleep in wait4, and whether a signal happens
    just during that time.


    I agree, there seems to be a race condition there. (And dash's
    waitproc() seems to take pains to reap the child and handle the signal
    in any order.)

    Yes, it makes sure the SIGCHLD is seen no matter in what order the
    signals are delivered ...

    I wouldn't be surprised if this race somehow makes the failure rare.

    I don't want to recompile any userland binaries at this stage, so it
    would be nice if we could modify the kernel to keep track of exactly
    how that race gets won and lost. Or perhaps there's an easy way to rig
    the outcome one way or the other.

    A race between syscall return due to child exit and signal delivery
    seems unlikely, but maybe there is a race between syscall return due to
    a timer firing and signal delivery. Are there any timers set to
    periodically interrupt wait3?


    I searched the source code and SIGALRM appears to be unused by dash. And 'timeout' is not a dash builtin. But that doesn't mean we don't get
    multiple signals. One crashy script looks like this:

    TMPFS_SIZE="$(tmpfs_size_vm "$TMPFS_SIZE")"
    RUN_SIZE="$(tmpfs_size_vm "$RUN_SIZE")"
    LOCK_SIZE="$(tmpfs_size_vm "$LOCK_SIZE")"
    SHM_SIZE="$(tmpfs_size_vm "$SHM_SIZE")"
    TMP_SIZE="$(tmpfs_size_vm "$TMP_SIZE")"

    Is it possible that the SIGCHLD from the first sub-shell got delayed?


    Still no nearer to a solution - something smashes the stack near %sp,
    causes the %a3 register restore after __GI___wait4_time64 to return a
    wrong pointer to the stack canary, and triggers a stack smashing warning
    in this indirect way. But what??


    I've no idea.

    The actual corruption might offer a clue here. I believe the saved %a3 was clobbered with the value 0xefee1068 which seems to be a pointer into some
    stack frame that would have come into existence shortly after __GI___wait4_time64 was called. That stack frame is gone by the time the
    core dump was made. Was it dash's signal handler, onsig(), or some libc subroutine called by __GI___wait4_time64(), or was it something that the
    kernel put there?

    Dash's SIGCHLD handler looks safe enough -- I don't see how it could
    corrupt the saved registers in the __GI___wait4_time64 stack frame (it's
    not like 1 was stored in the wrong place). https://sources.debian.org/src/dash/0.5.12-2/src/trap.c/?hl=285#L285

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to All on Tue Apr 4 13:20:02 2023
    On Tue, 4 Apr 2023, I wrote:


    The actual corruption might offer a clue here. I believe the saved %a3
    was clobbered with the value 0xefee1068 which seems to be a pointer into
    some stack frame that would have come into existence shortly after __GI___wait4_time64 was called.

    Wrong... it is a pointer to the location below the __wait3 stack frame.

    (gdb) info frame
    Stack level 8, frame at 0xefee10e0:
    pc = 0xc00e0172 in __wait3 (../sysdeps/unix/sysv/linux/wait3.c:41);
    saved pc = 0xd000c38e
    called by frame at 0xefee11dc, caller of frame at 0xefee106c
    source language c.
    Arglist at 0xefee10d8, args: stat_loc=<optimized out>,
    options=<optimized out>, usage=<optimized out>
    Locals at 0xefee10d8, Previous frame's sp is 0xefee10e0
    Saved registers:
    a2 at 0xefee106c, a3 at 0xefee1070, a5 at 0xefee1074, fp at 0xefee10d8,
    pc at 0xefee10dc

    That shows %a2 was saved at 0xefee106c, and the address of interest is the stack location immediately below that. But it has no particular
    significance: it holds a NULL pointer when the struct __rusage64 *usage argument to __wait4_time64() gets pushed there:

    0xc00e8152 <__wait3+226>: clrl %sp@-
    0xc00e8154 <__wait3+228>: movel %fp@(12),%sp@-
    0xc00e8158 <__wait3+232>: movel %d0,%sp@-
    0xc00e815a <__wait3+234>: pea 0xffffffff
    0xc00e815e <__wait3+238>: bsrl 0xc00e8174 <__GI___wait4_time64>

    But it's no longer a NULL pointer at the time of the crash, though it
    should be, since that stack frame is still active.

    (gdb) x/16z 0xefee1068
    0xefee1068: 0xc00e0172 0xd001e718 0xd001e498 0xd001b874 0xefee1078: 0x00170700 0x00170700 0x00170700 0x00005360 0xefee1088: 0x0000e920 0x00000006 0x00002000 0x00000002 0xefee1098: 0x00171f20 0x00171f20 0x00171f20 0x000000e0

    Beats me.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to Finn Thain on Wed Apr 5 01:30:01 2023
    Hi Finn,

    On 4/04/23 12:13, Finn Thain wrote:
    It looks like I messed up. waitproc() appears to have been invoked
    twice, which is why wait3 was invoked twice...

    GNU gdb (Debian 13.1-2) 13.1
    ...
    (gdb) set osabi GNU/Linux
    (gdb) file /bin/dash
    Reading symbols from /bin/dash...
    Reading symbols from /usr/lib/debug/.build-id/aa/4160f84f3eeee809c554cb9f3e1ef0686b8dcc.debug...
    (gdb) b waitproc
    Breakpoint 1 at 0xc346: file jobs.c, line 1168.
    (gdb) b jobs.c:1180
    Breakpoint 2 at 0xc390: file jobs.c, line 1180.
    (gdb) run
    Starting program: /usr/bin/dash
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/m68k-linux-gnu/libthread_db.so.1".
    # x=$(:)
    [Detaching after fork from child process 570]

    Breakpoint 1, waitproc (status=0xeffff86a, block=1) at jobs.c:1168
    1168 jobs.c: No such file or directory.
    (gdb) c
    Continuing.

    Breakpoint 2, waitproc (status=0xeffff86a, block=1) at jobs.c:1180
    1180 in jobs.c
    (gdb) info locals
    oldmask = {__val = {1997799424, 49154, 396623872, 184321, 3223896090, 53249,
    3836788738, 1049411610, 867225601, 3094609920, 0, 1048580, 2857693183,
    4184129547, 3435708442, 863764480, 184321, 3844141055, 4190425089,
    4127248385, 3094659084, 597610497, 4135112705, 3844079616, 131072,
    37355520, 184320, 3878473729, 3844132865, 3094663168, 3549089793,
    3844132865}}
    flags = 2
    err = 570
    oldmask = <optimized out>
    flags = <optimized out>
    err = <optimized out>
    (gdb) c
    Continuing.

    Breakpoint 1, waitproc (status=0xeffff86a, block=0) at jobs.c:1168
    1168 in jobs.c
    (gdb) c
    Continuing.

    Breakpoint 2, waitproc (status=0xeffff86a, block=0) at jobs.c:1180
    1180 in jobs.c
    (gdb) info locals
    oldmask = {__val = {1997799424, 49154, 396623872, 184321, 3223896090, 53249,
    3836788738, 1049411610, 867225601, 3094609920, 0, 1048580, 2857693183,
    4184129547, 3435708442, 863764480, 184321, 3844141055, 4190425089,
    4127248385, 3094659084, 597610497, 4135112705, 3844079616, 131072,
    37355520, 184320, 3878473729, 3844132865, 3094663168, 3549089793,
    3844132865}}
    flags = 3
    err = -1
    oldmask = <optimized out>
    flags = <optimized out>
    err = <optimized out>
    (gdb) c
    Continuing.
    #

    That means we may well see both signals delivered at the same time if
    the parent shell wasn't scheduled to run until the second subshell
    terminated (answering the question I was about to ask on your other
    mail, the one about the crashy script with multiple subshells).

    Now does waitproc() handle that case correctly? The first signal
    delivered results in err == child PID so the break is taken, causing
    exit from waitproc(). Does waitproc() get called repeatedly until an
    error is returned?

    Cheers,

        Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Wed Apr 5 04:20:01 2023
    On Wed, 5 Apr 2023, Michael Schmitz wrote:

    On 4/04/23 12:13, Finn Thain wrote:
    It looks like I messed up. waitproc() appears to have been invoked
    twice, which is why wait3 was invoked twice...

    GNU gdb (Debian 13.1-2) 13.1
    ...
    (gdb) set osabi GNU/Linux
    (gdb) file /bin/dash
    Reading symbols from /bin/dash...
    Reading symbols from /usr/lib/debug/.build-id/aa/4160f84f3eeee809c554cb9f3e1ef0686b8dcc.debug... (gdb) b waitproc
    Breakpoint 1 at 0xc346: file jobs.c, line 1168.
    (gdb) b jobs.c:1180
    Breakpoint 2 at 0xc390: file jobs.c, line 1180.
    (gdb) run
    Starting program: /usr/bin/dash
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/m68k-linux-gnu/libthread_db.so.1".
    # x=$(:)
    [Detaching after fork from child process 570]

    Breakpoint 1, waitproc (status=0xeffff86a, block=1) at jobs.c:1168
    1168 jobs.c: No such file or directory.
    (gdb) c
    Continuing.

    Breakpoint 2, waitproc (status=0xeffff86a, block=1) at jobs.c:1180
    1180 in jobs.c
    (gdb) info locals
    oldmask = {__val = {1997799424, 49154, 396623872, 184321, 3223896090, 53249,
    3836788738, 1049411610, 867225601, 3094609920, 0, 1048580, 2857693183,
    4184129547, 3435708442, 863764480, 184321, 3844141055, 4190425089,
    4127248385, 3094659084, 597610497, 4135112705, 3844079616, 131072,
    37355520, 184320, 3878473729, 3844132865, 3094663168, 3549089793,
    3844132865}}
    flags = 2
    err = 570
    oldmask = <optimized out>
    flags = <optimized out>
    err = <optimized out>
    (gdb) c
    Continuing.

    Breakpoint 1, waitproc (status=0xeffff86a, block=0) at jobs.c:1168
    1168 in jobs.c
    (gdb) c
    Continuing.

    Breakpoint 2, waitproc (status=0xeffff86a, block=0) at jobs.c:1180
    1180 in jobs.c
    (gdb) info locals
    oldmask = {__val = {1997799424, 49154, 396623872, 184321, 3223896090, 53249,
    3836788738, 1049411610, 867225601, 3094609920, 0, 1048580, 2857693183,
    4184129547, 3435708442, 863764480, 184321, 3844141055, 4190425089,
    4127248385, 3094659084, 597610497, 4135112705, 3844079616, 131072,
    37355520, 184320, 3878473729, 3844132865, 3094663168, 3549089793,
    3844132865}}
    flags = 3
    err = -1
    oldmask = <optimized out>
    flags = <optimized out>
    err = <optimized out>
    (gdb) c
    Continuing.
    #

    That means we may well see both signals delivered at the same time if the parent shell wasn't scheduled to run until the second subshell terminated (answering the question I was about to ask on your other mail, the one about the crashy script with multiple subshells).


    How is that possible? If the parent does not get scheduled, the second
    fork will not take place.

    Now does waitproc() handle that case correctly? The first signal
    delivered results in err == child PID so the break is taken, causing
    exit from waitproc().

    I don't follow. Can you rephrase that perhaps?

    For a single subshell, the SIGCHLD signal can be delivered before wait4 is called or after it returns. For example, $(sleep 5) seems to produce the
    latter whereas $(:) tends to produce the former.

    Does waitproc() get called repeatedly until an error is returned?


    It's complicated... https://sources.debian.org/src/dash/0.5.12-2/src/jobs.c/?hl=1122#L1122

    I don't care that much what dash does as long as it isn't corrupting it's
    own stack, which is a real possibility, and one which gdb's data watch
    point would normally resolve. And yet I have no way to tackle that.

    I've been running gdb under QEMU, where the failure is not reproducible. Running dash under gdb on real hardware is doable (RAM permitting). But
    the failure is intermittent even then -- it only happens during execution
    of certain init scripts, and I can't reproduce it by manually running
    those scripts.

    (Even if I could reproduce the failure under gdb, instrumenting execution
    in gdb can alter timing in undesirable ways...)

    So, again, the best avenue I can think of for such experiments to modify
    the kernel to either keep track of the times of the wait4 syscalls and
    signal delivery and/or push the timing one way or the other e.g. by
    delaying signal delivery, altering scheduler behaviour, etc. But I don't
    have code for that. I did try adding random delays around kernel_wait4()
    but it didn't have any effect...

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Fri Apr 7 04:00:01 2023
    Hi Finn,

    Am 05.04.2023 um 14:00 schrieb Finn Thain:
    On Wed, 5 Apr 2023, Michael Schmitz wrote:

    On 4/04/23 12:13, Finn Thain wrote:
    It looks like I messed up. waitproc() appears to have been invoked
    twice, which is why wait3 was invoked twice...

    GNU gdb (Debian 13.1-2) 13.1
    ...
    (gdb) set osabi GNU/Linux
    (gdb) file /bin/dash
    Reading symbols from /bin/dash...
    Reading symbols from
    /usr/lib/debug/.build-id/aa/4160f84f3eeee809c554cb9f3e1ef0686b8dcc.debug... >>> (gdb) b waitproc
    Breakpoint 1 at 0xc346: file jobs.c, line 1168.
    (gdb) b jobs.c:1180
    Breakpoint 2 at 0xc390: file jobs.c, line 1180.
    (gdb) run
    Starting program: /usr/bin/dash
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/m68k-linux-gnu/libthread_db.so.1". >>> # x=$(:)
    [Detaching after fork from child process 570]

    Breakpoint 1, waitproc (status=0xeffff86a, block=1) at jobs.c:1168
    1168 jobs.c: No such file or directory.
    (gdb) c
    Continuing.

    Breakpoint 2, waitproc (status=0xeffff86a, block=1) at jobs.c:1180
    1180 in jobs.c
    (gdb) info locals
    oldmask = {__val = {1997799424, 49154, 396623872, 184321, 3223896090, 53249,
    3836788738, 1049411610, 867225601, 3094609920, 0, 1048580, 2857693183, >>> 4184129547, 3435708442, 863764480, 184321, 3844141055, 4190425089,
    4127248385, 3094659084, 597610497, 4135112705, 3844079616, 131072,
    37355520, 184320, 3878473729, 3844132865, 3094663168, 3549089793,
    3844132865}}
    flags = 2
    err = 570
    oldmask = <optimized out>
    flags = <optimized out>
    err = <optimized out>
    (gdb) c
    Continuing.

    Breakpoint 1, waitproc (status=0xeffff86a, block=0) at jobs.c:1168
    1168 in jobs.c
    (gdb) c
    Continuing.

    Breakpoint 2, waitproc (status=0xeffff86a, block=0) at jobs.c:1180
    1180 in jobs.c
    (gdb) info locals
    oldmask = {__val = {1997799424, 49154, 396623872, 184321, 3223896090, 53249,
    3836788738, 1049411610, 867225601, 3094609920, 0, 1048580, 2857693183, >>> 4184129547, 3435708442, 863764480, 184321, 3844141055, 4190425089,
    4127248385, 3094659084, 597610497, 4135112705, 3844079616, 131072,
    37355520, 184320, 3878473729, 3844132865, 3094663168, 3549089793,
    3844132865}}
    flags = 3
    err = -1
    oldmask = <optimized out>
    flags = <optimized out>
    err = <optimized out>
    (gdb) c
    Continuing.
    #

    That means we may well see both signals delivered at the same time if the
    parent shell wasn't scheduled to run until the second subshell terminated
    (answering the question I was about to ask on your other mail, the one about >> the crashy script with multiple subshells).


    How is that possible? If the parent does not get scheduled, the second
    fork will not take place.

    I assumed subshells could run asynchronously, and the parent shell
    continue until it hits a statement that needs the result of one of the subshells.

    What is the point of subshells, if not to allow this?


    Now does waitproc() handle that case correctly? The first signal
    delivered results in err == child PID so the break is taken, causing
    exit from waitproc().

    I don't follow. Can you rephrase that perhaps?

    The first subshell exiting causes the wait3() to return, and the return
    code is > 0 (child PID). The break statement executes and waitproc()
    returns that PID.

    I was wondering how multiple child processes exiting would be handled,
    but I now see that repeatedly calling waitproc() until all outstanding
    jobs have completed is the only way. dash implements this method, so it
    must expect multiple jobs / subshells to run concurrently.

    For a single subshell, the SIGCHLD signal can be delivered before wait4 is called or after it returns. For example, $(sleep 5) seems to produce the latter whereas $(:) tends to produce the former.

    I don't think wait4 can return with success before SIGCHLD has been
    delivered. Delivery of SIGCHLD is what makes the wait syscalls unblock
    as far as I understand.

    Does waitproc() get called repeatedly until an error is returned?


    It's complicated...

    Yep, but I'm now satisfied that is the only way ... sorry for the noise.

    https://sources.debian.org/src/dash/0.5.12-2/src/jobs.c/?hl=1122#L1122

    I don't care that much what dash does as long as it isn't corrupting it's
    own stack, which is a real possibility, and one which gdb's data watch
    point would normally resolve. And yet I have no way to tackle that.

    I've been running gdb under QEMU, where the failure is not reproducible. Running dash under gdb on real hardware is doable (RAM permitting). But
    the failure is intermittent even then -- it only happens during execution
    of certain init scripts, and I can't reproduce it by manually running
    those scripts.

    (Even if I could reproduce the failure under gdb, instrumenting execution
    in gdb can alter timing in undesirable ways...)

    So, again, the best avenue I can think of for such experiments to modify
    the kernel to either keep track of the times of the wait4 syscalls and

    The easiest way to do that is to log all wait and signal syscalls, as
    well as process exit. That might alter timing if these log messages go
    to the serial console though. Is that what you have in mind?

    signal delivery and/or push the timing one way or the other e.g. by
    delaying signal delivery, altering scheduler behaviour, etc. But I don't
    have code for that. I did try adding random delays around kernel_wait4()
    but it didn't have any effect...


    I wonder whether it's possible to delay process exit (and parent process signaling) by placing the exit syscall on a timer workqueue. But the
    same effect could be had by inserting a sleep before subshell exit ...

    And causing a half-dead task to schedule in order to delay signaling
    doesn't seem safe to me ...

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Fri Apr 7 06:20:01 2023
    On Fri, 7 Apr 2023, Michael Schmitz wrote:

    Am 05.04.2023 um 14:00 schrieb Finn Thain:
    On Wed, 5 Apr 2023, Michael Schmitz wrote:

    That means we may well see both signals delivered at the same time if
    the parent shell wasn't scheduled to run until the second subshell
    terminated (answering the question I was about to ask on your other
    mail, the one about the crashy script with multiple subshells).


    How is that possible? If the parent does not get scheduled, the second
    fork will not take place.

    I assumed subshells could run asynchronously, and the parent shell
    continue until it hits a statement that needs the result of one of the subshells.


    That would be nice but I don't think dash is so sophisticated as to keep
    track of data dependencies between the various expressions and commands in
    a shell script.

    What is the point of subshells, if not to allow this?


    $ x=$(exit 123)
    $ echo $?
    123

    $ set -e
    $ x=$(false)

    Anyway, my gut says that we're barking up the wrong tree. My recent tests
    show that the failure is not uniformly random. Either the script fails
    often or it fails not at all. It's as if there was some unknown variable
    that caused dash to corrupt its own stack.

    Running dash under gdb on real hardware is doable (RAM permitting).
    But the failure is intermittent even then -- it only happens during execution of certain init scripts, and I can't reproduce it by
    manually running those scripts.

    (Even if I could reproduce the failure under gdb, instrumenting
    execution in gdb can alter timing in undesirable ways...)

    So, again, the best avenue I can think of for such experiments to
    modify the kernel to either keep track of the times of the wait4
    syscalls and

    The easiest way to do that is to log all wait and signal syscalls, as
    well as process exit. That might alter timing if these log messages go
    to the serial console though. Is that what you have in mind?


    What I had in mind was collecting measurements in such way that would not impact timing, perhaps by storing them somewhere they could be retrieved
    from the process core dump.

    But that's probably not realistic and it's probably pointless anyway -- I
    don't expect to find an old bug in common code like kernel/exit.c, or in a
    hot path like those in arch/m68k/kernel/entry.S.

    More likely is that some kind of bug in dash causes it to corrupt its own
    stack when conditions are just right. I just need to figure out how to
    recreate those conditions. :-/

    When dash is feeling crashy, you can get results like this:

    root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~#

    But when it's not feeling crashy, you can't:

    root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    Warning: mountdevsubfs should be called with the 'start' argument.

    The only way I have found to alter dash's inclination to crash is to
    reboot. (I said previously I was unable to reproduce this in a single user
    mode shell but it turned out to be more subtle.)

    signal delivery and/or push the timing one way or the other e.g. by delaying signal delivery, altering scheduler behaviour, etc. But I
    don't have code for that. I did try adding random delays around kernel_wait4() but it didn't have any effect...


    I wonder whether it's possible to delay process exit (and parent process signaling) by placing the exit syscall on a timer workqueue. But the
    same effect could be had by inserting a sleep before subshell exit ...

    And causing a half-dead task to schedule in order to delay signaling
    doesn't seem safe to me ...


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Geert Uytterhoeven@21:1/5 to [email protected] on Fri Apr 7 14:10:01 2023
    Hi Michael,

    On Fri, Apr 7, 2023 at 3:58 AM Michael Schmitz <[email protected]> wrote:
    The easiest way to do that is to log all wait and signal syscalls, as
    well as process exit. That might alter timing if these log messages go
    to the serial console though. Is that what you have in mind?

    Store to RAM, retrieve through a new /proc file?

    Gr{oetje,eeting}s,

    Geert

    --
    Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

    In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that.
    -- Linus Torvalds

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Geert Uytterhoeven@21:1/5 to [email protected] on Fri Apr 7 14:20:02 2023
    Hi Finn,

    On Fri, Apr 7, 2023 at 6:08 AM Finn Thain <[email protected]> wrote:
    When dash is feeling crashy, you can get results like this:

    root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~#

    But when it's not feeling crashy, you can't:

    root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    Warning: mountdevsubfs should be called with the 'start' argument.

    The only way I have found to alter dash's inclination to crash is to
    reboot. (I said previously I was unable to reproduce this in a single user mode shell but it turned out to be more subtle.)

    That sounds like memory corruption somewhere else, e.g.
    in the buffer cache...

    Can you reproduce with CONFIG_DEBUG_SLAB=y?

    Gr{oetje,eeting}s,

    Geert

    --
    Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

    In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that.
    -- Linus Torvalds

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Geert Uytterhoeven on Sat Apr 8 07:50:01 2023
    On Fri, 7 Apr 2023, Geert Uytterhoeven wrote:


    The only way I have found to alter dash's inclination to crash is to reboot. (I said previously I was unable to reproduce this in a single
    user mode shell but it turned out to be more subtle.)

    That sounds like memory corruption somewhere else, e.g. in the buffer cache...


    If so, once the corruption showed up, you would expect the same crash next time...

    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    root@debian:~# echo 3 > /proc/sys/vm/drop_caches
    [ 937.250000] bash (717): drop_caches: 3
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    *** stack smashing detected ***: terminated
    Aborted (core dumped)

    I'd say it's probably not buffer cache corruption causing this because we
    can see two subshells fail, then just one.

    For that build I enabled SLUB_DEBUG but forgot to enable SLUB_DEBUG_ON --

    CONFIG_DEBUG_PAGEALLOC=y
    CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
    CONFIG_SLUB_DEBUG=y
    # CONFIG_SLUB_DEBUG_ON is not set
    CONFIG_PAGE_POISONING=y
    CONFIG_SCHED_STACK_END_CHECK=y

    Below you can see the console log for a build with CONFIG_SLUB_DEBUG_ON=y.

    BTW, these builds have a hacked RNG which produces only zeros, to try to
    get more consistent behaviour. It seems to have made no difference. None
    of the kernel changes I've made seem to make much difference, besides perturbing timing slightly. I tend to think this is a dash bug that's
    confined to slow machines only because it's timing sensitive.

    ABCFGHIJK
    [ 0.000000] Linux version 6.3.0-rc5-mac-00001-gfaf1ca712d79 (fthain@nippy) (m68k-unknown-linux-musl-gcc (Gentoo 12.2.1_p20221224 p7) 12.2.1 20221224, GNU ld (Gentoo 2.39 p5) 2.39.0) #4 Sat Apr 8 14:25:16 AEST 2023
    [ 0.000000] Saving 402 bytes of bootinfo
    [ 0.000000] printk: debug: ignoring loglevel setting.
    [ 0.000000] printk: bootconsole [debug0] enabled
    [ 0.000000] Detected Macintosh model: 27
    [ 0.000000] Penguin bootinfo data:
    [ 0.000000] Video: addr 0x60b00000 row 0x50 depth 1 dimensions 640 x 480
    [ 0.000000] Videological 0xf0300000 phys. 0x60b00000, SCC at 0x50f04000
    [ 0.000000] Boottime: 0x73e GMTBias: 0x0
    [ 0.000000] Machine ID: 27 CPUid: 0x1 memory size: 0x24
    [ 0.000000] Apple Macintosh LC III
    [ 0.000000] Zone ranges:
    [ 0.000000] DMA [mem 0x0000000000000000-0x00000023ffffffff]
    [ 0.000000] Normal empty
    [ 0.000000] Movable zone start for each node
    [ 0.000000] Early memory node ranges
    [ 0.000000] node 0: [mem 0x0000000000000000-0x00000000023fffff]
    [ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x00000000023fffff] [ 0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
    [ 0.000000] pcpu-alloc: [0] 0
    [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 9135
    [ 0.000000] Kernel command line: console=ttyS0 ignore_loglevel ro earlyprintk fbcon=font:ProFont6x11 coherent_pool=4k root=/dev/sda6 single
    [ 0.000000] Unknown kernel command line parameters "single", will be passed to user space.
    [ 0.000000] Dentry cache hash table entries: 8192 (order: 3, 32768 bytes, linear)
    [ 0.000000] Inode-cache hash table entries: 4096 (order: 2, 16384 bytes, linear)
    [ 0.000000] Sorting __ex_table...
    [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
    [ 0.000000] Memory: 31008K/36864K available (4193K kernel code, 283K rwdata, 808K rodata, 132K init, 109K bss, 5856K reserved, 0K cma-reserved)
    [ 0.000000] **********************************************************
    [ 0.000000] ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
    [ 0.000000] ** **
    [ 0.000000] ** This system shows unhashed kernel memory addresses **
    [ 0.000000] ** via the console, logs, and other interfaces. This **
    [ 0.000000] ** might reduce the security of your system. **
    [ 0.000000] ** **
    [ 0.000000] ** If you see this message and you are not debugging **
    [ 0.000000] ** the kernel, report this immediately to your system **
    [ 0.000000] ** administrator! **
    [ 0.000000] ** **
    [ 0.000000] ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
    [ 0.000000] **********************************************************
    [ 0.000000] SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
    [ 0.000000] NR_IRQS: 72
    [ 0.000000] clocksource: via1: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 2439823894983 ns
    [ 0.050000] Console: colour dummy device 80x25
    [ 0.060000] printk: console [ttyS0] enabled
    [ 0.060000] printk: console [ttyS0] enabled
    [ 0.080000] printk: bootconsole [debug0] disabled
    [ 0.080000] printk: bootconsole [debug0] disabled
    [ 0.110000] Calibrating delay loop... 5.83 BogoMIPS (lpj=29184)
    [ 0.270000] pid_max: default: 32768 minimum: 301
    [ 0.320000] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
    [ 0.340000] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
    [ 0.730000] devtmpfs: initialized
    [ 1.060000] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
    [ 1.070000] futex hash table entries: 256 (order: -1, 3072 bytes, linear)
    [ 1.280000] NET: Registered PF_NETLINK/PF_ROUTE protocol family
    [ 1.340000] DMA: preallocated 4 KiB GFP_KERNEL pool for atomic allocations
    [ 1.360000] DMA: preallocated 4 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
    [ 1.940000] NuBus: Scanning NuBus slots.
    [ 2.180000] SCSI subsystem initialized
    [ 2.250000] libata version 3.00 loaded.
    [ 2.510000] clocksource: Switched to clocksource via1
    [ 5.540000] NET: Registered PF_INET protocol family
    [ 5.590000] IP idents hash table entries: 2048 (order: 2, 16384 bytes, linear)
    [ 5.780000] tcp_listen_portaddr_hash hash table entries: 1024 (order: 0, 4096 bytes, linear)
    [ 5.790000] Table-perturb hash table entries: 1024 (order: 0, 4096 bytes, linear)
    [ 5.810000] TCP established hash table entries: 1024 (order: 0, 4096 bytes, linear)
    [ 5.820000] TCP bind hash table entries: 1024 (order: 1, 8192 bytes, linear) [ 5.830000] TCP: Hash tables configured (established 1024 bind 1024)
    [ 5.880000] UDP hash table entries: 256 (order: 0, 4096 bytes, linear)
    [ 5.890000] UDP-Lite hash table entries: 256 (order: 0, 4096 bytes, linear) [ 5.930000] NET: Registered PF_UNIX/PF_LOCAL protocol family
    [ 6.090000] RPC: Registered named UNIX socket transport module.
    [ 6.110000] RPC: Registered udp transport module.
    [ 6.120000] RPC: Registered tcp transport module.
    [ 6.130000] RPC: Registered tcp NFSv4.1 backchannel transport module.
    [ 6.430000] workingset: timestamp_bits=30 max_order=13 bucket_order=0
    [ 6.910000] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
    [ 6.920000] io scheduler mq-deadline registered
    [ 6.930000] macfb: framebuffer at 0x60b00000, mapped to 0xd0000000, size 37k [ 6.940000] macfb: mode is 640x480x1, linelength=80
    [ 8.510000] Console: switching to mono frame buffer device 106x43
    [ 9.670000] fb0: Sonora frame buffer device
    [ 9.750000] scc.0: ttyS0 at MMIO 0x50f04002 (irq = 4, base_baud = 230400) is a Z85c30 ESCC - Serial port
    [ 9.820000] scc.1: ttyS1 at MMIO 0x50f04000 (irq = 4, base_baud = 230400) is a Z85c30 ESCC - Serial port
    [ 9.950000] Non-volatile memory driver v1.3
    [ 9.980000] SWIM floppy driver Version 0.2 (2008-10-30)
    [ 10.670000] brd: module loaded
    [ 10.690000] cuda_input: 01 00 03 7c 25 b7 d6
    [ 10.710000] Macintosh Cuda and Egret driver.
    [ 11.270000] scsi host0: Macintosh NCR5380 SCSI, irq 19, io_port 0x0, base 0x50f10000, can_queue 16, cmd_per_lun 2, sg_tablesize 1, this_id 7, flags { }
    [ 11.420000] scsi 0:0:0:0: Direct-Access codesrc SCSI2SD 4.2 PQ: 0 ANSI: 2
    [ 11.800000] adb device [2]: 2 0x2
    [ 11.830000] adb device [3]: 3 0x1
    [ 11.880000] ADB keyboard at 2 has handler 0x3
    [ 11.900000] Detected ADB keyboard, type ANSI.
    [ 11.960000] input: ADB keyboard as /devices/virtual/input/input0
    [ 12.180000] ADB mouse (standard) at 3 has handler 0x2
    [ 12.240000] input: ADB mouse as /devices/virtual/input/input1
    [ 13.720000] sd 0:0:0:0: [sda] 4194303 512-byte logical blocks: (2.15 GB/2.00 GiB)
    [ 13.750000] sd 0:0:0:0: Attached scsi generic sg0 type 0
    [ 13.800000] mac8390 slot.E: Memory length resource not found, probing
    [ 13.810000] mac8390 slot.E: Ethernet A-Series (type farallon)
    [ 13.820000] mac8390 slot.E: MAC 00:40:10:07:60:c8, IRQ 61, 32 KB shared memory at 0xfeed0000, 32-bit access.
    [ 14.150000] sd 0:0:0:0: [sda] Write Protect is off
    [ 14.160000] sd 0:0:0:0: [sda] Mode Sense: 8d 00 00 08
    [ 14.180000] sd 0:0:0:0: [sda] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
    [ 14.480000] sda: [mac] sda1 sda2 sda3 sda4 sda5 sda6 sda7
    [ 14.980000] sd 0:0:0:0: [sda] Attached SCSI disk
    [ 15.050000] aoe: AoE v85 initialised.
    [ 15.190000] mousedev: PS/2 mouse device common for all mice
    [ 15.320000] rtc-generic rtc-generic: registered as rtc0
    [ 15.410000] NET: Registered PF_PACKET protocol family
    [ 15.440000] printk: bootconsole [debug0] disabled
    [ 21.960000] EXT4-fs (sda6): mounting ext2 file system using the ext4 subsystem
    [ 22.620000] EXT4-fs (sda6): mounted filesystem d9738810-a43e-43d5-8eea-fd50d23f69b4 without journal. Quota mode: disabled.
    [ 22.650000] VFS: Mounted root (ext2 filesystem) readonly on device 8:6.
    [ 22.690000] Freeing unused kernel image (initmem) memory: 132K
    [ 22.700000] This architecture does not have kernel memory protection.
    [ 22.710000] Run /sbin/init as init process
    [ 22.720000] with arguments:
    [ 22.730000] /sbin/init
    [ 22.740000] single
    [ 22.750000] with environment:
    [ 22.760000] HOME=/
    [ 22.770000] TERM=linux
    INIT: version 3.06 booting
    INIT: No inittab.d directory found
    [ 40.210000] random: crng init done
    Setting hostname to 'debian'...done.
    findfs: unable to resolve 'UUID=d9738810-a43e-43d5-8eea-fd50d23f69b4'
    Files under mount point '/run' will be hidden. ... (warning).
    Starting hotplug events dispatcher: systemd-udevd.
    Synthesizing the initial hotplug events (subsystems)...done.
    Synthesizing the initial hotplug events (devices)...done.
    Waiting for /dev to be fully populated...Failed to wait for daemon to reply: Connection timed out
    done (timeout).
    Press Enter for maintenance
    (or press Control-D to continue):
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~#
    exit
    Activating swap:swapon: /dev/sda5: found signature [pagesize=4096, signature=swap]
    swapon: /dev/sda5: pagesize=4096, swapsize=268435456, devsize=268435456
    swapon /dev/sda5
    [ 1768.650000] Adding 262140k swap on /dev/sda5. Priority:-2 extents:1 across:262140k
    .
    [ 1778.640000] EXT4-fs (sda6): re-mounted d9738810-a43e-43d5-8eea-fd50d23f69b4. Quota mode: disabled.
    Will now check root file system:[/sbin/fsck.ext2 (1) -- /] fsck.ext2 -a -C0 /dev/sda6
    /dev/sda6: clean, 24004/425152 files, 234859/424872 blocks
    .
    [ 1797.940000] EXT4-fs (sda6): re-mounted d9738810-a43e-43d5-8eea-fd50d23f69b4. Quota mode: disabled.
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    Will now check all file systems.
    Checking all file systems.
    UUID=d9738810-a43e-43d5-8eea-fd50d23f69b4 is mounted
    Done checking file systems.
    Log is being saved in /var/log/fsck/checkfs if that location is writable. Cleaning up temporary files...Cleaning /tmp...done.
    /tmp.
    Will now mount local filesystems:.
    Will now activate swapfile swap, if any:swapon: /dev/sda5: already active -- ignored
    done.
    Checking minimum space in /tmp...done.
    Cleaning up temporary files....
    Starting Setting kernel variables: sysctl.
    Initializing random number generator...done.
    Configuring network interfaces...done.
    Starting nftables: nonemnl.c:60: Unable to initialize Netlink socket: Protocol not supported
    failed!
    Cleaning up temporary files....
    Press Enter for maintenance
    (or press Control-D to continue):
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# reboot
    INIT: Switching to runlevel: 6
    INIT: No inittab.d directory found
    INIT: Sending processes configured via /etc/inittab the TERM signal
    INIT: Sending processes configured via /etc/inittab the KILL signal
    INIT: Pid 1006 [id ~~] seems to hang
    Stopping Flexible I/O Tester as service: fio is not running.
    Stopping nftables: nonemnl.c:60: Unable to initialize Netlink socket: Protocol not supported
    failed!
    Asking all remaining processes to terminate...done.
    All processes ended within 1 seconds...done.
    Stopping hotplug events dispatcher: systemd-udevd.
    Saving random seed...done.
    Stopping enhanced syslogd: rsyslogd already stopped.
    Saving the system clock to /dev/rtc0.
    Hardware Clock updated to Thu Jan 1 11:13:47 AEST 1970.
    Deconfiguring network interfaces...done.
    Will now deactivate swap:swapoff /dev/sda5
    .
    Mounting root filesystem read-only...[ 2590.610000] EXT4-fs (sda6): re-mounted d9738810-a43e-43d5-8eea-fd50d23f69b4. Quota mode: disabled.
    done.
    Will now restart.
    [ 2597.720000] reboot: Restarting system

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Sun Apr 9 04:00:01 2023
    Hi Geert,

    Am 08.04.2023 um 00:06 schrieb Geert Uytterhoeven:
    Hi Michael,

    On Fri, Apr 7, 2023 at 3:58 AM Michael Schmitz <[email protected]> wrote:
    The easiest way to do that is to log all wait and signal syscalls, as
    well as process exit. That might alter timing if these log messages go
    to the serial console though. Is that what you have in mind?

    Store to RAM, retrieve through a new /proc file?

    Yes, that could be done, though I'd rather avoid duplicating a lot of
    the generic message formatting code (printk and friends).

    I'll have a look around ...

    Cheers,

    Michael


    Gr{oetje,eeting}s,

    Geert


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Sun Apr 9 05:50:01 2023
    Hi Finn,

    Am 07.04.2023 um 16:03 schrieb Finn Thain:
    So, again, the best avenue I can think of for such experiments to
    modify the kernel to either keep track of the times of the wait4
    syscalls and

    The easiest way to do that is to log all wait and signal syscalls, as
    well as process exit. That might alter timing if these log messages go
    to the serial console though. Is that what you have in mind?


    What I had in mind was collecting measurements in such way that would not impact timing, perhaps by storing them somewhere they could be retrieved
    from the process core dump.

    Storing such information to the process core dump would need
    modifications to dash (allocate log buffer and pass the address to the
    kernel) in addition to the kernel. Taking Geert's suggestions to have
    the kernel allocate a log buffer (similar to the kernel message log) and
    read that out through procfs after the tests would be a lot easier.


    But that's probably not realistic and it's probably pointless anyway -- I don't expect to find an old bug in common code like kernel/exit.c, or in a hot path like those in arch/m68k/kernel/entry.S.

    More likely is that some kind of bug in dash causes it to corrupt its own stack when conditions are just right. I just need to figure out how to recreate those conditions. :-/

    When dash is feeling crashy, you can get results like this:

    root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~#

    But when it's not feeling crashy, you can't:

    root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    Warning: mountdevsubfs should be called with the 'start' argument. root@debian:~# sh /etc/init.d/mountdevsubfs.sh
    Warning: mountdevsubfs should be called with the 'start' argument.

    The only way I have found to alter dash's inclination to crash is to
    reboot. (I said previously I was unable to reproduce this in a single user mode shell but it turned out to be more subtle.)

    I wonder what could change from one boot to another - can you have dash
    (and its subshells) dump /proc/self/maps and see whether there's any
    variation in that? But what we really need is the physical mappings. How
    can we find those?

    With the kernel RNG disabled, I would expect neither of these mappings
    to change between boots?

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Sun Apr 9 07:00:01 2023
    Hi Finn,

    Am 09.04.2023 um 16:42 schrieb Finn Thain:
    On Sun, 9 Apr 2023, Michael Schmitz wrote:


    The only way I have found to alter dash's inclination to crash is to
    reboot. (I said previously I was unable to reproduce this in a single
    user mode shell but it turned out to be more subtle.)

    I wonder what could change from one boot to another - can you have dash
    (and its subshells) dump /proc/self/maps and see whether there's any
    variation in that? But what we really need is the physical mappings. How
    can we find those?

    With the kernel RNG disabled, I would expect neither of these mappings
    to change between boots?


    It looks like the stack area still changes across invocations:

    Yep, but running the same commands in the same order across different
    boots, does it still change?

    (I'm making a huge assumption here - that timing of the boot process and
    hence evolution of the kernel RNG is sufficiently deterministic. And
    this might apply only to the shells run from sysvinit, since that does
    require no keyboard input ...)

    Looks like cat < /proc/self/maps | grep stack would give us enough
    information without overwhelming the serial console?

    OTOH - if you can show the error is gone without stack address
    randomization, that would be a hint maybe?

    Cheers,

    Michael

    # sh
    # cat < /proc/self/maps
    c0000000-c0021000 r-xp 00000000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0021000-c0023000 rw-p 00000000 00:00 0
    c0023000-c0024000 r--p 00021000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0024000-c0026000 rw-p 00022000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c002a000-c0199000 r-xp 00000000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c0199000-c019a000 ---p 0016f000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019a000-c019c000 r--p 00170000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019c000-c01a0000 rw-p 00172000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c01a0000-c01aa000 rw-p 00000000 00:00 0
    d0000000-d0019000 r-xp 00000000 08:06 32713 /usr/bin/dash d001b000-d001c000 r--p 00019000 08:06 32713 /usr/bin/dash d001c000-d001d000 rw-p 0001a000 08:06 32713 /usr/bin/dash d001d000-d001f000 rwxp 00000000 00:00 0 [heap]
    d001f000-d0040000 rwxp 00000000 00:00 0 [heap]
    eff9f000-effc0000 rw-p 00000000 00:00 0 [stack]
    # sh
    # cat < /proc/self/maps
    c0000000-c0021000 r-xp 00000000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0021000-c0023000 rw-p 00000000 00:00 0
    c0023000-c0024000 r--p 00021000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0024000-c0026000 rw-p 00022000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c002a000-c0199000 r-xp 00000000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c0199000-c019a000 ---p 0016f000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019a000-c019c000 r--p 00170000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019c000-c01a0000 rw-p 00172000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c01a0000-c01aa000 rw-p 00000000 00:00 0
    d0000000-d0019000 r-xp 00000000 08:06 32713 /usr/bin/dash d001b000-d001c000 r--p 00019000 08:06 32713 /usr/bin/dash d001c000-d001d000 rw-p 0001a000 08:06 32713 /usr/bin/dash d001d000-d001f000 rwxp 00000000 00:00 0 [heap]
    d001f000-d0040000 rwxp 00000000 00:00 0 [heap]
    effd8000-efff9000 rw-p 00000000 00:00 0 [stack]
    # sh
    # cat < /proc/self/maps
    c0000000-c0021000 r-xp 00000000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0021000-c0023000 rw-p 00000000 00:00 0
    c0023000-c0024000 r--p 00021000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0024000-c0026000 rw-p 00022000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c002a000-c0199000 r-xp 00000000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c0199000-c019a000 ---p 0016f000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019a000-c019c000 r--p 00170000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019c000-c01a0000 rw-p 00172000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c01a0000-c01aa000 rw-p 00000000 00:00 0
    d0000000-d0019000 r-xp 00000000 08:06 32713 /usr/bin/dash d001b000-d001c000 r--p 00019000 08:06 32713 /usr/bin/dash d001c000-d001d000 rw-p 0001a000 08:06 32713 /usr/bin/dash d001d000-d001f000 rwxp 00000000 00:00 0 [heap]
    d001f000-d0040000 rwxp 00000000 00:00 0 [heap]
    effdf000-f0000000 rw-p 00000000 00:00 0 [stack]
    #

    That can be disabled easily though (see below). I'll have to modify some
    init scripts to find out what effect it has.

    # setarch -R sh
    # sh
    # cat < /proc/self/maps
    c0000000-c0021000 r-xp 00000000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0021000-c0023000 rw-p 00000000 00:00 0
    c0023000-c0024000 r--p 00021000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0024000-c0026000 rw-p 00022000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c002a000-c0199000 r-xp 00000000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c0199000-c019a000 ---p 0016f000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019a000-c019c000 r--p 00170000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019c000-c01a0000 rw-p 00172000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c01a0000-c01aa000 rw-p 00000000 00:00 0
    d0000000-d0019000 r-xp 00000000 08:06 32713 /usr/bin/dash d001b000-d001c000 r--p 00019000 08:06 32713 /usr/bin/dash d001c000-d001d000 rw-p 0001a000 08:06 32713 /usr/bin/dash d001d000-d001f000 rwxp 00000000 00:00 0 [heap]
    d001f000-d0040000 rwxp 00000000 00:00 0 [heap]
    effdf000-f0000000 rw-p 00000000 00:00 0 [stack]
    # md5sum < /proc/self/maps
    baacbaf944fb01d3200d924da7f7a815 -
    # sh
    # md5sum < /proc/self/maps
    baacbaf944fb01d3200d924da7f7a815 -
    # sh
    # md5sum < /proc/self/maps
    baacbaf944fb01d3200d924da7f7a815 -


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to All on Sun Apr 9 06:20:01 2023
    On Tue, 4 Apr 2023, I wrote:

    On Tue, 4 Apr 2023, I wrote:


    The actual corruption might offer a clue here. I believe the saved %a3
    was clobbered with the value 0xefee1068 which seems to be a pointer into some stack frame that would have come into existence shortly after __GI___wait4_time64 was called.

    Wrong... it is a pointer to the location below the __wait3 stack frame.

    (gdb) info frame
    Stack level 8, frame at 0xefee10e0:
    pc = 0xc00e0172 in __wait3 (../sysdeps/unix/sysv/linux/wait3.c:41);
    saved pc = 0xd000c38e
    called by frame at 0xefee11dc, caller of frame at 0xefee106c
    source language c.
    Arglist at 0xefee10d8, args: stat_loc=<optimized out>,
    options=<optimized out>, usage=<optimized out>
    Locals at 0xefee10d8, Previous frame's sp is 0xefee10e0
    Saved registers:
    a2 at 0xefee106c, a3 at 0xefee1070, a5 at 0xefee1074, fp at 0xefee10d8,
    pc at 0xefee10dc

    That shows %a2 was saved at 0xefee106c, and the address of interest is the stack location immediately below that. But it has no particular
    significance: it holds a NULL pointer when the struct __rusage64 *usage argument to __wait4_time64() gets pushed there:

    0xc00e8152 <__wait3+226>: clrl %sp@-
    0xc00e8154 <__wait3+228>: movel %fp@(12),%sp@-
    0xc00e8158 <__wait3+232>: movel %d0,%sp@-
    0xc00e815a <__wait3+234>: pea 0xffffffff
    0xc00e815e <__wait3+238>: bsrl 0xc00e8174 <__GI___wait4_time64>

    But it's no longer a NULL pointer at the time of the crash, though it
    should be, since that stack frame is still active.

    (gdb) x/16z 0xefee1068
    0xefee1068: 0xc00e0172 0xd001e718 0xd001e498 0xd001b874 0xefee1078: 0x00170700 0x00170700 0x00170700 0x00005360 0xefee1088: 0x0000e920 0x00000006 0x00002000 0x00000002 0xefee1098: 0x00171f20 0x00171f20 0x00171f20 0x000000e0

    Beats me.


    At the time of the crash, the corrupted %a3 was a pointer to location in __wait3's stack. That location was a NULL pointer (the *usage parameter)
    when __GI___wait4_time64 was called but now points to 0xc00e0172, which is
    just after the __wait3 text and just before __GI___wait4_time64 text.

    (gdb) disass __wait3
    Dump of assembler code for function __wait3:
    ...
    0xc00e015e <+238>: bsrl 0xc00e0174 <__GI___wait4_time64>
    0xc00e0164 <+244>: lea %sp@(16),%sp
    0xc00e0168 <+248>: braw 0xc00e00b2 <__wait3+66>
    0xc00e016c <+252>: bsrl 0xc012a38c <__stack_chk_fail>
    End of assembler dump.
    (gdb) disass __GI___wait4_time64
    Dump of assembler code for function __GI___wait4_time64:
    0xc00e0174 <+0>: lea %sp@(-80),%sp
    0xc00e0178 <+4>: moveml %d2-%d5/%a2-%a3/%a5,%sp@-
    0xc00e017c <+8>: lea %pc@(0xc0198000),%a5
    0xc00e0184 <+16>: movel %sp@(116),%d2
    ...

    But I realize now that this stack location gets overwritten with the
    return address for bsrl __stack_chk_fail, so there's nothing wrong there.

    Perhaps its just a coincidence that the saved %a3, once corrupted, ended
    up pointing to the *usage parameter... I don't know what to make of that.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Sun Apr 9 07:00:01 2023
    On Sun, 9 Apr 2023, Michael Schmitz wrote:


    The only way I have found to alter dash's inclination to crash is to reboot. (I said previously I was unable to reproduce this in a single
    user mode shell but it turned out to be more subtle.)

    I wonder what could change from one boot to another - can you have dash
    (and its subshells) dump /proc/self/maps and see whether there's any variation in that? But what we really need is the physical mappings. How
    can we find those?

    With the kernel RNG disabled, I would expect neither of these mappings
    to change between boots?


    It looks like the stack area still changes across invocations:

    # sh
    # cat < /proc/self/maps
    c0000000-c0021000 r-xp 00000000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0021000-c0023000 rw-p 00000000 00:00 0
    c0023000-c0024000 r--p 00021000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0024000-c0026000 rw-p 00022000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c002a000-c0199000 r-xp 00000000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c0199000-c019a000 ---p 0016f000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019a000-c019c000 r--p 00170000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019c000-c01a0000 rw-p 00172000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c01a0000-c01aa000 rw-p 00000000 00:00 0
    d0000000-d0019000 r-xp 00000000 08:06 32713 /usr/bin/dash d001b000-d001c000 r--p 00019000 08:06 32713 /usr/bin/dash d001c000-d001d000 rw-p 0001a000 08:06 32713 /usr/bin/dash d001d000-d001f000 rwxp 00000000 00:00 0 [heap]
    d001f000-d0040000 rwxp 00000000 00:00 0 [heap]
    eff9f000-effc0000 rw-p 00000000 00:00 0 [stack]
    # sh
    # cat < /proc/self/maps
    c0000000-c0021000 r-xp 00000000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0021000-c0023000 rw-p 00000000 00:00 0
    c0023000-c0024000 r--p 00021000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0024000-c0026000 rw-p 00022000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c002a000-c0199000 r-xp 00000000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c0199000-c019a000 ---p 0016f000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019a000-c019c000 r--p 00170000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019c000-c01a0000 rw-p 00172000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c01a0000-c01aa000 rw-p 00000000 00:00 0
    d0000000-d0019000 r-xp 00000000 08:06 32713 /usr/bin/dash d001b000-d001c000 r--p 00019000 08:06 32713 /usr/bin/dash d001c000-d001d000 rw-p 0001a000 08:06 32713 /usr/bin/dash d001d000-d001f000 rwxp 00000000 00:00 0 [heap]
    d001f000-d0040000 rwxp 00000000 00:00 0 [heap]
    effd8000-efff9000 rw-p 00000000 00:00 0 [stack]
    # sh
    # cat < /proc/self/maps
    c0000000-c0021000 r-xp 00000000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0021000-c0023000 rw-p 00000000 00:00 0
    c0023000-c0024000 r--p 00021000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0024000-c0026000 rw-p 00022000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c002a000-c0199000 r-xp 00000000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c0199000-c019a000 ---p 0016f000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019a000-c019c000 r--p 00170000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019c000-c01a0000 rw-p 00172000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c01a0000-c01aa000 rw-p 00000000 00:00 0
    d0000000-d0019000 r-xp 00000000 08:06 32713 /usr/bin/dash d001b000-d001c000 r--p 00019000 08:06 32713 /usr/bin/dash d001c000-d001d000 rw-p 0001a000 08:06 32713 /usr/bin/dash d001d000-d001f000 rwxp 00000000 00:00 0 [heap]
    d001f000-d0040000 rwxp 00000000 00:00 0 [heap]
    effdf000-f0000000 rw-p 00000000 00:00 0 [stack]
    #

    That can be disabled easily though (see below). I'll have to modify some
    init scripts to find out what effect it has.

    # setarch -R sh
    # sh
    # cat < /proc/self/maps
    c0000000-c0021000 r-xp 00000000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0021000-c0023000 rw-p 00000000 00:00 0
    c0023000-c0024000 r--p 00021000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0024000-c0026000 rw-p 00022000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c002a000-c0199000 r-xp 00000000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c0199000-c019a000 ---p 0016f000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019a000-c019c000 r--p 00170000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019c000-c01a0000 rw-p 00172000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c01a0000-c01aa000 rw-p 00000000 00:00 0
    d0000000-d0019000 r-xp 00000000 08:06 32713 /usr/bin/dash d001b000-d001c000 r--p 00019000 08:06 32713 /usr/bin/dash d001c000-d001d000 rw-p 0001a000 08:06 32713 /usr/bin/dash d001d000-d001f000 rwxp 00000000 00:00 0 [heap]
    d001f000-d0040000 rwxp 00000000 00:00 0 [heap]
    effdf000-f0000000 rw-p 00000000 00:00 0 [stack]
    # md5sum < /proc/self/maps
    baacbaf944fb01d3200d924da7f7a815 -
    # sh
    # md5sum < /proc/self/maps
    baacbaf944fb01d3200d924da7f7a815 -
    # sh
    # md5sum < /proc/self/maps
    baacbaf944fb01d3200d924da7f7a815 -

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Sun Apr 9 09:50:02 2023
    On Sun, 9 Apr 2023, Michael Schmitz wrote:

    Am 09.04.2023 um 16:42 schrieb Finn Thain:
    On Sun, 9 Apr 2023, Michael Schmitz wrote:


    The only way I have found to alter dash's inclination to crash is to
    reboot. (I said previously I was unable to reproduce this in a
    single user mode shell but it turned out to be more subtle.)

    I wonder what could change from one boot to another - can you have
    dash (and its subshells) dump /proc/self/maps and see whether there's
    any variation in that? But what we really need is the physical
    mappings. How can we find those?

    With the kernel RNG disabled, I would expect neither of these
    mappings to change between boots?


    It looks like the stack area still changes across invocations:

    Yep, but running the same commands in the same order across different
    boots, does it still change?

    (I'm making a huge assumption here - that timing of the boot process and hence evolution of the kernel RNG is sufficiently deterministic. And
    this might apply only to the shells run from sysvinit, since that does require no keyboard input ...)

    Looks like cat < /proc/self/maps | grep stack would give us enough information without overwhelming the serial console?

    OTOH - if you can show the error is gone without stack address
    randomization, that would be a hint maybe?


    The results below were produced with 'norandmaps' added to the kernel parameters to avoid ASLR.

    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# echo 3 > /proc/sys/vm/drop_caches
    [ 913.560000] bash (1024): drop_caches: 3
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    root@debian:~# sh -c "md5sum < /proc/self/maps" baacbaf944fb01d3200d924da7f7a815 -
    root@debian:~# sh -c "md5sum < /proc/self/maps" baacbaf944fb01d3200d924da7f7a815 -
    root@debian:~# sh -c "md5sum < /proc/self/maps" baacbaf944fb01d3200d924da7f7a815 -
    root@debian:~# sh -c "md5sum < /proc/self/maps" baacbaf944fb01d3200d924da7f7a815 -
    root@debian:~# sh -c "cat < /proc/self/maps"
    c0000000-c0021000 r-xp 00000000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0021000-c0023000 rw-p 00000000 00:00 0
    c0023000-c0024000 r--p 00021000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0024000-c0026000 rw-p 00022000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c002a000-c0199000 r-xp 00000000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c0199000-c019a000 ---p 0016f000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019a000-c019c000 r--p 00170000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019c000-c01a0000 rw-p 00172000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c01a0000-c01aa000 rw-p 00000000 00:00 0
    d0000000-d0019000 r-xp 00000000 08:06 32713 /usr/bin/dash d001b000-d001c000 r--p 00019000 08:06 32713 /usr/bin/dash d001c000-d001d000 rw-p 0001a000 08:06 32713 /usr/bin/dash d001d000-d001f000 rwxp 00000000 00:00 0 [heap]
    d001f000-d0040000 rwxp 00000000 00:00 0 [heap]
    effdf000-f0000000 rw-p 00000000 00:00 0 [stack]


    So I guess this bug has more to do with timing and little to do with
    state, contrary to my guesswork above. And no doubt I will have to
    contradict myself again if/when it turns out that uninitialized memory is
    a factor :-/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Geert Uytterhoeven@21:1/5 to [email protected] on Sun Apr 9 10:40:01 2023
    Hi Finn,

    On Sat, Apr 8, 2023 at 7:26 AM Finn Thain <[email protected]> wrote:
    On Fri, 7 Apr 2023, Geert Uytterhoeven wrote:
    The only way I have found to alter dash's inclination to crash is to reboot. (I said previously I was unable to reproduce this in a single user mode shell but it turned out to be more subtle.)

    That sounds like memory corruption somewhere else, e.g. in the buffer cache...


    If so, once the corruption showed up, you would expect the same crash next time...

    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    root@debian:~# echo 3 > /proc/sys/vm/drop_caches
    [ 937.250000] bash (717): drop_caches: 3
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    *** stack smashing detected ***: terminated
    Aborted (core dumped)

    I'd say it's probably not buffer cache corruption causing this because we
    can see two subshells fail, then just one.

    OK.

    For that build I enabled SLUB_DEBUG but forgot to enable SLUB_DEBUG_ON --

    FWIW, unfortunately not all SL*B_DEBUG are equal. I've seen
    memory corruptions that were identified by SLAB_DEBUG, but not by
    the other allocators.

    Gr{oetje,eeting}s,

    Geert

    --
    Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

    In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that.
    -- Linus Torvalds

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Mon Apr 10 10:20:02 2023
    Hi Finn,

    Am 09.04.2023 um 19:32 schrieb Finn Thain:
    Looks like cat < /proc/self/maps | grep stack would give us enough
    information without overwhelming the serial console?

    OTOH - if you can show the error is gone without stack address
    randomization, that would be a hint maybe?


    The results below were produced with 'norandmaps' added to the kernel parameters to avoid ASLR.

    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# echo 3 > /proc/sys/vm/drop_caches
    [ 913.560000] bash (1024): drop_caches: 3
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    root@debian:~# sh /etc/init.d/mountdevsubfs.sh start
    *** stack smashing detected ***: terminated
    Aborted (core dumped)
    root@debian:~# sh -c "md5sum < /proc/self/maps" baacbaf944fb01d3200d924da7f7a815 -
    root@debian:~# sh -c "md5sum < /proc/self/maps" baacbaf944fb01d3200d924da7f7a815 -
    root@debian:~# sh -c "md5sum < /proc/self/maps" baacbaf944fb01d3200d924da7f7a815 -
    root@debian:~# sh -c "md5sum < /proc/self/maps" baacbaf944fb01d3200d924da7f7a815 -
    root@debian:~# sh -c "cat < /proc/self/maps"
    c0000000-c0021000 r-xp 00000000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0021000-c0023000 rw-p 00000000 00:00 0
    c0023000-c0024000 r--p 00021000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c0024000-c0026000 rw-p 00022000 08:06 38780 /usr/lib/m68k-linux-gnu/ld.so.1
    c002a000-c0199000 r-xp 00000000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c0199000-c019a000 ---p 0016f000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019a000-c019c000 r--p 00170000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c019c000-c01a0000 rw-p 00172000 08:06 38786 /usr/lib/m68k-linux-gnu/libc.so.6
    c01a0000-c01aa000 rw-p 00000000 00:00 0
    d0000000-d0019000 r-xp 00000000 08:06 32713 /usr/bin/dash d001b000-d001c000 r--p 00019000 08:06 32713 /usr/bin/dash d001c000-d001d000 rw-p 0001a000 08:06 32713 /usr/bin/dash d001d000-d001f000 rwxp 00000000 00:00 0 [heap]
    d001f000-d0040000 rwxp 00000000 00:00 0 [heap]
    effdf000-f0000000 rw-p 00000000 00:00 0 [stack]


    So I guess this bug has more to do with timing and little to do with
    state, contrary to my guesswork above. And no doubt I will have to

    What may still vary is physical mapping - I remember you had used some
    tool before to parse proc/<pid>/pagemap to determine the physical
    addresses for task stack areas? Or am I misremembering that from some
    other bug?

    contradict myself again if/when it turns out that uninitialized memory is
    a factor :-/

    I haven't found a config option to initialize memory returned by the
    kernel page allocators, so not sure how to test that ...

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Tue Apr 11 02:40:01 2023
    This message is in MIME format. The first part should be readable text,
    while the remaining parts are likely unreadable without MIME-aware tools.

    On Sun, 9 Apr 2023, Michael Schmitz wrote:

    Am 08.04.2023 um 00:06 schrieb Geert Uytterhoeven:

    On Fri, Apr 7, 2023 at 3:58 AM Michael Schmitz <wrote:
    The easiest way to do that is to log all wait and signal syscalls, as
    well as process exit. That might alter timing if these log messages
    go to the serial console though. Is that what you have in mind?

    Store to RAM, retrieve through a new /proc file?

    Yes, that could be done, though I'd rather avoid duplicating a lot of
    the generic message formatting code (printk and friends).

    I'll have a look around ...


    A better solution might be be to port the existing instrumentation like ftrace, kprobes, uprobes etc. Might be a lot of work though. I wonder
    how portable that stuff is.

    If you use printk, you could probably avoid most of the delays by enabling
    the dummy console. Then the kernel messages would be collected with dmesg, given a sufficiently large CONFIG_LOG_BUF_SHIFT. But it would be
    inconvenient to have no serial console available for the usual purposes.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Tue Apr 11 07:00:01 2023
    Hi Finn,

    Am 11.04.2023 um 12:20 schrieb Finn Thain:
    On Sun, 9 Apr 2023, Michael Schmitz wrote:

    Am 08.04.2023 um 00:06 schrieb Geert Uytterhoeven:

    On Fri, Apr 7, 2023 at 3:58 AM Michael Schmitz <wrote:
    The easiest way to do that is to log all wait and signal syscalls, as
    well as process exit. That might alter timing if these log messages
    go to the serial console though. Is that what you have in mind?

    Store to RAM, retrieve through a new /proc file?

    Yes, that could be done, though I'd rather avoid duplicating a lot of
    the generic message formatting code (printk and friends).

    I'll have a look around ...


    A better solution might be be to port the existing instrumentation like ftrace, kprobes, uprobes etc. Might be a lot of work though. I wonder
    how portable that stuff is.

    If you use printk, you could probably avoid most of the delays by enabling the dummy console. Then the kernel messages would be collected with dmesg, given a sufficiently large CONFIG_LOG_BUF_SHIFT. But it would be
    inconvenient to have no serial console available for the usual purposes.

    Can we disable the serial console after boot, by registering the dummy
    console? Or will that just log messages to both?

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Tue Apr 11 08:10:01 2023
    This message is in MIME format. The first part should be readable text,
    while the remaining parts are likely unreadable without MIME-aware tools.

    On Tue, 11 Apr 2023, Michael Schmitz wrote:

    Am 11.04.2023 um 12:20 schrieb Finn Thain:
    On Sun, 9 Apr 2023, Michael Schmitz wrote:
    Am 08.04.2023 um 00:06 schrieb Geert Uytterhoeven:
    On Fri, Apr 7, 2023 at 3:58 AM Michael Schmitz <wrote:
    The easiest way to do that is to log all wait and signal syscalls,
    as well as process exit. That might alter timing if these log
    messages go to the serial console though. Is that what you have in
    mind?

    Store to RAM, retrieve through a new /proc file?

    Yes, that could be done, though I'd rather avoid duplicating a lot of
    the generic message formatting code (printk and friends).

    I'll have a look around ...


    A better solution might be be to port the existing instrumentation
    like ftrace, kprobes, uprobes etc. Might be a lot of work though. I
    wonder how portable that stuff is.

    If you use printk, you could probably avoid most of the delays by
    enabling the dummy console. Then the kernel messages would be
    collected with dmesg, given a sufficiently large CONFIG_LOG_BUF_SHIFT.
    But it would be inconvenient to have no serial console available for
    the usual purposes.

    Can we disable the serial console after boot, by registering the dummy console? Or will that just log messages to both?


    I don't know. I'd have to experiment in QEMU or Aranym.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Geert Uytterhoeven@21:1/5 to [email protected] on Tue Apr 11 09:20:02 2023
    Hi Michael,

    On Tue, Apr 11, 2023 at 6:56 AM Michael Schmitz <[email protected]> wrote:
    Am 11.04.2023 um 12:20 schrieb Finn Thain:
    On Sun, 9 Apr 2023, Michael Schmitz wrote:
    Am 08.04.2023 um 00:06 schrieb Geert Uytterhoeven:
    On Fri, Apr 7, 2023 at 3:58 AM Michael Schmitz <wrote:
    The easiest way to do that is to log all wait and signal syscalls, as >>>> well as process exit. That might alter timing if these log messages
    go to the serial console though. Is that what you have in mind?

    Store to RAM, retrieve through a new /proc file?

    Yes, that could be done, though I'd rather avoid duplicating a lot of
    the generic message formatting code (printk and friends).

    I'll have a look around ...


    A better solution might be be to port the existing instrumentation like ftrace, kprobes, uprobes etc. Might be a lot of work though. I wonder
    how portable that stuff is.

    If you use printk, you could probably avoid most of the delays by enabling the dummy console. Then the kernel messages would be collected with dmesg, given a sufficiently large CONFIG_LOG_BUF_SHIFT. But it would be inconvenient to have no serial console available for the usual purposes.

    Can we disable the serial console after boot, by registering the dummy console? Or will that just log messages to both?

    You can increase loglevel.

    Gr{oetje,eeting}s,

    Geert


    --
    Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

    In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that.
    -- Linus Torvalds

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Tue Apr 11 10:30:01 2023
    Hi Geert,

    Am 11.04.2023 um 19:19 schrieb Geert Uytterhoeven:
    Hi Michael,

    On Tue, Apr 11, 2023 at 6:56 AM Michael Schmitz <[email protected]> wrote:
    Am 11.04.2023 um 12:20 schrieb Finn Thain:
    On Sun, 9 Apr 2023, Michael Schmitz wrote:
    Am 08.04.2023 um 00:06 schrieb Geert Uytterhoeven:
    On Fri, Apr 7, 2023 at 3:58 AM Michael Schmitz <wrote:
    The easiest way to do that is to log all wait and signal syscalls, as >>>>>> well as process exit. That might alter timing if these log messages >>>>>> go to the serial console though. Is that what you have in mind?

    Store to RAM, retrieve through a new /proc file?

    Yes, that could be done, though I'd rather avoid duplicating a lot of
    the generic message formatting code (printk and friends).

    I'll have a look around ...


    A better solution might be be to port the existing instrumentation like
    ftrace, kprobes, uprobes etc. Might be a lot of work though. I wonder
    how portable that stuff is.

    Looking at a few arch implementations, I'm utterly confused. Wouldn't
    know where to start.


    If you use printk, you could probably avoid most of the delays by enabling >>> the dummy console. Then the kernel messages would be collected with dmesg, >>> given a sufficiently large CONFIG_LOG_BUF_SHIFT. But it would be
    inconvenient to have no serial console available for the usual purposes.

    Can we disable the serial console after boot, by registering the dummy
    console? Or will that just log messages to both?

    You can increase loglevel.

    Yes, we could do that. I thought /proc has to be mounted for the
    loglevel to be changed, but I might be wrong.

    OTOH, if we log wait and signal actions at level debug, and set loglevel
    to info or notice through the command line, we won't miss much of the
    usual boot messages on the serial console ...

    Cheers,

    Michael



    Gr{oetje,eeting}s,

    Geert


    --
    Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

    In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that.
    -- Linus Torvalds


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eero Tamminen@21:1/5 to Michael Schmitz on Thu Apr 13 01:40:01 2023
    Hi Michael,

    On 11.4.2023 11.24, Michael Schmitz wrote:
    Am 11.04.2023 um 19:19 schrieb Geert Uytterhoeven:
    On Tue, Apr 11, 2023 at 6:56 AM Michael Schmitz <[email protected]>
    Am 11.04.2023 um 12:20 schrieb Finn Thain:
    A better solution might be be to port the existing instrumentation like >>>> ftrace, kprobes, uprobes etc. Might be a lot of work though. I wonder
    how portable that stuff is.

    Another possibility is just using manually added kernel tracepoints: https://www.kernel.org/doc/html/latest/trace/tracepoints.html

    "git grep TRACE_EVENT" did not given any hits for arch/m68k/ though.

    (I don't think those need any particular architecture support. One just
    adds them to locations where it makes sense to trace activity that is important, but which does not happen too often for the very small
    overhead from disabled static trace point to be significant.)


    Looking at a few arch implementations, I'm utterly confused. Wouldn't
    know where to start.

    Good article: https://lwn.net/Articles/132196/

    (Ignore jprobes stuff as that's deprecated.)


    Because probe replaces instruction at the probe point with a breakpoint: https://www.kernel.org/doc/html/latest/trace/kprobes.html#how-does-a-kprobe-work

    It needs architecture specific code to decode and simulate the replaced instruction (its side-effects) when it is later executed "out-of-line".

    This also causes architecture specific limitations on where the probe
    can be set. Architecture may not support simulating all possible
    instructions, just the most common ones (e.g. ones typically used on
    function entry and exit points).

    Besides that, AFAIK plain kprobes support should not have much
    architecture specific stuff.


    (I have no experience with kernel code, I just remember some LWN
    articles about kprobes during past 2 decades, and I was involved with a
    tool that did something similar in user-space, before Linux got uprobes support.)



    As to what architecture to use as an example, I think it would be best
    to ask tracing maintainers for advice, as there seems to be a lot of
    code which has been copied from each other at different points in time,
    that might be nowadays better as shared rather than under each arch...

    These architectures support kprobes:
    ------------------------------------
    $ grep " ok " Documentation/features/debug/kprobes/arch-support.txt
    | arc: | ok |
    | arm: | ok |
    | arm64: | ok |
    | csky: | ok |
    | ia64: | ok |
    | mips: | ok |
    | parisc: | ok |
    | powerpc: | ok |
    | riscv: | ok |
    | s390: | ok |
    | sh: | ok |
    | sparc: | ok |
    | x86: | ok |
    ------------------------------------

    But only these seem to have probing functionality isolated nicely to its
    own directory:
    ------------------------------------
    $ ls arch/*/kernel/probes/

    arch/arm64/kernel/probes/:
    decode-insn.c decode-insn.h kprobes.c kprobes_trampoline.S Makefile simulate-insn.c simulate-insn.h uprobes.c

    arch/csky/kernel/probes/:
    decode-insn.c decode-insn.h ftrace.c kprobes.c kprobes_trampoline.S Makefile simulate-insn.c simulate-insn.h uprobes.c

    arch/riscv/kernel/probes/:
    decode-insn.c decode-insn.h ftrace.c kprobes.c Makefile rethook.c rethook.h rethook_trampoline.S simulate-insn.c simulate-insn.h uprobes.c ------------------------------------


    As to testing whether kprobes work, that could be easier with ftrace
    support also present: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/ftrace/README


    - Eero

    PS. Looking at the kernel features list, m68k arch seems to be missing a
    lot of other features too: https://www.kernel.org/doc/html/latest/m68k/features.html

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to All on Fri Apr 14 11:50:01 2023
    On Wed, 5 Apr 2023, I wrote:


    I don't care that much what dash does as long as it isn't corrupting
    it's own stack, which is a real possibility, and one which gdb's data
    watch point would normally resolve. And yet I have no way to tackle
    that.

    I've been running gdb under QEMU, where the failure is not reproducible. Running dash under gdb on real hardware is doable (RAM permitting). But
    the failure is intermittent even then -- it only happens during
    execution of certain init scripts, and I can't reproduce it by manually running those scripts.

    (Even if I could reproduce the failure under gdb, instrumenting
    execution in gdb can alter timing in undesirable ways...)


    Somewhat optimistically, I upgraded the RAM on this system to 36 MB so I
    can run dash under gdb (20 MB was not enough). But, as expected, the crash
    went away when I did so.

    Outside of gdb, I was able to reproduce the same failure with a clean
    build from the dash repo (commit b00288f). I can get a crash with
    optimization level -O1 and -O though it becomes even more rare. So it's
    easier to use Debian's build (-O2).

    One of the difficulties with the core dump is that it happens too late.
    After the canary check fails, __stack_chk_fail() is called, which then
    calls a bunch of other stuff until finally abort() is called. This
    obliterates whatever was below the stack pointer at the time of the
    failure.

    So I modified libc.so.6 and now it just crashes with an illegal
    instruction in __wait3 rather than branching to __stack_chk_fail. This let
    me see whatever was left behind in stack memory by __wait4_time64() etc.

    __wait4_time64() calls __m68k_read_tp(), and the return address from __m68k_read_tp() can still be seen in stack memory, which suggests that
    the stack never grew after that call. (So __m68k_read_tp() is implicated.)

    Would signal delivery erase any of the memory immediately below the USP?
    If so, it would erase those old stack frames, which would give some
    indication of the timing of signal delivery.

    If I run dash under gdb under QEMU, I can break on entry to onsig() and
    find the signal frame on the stack. But when I examine stack memory from
    the core dump, I can't find 0x70774e40 (i.e. moveq __NR_sigreturn,%d0 ;
    trap #0) which the kernel puts on the stack in my QEMU experiments.

    That suggests that no signal was delivered... and yet gotsigchld == 1 at
    the time of the coredump, after having been initialized by waitproc()
    prior to calling __wait3(). So the signal handler onsig() must have
    executed during __wait3() or __wait4_time64(). I can't explain this.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eero Tamminen@21:1/5 to Eero Tamminen on Sat Apr 15 18:00:01 2023
    Hi,

    On 13.4.2023 2.22, Eero Tamminen wrote:
    On 11.4.2023 11.24, Michael Schmitz wrote:
    Am 11.04.2023 um 19:19 schrieb Geert Uytterhoeven:
    On Tue, Apr 11, 2023 at 6:56 AM Michael Schmitz <[email protected]> >>>> Am 11.04.2023 um 12:20 schrieb Finn Thain:
    A better solution might be be to port the existing instrumentation
    like
    ftrace, kprobes, uprobes etc. Might be a lot of work though. I wonder >>>>> how portable that stuff is.

    Another possibility is just using manually added kernel tracepoints: https://www.kernel.org/doc/html/latest/trace/tracepoints.html

    "git grep TRACE_EVENT" did not given any hits for arch/m68k/ though.

    (I don't think those need any particular architecture support. One just
    adds them to locations where it makes sense to trace activity that is important, but which does not happen too often for the very small
    overhead from disabled static trace point to be significant.)


    Looking at a few arch implementations, I'm utterly confused. Wouldn't
    know where to start.

    Good article: https://lwn.net/Articles/132196/

    (Ignore jprobes stuff as that's deprecated.)


    Because probe replaces instruction at the probe point with a breakpoint: https://www.kernel.org/doc/html/latest/trace/kprobes.html#how-does-a-kprobe-work

    It needs architecture specific code to decode and simulate the replaced instruction (its side-effects) when it is later executed "out-of-line".

    This also causes architecture specific limitations on where the probe
    can be set.  Architecture may not support simulating all possible instructions, just the most common ones (e.g. ones typically used on
    function entry and exit points).

    These are the kernel symbol entry point instructions that get executed
    when booting Atari Falcon from IDE (using emulator LILO) to Busybox init:
    4524 link.w
    4 movea.l
    3 move.l
    3 clr.l
    2 movem.l
    2 dc.w
    2 btst.b
    1 subq.l
    1 ori.w
    1 move.w
    1 lea.l
    1 jsr
    1 bra.b

    For Linus' 6.2 kernel built with this config: https://git.tuxfamily.org/hatari/hatari.git/tree/tools/linux/kernel.config

    Using "gcc-m68k-linux-gnu" (4:10.2.1-1) in Debian Stable.


    Count is for how many of the executed kernel symbol addresses had given instruction.

    I'm a bit surprised how many functions (symbol addresses) start with
    link.w instruction, instead of e.g. move* instruction.

    (dc.w is check for emulator NatFeats API.)


    - Eero

    PS. Above info is produced by building & booting Linux under Hatari
    emulator as documented here:
    https://hatari.tuxfamily.org/doc/m68k-linux.txt

    And using:
    -------------------
    hatari ... --parse sym-breakpoint.ini > instr.txt
    -------------------

    Where "sym-breakpoint.ini" is:
    -------------------
    b PConSymbol > 0 :trace :quiet :file pc-disasm.ini
    -------------------

    And "pc-disasm.ini" file is:
    -------------------
    d pc-"pc+2"
    -------------------

    Then to get counts from the output:
    -------------------
    $ cat instr.txt | grep -E -e '^\$' | sort -u |\
    cut -b46-52 | sort | uniq -c | sort -nr
    -------------------

    (Match disassembly lines, drop duplicate function calls, count uniq
    entry point instructions.)


    Besides that, AFAIK plain kprobes support should not have much
    architecture specific stuff.


    (I have no experience with kernel code, I just remember some LWN
    articles about kprobes during past 2 decades, and I was involved with a
    tool that did something similar in user-space, before Linux got uprobes support.)



    As to what architecture to use as an example, I think it would be best
    to ask tracing maintainers for advice, as there seems to be a lot of
    code which has been copied from each other at different points in time,
    that might be nowadays better as shared rather than under each arch...

    These architectures support kprobes:
    ------------------------------------
    $ grep " ok " Documentation/features/debug/kprobes/arch-support.txt
        |         arc: |  ok  |
        |         arm: |  ok  |
        |       arm64: |  ok  |
        |        csky: |  ok  |
        |        ia64: |  ok  |
        |        mips: |  ok  |
        |      parisc: |  ok  |
        |     powerpc: |  ok  |
        |       riscv: |  ok  |
        |        s390: |  ok  |
        |          sh: |  ok  |
        |       sparc: |  ok  |
        |         x86: |  ok  | ------------------------------------

    But only these seem to have probing functionality isolated nicely to its
    own directory:
    ------------------------------------
    $ ls arch/*/kernel/probes/

    arch/arm64/kernel/probes/:
    decode-insn.c  decode-insn.h  kprobes.c  kprobes_trampoline.S  Makefile simulate-insn.c  simulate-insn.h  uprobes.c

    arch/csky/kernel/probes/:
    decode-insn.c  decode-insn.h  ftrace.c  kprobes.c  kprobes_trampoline.S Makefile  simulate-insn.c  simulate-insn.h  uprobes.c

    arch/riscv/kernel/probes/:
    decode-insn.c  decode-insn.h  ftrace.c  kprobes.c  Makefile  rethook.c rethook.h  rethook_trampoline.S  simulate-insn.c  simulate-insn.h uprobes.c
    ------------------------------------


    As to testing whether kprobes work, that could be easier with ftrace
    support also present: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/ftrace/README


        - Eero

    PS. Looking at the kernel features list, m68k arch seems to be missing a
    lot of other features too: https://www.kernel.org/doc/html/latest/m68k/features.html


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Brad Boyer@21:1/5 to Eero Tamminen on Sun Apr 16 00:40:01 2023
    On Sat, Apr 15, 2023 at 06:50:08PM +0300, Eero Tamminen wrote:
    I'm a bit surprised how many functions (symbol addresses) start with link.w instruction, instead of e.g. move* instruction.

    That seems pretty normal to me. The LINK instruction sets up a stack
    frame for a function. It takes an address register and size (the .w
    suffix indicates a 16-bit size, while .l would indicate a 32-bit value
    of size). The old value of the register is pushed to the stack, the
    current stack pointer is saved into the given register, then the stack
    is increased by the given size. There is a corresponding UNLK
    instruction to tear down the stack frame which will frequently be the
    last instruction before returning to the caller. The compiler will use
    these in the function prologue and epilogue in all but the most
    trivial functions.

    Brad Boyer
    [email protected]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Sun Apr 16 04:00:01 2023
    Hi Finn,

    Am 14.04.2023 um 21:30 schrieb Finn Thain:
    Would signal delivery erase any of the memory immediately below the USP?
    If so, it would erase those old stack frames, which would give some indication of the timing of signal delivery.

    The signal stack is set up immediately below USP, from my reading of signal.c:setup_frame(). Old stack frames will be overwritten.


    If I run dash under gdb under QEMU, I can break on entry to onsig() and
    find the signal frame on the stack. But when I examine stack memory from
    the core dump, I can't find 0x70774e40 (i.e. moveq __NR_sigreturn,%d0 ;
    trap #0) which the kernel puts on the stack in my QEMU experiments.

    As I understand this, the call to sys_sigreturn() removes both this code (signal trampoline IIRC) and the signal stack. At that point, the kernel
    signal handler (which in turn calls onsig()) has already been run.

    Again as far as I understand, the core dump happens on process exit.
    Stack smashing is detected and process exit is forced only at exit from __wait3() or __wait4_time64(), and those would have only been woken by receiving a signal. Both signal delivery and removal of signal stack by sys_sigreturn() must have happened before return from the wait syscalls.

    Cheers,

    Michael


    That suggests that no signal was delivered... and yet gotsigchld == 1 at
    the time of the coredump, after having been initialized by waitproc()
    prior to calling __wait3(). So the signal handler onsig() must have
    executed during __wait3() or __wait4_time64(). I can't explain this.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Sun Apr 16 09:00:01 2023
    On Sun, 16 Apr 2023, Michael Schmitz wrote:

    Am 14.04.2023 um 21:30 schrieb Finn Thain:
    Would signal delivery erase any of the memory immediately below the
    USP? If so, it would erase those old stack frames, which would give
    some indication of the timing of signal delivery.

    The signal stack is set up immediately below USP, from my reading of signal.c:setup_frame(). Old stack frames will be overwritten.


    OK.


    If I run dash under gdb under QEMU, I can break on entry to onsig()
    and find the signal frame on the stack. But when I examine stack
    memory from the core dump, I can't find 0x70774e40 (i.e. moveq __NR_sigreturn,%d0 ; trap #0) which the kernel puts on the stack in my
    QEMU experiments.

    As I understand this, the call to sys_sigreturn() removes both this code (signal trampoline IIRC) and the signal stack...

    I don't see that stuff getting removed when I run dash under gdb under
    QEMU. With breakpoints at the head of onsig() and the tail of __wait3(),
    the memory under USP is the same when examined at either juncture.

    The backtrace confirms that this signal was delivered during execution of __wait3(). (Delivery can happen during execution of __libc_fork() but I
    just repeat the test until I get these ducks in a row.)

    (gdb) c
    Continuing.
    # x=$(:)
    [Detaching after fork from child process 1055]

    Breakpoint 6.1, onsig (signo=17) at trap.c:286
    286 trap.c: No such file or directory.
    (gdb) bt
    #0 onsig (signo=17) at trap.c:286
    #1 <signal handler called>
    #2 0xc00e81b6 in __GI___wait4_time64 (pid=-1, stat_loc=0xeffff86a, options=2,
    usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:35
    #3 0xc00e8164 in __GI___wait3_time64 (usage=0x0, options=<optimized out>,
    stat_loc=<optimized out>) at ../sysdeps/unix/sysv/linux/wait3.c:26
    #4 __wait3 (stat_loc=<optimized out>, options=<optimized out>,
    usage=<optimized out>) at ../sysdeps/unix/sysv/linux/wait3.c:35
    #5 0xd000c38e in waitproc (status=0xeffff85a, block=1) at jobs.c:1179
    #6 waitone (block=1, job=0xd001f618) at jobs.c:1055
    #7 0xd000c5b8 in dowait (block=1, jp=0xd001f618) at jobs.c:1137
    #8 0xd000ddb0 in waitforjob (jp=0xd001f618) at jobs.c:1014
    #9 0xd000aade in expbackq (flag=68, cmd=0xd001e4c8 <stackbase+36>)
    at expand.c:520
    #10 argstr (p=<optimized out>, flag=68) at expand.c:335
    #11 0xd000b5ce in expandarg (arg=0xd001e4e8 <stackbase+68>,
    arglist=0xeffffb08, flag=4) at expand.c:192
    #12 0xd0007e2a in evalcommand (cmd=<optimized out>, flags=<optimized out>)
    at eval.c:855
    #13 0xd0006ffc in evaltree (n=0xd001e4f8 <stackbase+84>, flags=0) at eval.c:300 #14 0xd000e3c0 in cmdloop (top=1) at main.c:246
    #15 0xd0005018 in main (argc=<optimized out>, argv=<optimized out>)
    at main.c:181


    0xeffff750: 0xc01a0000 saved $a5 == libc .got 0xeffff74c: 0xc0023e8c saved $a3 == &__stack_chk_guard 0xeffff748: 0x00000000 saved $a2
    0xeffff744: 0x00000001 saved $d5
    0xeffff740: 0xeffff86e saved $d4
    0xeffff73c: 0xeffff86a saved $d3
    0xeffff738: 0x00000002 saved $d2
    0xeffff734: 0x00000000
    0xeffff730: 0x00000000
    0xeffff72c: 0x00000000
    0xeffff728: 0x00000000
    0xeffff724: 0x00000000
    0xeffff720: 0x00000000
    0xeffff71c: 0x00000000
    0xeffff718: 0x00000000
    0xeffff714: 0x00000000
    0xeffff710: 0x00000000
    0xeffff70c: 0x00000000
    0xeffff708: 0x00000000
    0xeffff704: 0x00000000
    0xeffff700: 0x00000000
    0xeffff6fc: 0x00000000
    0xeffff6f8: 0x00000000
    0xeffff6f4: 0x00000000
    0xeffff6f0: 0x00000000
    0xeffff6ec: 0x00000000
    0xeffff6e8: 0x00000000
    0xeffff6e4: 0x00000000
    0xeffff6e0: 0x00000000
    0xeffff6dc: 0x00000000
    0xeffff6d8: 0x00000000
    0xeffff6d4: 0x00000000
    0xeffff6d0: 0x00000000
    0xeffff6cc: 0x00000000
    0xeffff6c8: 0x00000000
    0xeffff6c4: 0x00000000
    0xeffff6c0: 0x00000000
    0xeffff6bc: 0x00000000
    0xeffff6b8: 0x00000000
    0xeffff6b4: 0x00000000
    0xeffff6b0: 0x00000000
    0xeffff6ac: 0x00000000
    0xeffff6a8: 0x00000000
    0xeffff6a4: 0x00000000
    0xeffff6a0: 0x00000000
    0xeffff69c: 0x00000000
    0xeffff698: 0x00000000
    0xeffff694: 0x00000000
    0xeffff690: 0x00000000
    0xeffff68c: 0x00000000
    0xeffff688: 0x00000000
    0xeffff684: 0x00000000
    0xeffff680: 0x00000000
    0xeffff67c: 0x00000000
    0xeffff678: 0x00000000
    0xeffff674: 0x00000000
    0xeffff670: 0x00000000
    0xeffff66c: 0x00000000
    0xeffff668: 0x00000000
    0xeffff664: 0x00000000
    0xeffff660: 0x41000000
    0xeffff65c: 0x00000000
    0xeffff658: 0x00000000
    0xeffff654: 0x00000000
    0xeffff650: 0x00000000
    0xeffff64c: 0x80000000
    0xeffff648: 0x3fff0000
    0xeffff644: 0x00000000
    0xeffff640: 0xd0000000
    0xeffff63c: 0x40020000
    0xeffff638: 0x81b60080
    0xeffff634: 0x0000c00e
    0xeffff630: 0xd001e4e3 saved a1?
    0xeffff62c: 0xc0028780 saved a0?
    0xeffff628: 0xffffffff saved d1?
    0xeffff624: 0x0000041f saved d0?
    0xeffff620: 0xeffff738 saved sp?
    0xeffff61c: 0x00000000
    0xeffff618: 0x00000000
    0xeffff614: 0x00000000
    0xeffff610: 0x70774e40 moveq #119,%d0 ; trap #0
    0xeffff60c: 0xeffff61c
    0xeffff608: 0x00000080
    0xeffff604: 0x00000011
    0xeffff600: 0xeffff610 return address

    The above comes from dash running under gdb under qemu, which does not
    exhibit the failure but is convenient for that kind of experiment.

    Again as far as I understand, the core dump happens on process exit.
    Stack smashing is detected and process exit is forced only at exit from __wait3() or __wait4_time64(),

    I placed an illegal instruction in __wait3. This executes instead of the
    call to __stack_chk_fail because that obliterates stack memory of
    interest.

    Consequently the latest core dump still contains dead stack frames (see
    below) of subroutines that returned before __wait3() dumped core. You can
    see the return address for the branch to __wait4_time64() and below that
    you can see the return address for the branch to __m68k_read_tp().

    (gdb) disas __wait4_time64
    Dump of assembler code for function __GI___wait4_time64:
    0xc00e4174 <+0>: lea %sp@(-80),%sp
    0xc00e4178 <+4>: moveml %d2-%d5/%a2-%a3/%a5,%sp@-
    0xc00e417c <+8>: lea %pc@(0xc019c000),%a5
    0xc00e4184 <+16>: movel %sp@(116),%d2
    0xc00e4188 <+20>: moveal %sp@(124),%a2
    0xc00e418c <+24>: moveal %a5@(108),%a3
    0xc00e4190 <+28>: movel %a3@,%sp@(104)
    0xc00e4194 <+32>: bsrl 0xc0056e2c <__m68k_read_tp@plt>

    I gather the signal was delivered before __wait4_time64+38, otherwise the return address 0xc00e419a (which appears below) would have been
    overwritten by the signal frame. The signal must have been delivered after waitproc() initialized gotsigchld = 0 since gotsigchld is 1 at the time of
    the coredump.

    I assume the %a3 corruption happened after __wait4_time64+8 because that's
    when %a3 first appears on the stack. And the corruption must have happened before __wait4_time64+238, which is when %a3 was restored.

    If it was the signal which somehow corrupted the saved %a3, there's only a small window for that. The only syscall in that window is get_thread_area.

    Here's some stack memory from the core dump.

    0xeffff0dc: 0xd000c38e return address waitproc+124
    0xeffff0d8: 0xd001c1ec frame 0 $fp == &suppressint 0xeffff0d4: 0x00add14b canary
    0xeffff0d0: 0x00000000
    0xeffff0cc: 0x0000000a
    0xeffff0c8: 0x00000202
    0xeffff0c4: 0x00000008
    0xeffff0c0: 0x00000000
    0xeffff0bc: 0x00000000
    0xeffff0b8: 0x00000174
    0xeffff0b4: 0x00000004
    0xeffff0b0: 0x00000004
    0xeffff0ac: 0x00000006
    0xeffff0a8: 0x000000e0
    0xeffff0a4: 0x000000e0
    0xeffff0a0: 0x00171f20
    0xeffff09c: 0x00171f20
    0xeffff098: 0x00171f20
    0xeffff094: 0x00000002
    0xeffff090: 0x00002000
    0xeffff08c: 0x00000006
    0xeffff088: 0x0000e920
    0xeffff084: 0x00005360
    0xeffff080: 0x00170700
    0xeffff07c: 0x00170700
    0xeffff078: 0x00170700 frame 0 $fp - 96
    0xeffff074: 0xd001b874 saved $a5 == dash .got 0xeffff070: 0xd001e498 saved $a3 == &dash_errno 0xeffff06c: 0xd001e718 frame 0 $sp saved $a2 == &gotsigchld 0xeffff068: 0x00000000
    0xeffff064: 0x00000000
    0xeffff060: 0xeffff11e
    0xeffff05c: 0xffffffff
    0xeffff058: 0xc00e4164 return address __wait3+244
    0xeffff054: 0x00add14b canary
    0xeffff050: 0x00000001
    0xeffff04c: 0x00000004
    0xeffff048: 0x0000000d
    0xeffff044: 0x0000000d
    0xeffff040: 0x0015ef82
    0xeffff03c: 0x0015ef82
    0xeffff038: 0x0015ef82
    0xeffff034: 0x00000003
    0xeffff030: 0x00000004
    0xeffff02c: 0x00000004
    0xeffff028: 0x00000140
    0xeffff024: 0x00000140
    0xeffff020: 0x00000034
    0xeffff01c: 0x00000034
    0xeffff018: 0x00000034
    0xeffff014: 0x00000006
    0xeffff010: 0x003b003a
    0xeffff00c: 0x000a0028
    0xeffff008: 0x00340020
    0xeffff004: 0xc019c000 saved $a5 == libc .got 0xeffff000: 0xeffff068 saved $a3 (corrupted) 0xefffeffc: 0x00000000 saved $a2
    0xefffeff8: 0x00000001 saved $d5
    0xefffeff4: 0xeffff122 saved $d4
    0xefffeff0: 0xeffff11e saved $d3
    0xefffefec: 0x00000000 saved $d2
    0xefffefe8: 0xc00e419a return address __GI___wait4_time64+38 0xefffefe4: 0xc0028780
    0xefffefe0: 0x3c344bfb
    0xefffefdc: 0x000af353
    0xefffefd8: 0x3c340170
    0xefffefd4: 0x00000000
    0xefffefd0: 0xc00e417c
    0xefffefcc: 0xc00e417e
    0xefffefc8: 0xc00e4180
    0xefffefc4: 0x48e73c34
    0xefffefc0: 0x00000000
    0xefffefbc: 0xefffeff8
    0xefffefb8: 0xefffeffc
    0xefffefb4: 0x4bfb0170
    0xefffefb0: 0x0eee0709
    0xefffefac: 0x00000000
    0xefffefa8: 0x00000000
    0xefffefa4: 0x00000000
    0xefffefa0: 0x00000000
    0xefffef9c: 0x00000000
    0xefffef98: 0x00000000
    0xefffef94: 0x00000000
    0xefffef90: 0x00000000
    0xefffef8c: 0x00000000
    0xefffef88: 0x00000000
    0xefffef84: 0x00000000
    0xefffef80: 0x00000000
    0xefffef7c: 0x00000000
    0xefffef78: 0x00000000
    0xefffef74: 0x00000000
    0xefffef70: 0x00000000
    0xefffef6c: 0x00000000
    0xefffef68: 0x00000000
    0xefffef64: 0x00000000
    0xefffef60: 0x00000000
    0xefffef5c: 0x00000000
    0xefffef58: 0x00000000
    0xefffef54: 0x00000000
    0xefffef50: 0x00000000
    0xefffef4c: 0x00000000
    0xefffef48: 0x00000000
    0xefffef44: 0x00000000
    0xefffef40: 0x00000000
    0xefffef3c: 0x00000000
    0xefffef38: 0x00000000
    0xefffef34: 0x00000000
    0xefffef30: 0x00000000
    0xefffef2c: 0x00000000
    0xefffef28: 0x00000000
    0xefffef24: 0x00000000
    0xefffef20: 0x00000000
    0xefffef1c: 0x00000000
    0xefffef18: 0x00000000
    0xefffef14: 0x00000000
    0xefffef10: 0x7c0effff
    0xefffef0c: 0xffffffff
    0xefffef08: 0xaaaaaaaa
    0xefffef04: 0xaf54eaaa
    0xefffef00: 0x40040000
    0xefffeefc: 0x40040000
    0xefffeef8: 0x2b000000
    0xefffeef4: 0x00000000
    0xefffeef0: 0x00000000
    0xefffeeec: 0x408ece9a
    0xefffeee8: 0x00000000
    0xefffeee4: 0xf0ff0000
    0xefffeee0: 0x0f800000
    0xefffeedc: 0xf0fff0ff
    0xefffeed8: 0x1f380000
    0xefffeed4: 0x00000000
    0xefffeed0: 0x00000000
    0xefffeecc: 0x00000000
    0xefffeec8: 0xffffffff
    0xefffeec4: 0xffffffff
    0xefffeec0: 0x7fff0000
    0xefffeebc: 0xffffffff
    0xefffeeb8: 0xffffffff
    0xefffeeb4: 0x7fff0000

    The signal frame is not readily apparent (to me).

    Also, stack memory in the core dump and stack memory as observed upon
    signal handler breakpoint do not agree very well... I'd expect the values immediately below the stack pointer to be zeros once the signal frame was
    put there.

    I can't explain all of that unless it's discontiguous stack corruption.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Tue Apr 18 01:40:01 2023
    Hi Finn,

    Am 16.04.2023 um 18:44 schrieb Finn Thain:
    As I understand this, the call to sys_sigreturn() removes both this code
    (signal trampoline IIRC) and the signal stack...

    I don't see that stuff getting removed when I run dash under gdb under
    QEMU. With breakpoints at the head of onsig() and the tail of __wait3(),
    the memory under USP is the same when examined at either juncture.

    OK - with usp restored to what it was before the signal stack was added,
    the stack below original usp can be left as is (will be overwritten eventually).


    The backtrace confirms that this signal was delivered during execution of __wait3(). (Delivery can happen during execution of __libc_fork() but I
    just repeat the test until I get these ducks in a row.)

    (gdb) c
    Continuing.
    # x=$(:)
    [Detaching after fork from child process 1055]

    Breakpoint 6.1, onsig (signo=17) at trap.c:286
    286 trap.c: No such file or directory.
    (gdb) bt
    #0 onsig (signo=17) at trap.c:286
    #1 <signal handler called>
    #2 0xc00e81b6 in __GI___wait4_time64 (pid=-1, stat_loc=0xeffff86a, options=2,
    usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:35
    #3 0xc00e8164 in __GI___wait3_time64 (usage=0x0, options=<optimized out>,
    stat_loc=<optimized out>) at ../sysdeps/unix/sysv/linux/wait3.c:26

    Where did that one come from? I don't think we saw __GI___wait3_time64
    called in your disassembly of __wait3 ...

    #4 __wait3 (stat_loc=<optimized out>, options=<optimized out>,
    usage=<optimized out>) at ../sysdeps/unix/sysv/linux/wait3.c:35
    #5 0xd000c38e in waitproc (status=0xeffff85a, block=1) at jobs.c:1179
    #6 waitone (block=1, job=0xd001f618) at jobs.c:1055
    #7 0xd000c5b8 in dowait (block=1, jp=0xd001f618) at jobs.c:1137
    #8 0xd000ddb0 in waitforjob (jp=0xd001f618) at jobs.c:1014
    #9 0xd000aade in expbackq (flag=68, cmd=0xd001e4c8 <stackbase+36>)
    at expand.c:520
    #10 argstr (p=<optimized out>, flag=68) at expand.c:335
    #11 0xd000b5ce in expandarg (arg=0xd001e4e8 <stackbase+68>,
    arglist=0xeffffb08, flag=4) at expand.c:192
    #12 0xd0007e2a in evalcommand (cmd=<optimized out>, flags=<optimized out>)
    at eval.c:855
    #13 0xd0006ffc in evaltree (n=0xd001e4f8 <stackbase+84>, flags=0) at eval.c:300
    #14 0xd000e3c0 in cmdloop (top=1) at main.c:246
    #15 0xd0005018 in main (argc=<optimized out>, argv=<optimized out>)
    at main.c:181


    0xeffff750: 0xc01a0000 saved $a5 == libc .got 0xeffff74c: 0xc0023e8c saved $a3 == &__stack_chk_guard
    0xeffff748: 0x00000000 saved $a2
    0xeffff744: 0x00000001 saved $d5
    0xeffff740: 0xeffff86e saved $d4
    0xeffff73c: 0xeffff86a saved $d3
    0xeffff738: 0x00000002 saved $d2
    0xeffff734: 0x00000000
    0xeffff730: 0x00000000
    0xeffff72c: 0x00000000
    0xeffff728: 0x00000000
    0xeffff724: 0x00000000
    0xeffff720: 0x00000000
    0xeffff71c: 0x00000000
    0xeffff718: 0x00000000
    0xeffff714: 0x00000000
    0xeffff710: 0x00000000
    0xeffff70c: 0x00000000
    0xeffff708: 0x00000000
    0xeffff704: 0x00000000
    0xeffff700: 0x00000000
    0xeffff6fc: 0x00000000
    0xeffff6f8: 0x00000000
    0xeffff6f4: 0x00000000
    0xeffff6f0: 0x00000000
    0xeffff6ec: 0x00000000
    0xeffff6e8: 0x00000000
    0xeffff6e4: 0x00000000
    0xeffff6e0: 0x00000000
    0xeffff6dc: 0x00000000
    0xeffff6d8: 0x00000000
    0xeffff6d4: 0x00000000
    0xeffff6d0: 0x00000000
    0xeffff6cc: 0x00000000
    0xeffff6c8: 0x00000000
    0xeffff6c4: 0x00000000
    0xeffff6c0: 0x00000000
    0xeffff6bc: 0x00000000
    0xeffff6b8: 0x00000000
    0xeffff6b4: 0x00000000
    0xeffff6b0: 0x00000000
    0xeffff6ac: 0x00000000
    0xeffff6a8: 0x00000000
    0xeffff6a4: 0x00000000
    0xeffff6a0: 0x00000000
    0xeffff69c: 0x00000000
    0xeffff698: 0x00000000
    0xeffff694: 0x00000000
    0xeffff690: 0x00000000
    0xeffff68c: 0x00000000
    0xeffff688: 0x00000000
    0xeffff684: 0x00000000
    0xeffff680: 0x00000000
    0xeffff67c: 0x00000000
    0xeffff678: 0x00000000
    0xeffff674: 0x00000000
    0xeffff670: 0x00000000
    0xeffff66c: 0x00000000
    0xeffff668: 0x00000000
    0xeffff664: 0x00000000
    0xeffff660: 0x41000000
    0xeffff65c: 0x00000000
    0xeffff658: 0x00000000
    0xeffff654: 0x00000000
    0xeffff650: 0x00000000
    0xeffff64c: 0x80000000
    0xeffff648: 0x3fff0000
    0xeffff644: 0x00000000
    0xeffff640: 0xd0000000
    0xeffff63c: 0x40020000 <= (sc.formatvec& 0xffff) << 16; fpregs from here on
    0xeffff638: 0x81b60080 <= (sc.pc & 0xffff) << 16 | sc.formatvec >> 16
    0xeffff634: 0x0000c00e <= sc.sr << 16 sc.pc >> 16
    0xeffff630: 0xd001e4e3 saved a1? <= sc.a1
    0xeffff62c: 0xc0028780 saved a0? <= sc.a0
    0xeffff628: 0xffffffff saved d1? <= sc.d1
    0xeffff624: 0x0000041f saved d0? <= sc.d0
    0xeffff620: 0xeffff738 saved sp? <= sc.usp
    0xeffff61c: 0x00000000 <= sc.mask
    0xeffff618: 0x00000000 <= extramask 0xeffff614: 0x00000000 <= frame.retcode[1] 0xeffff610: 0x70774e40 moveq #119,%d0 ; trap #0
    0xeffff60c: 0xeffff61c <= frame->sc 0xeffff608: 0x00000080 <= tregs->vector 0xeffff604: 0x00000011 <= signal no. 0xeffff600: 0xeffff610 return address

    The above comes from dash running under gdb under qemu, which does not exhibit the failure but is convenient for that kind of experiment.

    I would have expected to see a different signal trampoline (for sys_rt_sigreturn) ... But anyway:

    The saved pc is 0xc00e81b6 which does match the backtrace above. Vector
    offset 80 matches trap 0 which suggests 0xc00e81b6 should be the
    instruction after a trap 0 instruction. d0 is 1055 which is not a signal
    number I recognize.


    Again as far as I understand, the core dump happens on process exit.
    Stack smashing is detected and process exit is forced only at exit from
    __wait3() or __wait4_time64(),

    I placed an illegal instruction in __wait3. This executes instead of the
    call to __stack_chk_fail because that obliterates stack memory of
    interest.

    OK.


    Consequently the latest core dump still contains dead stack frames (see below) of subroutines that returned before __wait3() dumped core. You can
    see the return address for the branch to __wait4_time64() and below that
    you can see the return address for the branch to __m68k_read_tp().

    (gdb) disas __wait4_time64
    Dump of assembler code for function __GI___wait4_time64:
    0xc00e4174 <+0>: lea %sp@(-80),%sp
    0xc00e4178 <+4>: moveml %d2-%d5/%a2-%a3/%a5,%sp@-
    0xc00e417c <+8>: lea %pc@(0xc019c000),%a5
    0xc00e4184 <+16>: movel %sp@(116),%d2
    0xc00e4188 <+20>: moveal %sp@(124),%a2
    0xc00e418c <+24>: moveal %a5@(108),%a3
    0xc00e4190 <+28>: movel %a3@,%sp@(104)
    0xc00e4194 <+32>: bsrl 0xc0056e2c <__m68k_read_tp@plt>

    I gather the signal was delivered before __wait4_time64+38, otherwise the return address 0xc00e419a (which appears below) would have been
    overwritten by the signal frame. The signal must have been delivered after waitproc() initialized gotsigchld = 0 since gotsigchld is 1 at the time of the coredump.

    I assume the %a3 corruption happened after __wait4_time64+8 because that's when %a3 first appears on the stack. And the corruption must have happened before __wait4_time64+238, which is when %a3 was restored.

    If it was the signal which somehow corrupted the saved %a3, there's only a small window for that. The only syscall in that window is get_thread_area.

    I see sys_wait4 called in two places (0xc00e01b4, and then 0xc00e0286
    depending on the return code of the first). The second one again would
    have called __m68k_read_tp so would have left a return address on the
    stack (0xc00e02d2). Leaves the first.


    Here's some stack memory from the core dump.

    0xeffff0dc: 0xd000c38e return address waitproc+124
    0xeffff0d8: 0xd001c1ec frame 0 $fp == &suppressint 0xeffff0d4: 0x00add14b canary
    0xeffff0d0: 0x00000000
    0xeffff0cc: 0x0000000a
    0xeffff0c8: 0x00000202
    0xeffff0c4: 0x00000008
    0xeffff0c0: 0x00000000
    0xeffff0bc: 0x00000000
    0xeffff0b8: 0x00000174
    0xeffff0b4: 0x00000004
    0xeffff0b0: 0x00000004
    0xeffff0ac: 0x00000006
    0xeffff0a8: 0x000000e0
    0xeffff0a4: 0x000000e0
    0xeffff0a0: 0x00171f20
    0xeffff09c: 0x00171f20
    0xeffff098: 0x00171f20
    0xeffff094: 0x00000002
    0xeffff090: 0x00002000
    0xeffff08c: 0x00000006
    0xeffff088: 0x0000e920
    0xeffff084: 0x00005360
    0xeffff080: 0x00170700
    0xeffff07c: 0x00170700
    0xeffff078: 0x00170700 frame 0 $fp - 96
    0xeffff074: 0xd001b874 saved $a5 == dash .got 0xeffff070: 0xd001e498 saved $a3 == &dash_errno 0xeffff06c: 0xd001e718 frame 0 $sp saved $a2 == &gotsigchld 0xeffff068: 0x00000000
    0xeffff064: 0x00000000
    0xeffff060: 0xeffff11e
    0xeffff05c: 0xffffffff
    0xeffff058: 0xc00e4164 return address __wait3+244
    0xeffff054: 0x00add14b canary
    0xeffff050: 0x00000001
    0xeffff04c: 0x00000004
    0xeffff048: 0x0000000d
    0xeffff044: 0x0000000d
    0xeffff040: 0x0015ef82
    0xeffff03c: 0x0015ef82
    0xeffff038: 0x0015ef82
    0xeffff034: 0x00000003
    0xeffff030: 0x00000004
    0xeffff02c: 0x00000004
    0xeffff028: 0x00000140
    0xeffff024: 0x00000140
    0xeffff020: 0x00000034
    0xeffff01c: 0x00000034
    0xeffff018: 0x00000034
    0xeffff014: 0x00000006
    0xeffff010: 0x003b003a
    0xeffff00c: 0x000a0028
    0xeffff008: 0x00340020
    0xeffff004: 0xc019c000 saved $a5 == libc .got 0xeffff000: 0xeffff068 saved $a3 (corrupted) 0xefffeffc: 0x00000000 saved $a2
    0xefffeff8: 0x00000001 saved $d5
    0xefffeff4: 0xeffff122 saved $d4
    0xefffeff0: 0xeffff11e saved $d3
    0xefffefec: 0x00000000 saved $d2
    0xefffefe8: 0xc00e419a return address __GI___wait4_time64+38 0xefffefe4: 0xc0028780
    0xefffefe0: 0x3c344bfb
    0xefffefdc: 0x000af353
    0xefffefd8: 0x3c340170
    0xefffefd4: 0x00000000
    0xefffefd0: 0xc00e417c
    0xefffefcc: 0xc00e417e
    0xefffefc8: 0xc00e4180
    0xefffefc4: 0x48e73c34
    0xefffefc0: 0x00000000
    0xefffefbc: 0xefffeff8
    0xefffefb8: 0xefffeffc
    0xefffefb4: 0x4bfb0170
    0xefffefb0: 0x0eee0709
    0xefffefac: 0x00000000
    0xefffefa8: 0x00000000
    0xefffefa4: 0x00000000
    0xefffefa0: 0x00000000
    0xefffef9c: 0x00000000
    0xefffef98: 0x00000000
    0xefffef94: 0x00000000
    0xefffef90: 0x00000000
    0xefffef8c: 0x00000000
    0xefffef88: 0x00000000
    0xefffef84: 0x00000000
    0xefffef80: 0x00000000
    0xefffef7c: 0x00000000
    0xefffef78: 0x00000000
    0xefffef74: 0x00000000
    0xefffef70: 0x00000000
    0xefffef6c: 0x00000000
    0xefffef68: 0x00000000
    0xefffef64: 0x00000000
    0xefffef60: 0x00000000
    0xefffef5c: 0x00000000
    0xefffef58: 0x00000000
    0xefffef54: 0x00000000
    0xefffef50: 0x00000000
    0xefffef4c: 0x00000000
    0xefffef48: 0x00000000
    0xefffef44: 0x00000000
    0xefffef40: 0x00000000
    0xefffef3c: 0x00000000
    0xefffef38: 0x00000000
    0xefffef34: 0x00000000
    0xefffef30: 0x00000000
    0xefffef2c: 0x00000000
    0xefffef28: 0x00000000
    0xefffef24: 0x00000000
    0xefffef20: 0x00000000
    0xefffef1c: 0x00000000
    0xefffef18: 0x00000000
    0xefffef14: 0x00000000
    0xefffef10: 0x7c0effff
    0xefffef0c: 0xffffffff
    0xefffef08: 0xaaaaaaaa
    0xefffef04: 0xaf54eaaa
    0xefffef00: 0x40040000
    0xefffeefc: 0x40040000
    0xefffeef8: 0x2b000000
    0xefffeef4: 0x00000000
    0xefffeef0: 0x00000000
    0xefffeeec: 0x408ece9a
    0xefffeee8: 0x00000000
    0xefffeee4: 0xf0ff0000
    0xefffeee0: 0x0f800000
    0xefffeedc: 0xf0fff0ff
    0xefffeed8: 0x1f380000
    0xefffeed4: 0x00000000
    0xefffeed0: 0x00000000
    0xefffeecc: 0x00000000
    0xefffeec8: 0xffffffff
    0xefffeec4: 0xffffffff
    0xefffeec0: 0x7fff0000
    0xefffeebc: 0xffffffff
    0xefffeeb8: 0xffffffff
    0xefffeeb4: 0x7fff0000

    The signal frame is not readily apparent (to me).

    From looking at the above stack dump, sc ought to start at 0xefffee90,
    and the trampoline would be three words below that. The last address you
    show corresponds to 0xeffff640 in first dump above, which is at the
    start of the saved fpregs. I'd say we just miss the beginning of the
    signal frame?

    (My reasoning is that copy_siginfo_to_user clears the end of the signal
    stack, which is what we can see in both cases.)

    Can't explain the 14 words below the saved return address though.


    Also, stack memory in the core dump and stack memory as observed upon
    signal handler breakpoint do not agree very well... I'd expect the values immediately below the stack pointer to be zeros once the signal frame was
    put there.

    Yes, but what you get to see in the core dump is the stack after return
    from __GI___wait4_time64 ... and differences in timing may have resulted
    in different code executed (assuming the gdb trace is from the first
    subshell signaling, and the dump from one of the later ones ...).


    I can't explain all of that unless it's discontiguous stack corruption.


    Me neither, sorry.

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Tue Apr 18 04:20:01 2023
    On Tue, 18 Apr 2023, Michael Schmitz wrote:

    Am 16.04.2023 um 18:44 schrieb Finn Thain:


    The backtrace confirms that this signal was delivered during execution
    of __wait3(). (Delivery can happen during execution of __libc_fork()
    but I just repeat the test until I get these ducks in a row.)

    (gdb) c
    Continuing.
    # x=$(:)
    [Detaching after fork from child process 1055]

    Breakpoint 6.1, onsig (signo=17) at trap.c:286
    286 trap.c: No such file or directory.
    (gdb) bt
    #0 onsig (signo=17) at trap.c:286
    #1 <signal handler called>
    #2 0xc00e81b6 in __GI___wait4_time64 (pid=-1, stat_loc=0xeffff86a, options=2,
    usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:35
    #3 0xc00e8164 in __GI___wait3_time64 (usage=0x0, options=<optimized out>,
    stat_loc=<optimized out>) at ../sysdeps/unix/sysv/linux/wait3.c:26

    Where did that one come from? I don't think we saw __GI___wait3_time64
    called in your disassembly of __wait3 ...

    #4 __wait3 (stat_loc=<optimized out>, options=<optimized out>,
    usage=<optimized out>) at ../sysdeps/unix/sysv/linux/wait3.c:35


    Well spotted. However, it turns out there is a good explanation for that:

    (gdb) print __GI___wait3_time64
    $3 = {pid_t (int *, int, struct __rusage64 *)} 0xc00e4054 <__GI___wait3_time64> (gdb) disass __GI___wait3_time64
    Dump of assembler code for function __GI___wait3_time64:
    0xc00e4054 <+0>: movel %sp@(12),%sp@-
    0xc00e4058 <+4>: movel %sp@(12),%sp@-
    0xc00e405c <+8>: movel %sp@(12),%sp@-
    0xc00e4060 <+12>: pea 0xffffffff
    0xc00e4064 <+16>: bsrl 0xc00e4174 <__GI___wait4_time64>
    0xc00e406a <+22>: lea %sp@(16),%sp
    0xc00e406e <+26>: rts
    End of assembler dump.
    (gdb) print __wait3
    $2 = {pid_t (int *, int, struct rusage *)} 0xc00e4070 <__wait3>
    (gdb) disass __wait3
    Dump of assembler code for function __wait3:
    0xc00e4070 <+0>: linkw %fp,#-96
    0xc00e4074 <+4>: moveml %a2-%a3/%a5,%sp@-
    0xc00e4078 <+8>: lea %pc@(0xc019c000),%a5
    0xc00e4080 <+16>: movel %fp@(8),%d0
    0xc00e4084 <+20>: moveal %fp@(16),%a2
    0xc00e4088 <+24>: moveal %a5@(108),%a3
    0xc00e408c <+28>: movel %a3@,%fp@(-4)
    0xc00e4090 <+32>: tstl %a2
    0xc00e4092 <+34>: beqw 0xc00e4152 <__wait3+226>
    0xc00e4096 <+38>: pea %fp@(-92)
    0xc00e409a <+42>: movel %fp@(12),%sp@-
    0xc00e409e <+46>: movel %d0,%sp@-
    0xc00e40a0 <+48>: pea 0xffffffff
    0xc00e40a4 <+52>: bsrl 0xc00e4174 <__GI___wait4_time64>
    0xc00e40aa <+58>: lea %sp@(16),%sp
    0xc00e40ae <+62>: tstl %d0
    0xc00e40b0 <+64>: bgts 0xc00e40c8 <__wait3+88>
    0xc00e40b2 <+66>: moveal %fp@(-4),%a0
    0xc00e40b6 <+70>: movel %a3@,%d1
    0xc00e40b8 <+72>: cmpl %a0,%d1
    0xc00e40ba <+74>: bnew 0xc00e416c <__wait3+252>
    0xc00e40be <+78>: moveml %fp@(-108),%a2-%a3/%a5
    0xc00e40c4 <+84>: unlk %fp
    0xc00e40c6 <+86>: rts
    0xc00e40c8 <+88>: pea 0x44
    0xc00e40cc <+92>: clrl %sp@-
    0xc00e40ce <+94>: pea %a2@(4)
    0xc00e40d2 <+98>: movel %d0,%fp@(-96)
    0xc00e40d6 <+102>: bsrl 0xc00bc850 <__GI_memset>
    0xc00e40dc <+108>: movel %fp@(-88),%a2@
    0xc00e40e0 <+112>: movel %fp@(-80),%a2@(4)
    0xc00e40e6 <+118>: movel %fp@(-72),%a2@(8)
    0xc00e40ec <+124>: movel %fp@(-64),%a2@(12)
    0xc00e40f2 <+130>: movel %fp@(-60),%a2@(16)
    0xc00e40f8 <+136>: movel %fp@(-56),%a2@(20)
    0xc00e40fe <+142>: movel %fp@(-52),%a2@(24)
    0xc00e4104 <+148>: movel %fp@(-48),%a2@(28)
    0xc00e410a <+154>: movel %fp@(-44),%a2@(32)
    0xc00e4110 <+160>: movel %fp@(-40),%a2@(36)
    0xc00e4116 <+166>: movel %fp@(-36),%a2@(40)
    0xc00e411c <+172>: movel %fp@(-32),%a2@(44)
    0xc00e4122 <+178>: movel %fp@(-28),%a2@(48)
    0xc00e4128 <+184>: movel %fp@(-24),%a2@(52)
    0xc00e412e <+190>: movel %fp@(-20),%a2@(56)
    0xc00e4134 <+196>: movel %fp@(-16),%a2@(60)
    0xc00e413a <+202>: movel %fp@(-12),%a2@(64)
    0xc00e4140 <+208>: movel %fp@(-8),%a2@(68)
    0xc00e4146 <+214>: lea %sp@(12),%sp
    0xc00e414a <+218>: movel %fp@(-96),%d0
    0xc00e414e <+222>: braw 0xc00e40b2 <__wait3+66>
    0xc00e4152 <+226>: clrl %sp@-
    0xc00e4154 <+228>: movel %fp@(12),%sp@-
    0xc00e4158 <+232>: movel %d0,%sp@-
    0xc00e415a <+234>: pea 0xffffffff
    0xc00e415e <+238>: bsrl 0xc00e4174 <__GI___wait4_time64>
    0xc00e4164 <+244>: lea %sp@(16),%sp
    0xc00e4168 <+248>: braw 0xc00e40b2 <__wait3+66>
    0xc00e416c <+252>: illegal
    End of assembler dump.


    (gdb) info frame
    Stack level 3, frame at 0xeffff82c:
    pc = 0xc00e8164 in __GI___wait3_time64
    (../sysdeps/unix/sysv/linux/wait3.c:26); saved pc = 0xd000c38e
    inlined into frame 4, caller of frame at 0xeffff7a8
    source language c.
    Arglist at unknown address.
    Locals at unknown address, Previous frame's sp is 0xeffff7a8
    Saved registers:
    d2 at 0xeffff738, d3 at 0xeffff73c, d4 at 0xeffff740, d5 at 0xeffff744,
    a2 at 0xeffff748, a3 at 0xeffff74c, a5 at 0xeffff750, pc at 0xeffff7a4


    (gdb) up
    #4 __wait3 (stat_loc=<optimized out>, options=<optimized out>,
    usage=<optimized out>) at ../sysdeps/unix/sysv/linux/wait3.c:35
    35 in ../sysdeps/unix/sysv/linux/wait3.c
    (gdb) info frame
    Stack level 4, frame at 0xeffff82c:
    pc = 0xc00e8164 in __wait3 (../sysdeps/unix/sysv/linux/wait3.c:35);
    saved pc = 0xd000c38e
    called by frame at 0xeffff928, caller of frame at 0xeffff82c
    source language c.
    Arglist at 0xeffff824, args: stat_loc=<optimized out>,
    options=<optimized out>, usage=<optimized out>
    Locals at 0xeffff824, Previous frame's sp is 0xeffff82c
    Saved registers:
    a2 at 0xeffff7b8, a3 at 0xeffff7bc, a5 at 0xeffff7c0, fp at 0xeffff824,
    pc at 0xeffff828


    Note that frame 3 was "inlined into frame 4". The inlined code can be seen above at 0xc00e4154. So the backtrace is misleading inasmuchas it
    represents the source code rather than the disassembly.

    #5 0xd000c38e in waitproc (status=0xeffff85a, block=1) at jobs.c:1179
    #6 waitone (block=1, job=0xd001f618) at jobs.c:1055
    #7 0xd000c5b8 in dowait (block=1, jp=0xd001f618) at jobs.c:1137
    #8 0xd000ddb0 in waitforjob (jp=0xd001f618) at jobs.c:1014
    #9 0xd000aade in expbackq (flag=68, cmd=0xd001e4c8 <stackbase+36>)
    at expand.c:520
    #10 argstr (p=<optimized out>, flag=68) at expand.c:335
    #11 0xd000b5ce in expandarg (arg=0xd001e4e8 <stackbase+68>,
    arglist=0xeffffb08, flag=4) at expand.c:192
    #12 0xd0007e2a in evalcommand (cmd=<optimized out>, flags=<optimized out>)
    at eval.c:855
    #13 0xd0006ffc in evaltree (n=0xd001e4f8 <stackbase+84>, flags=0) at eval.c:300
    #14 0xd000e3c0 in cmdloop (top=1) at main.c:246
    #15 0xd0005018 in main (argc=<optimized out>, argv=<optimized out>)
    at main.c:181


    0xeffff750: 0xc01a0000 saved $a5 == libc .got 0xeffff74c: 0xc0023e8c saved $a3 == &__stack_chk_guard
    0xeffff748: 0x00000000 saved $a2
    0xeffff744: 0x00000001 saved $d5
    0xeffff740: 0xeffff86e saved $d4
    0xeffff73c: 0xeffff86a saved $d3
    0xeffff738: 0x00000002 saved $d2
    0xeffff734: 0x00000000
    0xeffff730: 0x00000000
    0xeffff72c: 0x00000000
    0xeffff728: 0x00000000
    0xeffff724: 0x00000000
    0xeffff720: 0x00000000
    0xeffff71c: 0x00000000
    0xeffff718: 0x00000000
    0xeffff714: 0x00000000
    0xeffff710: 0x00000000
    0xeffff70c: 0x00000000
    0xeffff708: 0x00000000
    0xeffff704: 0x00000000
    0xeffff700: 0x00000000
    0xeffff6fc: 0x00000000
    0xeffff6f8: 0x00000000
    0xeffff6f4: 0x00000000
    0xeffff6f0: 0x00000000
    0xeffff6ec: 0x00000000
    0xeffff6e8: 0x00000000
    0xeffff6e4: 0x00000000
    0xeffff6e0: 0x00000000
    0xeffff6dc: 0x00000000
    0xeffff6d8: 0x00000000
    0xeffff6d4: 0x00000000
    0xeffff6d0: 0x00000000
    0xeffff6cc: 0x00000000
    0xeffff6c8: 0x00000000
    0xeffff6c4: 0x00000000
    0xeffff6c0: 0x00000000
    0xeffff6bc: 0x00000000
    0xeffff6b8: 0x00000000
    0xeffff6b4: 0x00000000
    0xeffff6b0: 0x00000000
    0xeffff6ac: 0x00000000
    0xeffff6a8: 0x00000000
    0xeffff6a4: 0x00000000
    0xeffff6a0: 0x00000000
    0xeffff69c: 0x00000000
    0xeffff698: 0x00000000
    0xeffff694: 0x00000000
    0xeffff690: 0x00000000
    0xeffff68c: 0x00000000
    0xeffff688: 0x00000000
    0xeffff684: 0x00000000
    0xeffff680: 0x00000000
    0xeffff67c: 0x00000000
    0xeffff678: 0x00000000
    0xeffff674: 0x00000000
    0xeffff670: 0x00000000
    0xeffff66c: 0x00000000
    0xeffff668: 0x00000000
    0xeffff664: 0x00000000
    0xeffff660: 0x41000000
    0xeffff65c: 0x00000000
    0xeffff658: 0x00000000
    0xeffff654: 0x00000000
    0xeffff650: 0x00000000
    0xeffff64c: 0x80000000
    0xeffff648: 0x3fff0000
    0xeffff644: 0x00000000
    0xeffff640: 0xd0000000
    0xeffff63c: 0x40020000 <= (sc.formatvec & 0xffff) << 16; fpregs from here on
    0xeffff638: 0x81b60080 <= (sc.pc & 0xffff) << 16 | sc.formatvec >> 16
    0xeffff634: 0x0000c00e <= sc.sr << 16 sc.pc >> 16
    0xeffff630: 0xd001e4e3 <= sc.a1
    0xeffff62c: 0xc0028780 <= sc.a0
    0xeffff628: 0xffffffff <= sc.d1
    0xeffff624: 0x0000041f <= sc.d0
    0xeffff620: 0xeffff738 <= sc.usp
    0xeffff61c: 0x00000000 <= sc.mask
    0xeffff618: 0x00000000 <= extramask 0xeffff614: 0x00000000 <= frame.retcode[1] 0xeffff610: 0x70774e40 moveq #119,%d0 ; trap #0
    0xeffff60c: 0xeffff61c <= frame->sc 0xeffff608: 0x00000080 <= tregs->vector 0xeffff604: 0x00000011 <= signal no. 0xeffff600: 0xeffff610 return address

    The above comes from dash running under gdb under qemu, which does not exhibit the failure but is convenient for that kind of experiment.

    I would have expected to see a different signal trampoline (for sys_rt_sigreturn) ...

    Well, this seems to be the trampoline from setup_frame() and not setup_rt_frame().

    But anyway:

    The saved pc is 0xc00e81b6 which does match the backtrace above. Vector offset 80 matches trap 0 which suggests 0xc00e81b6 should be the
    instruction after a trap 0 instruction. d0 is 1055 which is not a signal number I recognize.


    I don't know what d0 represents here. But &frame->sig == 0x11 is correct (SIGCHLD).


    Again as far as I understand, the core dump happens on process exit.
    Stack smashing is detected and process exit is forced only at exit
    from __wait3() or __wait4_time64(),

    I placed an illegal instruction in __wait3. This executes instead of
    the call to __stack_chk_fail because that obliterates stack memory of interest.

    OK.


    Consequently the latest core dump still contains dead stack frames
    (see below) of subroutines that returned before __wait3() dumped core.
    You can see the return address for the branch to __wait4_time64() and
    below that you can see the return address for the branch to __m68k_read_tp().

    (gdb) disas __wait4_time64
    Dump of assembler code for function __GI___wait4_time64:
    0xc00e4174 <+0>: lea %sp@(-80),%sp
    0xc00e4178 <+4>: moveml %d2-%d5/%a2-%a3/%a5,%sp@-
    0xc00e417c <+8>: lea %pc@(0xc019c000),%a5
    0xc00e4184 <+16>: movel %sp@(116),%d2
    0xc00e4188 <+20>: moveal %sp@(124),%a2
    0xc00e418c <+24>: moveal %a5@(108),%a3
    0xc00e4190 <+28>: movel %a3@,%sp@(104)
    0xc00e4194 <+32>: bsrl 0xc0056e2c <__m68k_read_tp@plt>

    I gather the signal was delivered before __wait4_time64+38, otherwise
    the return address 0xc00e419a (which appears below) would have been overwritten by the signal frame. The signal must have been delivered
    after waitproc() initialized gotsigchld = 0 since gotsigchld is 1 at
    the time of the coredump.

    I assume the %a3 corruption happened after __wait4_time64+8 because
    that's when %a3 first appears on the stack. And the corruption must
    have happened before __wait4_time64+238, which is when %a3 was
    restored.

    If it was the signal which somehow corrupted the saved %a3, there's
    only a small window for that. The only syscall in that window is get_thread_area.

    I see sys_wait4 called in two places (0xc00e01b4, and then 0xc00e0286 depending on the return code of the first). The second one again would
    have called __m68k_read_tp so would have left a return address on the
    stack (0xc00e02d2). Leaves the first.


    That's why my analysis stopped at __wait4_time64+38: the rest of
    __wait4_time64 is not relevant to the dead stack contents. (It would have
    left a different return address in that memory location.)


    Here's some stack memory from the core dump.

    0xeffff0dc: 0xd000c38e return address waitproc+124
    0xeffff0d8: 0xd001c1ec frame 0 $fp == &suppressint
    0xeffff0d4: 0x00add14b canary
    0xeffff0d0: 0x00000000
    0xeffff0cc: 0x0000000a
    0xeffff0c8: 0x00000202
    0xeffff0c4: 0x00000008
    0xeffff0c0: 0x00000000
    0xeffff0bc: 0x00000000
    0xeffff0b8: 0x00000174
    0xeffff0b4: 0x00000004
    0xeffff0b0: 0x00000004
    0xeffff0ac: 0x00000006
    0xeffff0a8: 0x000000e0
    0xeffff0a4: 0x000000e0
    0xeffff0a0: 0x00171f20
    0xeffff09c: 0x00171f20
    0xeffff098: 0x00171f20
    0xeffff094: 0x00000002
    0xeffff090: 0x00002000
    0xeffff08c: 0x00000006
    0xeffff088: 0x0000e920
    0xeffff084: 0x00005360
    0xeffff080: 0x00170700
    0xeffff07c: 0x00170700
    0xeffff078: 0x00170700 frame 0 $fp - 96
    0xeffff074: 0xd001b874 saved $a5 == dash .got 0xeffff070: 0xd001e498 saved $a3 == &dash_errno 0xeffff06c: 0xd001e718 frame 0 $sp saved $a2 == &gotsigchld 0xeffff068: 0x00000000
    0xeffff064: 0x00000000
    0xeffff060: 0xeffff11e
    0xeffff05c: 0xffffffff
    0xeffff058: 0xc00e4164 return address __wait3+244
    0xeffff054: 0x00add14b canary
    0xeffff050: 0x00000001
    0xeffff04c: 0x00000004
    0xeffff048: 0x0000000d
    0xeffff044: 0x0000000d
    0xeffff040: 0x0015ef82
    0xeffff03c: 0x0015ef82
    0xeffff038: 0x0015ef82
    0xeffff034: 0x00000003
    0xeffff030: 0x00000004
    0xeffff02c: 0x00000004
    0xeffff028: 0x00000140
    0xeffff024: 0x00000140
    0xeffff020: 0x00000034
    0xeffff01c: 0x00000034
    0xeffff018: 0x00000034
    0xeffff014: 0x00000006
    0xeffff010: 0x003b003a
    0xeffff00c: 0x000a0028
    0xeffff008: 0x00340020
    0xeffff004: 0xc019c000 saved $a5 == libc .got 0xeffff000: 0xeffff068 saved $a3 (corrupted) 0xefffeffc: 0x00000000 saved $a2
    0xefffeff8: 0x00000001 saved $d5
    0xefffeff4: 0xeffff122 saved $d4
    0xefffeff0: 0xeffff11e saved $d3
    0xefffefec: 0x00000000 saved $d2
    0xefffefe8: 0xc00e419a return address __GI___wait4_time64+38 0xefffefe4: 0xc0028780
    0xefffefe0: 0x3c344bfb
    0xefffefdc: 0x000af353
    0xefffefd8: 0x3c340170
    0xefffefd4: 0x00000000
    0xefffefd0: 0xc00e417c
    0xefffefcc: 0xc00e417e
    0xefffefc8: 0xc00e4180
    0xefffefc4: 0x48e73c34
    0xefffefc0: 0x00000000
    0xefffefbc: 0xefffeff8
    0xefffefb8: 0xefffeffc
    0xefffefb4: 0x4bfb0170
    0xefffefb0: 0x0eee0709
    0xefffefac: 0x00000000
    0xefffefa8: 0x00000000
    0xefffefa4: 0x00000000
    0xefffefa0: 0x00000000
    0xefffef9c: 0x00000000
    0xefffef98: 0x00000000
    0xefffef94: 0x00000000
    0xefffef90: 0x00000000
    0xefffef8c: 0x00000000
    0xefffef88: 0x00000000
    0xefffef84: 0x00000000
    0xefffef80: 0x00000000
    0xefffef7c: 0x00000000
    0xefffef78: 0x00000000
    0xefffef74: 0x00000000
    0xefffef70: 0x00000000
    0xefffef6c: 0x00000000
    0xefffef68: 0x00000000
    0xefffef64: 0x00000000
    0xefffef60: 0x00000000
    0xefffef5c: 0x00000000
    0xefffef58: 0x00000000
    0xefffef54: 0x00000000
    0xefffef50: 0x00000000
    0xefffef4c: 0x00000000
    0xefffef48: 0x00000000
    0xefffef44: 0x00000000
    0xefffef40: 0x00000000
    0xefffef3c: 0x00000000
    0xefffef38: 0x00000000
    0xefffef34: 0x00000000
    0xefffef30: 0x00000000
    0xefffef2c: 0x00000000
    0xefffef28: 0x00000000
    0xefffef24: 0x00000000
    0xefffef20: 0x00000000
    0xefffef1c: 0x00000000
    0xefffef18: 0x00000000
    0xefffef14: 0x00000000
    0xefffef10: 0x7c0effff
    0xefffef0c: 0xffffffff
    0xefffef08: 0xaaaaaaaa
    0xefffef04: 0xaf54eaaa
    0xefffef00: 0x40040000
    0xefffeefc: 0x40040000
    0xefffeef8: 0x2b000000
    0xefffeef4: 0x00000000
    0xefffeef0: 0x00000000
    0xefffeeec: 0x408ece9a
    0xefffeee8: 0x00000000
    0xefffeee4: 0xf0ff0000
    0xefffeee0: 0x0f800000
    0xefffeedc: 0xf0fff0ff
    0xefffeed8: 0x1f380000
    0xefffeed4: 0x00000000
    0xefffeed0: 0x00000000
    0xefffeecc: 0x00000000
    0xefffeec8: 0xffffffff
    0xefffeec4: 0xffffffff
    0xefffeec0: 0x7fff0000
    0xefffeebc: 0xffffffff
    0xefffeeb8: 0xffffffff
    0xefffeeb4: 0x7fff0000 sc_formatvec

    The signal frame is not readily apparent (to me).

    From looking at the above stack dump, sc ought to start at 0xefffee90,
    and the trampoline would be three words below that.

    0xefffeeb0: 0x4178b008 sc_pc, sc_formatvec
    0xefffeeac: 0x0008c00e sc_sr, sc_pc
    0xefffeea8: 0xd00223bb sc_a1
    0xefffeea4: 0xd001e32c sc_a0
    0xefffeea0: 0x00000003 sc_d1
    0xefffee9c: 0xeffff11e sc_d0
    0xefffee98: 0xeffff004 sc_usp
    0xefffee94: 0x00000000 sc_mask
    0xefffee90: 0x00000000 extramask
    0xefffee8c: 0xc0024a90 retcode[1]
    0xefffee88: 0x70774e40 retcode[0]
    0xefffee84: 0xefffee94 psc
    0xefffee80: 0x00000008 code
    0xefffee7c: 0x00000011 sig
    0xefffee78: 0xefffee88 pretcode
    0xefffee74: 0xc019c000
    0xefffee70: 0x00000000
    0xefffee6c: 0xc0025878
    0xefffee68: 0xc0007ed4
    0xefffee64: 0xc0024000
    0xefffee60: 0xefffef50
    0xefffee5c: 0xc0024000
    0xefffee58: 0xc002a034
    0xefffee54: 0xc0024a90
    0xefffee50: 0xc0025878
    0xefffee4c: 0x00000001
    0xefffee48: 0x0017f020
    0xefffee44: 0x0000002c
    0xefffee40: 0x0000000f
    0xefffee3c: 0x00000000
    0xefffee38: 0xfffff7fa
    0xefffee34: 0xffffffff
    0xefffee30: 0x00009782
    0xefffee2c: 0x00000000
    0xefffee28: 0x0000001e
    0xefffee24: 0xc0025858
    0xefffee20: 0xc0025af8
    0xefffee1c: 0xc000b376
    0xefffee18: 0xc0024000
    0xefffee14: 0xc0025878
    0xefffee10: 0x0000001d
    0xefffee0c: 0xd0001b60
    0xefffee08: 0x0000002f
    0xefffee04: 0xc002563e
    0xefffee00: 0xc0025490

    The last address you show corresponds to 0xeffff640 in first dump above, which is at the start of the saved fpregs. I'd say we just miss the
    beginning of the signal frame?


    It looks like you're right. I'm not sure how I missed that.

    So when the signal was delivered, PC == 0xc00e4178 and USP == 0xc00e4178.
    Those addresses can be found in the disassembly and the stack contents I
    sent previously (quoted above) and it all seems to line up.

    (My reasoning is that copy_siginfo_to_user clears the end of the signal stack, which is what we can see in both cases.)

    Can't explain the 14 words below the saved return address though.


    Right. Is it sc_fpstate? Perhaps we should expect QEMU to differ here.

    Bottom line is, the corrupted %a3 register would have been saved by the
    MOVEM instruction at 0xc00e4178, which turns out to be the PC in the
    signal frame. So it certainly looks like the kernel was the culprit here.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Tue Apr 18 07:30:01 2023
    Hi Finn,

    Am 18.04.2023 um 14:04 schrieb Finn Thain:
    On Tue, 18 Apr 2023, Michael Schmitz wrote:

    Am 16.04.2023 um 18:44 schrieb Finn Thain:


    The backtrace confirms that this signal was delivered during execution
    of __wait3(). (Delivery can happen during execution of __libc_fork()
    but I just repeat the test until I get these ducks in a row.)

    (gdb) c
    Continuing.
    # x=$(:)
    [Detaching after fork from child process 1055]

    Breakpoint 6.1, onsig (signo=17) at trap.c:286
    286 trap.c: No such file or directory.
    (gdb) bt
    #0 onsig (signo=17) at trap.c:286
    #1 <signal handler called>
    #2 0xc00e81b6 in __GI___wait4_time64 (pid=-1, stat_loc=0xeffff86a,
    options=2,
    usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:35
    #3 0xc00e8164 in __GI___wait3_time64 (usage=0x0, options=<optimized out>, >>> stat_loc=<optimized out>) at ../sysdeps/unix/sysv/linux/wait3.c:26

    Where did that one come from? I don't think we saw __GI___wait3_time64
    called in your disassembly of __wait3 ...

    #4 __wait3 (stat_loc=<optimized out>, options=<optimized out>,
    usage=<optimized out>) at ../sysdeps/unix/sysv/linux/wait3.c:35


    Well spotted. However, it turns out there is a good explanation for that:

    (gdb) print __GI___wait3_time64
    $3 = {pid_t (int *, int, struct __rusage64 *)} 0xc00e4054 <__GI___wait3_time64>
    (gdb) disass __GI___wait3_time64
    Dump of assembler code for function __GI___wait3_time64:
    0xc00e4054 <+0>: movel %sp@(12),%sp@-
    0xc00e4058 <+4>: movel %sp@(12),%sp@-
    0xc00e405c <+8>: movel %sp@(12),%sp@-
    0xc00e4060 <+12>: pea 0xffffffff
    0xc00e4064 <+16>: bsrl 0xc00e4174 <__GI___wait4_time64>
    0xc00e406a <+22>: lea %sp@(16),%sp
    0xc00e406e <+26>: rts
    End of assembler dump.
    (gdb) print __wait3
    $2 = {pid_t (int *, int, struct rusage *)} 0xc00e4070 <__wait3>
    (gdb) disass __wait3
    Dump of assembler code for function __wait3:
    0xc00e4070 <+0>: linkw %fp,#-96
    0xc00e4074 <+4>: moveml %a2-%a3/%a5,%sp@-
    0xc00e4078 <+8>: lea %pc@(0xc019c000),%a5
    0xc00e4080 <+16>: movel %fp@(8),%d0
    0xc00e4084 <+20>: moveal %fp@(16),%a2
    0xc00e4088 <+24>: moveal %a5@(108),%a3
    0xc00e408c <+28>: movel %a3@,%fp@(-4)
    0xc00e4090 <+32>: tstl %a2
    0xc00e4092 <+34>: beqw 0xc00e4152 <__wait3+226>
    0xc00e4096 <+38>: pea %fp@(-92)
    0xc00e409a <+42>: movel %fp@(12),%sp@-
    0xc00e409e <+46>: movel %d0,%sp@-
    0xc00e40a0 <+48>: pea 0xffffffff
    0xc00e40a4 <+52>: bsrl 0xc00e4174 <__GI___wait4_time64>
    0xc00e40aa <+58>: lea %sp@(16),%sp
    0xc00e40ae <+62>: tstl %d0
    0xc00e40b0 <+64>: bgts 0xc00e40c8 <__wait3+88>
    0xc00e40b2 <+66>: moveal %fp@(-4),%a0
    0xc00e40b6 <+70>: movel %a3@,%d1
    0xc00e40b8 <+72>: cmpl %a0,%d1
    0xc00e40ba <+74>: bnew 0xc00e416c <__wait3+252>
    0xc00e40be <+78>: moveml %fp@(-108),%a2-%a3/%a5
    0xc00e40c4 <+84>: unlk %fp
    0xc00e40c6 <+86>: rts
    0xc00e40c8 <+88>: pea 0x44
    0xc00e40cc <+92>: clrl %sp@-
    0xc00e40ce <+94>: pea %a2@(4)
    0xc00e40d2 <+98>: movel %d0,%fp@(-96)
    0xc00e40d6 <+102>: bsrl 0xc00bc850 <__GI_memset>
    0xc00e40dc <+108>: movel %fp@(-88),%a2@
    0xc00e40e0 <+112>: movel %fp@(-80),%a2@(4)
    0xc00e40e6 <+118>: movel %fp@(-72),%a2@(8)
    0xc00e40ec <+124>: movel %fp@(-64),%a2@(12)
    0xc00e40f2 <+130>: movel %fp@(-60),%a2@(16)
    0xc00e40f8 <+136>: movel %fp@(-56),%a2@(20)
    0xc00e40fe <+142>: movel %fp@(-52),%a2@(24)
    0xc00e4104 <+148>: movel %fp@(-48),%a2@(28)
    0xc00e410a <+154>: movel %fp@(-44),%a2@(32)
    0xc00e4110 <+160>: movel %fp@(-40),%a2@(36)
    0xc00e4116 <+166>: movel %fp@(-36),%a2@(40)
    0xc00e411c <+172>: movel %fp@(-32),%a2@(44)
    0xc00e4122 <+178>: movel %fp@(-28),%a2@(48)
    0xc00e4128 <+184>: movel %fp@(-24),%a2@(52)
    0xc00e412e <+190>: movel %fp@(-20),%a2@(56)
    0xc00e4134 <+196>: movel %fp@(-16),%a2@(60)
    0xc00e413a <+202>: movel %fp@(-12),%a2@(64)
    0xc00e4140 <+208>: movel %fp@(-8),%a2@(68)
    0xc00e4146 <+214>: lea %sp@(12),%sp
    0xc00e414a <+218>: movel %fp@(-96),%d0
    0xc00e414e <+222>: braw 0xc00e40b2 <__wait3+66>
    0xc00e4152 <+226>: clrl %sp@-
    0xc00e4154 <+228>: movel %fp@(12),%sp@-
    0xc00e4158 <+232>: movel %d0,%sp@-
    0xc00e415a <+234>: pea 0xffffffff
    0xc00e415e <+238>: bsrl 0xc00e4174 <__GI___wait4_time64>
    0xc00e4164 <+244>: lea %sp@(16),%sp
    0xc00e4168 <+248>: braw 0xc00e40b2 <__wait3+66>
    0xc00e416c <+252>: illegal
    End of assembler dump.


    (gdb) info frame
    Stack level 3, frame at 0xeffff82c:
    pc = 0xc00e8164 in __GI___wait3_time64
    (../sysdeps/unix/sysv/linux/wait3.c:26); saved pc = 0xd000c38e
    inlined into frame 4, caller of frame at 0xeffff7a8
    source language c.
    Arglist at unknown address.
    Locals at unknown address, Previous frame's sp is 0xeffff7a8
    Saved registers:
    d2 at 0xeffff738, d3 at 0xeffff73c, d4 at 0xeffff740, d5 at 0xeffff744,
    a2 at 0xeffff748, a3 at 0xeffff74c, a5 at 0xeffff750, pc at 0xeffff7a4


    (gdb) up
    #4 __wait3 (stat_loc=<optimized out>, options=<optimized out>,
    usage=<optimized out>) at ../sysdeps/unix/sysv/linux/wait3.c:35
    35 in ../sysdeps/unix/sysv/linux/wait3.c
    (gdb) info frame
    Stack level 4, frame at 0xeffff82c:
    pc = 0xc00e8164 in __wait3 (../sysdeps/unix/sysv/linux/wait3.c:35);
    saved pc = 0xd000c38e
    called by frame at 0xeffff928, caller of frame at 0xeffff82c
    source language c.
    Arglist at 0xeffff824, args: stat_loc=<optimized out>,
    options=<optimized out>, usage=<optimized out>
    Locals at 0xeffff824, Previous frame's sp is 0xeffff82c
    Saved registers:
    a2 at 0xeffff7b8, a3 at 0xeffff7bc, a5 at 0xeffff7c0, fp at 0xeffff824,
    pc at 0xeffff828


    Note that frame 3 was "inlined into frame 4". The inlined code can be seen above at 0xc00e4154. So the backtrace is misleading inasmuchas it
    represents the source code rather than the disassembly.

    OK then ...


    #5 0xd000c38e in waitproc (status=0xeffff85a, block=1) at jobs.c:1179
    #6 waitone (block=1, job=0xd001f618) at jobs.c:1055
    #7 0xd000c5b8 in dowait (block=1, jp=0xd001f618) at jobs.c:1137
    #8 0xd000ddb0 in waitforjob (jp=0xd001f618) at jobs.c:1014
    #9 0xd000aade in expbackq (flag=68, cmd=0xd001e4c8 <stackbase+36>)
    at expand.c:520
    #10 argstr (p=<optimized out>, flag=68) at expand.c:335
    #11 0xd000b5ce in expandarg (arg=0xd001e4e8 <stackbase+68>,
    arglist=0xeffffb08, flag=4) at expand.c:192
    #12 0xd0007e2a in evalcommand (cmd=<optimized out>, flags=<optimized out>) >>> at eval.c:855
    #13 0xd0006ffc in evaltree (n=0xd001e4f8 <stackbase+84>, flags=0) at
    eval.c:300
    #14 0xd000e3c0 in cmdloop (top=1) at main.c:246
    #15 0xd0005018 in main (argc=<optimized out>, argv=<optimized out>)
    at main.c:181


    0xeffff750: 0xc01a0000 saved $a5 == libc .got
    0xeffff74c: 0xc0023e8c saved $a3 == &__stack_chk_guard
    0xeffff748: 0x00000000 saved $a2
    0xeffff744: 0x00000001 saved $d5
    0xeffff740: 0xeffff86e saved $d4
    0xeffff73c: 0xeffff86a saved $d3
    0xeffff738: 0x00000002 saved $d2
    0xeffff734: 0x00000000
    0xeffff730: 0x00000000
    0xeffff72c: 0x00000000
    0xeffff728: 0x00000000
    0xeffff724: 0x00000000
    0xeffff720: 0x00000000
    0xeffff71c: 0x00000000
    0xeffff718: 0x00000000
    0xeffff714: 0x00000000
    0xeffff710: 0x00000000
    0xeffff70c: 0x00000000
    0xeffff708: 0x00000000
    0xeffff704: 0x00000000
    0xeffff700: 0x00000000
    0xeffff6fc: 0x00000000
    0xeffff6f8: 0x00000000
    0xeffff6f4: 0x00000000
    0xeffff6f0: 0x00000000
    0xeffff6ec: 0x00000000
    0xeffff6e8: 0x00000000
    0xeffff6e4: 0x00000000
    0xeffff6e0: 0x00000000
    0xeffff6dc: 0x00000000
    0xeffff6d8: 0x00000000
    0xeffff6d4: 0x00000000
    0xeffff6d0: 0x00000000
    0xeffff6cc: 0x00000000
    0xeffff6c8: 0x00000000
    0xeffff6c4: 0x00000000
    0xeffff6c0: 0x00000000
    0xeffff6bc: 0x00000000
    0xeffff6b8: 0x00000000
    0xeffff6b4: 0x00000000
    0xeffff6b0: 0x00000000
    0xeffff6ac: 0x00000000
    0xeffff6a8: 0x00000000
    0xeffff6a4: 0x00000000
    0xeffff6a0: 0x00000000
    0xeffff69c: 0x00000000
    0xeffff698: 0x00000000
    0xeffff694: 0x00000000
    0xeffff690: 0x00000000
    0xeffff68c: 0x00000000
    0xeffff688: 0x00000000
    0xeffff684: 0x00000000
    0xeffff680: 0x00000000
    0xeffff67c: 0x00000000
    0xeffff678: 0x00000000
    0xeffff674: 0x00000000
    0xeffff670: 0x00000000
    0xeffff66c: 0x00000000
    0xeffff668: 0x00000000
    0xeffff664: 0x00000000
    0xeffff660: 0x41000000
    0xeffff65c: 0x00000000
    0xeffff658: 0x00000000
    0xeffff654: 0x00000000
    0xeffff650: 0x00000000
    0xeffff64c: 0x80000000
    0xeffff648: 0x3fff0000
    0xeffff644: 0x00000000
    0xeffff640: 0xd0000000
    0xeffff63c: 0x40020000 <= (sc.formatvec & 0xffff) << 16; fpregs from here on
    0xeffff638: 0x81b60080 <= (sc.pc & 0xffff) << 16 | sc.formatvec >> 16
    0xeffff634: 0x0000c00e <= sc.sr << 16 sc.pc >> 16
    0xeffff630: 0xd001e4e3 <= sc.a1
    0xeffff62c: 0xc0028780 <= sc.a0
    0xeffff628: 0xffffffff <= sc.d1
    0xeffff624: 0x0000041f <= sc.d0
    0xeffff620: 0xeffff738 <= sc.usp
    0xeffff61c: 0x00000000 <= sc.mask
    0xeffff618: 0x00000000 <= extramask
    0xeffff614: 0x00000000 <= frame.retcode[1]
    0xeffff610: 0x70774e40 moveq #119,%d0 ; trap #0
    0xeffff60c: 0xeffff61c <= frame->sc
    0xeffff608: 0x00000080 <= tregs->vector
    0xeffff604: 0x00000011 <= signal no.
    0xeffff600: 0xeffff610 return address

    The above comes from dash running under gdb under qemu, which does not
    exhibit the failure but is convenient for that kind of experiment.

    I would have expected to see a different signal trampoline (for
    sys_rt_sigreturn) ...

    Well, this seems to be the trampoline from setup_frame() and not setup_rt_frame().

    According to the manpages I've seen, glibc ought to pick rt signals if
    the kernel supports those (which I suppose it does).


    But anyway:

    The saved pc is 0xc00e81b6 which does match the backtrace above. Vector
    offset 80 matches trap 0 which suggests 0xc00e81b6 should be the
    instruction after a trap 0 instruction. d0 is 1055 which is not a signal
    number I recognize.


    I don't know what d0 represents here. But &frame->sig == 0x11 is correct (SIGCHLD).

    Correct - that all works out. But d0 holds the syscall number when we
    enter the kernel via trap 0, and that one is odd.


    Again as far as I understand, the core dump happens on process exit.
    Stack smashing is detected and process exit is forced only at exit
    from __wait3() or __wait4_time64(),

    I placed an illegal instruction in __wait3. This executes instead of
    the call to __stack_chk_fail because that obliterates stack memory of
    interest.

    OK.


    Consequently the latest core dump still contains dead stack frames
    (see below) of subroutines that returned before __wait3() dumped core.
    You can see the return address for the branch to __wait4_time64() and
    below that you can see the return address for the branch to
    __m68k_read_tp().

    (gdb) disas __wait4_time64
    Dump of assembler code for function __GI___wait4_time64:
    0xc00e4174 <+0>: lea %sp@(-80),%sp
    0xc00e4178 <+4>: moveml %d2-%d5/%a2-%a3/%a5,%sp@-
    0xc00e417c <+8>: lea %pc@(0xc019c000),%a5
    0xc00e4184 <+16>: movel %sp@(116),%d2
    0xc00e4188 <+20>: moveal %sp@(124),%a2
    0xc00e418c <+24>: moveal %a5@(108),%a3
    0xc00e4190 <+28>: movel %a3@,%sp@(104)
    0xc00e4194 <+32>: bsrl 0xc0056e2c <__m68k_read_tp@plt>

    I gather the signal was delivered before __wait4_time64+38, otherwise
    the return address 0xc00e419a (which appears below) would have been
    overwritten by the signal frame. The signal must have been delivered
    after waitproc() initialized gotsigchld = 0 since gotsigchld is 1 at
    the time of the coredump.

    I assume the %a3 corruption happened after __wait4_time64+8 because
    that's when %a3 first appears on the stack. And the corruption must
    have happened before __wait4_time64+238, which is when %a3 was
    restored.

    If it was the signal which somehow corrupted the saved %a3, there's
    only a small window for that. The only syscall in that window is
    get_thread_area.

    I see sys_wait4 called in two places (0xc00e01b4, and then 0xc00e0286
    depending on the return code of the first). The second one again would
    have called __m68k_read_tp so would have left a return address on the
    stack (0xc00e02d2). Leaves the first.


    That's why my analysis stopped at __wait4_time64+38: the rest of __wait4_time64 is not relevant to the dead stack contents. (It would have left a different return address in that memory location.)


    Here's some stack memory from the core dump.

    0xeffff0dc: 0xd000c38e return address waitproc+124
    0xeffff0d8: 0xd001c1ec frame 0 $fp ==
    &suppressint
    0xeffff0d4: 0x00add14b canary
    0xeffff0d0: 0x00000000
    0xeffff0cc: 0x0000000a
    0xeffff0c8: 0x00000202
    0xeffff0c4: 0x00000008
    0xeffff0c0: 0x00000000
    0xeffff0bc: 0x00000000
    0xeffff0b8: 0x00000174
    0xeffff0b4: 0x00000004
    0xeffff0b0: 0x00000004
    0xeffff0ac: 0x00000006
    0xeffff0a8: 0x000000e0
    0xeffff0a4: 0x000000e0
    0xeffff0a0: 0x00171f20
    0xeffff09c: 0x00171f20
    0xeffff098: 0x00171f20
    0xeffff094: 0x00000002
    0xeffff090: 0x00002000
    0xeffff08c: 0x00000006
    0xeffff088: 0x0000e920
    0xeffff084: 0x00005360
    0xeffff080: 0x00170700
    0xeffff07c: 0x00170700
    0xeffff078: 0x00170700 frame 0 $fp - 96
    0xeffff074: 0xd001b874 saved $a5 == dash .got >>> 0xeffff070: 0xd001e498 saved $a3 == &dash_errno >>> 0xeffff06c: 0xd001e718 frame 0 $sp saved $a2 == &gotsigchld >>> 0xeffff068: 0x00000000
    0xeffff064: 0x00000000
    0xeffff060: 0xeffff11e
    0xeffff05c: 0xffffffff
    0xeffff058: 0xc00e4164 return address __wait3+244
    0xeffff054: 0x00add14b canary
    0xeffff050: 0x00000001
    0xeffff04c: 0x00000004
    0xeffff048: 0x0000000d
    0xeffff044: 0x0000000d
    0xeffff040: 0x0015ef82
    0xeffff03c: 0x0015ef82
    0xeffff038: 0x0015ef82
    0xeffff034: 0x00000003
    0xeffff030: 0x00000004
    0xeffff02c: 0x00000004
    0xeffff028: 0x00000140
    0xeffff024: 0x00000140
    0xeffff020: 0x00000034
    0xeffff01c: 0x00000034
    0xeffff018: 0x00000034
    0xeffff014: 0x00000006
    0xeffff010: 0x003b003a
    0xeffff00c: 0x000a0028
    0xeffff008: 0x00340020
    0xeffff004: 0xc019c000 saved $a5 == libc .got
    0xeffff000: 0xeffff068 saved $a3 (corrupted)
    0xefffeffc: 0x00000000 saved $a2
    0xefffeff8: 0x00000001 saved $d5
    0xefffeff4: 0xeffff122 saved $d4
    0xefffeff0: 0xeffff11e saved $d3
    0xefffefec: 0x00000000 saved $d2
    0xefffefe8: 0xc00e419a return address __GI___wait4_time64+38
    0xefffefe4: 0xc0028780
    0xefffefe0: 0x3c344bfb
    0xefffefdc: 0x000af353
    0xefffefd8: 0x3c340170
    0xefffefd4: 0x00000000
    0xefffefd0: 0xc00e417c
    0xefffefcc: 0xc00e417e
    0xefffefc8: 0xc00e4180
    0xefffefc4: 0x48e73c34
    0xefffefc0: 0x00000000
    0xefffefbc: 0xefffeff8
    0xefffefb8: 0xefffeffc
    0xefffefb4: 0x4bfb0170
    0xefffefb0: 0x0eee0709
    0xefffefac: 0x00000000
    0xefffefa8: 0x00000000
    0xefffefa4: 0x00000000
    0xefffefa0: 0x00000000
    0xefffef9c: 0x00000000
    0xefffef98: 0x00000000
    0xefffef94: 0x00000000
    0xefffef90: 0x00000000
    0xefffef8c: 0x00000000
    0xefffef88: 0x00000000
    0xefffef84: 0x00000000
    0xefffef80: 0x00000000
    0xefffef7c: 0x00000000
    0xefffef78: 0x00000000
    0xefffef74: 0x00000000
    0xefffef70: 0x00000000
    0xefffef6c: 0x00000000
    0xefffef68: 0x00000000
    0xefffef64: 0x00000000
    0xefffef60: 0x00000000
    0xefffef5c: 0x00000000
    0xefffef58: 0x00000000
    0xefffef54: 0x00000000
    0xefffef50: 0x00000000
    0xefffef4c: 0x00000000
    0xefffef48: 0x00000000
    0xefffef44: 0x00000000
    0xefffef40: 0x00000000
    0xefffef3c: 0x00000000
    0xefffef38: 0x00000000
    0xefffef34: 0x00000000
    0xefffef30: 0x00000000
    0xefffef2c: 0x00000000
    0xefffef28: 0x00000000
    0xefffef24: 0x00000000
    0xefffef20: 0x00000000
    0xefffef1c: 0x00000000
    0xefffef18: 0x00000000
    0xefffef14: 0x00000000
    0xefffef10: 0x7c0effff
    0xefffef0c: 0xffffffff
    0xefffef08: 0xaaaaaaaa
    0xefffef04: 0xaf54eaaa
    0xefffef00: 0x40040000
    0xefffeefc: 0x40040000
    0xefffeef8: 0x2b000000
    0xefffeef4: 0x00000000
    0xefffeef0: 0x00000000
    0xefffeeec: 0x408ece9a
    0xefffeee8: 0x00000000
    0xefffeee4: 0xf0ff0000
    0xefffeee0: 0x0f800000
    0xefffeedc: 0xf0fff0ff
    0xefffeed8: 0x1f380000
    0xefffeed4: 0x00000000
    0xefffeed0: 0x00000000
    0xefffeecc: 0x00000000
    0xefffeec8: 0xffffffff
    0xefffeec4: 0xffffffff
    0xefffeec0: 0x7fff0000
    0xefffeebc: 0xffffffff
    0xefffeeb8: 0xffffffff
    0xefffeeb4: 0x7fff0000 sc_formatvec

    The signal frame is not readily apparent (to me).

    From looking at the above stack dump, sc ought to start at 0xefffee90,
    and the trampoline would be three words below that.

    0xefffeeb0: 0x4178b008 sc_pc, sc_formatvec
    0xefffeeac: 0x0008c00e sc_sr, sc_pc
    0xefffeea8: 0xd00223bb sc_a1
    0xefffeea4: 0xd001e32c sc_a0
    0xefffeea0: 0x00000003 sc_d1
    0xefffee9c: 0xeffff11e sc_d0
    0xefffee98: 0xeffff004 sc_usp
    0xefffee94: 0x00000000 sc_mask
    0xefffee90: 0x00000000 extramask
    0xefffee8c: 0xc0024a90 retcode[1]
    0xefffee88: 0x70774e40 retcode[0]
    0xefffee84: 0xefffee94 psc
    0xefffee80: 0x00000008 code
    0xefffee7c: 0x00000011 sig
    0xefffee78: 0xefffee88 pretcode

    OK, that's our SIGCHLD. But the signal frame format is odd ...

    Frame format b, vector offset 008. That's a bus error? How does that get
    on the user mode stack?

    0xefffee74: 0xc019c000
    0xefffee70: 0x00000000
    0xefffee6c: 0xc0025878
    0xefffee68: 0xc0007ed4
    0xefffee64: 0xc0024000
    0xefffee60: 0xefffef50
    0xefffee5c: 0xc0024000
    0xefffee58: 0xc002a034
    0xefffee54: 0xc0024a90
    0xefffee50: 0xc0025878
    0xefffee4c: 0x00000001
    0xefffee48: 0x0017f020
    0xefffee44: 0x0000002c
    0xefffee40: 0x0000000f
    0xefffee3c: 0x00000000
    0xefffee38: 0xfffff7fa
    0xefffee34: 0xffffffff
    0xefffee30: 0x00009782
    0xefffee2c: 0x00000000
    0xefffee28: 0x0000001e
    0xefffee24: 0xc0025858
    0xefffee20: 0xc0025af8
    0xefffee1c: 0xc000b376
    0xefffee18: 0xc0024000
    0xefffee14: 0xc0025878
    0xefffee10: 0x0000001d
    0xefffee0c: 0xd0001b60
    0xefffee08: 0x0000002f
    0xefffee04: 0xc002563e
    0xefffee00: 0xc0025490

    The last address you show corresponds to 0xeffff640 in first dump above,
    which is at the start of the saved fpregs. I'd say we just miss the
    beginning of the signal frame?


    It looks like you're right. I'm not sure how I missed that.

    So when the signal was delivered, PC == 0xc00e4178 and USP == 0xc00e4178.

    USP is 0xeffff004 AFAICS. That's the location 15 was saved to above
    (holding libc .got according to your interpretation).

    The saved PC is that from the exception frame, in this case a long bus
    error sequence fault frame. The PC is that of the instruction executing
    when the fault occurred. As you say, that's the moveml saving registers
    to the stack.

    I don't believe the whole fault frame is on the signal stack in one
    contiguous piece, just the first four words, then we have struct
    sigcontext. But after that, the extra contents follows, and that nicely explains the extra bits right below the return address from the
    __m68k_read_tp call.

    Those addresses can be found in the disassembly and the stack contents I
    sent previously (quoted above) and it all seems to line up.

    (My reasoning is that copy_siginfo_to_user clears the end of the signal
    stack, which is what we can see in both cases.)

    Can't explain the 14 words below the saved return address though.


    Right. Is it sc_fpstate? Perhaps we should expect QEMU to differ here.

    See above - I think what's stored there is the extra frame content for a
    format b bus error frame. But that extra frame is incomplete at best
    (should be 22 longwords, only a4 are seen). Probably overwritten by the
    stack frame from __GI___wait4_time64.

    Let's parse what's left:
    <=
    0xefffefe4: 0xc0028780 <= internal registers (6x)
    0xefffefe0: 0x3c344bfb <=
    0xefffefdc: 0x000af353 <=
    0xefffefd8: 0x3c340170 <= internal reg; version no.
    0xefffefd4: 0x00000000 <= data input buffer
    0xefffefd0: 0xc00e417c <= internal registers (2x)
    0xefffefcc: 0xc00e417e <= stage b address
    0xefffefc8: 0xc00e4180 <= internal registers (4x)
    0xefffefc4: 0x48e73c34 <=
    0xefffefc0: 0x00000000 <= data output buffer
    0xefffefbc: 0xefffeff8 <= internal registers (2x)
    0xefffefb8: 0xefffeffc <= data fault address
    0xefffefb4: 0x4bfb0170 <= ins stage c, stage b
    0xefffefb0: 0x0eee0709 <= internal register; ssw

    The fault address is the location on the stack where a2 is saved. That
    does match the data output buffer contents BTW. fc, fb, rc, rb bits
    clear means the fault didn't occur in stage b or c instructions. ssw bit
    8 set indicates a data fault - the data cycle should be rerun on rte. rm
    and rw bits clear tell us it's a write fault. If the moveml instruction
    copies registers to the stack in descending order, the fault address
    makes sense - the stack pointer just crossed a page boundary.


    Bottom line is, the corrupted %a3 register would have been saved by the
    MOVEM instruction at 0xc00e4178, which turns out to be the PC in the
    signal frame. So it certainly looks like the kernel was the culprit here.

    I think the moveml instruction did cause a bus error, and on return from
    that exception the signal got delivered.

    On entering the buserror handler, only a1 and a2 are saved, but the
    comment in entry.h states that a3-a6 and d6, d7 are preserved by C code.
    After buserr_c returns, a3 should be restored to what it was when taking
    the bus error. All registers restored before rte, the moveml instruction
    ought to be able to resume normally.

    Unless that register use constraint has changed, I don't see how a3
    could have changed midway during return from the bus error exception.
    But maybe a disassembly of buserr_c from your kernel could confirm that?

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Wed Apr 19 04:10:01 2023
    On Tue, 18 Apr 2023, Michael Schmitz wrote:

    Am 18.04.2023 um 14:04 schrieb Finn Thain:
    On Tue, 18 Apr 2023, Michael Schmitz wrote:
    Am 16.04.2023 um 18:44 schrieb Finn Thain:

    0xeffff750: 0xc01a0000 saved $a5 == libc .got >>> 0xeffff74c: 0xc0023e8c saved $a3 == &__stack_chk_guard
    0xeffff748: 0x00000000 saved $a2
    0xeffff744: 0x00000001 saved $d5
    0xeffff740: 0xeffff86e saved $d4
    0xeffff73c: 0xeffff86a saved $d3
    0xeffff738: 0x00000002 saved $d2
    0xeffff734: 0x00000000
    0xeffff730: 0x00000000
    0xeffff72c: 0x00000000
    0xeffff728: 0x00000000
    0xeffff724: 0x00000000
    0xeffff720: 0x00000000
    0xeffff71c: 0x00000000
    0xeffff718: 0x00000000
    0xeffff714: 0x00000000
    0xeffff710: 0x00000000
    0xeffff70c: 0x00000000
    0xeffff708: 0x00000000
    0xeffff704: 0x00000000
    0xeffff700: 0x00000000
    0xeffff6fc: 0x00000000
    0xeffff6f8: 0x00000000
    0xeffff6f4: 0x00000000
    0xeffff6f0: 0x00000000
    0xeffff6ec: 0x00000000
    0xeffff6e8: 0x00000000
    0xeffff6e4: 0x00000000
    0xeffff6e0: 0x00000000
    0xeffff6dc: 0x00000000
    0xeffff6d8: 0x00000000
    0xeffff6d4: 0x00000000
    0xeffff6d0: 0x00000000
    0xeffff6cc: 0x00000000
    0xeffff6c8: 0x00000000
    0xeffff6c4: 0x00000000
    0xeffff6c0: 0x00000000
    0xeffff6bc: 0x00000000
    0xeffff6b8: 0x00000000
    0xeffff6b4: 0x00000000
    0xeffff6b0: 0x00000000
    0xeffff6ac: 0x00000000
    0xeffff6a8: 0x00000000
    0xeffff6a4: 0x00000000
    0xeffff6a0: 0x00000000
    0xeffff69c: 0x00000000
    0xeffff698: 0x00000000
    0xeffff694: 0x00000000
    0xeffff690: 0x00000000
    0xeffff68c: 0x00000000
    0xeffff688: 0x00000000
    0xeffff684: 0x00000000
    0xeffff680: 0x00000000
    0xeffff67c: 0x00000000
    0xeffff678: 0x00000000
    0xeffff674: 0x00000000
    0xeffff670: 0x00000000
    0xeffff66c: 0x00000000
    0xeffff668: 0x00000000
    0xeffff664: 0x00000000
    0xeffff660: 0x41000000
    0xeffff65c: 0x00000000
    0xeffff658: 0x00000000
    0xeffff654: 0x00000000
    0xeffff650: 0x00000000
    0xeffff64c: 0x80000000
    0xeffff648: 0x3fff0000
    0xeffff644: 0x00000000
    0xeffff640: 0xd0000000
    0xeffff63c: 0x40020000 <= (sc.formatvec & 0xffff) << 16; fpregs from here on
    0xeffff638: 0x81b60080 <= (sc.pc & 0xffff) << 16 | sc.formatvec >> 16
    0xeffff634: 0x0000c00e <= sc.sr << 16 sc.pc >> 16
    0xeffff630: 0xd001e4e3 <= sc.a1
    0xeffff62c: 0xc0028780 <= sc.a0
    0xeffff628: 0xffffffff <= sc.d1
    0xeffff624: 0x0000041f <= sc.d0
    0xeffff620: 0xeffff738 <= sc.usp
    0xeffff61c: 0x00000000 <= sc.mask
    0xeffff618: 0x00000000 <= extramask
    0xeffff614: 0x00000000 <= frame.retcode[1]
    0xeffff610: 0x70774e40 moveq #119,%d0 ; trap #0
    0xeffff60c: 0xeffff61c <= frame->sc
    0xeffff608: 0x00000080 <= tregs->vector
    0xeffff604: 0x00000011 <= signal no.
    0xeffff600: 0xeffff610 return address

    The above comes from dash running under gdb under qemu, which does
    not exhibit the failure but is convenient for that kind of
    experiment.

    I would have expected to see a different signal trampoline (for
    sys_rt_sigreturn) ...

    Well, this seems to be the trampoline from setup_frame() and not setup_rt_frame().

    According to the manpages I've seen, glibc ought to pick rt signals if
    the kernel supports those (which I suppose it does).


    It's got to be the trampoline from setup_frame() because dash did this:

    act.sa_flags = 0;
    sigfillset(&act.sa_mask);
    sigaction(signo, &act, 0);

    and the kernel did this:

    /* set up the stack frame */
    if (ksig->ka.sa.sa_flags & SA_SIGINFO)
    err = setup_rt_frame(ksig, oldset, regs);
    else
    err = setup_frame(ksig, oldset, regs);


    But anyway:

    The saved pc is 0xc00e81b6 which does match the backtrace above.
    Vector offset 80 matches trap 0 which suggests 0xc00e81b6 should be
    the instruction after a trap 0 instruction. d0 is 1055 which is not a
    signal number I recognize.


    I don't know what d0 represents here. But &frame->sig == 0x11 is
    correct (SIGCHLD).

    Correct - that all works out. But d0 holds the syscall number when we
    enter the kernel via trap 0, and that one is odd.


    Well, you showed subsequently that the kernel was probably entered via a
    page fault and not the get_thread_area trap. Would that explain the d0
    value?

    ...

    Here's some stack memory from the core dump.

    0xeffff0dc: 0xd000c38e return address waitproc+124
    0xeffff0d8: 0xd001c1ec frame 0 $fp == &suppressint >>> 0xeffff0d4: 0x00add14b canary
    0xeffff0d0: 0x00000000
    0xeffff0cc: 0x0000000a
    0xeffff0c8: 0x00000202
    0xeffff0c4: 0x00000008
    0xeffff0c0: 0x00000000
    0xeffff0bc: 0x00000000
    0xeffff0b8: 0x00000174
    0xeffff0b4: 0x00000004
    0xeffff0b0: 0x00000004
    0xeffff0ac: 0x00000006
    0xeffff0a8: 0x000000e0
    0xeffff0a4: 0x000000e0
    0xeffff0a0: 0x00171f20
    0xeffff09c: 0x00171f20
    0xeffff098: 0x00171f20
    0xeffff094: 0x00000002
    0xeffff090: 0x00002000
    0xeffff08c: 0x00000006
    0xeffff088: 0x0000e920
    0xeffff084: 0x00005360
    0xeffff080: 0x00170700
    0xeffff07c: 0x00170700
    0xeffff078: 0x00170700 frame 0 $fp - 96
    0xeffff074: 0xd001b874 saved $a5 == dash .got >>> 0xeffff070: 0xd001e498 saved $a3 == &dash_errno
    0xeffff06c: 0xd001e718 frame 0 $sp saved $a2 == &gotsigchld
    0xeffff068: 0x00000000
    0xeffff064: 0x00000000
    0xeffff060: 0xeffff11e
    0xeffff05c: 0xffffffff
    0xeffff058: 0xc00e4164 return address __wait3+244
    0xeffff054: 0x00add14b canary
    0xeffff050: 0x00000001
    0xeffff04c: 0x00000004
    0xeffff048: 0x0000000d
    0xeffff044: 0x0000000d
    0xeffff040: 0x0015ef82
    0xeffff03c: 0x0015ef82
    0xeffff038: 0x0015ef82
    0xeffff034: 0x00000003
    0xeffff030: 0x00000004
    0xeffff02c: 0x00000004
    0xeffff028: 0x00000140
    0xeffff024: 0x00000140
    0xeffff020: 0x00000034
    0xeffff01c: 0x00000034
    0xeffff018: 0x00000034
    0xeffff014: 0x00000006
    0xeffff010: 0x003b003a
    0xeffff00c: 0x000a0028
    0xeffff008: 0x00340020
    0xeffff004: 0xc019c000 saved $a5 == libc .got >>> 0xeffff000: 0xeffff068 saved $a3 (corrupted)
    0xefffeffc: 0x00000000 saved $a2
    0xefffeff8: 0x00000001 saved $d5
    0xefffeff4: 0xeffff122 saved $d4
    0xefffeff0: 0xeffff11e saved $d3
    0xefffefec: 0x00000000 saved $d2
    0xefffefe8: 0xc00e419a return address __GI___wait4_time64+38
    0xefffefe4: 0xc0028780
    0xefffefe0: 0x3c344bfb
    0xefffefdc: 0x000af353
    0xefffefd8: 0x3c340170
    0xefffefd4: 0x00000000
    0xefffefd0: 0xc00e417c
    0xefffefcc: 0xc00e417e
    0xefffefc8: 0xc00e4180
    0xefffefc4: 0x48e73c34
    0xefffefc0: 0x00000000
    0xefffefbc: 0xefffeff8
    0xefffefb8: 0xefffeffc
    0xefffefb4: 0x4bfb0170
    0xefffefb0: 0x0eee0709
    0xefffefac: 0x00000000
    0xefffefa8: 0x00000000
    0xefffefa4: 0x00000000
    0xefffefa0: 0x00000000
    0xefffef9c: 0x00000000
    0xefffef98: 0x00000000
    0xefffef94: 0x00000000
    0xefffef90: 0x00000000
    0xefffef8c: 0x00000000
    0xefffef88: 0x00000000
    0xefffef84: 0x00000000
    0xefffef80: 0x00000000
    0xefffef7c: 0x00000000
    0xefffef78: 0x00000000
    0xefffef74: 0x00000000
    0xefffef70: 0x00000000
    0xefffef6c: 0x00000000
    0xefffef68: 0x00000000
    0xefffef64: 0x00000000
    0xefffef60: 0x00000000
    0xefffef5c: 0x00000000
    0xefffef58: 0x00000000
    0xefffef54: 0x00000000
    0xefffef50: 0x00000000
    0xefffef4c: 0x00000000
    0xefffef48: 0x00000000
    0xefffef44: 0x00000000
    0xefffef40: 0x00000000
    0xefffef3c: 0x00000000
    0xefffef38: 0x00000000
    0xefffef34: 0x00000000
    0xefffef30: 0x00000000
    0xefffef2c: 0x00000000
    0xefffef28: 0x00000000
    0xefffef24: 0x00000000
    0xefffef20: 0x00000000
    0xefffef1c: 0x00000000
    0xefffef18: 0x00000000
    0xefffef14: 0x00000000
    0xefffef10: 0x7c0effff
    0xefffef0c: 0xffffffff
    0xefffef08: 0xaaaaaaaa
    0xefffef04: 0xaf54eaaa
    0xefffef00: 0x40040000
    0xefffeefc: 0x40040000
    0xefffeef8: 0x2b000000
    0xefffeef4: 0x00000000
    0xefffeef0: 0x00000000
    0xefffeeec: 0x408ece9a
    0xefffeee8: 0x00000000
    0xefffeee4: 0xf0ff0000
    0xefffeee0: 0x0f800000
    0xefffeedc: 0xf0fff0ff
    0xefffeed8: 0x1f380000
    0xefffeed4: 0x00000000
    0xefffeed0: 0x00000000
    0xefffeecc: 0x00000000
    0xefffeec8: 0xffffffff
    0xefffeec4: 0xffffffff
    0xefffeec0: 0x7fff0000
    0xefffeebc: 0xffffffff
    0xefffeeb8: 0xffffffff
    0xefffeeb4: 0x7fff0000 sc_formatvec

    The signal frame is not readily apparent (to me).

    From looking at the above stack dump, sc ought to start at 0xefffee90,
    and the trampoline would be three words below that.

    0xefffeeb0: 0x4178b008 sc_pc, sc_formatvec
    0xefffeeac: 0x0008c00e sc_sr, sc_pc
    0xefffeea8: 0xd00223bb sc_a1
    0xefffeea4: 0xd001e32c sc_a0
    0xefffeea0: 0x00000003 sc_d1
    0xefffee9c: 0xeffff11e sc_d0
    0xefffee98: 0xeffff004 sc_usp
    0xefffee94: 0x00000000 sc_mask
    0xefffee90: 0x00000000 extramask
    0xefffee8c: 0xc0024a90 retcode[1]
    0xefffee88: 0x70774e40 retcode[0]
    0xefffee84: 0xefffee94 psc
    0xefffee80: 0x00000008 code
    0xefffee7c: 0x00000011 sig
    0xefffee78: 0xefffee88 pretcode

    OK, that's our SIGCHLD. But the signal frame format is odd ...

    Frame format b, vector offset 008. That's a bus error?
    How does that get on the user mode stack?

    0xefffee74: 0xc019c000
    0xefffee70: 0x00000000
    0xefffee6c: 0xc0025878
    0xefffee68: 0xc0007ed4
    0xefffee64: 0xc0024000
    0xefffee60: 0xefffef50
    0xefffee5c: 0xc0024000
    0xefffee58: 0xc002a034
    0xefffee54: 0xc0024a90
    0xefffee50: 0xc0025878
    0xefffee4c: 0x00000001
    0xefffee48: 0x0017f020
    0xefffee44: 0x0000002c
    0xefffee40: 0x0000000f
    0xefffee3c: 0x00000000
    0xefffee38: 0xfffff7fa
    0xefffee34: 0xffffffff
    0xefffee30: 0x00009782
    0xefffee2c: 0x00000000
    0xefffee28: 0x0000001e
    0xefffee24: 0xc0025858
    0xefffee20: 0xc0025af8
    0xefffee1c: 0xc000b376
    0xefffee18: 0xc0024000
    0xefffee14: 0xc0025878
    0xefffee10: 0x0000001d
    0xefffee0c: 0xd0001b60
    0xefffee08: 0x0000002f
    0xefffee04: 0xc002563e
    0xefffee00: 0xc0025490

    The last address you show corresponds to 0xeffff640 in first dump
    above, which is at the start of the saved fpregs. I'd say we just
    miss the beginning of the signal frame?


    It looks like you're right. I'm not sure how I missed that.

    So when the signal was delivered, PC == 0xc00e4178 and USP ==
    0xc00e4178.

    USP is 0xeffff004 AFAICS. That's the location 15 was saved to above
    (holding libc .got according to your interpretation).


    Right, it was a typo. USP is 0xeffff004, where a5 is to be saved.

    The saved PC is that from the exception frame, in this case a long bus
    error sequence fault frame. The PC is that of the instruction executing
    when the fault occurred. As you say, that's the moveml saving registers
    to the stack.

    I don't believe the whole fault frame is on the signal stack in one contiguous piece, just the first four words, then we have struct
    sigcontext. But after that, the extra contents follows, and that nicely explains the extra bits right below the return address from the __m68k_read_tp call.

    Those addresses can be found in the disassembly and the stack contents
    I sent previously (quoted above) and it all seems to line up.

    (My reasoning is that copy_siginfo_to_user clears the end of the
    signal stack, which is what we can see in both cases.)

    Can't explain the 14 words below the saved return address though.


    Right. Is it sc_fpstate? Perhaps we should expect QEMU to differ here.

    See above - I think what's stored there is the extra frame content for a format b bus error frame. But that extra frame is incomplete at best
    (should be 22 longwords, only a4 are seen). Probably overwritten by the
    stack frame from __GI___wait4_time64.


    Maybe the exception frame leaked onto the user stack via setup_frame()?

    Let's parse what's left:
    <=
    0xefffefe4: 0xc0028780 <= internal registers (6x)
    0xefffefe0: 0x3c344bfb <=
    0xefffefdc: 0x000af353 <=
    0xefffefd8: 0x3c340170 <= internal reg; version no.
    0xefffefd4: 0x00000000 <= data input buffer
    0xefffefd0: 0xc00e417c <= internal registers (2x)
    0xefffefcc: 0xc00e417e <= stage b address
    0xefffefc8: 0xc00e4180 <= internal registers (4x)
    0xefffefc4: 0x48e73c34 <=
    0xefffefc0: 0x00000000 <= data output buffer
    0xefffefbc: 0xefffeff8 <= internal registers (2x)
    0xefffefb8: 0xefffeffc <= data fault address
    0xefffefb4: 0x4bfb0170 <= ins stage c, stage b
    0xefffefb0: 0x0eee0709 <= internal register; ssw

    The fault address is the location on the stack where a2 is saved. That
    does match the data output buffer contents BTW. fc, fb, rc, rb bits
    clear means the fault didn't occur in stage b or c instructions. ssw bit
    8 set indicates a data fault - the data cycle should be rerun on rte. rm
    and rw bits clear tell us it's a write fault. If the moveml instruction copies registers to the stack in descending order, the fault address
    makes sense - the stack pointer just crossed a page boundary.


    Well spotted!


    Bottom line is, the corrupted %a3 register would have been saved by
    the MOVEM instruction at 0xc00e4178, which turns out to be the PC in
    the signal frame. So it certainly looks like the kernel was the
    culprit here.

    I think the moveml instruction did cause a bus error, and on return from
    that exception the signal got delivered.


    Maybe the signal frame was partially overwritten by the resumed MOVEM?

    I wonder what we'd see if we patched the kernel to log every user data
    write fault caused by a MOVEM instruction. I'll try to code that up.

    On entering the buserror handler, only a1 and a2 are saved, but the
    comment in entry.h states that a3-a6 and d6, d7 are preserved by C code. After buserr_c returns, a3 should be restored to what it was when taking
    the bus error. All registers restored before rte, the moveml instruction ought to be able to resume normally.

    Unless that register use constraint has changed, I don't see how a3
    could have changed midway during return from the bus error exception.
    But maybe a disassembly of buserr_c from your kernel could confirm that?


    I disassembled the relevant build. AFAICT, buserr_c() saves and restores
    those registers in the right places.

    BTW, I've reproduced the failures with kernels built with both GCC 12 and
    GCC 6.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Wed Apr 19 10:20:01 2023
    Hi Finn,

    Am 19.04.2023 um 13:50 schrieb Finn Thain:
    I would have expected to see a different signal trampoline (for
    sys_rt_sigreturn) ...

    Well, this seems to be the trampoline from setup_frame() and not
    setup_rt_frame().

    According to the manpages I've seen, glibc ought to pick rt signals if
    the kernel supports those (which I suppose it does).


    It's got to be the trampoline from setup_frame() because dash did this:

    act.sa_flags = 0;
    sigfillset(&act.sa_mask);
    sigaction(signo, &act, 0);

    Ah - dash explicitly requests the old format. Make sense then.

    and the kernel did this:

    /* set up the stack frame */
    if (ksig->ka.sa.sa_flags & SA_SIGINFO)
    err = setup_rt_frame(ksig, oldset, regs);
    else
    err = setup_frame(ksig, oldset, regs);


    But anyway:

    The saved pc is 0xc00e81b6 which does match the backtrace above.
    Vector offset 80 matches trap 0 which suggests 0xc00e81b6 should be
    the instruction after a trap 0 instruction. d0 is 1055 which is not a
    signal number I recognize.


    I don't know what d0 represents here. But &frame->sig == 0x11 is
    correct (SIGCHLD).

    Correct - that all works out. But d0 holds the syscall number when we
    enter the kernel via trap 0, and that one is odd.


    Well, you showed subsequently that the kernel was probably entered via a
    page fault and not the get_thread_area trap. Would that explain the d0
    value?

    That d0 was from the dash under gdb run. But I got my signal delivery
    mixed up - d0 is only expected to hold the syscall number when we issue
    a syscall. That would be in the child process, not the parent which we
    debug.

    d0 is just whatever the parent had in its register when it started do_signal_return after exception or syscall. On return after syscall, d0
    holds the task info flags, maybe that's what we see here.

    See above - I think what's stored there is the extra frame content for a
    format b bus error frame. But that extra frame is incomplete at best
    (should be 22 longwords, only a4 are seen). Probably overwritten by the
    stack frame from __GI___wait4_time64.


    Maybe the exception frame leaked onto the user stack via setup_frame()?

    Yes, for exception frames larger than four words the excess is copied
    after the end of the sigcontext block.


    Let's parse what's left:
    <=
    0xefffefe4: 0xc0028780 <= internal registers (6x)
    0xefffefe0: 0x3c344bfb <=
    0xefffefdc: 0x000af353 <=
    0xefffefd8: 0x3c340170 <= internal reg; version no.
    0xefffefd4: 0x00000000 <= data input buffer
    0xefffefd0: 0xc00e417c <= internal registers (2x)
    0xefffefcc: 0xc00e417e <= stage b address
    0xefffefc8: 0xc00e4180 <= internal registers (4x)
    0xefffefc4: 0x48e73c34 <=
    0xefffefc0: 0x00000000 <= data output buffer
    0xefffefbc: 0xefffeff8 <= internal registers (2x)
    0xefffefb8: 0xefffeffc <= data fault address
    0xefffefb4: 0x4bfb0170 <= ins stage c, stage b
    0xefffefb0: 0x0eee0709 <= internal register; ssw

    The fault address is the location on the stack where a2 is saved. That
    does match the data output buffer contents BTW. fc, fb, rc, rb bits
    clear means the fault didn't occur in stage b or c instructions. ssw bit
    8 set indicates a data fault - the data cycle should be rerun on rte. rm
    and rw bits clear tell us it's a write fault. If the moveml instruction
    copies registers to the stack in descending order, the fault address
    makes sense - the stack pointer just crossed a page boundary.


    Well spotted!


    Bottom line is, the corrupted %a3 register would have been saved by
    the MOVEM instruction at 0xc00e4178, which turns out to be the PC in
    the signal frame. So it certainly looks like the kernel was the
    culprit here.

    I think the moveml instruction did cause a bus error, and on return from
    that exception the signal got delivered.


    Maybe the signal frame was partially overwritten by the resumed MOVEM?

    That's possible - the saved usp in the signal frame is that of the first register saved to the stack (before the page fault).

    I wonder what we'd see if we patched the kernel to log every user data
    write fault caused by a MOVEM instruction. I'll try to code that up.

    If these instructions did always cause stack corruption on 030, I think
    we would have noticed long ago?


    On entering the buserror handler, only a1 and a2 are saved, but the
    comment in entry.h states that a3-a6 and d6, d7 are preserved by C code.
    After buserr_c returns, a3 should be restored to what it was when taking
    the bus error. All registers restored before rte, the moveml instruction
    ought to be able to resume normally.

    Unless that register use constraint has changed, I don't see how a3
    could have changed midway during return from the bus error exception.
    But maybe a disassembly of buserr_c from your kernel could confirm that?


    I disassembled the relevant build. AFAICT, buserr_c() saves and restores those registers in the right places.

    BTW, I've reproduced the failures with kernels built with both GCC 12 and
    GCC 6.

    Thanks - that was highly unlikely but had to be checked.

    Leaves the possibility that some kernel bug did corrupt the saved a3
    copy in struct switch_stack... but that is not used in bus error
    exceptions. And the only other use of a3 is in ret_from_kernel_thread
    which is called only from copy_thread() ...

    Still baffled...

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Wed Apr 19 13:10:01 2023
    On Tue, 18 Apr 2023, Michael Schmitz wrote:


    ... I think what's stored there is the extra frame content for a format
    b bus error frame. But that extra frame is incomplete at best (should be
    22 longwords, only a4 are seen). Probably overwritten by the stack frame
    from __GI___wait4_time64.

    Let's parse what's left:
    <=
    0xefffefe4: 0xc0028780 <= internal registers (6x)
    0xefffefe0: 0x3c344bfb <=
    0xefffefdc: 0x000af353 <=
    0xefffefd8: 0x3c340170 <= internal reg; version no. >>> 0xefffefd4: 0x00000000 <= data input buffer
    0xefffefd0: 0xc00e417c <= internal registers (2x)
    0xefffefcc: 0xc00e417e <= stage b address
    0xefffefc8: 0xc00e4180 <= internal registers (4x)
    0xefffefc4: 0x48e73c34 <=
    0xefffefc0: 0x00000000 <= data output buffer
    0xefffefbc: 0xefffeff8 <= internal registers (2x)
    0xefffefb8: 0xefffeffc <= data fault address
    0xefffefb4: 0x4bfb0170 <= ins stage c, stage b
    0xefffefb0: 0x0eee0709 <= internal register; ssw

    The fault address is the location on the stack where a2 is saved. That
    does match the data output buffer contents BTW. fc, fb, rc, rb bits
    clear means the fault didn't occur in stage b or c instructions. ssw bit
    8 set indicates a data fault - the data cycle should be rerun on rte. rm
    and rw bits clear tell us it's a write fault. If the moveml instruction copies registers to the stack in descending order, the fault address
    makes sense - the stack pointer just crossed a page boundary.


    Inspired by your observation about the page fault and stack growth, I
    wrote a small test program (given below) that just pushes registers onto
    the stack recursively while forking processes and collecting the SIGCHLD signals.

    On a Motorola '030 the stack grows to about 7 MiB before it gets
    corrupted. The program detects the stack corruption and terminates
    immediately with an illegal instruction. Oddly, the program never detects
    any stack corruption when run on the QEMU '040.

    root@debian:~# ./movem
    Illegal instruction
    root@debian:~# ulimit -a
    real-time non-blocking time (microseconds, -R) unlimited
    core file size (blocks, -c) 0
    data seg size (kbytes, -d) unlimited
    scheduling priority (-e) 0
    file size (blocks, -f) unlimited
    pending signals (-i) 242
    max locked memory (kbytes, -l) 8192
    max memory size (kbytes, -m) unlimited
    open files (-n) 1024
    pipe size (512 bytes, -p) 8
    POSIX message queues (bytes, -q) 819200
    real-time priority (-r) 0
    stack size (kbytes, -s) 8192
    cpu time (seconds, -t) unlimited
    max user processes (-u) 242
    virtual memory (kbytes, -v) unlimited
    file locks (-x) unlimited
    root@debian:~# ulimit -s 7200
    root@debian:~# ./movem
    Illegal instruction
    root@debian:~# ulimit -s 7000
    root@debian:~# ./movem
    Segmentation fault
    root@debian:~# ulimit -s 16384
    root@debian:~# ./movem
    Illegal instruction
    root@debian:~#

    Looking at the core dump in gdb, the backtrace has 189869 frames. The dead stack frames confirm the recursion depth reached the limit I set at 200000 before the stack began to reduce again. This was also confirmed by the
    lowest page fault address that was logged by the custom kernel.

    That means validation succeeded 200000 - 189869 == 10131 times before it encountered corruption (I should try to figure out whether this varies).

    The registers %a2, %a3 and %a4 below should contain 0x91929394, 0xa1a2a3a4
    and 0xb1b2b3b4 respectively. But they don't. Their values were restored
    from a corrupted stack by the returning rec() function call.

    (gdb) info reg
    d0 0x91929394 -1852664940
    d1 0xf3 243
    d2 0xd1d2d3d4 -774712364
    d3 0xe1e2e3e4 -505224220
    d4 0xf1f2f3f4 -235736076
    d5 0x80003f0c -2147467508
    d6 0xd014c528 -803945176
    d7 0x0 0
    a0 0xc0021708 0xc0021708
    a1 0xc0023e8c 0xc0023e8c <__stack_chk_guard>
    a2 0xf3 0xf3
    a3 0x1464000 0x1464000
    a4 0xef97bf44 0xef97bf44
    a5 0xc1c2c3c4 0xc1c2c3c4
    fp 0xef97b034 0xef97b034
    sp 0xef97b018 0xef97b018
    ps 0x8 [ N ]
    pc 0x800005f6 0x800005f6 <rec+262>
    fpcontrol 0x0 0
    fpstatus 0x0 0
    fpiaddr 0x0 0x0

    (gdb) x/z $sp - 36
    0xef97aff4: 0xd1d2d3d4
    (gdb)
    0xef97aff8: 0xe1e2e3e4
    (gdb)
    0xef97affc: 0xf1f2f3f4
    (gdb)
    0xef97b000: 0x000000f3
    (gdb)
    0xef97b004: 0x01464000
    (gdb)
    0xef97b008: 0xef97bf44
    (gdb)
    0xef97b00c: 0xc1c2c3c4
    (gdb)
    0xef97b010: 0xef97b034
    (gdb)
    0xef97b014: 0x8000055c

    As with dash, the corruption lies the page boundary.

    Any signal frames or exception frames have been completely overwritten
    because the recursion continued after the corruption took place. So
    there's not much to see in the core dump.

    (gdb) disass rec
    Dump of assembler code for function rec:
    0x800004f0 <+0>: linkw %fp,#0
    0x800004f4 <+4>: moveml %d2-%d4/%a2-%a5,%sp@-
    0x800004f8 <+8>: moveal 0x80000672 <i0>,%a2
    0x800004fe <+14>: moveal 0x80000676 <i1>,%a3
    0x80000504 <+20>: moveal 0x8000067a <i2>,%a4
    0x8000050a <+26>: moveal 0x8000067e <i3>,%a5
    0x80000510 <+32>: movel 0x80000682 <i4>,%d2
    0x80000516 <+38>: movel 0x80000686 <i5>,%d3
    0x8000051c <+44>: movel 0x8000068a <i6>,%d4
    0x80000522 <+50>: movel 0x80004034 <depth>,%d0
    0x80000528 <+56>: andil #2047,%d0
    0x8000052e <+62>: bnes 0x80000542 <rec+82>
    0x80000530 <+64>: jsr 0x8000042c <fork@plt>
    0x80000536 <+70>: tstl %d0
    0x80000538 <+72>: bnes 0x80000542 <rec+82>
    0x8000053a <+74>: clrl %sp@-
    0x8000053c <+76>: jsr 0x80000404 <exit@plt>
    0x80000542 <+82>: movel 0x80004034 <depth>,%d0
    0x80000548 <+88>: subql #1,%d0
    0x8000054a <+90>: movel %d0,0x80004034 <depth>
    0x80000550 <+96>: movel 0x80004034 <depth>,%d0
    0x80000556 <+102>: beqs 0x8000055c <rec+108>
    0x80000558 <+104>: jsr %pc@(0x800004f0 <rec>)
    0x8000055c <+108>: movel %a2,0x8000403c <o0>
    0x80000562 <+114>: movel %a3,0x80004040 <o1>
    0x80000568 <+120>: movel %a4,0x80004044 <o2>
    0x8000056e <+126>: movel %a5,0x80004048 <o3>
    0x80000574 <+132>: movel %d2,0x8000404c <o4>
    0x8000057a <+138>: movel %d3,0x80004050 <o5>
    0x80000580 <+144>: movel %d4,0x80004054 <o6>
    0x80000586 <+150>: movel 0x8000403c <o0>,%d1
    0x8000058c <+156>: movel #-1852664940,%d0
    0x80000592 <+162>: cmpl %d1,%d0
    0x80000594 <+164>: bnes 0x800005f6 <rec+262>
    0x80000596 <+166>: movel 0x80004040 <o1>,%d1
    0x8000059c <+172>: movel #-1583176796,%d0
    0x800005a2 <+178>: cmpl %d1,%d0
    0x800005a4 <+180>: bnes 0x800005f6 <rec+262>
    0x800005a6 <+182>: movel 0x80004044 <o2>,%d1
    0x800005ac <+188>: movel #-1313688652,%d0
    0x800005b2 <+194>: cmpl %d1,%d0
    0x800005b4 <+196>: bnes 0x800005f6 <rec+262>
    0x800005b6 <+198>: movel 0x80004048 <o3>,%d1
    0x800005bc <+204>: movel #-1044200508,%d0
    0x800005c2 <+210>: cmpl %d1,%d0
    0x800005c4 <+212>: bnes 0x800005f6 <rec+262>
    0x800005c6 <+214>: movel 0x8000404c <o4>,%d1
    0x800005cc <+220>: movel #-774712364,%d0
    0x800005d2 <+226>: cmpl %d1,%d0
    0x800005d4 <+228>: bnes 0x800005f6 <rec+262>
    0x800005d6 <+230>: movel 0x80004050 <o5>,%d1
    0x800005dc <+236>: movel #-505224220,%d0
    0x800005e2 <+242>: cmpl %d1,%d0
    0x800005e4 <+244>: bnes 0x800005f6 <rec+262>
    0x800005e6 <+246>: movel 0x80004054 <o6>,%d1
    0x800005ec <+252>: movel #-235736076,%d0
    0x800005f2 <+258>: cmpl %d1,%d0
    0x800005f4 <+260>: beqs 0x800005f8 <rec+264>
    0x800005f6 <+262>: illegal
    0x800005f8 <+264>: nop
    0x800005fa <+266>: moveml %fp@(-28),%d2-%d4/%a2-%a5
    0x80000600 <+272>: unlk %fp
    0x80000602 <+274>: rts
    End of assembler dump.

    ---

    #include <stdlib.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <signal.h>
    #include <string.h>

    int depth = 200000;

    const unsigned long i0 = 0x91929394;
    const unsigned long i1 = 0xa1a2a3a4;
    const unsigned long i2 = 0xb1b2b3b4;
    const unsigned long i3 = 0xc1c2c3c4;
    const unsigned long i4 = 0xd1d2d3d4;
    const unsigned long i5 = 0xe1e2e3e4;
    const unsigned long i6 = 0xf1f2f3f4;

    unsigned long o0;
    unsigned long o1;
    unsigned long o2;
    unsigned long o3;
    unsigned long o4;
    unsigned long o5;
    unsigned long o6;

    static void rec(void)
    {
    // initialize registers
    asm( " move.l %0, %%a2\n"
    " move.l %1, %%a3\n"
    " move.l %2, %%a4\n"
    " move.l %3, %%a5\n"
    " move.l %4, %%d2\n"
    " move.l %5, %%d3\n"
    " move.l %6, %%d4\n"
    :
    : "m" (i0), "m" (i1), "m" (i2),
    "m" (i3), "m" (i4), "m" (i5), "m" (i6)
    : "a2", "a3", "a4", "a5", "d2", "d3", "d4"
    );

    // maybe fork a short-lived process
    if ((depth & 0x7ff) == 0)
    if (fork() == 0)
    exit(0);

    if (--depth)
    rec();
  • From Geert Uytterhoeven@21:1/5 to [email protected] on Wed Apr 19 14:00:01 2023
    Hi Finn,

    On Wed, Apr 19, 2023 at 12:53 PM Finn Thain <[email protected]> wrote:
    Inspired by your observation about the page fault and stack growth, I
    wrote a small test program (given below) that just pushes registers onto
    the stack recursively while forking processes and collecting the SIGCHLD signals.

    On a Motorola '030 the stack grows to about 7 MiB before it gets
    corrupted. The program detects the stack corruption and terminates immediately with an illegal instruction. Oddly, the program never detects
    any stack corruption when run on the QEMU '040.

    Nice! (the reproducer, not the bug ;-)

    Does it also fail on a very old kernel image you still have lying around?
    Just to rule out a recent kernel bug.

    Gr{oetje,eeting}s,

    Geert

    --
    Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

    In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that.
    -- Linus Torvalds

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to Finn Thain on Thu Apr 20 03:00:02 2023
    Hi Finn,

    On 19/04/23 22:50, Finn Thain wrote:
    On Tue, 18 Apr 2023, Michael Schmitz wrote:

    ... I think what's stored there is the extra frame content for a format
    b bus error frame. But that extra frame is incomplete at best (should be
    22 longwords, only a4 are seen). Probably overwritten by the stack frame
    from __GI___wait4_time64.

    Let's parse what's left:
    <=
    0xefffefe4: 0xc0028780 <= internal registers (6x) >>>>> 0xefffefe0: 0x3c344bfb <=
    0xefffefdc: 0x000af353 <=
    0xefffefd8: 0x3c340170 <= internal reg; version no. >>>>> 0xefffefd4: 0x00000000 <= data input buffer
    0xefffefd0: 0xc00e417c <= internal registers (2x) >>>>> 0xefffefcc: 0xc00e417e <= stage b address
    0xefffefc8: 0xc00e4180 <= internal registers (4x) >>>>> 0xefffefc4: 0x48e73c34 <=
    0xefffefc0: 0x00000000 <= data output buffer
    0xefffefbc: 0xefffeff8 <= internal registers (2x) >>>>> 0xefffefb8: 0xefffeffc <= data fault address
    0xefffefb4: 0x4bfb0170 <= ins stage c, stage b
    0xefffefb0: 0x0eee0709 <= internal register; ssw
    The fault address is the location on the stack where a2 is saved. That
    does match the data output buffer contents BTW. fc, fb, rc, rb bits
    clear means the fault didn't occur in stage b or c instructions. ssw bit
    8 set indicates a data fault - the data cycle should be rerun on rte. rm
    and rw bits clear tell us it's a write fault. If the moveml instruction
    copies registers to the stack in descending order, the fault address
    makes sense - the stack pointer just crossed a page boundary.

    Inspired by your observation about the page fault and stack growth, I
    wrote a small test program (given below) that just pushes registers onto
    the stack recursively while forking processes and collecting the SIGCHLD signals.

    On a Motorola '030 the stack grows to about 7 MiB before it gets
    corrupted. The program detects the stack corruption and terminates immediately with an illegal instruction. Oddly, the program never detects
    any stack corruption when run on the QEMU '040.

    That's great - finally irrefutable confirmation that we're onto a kernel
    bug.


    root@debian:~# ./movem
    Illegal instruction
    root@debian:~# ulimit -a
    real-time non-blocking time (microseconds, -R) unlimited
    core file size (blocks, -c) 0
    data seg size (kbytes, -d) unlimited
    scheduling priority (-e) 0
    file size (blocks, -f) unlimited
    pending signals (-i) 242
    max locked memory (kbytes, -l) 8192
    max memory size (kbytes, -m) unlimited
    open files (-n) 1024
    pipe size (512 bytes, -p) 8
    POSIX message queues (bytes, -q) 819200
    real-time priority (-r) 0
    stack size (kbytes, -s) 8192
    cpu time (seconds, -t) unlimited
    max user processes (-u) 242
    virtual memory (kbytes, -v) unlimited
    file locks (-x) unlimited
    root@debian:~# ulimit -s 7200
    root@debian:~# ./movem
    Illegal instruction
    root@debian:~# ulimit -s 7000
    root@debian:~# ./movem
    Segmentation fault
    root@debian:~# ulimit -s 16384
    root@debian:~# ./movem
    Illegal instruction
    root@debian:~#

    Looking at the core dump in gdb, the backtrace has 189869 frames. The dead stack frames confirm the recursion depth reached the limit I set at 200000 before the stack began to reduce again. This was also confirmed by the
    lowest page fault address that was logged by the custom kernel.

    That means validation succeeded 200000 - 189869 == 10131 times before it encountered corruption (I should try to figure out whether this varies).

    The registers %a2, %a3 and %a4 below should contain 0x91929394, 0xa1a2a3a4 and 0xb1b2b3b4 respectively. But they don't. Their values were restored
    from a corrupted stack by the returning rec() function call.

    (gdb) info reg
    d0 0x91929394 -1852664940
    d1 0xf3 243
    d2 0xd1d2d3d4 -774712364
    d3 0xe1e2e3e4 -505224220
    d4 0xf1f2f3f4 -235736076
    d5 0x80003f0c -2147467508
    d6 0xd014c528 -803945176
    d7 0x0 0
    a0 0xc0021708 0xc0021708
    a1 0xc0023e8c 0xc0023e8c <__stack_chk_guard>
    a2 0xf3 0xf3
    a3 0x1464000 0x1464000
    a4 0xef97bf44 0xef97bf44
    a5 0xc1c2c3c4 0xc1c2c3c4
    fp 0xef97b034 0xef97b034
    sp 0xef97b018 0xef97b018
    ps 0x8 [ N ]
    pc 0x800005f6 0x800005f6 <rec+262>
    fpcontrol 0x0 0
    fpstatus 0x0 0
    fpiaddr 0x0 0x0

    (gdb) x/z $sp - 36
    0xef97aff4: 0xd1d2d3d4
    (gdb)
    0xef97aff8: 0xe1e2e3e4
    (gdb)
    0xef97affc: 0xf1f2f3f4
    (gdb)
    0xef97b000: 0x000000f3
    (gdb)
    0xef97b004: 0x01464000
    (gdb)
    0xef97b008: 0xef97bf44
    (gdb)
    0xef97b00c: 0xc1c2c3c4
    (gdb)
    0xef97b010: 0xef97b034
    (gdb)
    0xef97b014: 0x8000055c

    As with dash, the corruption lies the page boundary.

    Hence implies a page fault handled at the page boundary.

    Can you try and fault in as many of these stack pages as possible, ahead
    of filling the stack? (Depending on how much RAM you have ...). Maybe we
    would need to lock those pages into memory? Just to show that with no
    page faults (but still signals) there is no corruption?

    Any signal frames or exception frames have been completely overwritten because the recursion continued after the corruption took place. So
    there's not much to see in the core dump.

    We'd need a way to stop recursion once the first corruption has taken
    place. If the 'safe' recursion depth of 10131 is constant, the dump
    taken at that point should look similar to what you saw in dash
    (assuming it is the page fault and subsequent signal return that causes
    the corruption).

    I'll give your test case a spin on my Falcon. I should also have a range
    of kernels to test and answer Geert's question...

    Cheers,

        Michael




    (gdb) disass rec
    Dump of assembler code for function rec:
    0x800004f0 <+0>: linkw %fp,#0
    0x800004f4 <+4>: moveml %d2-%d4/%a2-%a5,%sp@-
    0x800004f8 <+8>: moveal 0x80000672 <i0>,%a2
    0x800004fe <+14>: moveal 0x80000676 <i1>,%a3
    0x80000504 <+20>: moveal 0x8000067a <i2>,%a4
    0x8000050a <+26>: moveal 0x8000067e <i3>,%a5
    0x80000510 <+32>: movel 0x80000682 <i4>,%d2
    0x80000516 <+38>: movel 0x80000686 <i5>,%d3
    0x8000051c <+44>: movel 0x8000068a <i6>,%d4
    0x80000522 <+50>: movel 0x80004034 <depth>,%d0
    0x80000528 <+56>: andil #2047,%d0
    0x8000052e <+62>: bnes 0x80000542 <rec+82>
    0x80000530 <+64>: jsr 0x8000042c <fork@plt>
    0x80000536 <+70>: tstl %d0
    0x80000538 <+72>: bnes 0x80000542 <rec+82>
    0x8000053a <+74>: clrl %sp@-
    0x8000053c <+76>: jsr 0x80000404 <exit@plt>
    0x80000542 <+82>: movel 0x80004034 <depth>,%d0
    0x80000548 <+88>: subql #1,%d0
    0x8000054a <+90>: movel %d0,0x80004034 <depth>
    0x80000550 <+96>: movel 0x80004034 <depth>,%d0
    0x80000556 <+102>: beqs 0x8000055c <rec+108>
    0x80000558 <+104>: jsr %pc@(0x800004f0 <rec>)
    0x8000055c <+108>: movel %a2,0x8000403c <o0>
    0x80000562 <+114>: movel %a3,0x80004040 <o1>
    0x80000568 <+120>: movel %a4,0x80004044 <o2>
    0x8000056e <+126>: movel %a5,0x80004048 <o3>
    0x80000574 <+132>: movel %d2,0x8000404c <o4>
    0x8000057a <+138>: movel %d3,0x80004050 <o5>
    0x80000580 <+144>: movel %d4,0x80004054 <o6>
    0x80000586 <+150>: movel 0x8000403c <o0>,%d1
    0x8000058c <+156>: movel #-1852664940,%d0
    0x80000592 <+162>: cmpl %d1,%d0
    0x80000594 <+164>: bnes 0x800005f6 <rec+262>
    0x80000596 <+166>: movel 0x80004040 <o1>,%d1
    0x8000059c <+172>: movel #-1583176796,%d0
    0x800005a2 <+178>: cmpl %d1,%d0
    0x800005a4 <+180>: bnes 0x800005f6 <rec+262>
    0x800005a6 <+182>: movel 0x80004044 <o2>,%d1
    0x800005ac <+188>: movel #-1313688652,%d0
    0x800005b2 <+194>: cmpl %d1,%d0
    0x800005b4 <+196>: bnes 0x800005f6 <rec+262>
    0x800005b6 <+198>: movel 0x80004048 <o3>,%d1
    0x800005bc <+204>: movel #-1044200508,%d0
    0x800005c2 <+210>: cmpl %d1,%d0
    0x800005c4 <+212>: bnes 0x800005f6 <rec+262>
    0x800005c6 <+214>: movel 0x8000404c <o4>,%d1
    0x800005cc <+220>: movel #-774712364,%d0
    0x800005d2 <+226>: cmpl %d1,%d0
    0x800005d4 <+228>: bnes 0x800005f6 <rec+262>
    0x800005d6 <+230>: movel 0x80004050 <o5>,%d1
    0x800005dc <+236>: movel #-505224220,%d0
    0x800005e2 <+242>: cmpl %d1,%d0
    0x800005e4 <+244>: bnes 0x800005f6 <rec+262>
    0x800005e6 <+246>: movel 0x80004054 <o6>,%d1
    0x800005ec <+252>: movel #-235736076,%d0
    0x800005f2 <+258>: cmpl %d1,%d0
    0x800005f4 <+260>: beqs 0x800005f8 <rec+264>
    0x800005f6 <+262>: illegal
    0x800005f8 <+264>: nop
    0x800005fa <+266>: moveml %fp@(-28),%d2-%d4/%a2-%a5
    0x80000600 <+272>: unlk %fp
    0x80000602 <+274>: rts
    End of assembler dump.

    ---

    #include <stdlib.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <signal.h>
    #include <string.h>

    int depth = 200000;

    const unsigned long i0 = 0x91929394;
    const unsigned long i1 = 0xa1a2a3a4;
    const unsigned long i2 = 0xb1b2b3b4;
    const unsigned long i3 = 0xc1c2c3c4;
    const unsigned long i4 = 0xd1d2d3d4;
    const unsigned long i5 = 0xe1e2e3e4;
    const unsigned long i6 = 0xf1f2f3f4;

    unsigned long o0;
    unsigned long o1;
    unsigned long o2;
    unsigned long o3;
    unsigned long o4;
    unsigned long o5;
    unsigned long o6;

    static void rec(void)
    {
    // initialize registers
    asm( " move.l %0, %%a2\n"
    " move.l %1, %%a3\n"
    " move.l %2, %%a4\n"
    " move.l %3, %%a5\n"
    " move.l %4, %%d2\n"
    " move.l %5, %%d3\n"
    " move.l %6, %%d4\n"
    :
    : "m" (i0), "m" (i1), "m" (i2),
    "m" (i3), "m" (i4), "m" (i5), "m" (i6)
    : "a2", "a3", "a4", "a5", "d2", "d3", "d4"
    );

    // maybe fork a short-lived process
    if ((depth & 0x7ff) == 0)
    if (fork() == 0)
    exit(0);

    if (--depth)
    rec(); // callee to save & restore registers

    // compare register contents
    asm( " move.l %%a2, %0\n"
    " move.l %%a3, %1\n"
    " move.l %%a4, %2\n"
    " move.l %%a5, %3\n"
    " move.l %%d2, %4\n"
    " move.l %%d3, %5\n"
    " move.l %%d4, %6\n"
    : "=m" (o0), "=m" (o1), "=m" (o2),
    "=m" (o3), "=m" (o4), "=m" (o5), "=m" (o6)
    :
    :
    );
    if (o0 != i0 || o1 != i1 || o2 != i2 ||
    o3 != i3 || o4 != i4 || o5 != i5 || o6 != i6)
    asm("illegal");
    }

    static void handler(int)
    {
    }

    int main(void)
    {
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    act.sa_handler = handler;
    sigaction(SIGCHLD, &act, NULL);

    rec();
    }

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Geert Uytterhoeven on Thu Apr 20 03:20:01 2023
    On Wed, 19 Apr 2023, Geert Uytterhoeven wrote:

    Does it also fail on a very old kernel image you still have lying
    around? Just to rule out a recent kernel bug.


    These are two mainline builds I've tried (among others):

    [ 0.000000] Linux version 4.14.0-mac (fthain@nippy) (gcc version 6.4.0 (btc)) #1 Mon May 23 15:03:18 AEST 2022

    [ 0.000000] Linux version 6.3.0-rc5-mac (fthain@nippy) (m68k-unknown-linux-musl-gcc (Gentoo 12.2.1_p20221224 p7) 12.2.1 20221224,
    GNU ld (Gentoo 2.39 p5) 2.39.0) #1 Thu Apr 20 10:10:37 AEST 2023

    The results are the same.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Thu Apr 20 05:20:01 2023
    On Thu, 20 Apr 2023, Michael Schmitz wrote:

    Can you try and fault in as many of these stack pages as possible, ahead
    of filling the stack? (Depending on how much RAM you have ...). Maybe we would need to lock those pages into memory? Just to show that with no
    page faults (but still signals) there is no corruption?


    OK.

    Any signal frames or exception frames have been completely overwritten because the recursion continued after the corruption took place. So
    there's not much to see in the core dump.

    We'd need a way to stop recursion once the first corruption has taken
    place. If the 'safe' recursion depth of 10131 is constant, the dump
    taken at that point should look similar to what you saw in dash
    (assuming it is the page fault and subsequent signal return that causes
    the corruption).


    It turns out that the recursion depth can be set a lot lower than the
    200000 that I chose in that test program. (I used that value as it kept
    the stack size just below the default 8192 kB limit.)

    At depth = 2500, a failure is around 95% certain. At depth = 2048 I can
    still get an intermittent failure. This only required 21 stack pagefaults
    and one fork.

    I suspect that the location of the corruption is probably somewhat random,
    and the larger the stack happens to be when the signal comes in, the
    better the odds of detection.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Thu Apr 20 06:10:01 2023
    Hi Finn,

    reproduced on my Falcon (with minor mods to the C source - my version of
    gcc didn't like asm with no clobbers, so I added "memory" as clobber in
    the second asm block). In this case it's a4 that is corrupted, but that
    varies.

    depth of 4096 gets me two core dumps on 20 attempts so this isn't quite
    as fast on my Falcon. With 8192, it's nine.

    Example:

    Core was generated by `./moveml'.
    Program terminated with signal 4, Illegal instruction.
    Reading symbols from /lib/libc.so.6...done.
    Loaded symbols for /lib/libc.so.6
    Reading symbols from /lib/ld.so.1...done.
    Loaded symbols for /lib/ld.so.1
    #0 0x8000060e in rec ()
    (gdb) info reg
    d0 0x8000057c -2147482244
    d1 0xc0017000 -1073647616
    d2 0xd1d2d3d4 -774712364
    d3 0xe1e2e3e4 -505224220
    d4 0xf1f2f3f4 -235736076
    d5 0x80096168 -2146868888
    d6 0x80093108 -2146881272
    d7 0x0 0
    a0 0x0 0x0
    a1 0xefdadbdc 0xefdadbdc
    a2 0x91929394 0x91929394
    a3 0xa1a2a3a4 0xa1a2a3a4
    a4 0x8000057c 0x8000057c
    a5 0xc1c2c3c4 0xc1c2c3c4
    fp 0xef87402c 0xef87402c
    sp 0xef874010 0xef874010
    ps 0x209 521
    pc 0x8000060e 0x8000060e <rec+242>
    fpcontrol 0x0 0
    fpstatus 0x0 0
    fpiaddr 0x0 0
    (gdb)


    Am 20.04.2023 um 14:57 schrieb Finn Thain:
    On Thu, 20 Apr 2023, Michael Schmitz wrote:

    Can you try and fault in as many of these stack pages as possible, ahead
    of filling the stack? (Depending on how much RAM you have ...). Maybe we
    would need to lock those pages into memory? Just to show that with no
    page faults (but still signals) there is no corruption?


    OK.

    Any signal frames or exception frames have been completely overwritten
    because the recursion continued after the corruption took place. So
    there's not much to see in the core dump.

    We'd need a way to stop recursion once the first corruption has taken
    place. If the 'safe' recursion depth of 10131 is constant, the dump
    taken at that point should look similar to what you saw in dash
    (assuming it is the page fault and subsequent signal return that causes
    the corruption).


    It turns out that the recursion depth can be set a lot lower than the
    200000 that I chose in that test program. (I used that value as it kept
    the stack size just below the default 8192 kB limit.)

    And it does keep the core a lot smaller. Still not hard to work with on
    my 14 MB RAM Falcon...


    At depth = 2500, a failure is around 95% certain. At depth = 2048 I can
    still get an intermittent failure. This only required 21 stack pagefaults
    and one fork.

    I suspect that the location of the corruption is probably somewhat random, and the larger the stack happens to be when the signal comes in, the
    better the odds of detection.

    Yep, but there must me some more to that. Timing of page faults due to
    swap bandwidth, perhaps?

    Cheers,

    Michael



    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to All on Thu Apr 20 08:10:01 2023
    On Thu, 20 Apr 2023, I wrote:


    So it must be that a MOVEM went awry when a signal got delivered.

    Or signal delivery went awry after a MOVEM got resumed?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Thu Apr 20 07:40:01 2023
    On Thu, 20 Apr 2023, Michael Schmitz wrote:


    As with dash, the corruption lies the page boundary.

    Hence implies a page fault handled at the page boundary.

    Can you try and fault in as many of these stack pages as possible, ahead
    of filling the stack? (Depending on how much RAM you have ...). Maybe we would need to lock those pages into memory? Just to show that with no
    page faults (but still signals) there is no corruption?


    I modified the test program to execute rec() to full depth with no
    forking, then do it again with forking.

    root@(none):/root# while ./stack-test 5000 ; do : ; done
    starting recursion
    done.
    starting recursion with fork
    done.
    starting recursion
    done.
    starting recursion with fork
    Illegal instruction
    root@(none):/root#

    I can't get this to crash during the first descent. The second descent
    always crashes, given sufficient depth:

    root@(none):/root# while ./stack-test 50000 ; do : ; done
    starting recursion
    done.
    starting recursion with fork
    Illegal instruction

    So all the stack pages would have been faulted in well before the failure
    shows up. It appears to be the signal that's the problem and not the page fault. That's not surprising considering the PC in the signal frame in the
    dash crash was a MOVEM saving registers onto the stack.

    It's worth noting that the test program never crashes with a corrupted
    return address. Random corruption would have clobbered that address about
    10% of the time, since the entire rec() stack frame is 9 long words. So it
    must be that a MOVEM went awry when a signal got delivered.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to All on Thu Apr 20 08:20:01 2023
    On Wed, 19 Apr 2023, I wrote:

    Oddly, the program never detects any stack corruption when run on the
    QEMU '040.


    I tested a Motorola '040 and got the same result.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Thu Apr 20 09:40:01 2023
    Hi Finn,

    Am 20.04.2023 um 17:17 schrieb Finn Thain:
    On Thu, 20 Apr 2023, Michael Schmitz wrote:


    As with dash, the corruption lies the page boundary.

    Hence implies a page fault handled at the page boundary.

    Can you try and fault in as many of these stack pages as possible, ahead
    of filling the stack? (Depending on how much RAM you have ...). Maybe we
    would need to lock those pages into memory? Just to show that with no
    page faults (but still signals) there is no corruption?


    I modified the test program to execute rec() to full depth with no
    forking, then do it again with forking.

    root@(none):/root# while ./stack-test 5000 ; do : ; done
    starting recursion
    done.
    starting recursion with fork
    done.
    starting recursion
    done.
    starting recursion with fork
    Illegal instruction
    root@(none):/root#

    I can't get this to crash during the first descent. The second descent
    always crashes, given sufficient depth:

    root@(none):/root# while ./stack-test 50000 ; do : ; done
    starting recursion
    done.
    starting recursion with fork
    Illegal instruction

    So all the stack pages would have been faulted in well before the failure shows up. It appears to be the signal that's the problem and not the page fault. That's not surprising considering the PC in the signal frame in the dash crash was a MOVEM saving registers onto the stack.

    Well. without locking the faulted in pages in memory we can't be sure
    they were not swapped back out. Unless I misunderstand what's involved
    in that ...

    In my tests, increasing the depth does not cause a monotonous increase
    in fault probability. 16k depth only has four crashes, 8k had nine. I'll
    stick with 200000 for now.

    Best try and narrow down how long this bug has been present - my kernel
    builds on the current system go back a little over ten years.

    Cheers,

    Michael



    It's worth noting that the test program never crashes with a corrupted
    return address. Random corruption would have clobbered that address about
    10% of the time, since the entire rec() stack frame is 9 long words. So it must be that a MOVEM went awry when a signal got delivered.


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Thu Apr 20 09:50:01 2023
    Hi Finn,

    Am 20.04.2023 um 18:04 schrieb Finn Thain:
    On Wed, 19 Apr 2023, I wrote:

    Oddly, the program never detects any stack corruption when run on the
    QEMU '040.


    I tested a Motorola '040 and got the same result.

    OK, that would mean the bus error was just the most reliable way to get do_signal_return run after child process termination, and the signal
    delivery itself may be responsible for stack corruption.

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Thu Apr 20 10:10:01 2023
    On Thu, 20 Apr 2023, Michael Schmitz wrote:

    Am 20.04.2023 um 17:17 schrieb Finn Thain:
    On Thu, 20 Apr 2023, Michael Schmitz wrote:


    As with dash, the corruption lies the page boundary.

    Hence implies a page fault handled at the page boundary.

    Can you try and fault in as many of these stack pages as possible, ahead >> of filling the stack? (Depending on how much RAM you have ...). Maybe we >> would need to lock those pages into memory? Just to show that with no
    page faults (but still signals) there is no corruption?


    I modified the test program to execute rec() to full depth with no
    forking, then do it again with forking.

    root@(none):/root# while ./stack-test 5000 ; do : ; done
    starting recursion
    done.
    starting recursion with fork
    done.
    starting recursion
    done.
    starting recursion with fork
    Illegal instruction
    root@(none):/root#

    I can't get this to crash during the first descent. The second descent always crashes, given sufficient depth:

    root@(none):/root# while ./stack-test 50000 ; do : ; done
    starting recursion
    done.
    starting recursion with fork
    Illegal instruction

    So all the stack pages would have been faulted in well before the
    failure shows up. It appears to be the signal that's the problem and
    not the page fault. That's not surprising considering the PC in the
    signal frame in the dash crash was a MOVEM saving registers onto the
    stack.

    Well. without locking the faulted in pages in memory we can't be sure
    they were not swapped back out. Unless I misunderstand what's involved
    in that ...


    There was no swap enabled.

    50000 frames * 36 bytes per frame == 1.8 MB

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Thu Apr 20 10:20:01 2023
    On Thu, 20 Apr 2023, Michael Schmitz wrote:

    Am 20.04.2023 um 18:04 schrieb Finn Thain:
    On Wed, 19 Apr 2023, I wrote:

    Oddly, the program never detects any stack corruption when run on the
    QEMU '040.


    I tested a Motorola '040 and got the same result.

    OK, that would mean the bus error was just the most reliable way to get do_signal_return run after child process termination, and the signal
    delivery itself may be responsible for stack corruption.


    I agree (I assume you're replying to my other message about the absence of swap).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Thu Apr 20 11:10:01 2023
    On Thu, 20 Apr 2023, Michael Schmitz wrote:

    Am 20.04.2023 um 19:47 schrieb Finn Thain:
    So all the stack pages would have been faulted in well before the
    failure shows up. It appears to be the signal that's the problem and
    not the page fault. That's not surprising considering the PC in the
    signal frame in the dash crash was a MOVEM saving registers onto the
    stack.

    Well. without locking the faulted in pages in memory we can't be sure
    they were not swapped back out. Unless I misunderstand what's
    involved in that ...


    There was no swap enabled.

    50000 frames * 36 bytes per frame == 1.8 MB

    OK - swap is enabled in my case. That may explain the different fault
    rates.

    But in any case, it looks like we can eliminate the bus error code. Same fault on both 030 and 040 with very different bus error handlers is
    highly unlikely.


    There's no failure on '040. QEMU and Motorola '040 gave the same result.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Thu Apr 20 10:30:01 2023
    Hi Finn,

    Am 20.04.2023 um 19:47 schrieb Finn Thain:
    So all the stack pages would have been faulted in well before the
    failure shows up. It appears to be the signal that's the problem and
    not the page fault. That's not surprising considering the PC in the
    signal frame in the dash crash was a MOVEM saving registers onto the
    stack.

    Well. without locking the faulted in pages in memory we can't be sure
    they were not swapped back out. Unless I misunderstand what's involved
    in that ...


    There was no swap enabled.

    50000 frames * 36 bytes per frame == 1.8 MB

    OK - swap is enabled in my case. That may explain the different fault
    rates.

    But in any case, it looks like we can eliminate the bus error code. Same
    fault on both 030 and 040 with very different bus error handlers is
    highly unlikely.

    Cheers.

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to Finn Thain on Fri Apr 21 00:00:01 2023
    Hi Finn,

    On 20/04/23 20:55, Finn Thain wrote:

    But in any case, it looks like we can eliminate the bus error code. Same
    fault on both 030 and 040 with very different bus error handlers is
    highly unlikely.

    There's no failure on '040. QEMU and Motorola '040 gave the same result.

    Sorry, my fault - I interpreted your mail as saying 030 and 040 gave the
    same result.

    Back to the drawing board ... I've got kernel images back to 2.4.30 and
    2.6.37 to try and test. I'm also trying with rt signals and alternate
    signal stack (rt signals show the same behaviour).

    Cheers,

        Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to Finn Thain on Fri Apr 21 04:00:01 2023
    Hi Finn,

    On 21/04/23 13:15, Finn Thain wrote:
    Fri, 21 Apr 2023 11:15:22 +1000 (AEST)n Thu, 20 Apr 2023, Michael Schmitz wrote:

    In my tests, increasing the depth does not cause a monotonous increase
    in fault probability. 16k depth only has four crashes, 8k had nine. I'll
    stick with 200000 for now.

    My tests used 'norandmaps' in the kernel parameters. With the attached .config (which supports Mac and Atari) I saw 12 failures out of 16 tests
    at depth 5000.

    Right - I' make sure to add that in future tests.

    I'll also add your patch to bus_error030 in my builds.

    Cheers,

        Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to Michael Schmitz on Fri Apr 21 03:50:01 2023
    Hi Finn,

    one more datapoint: using sigaltstack() to set up a separate signal
    stack in main() (code lifted straight from the sigaltstack man page)
    does avoid the stack corruption.

    That might be useful as a workaround for dash, while we're analyzing
    this bug.

    I wonder whether recursively growing the stack while taking signals on
    the same stack is meant to be safe. Need to look at usp inside the
    signal frame (i.e. at the time the program was preempted for signal
    delivery) and usp at the lowest recursion depth at the time of the
    signal handler running ...

    Cheers,

        Michael


    On 21/04/23 09:58, Michael Schmitz wrote:
    Hi Finn,

    On 20/04/23 20:55, Finn Thain wrote:

    But in any case, it looks like we can eliminate the bus error code.
    Same
    fault on both 030 and 040 with very different bus error handlers is
    highly unlikely.

    There's no failure on '040. QEMU and Motorola '040 gave the same result.

    Sorry, my fault - I interpreted your mail as saying 030 and 040 gave
    the same result.

    Back to the drawing board ... I've got kernel images back to 2.4.30
    and 2.6.37 to try and test. I'm also trying with rt signals and
    alternate signal stack (rt signals show the same behaviour).

    Cheers,

        Michael





    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Finn Thain on Fri Apr 21 04:00:01 2023
    On Thu, 20 Apr 2023, Finn Thain wrote:


    I modified the test program to execute rec() to full depth with no
    forking, then do it again with forking.

    root@(none):/root# while ./stack-test 5000 ; do : ; done
    starting recursion
    done.
    starting recursion with fork
    done.
    starting recursion
    done.
    starting recursion with fork
    Illegal instruction
    root@(none):/root#

    I can't get this to crash during the first descent. The second descent
    always crashes, given sufficient depth:

    root@(none):/root# while ./stack-test 50000 ; do : ; done
    starting recursion
    done.
    starting recursion with fork
    Illegal instruction

    So all the stack pages would have been faulted in well before the
    failure shows up. It appears to be the signal that's the problem and not
    the page fault.

    Sorry, I was mistaken when I infered the absence of bus errors. I should
    have enabled ignore_loglevel and checked the printk output from my patched bus_error030():

    # ./stack-test 5000
    [ 47.430000] bus_error030: page fault: pc 80000642, addr 80072fe0
    [ 47.440000] bus_error030: descriptor address 005a3dc8, contents 00000000
    [ 47.460000] bus_error030: page fault: pc 8000064a, addr 8006ddc8
    [ 47.470000] bus_error030: descriptor address 005a3db4, contents 011b300d
    [ 47.750000] bus_error030: page fault: pc 8001a952, addr 80073068
    [ 47.760000] bus_error030: descriptor address 005a3dcc, contents 00000000
    [ 47.790000] bus_error030: page fault: pc 8001a99c, addr 8006fb14
    [ 47.800000] bus_error030: descriptor address 005a3dbc, contents 011b800d
    [ 47.880000] bus_error030: page fault: pc 80013e9e, addr 800744e0
    [ 47.890000] bus_error030: descriptor address 005a3dd0, contents 00000000
    [ 48.150000] bus_error030: page fault: pc 80038d14, addr efffee38
    [ 48.160000] bus_error030: descriptor address 005a3bf8, contents 00000000
    [ 48.600000] bus_error030: page fault: pc 80011d08, addr 80075adc
    [ 48.620000] bus_error030: descriptor address 005a3dd4, contents 00000000 starting recursion
    [ 48.650000] bus_error030: page fault: pc 8000038c, addr efffdffc
    [ 48.660000] bus_error030: descriptor address 005a3bf4, contents 00000000
    [ 48.670000] bus_error030: page fault: pc 8000038c, addr efffcffc
    [ 48.690000] bus_error030: descriptor address 005a3bf0, contents 00000000
    [ 48.710000] bus_error030: page fault: pc 8000038c, addr efffbffc
    [ 48.720000] bus_error030: descriptor address 005a3bec, contents 00000000
    [ 48.730000] bus_error030: page fault: pc 8000038c, addr efffaffc
    [ 48.760000] bus_error030: descriptor address 005a3be8, contents 00000000
    [ 48.770000] bus_error030: page fault: pc 8000038c, addr efff9ffc
    [ 48.780000] bus_error030: descriptor address 005a3be4, contents 00000000
    [ 48.790000] bus_error030: page fault: pc 80000388, addr efff8ffc
    [ 48.820000] bus_error030: descriptor address 005a3be0, contents 00000000
    [ 48.830000] bus_error030: page fault: pc 8000038c, addr efff7ffc
    [ 48.840000] bus_error030: descriptor address 005a3bdc, contents 00000000
    [ 48.850000] bus_error030: page fault: pc 8000038c, addr efff6ffc
    [ 48.880000] bus_error030: descriptor address 005a3bd8, contents 00000000
    [ 48.890000] bus_error030: page fault: pc 8000038c, addr efff5ffc
    [ 48.900000] bus_error030: descriptor address 005a3bd4, contents 00000000
    [ 48.920000] bus_error030: page fault: pc 8000038c, addr efff4ffc
    [ 48.930000] bus_error030: descriptor address 005a3bd0, contents 00000000
    [ 48.940000] bus_error030: page fault: pc 8000038c, addr efff3ffc
    [ 48.950000] bus_error030: descriptor address 005a3bcc, contents 00000000
    [ 48.970000] bus_error030: page fault: pc 8000038c, addr efff2ffc
    [ 48.980000] bus_error030: descriptor address 005a3bc8, contents 00000000
    [ 48.990000] bus_error030: page fault: pc 8000038c, addr efff1ffc
    [ 49.000000] bus_error030: descriptor address 005a3bc4, contents 00000000
    [ 49.020000] bus_error030: page fault: pc 8000038c, addr efff0ffc
    [ 49.030000] bus_error030: descriptor address 005a3bc0, contents 00000000
    [ 49.040000] bus_error030: page fault: pc 80000388, addr effefffc
    [ 49.050000] bus_error030: descriptor address 005a3bbc, contents 00000000
    [ 49.070000] bus_error030: page fault: pc 8000038c, addr effeeffc
    [ 49.080000] bus_error030: descriptor address 005a3bb8, contents 00000000
    [ 49.090000] bus_error030: page fault: pc 8000038c, addr effedffc
    [ 49.100000] bus_error030: descriptor address 005a3bb4, contents 00000000
    [ 49.120000] bus_error030: page fault: pc 8000038c, addr effecffc
    [ 49.130000] bus_error030: descriptor address 005a3bb0, contents 00000000
    [ 49.140000] bus_error030: page fault: pc 8000038c, addr effebffc
    [ 49.150000] bus_error030: descriptor address 005a3bac, contents 00000000
    [ 49.170000] bus_error030: page fault: pc 8000038c, addr effeaffc
    [ 49.180000] bus_error030: descriptor address 005a3ba8, contents 00000000
    [ 49.190000] bus_error030: page fault: pc 8000038c, addr effe9ffc
    [ 49.200000] bus_error030: descriptor address 005a3ba4, contents 00000000
    [ 49.220000] bus_error030: page fault: pc 8000038c, addr effe8ffc
    [ 49.230000] bus_error030: descriptor address 005a3ba0, contents 00000000
    [ 49.240000] bus_error030: page fault: pc 8000038c, addr effe7ffc
    [ 49.250000] bus_error030: descriptor address 005a3b9c, contents 00000000
    [ 49.270000] bus_error030: page fault: pc 80000388, addr effe6ffc
    [ 49.280000] bus_error030: descriptor address 005a3b98, contents 00000000
    [ 49.290000] bus_error030: page fault: pc 8000038c, addr effe5ffc
    [ 49.300000] bus_error030: descriptor address 005a3b94, contents 00000000
    [ 49.320000] bus_error030: page fault: pc 8000038c, addr effe4ffc
    [ 49.330000] bus_error030: descriptor address 005a3b90, contents 00000000
    [ 49.340000] bus_error030: page fault: pc 8000038c, addr effe3ffc
    [ 49.360000] bus_error030: descriptor address 005a3b8c, contents 00000000
    [ 49.380000] bus_error030: page fault: pc 8000038c, addr effe2ffc
    [ 49.390000] bus_error030: descriptor address 005a3b88, contents 00000000
    [ 49.400000] bus_error030: page fault: pc 8000038c, addr effe1ffc
    [ 49.420000] bus_error030: descriptor address 005a3b84, contents 00000000
    [ 49.440000] bus_error030: page fault: pc 8000038c, addr effe0ffc
    [ 49.450000] bus_error030: descriptor address 005a3b80, contents 00000000
    [ 49.460000] bus_error030: page fault: pc 8000038c, addr effdfffc
    [ 49.490000] bus_error030: descriptor address 005a3b7c, contents 00000000
    [ 49.500000] bus_error030: page fault: pc 8000038c, addr effdeffc
    [ 49.510000] bus_error030: descriptor address 005a3b78, contents 00000000
    [ 49.530000] bus_error030: page fault: pc 80000388, addr effddffc
    [ 49.540000] bus_error030: descriptor address 005a3b74, contents 00000000
    [ 49.550000] bus_error030: page fault: pc 8000038c, addr effdcffc
    [ 49.560000] bus_error030: descriptor address 005a3b70, contents 00000000
    [ 49.580000] bus_error030: page fault: pc 8000038c, addr effdbffc
    [ 49.590000] bus_error030: descriptor address 005a3b6c, contents 00000000
    [ 49.600000] bus_error030: page fault: pc 8000038c, addr effdaffc
    [ 49.610000] bus_error030: descriptor address 005a3b68, contents 00000000
    [ 49.630000] bus_error030: page fault: pc 8000038c, addr effd9ffc
    [ 49.640000] bus_error030: descriptor address 005a3b64, contents 00000000
    [ 49.650000] bus_error030: page fault: pc 8000038c, addr effd8ffc
    [ 49.680000] bus_error030: descriptor address 005a3b60, contents 00000000
    [ 49.690000] bus_error030: page fault: pc 8000038c, addr effd7ffc
    [ 49.700000] bus_error030: descriptor address 005a3b5c, contents 00000000
    [ 49.720000] bus_error030: page fault: pc 8000038c, addr effd6ffc
    [ 49.730000] bus_error030: descriptor address 005a3b58, contents 00000000
    [ 49.740000] bus_error030: page fault: pc 8000038c, addr effd5ffc
    [ 49.750000] bus_error030: descriptor address 005a3b54, contents 00000000
    [ 49.770000] bus_error030: page fault: pc 80000388, addr effd4ffc
    [ 49.780000] bus_error030: descriptor address 005a3b50, contents 00000000
    [ 49.800000] bus_error030: page fault: pc 8000038c, addr effd3ffc
    [ 49.820000] bus_error030: descriptor address 005a3b4c, contents 00000000 done.
    starting recursion with fork
    [ 49.910000] bus_error030: page fault: pc 8001599c, addr efff7d3c
    [ 49.930000] bus_error030: descriptor address 005a3bdc, contents 0182e01d
    [ 49.940000] bus_error030: page fault: pc 80000b3c, addr efff7d24
    [ 49.970000] bus_error030: descriptor address 014280dc, contents 0182e01d
    [ 49.980000] bus_error030: page fault: pc 800003ea, addr 800703a0
    [ 49.990000] bus_error030: descriptor address 005a3dc0, contents 011aa01d
    [ 50.000000] bus_error030: page fault: pc 80015730, addr 800731d0
    [ 50.030000] bus_error030: descriptor address 005a3fcc, contents 0132201d
    [ 50.040000] bus_error030: page fault: pc 8000038c, addr efff6ffc
    [ 50.050000] bus_error030: descriptor address 005a3bd8, contents 0182f01d
    [ 50.070000] bus_error030: page fault: pc 8000038c, addr efff5ffc
    [ 50.080000] bus_error030: descriptor address 005a3bd4, contents 0183001d
    [ 50.090000] bus_error030: page fault: pc 80015736, addr 8006faf0
    [ 50.100000] bus_error030: descriptor address 005a3fbc, contents 0132301d
    [ 50.120000] bus_error030: page fault: pc 8000038c, addr efff4ffc
    [ 50.130000] bus_error030: descriptor address 005a3bd0, contents 0183101d
    [ 50.150000] bus_error030: page fault: pc 8000038c, addr efff3ffc
    [ 50.160000] bus_error030: descriptor address 005a3bcc, contents 0183201d
    [ 50.190000] bus_error030: page fault: pc 8000038c, addr efff2ffc
    [ 50.200000] bus_error030: descriptor address 005a3bc8, contents 0183301d
    [ 50.210000] bus_error030: page fault: pc 8000038c, addr efff1ffc
    [ 50.240000] bus_error030: descriptor address 005a3bc4, contents 0183401d
    [ 50.250000] bus_error030: page fault: pc 8000038c, addr efff0ffc
    [ 50.260000] bus_error030: descriptor address 005a3bc0, contents 0183501d
    [ 50.270000] bus_error030: page fault: pc 80000388, addr effefffc
    [ 50.290000] bus_error030: descriptor address 005a3bbc, contents 0183601d
    [ 50.310000] bus_error030: page fault: pc 8000038c, addr effeeffc
    [ 50.320000] bus_error030: descriptor address 005a3bb8, contents 0183701d
    [ 50.330000] bus_error030: page fault: pc 8000038c, addr effedffc
    [ 50.360000] bus_error030: descriptor address 005a3bb4, contents 0183801d
    [ 50.370000] bus_error030: page fault: pc 8000038c, addr effecffc
    [ 50.380000] bus_error030: descriptor address 005a3bb0, contents 0183901d
    [ 50.390000] bus_error030: page fault: pc 8000038c, addr effebffc
    [ 50.420000] bus_error030: descriptor address 005a3bac, contents 0183a01d
    [ 50.430000] bus_error030: page fault: pc 8000038c, addr effeaffc
    [ 50.440000] bus_error030: descriptor address 005a3ba8, contents 0183b01d
    [ 50.450000] bus_error030: page fault: pc 8000038c, addr effe9ffc
    [ 50.480000] bus_error030: descriptor address 005a3ba4, contents 0183c01d
    [ 50.490000] bus_error030: page fault: pc 8000038c, addr effe8ffc
    [ 50.500000] bus_error030: descriptor address 005a3ba0, contents 0183d01d
    [ 50.510000] bus_error030: page fault: pc 8000038c, addr effe7ffc
    [ 50.530000] bus_error030: descriptor address 005a3b9c, contents 0183e01d
    [ 50.550000] bus_error030: page fault: pc 80000388, addr effe6ffc
    [ 50.560000] bus_error030: descriptor address 005a3b98, contents 0183f01d
    [ 50.570000] bus_error030: page fault: pc 8000038c, addr effe5ffc
    [ 50.590000] bus_error030: descriptor address 005a3b94, contents 0184001d
    [ 50.620000] bus_error030: page fault: pc 8001599c, addr effe5d3c
    [ 50.630000] bus_error030: descriptor address 005a3b94, contents 0184001d
    [ 50.650000] bus_error030: page fault: pc 80000b3c, addr effe5d24
    [ 50.660000] bus_error030: descriptor address 01429094, contents 0184001d
    [ 50.670000] bus_error030: page fault: pc 800003ea, addr 800703a0
    [ 50.680000] bus_error030: descriptor address 005a3dc0, contents 0185501d
    [ 50.700000] bus_error030: page fault: pc 80015730, addr 800731d0
    [ 50.710000] bus_error030: descriptor address 005a3fcc, contents 0132201d
    [ 50.720000] bus_error030: page fault: pc 8000038c, addr effe4ffc
    [ 50.730000] bus_error030: descriptor address 005a3b90, contents 0184101d
    [ 50.750000] bus_error030: page fault: pc 80015736, addr 8006faf0
    [ 50.760000] bus_error030: descriptor address 005a3fbc, contents 0132301d
    [ 50.780000] bus_error030: page fault: pc 8000038c, addr effe3ffc
    [ 50.790000] bus_error030: descriptor address 005a3b8c, contents 0184201d
    [ 50.820000] bus_error030: page fault: pc 8000038c, addr effe2ffc
    [ 50.830000] bus_error030: descriptor address 005a3b88, contents 0184301d
    [ 50.840000] bus_error030: page fault: pc 8000038c, addr effe1ffc
    [ 50.860000] bus_error030: descriptor address 005a3b84, contents 0184401d
    [ 50.880000] bus_error030: page fault: pc 8000038c, addr effe0ffc
    [ 50.890000] bus_error030: descriptor address 005a3b80, contents 0184501d
    [ 50.900000] bus_error030: page fault: pc 8000038c, addr effdfffc
    [ 50.920000] bus_error030: descriptor address 005a3b7c, contents 0184601d
    [ 50.940000] bus_error030: page fault: pc 8000038c, addr effdeffc
    [ 50.950000] bus_error030: descriptor address 005a3b78, contents 0184701d
    [ 50.960000] bus_error030: page fault: pc 80000388, addr effddffc
    [ 50.990000] bus_error030: descriptor address 005a3b74, contents 0184801d
    [ 51.000000] bus_error030: page fault: pc 8000038c, addr effdcffc
    [ 51.010000] bus_error030: descriptor address 005a3b70, contents 0184901d
    [ 51.030000] bus_error030: page fault: pc 8000038c, addr effdbffc
    [ 51.040000] bus_error030: descriptor address 005a3b6c, contents 0184a01d
    [ 51.050000] bus_error030: page fault: pc 8000038c, addr effdaffc
    [ 51.060000] bus_error030: descriptor address 005a3b68, contents 0184b01d
    [ 51.070000] bus_error030: page fault: pc 8000038c, addr effd9ffc
    [ 51.100000] bus_error030: descriptor address 005a3b64, contents 0184c01d
    [ 51.110000] bus_error030: page fault: pc 8000038c, addr effd8ffc
    [ 51.120000] bus_error030: descriptor address 005a3b60, contents 0184d01d
    [ 51.130000] bus_error030: page fault: pc 8000038c, addr effd7ffc
    [ 51.160000] bus_error030: descriptor address 005a3b5c, contents 0184e01d
    [ 51.170000] bus_error030: page fault: pc 8000038c, addr effd6ffc
    [ 51.180000] bus_error030: descriptor address 005a3b58, contents 0184f01d
    [ 51.190000] bus_error030: page fault: pc 8000038c, addr effd5ffc
    [ 51.220000] bus_error030: descriptor address 005a3b54, contents 0185001d
    [ 51.230000] bus_error030: page fault: pc 80000388, addr effd4ffc
    [ 51.240000] bus_error030: descriptor address 005a3b50, contents 0185101d
    [ 51.260000] bus_error030: page fault: pc 8000038c, addr effd3ffc
    [ 51.270000] bus_error030: descriptor address 005a3b4c, contents 0185201d
    [ 51.290000] bus_error030: page fault: pc 8001599c, addr effd3d3c
    [ 51.300000] bus_error030: descriptor address 005a3b4c, contents 0185201d
    [ 51.320000] bus_error030: page fault: pc 80000b3c, addr effd3d24
    [ 51.330000] bus_error030: descriptor address 0142804c, contents 0185201d
    [ 51.340000] bus_error030: page fault: pc 800003ea, addr 800703a0
    [ 51.360000] bus_error030: descriptor address 005a3dc0, contents 0185601d
    [ 51.380000] bus_error030: page fault: pc 80015730, addr 800731d0
    [ 51.390000] bus_error030: descriptor address 005a3fcc, contents 0132201d
    [ 51.400000] bus_error030: page fault: pc 80015736, addr 8006faf0
    [ 51.430000] bus_error030: descriptor address 005a3fbc, contents 0132301d Illegal instruction
    #


    diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c
    index a700807c9b6d..4820d5e7dd8d 100644
    --- a/arch/m68k/kernel/traps.c
    +++ b/arch/m68k/kernel/traps.c
    @@ -500,9 +500,7 @@ static inline void bus_error030 (struct frame *fp)
    unsigned short mmusr;
    unsigned long addr, errorcode;
    unsigned short ssw = fp->un.fmtb.ssw;
    -#ifdef DEBUG
    unsigned long desc;
    -#endif

    pr_debug("pid = %x ", current->pid);
    pr_debug("SSW=%#06x ", ssw);
    @@ -526,25 +524,24 @@ static inline void bus_error030 (struct frame *fp)
    if (ssw & DF) {
    addr = fp->un.fmtb.daddr;

    -#ifdef DEBUG
    asm volatile ("ptestr %3,%2@,#7,%0\n\t"
    "pmove %%psr,%1"
    : "=a&" (desc), "=m" (temp)
    : "a" (addr), "d" (ssw));
    - pr_debug("mmusr is %#x for addr %#lx in task %p\n",
    - temp, addr, current);
    - pr_debug("descriptor address is 0x%p, contents %#lx\n",
    - __va(desc), *(unsigned long *)__va(desc));
    -#else
    - asm volatile ("ptestr %2,%1@,#7\n\t"
    - "pmove %%psr,%0"
    - : "=m" (temp) : "a" (addr), "d" (ssw));
    -#endif
    mmusr = temp;
    errorcode = (mmusr & MMU_I
  • From Michael Schmitz@21:1/5 to All on Fri Apr 21 08:00:02 2023
    Hi Finn,

    Am 21.04.2023 um 09:58 schrieb Michael Schmitz:
    Hi Finn,

    On 20/04/23 20:55, Finn Thain wrote:

    But in any case, it looks like we can eliminate the bus error code. Same >>> fault on both 030 and 040 with very different bus error handlers is
    highly unlikely.

    There's no failure on '040. QEMU and Motorola '040 gave the same result.

    Sorry, my fault - I interpreted your mail as saying 030 and 040 gave the
    same result.

    Back to the drawing board ... I've got kernel images back to 2.4.30 and 2.6.37 to try and test. I'm also trying with rt signals and alternate
    signal stack (rt signals show the same behaviour).

    2.4.30 behaves the same as recent kernels - stack corruption unless
    signal handlers use the alternate signal stack.

    We already knew the 030 bus error code hasn't seen changes since 2.6, so
    this isn't much of a surprise.

    I'll add some logging to the signal frame setup to print the user stack pointer, and perhaps do the same in the bus error handler (when hitting
    a moveml instruction).

    How often did a page fault happen when executing moveml, in other programs?

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Fri Apr 21 10:50:01 2023
    On Fri, 21 Apr 2023, Michael Schmitz wrote:


    How often did a page fault happen when executing moveml, in other
    programs?


    The printk() I placed in bus_error030() was conditional on the short word
    at the instruction pointer. It didn't consider all forms of movem, just
    0x48e7 which is the start of "moveml X,%sp@-". This matched page faults in
    many of the programs that executed while booting to single user mode. I
    suppose most of them don't use signal handlers in the same way dash does otherwise they would probably be unreliable too.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Fri Apr 21 11:20:01 2023
    Hi Finn,

    Am 21.04.2023 um 20:30 schrieb Finn Thain:
    On Fri, 21 Apr 2023, Michael Schmitz wrote:


    How often did a page fault happen when executing moveml, in other
    programs?


    The printk() I placed in bus_error030() was conditional on the short word
    at the instruction pointer. It didn't consider all forms of movem, just 0x48e7 which is the start of "moveml X,%sp@-". This matched page faults in many of the programs that executed while booting to single user mode. I suppose most of them don't use signal handlers in the same way dash does otherwise they would probably be unreliable too.

    OK; so too much noise unless filtered on the command name...

    I'll try first to get register state at the time of signal delivery from
    the sa_sigaction handler's ucontext parameter to see where the signal
    stack falls in relation to the call frames from your rec() function on
    the stack (and what the register contents were). Hope that won't be too
    noisy ... Then see how that changes with bus error handling forcing
    signal delivery.

    Shame we don't have similar code to find the MMU descriptors on 040.

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Sat Apr 22 10:00:01 2023
    This is a multi-part message in MIME format.
    Hi Finn,

    Am 21.04.2023 um 21:18 schrieb Michael Schmitz:
    Hi Finn,

    Am 21.04.2023 um 20:30 schrieb Finn Thain:
    On Fri, 21 Apr 2023, Michael Schmitz wrote:


    How often did a page fault happen when executing moveml, in other
    programs?


    The printk() I placed in bus_error030() was conditional on the short word
    at the instruction pointer. It didn't consider all forms of movem, just
    0x48e7 which is the start of "moveml X,%sp@-". This matched page
    faults in
    many of the programs that executed while booting to single user mode. I
    suppose most of them don't use signal handlers in the same way dash does
    otherwise they would probably be unreliable too.

    OK; so too much noise unless filtered on the command name...

    I'll try first to get register state at the time of signal delivery from
    the sa_sigaction handler's ucontext parameter to see where the signal
    stack falls in relation to the call frames from your rec() function on
    the stack (and what the register contents were). Hope that won't be too

    Took a little while to figure out that the ucontext format changed in
    the decade or two since my userland's libc headers were generated. With
    the correct format, the information stored on th signal frame made a lot
    more sense.

    Log of your test program (attached), instrumented to keep track of user
    stack pointer in the parent process, user stack pointer in the signal
    handler, and stack pointer, pc and exceptiopn frame format from the
    signal stack (only the last few signals shown):

    parent usp : 0xef97beb8
    handler tos : 0xef97bdc4
    handler usp : 0xef97bbe0
    signal usp : 0xef97bea8
    signal pc : 0xc009f37a
    signal fmtv : 0x800006ca

    parent usp : 0xef969eb8
    handler tos : 0xef969dc4
    handler usp : 0xef969be0
    signal usp : 0xef969ea8
    signal pc : 0xc009f37a
    signal fmtv : 0x800006ca

    parent usp : 0xef9530d0
    handler tos : 0xef952fec
    handler usp : 0xef952e08
    signal usp : 0xef9530d0
    signal pc : 0x800006dc
    signal fmtv : 0x91929394

    parent usp : 0xef945eb8
    handler tos : 0xef945dc4
    handler usp : 0xef945be0
    signal usp : 0xef945ea8
    signal pc : 0xc009f37a
    signal fmtv : 0x800006ca

    parent usp : 0xef933eb8
    handler tos : 0xef933dc4
    handler usp : 0xef933be0
    signal usp : 0xef933ea8
    signal pc : 0xc009f37a
    signal fmtv : 0x800006ca

    parent usp : 0xef921edc
    handler tos : 0xef997984
    handler stack overwrote usp!
    handler usp : 0xef9977a0
    signal usp : 0xef997a64
    signal pc : 0x80000768
    signal fmtv : 0xa1a2a3a4

    Illegal instruction (core dumped)

    usp from the signal stack is below that of the parent process (before
    calling fork()).

    usp from the signal handler is below both of those. So far, so good.

    The top of the signal frame, however, is getting quite close to these
    stack pointers. In the last log, it has grown above the user stack pointer.

    Two things to note:

    - pc in the signal frame (from struct uc_mcontext) is either the return
    pc from the stack stuffing function, or something else I cannot work
    out. That part of ucontext appears valid.

    - what ought to be the frame format and vector offset does in fact hold
    varying longwords from the user stack. This information is not from
    struct uc_mcontext, but from extra information copied after struct
    ucontext ends. That wouldn't be there if at time of signal delivery,
    nothing had yet written to the area where the signal frame is stored.

    It appears that neither format/vector nor other exception frame
    information is stored there by the kernel signal code, so the registers
    that rec() had saved to the stack remain there. At time of signal
    delivery, the user stack appears to have grown into the area used for
    the signal stack, in spite of the user stack pointer saved after setting
    up the registers saying otherwise. I can't figure that out.

    At any rate, the constant part of the extra signal stack information
    does not take into account that we may end up storing additional
    information beyond that, in the case that the exception frame that got
    us into kernel mode is larger than four words. Maybe that is why the
    last stack frames logged do have the end of the signal stack go beyond
    the user stack pointer.

    The FPU state and exception frame part stored above the ucontext struct
    may well have a different size in your libc headers, or I may have made
    a mistake calculating the size of the extra information stored on the
    stack. You may want to undo my hack to use a local copy of the struct declarations and compile with the correct headers, maybe that does
    change the overall picture.

    noisy ... Then see how that changes with bus error handling forcing
    signal delivery.

    Haven't got to that bit yet ...

    Cheers,

    Michael


    Shame we don't have similar code to find the MMU descriptors on 040.

    Cheers,

    Michael



    #include <stdlib.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <signal.h>
    #include <string.h>
    #include <ucontext.h>

    /* copies of ucontext struct matching 6.x kernels */
    /* no extra stack contents */
    typedef struct k_ucontext {
    unsigned long uc_flags;
    struct ucontext *uc_link;
    stack_t uc_stack;
    mcontext_t uc_mcontext;
    sigset_t uc_sigmask; /* mask last for extensibility */
    } k_ucontext_t;
    /* extra stack contents (fpu state, exception frame part (> 4 words) */
    typedef struct k_ucontextf {
    unsigned long uc_flags;
    struct ucontext *uc_link;
    stack_t uc_stack;
    mcontext_t uc_mcontext;
    sigset_t uc_sigmask; /* mask last for extensibility */
    long int uc_filler[174];
    } k_ucontextf_t;

    int depth = 200000 /* 32768 ; 200000 */;

    const unsigned long i0 = 0x91929394;
    const unsigned long i1 = 0xa1a2a3a4;
    const unsigned long i2 = 0xb1b2b3b4;
    const unsigned long i3 = 0xc1c2c3c4;
    const unsigned long i4 = 0xd1d2d3d4;
    const unsigned long i5 = 0xe1e2e3e4;
    const unsigned long i6 = 0xf1f2f3f4;

    unsigned long o0;
    unsigned long o1;
    unsigned long o2;
    unsigned long o3;
    unsigned long o4;
    unsigned long o5;
    unsigned long o6;

    unsigned long parent_usp, child_usp;

    static void rec(void)
    {
    // initialize registers
    asm( " move.l %0, %%a2\n"
    " move.l %1, %%a3\n"
    " move.l %2, %%a4\n"
    " move.l %3, %%a5\n"
    " move.l %4, %%d2\n"
    " move.l %5, %%d3\n"
    " move.l %6, %%d4\n"
    :
    : "m" (i0), "m" (i1), "m" (i2),
    "m" (i3), "m" (i4), "m" (i5), "m" (i6)
    : "a2", "a3", "a4", "a5", "d2", "d3", "d4"
    );

    // note current usp
    asm( " move.l %%sp, %0\n"
    : "=m" (parent_usp)
    :
    : "memory"
    );

    // maybe fork a short-lived process
    if ((depth & 0x7ff) == 0)
    if (fork() == 0) {
    // note current usp
    asm( " move.l %%sp, %0\n"
    : "=m" (child_usp)
    :
    : "memory"
    );
    exit(0);
    }

    if (--depth)
    rec(); // callee to save & restore registers

    // compare register contents
    asm( " move.l %%a2, %0\n"
    " move.l %%a3, %1\n"
    " move.l %%a4, %2\n"
    " move.l %%a5, %3\n"
    " move.l %%d2, %4\n"
    " move.l %%d3, %5\n"
    " move.l %%d4, %6\n"
    : "=m" (o0), "=m" (o1), "=m" (o2),
    "=m" (o3), "=m" (o4), "=m" (o5), "=m" (o6)
    :
    : "memory" /* "a2", "a3", "a4", "a5", "d2", "d3", "d4" */
    );
    if (o0 != i0 || o1 != i1 || o2 != i2 ||
    o3 != i3 || o4 != i4 || o5 != i5 || o6 != i6)
    asm("illegal");
    }

    static void handler(int nevermind)
    {
    }

    static void rthandler(int signo, siginfo_t *info, void *ucontext)
    {
    /* use struct ucontext from libc headers */
    // ucontext_t *ctx = (ucontext_t *) ucontext;
    /* use struct ucontext from kernel source */
    k_ucontext_t *ctx = (k_ucontext_t *) ucontext;
    k_ucontextf_t *ctxf = (k_ucontextf_t *) ucontext;

    if (signo == SIGCHLD) {
    unsigned long usp, pc;
    int i;

    asm( " move.l %%sp, %0\n"
    : "=m" (usp)
    :
    : "memory"
    );
    fprintf(stderr, "parent usp : 0x%lx\n", parent_usp);
    // fprintf(stderr, "child usp : 0x%lx\n", child_usp);
    /* top of signal stack - using ucontext w/o uc_filler
    * Adding filler (which will take additional exception
    * frame contents for longer tha four word frames),
    * top of stack may end up above parent usp! */
    fprintf(stderr, "handler tos : 0x%lx\n", usp + sizeof(struct siginfo) + sizeof(struct k_ucontext) + 24);
    if (usp + sizeof(struct siginfo) + sizeof(struct k_ucontext) + 24 > parent_usp)
    fprintf(stderr, "handler stack overwrote usp!\n");
    fprintf(stderr, "handler usp : 0x%lx\n", usp);
    fprintf(stderr, "signal usp : 0x%lx\n", ctxf->uc_mcontext.gregs[15]);
    fprintf(stderr, "signal pc : 0x%lx\n", ctxf->uc_mcontext.gregs[16]);
    fprintf(stderr, "signal fmtv : 0x%lx\n", ctxf->uc_filler[54]);
    fprintf(stderr, "\n");
    }
    }

    int main(void)

    {
    struct sigaction act;
    stack_t ss;

    ss.ss_sp = malloc(SIGSTKSZ);
    if (ss.ss_sp == NULL) {
    perror("malloc");
    exit(EXIT_FAILURE);
    }

    ss.ss_size = SIGSTKSZ;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) == -1) {
    perror("sigaltstack");
    exit(EXIT_FAILURE);
    }

    #ifdef USE_ALTSTACK
    fprintf(stderr, "alt sig stack: 0x%lx - 0x%lx\n", ss.ss_sp, ss.ss_sp + ss.ss_size);

    act.sa_flags = SA_ONSTACK;
    act.sa_flags |= SA_SIGINFO;
    act.sa_sigaction = rthandler;
    #else
    act.sa_flags = SA_SIGINFO;
    act.sa_handler = handler;
    #endif
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGCHLD, &act, NULL) == -1) {
    perror("sigaction");
    exit(EXIT_FAILURE);
    }

    rec();
    }

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andreas Schwab@21:1/5 to Michael Schmitz on Sat Apr 22 10:30:02 2023
    On Apr 22 2023, Michael Schmitz wrote:

    Took a little while to figure out that the ucontext format changed in the decade or two since my userland's libc headers were generated.

    In which way did it change?

    --
    Andreas Schwab, [email protected]
    GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
    "And now for something completely different."

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Sat Apr 22 10:20:01 2023
    Hi Andreas,

    Am 22.04.2023 um 20:07 schrieb Andreas Schwab:
    On Apr 22 2023, Michael Schmitz wrote:

    Took a little while to figure out that the ucontext format changed in the
    decade or two since my userland's libc headers were generated.

    In which way did it change?

    This is the definition from the kernel's
    include/uapi/asm-generic/ucontext.h:

    struct ucontext {
    unsigned long uc_flags;
    struct ucontext *uc_link;
    stack_t uc_stack;
    struct sigcontext uc_mcontext;
    sigset_t uc_sigmask; /* mask last for extensibility */
    };

    And this is /usr/include/sys/ucontext.h:

    /* Userlevel context. */
    typedef struct ucontext
    {
    unsigned long int uc_flags;
    struct ucontext *uc_link;
    __sigset_t uc_sigmask;
    stack_t uc_stack;
    mcontext_t uc_mcontext;
    long int uc_filler[174];
    } ucontext_t;

    uc_sigmask appears before uc_stack and uc_mcontext.

    I'm assuming libc just passes on what the kernel set, without reordering?

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Sat Apr 22 11:10:01 2023
    Hi Finn,

    Am 22.04.2023 um 19:54 schrieb Michael Schmitz:
    Hi Finn,

    Am 21.04.2023 um 21:18 schrieb Michael Schmitz:
    Hi Finn,

    Am 21.04.2023 um 20:30 schrieb Finn Thain:
    On Fri, 21 Apr 2023, Michael Schmitz wrote:


    How often did a page fault happen when executing moveml, in other
    programs?


    The printk() I placed in bus_error030() was conditional on the short
    word
    at the instruction pointer. It didn't consider all forms of movem, just
    0x48e7 which is the start of "moveml X,%sp@-". This matched page
    faults in
    many of the programs that executed while booting to single user mode. I
    suppose most of them don't use signal handlers in the same way dash does >>> otherwise they would probably be unreliable too.

    OK; so too much noise unless filtered on the command name...

    I'll try first to get register state at the time of signal delivery from
    the sa_sigaction handler's ucontext parameter to see where the signal
    stack falls in relation to the call frames from your rec() function on
    the stack (and what the register contents were). Hope that won't be too

    Took a little while to figure out that the ucontext format changed in
    the decade or two since my userland's libc headers were generated. With
    the correct format, the information stored on th signal frame made a lot
    more sense.

    Log of your test program (attached), instrumented to keep track of user
    stack pointer in the parent process, user stack pointer in the signal handler, and stack pointer, pc and exceptiopn frame format from the
    signal stack (only the last few signals shown):

    parent usp : 0xef97beb8
    handler tos : 0xef97bdc4
    handler usp : 0xef97bbe0
    signal usp : 0xef97bea8
    signal pc : 0xc009f37a
    signal fmtv : 0x800006ca

    parent usp : 0xef969eb8
    handler tos : 0xef969dc4
    handler usp : 0xef969be0
    signal usp : 0xef969ea8
    signal pc : 0xc009f37a
    signal fmtv : 0x800006ca

    parent usp : 0xef9530d0
    handler tos : 0xef952fec
    handler usp : 0xef952e08
    signal usp : 0xef9530d0
    signal pc : 0x800006dc
    signal fmtv : 0x91929394

    parent usp : 0xef945eb8
    handler tos : 0xef945dc4
    handler usp : 0xef945be0
    signal usp : 0xef945ea8
    signal pc : 0xc009f37a
    signal fmtv : 0x800006ca

    parent usp : 0xef933eb8
    handler tos : 0xef933dc4
    handler usp : 0xef933be0
    signal usp : 0xef933ea8
    signal pc : 0xc009f37a
    signal fmtv : 0x800006ca

    parent usp : 0xef921edc
    handler tos : 0xef997984
    handler stack overwrote usp!
    handler usp : 0xef9977a0
    signal usp : 0xef997a64
    signal pc : 0x80000768
    signal fmtv : 0xa1a2a3a4

    Illegal instruction (core dumped)

    usp from the signal stack is below that of the parent process (before
    calling fork()).

    usp from the signal handler is below both of those. So far, so good.

    The top of the signal frame, however, is getting quite close to these
    stack pointers. In the last log, it has grown above the user stack pointer.

    Two things to note:

    - pc in the signal frame (from struct uc_mcontext) is either the return
    pc from the stack stuffing function, or something else I cannot work
    out. That part of ucontext appears valid.

    - what ought to be the frame format and vector offset does in fact hold varying longwords from the user stack. This information is not from
    struct uc_mcontext, but from extra information copied after struct
    ucontext ends. That wouldn't be there if at time of signal delivery,
    nothing had yet written to the area where the signal frame is stored.

    Actually, 0x800006ca (should be format/vector but more likely a return
    address saved on the stack by rec()) is the instruction after fork() in
    rec(), and 0xc009f37a is the instuction after trap 0 in fork(). The
    signal frame pc seems to be anywhere in rec() ...

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andreas Schwab@21:1/5 to Michael Schmitz on Sat Apr 22 12:40:01 2023
    On Apr 22 2023, Michael Schmitz wrote:

    This is the definition from the kernel's
    include/uapi/asm-generic/ucontext.h:

    That's not actually used by m68k, it uses
    arch/m68k/include/asm/ucontext.h, which confusingly isn't an uapi
    header.

    And this is /usr/include/sys/ucontext.h:

    /* Userlevel context. */
    typedef struct ucontext
    {
    unsigned long int uc_flags;
    struct ucontext *uc_link;
    __sigset_t uc_sigmask;
    stack_t uc_stack;
    mcontext_t uc_mcontext;
    long int uc_filler[174];
    } ucontext_t;

    uc_sigmask appears before uc_stack and uc_mcontext.

    Yes, that got fixed as part of commit 9c986f878a back in 2006.

    I'm assuming libc just passes on what the kernel set, without reordering?

    Trying to rewrite the signal context would be prohibitive, yes.

    --
    Andreas Schwab, [email protected]
    GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
    "And now for something completely different."

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Sat Apr 22 20:30:01 2023
    Hi Andreas,

    Am 22.04.2023 um 22:12 schrieb Andreas Schwab:
    On Apr 22 2023, Michael Schmitz wrote:

    This is the definition from the kernel's
    include/uapi/asm-generic/ucontext.h:

    That's not actually used by m68k, it uses
    arch/m68k/include/asm/ucontext.h, which confusingly isn't an uapi
    header.

    Argh - not sure how I missed that one.

    And this is /usr/include/sys/ucontext.h:

    /* Userlevel context. */
    typedef struct ucontext
    {
    unsigned long int uc_flags;
    struct ucontext *uc_link;
    __sigset_t uc_sigmask;
    stack_t uc_stack;
    mcontext_t uc_mcontext;
    long int uc_filler[174];
    } ucontext_t;

    uc_sigmask appears before uc_stack and uc_mcontext.

    Yes, that got fixed as part of commit 9c986f878a back in 2006.

    With the header corrected as per arch/m68k/include/asm/ucontext.h, I get
    the correct information in uc_filler[54] aka fmtvec now. Good. Every
    sigreturn possibly overwrites the user stack now (if the filler is used
    in full). Not so good.

    I'm assuming libc just passes on what the kernel set, without reordering?

    Trying to rewrite the signal context would be prohibitive, yes.

    Now I wonder who adds sigmask ... and whether that's also ending up on
    the user stack.

    Cheers,

    Michael




    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andreas Schwab@21:1/5 to Michael Schmitz on Sat Apr 22 21:00:01 2023
    On Apr 23 2023, Michael Schmitz wrote:

    Now I wonder who adds sigmask ... and whether that's also ending up on the user stack.

    The kernel only writes the first 64 bits of the signal mask, as it does
    for all signal mask related syscalls. The kernel version of the context
    ends after that; since the user-space version is larger it actually
    extends into the next stack frame.

    --
    Andreas Schwab, [email protected]
    GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
    "And now for something completely different."

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Sat Apr 22 22:10:01 2023
    Hi Andreas,

    Am 23.04.2023 um 06:38 schrieb Andreas Schwab:
    On Apr 23 2023, Michael Schmitz wrote:

    Now I wonder who adds sigmask ... and whether that's also ending up on the >> user stack.

    The kernel only writes the first 64 bits of the signal mask, as it does
    for all signal mask related syscalls. The kernel version of the context
    ends after that; since the user-space version is larger it actually
    extends into the next stack frame.

    Yep, seen that now, but only with rt signals this gets copied at the
    very end of the signal context.

    I'll have to double check my calculations to see where this will
    actually end up on the stack in either case.

    I'll see whether the signal context is available on the stack even if
    the handler isn't passed that parameter.

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andreas Schwab@21:1/5 to Michael Schmitz on Sat Apr 22 23:10:01 2023
    On Apr 23 2023, Michael Schmitz wrote:

    I'll see whether the signal context is available on the stack even if the handler isn't passed that parameter.

    The signal context is always on the stack, and used by the
    (rt_)sigreturn syscall.

    --
    Andreas Schwab, [email protected]
    GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
    "And now for something completely different."

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Sun Apr 23 03:50:02 2023
    Hi Andreas,

    Am 23.04.2023 um 08:46 schrieb Andreas Schwab:
    On Apr 23 2023, Michael Schmitz wrote:

    I'll see whether the signal context is available on the stack even if the
    handler isn't passed that parameter.

    The signal context is always on the stack, and used by the
    (rt_)sigreturn syscall.

    True, but at the time the signal handler (sa_handler type) runs, all I
    have is the user stack pointer upon entry to the handler. I'll have to calculate back from that address, if that is possible.

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andreas Schwab@21:1/5 to Michael Schmitz on Sun Apr 23 10:00:01 2023
    On Apr 23 2023, Michael Schmitz wrote:

    Hi Andreas,

    Am 23.04.2023 um 08:46 schrieb Andreas Schwab:
    On Apr 23 2023, Michael Schmitz wrote:

    I'll see whether the signal context is available on the stack even if the >>> handler isn't passed that parameter.

    The signal context is always on the stack, and used by the
    (rt_)sigreturn syscall.

    True, but at the time the signal handler (sa_handler type) runs, all I
    have is the user stack pointer upon entry to the handler. I'll have to calculate back from that address, if that is possible.

    The signal handler always receives all three arguments.
    --
    Andreas Schwab, [email protected]
    GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
    "And now for something completely different."

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Sun Apr 23 10:10:01 2023
    On Wed, 19 Apr 2023, Michael Schmitz wrote:

    I wonder what we'd see if we patched the kernel to log every user data write fault caused by a MOVEM instruction. I'll try to code that up.

    If these instructions did always cause stack corruption on 030, I think
    we would have noticed long ago?


    I think it probably was noticed long ago, in the form of rare userland
    crashes on 68030. But it was probably never reported because the actual
    culprit is too distant from the symptoms.

    But I take your point -- signal delivery seems to be crucial. Would it be difficult to skip signal delivery following a bus error? Perhaps there's
    no need to try that experiment, as we know what would happen.

    I will take a look at your modified test program and try to use the output
    to figure out the stack gymnastics.

    IIUC, there are two RTEs following the page fault. The first one runs the signal handler, the second one resumes the MOVEM that faulted. Maybe we'll
    have to intercept the latter (at do_sigreturn() perhaps?) and examine that exception frame.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andreas Schwab@21:1/5 to Michael Schmitz on Sun Apr 23 10:50:01 2023
    On Apr 23 2023, Michael Schmitz wrote:

    Wasn't too hard actually. The signo parameter passed to the handler turns
    out to be passed by reference, and signo is located 4 bytes into the
    kernel sigframe.

    That's not "passed by reference". Function arguments are always passed
    on the stack.

    --
    Andreas Schwab, [email protected]
    GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
    "And now for something completely different."

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Sun Apr 23 10:20:02 2023
    Hi Finn, Andreas,

    Am 23.04.2023 um 13:41 schrieb Michael Schmitz:
    Hi Andreas,

    Am 23.04.2023 um 08:46 schrieb Andreas Schwab:
    On Apr 23 2023, Michael Schmitz wrote:

    I'll see whether the signal context is available on the stack even if
    the
    handler isn't passed that parameter.

    The signal context is always on the stack, and used by the
    (rt_)sigreturn syscall.

    True, but at the time the signal handler (sa_handler type) runs, all I
    have is the user stack pointer upon entry to the handler. I'll have to calculate back from that address, if that is possible.

    Wasn't too hard actually. The signo parameter passed to the handler
    turns out to be passed by reference, and signo is located 4 bytes into
    the kernel sigframe. The address of the sigcontext struct follows from
    that.

    Result for the original test code that Finn posted (using sa_handler):

    parent usp : 0xefa2f010
    handler tos : 0xefa2efa0
    frame end : 0xefa2efb4
    frame start : 0xefa2ee78
    handler usp : 0xefa2ee60
    signal usp : 0xefa2f004
    signal pc : 0x80000588
    signal fmtv : 0xb008
    parent usp : 0xef957ed4
    handler tos : 0xef957eb0
    frame end : 0xef957ec4
    frame start : 0xef957d88
    handler usp : 0xef957d70
    signal usp : 0xef957ec4
    signal pc : 0xc009f37a
    signal fmtv : 0x80
    parent usp : 0xef957ed4
    handler tos : 0xef957eb0
    frame end : 0xef957ec4
    frame start : 0xef957d88
    handler usp : 0xef957d70
    signal usp : 0xef957ec4
    signal pc : 0xc009f37a
    signal fmtv : 0x80
    parent usp : 0xef945ed4
    handler tos : 0xef945eb0
    frame end : 0xef945ec4
    frame start : 0xef945d88
    handler usp : 0xef945d70
    signal usp : 0xef945ec4
    signal pc : 0xc009f37a
    signal fmtv : 0x80
    parent usp : 0xef921ef8
    handler tos : 0xef9b0f88
    handler stack overwrote usp!
    frame end : 0xef9b0f9c
    frame start : 0xef9b0e60
    handler usp : 0xef9b0e48
    signal usp : 0xef9b0fec
    signal pc : 0x8000067e
    signal fmtv : 0xb008
    Illegal instruction (core dumped)

    In this case, only on the last recursion before the core dump does the
    handler warn that the end of the kernel sigframe (shown as frame end)
    lies above the current stack pointer from the main program.

    That nicely explains the overwriting of the saved register a3 (which did
    point to the stack canary) and subsequent stack corruption abort.

    Though the question remains - is this expected behaviour for programs
    that do deep recursion on the stack while taking signals (and the reason
    for the option to run signal handlers on an alternate stack)?

    And why does this almost always appear to happen after bus error
    exceptions (frame format b)? The extra exception stack information isn't
    even accounted for in the above frame end address!

    Result with sa_sigaction handler:

    parent usp : 0xef969e28
    handler tos : 0xef969e6c
    handler stack overwrote usp!
    frame end : 0xef969e7c
    frame start : 0xef969b58
    handler usp : 0xef969b40
    signal usp : 0xef969e04
    signal pc : 0x80000696
    signal fmtv : 0x114

    parent usp : 0xef955008
    handler tos : 0xef955064
    handler stack overwrote usp!
    frame end : 0xef955074
    frame start : 0xef954d50
    handler usp : 0xef954d38
    signal usp : 0xef954ffc
    signal pc : 0x80000680
    signal fmtv : 0xb008

    parent usp : 0xef945eb8
    handler tos : 0xef945f0c
    handler stack overwrote usp!
    frame end : 0xef945f1c
    frame start : 0xef945bf8
    handler usp : 0xef945be0
    signal usp : 0xef945ea8
    signal pc : 0xc009f37a
    signal fmtv : 0x80

    parent usp : 0xef933eb8
    handler tos : 0xef933f0c
    handler stack overwrote usp!
    frame end : 0xef933f1c
    frame start : 0xef933bf8
    handler usp : 0xef933be0
    signal usp : 0xef933ea8
    signal pc : 0xc009f37a
    signal fmtv : 0x80

    parent usp : 0xef921edc
    handler tos : 0xef9aaca4
    handler stack overwrote usp!
    frame end : 0xef9aacb4
    frame start : 0xef9aa990
    handler usp : 0xef9aa978
    signal usp : 0xef9aac40
    signal pc : 0x80000782
    signal fmtv : 0x114

    Illegal instruction (core dumped)

    Exception right before crash was an interrupt in this case (only seen
    that once in this context, though I've seen lots of those in the course
    of the test runs). Frame start calculated from siginfo pointer value in
    this case.

    I'll clean up the test code and combine all signal use cases into a
    single source file so Stan and Finn can give this a spin on their hardware.

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Sun Apr 23 11:40:01 2023
    On Sun, 23 Apr 2023, Michael Schmitz wrote:

    Am 23.04.2023 um 13:41 schrieb Michael Schmitz:

    Though the question remains - is this expected behaviour for programs
    that do deep recursion on the stack while taking signals (and the reason
    for the option to run signal handlers on an alternate stack)?


    I don't understand how "deep recursion" can be used to explain this. We've
    seen crashes with only 1.8 MB of stack usage.

    The best reason I can think of for having a signal stack would be that it
    may be better for signal delivery to fail than for the target process to
    fail. But I've no idea whether the kernel makes that kind of defensive programming possible (?)

    And why does this almost always appear to happen after bus error exceptions (frame format b)? The extra exception stack information isn't even accounted for in the above frame end address!

    Result with sa_sigaction handler:

    parent usp : 0xef969e28
    handler tos : 0xef969e6c
    handler stack overwrote usp!
    frame end : 0xef969e7c
    frame start : 0xef969b58
    handler usp : 0xef969b40
    signal usp : 0xef969e04
    signal pc : 0x80000696
    signal fmtv : 0x114

    parent usp : 0xef955008
    handler tos : 0xef955064
    handler stack overwrote usp!
    frame end : 0xef955074
    frame start : 0xef954d50
    handler usp : 0xef954d38
    signal usp : 0xef954ffc
    signal pc : 0x80000680
    signal fmtv : 0xb008

    parent usp : 0xef945eb8
    handler tos : 0xef945f0c
    handler stack overwrote usp!
    frame end : 0xef945f1c
    frame start : 0xef945bf8
    handler usp : 0xef945be0
    signal usp : 0xef945ea8
    signal pc : 0xc009f37a
    signal fmtv : 0x80

    parent usp : 0xef933eb8
    handler tos : 0xef933f0c
    handler stack overwrote usp!
    frame end : 0xef933f1c
    frame start : 0xef933bf8
    handler usp : 0xef933be0
    signal usp : 0xef933ea8
    signal pc : 0xc009f37a
    signal fmtv : 0x80

    parent usp : 0xef921edc
    handler tos : 0xef9aaca4
    handler stack overwrote usp!
    frame end : 0xef9aacb4
    frame start : 0xef9aa990
    handler usp : 0xef9aa978
    signal usp : 0xef9aac40
    signal pc : 0x80000782
    signal fmtv : 0x114

    Illegal instruction (core dumped)


    I don't understand these results. If usp was really overwritten, the
    program would have crashed early, no?

    Exception right before crash was an interrupt in this case (only seen
    that once in this context, though I've seen lots of those in the course
    of the test runs). Frame start calculated from siginfo pointer value in
    this case.


    I didn't realize that you could get a crash from a signal delivered
    following an interrupt. I'll try to modify the kernel such that signals
    are not delivered after page faults.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to Finn Thain on Sun Apr 23 22:50:01 2023
    Hi Finn,

    On 23/04/23 21:23, Finn Thain wrote:
    On Sun, 23 Apr 2023, Michael Schmitz wrote:

    Am 23.04.2023 um 13:41 schrieb Michael Schmitz:

    Though the question remains - is this expected behaviour for programs
    that do deep recursion on the stack while taking signals (and the reason
    for the option to run signal handlers on an alternate stack)?

    I don't understand how "deep recursion" can be used to explain this. We've seen crashes with only 1.8 MB of stack usage.
    OK, it's not really deep (though I've managed to get the test case
    aborted by the oom killer once on my rather puny RAM). But it's putting
    lots of frames on the stack in a short span while also utilizing the
    stack for signal delivery.
    The best reason I can think of for having a signal stack would be that it
    may be better for signal delivery to fail than for the target process to fail. But I've no idea whether the kernel makes that kind of defensive programming possible (?)

    I don't think there's any provision for signal delivery to fail - the
    signal handler is started from the return-to-userspace code in entry.S,
    and upon return from the handler, a sigreturn syscall is automatically
    executed to clean up the stack. As long as the handler returns, all's fine.

    Not sure what happens if the process context that the handler runs in is
    killed by the kernel - I suppose the entire process is killed and the
    context removed, so the issue of parent process survival is moot. But
    I'm sure we can place an illegal instruction in the handler as soon as a
    stack overflow is spotted, get a dump and look at that.

    And why does this almost always appear to happen after bus error exceptions >> (frame format b)? The extra exception stack information isn't even accounted >> for in the above frame end address!

    Result with sa_sigaction handler:

    parent usp : 0xef969e28
    handler tos : 0xef969e6c
    handler stack overwrote usp!
    frame end : 0xef969e7c
    frame start : 0xef969b58
    handler usp : 0xef969b40
    signal usp : 0xef969e04
    signal pc : 0x80000696
    signal fmtv : 0x114

    parent usp : 0xef955008
    handler tos : 0xef955064
    handler stack overwrote usp!
    frame end : 0xef955074
    frame start : 0xef954d50
    handler usp : 0xef954d38
    signal usp : 0xef954ffc
    signal pc : 0x80000680
    signal fmtv : 0xb008

    parent usp : 0xef945eb8
    handler tos : 0xef945f0c
    handler stack overwrote usp!
    frame end : 0xef945f1c
    frame start : 0xef945bf8
    handler usp : 0xef945be0
    signal usp : 0xef945ea8
    signal pc : 0xc009f37a
    signal fmtv : 0x80

    parent usp : 0xef933eb8
    handler tos : 0xef933f0c
    handler stack overwrote usp!
    frame end : 0xef933f1c
    frame start : 0xef933bf8
    handler usp : 0xef933be0
    signal usp : 0xef933ea8
    signal pc : 0xc009f37a
    signal fmtv : 0x80

    parent usp : 0xef921edc
    handler tos : 0xef9aaca4
    handler stack overwrote usp!
    frame end : 0xef9aacb4
    frame start : 0xef9aa990
    handler usp : 0xef9aa978
    signal usp : 0xef9aac40
    signal pc : 0x80000782
    signal fmtv : 0x114

    Illegal instruction (core dumped)

    I don't understand these results. If usp was really overwritten, the
    program would have crashed early, no?
    I think we're still at the point where rec() is called recursively,
    before any returns.
    Exception right before crash was an interrupt in this case (only seen
    that once in this context, though I've seen lots of those in the course
    of the test runs). Frame start calculated from siginfo pointer value in
    this case.

    I didn't realize that you could get a crash from a signal delivered
    following an interrupt. I'll try to modify the kernel such that signals
    are not delivered after page faults.

    Yes, that was news to me, too. I've got swap enabled and probably see a
    lot more disk I/O than on your machines.

    Delaying signal return until the next syscall or interrupt after page
    fault ought not be too hard - just replace the 'jra ret_from_exception'
    by 'RESTORE_ALL' (though that would also defer rescheduling until the
    next interrupt). For a proper solution, replicate exit_work without a
    call to do_signal_return ...

    Cheers,

        Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to Andreas Schwab on Sun Apr 23 22:20:01 2023
    Hi Andreas,

    On 23/04/23 20:23, Andreas Schwab wrote:
    On Apr 23 2023, Michael Schmitz wrote:

    Wasn't too hard actually. The signo parameter passed to the handler turns
    out to be passed by reference, and signo is located 4 bytes into the
    kernel sigframe.
    That's not "passed by reference". Function arguments are always passed
    on the stack.

    You're right of course, I should have phrased that different.

    signo is passed by value as the first argument on the stack, but I can
    take a reference to signo to find it's address on the stack. That in
    turn will tell me the address of the complete signal frame on the stack.

    This works because at the time the handler is entered, the stack points
    to the return address which has been set up by setup_frame, and that's
    the first longword in struct sigframe (frame.pretcode; points to
    frame.retcode which holds the trampoline code that will call
    sys_sigreturn upon exit from the handler). This in turn works because
    usp is loaded with the address of the sigframe right before calling the handler.

    In effect, at least indirectly the signal handler receives all two
    arguments that matter (signo and sigcontext) even though sigcontext is
    not explicitly passed on the stack (the third longword is frame.code
    which holds the vector number AFAICS). Not sure what third argument you referred to in another mail.

    With the rt_sigframe, pointers to siginfo and ucontext are stored after
    signo so no stack arithmetics is needed there to access all three
    arguments.

    Anyway, the whole point of the exercise was to see whether I can peek at
    the registers in the signal frame and their saved copies on the stack,
    to perhaps spot when the stack corruption begins to happen. I know -
    should have used a debugger in the first place instead instrumenting a
    test case.

    Cheers,

        Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to Finn Thain on Sun Apr 23 23:30:01 2023
    Hi Finn,

    On 23/04/23 19:46, Finn Thain wrote:
    On Wed, 19 Apr 2023, Michael Schmitz wrote:

    I wonder what we'd see if we patched the kernel to log every user data
    write fault caused by a MOVEM instruction. I'll try to code that up.
    If these instructions did always cause stack corruption on 030, I think
    we would have noticed long ago?

    I think it probably was noticed long ago, in the form of rare userland crashes on 68030. But it was probably never reported because the actual culprit is too distant from the symptoms.

    But I take your point -- signal delivery seems to be crucial. Would it be difficult to skip signal delivery following a bus error? Perhaps there's
    no need to try that experiment, as we know what would happen.
    Shouldn't be too hard, see my other mail.
    I will take a look at your modified test program and try to use the output
    to figure out the stack gymnastics.

    IIUC, there are two RTEs following the page fault. The first one runs the signal handler, the second one resumes the MOVEM that faulted. Maybe we'll have to intercept the latter (at do_sigreturn() perhaps?) and examine that exception frame.

    There's no second RTE as far as I can see - upon return from buserr_c,
    the asm buserr handler jumps to ret_from_exception. Seeing as the bus
    error was taken from user space, ret_from_exception proceeds to resume_userspace, and seeing the task info flags field non-zero, jumps
    to exit_work where with signal pending, a jump to do_signal_return is
    taken and the signal handler is set up (frame setup to return through
    the sigreturn trampoline, pc set to hander etc). No rte anywhere on that
    path. After setting up for the signal handler, we return to
    resume_userspace and no further signals pending, hit RESTORE_ALL which
    restores registes from the pt_regs struct on the kernel stack, and has
    the rte instruction at the end. We had earlier set usp to the signal
    frame and pc to the signal handler, so that is now run after resuming
    user mode after the rte instruction.

    Exiting from the signal handler, sys_sigreturn runs and cleans up the
    user stack, then returns to the instruction at the pc from the saved
    exception frame that got us into kernel mode in the first instance. This
    is the moment the moveml instruction resumes.

    There should be no difference between ret_from_exception (after buserr)
    jumping to RESTORE_ALL directly (with exception frame still on the
    kernel stack from the bus error exception) and doing so after the detour through signal hander setup, signal handler and sys_sigreturn cleanup.
    If the exception frame on the stack was any different from what it ought
    to be, rte would fail and raise a format error exception.

    If the frame was different from that needed to complete the bus error exception, f.e. one from a trap exception, we'd fail to resume that
    moveml instruction and do something else instead. Hmmm - that's an
    interesting fault mode... might explain why a3 wasn't saved as it ought
    to have been? Can we 'poison' the user stack area that will be used for register save upon rec() entry with some other patterns to prove that
    moveml sometimes does not complete after the bus error?

    Cheers,

        Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andreas Schwab@21:1/5 to Michael Schmitz on Mon Apr 24 00:10:01 2023
    On Apr 24 2023, Michael Schmitz wrote:

    Not sure what third argument you referred to in another mail.

    See struct sigframe and struct rt_sigframe. The non-rt signal handler
    gets signal number, vector number and sigcontext*. The rt signal
    handler gets signal number, siginfo* and ucontext*.

    --
    Andreas Schwab, [email protected]
    GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
    "And now for something completely different."

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to Andreas Schwab on Mon Apr 24 06:00:01 2023
    This is a multi-part message in MIME format.
    Hi Andreas,

    On 24/04/23 09:48, Andreas Schwab wrote:
    On Apr 24 2023, Michael Schmitz wrote:

    Not sure what third argument you referred to in another mail.
    See struct sigframe and struct rt_sigframe. The non-rt signal handler
    gets signal number, vector number and sigcontext*. The rt signal
    handler gets signal number, siginfo* and ucontext*.

    Thanks, I see now. Got confused by the sigaction man page (despite
    working out that it's all there on the stack before...). Might need a
    comment in the code (or an update to the man pages).

    I've rewritten my test program to make the non-rt handler take three
    arguments, just to simplify things. Also fixed the end of signal frame calculation for the non-rt case where the exception places additional
    data on the stack.

    Running with the non-rt handler, the crash appears to happen right at
    the end of the recursion (or at least, I take no further SIGCHLD on the
    way back up the stack). With the rt handler, I see the stack depth
    decreased on the last signal taken before the crash.

    When I enable dumping the extra exception frame contents (which will
    show prior stack contents when the exception only used a four-word
    frame) in the rt handler case, I only see saved register data placed
    there at the very end. That's different from previous tests where I saw
    the saved register patterns all the time. (but might have had the
    offsets wrong).

    I'll see what peeking at the registers shows (now that I can be
    confident I have got the offsets correct).

    Cheers,

    Michael




    #include <stdlib.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <signal.h>
    #include <string.h>
    #include <ucontext.h>
    #include <asm/ptrace.h>

    /* how deep at recursion */
    int depth = 200000;
    const int max_depth = 200000;
    /* how deep still on the stack while returning */
    int rdepth = 0;

    const unsigned long i0 = 0x91929394;
    const unsigned long i1 = 0xa1a2a3a4;
    const unsigned long i2 = 0xb1b2b3b4;
    const unsigned long i3 = 0xc1c2c3c4;
    const unsigned long i4 = 0xd1d2d3d4;
    const unsigned long i5 = 0xe1e2e3e4;
    const unsigned long i6 = 0xf1f2f3f4;

    unsigned long o0;
    unsigned long o1;
    unsigned long o2;
    unsigned long o3;
    unsigned long o4;
    unsigned long o5;
    unsigned long o6;

    unsigned long parent_usp;

    /* extra frame size used for sa_handler case */
    static const int frame_size_change[16] = {
    [1] = -1, /* sizeof_field(struct frame, un.fmt1), */
    [2] = 4, /* sizeof_field(struct frame, un.fmt2), */
    [3] = 4, /* sizeof_field(struct frame, un.fmt3), */
    [4] = 8, /* sizeof_field(struct frame, un.fmt4), */
    [5] = -1, /* sizeof_field(struct frame, un.fmt5), */
    [6] = -1, /* sizeof_field(struct frame, un.fmt6), */
    [7] = 52, /* sizeof_field(struct frame, un.fmt7), */
    [8] = -1, /* sizeof_field(struct frame, un.fmt8), */
    [9] = 12, /* sizeof_field(struct frame, un.fmt9), */
    [10] = 20, /* sizeof_field(struct frame, un.fmta), */
    [11] = 84, /* sizeof_field(struct frame, un.fmtb), */
    [12] = -1, /* sizeof_field(struct frame, un.fmtc), */
    [13] = -1, /* sizeof_field(struct frame, un.fmtd), */
    [14] = -1, /* sizeof_field(struct frame, un.fmte), */
    [15] = -1, /* sizeof_field(struct frame, un.fmtf), */
    };

    static inline int frame_extra_sizes(int f)
    {
    return frame_size_change[f];
    }

    static void rec(void)
    {
    // initialize registers
    asm( " move.l %0, %%a2\n"
    " move.l %1, %%a3\n"
    " move.l %2, %%a4\n"
    " move.l %3, %%a5\n"
    " move.l %4, %%d2\n"
    " move.l %5, %%d3\n"
    " move.l %6, %%d4\n"
    :
    : "m" (i0), "m" (i1), "m" (i2),
    "m" (i3), "m" (i4), "m" (i5), "m" (i6)
    : "a2", "a3", "a4", "a5", "d2", "d3", "d4"
    );

    // note current usp
    asm( " move.l %%sp, %0\n"
    : "=m" (parent_usp)
    :
    : "memory"
    );

    // maybe fork a short-lived process
    if ((depth & 0x7ff) == 0)
    if (fork() == 0)
    exit(0);

    if (--depth)
    rec(); // callee to save & restore registers

    // compare register contents
    asm( " move.l %%a2, %0\n"
    " move.l %%a3, %1\n"
    " move.l %%a4, %2\n"
    " move.l %%a5, %3\n"
    " move.l %%d2, %4\n"
    " move.l %%d3, %5\n"
    " move.l %%d4, %6\n"
    : "=m" (o0), "=m" (o1), "=m" (o2),
    "=m" (o3), "=m" (o4), "=m" (o5), "=m" (o6)
    :
    : "memory" /* "a2", "a3", "a4", "a5", "d2", "d3", "d4" */
    );
    if (o0 != i0 || o1 != i1 || o2 != i2 ||
    o3 != i3 || o4 != i4 || o5 != i5 || o6 != i6)
    asm("illegal");

    rdepth++;
    }

    static void handler(int signo, int vector, struct sigcontext *ctx)
    {
    // use if only taking signo argument
    // struct sigcontext *ctx = (struct sigcontext *) ((unsigned long) &signo + 24);
    unsigned long *frame = (unsigned long *)((unsigned long) &signo - 4);

    if (signo == SIGCHLD) {
    unsigned long usp, end_frame, extra;
    int i;

    asm( " move.l %%sp, %0\n"
    : "=m" (usp)
    :
    : "memory"
    );

    fprintf(stderr, "stack depth : %d\n", max_depth - depth);
    fprintf(stderr, "retn. depth : %d\n", max_depth - rdepth);
    fprintf(stderr, "parent usp : 0x%lx\n", parent_usp);
    /* top of signal stack - including extra size for
    * exception frames longer than four words
    */
    // end_frame = (unsigned long) frame + 32 + sizeof(struct sigcontext);
    extra = frame_extra_sizes(ctx->sc_formatvec >> 12);
    end_frame = (unsigned long) ctx + sizeof(struct sigcontext) + extra;
    if (end_frame > parent_usp)
    fprintf(stderr, "signal frame overwrote parent usp!\n");
    fprintf(stderr, "frame end : 0x%lx\n", end_frame);
    fprintf(stderr, "frame start : 0x%lx\n", frame);
    fprintf(stderr, "handler usp : 0x%lx\n", usp);
    // fprintf(stderr, "signal vec : 0x%lx\n", vector);
    fprintf(stderr, "signal usp : 0x%lx\n", ctx->sc_usp);
    fprintf(stderr, "signal pc : 0x%lx\n", ctx->sc_pc);
    fprintf(stderr, "signal fmtv : 0x%lx\n", ctx->sc_formatvec);
    fprintf(stderr, "\n");
    }
    }

    static void rthandler(int signo, siginfo_t *info, void *ucontext)
    {
    ucontext_t *ctx = (ucontext_t *) ucontext;
    unsigned long *frame = (unsigned long *)((unsigned long) &signo - 4);

    if (signo == SIGCHLD) {
    unsigned long usp, end_frame;
    int i;

    asm( " move.l %%sp, %0\n"
    : "=m" (usp)
    :
    : "memory"
    );
    fprintf(stderr, "stack depth : %d\n", max_depth - depth);
    fprintf(stderr, "retn. depth : %d\n", max_depth - rdepth);
    fprintf(stderr, "parent usp : 0x%lx\n", parent_usp);
    /* top of signal stack - extra size for exception
    * frames longer than four words follows uc_filler[54]
    */
    // end_frame = (unsigned long) info + sizeof(struct siginfo) + sizeof(struct ucontext);
    end_frame = (unsigned long) ucontext + sizeof(struct ucontext);
    if (end_frame > parent_usp)
    fprintf(stderr, "signal stack overwrote parent usp!\n");
    fprintf(stderr, "frame end : 0x%lx\n", end_frame);
    fprintf(stderr, "frame start : 0x%lx\n", frame);
    fprintf(stderr, "handler usp : 0x%lx\n", usp);
    fprintf(stderr, "signal usp : 0x%lx\n", ctx->uc_mcontext.gregs[15]);
    fprintf(stderr, "signal pc : 0x%lx\n", ctx->uc_mcontext.gregs[16]);
    fprintf(stderr, "signal fmtv : 0x%lx\n", ctx->uc_filler[54]); #if 0
    /* dump frame contents not used for exception frame extra */
    if (frame_extra_sizes(ctx->uc_filler[54] >> 12) <= 0)
    for (i=0; i < 26; i++) {
    if ( !(i & 3) ) fprintf(stderr, "\nextra %d\t", i);
    fprintf(stderr, "0x%lx\t", ctx->uc_filler[55+i]);
    }
    #endif
    fprintf(stderr, "\n");
    #if 0
    /* dump stored registers */
    fprintf(stderr, "signal stack: 0x%lx\n", ctx->uc_stack.ss_sp);
    fprintf(stderr, "signal usp : 0x%lx\n", ctx->uc_mcontext.gregs[15]);
    fprintf(stderr, "signal pc : 0x%lx\n", ctx->uc_mcontext.gregs[16]);
    fprintf(stderr, "signal f.d1 : 0x%lx\n", ctx->uc_filler[41]);
    fprintf(stderr, "signal f.d2 : 0x%lx\n", ctx->uc_filler[42]);
    fprintf(stderr, "signal f.d3 : 0x%lx\n", ctx->uc_filler[43]);
    fprintf(stderr, "signal f.d4 : 0x%lx\n", ctx->uc_filler[44]);
    fprintf(stderr, "signal f.d5 : 0x%lx\n", ctx->uc_filler[45]);
    fprintf(stderr, "signal f.a0 : 0x%lx\n", ctx->uc_filler[46]);
    fprintf(stderr, "signal f.a1 : 0x%lx\n", ctx->uc_filler[47]);
    fprintf(stderr, "signal f.a2 : 0x%lx\n", ctx->uc_filler[48]);
    fprintf(stderr, "signal f.d0 : 0x%lx\n", ctx->uc_filler[49]);
    fprintf(stderr, "signal f.od0: 0x%lx\n", ctx->uc_filler[50]);
    fprintf(stderr, "signal f.sad: 0x%lx\n", ctx->uc_filler[51]);
    fprintf(stderr, "signal f.sr : 0x%lx\n", ctx->uc_filler[52]);
    fprintf(stderr, "signal f.pc : 0x%lx\n", ctx->uc_filler[53]);
    fprintf(stderr, "signal a2 : 0x%lx\n", ctx->uc_mcontext.gregs[10]);
    fprintf(stderr, "signal a3 : 0x%lx\n", ctx->uc_mcontext.gregs[11]);
    fprintf(stderr, "signal a4 : 0x%lx\n", ctx->uc_mcontext.gregs[12]);
    fprintf(stderr, "signal a5 : 0x%lx\n", ctx->uc_mcontext.gregs[13]);
    fprintf(stderr, "signal d2 : 0x%lx\n", ctx->uc_mcontext.gregs[2]);
    fprintf(stderr, "signal d3 : 0x%lx\n", ctx->uc_mcontext.gregs[3]);
    fprintf(stderr, "signal d4 : 0x%lx\n\n", ctx->uc_mcontext.gregs[4]);
    #endif
    }
    }

    int main(int use_rt)

    {
    struct sigaction act;
    stack_t ss;

    use_rt--;
    /* poor man's getopt - use none, one or two args */
    fprintf(stderr, "use_rt = %d\n", use_rt);

    ss.ss_sp = malloc(SIGSTKSZ);
    if (ss.ss_sp == NULL) {
    perror("malloc");
    exit(EXIT_FAILURE);
    }

    ss.ss_size = SIGSTKSZ;
    ss.ss_flags = 0;

    if (use_rt & 2)
    fprintf(stderr, "alt stack 0x%lx - 0x%lx\n", ss.ss_sp, ss.ss_sp + ss.ss_size);

    if (sigaltstack(&ss, NULL) == -1) {
    perror("sigaltstack");
    exit(EXIT_FAILURE);
    }

    memset(&act, 0, sizeof(act));
    sigemptyset(&act.sa_mask);

    if (use_rt) {
    act.sa_flags = SA_SIGINFO;
    if (use_rt & 2)
    act.sa_flags |= SA_ONSTACK;
    fprintf(stderr, "using sa_sigaction\n");
    act.sa_sigaction = rthandler;
    } else {
    fprintf(stderr, "using sa_handler\n");
    act.sa_handler = handler;
    }

    if (sigaction(SIGCHLD, &act, NULL) == -1) {
    perror("sigaction");
    exit(EXIT_FAILURE);
    }

    rec();
    }

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Mon Apr 24 07:50:01 2023
    Hi Finn,

    Am 24.04.2023 um 15:51 schrieb Michael Schmitz:
    Hi Andreas,

    On 24/04/23 09:48, Andreas Schwab wrote:
    On Apr 24 2023, Michael Schmitz wrote:

    Not sure what third argument you referred to in another mail.
    See struct sigframe and struct rt_sigframe. The non-rt signal handler
    gets signal number, vector number and sigcontext*. The rt signal
    handler gets signal number, siginfo* and ucontext*.

    Thanks, I see now. Got confused by the sigaction man page (despite
    working out that it's all there on the stack before...). Might need a
    comment in the code (or an update to the man pages).

    I've rewritten my test program to make the non-rt handler take three arguments, just to simplify things. Also fixed the end of signal frame calculation for the non-rt case where the exception places additional
    data on the stack.

    Running with the non-rt handler, the crash appears to happen right at
    the end of the recursion (or at least, I take no further SIGCHLD on the
    way back up the stack). With the rt handler, I see the stack depth
    decreased on the last signal taken before the crash.

    When I enable dumping the extra exception frame contents (which will
    show prior stack contents when the exception only used a four-word
    frame) in the rt handler case, I only see saved register data placed
    there at the very end. That's different from previous tests where I saw
    the saved register patterns all the time. (but might have had the
    offsets wrong).

    I'll see what peeking at the registers shows (now that I can be
    confident I have got the offsets correct).

    Last two signals received before core dump below:

    stack depth : 197952
    retn. depth : 200000
    parent usp : 0xef933eb8
    signal stack overwrote parent usp!
    frame end : 0xef933f1c
    frame start : 0xef933bf8
    handler usp : 0xef933be0
    signal usp : 0xef933ea8
    signal pc : 0xc009f37a

    signal stack: 0x80004050
    signal usp : 0xef933ea8
    signal pc : 0xc009f37a
    signal f.d1 : 0x0
    signal f.d2 : 0x0
    signal f.d3 : 0x0
    signal f.d4 : 0x0
    signal f.d5 : 0x0
    signal f.a0 : 0x0
    signal f.a1 : 0x0
    signal f.a2 : 0x0
    signal f.d0 : 0x0
    signal f.od0: 0x0
    signal f.sad: 0x0
    signal f.sr : 0x0
    signal f.pc : 0x0
    signal fmtv : 0x80
    signal a2 : 0x91929394
    signal a3 : 0xa1a2a3a4
    signal a4 : 0xb1b2b3b4
    signal a5 : 0xc0104e0c
    signal d2 : 0xd1d2d3d4
    signal d3 : 0xe1e2e3e4
    signal d4 : 0xf1f2f3f4

    stack depth : 200000
    retn. depth : 193942
    parent usp : 0xef921edc
    signal stack overwrote parent usp!
    frame end : 0xef95733c
    frame start : 0xef957018
    handler usp : 0xef957000
    signal usp : 0xef9572c4
    signal pc : 0x8000073a

    signal stack: 0x80004050
    signal usp : 0xef9572c4
    signal pc : 0x8000073a
    signal f.d1 : 0xa1a2a3a4
    signal f.d2 : 0xb1b2b3b4
    signal f.d3 : 0xc1c2c3c4
    signal f.d4 : 0xef957250
    signal f.d5 : 0x80000722
    signal f.a0 : 0xd1d2d3d4
    signal f.a1 : 0xe1e2e3e4
    signal f.a2 : 0xf1f2f3f4
    signal f.d0 : 0x91929394
    signal f.od0: 0xa1a2a3a4
    signal f.sad: 0xb1b2b3b4
    signal f.sr : 0xc1c2c3c4
    signal f.pc : 0xef957274
    signal fmtv : 0x114
    signal a2 : 0x91929394
    signal a3 : 0xa1a2a3a4
    signal a4 : 0xb1b2b3b4
    signal a5 : 0xc1c2c3c4
    signal d2 : 0xd1d2d3d4
    signal d3 : 0xe1e2e3e4
    signal d4 : 0xf1f2f3f4

    Illegal instruction (core dumped)

    What I thought would hold the copies of pt_regs and switch_stack remains
    empty while on the way down the stack, and is filled with saved
    registers when on the way back up.

    The registers as saved on exception, and copied to the mcontext struct,
    are correct in all cases, except for a5 which often contains 0xc0104e0c. Whenever that happens, a trap exception did get us into kernel mode. On
    both page faults and interrupts, a5 is as stored there (0xc1c2c3c4).

    usp stored in the signal frame (i.e. usp when the signal was taken) is
    0x10 below that saved at the start of rec(). That's not enough to store
    7 registers plus return pc...

    I still seem to have the sigframe end address wrong - it's larger than
    both parent usp and usp stored in the sigframe.

    Still baffled.

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Tue Apr 25 02:00:01 2023
    On Mon, 24 Apr 2023, Michael Schmitz wrote:

    I don't understand these results. If usp was really overwritten, the program would have crashed early, no?

    I think we're still at the point where rec() is called recursively,
    before any returns.

    Right. I wasn't thinking.

    I'll try to confirm that each "overwrote usp" error from movemlrt.c
    corresponds to visible corruption at the given address in the core dump.

    Exception right before crash was an interrupt in this case (only seen
    that once in this context, though I've seen lots of those in the
    course of the test runs). Frame start calculated from siginfo pointer
    value in this case.

    I didn't realize that you could get a crash from a signal delivered following an interrupt. I'll try to modify the kernel such that
    signals are not delivered after page faults.

    Yes, that was news to me, too.


    That seems to be a mistake (?) I didn't see any failures when I patched
    the kernel to skip signal delivery after a page fault.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Tue Apr 25 07:20:01 2023
    Hi Andreas,

    Am 22.04.2023 um 22:12 schrieb Andreas Schwab:
    On Apr 22 2023, Michael Schmitz wrote:

    This is the definition from the kernel's
    include/uapi/asm-generic/ucontext.h:

    That's not actually used by m68k, it uses
    arch/m68k/include/asm/ucontext.h, which confusingly isn't an uapi
    header.

    As Stan and Finn report, it is actually used by recent m68k gcc version
    12.2.0 (Debian 12.2.0-12):

    Debian/sid has this struct in asm-generic/ucontext.h --

    struct ucontext {
    unsigned long uc_flags;
    struct ucontext *uc_link;
    stack_t uc_stack;
    struct sigcontext uc_mcontext;
    sigset_t uc_sigmask; /* mask last for extensibility */
    };

    but that lacks uc_filler[] so it's not ideal. I adapted the definitions
    from the kernel source.

    Time to move this into an uapi header, or is that more hassle than it's
    worth?

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)