Forum: >>> Magnum BBS <<<

dump, restore

From Stan Johnson@21:1/5 to All on Sat Aug 3 22:50:01 2024

Hello,

It appears that dump is not working in Debian SID m68k (including in QEMU):

root@m68k:/data# dump 0sbf 999999 64 dumpfile.dmp /dev/sda5
DUMP: Date of this level 0 dump: Sat Aug 3 14:41:04 2024
DUMP: Dumping /dev/sda5 (an unlisted file system) to dumpfile.dmp
DUMP: Label: Gentoo-m68k
DUMP: Writing 64 Kilobyte records
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 2423885 blocks on 0.13 tape(s).
DUMP: Context save fork fails in parent 912

Please let me know whether this error is already known, or if I should
submit a bug report to Debian.

As a workaround, dump works on other platforms (such as x86_64), so it's
possible to use those platforms to back up m68k filesystems.

thanks

-Stan Johnson

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Finn Thain@21:1/5 to Stan Johnson on Sun Aug 4 01:40:01 2024

On Sat, 3 Aug 2024, Stan Johnson wrote:

root@m68k:/data# dump 0sbf 999999 64 dumpfile.dmp /dev/sda5
...
DUMP: Context save fork fails in parent 912

The manpage says,

Each reel requires a new process, so parent processes for reels
already written just hang around until the entire tape is written.

So I'd expect fork failures if the number of reels was too high. Have you
tried the default "auto-size" tape length instead of passing -s 999999?

-a
auto-size. Bypass all tape length calculations, and write until an
end-of-media indication is returned. This works best for most
modern tape drives, and is the default. Use of this option is
particularly recommended when appending to an existing tape, or
using a tape drive with hardware compression (where you can never
be sure about the compression ratio).

If auto-size works, that might indicate a bug in the tape length
calculations that would need to be reported.

From Finn Thain@21:1/5 to Stan Johnson on Fri Aug 9 13:20:01 2024

On Sat, 3 Aug 2024, Stan Johnson wrote:

Using "-a" appears to be a better option than just specifying a really
long tape size. Unfortunately, it also doesn't work. The problem seems
to affect only m68k; ppc-32, ppc-64, x86-32 and x86-64 all work as expected...

I reproduced the problem in QEMU and found it went away when I ran dump
under Linux v5.6. So I went through a lot of "git bisect" steps and the
culprit appears to be commit ef2c41cf38a7 ("clone3: allow spawning
processes into cgroups"). That seems plausible, since we are seeing an
error from fork_clone_io() below...

#ifdef __linux__
#if defined(SYS_clone) && defined(CLONE_IO)
pid_t
fork_clone_io(void)
{
	return syscall(SYS_clone, CLONE_ARGS);
}
#endif
#endif

That code bypasses the C library so I suppose it's not too surprising that different architectures give different results...
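
For readers unfamiliar with the pattern, here is a minimal sketch of what
"bypassing the C library" looks like: the flags go straight to syscall(2)
instead of through fork() or the glibc clone() wrapper. It assumes that
CLONE_ARGS amounts to CLONE_IO|SIGCHLD followed by zeroed arguments (dump's
actual macro is not quoted here), and the names raw_clone_io() and
run_child_via_raw_clone() are made up for illustration.

```c
#define _GNU_SOURCE
#include <sched.h>        /* CLONE_IO */
#include <signal.h>       /* SIGCHLD */
#include <sys/syscall.h>  /* SYS_clone */
#include <sys/types.h>
#include <sys/wait.h>     /* waitpid(), WIFEXITED() */
#include <unistd.h>       /* syscall(), _exit() */

/*
 * Fork-like raw clone (hypothetical helper). Assumption: stack and the
 * tid/tls pointers are all zero, so their architecture-specific order
 * does not matter. Architectures that put the stack argument first
 * (s390, for example) are not handled by this sketch.
 */
pid_t raw_clone_io(void)
{
    unsigned long flags = (unsigned long)(CLONE_IO | SIGCHLD);

    return (pid_t)syscall(SYS_clone, flags, 0UL, 0UL, 0UL, 0UL);
}

/*
 * Hypothetical helper: clone a child that exits at once with
 * child_status, and return the status the parent collects (-1 on error).
 */
int run_child_via_raw_clone(int child_status)
{
    int status = 0;
    pid_t pid = raw_clone_io();

    if (pid < 0)
        return -1;               /* errno holds the reason, e.g. EBADF */
    if (pid == 0)
        _exit(child_status);     /* child: leave immediately */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

On a kernel without the problem, run_child_via_raw_clone(7) should return 7.
On an affected m68k kernel the clone would be expected to fail instead, so
the helper would return -1 with errno set to EBADF.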

Anyway, if I run dump under strace I see no CLONE_INTO_CGROUP flag:

clone(child_stack=NULL, flags=CLONE_IO|SIGCHLD) = -1 EBADF (Bad file descriptor)

The -EBADF result was introduced into cgroup_css_set_fork() by the commit
above. That should not happen unless CLONE_INTO_CGROUP was set, but strace
says it's not. So I don't know what's going on here.

From Finn Thain@21:1/5 to All on Sat Aug 10 03:10:01 2024

Here's what gdb says, FWIW...

# gdb
GNU gdb (Debian 13.1-3) 13.1
...
(gdb) file /usr/sbin/dump
Reading symbols from /usr/sbin/dump...
Reading symbols from /usr/lib/debug/.build-id/24/071a827207bee9c025d364137514447279302b.debug...
(gdb) run -0f /dev/null /dev/sda
Starting program: /usr/sbin/dump -0f /dev/null /dev/sda
DUMP: Date of this level 0 dump: Fri Aug 9 23:37:15 2024
DUMP: Dumping /dev/sda (an unlisted file system) to /dev/null
DUMP: Label: none
DUMP: Writing 10 Kilobyte records
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 3595695 blocks.
DUMP: Context save fork fails in parent 671
[Inferior 1 (process 671) exited with code 03]
(gdb) b fork_clone_io
Breakpoint 1 at 0x80009dbc: file tape.c, line 740.
(gdb) run -0f /dev/null /dev/sda
Starting program: /usr/sbin/dump -0f /dev/null /dev/sda
DUMP: Date of this level 0 dump: Fri Aug 9 23:38:17 2024
DUMP: Dumping /dev/sda (an unlisted file system) to /dev/null
DUMP: Label: none
DUMP: Writing 10 Kilobyte records
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 3595695 blocks.

Program received signal SIGSEGV, Segmentation fault.
0x00000001 in ?? ()
(gdb) l fork_clone_io
warning: Source file is more recent than executable.
735
736 #ifdef __linux__
737 #if defined(SYS_clone) && defined(CLONE_IO)
738 pid_t
739 fork_clone_io(void)
740 {
741 pid_t res,parent;
742 parent=getppid(); /* az hackety hack... */
743
744 res=syscall(SYS_clone, CLONE_ARGS);
745 getppid();
746 /* as per clone call manpage: caching! */
747 getpid();
748 #ifdef __alpha__
749 syscall(SYS_getxpid);
750 #else
751 syscall(SYS_getpid);
752 #endif
753
754 /* az: clone manpage doesn't say jack about what the
(gdb) disas fork_clone_io
Dump of assembler code for function fork_clone_io:
0x80009dbc <+0>: movel %d3,%sp@-
0x80009dbe <+2>: movel %d2,%sp@-
0x80009dc0 <+4>: bsrl 0x80004200 <getppid@plt>
0x80009dc6 <+10>: movel %d0,%d3
0x80009dc8 <+12>: clrl %sp@-
0x80009dca <+14>: clrl %sp@-
0x80009dcc <+16>: clrl %sp@-
0x80009dce <+18>: movel #-2147483631,%sp@-
0x80009dd4 <+24>: pea 0x78
0x80009dd8 <+28>: bsrl 0x80003fd0 <syscall@plt>
0x80009dde <+34>: movel %d0,%d2
0x80009de0 <+36>: bsrl 0x80004200 <getppid@plt>
0x80009de6 <+42>: bsrl 0x80003c9c <getpid@plt>
0x80009dec <+48>: pea 0x14
0x80009df0 <+52>: bsrl 0x80003fd0 <syscall@plt>
0x80009df6 <+58>: bsrl 0x80004200 <getppid@plt>
0x80009dfc <+64>: lea %sp@(24),%sp
0x80009e00 <+68>: cmpl %d0,%d3
0x80009e02 <+70>: beqs 0x80009e06 <fork_clone_io+74>
0x80009e04 <+72>: clrl %d2
0x80009e06 <+74>: movel %d2,%d0
0x80009e08 <+76>: movel %sp@+,%d2
0x80009e0a <+78>: movel %sp@+,%d3
0x80009e0c <+80>: rts
End of assembler dump.
(gdb)

Is this clone syscall (0x78) really executing sys_clone3()? Also,
-2147483631 == CLONE_IO|SIGCHLD like strace said.
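
The arithmetic is easy to double-check. The sketch below assumes the usual
Linux values CLONE_IO = 0x80000000 and SIGCHLD = 17 (SIGCHLD differs on a few
architectures, such as MIPS and Alpha). It rebuilds the flags word and reads
it back as a signed 32-bit integer, which is how gdb prints the immediate
operand. The function names are illustrative only.

```c
#define _GNU_SOURCE
#include <sched.h>    /* CLONE_IO */
#include <signal.h>   /* SIGCHLD */
#include <stdint.h>

/* The flags word as a 32-bit pattern: CLONE_IO | SIGCHLD. */
uint32_t clone_io_flags_u32(void)
{
    return (uint32_t)CLONE_IO | (uint32_t)SIGCHLD;
}

/*
 * The same 32 bits read as a signed integer. The conversion of an
 * out-of-range value is implementation-defined in C; GCC and Clang
 * wrap modulo 2^32, which is what this sketch relies on.
 */
int32_t clone_io_flags_s32(void)
{
    return (int32_t)clone_io_flags_u32();
}
```

With those values, clone_io_flags_u32() is 0x80000011 and
clone_io_flags_s32() is -2147483631, matching the movel operand. The
syscall number pushed by "pea 0x78" is 120 in decimal.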

And why does it crash when I set a breakpoint?

From Finn Thain@21:1/5 to Michael Schmitz on Sat Aug 10 07:50:01 2024

On Sat, 10 Aug 2024, Michael Schmitz wrote:

Anyway, if I run dump under strace I see no CLONE_INTO_CGROUP flag:

strace may not be aware of the CLONE_INTO_CGROUP flag yet? How old is
your strace binary?

I don't think strace is the problem. If it was, we should still see all
the flags in the disassembly, in the constant passed to the syscall.

clone(child_stack=NULL, flags=CLONE_IO|SIGCHLD) = -1 EBADF (Bad file descriptor)

The -EBADF result was introduced into cgroup_css_set_fork() by the
commit above. That should not happen unless CLONE_INTO_CGROUP was set,
but strace says it's not. So I don't know what's going on here.

Is this clone syscall (0x78) really executing sys_clone3()? Also,

Nope, syscall no. 120 calls __sys_clone() which in turn calls
m68k_clone() which emulates sys_clone() (roundabout way due to different calling conventions on m68k).

clone3 is syscall 435 (calling __sys_clone3() -> m68k_clone3() -> sys_clone3()).

What confused me was that 'git bisect' fingered what looked like a clone3 patch, but it turns out that this patch affects anything that calls cgroup_can_fork(), that is, any syscalls that call copy_process().

But as long as syscall() takes care of the calling convention, I see no reason why that way of calling sys_clone() would fail.

The interesting thing about the calling convention is that the flags make
up a 32-bit quantity when passed to clone as an int, and a 64-bit quantity
when passed to clone3 as struct clone_args.flags.

So I've just added some printk() statements and found that m68k_clone()
messed up the flags in the kernel_clone_args struct: I'm seeing 0xFFFFFFFF80000000 which explains how CLONE_INTO_CGROUP got set.

I'll send a patch.
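
A value like 0xFFFFFFFF80000000 is what sign extension would produce. The
sketch below is a guess at the mechanism, not the kernel code or the patch:
it assumes the 32-bit flags register is widened to 64 bits through a signed
type and that the low signal byte (CSIGNAL) is masked off before the value is
stored. The constants are restated from the Linux UAPI headers with an X_
prefix, and the function names are hypothetical.

```c
#include <stdint.h>

/* Restated locally so the sketch does not depend on kernel headers. */
#define X_CLONE_IO          0x80000000ULL   /* bit 31 */
#define X_CLONE_INTO_CGROUP 0x200000000ULL  /* bit 33, clone3 only */
#define X_CSIGNAL           0x000000ffULL   /* exit-signal mask */

/* Widening through a signed 32-bit type copies bit 31 into bits 32..63. */
uint64_t widen_signed(uint32_t reg)
{
    return (uint64_t)(int64_t)(int32_t)reg & ~X_CSIGNAL;
}

/* Widening through an unsigned type zero-fills the upper half instead. */
uint64_t widen_unsigned(uint32_t reg)
{
    return (uint64_t)reg & ~X_CSIGNAL;
}

/* Nonzero when the widened flags would look like a cgroup request. */
int has_into_cgroup(uint64_t flags)
{
    return (flags & X_CLONE_INTO_CGROUP) != 0;
}
```

For the register value 0x80000011 (CLONE_IO|SIGCHLD), widen_signed() gives
0xFFFFFFFF80000000, which has the CLONE_INTO_CGROUP bit set, while
widen_unsigned() gives 0x80000000, which does not.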

