• Bug#1110249: podman corrupted its internal state or something?

    From Ian Jackson@21:1/5 to All on Sat Aug 2 00:20:01 2025
    Package: podman
    Version: 4.3.1+ds1-8+deb12u1+b1

    The tag2upload system is using podman on tag2upload-builder-01 to
    isolate one source package build from the previous one. Recently the
    service failed. I investigated and found that podman was saying this:

    ERRO[0000] invalid internal status, try resetting the pause process with "podman system migrate": could not find any running process: no such process

    I don't know such an invalid internal state could have occurred. The
    machine is a DSA-managed host and the account is `tag2upload-builder`,
    a service account.

    The containers are used exclusively (at least in normal operation) via autopkgtest-virt-podman. They are created with a custom script, but
    that runs weekly, but on Sundays, so doesn't seem implicated. I
    haven't been doing any admin work that might be related and I think
    Sean hasn't either.

    The cause doesn't seem to have been a podman update either.
    /var/log/dpkg.log* on the affected system shows no change to podman
    since 2025-05-17.

    I believe our containers are running in a "rootless" mode. We can
    probably provide more information about the configuration if you can
    tell us how to obtain it.

    Without really knowing anything about what was wrong, but wanting to
    restore service, I ran the command suggested:

    tag2upload-builder@tag2upload-builder-01:~$ podman container ls
    ERRO[0000] invalid internal status, try resetting the pause process with "podman system migrate": could not find any running process: no such process
    tag2upload-builder@tag2upload-builder-01:~$ podman system migrate
    stopped 714e0e504f2ccc42e7e31222ca20ccbe1fa68bd985b9d980a80f0240e8ae8c8b stopped dc6da82beaa9ff6c0f70d8854c18332a7a4875228f8d6771dc00aa36569148b4 tag2upload-builder@tag2upload-builder-01:~$ podman container ls
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES tag2upload-builder@tag2upload-builder-01:~$ podman image ls
    REPOSITORY TAG IMAGE ID CREATED SIZE
    localhost/autopkgtest/debian bookworm 9e23b4f3cdfe 43 hours ago 985 MB
    localhost/autopkgtest/amd64/debian bookworm 9e23b4f3cdfe 43 hours ago 985 MB
    <none> <none> 54738866ce8a 8 days ago 985 MB
    <none> <none> d9dc91a8bf2c 8 weeks ago 981 MB
    tag2upload-builder@tag2upload-builder-01:~$

    Now the service is working again. But, I don't think it should have
    broken down.

    I don't have much more information afraid. I do have a ps listing of
    the system in the failed state. If it were to happen again, and we
    knew what other information to collect, we could do that.

    Thanks to Colin Watson for the original outage report.

    Ian.

    iwj@tag2upload-builder-01:~$ ps -efH
    UID PID PPID C STIME TTY TIME CMD
    root 2 0 0 May30 ? 00:00:00 [kthreadd]
    root 3 2 0 May30 ? 00:00:00 [rcu_gp]
    root 4 2 0 May30 ? 00:00:00 [rcu_par_gp]
    root 5 2 0 May30 ? 00:00:00 [slub_flushwq]
    root 6 2 0 May30 ? 00:00:00 [netns]
    root 10 2 0 May30 ? 00:00:00 [mm_percpu_wq]
    root 11 2 0 May30 ? 00:00:00 [rcu_tasks_kthread]
    root 12 2 0 May30 ? 00:00:00 [rcu_tasks_rude_kthread] root 13 2 0 May30 ? 00:00:00 [rcu_tasks_trace_kthread] root 14 2 0 May30 ? 00:03:42 [ksoftirqd/0]
    root 15 2 0 May30 ? 00:21:58 [rcu_preempt]
    root 16 2 0 May30 ? 00:00:21 [migration/0]
    root 18 2 0 May30 ? 00:00:00 [cpuhp/0]
    root 20 2 0 May30 ? 00:00:00 [kdevtmpfs]
    root 21 2 0 May30 ? 00:00:00 [inet_frag_wq]
    root 22 2 0 May30 ? 00:00:00 [kauditd]
    root 23 2 0 May30 ? 00:00:01 [khungtaskd]
    root 24 2 0 May30 ? 00:00:00 [oom_reaper]
    root 27 2 0 May30 ? 00:00:00 [writeback]
    root 28 2 0 May30 ? 00:03:54 [kcompactd0]
    root 29 2 0 May30 ? 00:00:00 [ksmd]
    root 30 2 0 May30 ? 00:00:46 [khugepaged]
    root 31 2 0 May30 ? 00:00:00 [kintegrityd]
    root 32 2 0 May30 ? 00:00:00 [kblockd]
    root 33 2 0 May30 ? 00:00:00 [blkcg_punt_bio]
    root 34 2 0 May30 ? 00:00:00 [tpm_dev_wq]
    root 35 2 0 May30 ? 00:00:00 [edac-poller]
    root 36 2 0 May30 ? 00:00:00 [devfreq_wq]
    root 38 2 0 May30 ? 00:01:24 [kswapd0]
    root 44 2 0 May30 ? 00:00:00 [kthrotld]
    root 46 2 0 May30 ? 00:00:00 [acpi_thermal_pm]
    root 48 2 0 May30 ? 00:00:00 [mld]
    root 49 2 0 May30 ? 00:00:00 [ipv6_addrconf]
    root 54 2 0 May30 ? 00:00:00 [kstrp]
    root 59 2 0 May30 ? 00:00:00 [zswap-shrink]
    root 60 2 0 May30 ? 00:00:00 [kworker/u3:0]
    root 127 2 0 May30 ? 00:00:00 [ata_sff]
    root 128 2 0 May30 ? 00:00:00 [scsi_eh_0]
    root 129 2 0 May30 ? 00:00:00 [scsi_tmf_0]
    root 130 2 0 May30 ? 00:00:00 [scsi_eh_1]
    root 131 2 0 May30 ? 00:00:00 [scsi_tmf_1]
    root 154 2 0 May30 ? 00:04:14 [jbd2/vda1-8]
    root 155 2 0 May30 ? 00:00:00 [ext4-rsv-conver]
    root 166 2 0 May30 ? 00:00:02 [hwrng]
    root 277 2 0 May30 ? 00:00:00 [cryptd]
    root 1313 2 0 May30 ? 00:00:00 [tls-strp]
    root 1596194 2 0 Jul30 ? 00:00:07 [kworker/0:2H-kblockd] root 1999667 2 0 19:27 ? 00:00:00 [kworker/0:0H-kblockd] root 2017297 2 0 21:10 ? 00:00:00 [kworker/u2:0-events_unbound]
    root 2020032 2 0 21:27 ? 00:00:00 [kworker/0:1-ata_sff] root 2021391 2 0 21:34 ? 00:00:00 [kworker/u2:1-flush-254:0]
    root 2021805 2 0 21:38 ? 00:00:00 [kworker/0:2-cgroup_destroy]
    root 2022259 2 0 21:39 ? 00:00:00 [kworker/u2:2-flush-254:0]
    root 2022639 2 0 21:43 ? 00:00:00 [kworker/0:0-rcu_gp]
    root 2023028 2 0 21:43 ? 00:00:00 [kworker/u2:3-ext4-rsv-conversion]
    root 2023029 2 0 21:43 ? 00:00:00 [kworker/u2:4-events_unbound]
    root 2023030 2 0 21:43 ? 00:00:00 [kworker/u2:5]
    root 2023330 2 0 21:46 ? 00:00:00 [kworker/0:1H]
    root 2023831 2 0 21:47 ? 00:00:00 [kworker/0:3-events]
    root 1 0 0 May30 ? 00:07:40 /lib/systemd/systemd --system --deserialize=35
    root 378 1 0 May30 ? 00:00:17 /usr/sbin/cron -f message+ 379 1 0 May30 ? 00:01:21 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
    daemon 391 1 0 May30 ? 00:00:00 /usr/sbin/uptimed -f
    ulog 392 1 0 May30 ? 00:02:13 /usr/sbin/ulogd --daemon --uid ulog --pidfile /run/ulog/ulogd.pid
    daemon 394 1 0 May30 ? 00:00:00 /usr/sbin/atd -f
    root 585 1 0 May30 ttyS0 00:00:00 /sbin/agetty -o -p -- \u --keep-baud 115200,57600,38400,9600 - vt220
    root 599 1 0 May30 ? 00:03:59 /usr/bin/perl -wT /usr/sbin/munin-node --foreground
    root 646 1 0 May30 tty1 00:00:00 /sbin/agetty -o -p -- \u --noclear - linux
    munin-a+ 884 1 0 May30 ? 00:03:39 munin-asyncd [tag2upload-builder-01.debian.org] [idle]
    iwj 3911059 1 0 Jul19 ? 00:00:00 SCREEN
    iwj 3911060 3911059 0 Jul19 pts/3 00:00:00 /bin/bash
    root 3911078 3911060 0 Jul19 pts/3 00:00:00 sudo -u tag2upload-builder bash
    tag2upl+ 3911079 3911078 0 Jul19 pts/3 00:00:00 bash
    iwj 3911091 3911059 0 Jul19 pts/4 00:00:00 /bin/bash
    root 3911095 3911091 0 Jul19 pts/4 00:00:00 sudo -u tag2upload-builder bash
    tag2upl+ 3911113 3911095 0 Jul19 pts/4 00:00:00 bash
    root 1756030 1 0 Jul31 ? 00:00:02 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
    root 1757539 1756030 0 Jul31 ? 00:00:00 sshd: munin-async [priv]
    munin-a+ 1757562 1757539 0 Jul31 ? 00:00:03 sshd: munin-async@notty
    root 2023897 1756030 0 21:47 ? 00:00:00 sshd: iwj [priv]
    iwj 2023924 2023897 0 21:48 ? 00:00:00 sshd: iwj@pts/0
    iwj 2023940 2023924 0 21:48 pts/0 00:00:00 -bash
    iwj 2023947 2023940 0 21:48 pts/0 00:00:00 ps -efH
    root 1756238 1 0 Jul31 ? 00:00:00 /usr/sbin/ekeyd-egd-linux -r300 -D/var/run/ekeyd-egd-linux.pid
    unbound 1756262 1 0 Jul31 ? 00:00:00 /usr/sbin/unbound -d -p root 1756271 1 0 Jul31 ? 00:00:12 /usr/sbin/syslog-ng -F Debian-+ 1756298 1 0 Jul31 ? 00:00:00 /usr/sbin/exim4 -bd -q30m bacula 1756327 1 0 Jul31 ? 00:00:14 /usr/sbin/bacula-fd -c /etc/bacula/bacula-fd.conf -f -u bacula -k
    _lldpd 1756339 1 0 Jul31 ? 00:00:00 lldpd: monitor.
    _lldpd 1756342 1756339 0 Jul31 ? 00:00:00 lldpd: connected to ubc-node01.debian.org.
    nagios 1756365 1 0 Jul31 ? 00:00:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -f --no-ssl
    root 1756377 1 0 Jul31 ? 00:00:03 /usr/bin/python3 /usr/bin/ud-replicated -d
    tag2upl+ 1756898 1 0 Jul31 ? 00:00:00 /lib/systemd/systemd --user
    tag2upl+ 1756900 1756898 0 Jul31 ? 00:00:00 (sd-pam)
    root 1756926 1 0 Jul31 ? 00:00:09 /lib/systemd/systemd-logind
    systemd+ 1756969 1 0 Jul31 ? 00:00:00 /lib/systemd/systemd-timesyncd
    root 1756977 1 0 Jul31 ? 00:00:00 /lib/systemd/systemd-udevd
    nagios 1757005 1 0 Jul31 ? 00:00:16 /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2 --no-stack-rlimit daemon --close-stdio -e
    nagios 1757023 1757005 0 Jul31 ? 00:00:39 /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2 --no-stack-rlimit daemon --close-stdio -e
    nagios 1757026 1757023 0 Jul31 ? 00:00:05 /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2 --no-stack-rlimit daemon --close-stdio -e
    root 1757057 1 0 Jul31 ? 00:00:14 /lib/systemd/systemd-journald
    root 1757493 1 0 Jul31 ? 00:01:06 /usr/sbin/samhain
    root 1798554 1757493 0 00:00 ? 00:00:09 /usr/sbin/samhain munin-a+ 1757542 1 0 Jul31 ? 00:00:00 /lib/systemd/systemd --user
    munin-a+ 1757543 1757542 0 Jul31 ? 00:00:00 (sd-pam)
    stunnel4 1776071 1 0 Jul31 ? 00:00:00 /usr/bin/stunnel4 /etc/stunnel/puppet-ekeyd.conf
    iwj 2023904 1 0 21:48 ? 00:00:00 /lib/systemd/systemd --user
    iwj 2023905 2023904 0 21:48 ? 00:00:00 (sd-pam) iwj@tag2upload-builder-01:~$

    --
    Ian Jackson <[email protected]> These opinions are my own.

    Pronouns: they/he. If I emailed you from @fyvzl.net or @evade.org.uk,
    that is a private address which bypasses my fierce spamfilter.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)