• Bug#1109119: upgrade-reports: systemctl occasionally fails loading libc

    From Helmut Grohne@21:1/5 to All on Fri Jul 11 19:20:01 2025
    XPost: linux.debian.devel, linux.debian.devel.testing

    Package: upgrade-reports
    Severity: important
    X-Debbugs-Cc: [email protected],debian-release.debian.org,[email protected],[email protected],[email protected]

    Hello,

    while working on a bookworm -> trixie upgrade failure, I noticed a strange line showing up.

    | Preparing to unpack .../openssh-server_1%3a10.0p1-5_amd64.deb ...
    | systemctl: error while loading shared libraries: libcrypto.so.3: cannot open shared object file: No such file or directory

    This is openssh-server.preinst failing to systemctl stop
    rescue-ssh.service. I talked to Colin and both of us agreed that this
    instance is probably practically irrelevant. However, I still think
    there is a problem. Due to the time64 transition, libssl3 was renamed to
    libssl3t64 and for some reason apt ends up removing libssl3 before
    unpacking libssl3t64. Given Breaks+Replaces, unpacking libssl3t64 after
    having deconfigured libssl3 before removing libssl3 should be fine, but
    apt does not like that solution. As a result, libcrypto.so.3 is
    temporarily removed.
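
    The ordering argument above can be illustrated with a toy model. This
    is only a sketch and not how apt or dpkg actually plan a transaction:
    the step names, the position of openssh-server in the sequence, and the
    SHIPS_LIB set are assumptions made for illustration. It shows that
    removing libssl3 first opens a window without libcrypto.so.3, while
    deconfigure, unpack, remove does not.

```python
# Toy model of the two orderings discussed above. This is NOT how apt or
# dpkg plan transactions; step names and the package set are illustrative.

# Packages that ship libcrypto.so.3 (true for both libssl3 and libssl3t64).
SHIPS_LIB = {"libssl3", "libssl3t64"}


def missing_windows(steps, installed):
    """Return the steps after which libcrypto.so.3 is absent from disk.

    steps: list of (action, package) with action in
           {"deconfigure", "unpack", "remove"}.
    installed: set of packages whose files are on disk initially.
    """
    on_disk = set(installed)
    gaps = []
    for action, pkg in steps:
        if action == "unpack":
            on_disk.add(pkg)
        elif action == "remove":
            on_disk.discard(pkg)
        # "deconfigure" leaves files in place, so nothing changes on disk.
        if not (on_disk & SHIPS_LIB):
            gaps.append((action, pkg))
    return gaps


# Ordering resembling the report: libssl3 is removed first.
observed = [
    ("remove", "libssl3"),
    ("unpack", "openssh-server"),
    ("unpack", "libssl3t64"),
]
# Ordering described as fine: deconfigure, unpack replacement, then remove.
suggested = [
    ("deconfigure", "libssl3"),
    ("unpack", "libssl3t64"),
    ("remove", "libssl3"),
    ("unpack", "openssh-server"),
]

print(missing_windows(observed, {"libssl3"}))
print(missing_windows(suggested, {"libssl3"}))
```

    In the first ordering, any maintainer script that runs between the
    removal and the unpack (such as openssh-server.preinst) sees a system
    without the library.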

    deb-systemd-invoke is part of init-system-helpers and therefore
    essential. It calls out to systemctl, which is not essential, but for all
    practical purposes we really should treat it as if it were, and
    maintainer scripts expect it to work at all times. libssl3 or libssl3t64
    are pseudo-essential. Some part (apt or openssl) violates policy during
    the upgrade, as being pseudo-essential requires it to work at all times
    even when unpacked.

    In practice, this means that systemctl cannot be expected to work in
    maintainer scripts. This will mostly affect preinst scripts (not just
    openssh-server) trying to stop services. For instance, it is conceivable
    that we could fail to stop mariadb or postgresql due to this (but there
    is no practical evidence of this ever having happened). Failure to stop
    services violates assumptions made by package maintainers and that may
    have all sorts of consequences. I have several reports of systemctl
    having failed during release upgrades without having failed the upgrade
    transaction as a whole.

    It really is unclear whether this has practical consequences and whether
    there is a data loss scenario or something else that makes this problem
    practically relevant. We typically reboot after a dist upgrade (at least
    that's what release notes strongly recommend) and doing so tends to fix
    any failure to stop or start services. I have no evidence of this
    problem having caused a real issue (beyond that message).

    If you have already upgraded from bookworm to trixie, you should be able
    to search your /var/log/apt/term.log* for the message above to see
    whether you were affected.
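
    A minimal sketch of such a search, assuming that rotated logs carry a
    .gz suffix and that the loader message always contains the phrase
    "error while loading shared libraries". The function names are
    hypothetical and only serve this example.

```python
import glob
import gzip

NEEDLE = "error while loading shared libraries"


def matching_lines(lines):
    """Return the lines (without trailing newline) containing the message."""
    return [line.rstrip("\n") for line in lines if NEEDLE in line]


def scan_term_logs(pattern="/var/log/apt/term.log*"):
    """Scan current and rotated apt terminal logs; returns {path: [lines]}."""
    hits = {}
    for path in sorted(glob.glob(pattern)):
        # Assumption: rotated logs are gzip-compressed and end in ".gz".
        opener = gzip.open if path.endswith(".gz") else open
        try:
            with opener(path, "rt", errors="replace") as handle:
                found = matching_lines(handle)
        except (OSError, EOFError):
            continue  # unreadable or damaged file, e.g. missing permissions
        if found:
            hits[path] = found
    return hits


if __name__ == "__main__":
    for path, lines in scan_term_logs().items():
        for line in lines:
            print(f"{path}: {line}")
```

    Reading these logs usually requires root or membership in the adm
    group. Any line mentioning systemctl or libcrypto.so.3 is worth
    reporting.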

    In talking to Ivo and Paul, we agreed to report the problem to d-devel
    via upgrade-reports. At this stage we want to gauge the impact and
    better understand how serious this actually is, so following up on the
    bug report with evidence (dropping all lists if that's all you add) is
    highly appreciated.

    The options for fixing this are dim. Reverting the t64 transition for
    openssl and going dual-ABI seems highly unlikely even though it would
    fix the problem at the root. The other options look poor as well,
    because we have no scripts that are guaranteed to run before apt
    chooses to remove bookworm's libssl3. We considered making changes to
    bookworm to mitigate this. Conceivably, a bookworm update could add a
    libssl3.preinst that diverts the library to keep it around until it is
    overwritten by libssl3t64.

    I invite others to work on the problem as I have no capacity to do it.
    I'm still yak shaving another release upgrade problem and would like to
    enjoy DebConf. Thank you.

    Helmut

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Helmut Grohne@21:1/5 to Chris Hofstaedtler on Tue Jul 15 00:30:01 2025
    XPost: linux.debian.devel, linux.debian.devel.testing

    Hi Chris,

    On Sat, Jul 12, 2025 at 12:36:20AM +0200, Chris Hofstaedtler wrote:
    > I was somewhat hoping this is caused by the existing upgrade issue
    > which Helmut is working on AFAIU. Was it proven that this is not the
    > same problem?

    I believe this is independent and I merely caught it as I was carefully
    reading test results for your other issue. This particular one is solely
    caused by libssl3 having done a t64 transition when it was not
    appropriate to do so. I do not expect my proposed change to glibc to
    improve the situation regarding systemctl.

    Helmut

  • From Chris Hofstaedtler@21:1/5 to Helmut Grohne on Thu Jul 24 02:50:01 2025
    XPost: linux.debian.devel, linux.debian.devel.testing

    On Sun, Jul 13, 2025 at 10:29:25AM +0200, Helmut Grohne wrote:
    > Hi Chris,
    >
    > On Sat, Jul 12, 2025 at 12:36:20AM +0200, Chris Hofstaedtler wrote:
    > > I was somewhat hoping this is caused by the existing upgrade issue
    > > which Helmut is working on AFAIU. Was it proven that this is not the
    > > same problem?
    >
    > I believe this is independent and I merely caught it as I was carefully
    > reading test results for your other issue. This particular one is solely
    > caused by libssl3 having done a t64 transition when it was not
    > appropriate to do so. I do not expect my proposed change to glibc to
    > improve the situation regarding systemctl.

    Is this problem still happening?

    If so, are there any ideas on what to do about it?

    Chris

  • From Helmut Grohne@21:1/5 to Michael Biebl on Fri Jul 25 00:30:01 2025
    XPost: linux.debian.devel.testing

    Hi Michael,

    I appreciate that you chime in here.

    On Thu, Jul 24, 2025 at 11:18:31PM +0200, Michael Biebl wrote:
    > We are rather late in the trixie release cycle, so maybe the safest
    > approach is to revert 8ca2b266789bff0b1af348100e72da7245864174 for
    > trixie and re-apply it early during the forky release?

    Could you help me understand why reverting this commit would improve the
    symptoms of the bug report at hand?

    I understand that libsystemd-shared and systemctl can be upgraded
    separately and that doing so results in yet another window where
    systemctl is dysfunctional. The problem reported here is about systemctl
    linking libcrypto.so.3, however. Both bookworm and trixie do that.
    Therefore my understanding is that the aforementioned commit does not
    cause the libcrypto.so.3 linkage that is the problem here. Conversely,
    reverting it will not remove the libcrypto.so.3 linkage and therefore
    will not help with the t64-induced library rename.
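
    For anyone wanting to check the linkage on their own system, here is a
    rough sketch that parses ldd output for systemctl. It assumes ldd is
    available and that systemctl is found on PATH. The helper names are
    made up for this example. ldd also lists transitive dependencies, so
    libcrypto.so.3 shows up even when it is pulled in via another library.

```python
import shutil
import subprocess


def parse_ldd(output):
    """Map each library name in ldd output to its resolved path or None."""
    libs = {}
    for line in output.splitlines():
        parts = line.split()
        if "=>" not in parts:
            continue  # skips the vDSO and the dynamic loader lines
        name = parts[0]
        target = parts[parts.index("=>") + 1:]
        # ldd prints "not found" when the loader cannot resolve a library.
        if not target or target[:2] == ["not", "found"]:
            libs[name] = None
        else:
            libs[name] = target[0]
    return libs


def systemctl_libs():
    """Return the parsed ldd result for systemctl, or {} if unavailable."""
    path = shutil.which("systemctl")
    if path is None:
        return {}
    try:
        result = subprocess.run(
            ["ldd", path], capture_output=True, text=True, check=False
        )
    except OSError:
        return {}  # ldd itself is missing or not executable
    return parse_ldd(result.stdout)


if __name__ == "__main__":
    libs = systemctl_libs()
    for name in sorted(libs):
        if name.startswith(("libcrypto", "libssl")) or libs[name] is None:
            print(name, "=>", libs[name] or "not found")
```

    During the window described in this bug, such a check would be
    expected to report libcrypto.so.3 as not found.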

    What I'm trying to argue here is that even if reverting it improves
    robustness in some way, it does not improve robustness regarding libssl3
    to libssl3t64 upgrades. Do you concur here?

    Helmut

  • From Chris Hofstaedtler@21:1/5 to Chris Hofstaedtler on Sun Jul 27 15:40:01 2025
    XPost: linux.debian.devel, linux.debian.devel.testing

    On Thu, Jul 24, 2025 at 02:43:27AM +0200, Chris Hofstaedtler wrote:
    > On Sun, Jul 13, 2025 at 10:29:25AM +0200, Helmut Grohne wrote:
    > > Hi Chris,
    > >
    > > On Sat, Jul 12, 2025 at 12:36:20AM +0200, Chris Hofstaedtler wrote:
    > > > I was somewhat hoping this is caused by the existing upgrade issue
    > > > which Helmut is working on AFAIU. Was it proven that this is not
    > > > the same problem?
    > >
    > > I believe this is independent and I merely caught it as I was
    > > carefully reading test results for your other issue. This particular
    > > one is solely caused by libssl3 having done a t64 transition when it
    > > was not appropriate to do so. I do not expect my proposed change to
    > > glibc to improve the situation regarding systemctl.
    >
    > Is this problem still happening?

    Maybe a better question to ask would have been: is there a known
    reproducer for this?

    Given there were unclear reasons for the deconfigure/unpack order of
    libssl3(t64), I do wonder if other changes in the archive might
    have improved the situation.

    Best,
    Chris
