• Bug#1110033: unblock: openssh/1:10.0p1-6

    From Colin Watson@21:1/5 to All on Mon Jul 28 14:00:02 2025
    XPost: linux.debian.devel.release

    Package: release.debian.org
    Severity: normal
    X-Debbugs-Cc: [email protected]
    Control: affects -1 + src:openssh
    User: [email protected]
    Usertags: unblock

    [ Reason ]
    In OpenSSH 9.8, upstream split some of the responsibilities of sshd out
    to a new sshd-session binary, in order to reduce the attack surface of
    the process that listens on port 22. (Parts of sshd-session were later
    split out to sshd-auth as well.)

    When sshd receives a new connection, it used to fork and re-exec itself
    with special command-line parameters to handle the session. Following
    the split, it forks and execs sshd-session instead, and sshd no longer
    supports the parameters telling it to act as a session process. Since
    the listener process is only restarted in the postinst, this means that
    between the unpack and configure stages of openssh-server across this
    change (in particular, during an upgrade from bookworm to trixie), the
    listener is unable to start any new SSH sessions. This may cover a
    large part of the duration of the upgrade.

    [ Impact ]
    As described in https://bugs.debian.org/1109742, during most of the
    upgrade process from bookworm to trixie, it's impossible to initiate new
    SSH connections. If the upgrade fails, and the user forgets to maintain
    a separate SSH connection or their network connection is interrupted,
    the result may be a failed remote upgrade with no way to access the
    system.

    [ Tests ]
    I've tested this manually by creating a bookworm container and running
    the relevant parts of the upgrade step by step, something like this
    (obviously set up for me, but adjust as needed):

    $ incus launch images:debian/bookworm openssh-upgrade
    $ incus exec openssh-upgrade -- apt -y install openssh-server
    $ incus exec openssh-upgrade -- adduser --disabled-password --comment 'Colin Watson' cjwatson
    $ incus file push -p --uid 1000 --gid 1000 --mode=600 .ssh/id_ed25519.pub openssh-upgrade/home/cjwatson/.ssh/authorized_keys

    Then run this in a separate terminal to monitor connectivity:

    $ while :; do date -Ins; ssh openssh-upgrade.incus true; sleep 0.1; done

    and continue the upgrade with:

    $ dcmd incus file push openssh_10.0p1-6_amd64.changes openssh-upgrade/root/
    $ incus exec openssh-upgrade -- dpkg --unpack openssh-{client,server,sftp-server}_10.0p1-6_amd64.deb
    $ incus exec openssh-upgrade -- sed -i 's/bookworm/trixie/' /etc/apt/sources.list
    $ incus exec openssh-upgrade -- apt update
    $ incus exec openssh-upgrade -- apt -f install

    [ Risks ]
    My first approach was to patch the new sshd to notice when it's been
    invoked with -R and exec sshd-session instead, but that turned out not
    to work reliably: it's possible for the new sshd to fail at the dynamic
    linker stage immediately after openssh-server has been unpacked, e.g.
    because it needs a newer libc6.

    The self-diversion approach is a bit alarming, but it limits the scope
    of the workaround code to just the affected upgrade scenarios, and the
    code is mechanically simple enough even if it requires some careful
    thinking. I can't think of any better approaches.
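    The self-diversion can be sketched roughly as follows. This is an
    illustration under assumptions, not the actual preinst, which is not
    included in this thread: the function name divert_sshd_if_needed is
    hypothetical, while the version guard and the dpkg-divert arguments are
    taken from the postinst hunks quoted later in the thread (with --add in
    place of --remove). The diversion is registered to openssh-client rather
    than openssh-server because dpkg does not apply a diversion to the copy
    of the file shipped by the package that owns it, and --no-rename means
    only the dpkg database changes, so the old /usr/sbin/sshd stays in place
    for the running listener while the new binary is unpacked to
    /usr/sbin/sshd.session-split.

```shell
# Hypothetical sketch of the preinst side of the workaround; the real
# preinst is not shown in this thread.
divert_sshd_if_needed() {
	action="$1"
	old_version="$2"
	# Only upgrades from a pre-9.8 (monolithic) sshd need the diversion.
	# lt-nl treats an empty version as newer, so it is false for "".
	if [ "$action" = upgrade ] &&
		dpkg --compare-versions "$old_version" lt-nl 1:9.8p1-1~; then
		# Owned by openssh-client: dpkg does not divert the copy of a
		# file shipped by the package that owns the diversion, so an
		# openssh-server-owned diversion would have no effect here.
		# --no-rename only updates the dpkg database, leaving the old
		# /usr/sbin/sshd in place for the running listener.
		dpkg-divert --package openssh-client --add --no-rename \
			--divert /usr/sbin/sshd.session-split /usr/sbin/sshd
	fi
}

# In a real preinst this would be invoked as: divert_sshd_if_needed "$@"
```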

    [ Checklist ]
    [x] all changes are documented in the d/changelog
    [x] I reviewed all changes and I approve them
    [x] attach debdiff against the package in testing

    [ Other info ]
    Connections during upgrade still fail between libssl3 being replaced by libssl3t64 at a newer version and the new openssh-server being
    configured. While that's a similar symptom, it's a separate problem
    that will need a stable update to bookworm; that's covered by https://bugs.debian.org/1110030.

    unblock openssh/1:10.0p1-6

    --
    Colin Watson (he/him) [[email protected]]

    diff --git a/debian/changelog b/debian/changelog
    index 8aa7ac0aa..8fedadf2f 100644
    --- a/debian/changelog
    +++ b/debian/changelog
    @@ -1,3 +1,11 @@
    +openssh (1:10.0p1-6) unstable; urgency=medium
    +
    + * Temporarily divert /usr/sbin/sshd during upgrades from before
    + 1:9.8p1-1~, to avoid new connections failing between unpack and
    + configure (closes: #1109742).
    +
    + -- Colin Watson <[email protected]> Mon, 28 Jul 2025 12:17:42 +0100
    +
    openssh (1:10.0p1-5) unstable; urgency=medium

    * Ensure that configure knows the path to passwd; fixes reproducibility of
    diff --git a/debian/openssh-server.postinst b/debian/openssh-server.postinst
    index 7e4a62c65..c0d43006d 100644
    --- a/debian/openssh-server.postinst
    +++ b/debian/openssh-server.postinst
    @@ -11,10 +11,16 @@ umask 022

    get_config_option() {
    option="$1"
    + sshd_path=/usr/sbin/sshd

    [ -f /etc/ssh/sshd_config ] || return

    - /usr/sbin/sshd -G | sed -n "s/^$option //Ip"
    + # begin-remove-after: released:forky
    + if [ -e /usr/sbin/sshd.session-split ]; then
    + sshd_path=/usr/sbin/sshd.session-split
    + fi
    + # e
  • From Colin Watson@21:1/5 to Ivo De Decker on Fri Aug 1 13:10:01 2025
    XPost: linux.debian.devel.release

    On Wed, Jul 30, 2025 at 03:22:29PM +0000, Ivo De Decker wrote:
    > On Mon, Jul 28, 2025 at 12:54:40PM +0100, Colin Watson wrote:
    >> The self-diversion approach is a bit alarming, but it limits the scope
    >> of the workaround code to just the affected upgrade scenarios, and the
    >> code is mechanically simple enough even if it requires some careful
    >> thinking. I can't think of any better approaches.
    >
    > I'm leaning towards unblocking this, as it's probably the least bad
    > option. I wonder if there are any corner cases where the result of this
    > change is worse than not doing it.
    >
    > Fortunately, I haven't been able to come up with such a case yet.
    >
    > Some questions:

    Thanks. I agree that these are all reasonable questions to consider.

    > Can you think of any scenario where the system would end up without a
    > /usr/sbin/sshd binary?

    I think this is impossible. The dpkg-divert calls themselves are run
    with --no-rename, so they only touch the dpkg database; dpkg is going to
    unpack either to /usr/sbin/sshd or to /usr/sbin/sshd.session-split, but
    in either case it does so atomically; and we atomically move the
    diverted file to the real location immediately after removing the
    diversion.

    > What happens if the system crashes during the upgrade, after the
    > diversion is added, but before it is removed? Will sshd work after
    > reboot (it's possible that sshd wouldn't work in this scenario without
    > the change anyway)? If not, will it work after the upgrade is finished
    > (by an admin connected in a different way)?

    I just tested this by restarting the container after the "dpkg --unpack"
    step in the test procedure I gave in my original message to this bug,
    and sshd continues to work fine. Makes sense, since /usr/sbin/sshd
    hasn't changed yet in this scenario.

    In fact, this change arguably makes things _more_ resilient in this
    scenario for this particular upgrade: it used to be that if you unpacked
    the new openssh-server before the new libc6 then a system crash would
    leave you without a working sshd until openssh-server is configured
    again, but with this change it works. (The complexity cost is high
    enough that I wouldn't suggest dropping the version guards, though.)

    I think the least obvious case is where the system crashes between
    dpkg-divert and mv in the postinst, and that's hard to test precisely.
    But I believe in that case dpkg will still think the package needs to be configured, so it will try the dpkg-divert --remove call again, and I've
    tested that that exits zero with just a warning message if the diversion
    has already been removed. So that case should work fine too.

    > Can you think of a scenario where dpkg thinks the upgrade of
    > openssh-server is done, but the diversion is still there? In that case,
    > even (purging and) reinstalling openssh-server won't help, because the
    > code removing the diversion will no longer be triggered.

    While it's hard to give a categorical "that can't happen", I can't
    currently think of such a scenario. I suppose the postinst might be a
    little more resilient against such problems if I did this:

    diff --git a/debian/openssh-server.postinst b/debian/openssh-server.postinst
    index c0d43006d..2a68f6f85 100644
    --- a/debian/openssh-server.postinst
    +++ b/debian/openssh-server.postinst
    @@ -116,7 +116,7 @@ if [ "$action" = configure ]; then
    systemctl disable ssh.service
    fi
    # begin-remove-after: released:forky
    - if dpkg --compare-versions "$2" lt-nl 1:9.8p1-1~; then
    + if [ -e /usr/sbin/sshd.session-split ]; then
    # We're ready to restart the listener process so that it
    # executes sshd-session rather than sshd for new
    # connections, so we can remove this diversion now. This
    @@ -128,9 +128,7 @@ if [ "$action" = configure ]; then
    # name.
    dpkg-divert --package openssh-client --remove --no-rename \
    --divert /usr/sbin/sshd.session-split /usr/sbin/sshd
    - if [ -e /usr/sbin/sshd.session-split ]; then
    - mv -f /usr/sbin/sshd.session-split /usr/sbin/sshd
    - fi
    + mv -f /usr/sbin/sshd.session-split /usr/sbin/sshd
    fi
    # end-remove-after
    fi

    I haven't tested this as yet, but do you think it would be better? It
    seemed clearest to use the same condition in the preinst and postinst,
    but I could be persuaded either way.
  • From Ivo De Decker@21:1/5 to Colin Watson on Fri Aug 1 15:20:01 2025
    XPost: linux.debian.devel.release

    Hi Colin,

    Thanks for your reply.

    As mentioned on IRC, I added the unblock, but I'm leaving the bug open for
    now, to see if we want to add additional changes.

    At least this will allow the change to get into trixie, and should allow us to collect feedback from users who upgrade in the coming days.

    On Fri, Aug 01, 2025 at 12:06:50PM +0100, Colin Watson wrote:
    > On Wed, Jul 30, 2025 at 03:22:29PM +0000, Ivo De Decker wrote:
    >> On Mon, Jul 28, 2025 at 12:54:40PM +0100, Colin Watson wrote:
    >>> The self-diversion approach is a bit alarming, but it limits the scope
    >>> of the workaround code to just the affected upgrade scenarios, and the
    >>> code is mechanically simple enough even if it requires some careful
    >>> thinking. I can't think of any better approaches.
    >>
    >> I'm leaning towards unblocking this, as it's probably the least bad
    >> option. I wonder if there are any corner cases where the result of this
    >> change is worse than not doing it.
    >>
    >> Fortunately, I haven't been able to come up with such a case yet.
    >>
    >> Some questions:
    >
    > Thanks. I agree that these are all reasonable questions to consider.
    >
    >> Can you think of any scenario where the system would end up without a
    >> /usr/sbin/sshd binary?
    >
    > I think this is impossible. The dpkg-divert calls themselves are run
    > with --no-rename, so they only touch the dpkg database; dpkg is going to
    > unpack either to /usr/sbin/sshd or to /usr/sbin/sshd.session-split, but
    > in either case it does so atomically; and we atomically move the
    > diverted file to the real location immediately after removing the
    > diversion.

    OK.

    >> What happens if the system crashes during the upgrade, after the
    >> diversion is added, but before it is removed? Will sshd work after
    >> reboot (it's possible that sshd wouldn't work in this scenario without
    >> the change anyway)? If not, will it work after the upgrade is finished
    >> (by an admin connected in a different way)?
    >
    > I just tested this by restarting the container after the "dpkg --unpack"
    > step in the test procedure I gave in my original message to this bug,
    > and sshd continues to work fine. Makes sense, since /usr/sbin/sshd
    > hasn't changed yet in this scenario.
    >
    > In fact, this change arguably makes things _more_ resilient in this
    > scenario for this particular upgrade: it used to be that if you unpacked
    > the new openssh-server before the new libc6 then a system crash would
    > leave you without a working sshd until openssh-server is configured
    > again, but with this change it works. (The complexity cost is high
    > enough that I wouldn't suggest dropping the version guards, though.)
    >
    > I think the least obvious case is where the system crashes between
    > dpkg-divert and mv in the postinst, and that's hard to test precisely.
    > But I believe in that case dpkg will still think the package needs to
    > be configured, so it will try the dpkg-divert --remove call again, and
    > I've tested that that exits zero with just a warning message if the
    > diversion has already been removed. So that case should work fine too.

    OK.

    >> Can you think of a scenario where dpkg thinks the upgrade of
    >> openssh-server is done, but the diversion is still there? In that case,
    >> even (purging and) reinstalling openssh-server won't help, because the
    >> code removing the diversion will no longer be triggered.
    >
    > While it's hard to give a categorical "that can't happen", I can't
    > currently think of such a scenario. I suppose the postinst might be a
    > little more resilient against such problems if I did this:
    >
    > diff --git a/debian/openssh-server.postinst b/debian/openssh-server.postinst
    > index c0d43006d..2a68f6f85 100644
    > --- a/debian/openssh-server.postinst
    > +++ b/debian/openssh-server.postinst
    > @@ -116,7 +116,7 @@ if [ "$action" = configure ]; then
    > systemctl disable ssh.service
    > fi
    > # begin-remove-after: released:forky
    > - if dpkg --compare-versions "$2" lt-nl 1:9.8p1-1~; then
    > + if [ -e /usr/sbin/sshd.session-split ]; then
    > # We're ready to restart the listener process so that it
    > # executes sshd-session rather than sshd for new
    > # connections, so we can remove this diversion now. This
    > @@ -128,9 +128,7 @@ if [ "$action" = configure ]; then
    > # name.
    > dpkg-divert --package openssh-client --remove --no-rename \
    > --divert /usr/sbin/sshd.session-split /usr/sbin/sshd
    > - if [ -e /usr/sbin/sshd.session-split ]; then
    > - mv -f /usr/sbin/sshd.session-split /usr/sbin/sshd
    > - fi
    > + mv -f /usr/sbin/sshd.session-split /usr/sbin/sshd
    > fi
    > # end-remove-after
    > fi
    >
    > I haven't tested this as yet, but do you think it would be better? It
    > seemed clearest to use the same condition in the preinst and postinst,
    > but I could be persuaded either way.

    I'm inclined to prefer the version that removes the diversion in all cases where /usr/sbin/sshd.session-split exists. If that exists, it means the diversion is still there, and it must be removed, even if the postinst doesn't think we're upgrading from an older version. If it doesn't exist, there's no harm in having this code in the postinst.

    Maybe it could also be useful to add some specific output when this is happening. That could make it easier to debug things if unexpected corner
    cases were to show up. I don't really have a good suggestion of the conditions under which it would be good to give additional output (without alarming users in the standard scenario), though.
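    Along the lines of that debugging idea, an administrator worried about
    the case raised earlier (openssh-server fully configured while the
    diversion is still present) could look for it directly. The helper below
    is hypothetical and not part of the package; it relies only on
    dpkg-query -W -f and on dpkg-divert --listpackage, which prints the
    package that diverts a file, LOCAL for a local diversion, or nothing
    when the file is not diverted.

```shell
# Hypothetical admin check, not part of the openssh packaging.
check_sshd_diversion() {
	# "install ok installed" means configuration has completed; treat
	# an unknown package as an empty status.
	status=$(dpkg-query -W -f='${Status}' openssh-server 2>/dev/null) || status=
	# Name of the package diverting /usr/sbin/sshd, LOCAL, or empty.
	owner=$(dpkg-divert --listpackage /usr/sbin/sshd)
	if [ "$status" = "install ok installed" ] && [ "$owner" = openssh-client ]; then
		echo "openssh-server is configured, but /usr/sbin/sshd is still diverted to /usr/sbin/sshd.session-split" >&2
		return 1
	fi
	return 0
}
```

    A non-zero exit status flags the leftover diversion; during an
    in-progress upgrade (package unpacked but not yet configured) the check
    stays quiet, since the diversion is expected there.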

    Thanks,

    Ivo

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Colin Watson@21:1/5 to Ivo De Decker on Fri Aug 1 17:10:01 2025
    XPost: linux.debian.devel.release

    On Fri, Aug 01, 2025 at 01:10:47PM +0000, Ivo De Decker wrote:
    > As mentioned on IRC, I added the unblock, but I'm leaving the bug open
    > for now, to see if we want to add additional changes.
    >
    > At least this will allow the change to get into trixie, and should
    > allow us to collect feedback from users who upgrade in the coming days.

    Thanks!

    > On Fri, Aug 01, 2025 at 12:06:50PM +0100, Colin Watson wrote:
    >> I haven't tested this as yet, but do you think it would be better? It
    >> seemed clearest to use the same condition in the preinst and postinst,
    >> but I could be persuaded either way.
    >
    > I'm inclined to prefer the version that removes the diversion in all
    > cases where /usr/sbin/sshd.session-split exists. If that exists, it
    > means the diversion is still there, and it must be removed, even if the
    > postinst doesn't think we're upgrading from an older version. If it
    > doesn't exist, there's no harm in having this code in the postinst.
    >
    > Maybe it could also be useful to add some specific output when this is
    > happening. That could make it easier to debug things if unexpected
    > corner cases were to show up. I don't really have a good suggestion of
    > the conditions under which it would be good to give additional output
    > (without alarming users in the standard scenario), though.

    OK, I added a message which I think is not too alarming, and ran it
    through all the same tests as before:

    Setting up openssh-server (1:10.0p1-7) ...
    Installing new version of config file /etc/pam.d/sshd ...
    Installing new version of config file /etc/ssh/moduli ...
    Replacing config file /etc/ssh/sshd_config with new version
    Finishing upgrade from pre-9.8 monolithic sshd ...
    Removing 'diversion of /usr/sbin/sshd to /usr/sbin/sshd.session-split by openssh-client'
    ssh.socket is a disabled or a static unit not running, not starting it.
    Created symlink /etc/systemd/system/ssh.service.wants/sshd-keygen.service → /lib/systemd/system/sshd-keygen.service.
    Created symlink /etc/systemd/system/sshd.service.wants/sshd-keygen.service → /lib/systemd/system/sshd-keygen.service.
    Created symlink /etc/systemd/system/[email protected]/sshd-keygen.service → /lib/systemd/system/sshd-keygen.service.
    Created symlink /etc/systemd/system/ssh.socket.wants/sshd-keygen.service → /lib/systemd/system/sshd-keygen.service.
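    The 1:10.0p1-7 diff is truncated in this thread, so the final code is
    not shown in full. Judging only from the visible hunk and the order of
    the messages in the log above, the cleanup block might look roughly like
    this sketch. The function name finish_session_split and its sbindir
    parameter (there only so the logic can be exercised outside /usr/sbin)
    are hypothetical, as is the assumption that the new message is printed
    with a plain echo just before the dpkg-divert call.

```shell
# Hypothetical sketch of the 1:10.0p1-7 postinst cleanup; the real diff is
# truncated in this thread. sbindir exists only for trying it out.
finish_session_split() {
	sbindir="${1:-/usr/sbin}"
	# The diverted copy only exists while the pre-9.8 workaround is
	# active, so test for it instead of comparing versions.
	if [ -e "$sbindir/sshd.session-split" ]; then
		echo "Finishing upgrade from pre-9.8 monolithic sshd ..."
		# --no-rename: only the dpkg database is updated here.
		dpkg-divert --package openssh-client --remove --no-rename \
			--divert "$sbindir/sshd.session-split" "$sbindir/sshd"
		# Same-filesystem mv is a rename(2), replacing sshd atomically.
		mv -f "$sbindir/sshd.session-split" "$sbindir/sshd"
	fi
}
```

    Because the condition is the existence of the diverted file, a second
    run after a successful one does nothing and prints nothing.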

    debdiff attached, and I've uploaded this to unstable since (as mentioned
    on IRC) I'm about to be away for a couple of days and you probably want
    to be able to get the refined version in ASAP.

    Thanks,

    --
    Colin Watson (he/him) [[email protected]]

    diff --git a/debian/changelog b/debian/changelog
    index 8fedadf2f..eadb5be63 100644
    --- a/debian/changelog
    +++ b/debian/changelog
    @@ -1,3 +1,9 @@
    +openssh (1:10.0p1-7) unstable; urgency=medium
    +
    + * Make postinst logic for cleaning up the sshd diversion more robust.
    +
    + -- Colin Watson <[email protected]> Fri, 01 Aug 2025 16:02:27 +0100
    +
    openssh (1:10.0p1-6) unstable; urgency=medium

    * Temporarily divert /usr/sbin/sshd during upgrades from before
    diff --git a/debian/openssh-server.postinst b/debian/openssh-server.postinst
    index c0d43006d..498777ad6 100644
    --- a/debian/openssh-server.postinst
    +++ b/debian/openssh-server.postinst
    @@ -116,7 +116,7 @@ if [ "$action" = configure ]; then
    systemctl disable ssh.service
    fi
    # begin-remove-after: released:forky
    - if dpkg --compare-versions "$2" lt-nl 1:9.8p1-1~; then
    + if [ -e /usr/sbin/sshd.session-split ]; then
    # We're ready to restart the listener process so that it
    # executes sshd-session rather than sshd for new
    # connections, so we can remove this diversion now. This
    @@ -126,11 +126,10 @@ if [ "$