• Bug#1109742: upgrade-reports: No new SSH connections possible during la

    From Manfred Stock@21:1/5 to All on Wed Jul 23 01:10:01 2025
    XPost: linux.debian.devel.testing

    Package: upgrade-reports
    Severity: normal

    My previous release is: Debian Bookworm/12
    I am upgrading to: Debian Trixie/13
    Archive date: From https://mirror.init7.net/debian/project/trace/ftp-master.debian.org:
    Tue Jul 22 14:36:00 UTC 2025
    Creator: dak g7a63da59
    Running on host: fasolo.debian.org
    Archive serial: 2025072203
    Date: Tue, 22 Jul 2025 14:36:00 +0000
    Architectures: all amd64 arm64 armel armhf hurd-i386 i386 ia64 kfreebsd-amd64 kfreebsd-i386 mips mips64el mipsel powerpc ppc64el riscv64 s390 s390x sparc source
    Upgrade date: 2025-07-22, ~17:15 CEST
    uname -a before upgrade: Not recorded
    uname -a after upgrade: Linux monitoring 6.12.35+deb13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.35-1 (2025-07-03) x86_64 GNU/Linux
    Method: Roughly `apt update; apt dist-upgrade --autoremove --purge`, via SSH

    Contents of /etc/apt/sources.list:
    deb https://mirror.init7.net/debian/ trixie main
    deb-src https://mirror.init7.net/debian/ trixie main
    deb https://mirror.init7.net/debian/ trixie-backports main
    deb-src https://mirror.init7.net/debian/ trixie-backports main
    deb https://mirror.init7.net/debian/ trixie-updates main
    deb-src https://mirror.init7.net/debian/ trixie-updates main

    deb https://security.debian.org/debian-security trixie-security main
    deb-src https://security.debian.org/debian-security trixie-security main

    - Were there any non-Debian packages installed before the upgrade? If
    so, what were they? => No, there should not have been any.

    - Was the system pre-update a 'pure' system only containing packages
    from the previous release? If not, which packages were not from that
    release? => Yes, it should have been pure.

    - Did any packages fail to upgrade? => No, there were no failures.

    - Were there any problems with the system after upgrading? => No
    problems that I have noticed so far.


    Further Comments/Problems: I've upgraded several Bookworm systems to
    Trixie so far, which went pretty smoothly. But there's one thing I keep
    noticing, and which I observed a bit more closely while upgrading the
    system I'm sending this report from: Starting at roughly the time when
    dpkg says something like

    Unpacking openssh-server (1:10.0p1-5) over (1:9.2p1-2+deb12u6) ...

    I'm no longer able to open new SSH connections to the system I'm
    upgrading. The SSH daemon is still running, and the existing connections
    also still work, but new connections fail with

    kex_exchange_identification: read: Connection reset by peer
    Connection reset by fd... port 22

    on the client. At this time, I see messages like the following in the
    output from `systemctl status openssh-server.service` (the SSH daemon is
    still running, usually since the last reboot, or in this case since the
    libc upgrade earlier during the upgrade process, so the daemon process
    itself should still be running the binaries from Bookworm, even though
    the new binaries have already been extracted):

    Jul 22 17:37:32 monitoring sshd[492742]: -R not supported here

    The upgrade continues as usual. At some point, I get asked if I want to
    install the new SSH configuration from the package or keep my modified
    version (and it doesn't seem to matter what I answer to the question) -
    but once dpkg restarts the SSH daemon afterwards, new connections are
    possible again.

    To me, it seems like the old binary, which is still running, is passing
    an unsupported parameter to the new binary that was already unpacked
    when trying to fork off a new process for the new connection (but I
    haven't checked if that's how it actually works when a new connection is
    opened, I'm just guessing). The "-R not supported here" string seems to
    be 'new', i.e. I didn't find it in the openssh package source on
    Bookworm, but it exists in the version from Trixie.
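
    One way to check the assumption that the listener is still running the
    old binary: on Linux, the /proc/<pid>/exe link typically gains a
    " (deleted)" suffix once the file a process was started from has been
    unlinked or replaced on disk. The helper below is only a minimal sketch
    and is not taken from the openssh packaging; the function name
    exe_replaced is made up for illustration, and inspecting another user's
    process (such as sshd) normally requires root.

```sh
#!/bin/sh
# Hypothetical helper (not part of any package).
# Return 0 if the executable behind PID $1 has been replaced or removed on
# disk, 1 if it is unchanged, 2 if it cannot be inspected (no such process,
# insufficient permissions, or no Linux-style /proc).
exe_replaced() {
    link=$(readlink "/proc/$1/exe" 2>/dev/null) || return 2
    case "$link" in
        *" (deleted)") return 0 ;;
        *) return 1 ;;
    esac
}

# Example: check this shell itself; for sshd one would pass its PID instead.
if exe_replaced "$$"; then
    echo "PID $$ is running a binary that no longer exists at its path"
else
    echo "PID $$: on-disk binary unchanged (or not inspectable)"
fi
```

    If this reports a replaced binary for the listening sshd while new
    connections fail, that would be consistent with the guess above.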

    I can't rule out that I'm consistently doing something
    wrong/unusual/strange during the upgrade or that my SSH daemon
    configuration contains something weird (although I'm not aware of
    anything special in there), so maybe this doesn't affect others. So far,
    I haven't found a bug report against the openssh package, or an entry
    in the Trixie release notes or the openssh NEWS file, that mentions an
    issue like this one; apologies if I missed it.

    Hope this helps, and many thanks for your efforts!
    Manfred

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Colin Watson@21:1/5 to Manfred Stock on Thu Jul 24 14:30:01 2025
    XPost: linux.debian.devel.testing

    On Tue, Jul 22, 2025 at 07:42:07PM +0200, Manfred Stock wrote:
    > Further Comments/Problems: I've upgraded several Bookworm systems to
    > Trixie so far, which went pretty smoothly. But there's one thing I keep
    > noticing, and which I observed a bit more closely while upgrading the
    > system I'm sending this report from: Starting at roughly the time when
    > dpkg says something like
    >
    > Unpacking openssh-server (1:10.0p1-5) over (1:9.2p1-2+deb12u6) ...
    >
    > I'm no longer able to open new SSH connections to the system I'm
    > upgrading. The SSH daemon is still running, and the existing connections
    > also still work, but new connections fail with
    >
    > kex_exchange_identification: read: Connection reset by peer
    > Connection reset by fd... port 22
    >
    > on the client. At this time, I see messages like the following in the
    > output from `systemctl status openssh-server.service` (the SSH daemon is
    > still running, usually since the last reboot, or in this case since the
    > libc upgrade earlier during the upgrade process, so the daemon process
    > itself should still be running the binaries from Bookworm, even though
    > the new binaries have already been extracted):
    >
    > Jul 22 17:37:32 monitoring sshd[492742]: -R not supported here
    > [...]
    > To me, it seems like the old binary, which is still running, is passing
    > an unsupported parameter to the new binary that was already unpacked
    > when trying to fork off a new process for the new connection (but I
    > haven't checked if that's how it actually works when a new connection is
    > opened, I'm just guessing). The "-R not supported here" string seems to
    > be 'new', i.e. I didn't find it in the openssh package source on
    > Bookworm, but it exists in the version from Trixie.

    Thanks for the report. This will be due to the split of sshd-session
    from the main sshd binary; the old sshd re-executed itself with
    different arguments, but the new sshd executes sshd-session instead
    and has removed support for the parameters that it used to rely on
    during re-execution.
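
    The failure mode can be reproduced in miniature. The following is a toy
    model only, under loud assumptions: the "sshd" here is a tiny shell
    script in a temporary directory rather than the real daemon, and the
    function names (install_old, install_new, accept_connection) are
    invented for illustration. It shows why a listener that re-executes
    whatever sits at its own path breaks as soon as that file is replaced
    by a version that no longer accepts -R.

```sh
#!/bin/sh
# Toy model only: "sshd" here is a small shell script, not the real daemon.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
sshd_path="$workdir/sshd"

# "Old" binary: understands -R, the re-exec marker mentioned in this thread.
install_old() {
    cat > "$sshd_path.tmp" <<'EOF'
[ "$1" = "-R" ] && { echo "old: session started"; exit 0; }
echo "old: listening"
EOF
    mv "$sshd_path.tmp" "$sshd_path"
}

# "New" binary: no longer accepts -R (sessions are a separate program now).
install_new() {
    cat > "$sshd_path.tmp" <<'EOF'
[ "$1" = "-R" ] && { echo "-R not supported here" >&2; exit 1; }
echo "new: listening"
EOF
    mv "$sshd_path.tmp" "$sshd_path"
}

# What the still-running old listener does for each new connection:
# re-execute whatever currently sits at its own path, passing -R.
# (Run through sh so the sketch also works on a noexec temp directory.)
accept_connection() {
    sh "$sshd_path" -R
}

install_old
accept_connection    # works: old listener, old file on disk
install_new          # stands in for the unpack step; listener not restarted
accept_connection || echo "new connection rejected until the listener restarts"
```

    In the model, as in the report, nothing is wrong with either version on
    its own; the problem is the window in which the old listener and the
    new on-disk file coexist.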

    I'll have to set up a suitable environment to test this, but my best
    idea for now is to have openssh-server.preinst take a copy of the old
    sshd binary before dpkg unpacks the new files, and patch sshd to re-exec
    that copy if it exists and it receives the -R option. The postinst can
    then remove the copy after it's restarted the new sshd.

    Tricky!

    --
    Colin Watson (he/him) [[email protected]]

  • From Colin Watson@21:1/5 to Colin Watson on Thu Jul 24 17:00:01 2025
    XPost: linux.debian.devel.release, linux.debian.devel.testing

    Control: affects -1 openssh-server

    [TL;DR: I think it may not be possible to properly solve this without a
    bookworm update as well as a change to trixie.]

    On Thu, Jul 24, 2025 at 01:19:40PM +0100, Colin Watson wrote:
    > On Tue, Jul 22, 2025 at 07:42:07PM +0200, Manfred Stock wrote:
    >> Further Comments/Problems: I've upgraded several Bookworm systems to
    >> Trixie so far, which went pretty smoothly. But there's one thing I keep
    >> noticing, and which I observed a bit more closely while upgrading the
    >> system I'm sending this report from: Starting at roughly the time when
    >> dpkg says something like
    >>
    >> Unpacking openssh-server (1:10.0p1-5) over (1:9.2p1-2+deb12u6) ...
    >>
    >> I'm no longer able to open new SSH connections to the system I'm
    >> upgrading. The SSH daemon is still running, and the existing connections
    >> also still work, but new connections fail with
    >>
    >> kex_exchange_identification: read: Connection reset by peer
    >> Connection reset by fd... port 22
    >>
    >> on the client.
    >> [...]
    > Thanks for the report. This will be due to the split of sshd-session
    > from the main sshd binary; the old sshd re-executed itself with
    > different arguments, but the new sshd executes sshd-session instead
    > and has removed support for the parameters that it used to rely on
    > during re-execution.
    >
    > I'll have to set up a suitable environment to test this, but my best
    > idea for now is to have openssh-server.preinst take a copy of the old
    > sshd binary before dpkg unpacks the new files, and patch sshd to
    > re-exec that copy if it exists and it receives the -R option. The
    > postinst can then remove the copy after it's restarted the new sshd.

    This approach failed in my first test. To control the order of
    operations, I just ran "dpkg --unpack" on the new .deb, and then the
    new /usr/sbin/sshd failed before it got as far as re-execing the
    temporary copy because it needed a newer libc. apt might not do that in
    normal situations, but we clearly want to avoid this.

    My next approach was to try a temporary diversion of /usr/sbin/sshd.
    This works, although it involves a slightly odd invocation for
    openssh-server to be able to divert one of its own files. See the
    "openssh_10.0p1-6.debdiff" attachment. Would the release team accept
    something like this for trixie?


    However, this isn't the whole story. Once the new libssl3t64 is
    unpacked, new connections fail with "OpenSSL version mismatch. Built
    against 30000100, you have 30500010". This part of the problem can't be
    fixed by a change in trixie, because the problem is that the _old_ sshd,
    before restarting, fails to tolerate newer minor versions of OpenSSL.
    This was fixed upstream in OpenSSH 9.4, and if I'd noticed previously
    that this would be an upgrade problem I'd already have included it in a
    bookworm update.
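
    For readers decoding the two numbers in that error message, here is a
    minimal sketch. It assumes the OpenSSL 3.x version-number layout
    0xMNN00PPS (major, minor, patch, status nibble), under which 30000100
    reads as 3.0.16 and 30500010 as 3.5.1. The two compatibility functions
    are simplified illustrations of a "same minor required" rule versus a
    "same major is enough" rule; they are not OpenSSH's actual code, and
    all function names are hypothetical.

```sh
#!/bin/sh
# Decode an OpenSSL 3.x version number, given as hex digits without "0x",
# into major.minor.patch. Assumed layout: 0xMNN00PPS.
decode_openssl_version() {
    v=$((0x$1))
    echo "$(( (v >> 28) & 0xf )).$(( (v >> 20) & 0xff )).$(( (v >> 4) & 0xff ))"
}

# Strict rule (simplified): major and minor must match exactly, and the
# runtime patch level must not be older than the build-time one.
strict_compatible() {
    hdr=$((0x$1)); lib=$((0x$2))
    [ $(( hdr >> 20 )) -eq $(( lib >> 20 )) ] &&
        [ $(( (lib >> 4) & 0xff )) -ge $(( (hdr >> 4) & 0xff )) ]
}

# Relaxed rule (simplified): only the major must match; the runtime
# minor/patch must not be older than what the program was built against.
relaxed_compatible() {
    hdr=$((0x$1)); lib=$((0x$2))
    [ $(( hdr >> 28 )) -eq $(( lib >> 28 )) ] &&
        [ $(( lib >> 4 )) -ge $(( hdr >> 4 )) ]
}

built_hex=30000100      # "Built against" value from the error message
runtime_hex=30500010    # "you have" value from the error message
echo "built against $(decode_openssl_version "$built_hex")," \
     "running with $(decode_openssl_version "$runtime_hex")"
strict_compatible "$built_hex" "$runtime_hex" || echo "strict rule: version mismatch"
relaxed_compatible "$built_hex" "$runtime_hex" && echo "relaxed rule: accepted"
```

    Under these assumptions the strict rule rejects the pair from the error
    message while the relaxed rule accepts it, which matches the behaviour
    described above for the old and the fixed sshd respectively.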

    So, I think we also need to fix that in bookworm. See the
    "openssh_9.2p1-2+deb12u7.debdiff" attachment (for brevity I've pruned
    some noise from git-dpm that just updates some commit IDs in patches).

    Timing-wise, this is tricky. IMO we really need to get this out before
    trixie releases to minimize the chance of users running into this if
    they rush to upgrade. Would the security team be willing to consider
    pushing this out via -security? Failing that, we'd have to wait until
    the next point release of bookworm, which I think would be unfortunate
    given that the consequences of sshd being broken between unpack and
    configure can include a failed remote upgrade with no way to access the
    system (if you forget to maintain a separate ssh connection, or if your
    network connection is interrupted).

    Thanks,

    --
    Colin Watson (he/him) [[email protected]]

    diff -Nru openssh-10.0p1/debian/changelog openssh-10.0p1/debian/changelog
    --- openssh-10.0p1/debian/changelog 2025-05-09 13:40:49.000000000 +0100
    +++ openssh-10.0p1/debian/changelog 2025-06-05 02:53:41.000000000 +0100
    @@ -1,3 +1,11 @@
    +openssh (1:10.0p1-6) UNRELEASED; urgency=medium
    +
    +  * Temporarily divert /usr/sbin/sshd during upgrades from before
    +    1:9.8p1-1~, to avoid new connections failing between unpack and
    +    configure (closes: #1109742).
    +
    + -- Colin Watson <[email protected]>  Thu, 05 Jun 2025 02:53:41 +0100
    +
     openssh (1:10.0p1-5) unstable; urgency=medium

       * Ensure that configure knows the path to passwd; fixes reproducibility of
    diff -Nru openssh-10.0p1/debian/openssh-server.postinst openssh-10.0p1/debian/openssh-server.postinst
    --- openssh-10.0p1/debian/openssh-server.postinst 2025-05-09 13:40:49.000000000 +0100
    +++ openssh-10.0p1/debian/openssh-server.postinst 2025-06-05 02:53:41.000000000 +0100
    @@ -11,10 +11,16 @@

    get_config_option() {
    option="$1"
    + sshd_path=/usr/sbin/sshd

    [ -f /etc/ssh/sshd_config ] || return

    - /usr/sbin/
  • From Salvatore Bonaccorso@21:1/5 to Colin Watson on Thu Jul 24 18:50:01 2025
    XPost: linux.debian.devel.release, linux.debian.devel.testing

    Hi,

    On Thu, Jul 24, 2025 at 03:53:05PM +0100, Colin Watson wrote:
    > Control: affects -1 openssh-server
    >
    > [TL;DR: I think it may not be possible to properly solve this without
    > a bookworm update as well as a change to trixie.]
    >
    > On Thu, Jul 24, 2025 at 01:19:40PM +0100, Colin Watson wrote:
    >> On Tue, Jul 22, 2025 at 07:42:07PM +0200, Manfred Stock wrote:
    >>> Further Comments/Problems: I've upgraded several Bookworm systems
    >>> to Trixie so far, which went pretty smoothly. But there's one thing
    >>> I keep noticing, and which I observed a bit more closely while
    >>> upgrading the system I'm sending this report from: Starting at
    >>> roughly the time when dpkg says something like
    >>>
    >>> Unpacking openssh-server (1:10.0p1-5) over (1:9.2p1-2+deb12u6) ...
    >>>
    >>> I'm no longer able to open new SSH connections to the system I'm
    >>> upgrading. The SSH daemon is still running, and the existing
    >>> connections also still work, but new connections fail with
    >>>
    >>> kex_exchange_identification: read: Connection reset by peer
    >>> Connection reset by fd... port 22
    >>>
    >>> on the client.
    >>> [...]
    >> Thanks for the report. This will be due to the split of sshd-session
    >> from the main sshd binary; the old sshd re-executed itself with
    >> different arguments, but the new sshd executes sshd-session instead
    >> and has removed support for the parameters that it used to rely on
    >> during re-execution.
    >>
    >> I'll have to set up a suitable environment to test this, but my best
    >> idea for now is to have openssh-server.preinst take a copy of the old
    >> sshd binary before dpkg unpacks the new files, and patch sshd to
    >> re-exec that copy if it exists and it receives the -R option. The
    >> postinst can then remove the copy after it's restarted the new sshd.
    >
    > This approach failed in my first test. To control the order of
    > operations, I just ran "dpkg --unpack" on the new .deb, and then the
    > new /usr/sbin/sshd failed before it got as far as re-execing the
    > temporary copy because it needed a newer libc. apt might not do that
    > in normal situations, but we clearly want to avoid this.
    >
    > My next approach was to try a temporary diversion of /usr/sbin/sshd.
    > This works, although it involves a slightly odd invocation for
    > openssh-server to be able to divert one of its own files. See the
    > "openssh_10.0p1-6.debdiff" attachment. Would the release team accept
    > something like this for trixie?
    >
    >
    > However, this isn't the whole story. Once the new libssl3t64 is
    > unpacked, new connections fail with "OpenSSL version mismatch. Built
    > against 30000100, you have 30500010". This part of the problem can't
    > be fixed by a change in trixie, because the problem is that the _old_
    > sshd, before restarting, fails to tolerate newer minor versions of
    > OpenSSL. This was fixed upstream in OpenSSH 9.4, and if I'd noticed
    > previously that this would be an upgrade problem I'd already have
    > included it in a bookworm update.
    >
    > So, I think we also need to fix that in bookworm. See the
    > "openssh_9.2p1-2+deb12u7.debdiff" attachment (for brevity I've pruned
    > some noise from git-dpm that just updates some commit IDs in patches).
    >
    > Timing-wise, this is tricky. IMO we really need to get this out before
    > trixie releases to minimize the chance of users running into this if
    > they rush to upgrade. Would the security team be willing to consider
    > pushing this out via -security? Failing that, we'd have to wait until
    > the next point release of bookworm, which I think would be unfortunate
    > given that the consequences of sshd being broken between unpack and
    > configure can include a failed remote upgrade with no way to access
    > the system (if you forget to maintain a separate ssh connection, or if
    > your network connection is interrupted).

    IMHO, as this is not a security update, releasing it via a DSA would be
    wrong. The correct target would be to prepare it for the next bookworm
    point release but, given the reasoning above, to release the update
    earlier via an SUA (the release team obviously has to agree with that
    suggestion). IMHO stable-updates would be a perfect candidate for this
    use case, correct?

    Regards,
    Salvatore

  • From Colin Watson@21:1/5 to Salvatore Bonaccorso on Thu Jul 24 22:40:01 2025
    XPost: linux.debian.devel.release, linux.debian.devel.testing

    On Thu, Jul 24, 2025 at 06:44:21PM +0200, Salvatore Bonaccorso wrote:
    > IMHO, as this is not a security update, releasing it via a DSA would be
    > wrong. The correct target would be to prepare it for the next bookworm
    > point release but, given the reasoning above, to release the update
    > earlier via an SUA (the release team obviously has to agree with that
    > suggestion). IMHO stable-updates would be a perfect candidate for this
    > use case, correct?

    I think I'd somehow managed to miss that SUAs were a thing. Yes, if the
    release team is happy with that then it sounds fine to me.

    Thanks,

    --
    Colin Watson (he/him) [[email protected]]

  • From Jonathan Wiltshire@21:1/5 to Colin Watson on Sat Jul 26 21:50:01 2025
    XPost: linux.debian.devel.testing

    On Thu, Jul 24, 2025 at 09:36:43PM +0100, Colin Watson wrote:
    > On Thu, Jul 24, 2025 at 06:44:21PM +0200, Salvatore Bonaccorso wrote:
    >> IMHO, as this is not a security update, releasing it via a DSA would
    >> be wrong. The correct target would be to prepare it for the next
    >> bookworm point release but, given the reasoning above, to release the
    >> update earlier via an SUA (the release team obviously has to agree
    >> with that suggestion). IMHO stable-updates would be a perfect
    >> candidate for this use case, correct?
    >
    > I think I'd somehow managed to miss that SUAs were a thing. Yes, if the
    > release team is happy with that then it sounds fine to me.

    Yes please, at least for SRMs. stable-updates was my first thought when I
    saw this bug and it's exactly the case they are designed for. The trixie
    part will still need a normal unblock.

    Please bear in mind that wider pre-release testing of SUAs is
    approximately zero, so no pressure :)

    Thanks,

    --
    Jonathan Wiltshire [email protected]
    Debian Developer http://people.debian.org/~jmw

    4096R: 0xD3524C51 / 0A55 B7C5 1223 3942 86EC 74C3 5394 479D D352 4C51
    ed25519/0x196418AAEB74C8A1: CA619D65A72A7BADFC96D280196418AAEB74C8A1
