• New supply-chain security tool: backseat-signed

    From kpcyrd@21:1/5 to All on Sat Apr 6 09:50:46 2024
    Hello,

    I'm going to keep this short; I've been writing a lot of text recently
    (which is quite exhausting, on top of my day job and all the code I wrote
    today afterwards). Apologies if you're still waiting for a reply in one
    of the other threads.

    I figured out a somewhat straightforward way to check if a given
    `git archive` output is cryptographically claimed to be the source input
    of a given binary package in either Arch Linux or Debian (or both).

    I believe this to be the "reproducible source tarball" thing some people
    have been asking about. As explained in the README, I believe
    reproducing autotools-generated tarballs isn't worth everybody's time
    and instead a distribution that claims to build from source should
    operate on VCS snapshots instead of tarballs with 25k lines of
    pre-generated shell script. Building from VCS snapshots is already the
    case for a large number of Arch Linux packages (through auto-generated
    GitHub tarballs). Some packages have been actively converted to VCS
    snapshots by Arch Linux staff in response to the xz incident.

    This tool highlights the concept of "canonical sources", which is
    supposed to give guidance on what to code review. This is also why I
    think code signing by upstream is somewhat low priority, since the big
    distros can form consensus around "what's the source code" regardless.

    https://github.com/kpcyrd/backseat-signed

    The README shows how to verify that Arch Linux and Debian build cmatrix
    from the same source code. They may both still apply patches (which would
    be considered part of the build instructions), but the specified source
    input is the same. This tarball can also be reproduced bit-for-bit from
    VCS by taking a `git archive` snapshot of the v2.0 tag in the cmatrix
    repository.

    (If somebody ever tells you programming in Rust is slower, I wrote the
    entirety of this codebase within a few hours of a single day)

    Let me know what you think. 🖤

    Happy feet,
    kpcyrd

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From kpcyrd@21:1/5 to Adrian Bunk on Sat Apr 6 09:51:07 2024
    On 4/3/24 4:21 AM, Adrian Bunk wrote:
    On Wed, Apr 03, 2024 at 02:31:11AM +0200, kpcyrd wrote:
    ...
    I figured out a somewhat straight-forward way to check if a given `git
    archive` output is cryptographically claimed to be the source input of a
    given binary package in either Arch Linux or Debian (or both).

    For Debian the proper approach would be to copy Checksums-Sha256 for the
    source package to the buildinfo file, and there is nothing where it would
    matter whether the tarball was generated from git or otherwise.

    I believe this to be the "reproducible source tarball" thing some people
    have been asking about.
    ...

    The lack of a reliably reproducible checksum when using "git archive" is
    the problem, and git cannot realistically provide that.

    Even when called with the same parameters, "git archive" executed in
    different environments might produce different archives for the same
    commit ID.

    It is documented that auto-generated GitHub tarballs for the same tag
    and with the same commit ID downloaded at different times might have
    different checksums.

    Granted, it takes some skill to take snapshots that match what GitHub is
    generating (and there are occasional issues), but generally speaking it
    works quite well. The required command is in the README, and I encourage
    you to give it a try.

    If you want something that's explicitly designed for taking reproducible
    VCS snapshots you could also consider the "Nix Archive" format[0],
    however I think more people would be in favor of agreeing on how to
    canonically derive a given git tree into a `.tar.gz` (or at least .tar)
    instead of switching Debian to the .nar file format.

    [0]: https://github.com/ebkalderon/libnar

    I think regular `git archive` is already pretty good. Complaining that
    it may only work in 98% of cases is, I'd say, a luxury problem considering
    the current state of things. The next paragraph is the bigger headache:

    This tool highlights the concept of "canonical sources", which is supposed
    to give guidance on what to code review.
    ...

    How does it tell the git commit ID the tarball was generated from?

    Doing a code review of git sources as a tarball would be stupid,
    you really want the git metadata that usually shows when, why and by
    whom something was changed.

    It doesn't. It works like a one-way function: it can verify that a given
    VCS snapshot is definitely the source code that was ingested into Debian,
    but it can't locate the source code on its own.

    I don't know if Debian has this kind of provenance information
    available. To my knowledge, Debian operates on "our maintainers upload
    .tar.xz files into our archive and we take them at face value". Which
    does make sense, considering not every software project uses git: some
    may develop their own VCS, and some software projects do not have any VCS
    at all and it's just one person applying patches to a folder on their
    local computer and uploading .tar snapshots to a webserver every other
    month.

    There are some packages that have some kind of system behind them, like
    rust-toml_0.5.11.orig.tar.gz in the Debian Archive, which can be expected
    to match <https://crates.io/api/v1/crates/toml/0.5.11/download> (although
    sometimes files get excluded from the tar upload). I'd like to
    explicitly encourage people to point me in the right direction if
    there's any existing effort of mapping Debian .orig.tar.gz files to git
    tags (not necessarily bit-for-bit, but at least which commit we expect
    it to come from).

    https://github.com/kpcyrd/backseat-signed

    The README
    ...

    "This requires some squinting since in Debian the source tarball is
    commonly recompressed so only the inner .tar is compared"

    This doesn't sound true.

    I've updated the wording and intend to investigate this further. By
    default the relevant command even expects an exact match. For example
    this works:

    ```
    % backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --name cmatrix cmatrix_2.0.orig.tar.gz
    [2024-04-04T18:45:09Z INFO backseat_signed::plumbing] Loading sources index from "Sources.xz"
    [2024-04-04T18:45:10Z INFO backseat_signed::plumbing] Loading file from "cmatrix_2.0.orig.tar.gz"
    [2024-04-04T18:45:10Z INFO backseat_signed::plumbing] Searching in index...
    [2024-04-04T18:45:10Z INFO backseat_signed::plumbing] File verified successfully
    ```

    But if I repack the .tar.gz into .tar.xz it's going to get rejected:

    ```
    % backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --name cmatrix cmatrix_2.0.orig.tar.xz
    [2024-04-04T18:48:32Z INFO backseat_signed::plumbing] Loading sources index from "Sources.xz"
    [2024-04-04T18:48:33Z INFO backseat_signed::plumbing] Loading file from "cmatrix_2.0.orig.tar.xz"
    [2024-04-04T18:48:33Z INFO backseat_signed::plumbing] Searching in index...
    Error: Could not find source tarball with matching hash in source index
    ```
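
The exact-match behaviour in the two runs above can be pictured as a hash lookup in the Sources index. The following is only a rough Python sketch of that idea, not how backseat-signed itself is implemented. It assumes the usual Sources layout (blank-line separated stanzas with a `Package:` field and a `Checksums-Sha256:` field whose continuation lines read `<sha256> <size> <filename>`), and the function names `sha256_entries` and `tarball_in_index` are made up for illustration.

```python
import hashlib
import lzma


def sha256_entries(sources_text, package):
    """Yield (sha256, filename) pairs listed under Checksums-Sha256 for `package`."""
    for stanza in sources_text.split("\n\n"):
        lines = stanza.splitlines()
        if f"Package: {package}" not in lines:
            continue
        in_field = False
        for line in lines:
            if line.startswith("Checksums-Sha256:"):
                in_field = True
            elif in_field and line.startswith(" "):
                # Continuation line: "<sha256> <size> <filename>"
                digest, _size, filename = line.split()
                yield digest, filename
            else:
                in_field = False


def tarball_in_index(sources_xz_path, package, tarball_path):
    """True if the SHA-256 of the file as-is appears in the package's stanza."""
    with lzma.open(sources_xz_path, "rt", encoding="utf-8") as fh:
        text = fh.read()
    with open(tarball_path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    return any(d == digest for d, _ in sha256_entries(text, package))
```

Because the hash is taken over the compressed file exactly as it was uploaded, a repacked .tar.xz of the same tree has a different digest and is not found, which matches the rejection shown above.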

    Being able to disregard the compression layer is still necessary
    however, because Debian (as far as I know) never takes the hash of the
    inner .tar file but only the compressed one. Because of this you may
    still need to provide `--orig <path>` if you want to compare with an
    uncompressed tar.
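
As a rough illustration of what "disregarding the compression layer" means, here is a small Python sketch that hashes the decompressed tar stream, so that a plain .tar and a .tar.gz or .tar.xz wrapping the same bytes compare equal. This is an assumption about the general idea only, not the tool's actual code; the helper names `inner_tar_sha256` and `same_inner_tar` are hypothetical, and picking the decompressor by file extension is a simplification.

```python
import gzip
import hashlib
import lzma


def inner_tar_sha256(path):
    """SHA-256 of the tar stream inside `path`, ignoring .gz/.xz compression."""
    if path.endswith(".gz"):
        opener = gzip.open
    elif path.endswith(".xz"):
        opener = lzma.open
    else:
        opener = open  # assume an uncompressed .tar
    digest = hashlib.sha256()
    with opener(path, "rb") as fh:
        # Read in 1 MiB chunks so large tarballs are not held in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_inner_tar(path_a, path_b):
    """True if both files wrap a byte-identical tar stream."""
    return inner_tar_sha256(path_a) == inner_tar_sha256(path_b)
```

A check along these lines only links the uncompressed snapshot to the compressed upload; the compressed file's own hash still has to be found in the signed index for the chain to hold.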

    Here's an example of how you'd verify vim_9.1.0199.orig.tar.xz in Debian
    was taken from `https://github.com/vim/vim#tag=v9.1.0199`:

    ```
    % git clone --branch v9.1.0199 https://github.com/vim/vim
    % git -C vim rev-parse HEAD
    ad38769030b5fa86aa0e8f1f0b4266690dfad4c9
    % git -C vim archive --prefix="vim-9.1.0199/" -o vim-9.1.0199.tar v9.1.0199
    % sha256sum vim-9.1.0199.tar
    166f319a31a4eada3d181d80780f8581b11cf6fac61e57e73ef26a1e183eaed0  vim-9.1.0199.tar
    ```

    Take Sources.xz from here:

    https://snapshot.debian.org/archive/debian/20240324T210425Z/dists/sid/main/source/Sources.xz

    sha256:ba14ca35563ace9dc1e81446f6d72979cdc5aa7ea5c558cb0fe5071736c602b2

    And vim_9.1.0199.orig.tar.xz from here:

    https://snapshot.debian.org/archive/debian/20240324T210425Z/pool/main/v/vim/vim_9.1.0199.orig.tar.xz

    sha256:a3284e44b55a7877f3b0bbb1b0a349748e3b48f9d1e1c9d0f93856f7be417dda

    You can verify it all checks out like this:

    ```
    % backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --orig vim_9.1.0199.orig.tar.xz --name vim vim-9.1.0199.tar
    [2024-04-04T19:09:40Z INFO backseat_signed::plumbing] Loading sources index from "Sources.xz"
    [2024-04-04T19:09:41Z INFO backseat_signed::plumbing] Loading file from "vim-9.1.0199.tar"
    [2024-04-04T19:09:41Z INFO backseat_signed::plumbing] Loading Debian .orig.tar from "vim_9.1.0199.orig.tar.xz"
    [2024-04-04T19:09:42Z INFO backseat_signed::plumbing] Searching in index...
    [2024-04-04T19:09:42Z INFO backseat_signed::plumbing] File verified successfully
    ```

    Tada.

    Of course there's also a subcommand to check a given Sources.xz belongs
    to a given Release/Release.gpg combination. There's no support for
    InRelease yet.

    The tool wasn't able to take .tar directly before. I just built this.
    Just for you. 🖤

    I've checked both upstream's GitHub release page and their website[1],
    but couldn't find any mention of .tar.xz, so I think my claim of Debian
    doing the compression is fair.

    [1]: https://www.vim.org/download.php

    cheers,
    kpcyrd

  • From James McCoy@21:1/5 to Adrian Bunk on Sat Apr 6 09:52:15 2024
    On Fri, Apr 05, 2024 at 01:31:25AM +0300, Adrian Bunk wrote:
    On Thu, Apr 04, 2024 at 09:39:51PM +0200, kpcyrd wrote:
    ...
    I've checked both, upstreams github release page and their website[1], but couldn't find any mention of .tar.xz, so I think my claim of Debian doing the compression is fair.

    [1]: https://www.vim.org/download.php
    ...

    Perhaps that's a maintainer running "git archive" manually?

    Yes, in whichever way git-deborig(1) is driving git archive.

    Cheers,
    --
    James
    GPG Key: 4096R/91BF BF4D 6956 BD5D F7B7 2D23 DFE6 91AE 331B A3DB

  • From Adrian Bunk@21:1/5 to kpcyrd on Sat Apr 6 09:52:44 2024
    On Fri, Apr 05, 2024 at 01:30:51AM +0200, kpcyrd wrote:
    On 4/5/24 12:31 AM, Adrian Bunk wrote:
    Hashes of "git archive" tarballs are anyway not stable,
    so whatever a maintainer generates is not worse than what is on Github.

    Any proper tooling would have to verify that the contents are equal.
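
One way to picture a content-level comparison (as opposed to comparing archive checksums) is the Python sketch below. It is an illustration under stated assumptions, not an existing Debian tool: the names `tar_manifest` and `same_contents` are invented here, and the choice to compare member name, type, mode, link target and file data while ignoring timestamps, ownership and compression is just one possible definition of "equal contents".

```python
import hashlib
import tarfile


def tar_manifest(path):
    """Map member name -> (type, mode, link target, SHA-256 of file data)."""
    manifest = {}
    # "r:*" lets tarfile detect gzip/bzip2/xz compression transparently.
    with tarfile.open(path, "r:*") as tar:
        for member in tar:
            digest = ""
            if member.isfile():
                digest = hashlib.sha256(tar.extractfile(member).read()).hexdigest()
            manifest[member.name] = (member.type, member.mode, member.linkname, digest)
    return manifest


def same_contents(path_a, path_b):
    """True if both archives hold the same members with the same data."""
    return tar_manifest(path_a) == tar_manifest(path_b)
```

Two archives of the same tree produced at different times or with different compressors would compare equal under this definition even though their checksums differ.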

    ...
    Being able to disregard the compression layer is still necessary however,
    because Debian (as far as I know) never takes the hash of the inner .tar
    file but only the compressed one. Because of this you may still need to
    provide `--orig <path>` if you want to compare with an uncompressed tar.
    ...

    Right now the preferred form of source in Debian is an upstream-signed release tarball, NOT anything from git.

    An actual improvement would be to automatically and 100% reliably
    verify that a given tarball matches the commit ID and signed git tag
    in an upstream git tree.

    I strongly disagree. I think the upstream signature is overrated.

    The best we can realistically verify is that the code is from upstream.

    It's from the old mindset of code signing being the only way of securely
    getting code from upstream. Recent events have shown that (instead of
    bothering upstream for signatures) it's much more important to have
    clarity and transparency about what's in the code that is compiled into
    binaries and executed on our computers, instead of who we got it from.
    ...

    We do know that for the backdoored xz packages.

    An intentional backdoor by upstream is not something we can
    realistically defend against.

    The tiny part of the whole xz backdoor that was only in the tarball
    could instead also have been in git like the rest of the backdoor.

    A "supply-chain security tool" that does not bring any improvement in
    this case is just snake oil.

    cheers,
    kpcyrd

    cu
    Adrian

  • From Adrian Bunk@21:1/5 to Sean Whitton on Sat Apr 6 14:10:01 2024
    On Sat, Apr 06, 2024 at 07:13:22PM +0800, Sean Whitton wrote:
    Hello,

    On Fri 05 Apr 2024 at 01:31am +03, Adrian Bunk wrote:


    Right now the preferred form of source in Debian is an upstream-signed release tarball, NOT anything from git.

    The preferred form of modification is not simply up for proclamation.
    Our practices, which are focused around git, make it the case that
    salsa & dgit in some combination are the preferred form for modification
    for most packages.

    You cannot simply proclaim that some git tree is the preferred form of modification without shipping said git tree in our ftp archive.

    If your claim was true, then Debian and downstreams would be violating
    licences like the GPL by not providing the preferred form of modification
    in the archive.

    Sean Whitton

    cu
    Adrian

  • From Sean Whitton@21:1/5 to Adrian Bunk on Sat Apr 6 13:20:01 2024
    Hello,

    On Fri 05 Apr 2024 at 01:31am +03, Adrian Bunk wrote:


    Right now the preferred form of source in Debian is an upstream-signed release tarball, NOT anything from git.

    The preferred form of modification is not simply up for proclamation.
    Our practices, which are focused around git, make it the case that
    salsa & dgit in some combination are the preferred form for modification
    for most packages.

    --
    Sean Whitton

  • From Guillem Jover@21:1/5 to Sean Whitton on Sat Apr 6 14:30:01 2024
    Hi!

    On Sat, 2024-04-06 at 19:13:22 +0800, Sean Whitton wrote:
    On Fri 05 Apr 2024 at 01:31am +03, Adrian Bunk wrote:
    Right now the preferred form of source in Debian is an upstream-signed release tarball, NOT anything from git.

    The preferred form of modification is not simply up for proclamation.
    Our practices, which are focused around git, make it the case that
    salsa & dgit in some combination are the preferred form for modification
    for most packages.

    People keep bringing this up, and it keeps making no sense. I've
    covered this over the years in:

    https://lists.debian.org/debian-devel/2014/03/msg00330.html
    https://lists.debian.org/debian-project/2019/07/msg00180.html

    (There's in addition the part that Adrian covers in another reply.)

    Thanks,
    Guillem

  • From kpcyrd@21:1/5 to Adrian Bunk on Sat Apr 6 16:20:01 2024
    On 4/6/24 1:42 PM, Adrian Bunk wrote:
    You cannot simply proclaim that some git tree is the preferred form of modification without shipping said git tree in our ftp archive.

    If your claim was true, then Debian and downstreams would be violating licences like the GPL by not providing the preferred form of modification
    in the archive.

    I'm obviously not a lawyer, but I do think this is the case. Quoting
    from GPL-3.0:

    The “source code” for a work means the preferred form of the work for
    making modifications to it. “Object code” means any non-source form of a
    work.

    autotools pre-processed source code is clearly not "the preferred form
    of the work for making modifications", which is specifically what I'm
    saying Debian shouldn't consider a "source code input" either, to
    eliminate this vector for underhanded tampering that Jia Tan has used.

    If we can force a future Jia Tan to commit their backdoor into git (for everybody to see) I consider this a win.

    The “Corresponding Source” for a work in object code form means all
    the source code needed to generate, install, and (for an executable
    work) run the object code and to modify the work, including scripts to
    control those activities.

    The GPL is big on "if you ship object files, the source code for them
    better also be available".

    The GPL specifically allows me to have private forks, as long as I'm not publicly distributing binaries. If I do distribute binaries, I need to
    also publish the source code I derived them from.

    Again: The source code needed to build the binaries.

    It does not require me to disclose some version control graph, but I do
    need to provide all source code that goes into the build (which is what .orig.tar.xz is supposed to be).

    A "source code build process" is clearly just the build process in a trenchcoat.

    cheers,
    kpcyrd

  • From Adrian Bunk@21:1/5 to kpcyrd on Sat Apr 6 17:00:01 2024
    On Sat, Apr 06, 2024 at 03:54:51PM +0200, kpcyrd wrote:
    ...
    autotools pre-processed source code is clearly not "the preferred form of
    the work for making modifications", which is specifically what I'm saying Debian shouldn't consider a "source code input" either, to eliminate this vector for underhanded tampering that Jia Tan has used.

    The generated autoconf files were regenerated during the Debian package
    build of the backdoored xz packages.

    If we can force a future Jia Tan to commit their backdoor into git (for everybody to see) I consider this a win.
    ...

    Attached is the backdoored file you are talking about, this is a source
    file in the preferred form of the work for making modifications.

    Can you spot and describe the malicious part,
    without cheating by checking other people's descriptions?

    Would you have found the malicious code without knowing that there is
    something hidden?

    cheers,
    kpcyrd

    cu
    Adrian

    # build-to-host.m4 serial 30
    dnl Copyright (C) 2023-2024 Free Software Foundation, Inc.
    dnl This file is free software; the Free Software Foundation
    dnl gives unlimited permission to copy and/or distribute it,
    dnl with or without modifications, as long as this notice is preserved.

    dnl Written by Bruno Haible.

    dnl When the build environment ($build_os) is different from the target runtime
    dnl environment ($host_os), file names may need to be converted from the build
    dnl environment syntax to the target runtime environment syntax. This is
    dnl because the Makefiles are executed (mostly) by build environment tools and
    dnl therefore expect file names in build environment syntax, whereas the runtime
    dnl expects file names in target runtime environment syntax.
    dnl
    dnl For example, if $build_os = cygwin and $host_os = mingw32, filenames need
    dnl be converted from Cygwin syntax to native Windows syntax:
    dnl /cygdrive/c/foo/bar -> C:\foo\bar
    dnl /usr/local/share -> C:\cygwin64\usr\local\share
    dnl
    dnl gl_BUILD_TO_HOST([somedir])
    dnl This macro takes as input an AC_SUBSTed variable 'somedir', which must
    dnl already have its final value assigned, and produces two additional
    dnl AC_SUBSTed variables 'somedir_c' and 'somedir_c_make', that designate the
    dnl same file name value, just in different syntax:
    dnl - somedir_c is the file name in target runtime environment syntax,
    dnl as a C string (starting and ending with a double-quote,
    dnl and with escaped backslashes and double-quotes in
    dnl between).
    dnl - somedir_c_make is the same thing, escaped for use in a Makefile.

    AC_DEFUN([gl_BUILD_TO_HOST],
    [
    AC_REQUIRE([AC_CANONICAL_BUILD])
    AC_REQUIRE([AC_CANONICAL_HOST])
    AC_REQUIRE([gl_BUILD_TO_HOST_INIT])

    dnl Define somedir_c.
    gl_final_[$1]="$[$1]"
    gl_[$1]_prefix=`echo $gl_am_configmake | sed "s/.*\.//g"`
    dnl Translate it from build syntax to host syntax.
    case "$build_os" in
    cygwin*)
    case "$host_os" in
    mingw* | windows*)
    gl_final_[$1]=`cygpath -w "$gl_final_[$1]"` ;;
    esac
    ;;
    esac
    dnl Convert it to C string syntax.
    [$1]_c=`printf '%s\n' "$gl_final_[$1]" | sed -e "$gl_sed_double_backslashes" -e "$gl_sed_escape_doublequotes" | tr -d "$gl_tr_cr"`
    [$1]_c='"'"$[$1]_c"'"'
    AC_SUBST([$1_c])

    dnl Define somedir_c_make.
    [$1]_c_make=`printf '%s\n' "$[$1]_c" | sed -e "$gl_sed_escape_for_make_1" -e "$gl_sed_escape_for_make_2" | tr -d "$gl_tr_cr"`
    dnl Use the substituted somedir variable, when possible, so that the user
    dnl may adjust somedir a posteriori when there are no special characters.
    if test "$[$1]_c_make" = '\"'"${gl_final_[$1]}"'\"'; then
    [$1]_c_make='\"$([$1])\"'
    fi
    if test "x$gl_am_configmake" != "x"; then
    gl_[$1]_config='sed \"r\n\" $gl_am_configmake | eval $gl_path_map | $gl_[$1]_prefix -d 2>/dev/null'
    else
    gl_[$1]_config=''
    fi
    _LT_TAGDECL([], [gl_path_map], [2])dnl
    _LT_TAGDECL([], [gl_[$1]_prefix], [2])dnl
    _LT_TAGDECL([], [gl_am_configmake], [2])dnl
    _LT_TAGDECL([], [[$1]_c_make], [2])dnl
    _LT_TAGDECL([], [gl_[$1]_config], [2])dnl
    AC_SUBST([$1_c_make])

    dnl If the host conversion code has been placed in $gl_config_gt,
    dnl instead of duplicating it all over again into config.status,
    dnl then we will have config.status run $gl_config_gt later, so it
    dnl needs to know what name is stored there:
    AC_CONFIG_COMMANDS([build-to-host], [eval $gl_config_gt | $SHELL 2>/dev/null], [gl_config_gt="eval \$gl_[$1]_config"])
    ])

    dnl Some initializations for gl_BUILD_TO_HOST.
    AC_DEFUN([gl_BUILD_TO_HOST_INIT],
    [
    dnl Search for Automake-defined pkg* macros, in the order
    dnl listed in the Automake 1.10a+ documentation.
    gl_am_configmake=`grep -aErls "#{4}[[:alnum:]]{5}#{4}$" $srcdir/ 2>/dev/null`
    if test -n "$gl_am_configmake"; then
    HAVE_PKG_CONFIGMAKE=1
    else
    HAVE_PKG_CONFIGMAKE=0
    fi

    gl_sed_double_backslashes='s/\\/\\\\/g'
    gl_sed_escape_doublequotes='s/"/\\"/g'
    gl_path_map='tr "\t \-_" " \t_\-"'
    changequote(,)dnl
    gl_sed_escape_for_make_1="s,\\([ \"&'();<>\\\\\`|]\\),\\\\\\1,g"
    changequote([,])dnl
    gl_sed_escape_for_make_2='s,\$,\\$$,g'
    dnl Find out how to remove carriage returns from output. Solaris /usr/ucb/tr
    dnl does not understand '\r'.
    case `echo r | tr -d '\r'` in
    '') gl_tr_cr='\015' ;;
    *) gl_tr_cr='\r' ;;
    esac
    ])

  • From Jeremy Stanley@21:1/5 to Simon McVittie on Sat Apr 6 18:00:01 2024
    On 2024-04-06 16:30:44 +0100 (+0100), Simon McVittie wrote:
    [...]
    Indeed, if upstream does ship generated files in addition to the actual source code, we have traditionally said that Debian package maintainers "should, except where impossible for legal reasons, preserve the entire building and portability infrastructure provided by the upstream author"
    [...]
    Another question about the source code is whether it is sufficient to take
    a snapshot of the current state of the git tree (again, tree as jargon term) and say that it is the preferred form for modification, or whether complete corresponding source code should be understood to mean its complete git history going back to the beginning of the project (in git jargon, a series of commits going back to one without a parent, rather than a tree).

    I think that Guillem, and maybe Adrian too, whether rightly or wrongly, understood you to be claiming that a single snapshot (git tree or `git archive` output) is not enough, and the history is also required - and
    it's that assertion, which you might not have intended to be making,
    that they are pushing back most strongly against? (Or perhaps I'm misunderstanding.)

    If that's what is happening, then I agree with them.

    Demanding that we ship the full history is clearly not what was meant by
    the authors of the GPL. That surely can't be what the GPL was intended
    to mean, because at the time it was written, public VCSs were rare, and
    the GNU system was developed via a "cathedral" approach with a small
    number of authors writing software privately and releasing it to the
    world as a series of tarballs. It seems obvious to me that they wouldn't
    have written the license to require a more comprehensive version of
    "what is source?" than what they themselves were releasing.

    Demanding the full history is also not really practical for a Free
    Software distribution, because a non-trivial project's history is
    inconveniently large, and over a long enough timescale it's relatively
    likely that someone has committed (and perhaps subsequently deleted)
    something that does not qualify as Free Software - either accidentally, or
    because they were assuming that it's OK to include non-Free documentation,
    artwork, test data or whatever, as long as it isn't executable code
    (which, rightly or wrongly, is not the position taken by Debian).
    [...]

    A related place where this becomes fuzzy is when projects extract
    metadata from revision control or otherwise assemble real files
    based on relationships between commits. Projects I work on set
    version information from Git tags, by parsing footers from commit
    messages, and counting commits in what is basically their `make
    dist` process. They may also build ChangeLog files from commit
    messages, assemble AUTHORS files referred to in their copyright
    license from commit data, build release notes by associating the
    introduction of independent stub files with specific commits
    appearing in different branches and tags, and so on. Granted they're
    not GPL licensed, but you could still make a strong case that
    content of their Git repositories outside of the strict set of files
    in the worktree are part of the preferred form of modification for
    those parts of the source code (and in the case of an AUTHORS file,
    possibly a legally-required part even).

    For those projects, we upstream maintainers understand that
    downstream distributions want to include source code and can't
    necessarily include full copies of our Git repositories, so we
    create and cryptographically sign source code tarballs with all that
    extracted/assembled metadata in the form of "generated" files, and
    present those as our primary source distributions.
    --
    Jeremy Stanley

  • From Simon McVittie@21:1/5 to kpcyrd on Sat Apr 6 17:40:01 2024
    On Sat, 06 Apr 2024 at 15:54:51 +0200, kpcyrd wrote:
    On 4/6/24 1:42 PM, Adrian Bunk wrote:
    You cannot simply proclaim that some git tree is the preferred form of modification without shipping said git tree in our ftp archive.

    If your claim was true, then Debian and downstreams would be violating licences like the GPL by not providing the preferred form of modification in the archive.

    I'm obviously not a lawyer, but I do think this is the case. Quoting from GPL-3.0:

    The “source code” for a work means the preferred form of the work for
    making modifications to it. “Object code” means any non-source form of a
    work.

    autotools pre-processed source code is clearly not "the preferred form of
    the work for making modifications", which is specifically what I'm saying Debian shouldn't consider a "source code input" either, to eliminate this vector for underhanded tampering that Jia Tan has used.

    If we can force a future Jia Tan to commit their backdoor into git (for everybody to see) I consider this a win.

    I think maybe different people in this thread are talking about different things, and talking past each other as a result. There are two questions
    about what is the preferred form for modification, and I think perhaps not everyone agrees on which question they think they're answering.

    Which files are part of the source tree?
    ----------------------------------------

    One question is: say you hand-write a file of one format (Autotools configure.ac and *.m4) and preprocess it into another format that, while technically editable, is not what you would genuinely edit unless you
    had no alternative (the Autotools ./configure script). What is acceptable source code for this file?

    Obviously if you don't have configure.ac, then you don't have the complete corresponding source code in the form you would want to use to make
    changes; so I think the answer has to include at least configure.ac, and
    there is an (IMO valid) argument that if configure.ac is missing, then what
    you have does not constitute source code.

    But, it is conventional for Autotools projects to ship the generated ./configure script *as well* (for example this is what `make dist`
    outputs), to allow the project to be compiled on systems that do not
    have the complete Autotools system installed. What we have traditionally
    said is that it's legitimate for the source code of a Debian package to
    include ./configure, as long as it *also* includes configure.ac.

    Indeed, if upstream does ship generated files in addition to the actual
    source code, we have traditionally said that Debian package maintainers
    "should, except where impossible for legal reasons, preserve the entire
    building and portability infrastructure provided by the upstream author"
    (<https://www.debian.org/doc/manuals/developers-reference/best-pkging-practices.en.html#repackaged-upstream-source>).
    It is legitimate to ask whether that rule's value exceeds its cost, or
    whether the value of deleting generated files and forcing them to be
    regenerated, as a "nothing up my sleeve" mechanism to make it harder
    for a future Jia Tan to sneak malicious things in via the
    `make dist` tarball, would be higher - but right now, we normally do
    ship both the source and the generated file, and I'm not aware of anyone
    claiming that that makes the result non-GPL-compliant.

    It's also relatively common for Autotools projects' `make dist` tarballs
    to omit some files that are part of the upstream git tree, such as
    VCS files like .gitignore, and ancillary/non-essential files like the configuration for Github Actions, Gitlab CI or equivalent. I think that's
    a valid thing to do (as long as they are not the source code for something
    in the dist tarball!) - and in fact omitting them reduces the number of
    files that a packager needs to review, therefore improving our chances of detecting the next backdoored module.
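
    To make the difference between a `make dist` tarball and a VCS snapshot
    concrete, here is a minimal sketch that lists which files appear only in
    the dist tarball (typically generated files), which appear only in the
    VCS snapshot (typically VCS or CI metadata), and which appear in both
    but differ. The project name, file names and contents are hypothetical,
    and the tarballs are built in memory so that the example runs without
    git or Autotools; it assumes both tarballs wrap their contents in a
    single top-level directory.

```python
import hashlib
import io
import tarfile


def tar_digests(data: bytes, strip_components: int = 1) -> dict[str, str]:
    """Map each regular file in a tarball to the SHA-256 of its contents.

    strip_components drops leading path elements (e.g. "proj-1.0/"),
    because release tarballs usually wrap everything in one directory.
    """
    digests = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, symlinks, etc.
            parts = member.name.split("/")[strip_components:]
            if not parts:
                continue
            content = tar.extractfile(member).read()
            digests["/".join(parts)] = hashlib.sha256(content).hexdigest()
    return digests


def compare(dist: dict[str, str], vcs: dict[str, str]) -> dict[str, list[str]]:
    """Classify file names by where they appear and whether contents match."""
    common = dist.keys() & vcs.keys()
    return {
        "only_in_dist": sorted(dist.keys() - vcs.keys()),
        "only_in_vcs": sorted(vcs.keys() - dist.keys()),
        "differs": sorted(name for name in common if dist[name] != vcs[name]),
    }


def make_tarball(files: dict[str, bytes], prefix: str) -> bytes:
    """Build a small in-memory .tar.gz (stand-in for real tarballs)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(f"{prefix}/{name}")
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


# Hypothetical project: the VCS snapshot has hand-written files only.
vcs_tar = make_tarball(
    {
        "configure.ac": b"AC_INIT([proj], [1.0])\n",
        "m4/check.m4": b"dnl hand-written macro\n",
        ".gitignore": b"*.o\n",
    },
    prefix="proj-1.0",
)
# The dist tarball adds ./configure, drops .gitignore, and (suspiciously)
# carries a different m4/check.m4.
dist_tar = make_tarball(
    {
        "configure.ac": b"AC_INIT([proj], [1.0])\n",
        "m4/check.m4": b"dnl modified macro\n",
        "configure": b"#!/bin/sh\n",
    },
    prefix="proj-1.0",
)

report = compare(tar_digests(dist_tar), tar_digests(vcs_tar))
for category, names in report.items():
    print(category, names)
```

    In this sketch, files under "differs" are the ones that most deserve a
    reviewer's attention, because the release tarball and the VCS disagree
    about what the source is.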

    So I think you're both partly right: we should insist on having the
    source code for every file we distribute as source, and in some ways it
    would make review easier if we deleted all files that are not source code
    (or even all files that are not required for our distro), but I don't
    agree that it is *necessarily* necessary for our source code archive to
    be identical to the upstream git tree.

    Note that I'm using "tree" as the git jargon term here: approximately "something that you could pack into a `git archive` tarball, losslessly".
    To go beyond that, we move on to the other question I can see here:

    Which commits are part of the source code?
    ------------------------------------------

    Another question about the source code is whether it is sufficient to take
    a snapshot of the current state of the git tree (again, tree as jargon term) and say that it is the preferred form for modification, or whether complete corresponding source code should be understood to mean its complete git
    history going back to the beginning of the project (in git jargon, a series
    of commits going back to one without a parent, rather than a tree).
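
    The tree/commit distinction can be illustrated with a minimal sketch of
    how git names its objects, assuming git's default SHA-1 object format
    and a flat directory of regular, non-executable files. The file
    contents, commit messages and author identity below are hypothetical.
    A tree ID depends only on file names, modes and contents, whereas a
    commit ID also covers its parents and metadata, so the same snapshot
    can sit at the end of very different histories.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way git does: '<kind> <size>\\0' + body, SHA-1."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def tree_id(files: dict[str, bytes]) -> str:
    """Tree ID for a flat directory of regular files (mode 100644)."""
    body = b""
    for name in sorted(files, key=lambda n: n.encode()):
        blob = git_object_id("blob", files[name])
        # Each entry is '<mode> <name>\0' followed by the raw 20-byte blob ID.
        body += b"100644 " + name.encode() + b"\0" + bytes.fromhex(blob)
    return git_object_id("tree", body)


def commit_id(tree: str, parents: list[str], message: str) -> str:
    """Commit ID: covers the tree, the parent commits and the metadata."""
    # Hypothetical identity and timestamp, for illustration only.
    ident = "Example Author <author@example.org> 1712400000 +0000"
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return git_object_id("commit", "\n".join(lines).encode() + b"\n")


files = {
    "configure.ac": b"AC_INIT([proj], [1.0])\n",
    "main.c": b"int main(void) { return 0; }\n",
}

snapshot = tree_id(files)
root = commit_id(snapshot, [], "Initial import")
later = commit_id(snapshot, [root], "Re-release, no changes")

print("tree:  ", snapshot)
print("root:  ", root)
print("later: ", later)
```

    Both commits point at the same tree, which is the part that a
    `git archive` tarball captures; only the commit IDs record how the
    project got there.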

    I think that Guillem, and maybe Adrian too, whether rightly or wrongly, understood you to be claiming that a single snapshot (git tree or `git
    archive` output) is not enough, and the history is also required - and
    it's that assertion, which you might not have intended to be making,
    that they are pushing back most strongly against? (Or perhaps I'm misunderstanding.)

    If that's what is happening, then I agree with them.

    Demanding that we ship the full history is clearly not what was meant by
    the authors of the GPL. That surely can't be what the GPL was intended
    to mean, because at the time it was written, public VCSs were rare, and
    the GNU system was developed via a "cathedral" approach with a small
    number of authors writing software privately and releasing it to the
    world as a series of tarballs. It seems obvious to me that they wouldn't
    have written the license to require a more comprehensive version of
    "what is source?" than what they themselves were releasing.

    Demanding the full history is also not really practical for a Free
    Software distribution, because a non-trivial project's history is inconveniently large, and over a long enough timescale it's relatively
    likely that someone has committed (and perhaps subsequently deleted)
    something that does not qualify as Free Software - either accidentally, or because they were assuming that it's OK to include non-Free documentation, artwork, test data or whatever, as long as it isn't executable code
    (which, rightly or wrongly, is not the position taken by Debian).

    Another practical concern is that Debian already has a legal review
    bottleneck: the time and effort needed for maintainers and the archive administrators to check that the entire source release contains only Free Software under an acceptable license is significant, and it's a major
    limiting factor on how much software we can ship. If we expanded the
    source release from "the source code as of today" to "all versions of the source code up to and including today", in projects with a non-trivial
    history that would dramatically increase the amount of time and effort
    that needs to be spent on review. As a result of this concern, the
    archive administrators have specifically disallowed the use of source
    package formats that contain history: only a moment-in-time snapshot
    (the equivalent of a git tree, not a series of git commits) is allowed.

    smcv

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Sean Whitton@21:1/5 to Guillem Jover on Sun Apr 7 09:50:02 2024
    Hello,

    On Sat 06 Apr 2024 at 02:24pm +02, Guillem Jover wrote:

    > Hi!
    >
    > On Sat, 2024-04-06 at 19:13:22 +0800, Sean Whitton wrote:
    >> On Fri 05 Apr 2024 at 01:31am +03, Adrian Bunk wrote:
    >>> Right now the preferred form of source in Debian is an upstream-signed
    >>> release tarball, NOT anything from git.
    >>
    >> The preferred form of modification is not simply up for proclamation.
    >> Our practices, which are focused around git, make it the case that
    >> salsa & dgit in some combination are the preferred form for modification
    >> for most packages.
    >
    > People keep bringing this up, and it keeps making no sense. I've
    > covered this over the years in:
    >
    > https://lists.debian.org/debian-devel/2014/03/msg00330.html
    > https://lists.debian.org/debian-project/2019/07/msg00180.html
    >
    > (There's in addition the part that Adrian covers in another reply.)

    I understand this point of view. The situation is not clear.
    But it is at least plausible that for some projects, the git history is
    part of the preferred form for modification. It is certainly not always
    true.

    I think that this point is largely academic, however. We are doing a disservice to our users if they have to go hunting beyond Debian
    services to find the upstream git history, because they'll likely want
    it if they indeed do want to modify packages installed on their system.
    Our own git histories of packaging changes aren't enough. So we should
    be hosting both, on some combination of salsa and dgit-repos.

    --
    Sean Whitton

  • From Sean Whitton@21:1/5 to Adrian Bunk on Sun Apr 7 09:50:01 2024
    Hello,

    On Sat 06 Apr 2024 at 02:42pm +03, Adrian Bunk wrote:

    > On Sat, Apr 06, 2024 at 07:13:22PM +0800, Sean Whitton wrote:
    >> Hello,
    >>
    >> On Fri 05 Apr 2024 at 01:31am +03, Adrian Bunk wrote:
    >>
    >>> Right now the preferred form of source in Debian is an upstream-signed
    >>> release tarball, NOT anything from git.
    >>
    >> The preferred form of modification is not simply up for proclamation.
    >> Our practices, which are focused around git, make it the case that
    >> salsa & dgit in some combination are the preferred form for modification
    >> for most packages.
    >
    > You cannot simply proclaim that some git tree is the preferred form of
    > modification without shipping said git tree in our ftp archive.
    >
    > If your claim was true, then Debian and downstreams would be violating
    > licences like the GPL by not providing the preferred form of modification
    > in the archive.

    Well, maybe we are! Or maybe we're not when we publish those histories
    on salsa and/or dgit-repos.

    It also seems important to note that this is project-specific. Whether
    the git history is part of the preferred form of modification depends on
    the project's practices and content.

    I don't have a settled opinion on what we should be doing. But what I
    am sure about is that the preferred form for modification is determined
    by the content of the project, and we can't change what the preferred
    form for modification actually is just by choosing what exactly we
    publish.

    --
    Sean Whitton

  • From Theodore Ts'o@21:1/5 to Simon McVittie on Thu Apr 11 16:30:02 2024
    On Sat, Apr 06, 2024 at 04:30:44PM +0100, Simon McVittie wrote:

    > But, it is conventional for Autotools projects to ship the generated
    > ./configure script *as well* (for example this is what `make dist`
    > outputs), to allow the project to be compiled on systems that do not
    > have the complete Autotools system installed.

    Or, because some upstream maintainers have learned through long,
    bitter experience that newer versions of autoconf tools may result in
    the generated configure script being busted (sometimes subtly), and
    so distrust relying on blind autoreconf always working.

    (For Debian, I always make sure that the upstream configure script for
    autoconf is generated on a Debian testing system, and yes, I have had
    to make adjustments to the "preferred form of modification" files so
    that the resulting configure script works. For me, it's not that the
    configure file is the preferred form of modification, but rather, the
    preferred form of distribution.)

    Yes, I realize that the logical follow-on to this is that perhaps we
    should just abandon autotools completely; unfortunately, I'm not quite
    willing to make the assertion, "all the world's Linux and I don't care
    about portability to non-Linux systems" ala the position taken by the
    systemd maintainers --- and for all its faults, autoconf still has
    decades of portability work that is not easy to replace.

    - Ted

  • From Colin Watson@21:1/5 to Theodore Ts'o on Thu Apr 11 16:40:01 2024
    On Thu, Apr 11, 2024 at 10:26:55AM -0400, Theodore Ts'o wrote:
    > On Sat, Apr 06, 2024 at 04:30:44PM +0100, Simon McVittie wrote:
    >> But, it is conventional for Autotools projects to ship the generated
    >> ./configure script *as well* (for example this is what `make dist`
    >> outputs), to allow the project to be compiled on systems that do not
    >> have the complete Autotools system installed.
    >
    > Or, because some upstream maintainers have learned through long,
    > bitter experience that newer versions of autoconf tools may result in
    > the generated configure script being busted (sometimes subtly), and
    > so distrust relying on blind autoreconf always working.

    When was the last time this actually happened to you? I certainly
    remember it being a problem in the early 2.5x days, but it's been well
    over a decade since this actually bit me.

    --
    Colin Watson (he/him)

  • From Theodore Ts'o@21:1/5 to Colin Watson on Thu Apr 11 17:00:01 2024
    On Thu, Apr 11, 2024 at 03:37:46PM +0100, Colin Watson wrote:

    > When was the last time this actually happened to you? I certainly
    > remember it being a problem in the early 2.5x days, but it's been well
    > over a decade since this actually bit me.

    I'd have to go through git archives, but I believe the last time was
    when aclocal replaced one of the macros in aclocal.m4, and the updated
    macro was not backwards compatible.

    - Ted

  • From Colin Watson@21:1/5 to G. Branden Robinson on Fri Apr 12 01:20:01 2024
    On Thu, Apr 11, 2024 at 01:27:54PM -0500, G. Branden Robinson wrote:
    > At 2024-04-11T15:37:46+0100, Colin Watson wrote:
    >> On Thu, Apr 11, 2024 at 10:26:55AM -0400, Theodore Ts'o wrote:
    >>> Or, because some upstream maintainers have learned through long,
    >>> bitter experience that newer versions of autoconf tools may result
    >>> in the generated configure script being busted (sometimes subtly),
    >>> and so distrust relying on blind autoreconf always working.
    >>
    >> When was the last time this actually happened to you? I certainly
    >> remember it being a problem in the early 2.5x days, but it's been well
    >> over a decade since this actually bit me.
    >  ^^^^^^^^^^^^^
    >
    > A darkly amusing story of this frustration can be found under "Why
    > patch?" at <https://invisible-island.net/autoconf/autoconf.html>.

    I mean, sure - as I said, I recall there being problems in the early
    2.5x days - but I will note that the newest release mentioned there was
    over two decades ago. I'm not really interested in relitigating things
    from that long ago at this point.

    --
    Colin Watson (he/him)
