• Bug#1105019: sbuild: source.changes includes binary build info

    From Guillem Jover@1:229/2 to Holger Levsen on Tue May 13 12:10:02 2025
    XPost: linux.debian.bugs.dist

    Hi!

    On Tue, 2025-05-13 at 06:33:49 +0000, Holger Levsen wrote:
    > On Tue, May 13, 2025 at 02:28:57AM +0200, Guillem Jover wrote:
    > > With the .buildinfo support introduction, one current requirement is
    > > that any .changes file includes at least one .buildinfo file (so
    > > there's currently no filtering based on build type, nor any counting
    > > to not break on potentially old tooling).
    >
    > can't we change this requirement? .buildinfo files for _source.changes
    > don't make sense, so we shouldn't create nor distribute them.

    (I think we have discussed this in the past. :)

    If someone uses dpkg-buildpackage, then build dependencies need to be
    satisfied (even for source-only builds), where code gets executed from
    the package itself (clean targets etc), so this is also a build that
    generates an upload represented in a .changes file. Those can also
    affect source package generation, so I still think it does make sense
    that they generate a .buildinfo file. I also think reproducible source
    packages are an important thing that we already have (at least tooling
    wise), which I'd rather not regress support on.

    Thanks,
    Guillem

    --- SoupGate-Win32 v1.05
    * Origin: you cannot sedate... all the things you hate (1:229/2)
  • From Holger Levsen@1:229/2 to Guillem Jover on Tue May 13 08:40:01 2025
    XPost: linux.debian.bugs.dist

    On Tue, May 13, 2025 at 02:28:57AM +0200, Guillem Jover wrote:
    > With the .buildinfo support introduction, one current requirement is
    > that any .changes file includes at least one .buildinfo file (so
    > there's currently no filtering based on build type, nor any counting
    > to not break on potentially old tooling).

    can't we change this requirement? .buildinfo files for _source.changes
    don't make sense, so we shouldn't create nor distribute them.


    --
    cheers,
    Holger

    ⢀⣴⠾⠻⢶⣦⠀
    ⣾⠁⢠⠒⠀⣿⡁ holger@(debian|reproducible-builds|layer-acht).org
    ⢿⡄⠘⠷⠚⠋⠀ OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
    ⠈⠳⣄

    "It was not Hitler, Göring, Goebbels and Himmler who dragged me off and
    beat me. No! It was the cobbler, the neighbor, the grocer, the
    milkman, the postman, who were given a uniform, an armband (..),
    and then they were the 'master race'." Karl Stojka (1931 - 2003)

  • From Guillem Jover@1:229/2 to Raphael Hertzog on Tue May 13 02:40:02 2025
    XPost: linux.debian.bugs.dist

    Hi!

    On Mon, 2025-05-12 at 10:38:08 +0200, Raphael Hertzog wrote:
    > Control: reassign -1 dpkg-dev
    > Control: found -1 1.21.22
    >
    > On Sat, 10 May 2025, Johannes Schauer Marin Rodrigues wrote:
    > > > 6. Observe the presence of the buildinfo in the resulting source.changes:
    > > >    grep buildinfo ../*_source.changes
    > >
    > > This is something that dpkg-genchanges does. Sbuild runs this:
    > >
    > > dpkg-genchanges --build=source
    > >
    > > And that will include a reference to the buildinfo if before running this
    > > command, the package was already built. If you run "debian/rules clean"
    > > before running above command, then the resulting .changes file will *not*
    > > contain a reference to the buildinfo.
    > >
    > > If you think that this is a bug, you should re-assign this to dpkg.
    >
    > Thanks, doing so as I ran your reproducer and I confirm that the binary
    > buildinfo is there.
    >
    > $ grep buildinfo hello_2.10-3_source.changes
    > 807907dc2f97b2a089e6b05e697eaf157f57767b 5813 hello_2.10-3_amd64.buildinfo
    > 15285c44b8509fb01454f419171170f9e6468c866df5e59e65036bc9cd35062c 5813 hello_2.10-3_amd64.buildinfo
    > 7fd5124a2508e44589040f769a152da1 5813 devel optional hello_2.10-3_amd64.buildinfo
    >
    > Guillem, the issue reported here is that running
    > "dpkg-genchanges --build=source" in a freshly built tree will include the
    > _<arch>.buildinfo from the source+binary build run just before, whereas a
    > source-only build will properly generate a .changes that references a
    > _source.buildinfo.
    >
    > This introduces a difference in what's generated between users of
    > "sbuild --source-only-changes" and
    > "sbuild --source --no-arch-any --no-arch-all" (or plain
    > dpkg-buildpackage -S).
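
The difference described in the quoted text can be checked mechanically. The following is only a minimal sketch in POSIX shell, assuming the usual .changes layout in which every checksum line ends with the file name. The helper names `changes_buildinfos` and `source_changes_is_pure` are hypothetical and are not part of dpkg or sbuild.

```shell
# List the .buildinfo files referenced by a .changes file, one per line.
# Assumption: the file name is the last whitespace-separated field of each
# checksum line ("checksum size [section priority] filename").
changes_buildinfos() {
    awk '$NF ~ /\.buildinfo$/ { print $NF }' "$1" | sort -u
}

# Return 0 when every referenced .buildinfo is a *_source.buildinfo
# (or when none is referenced at all), non-zero otherwise.
source_changes_is_pure() {
    ! changes_buildinfos "$1" | grep -qv '_source\.buildinfo$'
}
```

Run against the hello_2.10-3_source.changes quoted above, `changes_buildinfos` would print hello_2.10-3_amd64.buildinfo and `source_changes_is_pure` would return non-zero.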

    With the .buildinfo support introduction, one current requirement is
    that any .changes file includes at least one .buildinfo file (so
    there's currently no filtering based on build type, nor any counting
    to not break on potentially old tooling).

    From the grep above, I'm assuming there's no reference to a
    *_source.buildinfo file in debian/files, so that means one will not
    get included in the generated .changes file (as I think would be
    expected if one performs a source-only or source+binary build?).

    I guess what I see here is potentially a problem with how the build is
    being driven by sbuild or whatever else is driving it, where there's
    at least a missing source build phase (dpkg-buildpackage --build=source),
    or parts of it (dpkg-genbuildinfo --build=source).

    But then it could be argued that there's potentially a problem with
    dpkg-genchanges where it should track how many .buildinfo files are being
    distributed, and probably warn (or error out?) if none are, and then if
    it has seen a .buildinfo matching the current --build mode, then ignore
    other .buildinfo files not matching it. Although this would break
    source-only uploads performed as full builds, which was added explicitly
    to support that use case when source-only uploads support got added in
    Debian. For example I routinely prepare all my uploads with:

    $ dpkg-buildpackage --changes-option=-S

    Because I want the artifacts I built to be recorded as part of the
    .buildinfo. But if what you really want is a pure source-only upload,
    then I think that's what you should be asking your build driver, the
    equivalent of:

    $ dpkg-buildpackage --build=full
    $ dpkg-buildpackage --build=source

    So adding such filtering would break that use case above. I'd need to
    think how to support such filtering, but right now I'm not seeing it.

    (Unless I've completely misunderstood the bug report, as I don't think
    I've ever really used sbuild, where my main interactions with it are
    via code reading and to support it as part of dpkg. :)

    Thanks,
    Guillem

  • From Holger Levsen@1:229/2 to Guillem Jover on Tue May 13 13:20:01 2025
    XPost: linux.debian.bugs.dist

    hi Guillem,

    On Tue, May 13, 2025 at 12:02:54PM +0200, Guillem Jover wrote:
    > > can't we change this requirement? .buildinfo files for _source.changes
    > > don't make sense, so we shouldn't create nor distribute them.
    >
    > (I think we have discussed this in the past. :)

    indeed! :)

    > If someone uses dpkg-buildpackage, then build dependencies need to be
    > satisfied (even for source-only builds), where code gets executed from
    > the package itself (clean targets etc), so this is also a build that
    > generates an upload represented in a .changes file.

    point taken. (not sure whether I previously saw it this way, thus I'd
    rather say so now.)

    > Those can also
    > affect source package generation, so I still think it does make sense
    > that they generate a .buildinfo file. I also think reproducible source
    > packages are an important thing that we already have (at least tooling
    > wise), which I'd rather not regress support on.

    actually we don't have reproducible source packages and last time we looked
    (which arguably is 10 years ago) it didn't seem feasible *and* we didn't
    see a compelling reason to have them either.

    why do you think they are important?


    --
    cheers,
    Holger

    ⢀⣴⠾⠻⢶⣦⠀
    ⣾⠁⢠⠒⠀⣿⡁ holger@(debian|reproducible-builds|layer-acht).org
    ⢿⡄⠘⠷⠚⠋⠀ OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
    ⠈⠳⣄

    its crazy that civilization is ending in like 20-30 years and were just here working jobs. (@RobDenBleyker)

  • From Guillem Jover@1:229/2 to Holger Levsen on Tue May 13 14:30:01 2025
    XPost: linux.debian.bugs.dist

    Hi!

    On Tue, 2025-05-13 at 11:10:25 +0000, Holger Levsen wrote:
    > On Tue, May 13, 2025 at 12:02:54PM +0200, Guillem Jover wrote:
    > > Those can also
    > > affect source package generation, so I still think it does make sense
    > > that they generate a .buildinfo file. I also think reproducible source
    > > packages are an important thing that we already have (at least tooling
    > > wise), which I'd rather not regress support on.
    >
    > actually we don't have reproducible source packages and last time we looked
    > (which arguably is 10 years ago) it didn't seem feasible *and* we didn't
    > see a compelling reason to have them either.

    We have had reproducible source packages (barring OpenPGP signatures in
    the .dsc files) since pretty much the same time dpkg-deb gained support
    for reproducible binary packages. See these commits I found (I don't
    recall whether there's been need for anything else more recently):

    https://git.dpkg.org/cgit/dpkg/dpkg.git/commit/?id=d959233560317459336d39197f515c2042472762
    https://git.dpkg.org/cgit/dpkg/dpkg.git/commit/?id=66a12fb8b22f13bb89dd59bf13db2fb939d3de87
    https://git.dpkg.org/cgit/dpkg/dpkg.git/commit/?id=6c32c76ba20b641e14fc1533cecb3ca674850a90

    > why do you think they are important?

    For QA alone this seems important (test suites for example), but in a
    security context, to me this seems like a rather important part TBH,
    the foundation on which binary package reproducibility is sitting. More
    so in scenarios such as the xz attack for example. Reviewing diffoscope
    differences is very helpful, but in the end we need to review and modify
    the sources, from which the binaries get derived. :)

    Thanks,
    Guillem

  • From Holger Levsen@1:229/2 to Guillem Jover on Tue May 13 15:10:01 2025
    XPost: linux.debian.bugs.dist

    On Tue, May 13, 2025 at 02:24:38PM +0200, Guillem Jover wrote:
    > We have had reproducible source packages (barring OpenPGP signatures in
    > the .dsc files) since pretty much the same time dpkg-deb gained support

    have you actually tried that?

    > > why do you think they are important?
    >
    > For QA alone this seems important (test suites for example), but in a
    > security context, to me this seems like a rather important part TBH,
    > the foundation on which binary package reproducibility is sitting. More
    > so in scenarios such as the xz attack for example. Reviewing diffoscope
    > differences is very helpful, but in the end we need to review and modify
    > the sources, from which the binaries get derived. :)

    obviously I agree that being able to reproduce the content would be nice,
    however in our tests years ago, not even that was possible, let alone
    bit by bit (thus including timestamps).

    I guess someone would need to actually investigate some hundred packages
    today, to see how things are really today.


    --
    cheers,
    Holger

    ⢀⣴⠾⠻⢶⣦⠀
    ⣾⠁⢠⠒⠀⣿⡁ holger@(debian|reproducible-builds|layer-acht).org
    ⢿⡄⠘⠷⠚⠋⠀ OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
    ⠈⠳⣄

    Life may not be the party we hoped for, but while we're here we might as well dance!

  • From Guillem Jover@1:229/2 to Holger Levsen on Wed May 14 11:10:01 2025
    XPost: linux.debian.bugs.dist

    Hi!

    On Tue, 2025-05-13 at 12:58:30 +0000, Holger Levsen wrote:
    > On Tue, May 13, 2025 at 02:24:38PM +0200, Guillem Jover wrote:
    > > We have had reproducible source packages (barring OpenPGP signatures in
    > > the .dsc files) since pretty much the same time dpkg-deb gained support
    >
    > have you actually tried that?

    Sure, I'd like to assume at the time this got implemented :), and also
    as part of every dpkg release:

    https://git.dpkg.org/cgit/dpkg/dpkg.git/tree/build-aux/gen-release#n147

    Also ISTM that reproducibility of source packages is easier to prove
    (at least from the toolchain PoV), than for binary packages, because
    most of the generation is driven by the toolchain itself (as seen from
    the commits I referenced in dpkg). The only variable and/or potentially
    problematic part is the «debian/rules clean» and whether it has side
    effects that could affect that generation.

    A current test could be something like:

    ,---
    $ apt source dpkg
    $ sq verify --cleartext dpkg_1.22.18.dsc | head -n-1 > dpkg-orig.dsc
    $ cd dpkg-1.22.18
    $ dpkg-buildpackage -us -uc -S
    $ cd ..
    $ diff -u dpkg-orig.dsc dpkg_1.22.18.dsc && echo reproduced source
    reproduced source
    `---

    > > > why do you think they are important?
    > >
    > > For QA alone this seems important (test suites for example), but in a
    > > security context, to me this seems like a rather important part TBH,
    > > the foundation on which binary package reproducibility is sitting. More
    > > so in scenarios such as the xz attack for example. Reviewing diffoscope
    > > differences is very helpful, but in the end we need to review and modify
    > > the sources, from which the binaries get derived. :)
    >
    > obviously I agree that being able to reproduce the content would be nice,
    > however in our tests years ago, not even that was possible, let alone
    > bit by bit (thus including timestamps).

    If you recall the specifics, I'd be curious to hear them!

    > I guess someone would need to actually investigate some hundred packages
    > today, to see how things are really today.

    Perhaps my statements were sloppy though. When I said reproducible, I
    meant that the toolchain can produce them, assuming the source package
    itself does not get in the way via «debian/rules clean». I didn't mean
    we have 100% coverage on the Debian archive for example, where as you
    point out we (well someone :) would need to practically check whether
    that's the case. My assumption is that most would do, but I think it's
    realistic to expect that we might find a number of packages where
    «debian/rules clean» affects the source generation.

    I think whether we can reproduce the same source after a full build
    (so the equivalent of a twice in a row build) might perhaps be more
    challenging (and I'd expect less reproducibility there), but for a
    single download source + full build, we are only concerned about the
    «clean» target, as the source generation is performed as the first
    thing.

    OTOH, I think the current reproducible infra has probably all the
    data, and it might just be a matter of checking whether the unsigned
    *.dsc (from build-a and build-b) match? :)
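
The comparison suggested here could look roughly like the sketch below. It is written in POSIX shell under stated assumptions: the two .dsc files are already available locally, and removing the OpenPGP cleartext armor with awk is good enough for a comparison. It does not verify any signature (a tool such as sq or gpg would be needed for that), and the helper names `strip_clearsign` and `unsigned_dsc_match` are hypothetical.

```shell
# Print the payload of a cleartext-signed file; an unsigned file is printed
# unchanged. This only removes the armor, it does NOT verify the signature.
strip_clearsign() {
    awk '
        /^-----BEGIN PGP SIGNED MESSAGE-----$/ { signed = 1; in_header = 1; next }
        # Skip the armor headers ("Hash: ...") up to the first empty line.
        in_header { if ($0 == "") in_header = 0; next }
        signed && /^-----BEGIN PGP SIGNATURE-----$/ { exit }
        # Undo dash-escaping inside signed text, then print the line.
        { if (signed) sub(/^- /, ""); print }
    ' "$1"
}

# Return 0 when both .dsc files carry the same content once any cleartext
# signature is removed, 1 when they differ, 2 when a file cannot be read.
# Trailing newlines are ignored because of the command substitutions.
unsigned_dsc_match() (
    a=$(strip_clearsign "$1") || exit 2
    b=$(strip_clearsign "$2") || exit 2
    [ "$a" = "$b" ]
)
```

A possible invocation, with hypothetical paths, would be `unsigned_dsc_match build-a/foo_1.0.dsc build-b/foo_1.0.dsc && echo reproduced source`.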

    Thanks,
    Guillem

  • From Holger Levsen@1:229/2 to Guillem Jover on Fri May 16 14:10:02 2025
    XPost: linux.debian.bugs.dist

    hi,

    On Wed, May 14, 2025 at 10:56:41AM +0200, Guillem Jover wrote:
    > Sure, I'd like to assume at the time this got implemented :), and also
    > as part of every dpkg release:
    > https://git.dpkg.org/cgit/dpkg/dpkg.git/tree/build-aux/gen-release#n147

    oh nice!

    > > I guess someone would need to actually investigate some hundred packages
    > > today, to see how things are really today.
    >
    > Perhaps my statements were sloppy though. When I said reproducible, I
    > meant that the toolchain can produce them, assuming the source package
    > itself does not get in the way via «debian/rules clean». I didn't mean
    > we have 100% coverage on the Debian archive for example, where as you
    > point out we (well someone :) would need to practically check whether
    > that's the case. My assumption is that most would do, but I think it's
    > realistic to expect that we might find a number of packages where
    > «debian/rules clean» affects the source generation.

    I've just checked devscripts and developers-reference, and much to my
    surprise their source packages indeed built bit by bit identical:

    $ diffoscope p1/developers-reference_13.19_source.changes p2/developers-reference_13.19_source.changes
    --- p1/developers-reference_13.19_source.changes
    +++ p2/developers-reference_13.19_source.changes
    ├── Files
    │ @@ -1,4 +1,4 @@

    │ 6c2a48c479ecd9d4710b64549f8ef44a 1644 doc optional developers-reference_13.19.dsc
    │ 283e1516834500ab48daf62c74714af2 575920 doc optional developers-reference_13.19.tar.xz
    │ - 3afde36f59e56164068ad521f11bc60a 6057 doc optional developers-reference_13.19_source.buildinfo
    │ + e3d438ba597ef522c68b9a730a7b32d4 6057 doc optional developers-reference_13.19_source.buildinfo
    ├── developers-reference_13.19_source.buildinfo
    │ ├── Build-Date
    │ │ @@ -1 +1 @@
    │ │ -Fri, 16 May 2025 11:54:47 +0000
    │ │ +Fri, 16 May 2025 11:55:12 +0000
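
In the diffoscope output above, Build-Date is the only .buildinfo field that differs between the two runs. A rough way to check just that, sketched in POSIX shell under the assumption that both .buildinfo files are unsigned and that Build-Date is a single-line field; `buildinfo_equal_ignoring_date` is a hypothetical helper name:

```shell
# Return 0 when two unsigned .buildinfo files are identical apart from
# their Build-Date line, 1 when they differ elsewhere. Exit status 2 means
# a file could not be read or had no lines other than Build-Date.
buildinfo_equal_ignoring_date() (
    a=$(grep -v '^Build-Date:' "$1") || exit 2
    b=$(grep -v '^Build-Date:' "$2") || exit 2
    [ "$a" = "$b" ]
)
```

This is far cruder than diffoscope and only answers yes or no; it does not show where the files differ.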


    > I think whether we can reproduce the same source after a full build
    > (so the equivalent of a twice in a row build) might perhaps be more
    > challenging (and I'd expect less reproducibility there),

    yes, me too, but that's not how source packages are built for real. :)

    > but for a
    > single d
  • From Helmut Grohne@1:229/2 to Roberto C. Sanchez on Tue Jul 1 20:10:01 2025
    XPost: linux.debian.bugs.dist

    Hi Roberto and others,

    On Fri, May 09, 2025 at 09:39:25PM -0400, Roberto C. Sanchez wrote:
    > Steps to reproduce:
    >
    > 1. configure backports repo
    > 2. install git-buildpackage: apt-get install git-buildpackage
    > 3. install sbuild: apt-get install -t bookworm-backports sbuild
    > 4. create an appropriate sbuild environment:
    >    mkdir -p ~/.cache/sbuild
    >    mmdebstrap --verbose --mode=unshare --architecture="$(dpkg --print-architecture)" --variant=apt --hook-dir=/usr/share/mmdebstrap/hooks/maybe-merged-usr bookworm ~/.cache/sbuild/bookworm-$(dpkg --print-architecture).tar.zst /etc/apt/sources.list
    >    /bin/echo -e '$chroot_mode="unshare";\n$clean_source=0;\n1;' > ~/.sbuildrc
    > 4. clone an arbitrary package from Salsa: git clone https://salsa.debian.org/debian/krb5
    > 5. Execute the build:
    >    cd krb5
    >    git branch upstream origin/upstream
    >    git branch pristine-tar origin/pristine-tar
    >    git checkout -b bookworm origin/bookworm
    >    gbp buildpackage --git-builder='sbuild -d bookworm --no-run-lintian --source-only-changes' --git-debian-branch=bookworm
    > 6. Observe the presence of the buildinfo in the resulting source.changes:
    >    grep buildinfo ../*_source.changes

    It seems that the bug report consensus is that you see the inclusion of
    the .buildinfo file in the _source.changes as a bug. What you see as a
    bug, I see as a feature. I may perform a source-only upload and still
    convey that I reproducibly built the package.

    Can you or someone else elaborate on why you see the inclusion of the .buildinfo file as a problem?

    If there was some functional change removing the .buildinfo from a
    source upload, I'd be inclined to report the reverse bug in the absence
    of such reasoning.

    On the flip side, the inclusion of the buildinfo does pose downstream
    problems. We may now be faced with multiple .buildinfo files with the
    same filename and different content. As such, we may not just store them
    next to each other, but that's how mergechanges and reprepro operate,
    leading to problems in their use, and Debusine also encountered the issue
    with mergechanges on its own.
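
Such a collision can be detected before two sets of build artifacts are stored in the same place. The sketch below is POSIX shell and purely illustrative: `buildinfo_collisions` is a hypothetical helper, and nothing here reflects how mergechanges, reprepro or Debusine actually handle these files.

```shell
# Print the names of .buildinfo files that exist in both directories under
# the same file name but with different content.
buildinfo_collisions() (
    for f in "$1"/*.buildinfo; do
        [ -e "$f" ] || continue          # glob matched nothing
        other="$2/$(basename "$f")"
        if [ -e "$other" ] && ! cmp -s "$f" "$other"; then
            basename "$f"
        fi
    done
)
```

With the results of a source+binary build in one directory and the results of a later build of the same version in another, any name printed by `buildinfo_collisions dir-a dir-b` marks a file that cannot simply be stored next to its namesake.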

    Originally, reproducible builds would include a granular timestamp in
    the filename and you may see evidence for that in e.g.
    https://wiki.debian.org/ReproducibleBuilds/BuildinfoFiles#Example.
    That timestamp was replaced with package_version_architecture in the dpkg
    implementation. Depending on why others see the inclusion of a binary
    .buildinfo as a problem, maybe using an unreproducible name would solve
    some of those problems?

    I note that we may influence the default filename:

    dpkg-genbuildinfo -O../othername.buildinfo
    dpkg-buildpackage --buildinfo-file=../othername.buildinfo
    sbuild --debbuildopt=--buildinfo-file=../othername.buildinfo

    In principle, sbuild could decide to choose a default that differs from
    dpkg's. Not sure we want that.

    I asked Guillem for references to the choice in filenames and he kindly
    provided
    * https://lists.debian.org/debian-dpkg/2016/11/msg00056.html
    Guillem argues that it would be difficult discovering the right
    buildinfo file and that there could be several of them from earlier
    builds.
    * https://git.dpkg.org/cgit/dpkg/dpkg.git/commit/?id=d5005e4576bcf9b341e83cfb8647d5f96438642f
    The commit argues that there is no point in using a unique filename.

    We now do have reasons to choose unique buildinfo filenames (i.e. there
    exist tools that assume them to be unique).

    For now, I'd like to understand why the inclusion of the buildinfo is
    seen as a problem and how others see the idea of changing the buildinfo filename as a means to avoid collisions.

    Helmut
