All, (going out to both debian-devel and bug-gnulib, please be
respectful of each community's different perspectives and trim Cc
when focus shifts to any Debian or gnulib specific topics)
(please pardon the accidental duplicate post to bug-gnulib...)
The content of upstream source code releases can largely be categorized
into 1) the actual native source-code from the upstream supplier, 2) pre-generated artifacts from build tools (e.g., ./configure script) and
3) third-party maintained source code (e.g., config.guess or getopt.c).
Including the files in 3) may be referred to as "vendoring". The habit of
including vendored and pre-generated artifacts is a powerful and
successful way to make release tarballs usable for users, going back to
the 1980s. This habit poses some challenges for packaging though:
1) Pre-generated files (e.g., ./configure) should be re-generated to
make sure the package is built from source code, and to allow patches
on the toolchain used to generate the pre-generated files to have any
effect. Otherwise we risk using pre-generated files created using
non-free or non-public tools, which, if I understand correctly, is against
Debian main policy. Verifying that this happens for all
pre-generated files in an upstream tarball is complicated, fragile
and tedious work. I think it is simple to find examples of mistakes
in this area even for important top-popcon Debian packages. The
current approach of running autoreconf -fi is based on a
misunderstanding: autoreconf -fi is documented to not replace certain
files with newer versions:
https://lists.nongnu.org/archive/html/bug-gnulib/2024-04/msg00052.html
2) If a security problem in vendored code is discovered, the security
team may have to patch 50+ packages if the vendor origin is popular.
Maybe even different versions of the same vendored code have to be
patched.
3) Auditing the difference between the tarball and what is stored in
upstream version control system (VCS) is challenging. The xz
incident exploited the fact that some pre-generated files aren't
included in upstream VCS. Some upstreams react to this by adding all
pre-generated artifacts to VCS -- OpenSSH seems to take the route of
adding the generated ./configure script to git, which moves that file
from 2) to 1) but the problem remains.
4) Auditing for license compliance is challenging, since not only do we
have to audit all upstream's code but we also have to audit the
license of pre-generated files and vendored source-code.
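To make challenges 1) and 3) a bit more concrete, here is a minimal sketch
(not an established tool) that lists which files a release tarball ships
beyond what a git-archive tarball of the same tag contains. The function
names list_members and extra_in_release are made up for this illustration,
and it assumes GNU tar and that each tarball has a single top-level
directory with no leading "./".

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any Debian tool.

# Print the member names of a tarball with the top-level directory
# stripped and directory entries dropped, sorted for comm(1).
list_members() {
    tar -tf "$1" | sed -e 's,^[^/]*/,,' -e '/^$/d' -e '/\/$/d' | LC_ALL=C sort -u
}

# Print the names present in the release tarball ($1) but absent from
# the git-archive tarball ($2). These are the candidates for
# pre-generated or vendored files.
extra_in_release() {
    rel=$(mktemp) && vcs=$(mktemp) || return 1
    list_members "$1" > "$rel"
    list_members "$2" > "$vcs"
    LC_ALL=C comm -23 "$rel" "$vcs"
    rm -f "$rel" "$vcs"
}
```

Something like "extra_in_release foo-1.0.tar.gz foo-1.0-src.tar.gz" (file
names are placeholders) would then print one path per line. The list still
has to be reviewed by hand, since it only compares names, not content.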
There are probably more problems involved, and probably better ways to articulate the problems than what I managed to do above. The Go and
Rust ecosystems solve some of these issues, which has other consequences
for packaging. We have largely ignored that the same challenges apply
to many C packages, and I'm focusing on those that use gnulib --
https://www.gnu.org/software/gnulib/ -- gzip, tar, grep, m4, sed, bison,
awk, coreutils, grub, libiconv, libtasn1, libidn2, inetutils, etc:
https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=users.txt
Solving all of the problems for all packages will require some work and
will take time. I've started to see if we can make progress on the gnulib-related packages. I'm speaking as a contributor to gnulib and
maintainer of a couple of Debian packages, but still learning to
navigate -- the purpose of this post is to describe what I've done for
libntlm and ask for feedback to hopefully make this into a re-usable
pattern that can be applied to more packages. It would be great to
improve collaboration on these topics between GNU and Debian.
So let's turn this post into a recipe for Debian maintainers of packages
that use gnulib to follow for their packages. I'm assuming git from now
on, but feel free to mentally s/git/$VCS/.
The first step is to establish an upstream tarball that you want to work
with. There are too many opinions floating around on this to make any
single solution a pre-requisite so here are the different approaches I
can identify, ordered by my own preference, and the considerations with
each.
1) Use upstream's PGP signed git-archive tarball.
See my recent blog posts for this new approach. The key property
here is that there is no need to audit the difference between the
upstream tarball and upstream git.
https://blog.josefsson.org/2024/04/01/towards-reproducible-minimal-source-code-tarballs-please-welcome-src-tar-gz/
https://blog.josefsson.org/2024/04/13/reproducible-and-minimal-source-only-tarballs/
2) Use upstream's PGP signed tarball.
This is currently the most common and recommended approach, as far as I
know.
3) Create a PGP signed git-archive tarball.
If upstream doesn't publish PGP signed tarballs, or if there is a
preference from upstream or from you as Debian package maintainer to
not do 1) or 2), then create a minimal source-only copy of the git
archive and sign it yourself. Could be done something like this:
git clone https://git.savannah.gnu.org/git/inetutils.git
cd inetutils/
git archive --prefix=inetutils-v2.5/ -o inetutils-2.5-src.tar.gz v2.5
# additional filtering of tarball may go here
gpg -b inetutils-2.5-src.tar.gz
This is your new upstream tarball. To build this particular one, use
./bootstrap --no-git --gnulib-srcdir=/usr/share/gnulib.
4) Use upstream's git-archive tarball and PGP sign it.
Download it using the GitHub or GitLab download link on the git tag
like the cool kids. If you did this on a sunny day, the downloaded
tarball should be identical to the git-archive tarball and you can
sign it if you are comfortable with this.
5) Use upstream's git-archive tarball.
For those who want to join the really cool kids club.
6) Use upstream's tarball without PGP signature.
This is quite common today. It happens when upstream doesn't publish
PGP signatures or the Debian maintainer doesn't care about them.
Regardless of mechanism, you should end up with a tarball that we call
the "upstream tarball". Which approach is chosen is subjective and up
to the Debian package maintainer. People have different opinions.
While I can't hide my own preferences I think we have to acknowledge
that there is no single uniform answer here.
To reach the goals stated at the beginning of this post, this upstream tarball
has to be filtered to remove all pre-generated artifacts and vendored
code. Use some mechanism, such as the debian/copyright Files-Excluded
field, to remove them. If you used a git-archive upstream tarball,
chances are higher that you won't have to do a lot of work, especially
for pre-generated scripts.
This filtered tarball will be the *.orig.tar.gz used to build the Debian package.
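For illustration, the filtering step can also be done by hand. The sketch
below is not the Files-Excluded mechanism itself (that is normally
processed by uscan/mk-origtargz), just a manual equivalent. The function
name filter_tarball is made up for this example, and it assumes GNU tar,
a gzip-compressed input, and a single top-level directory in the tarball.

```shell
#!/bin/sh
# Sketch only: filter_tarball is a hypothetical helper name.
# Usage: filter_tarball INPUT.tar.gz OUTPUT.tar.gz PATTERN...
# Repack INPUT without the members matching the tar exclude PATTERNs.
filter_tarball() {
    src=$1; dst=$2; shift 2
    tmp=$(mktemp -d) || return 1

    # Turn each remaining argument into a --exclude=PATTERN option by
    # appending the option and dropping the original argument.
    n=$#
    while [ "$n" -gt 0 ]; do
        set -- "$@" "--exclude=$1"
        shift
        n=$((n - 1))
    done

    # Unpack everything except the excluded members, then repack the
    # single top-level directory under its original name.
    tar "$@" -xzf "$src" -C "$tmp" &&
        top=$(ls -A "$tmp") &&
        tar -czf "$dst" -C "$tmp" "$top"
    rc=$?
    rm -rf "$tmp"
    return $rc
}
```

A call could look like "filter_tarball foo-1.0.tar.gz foo_1.0.orig.tar.gz
'*/configure' '*/lib/getopt.c'", where the file names and patterns are
placeholders. Note that a repack like this is not bit-for-bit reproducible
without further tar and gzip options.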
Ideally you would like for the *.orig.tar.gz tarball to be as close as
possible to upstream's git repository for the tag release, minus any pre-generated scripts or vendored gnulib files that upstream put into
git. For collaborative upstreams, you could try to convince them to not
put pre-generated scripts and vendored gnulib files into git.
Auditing the upstream tarball against the *.orig.tar.gz should be simple: use sha256sum or diffoscope to compare content. In some ideal world these
could be bit-by-bit identical. I'm hoping this can be the new best
recommended approach going forward. This is only possible when upstream
agrees with these concerns, and makes an effort to publish such minimized source-only tarballs. This may be a pipe dream, just like Debian's
current best recommended approach of using upstream PGP signed tarballs is sometimes ignored.
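When the two tarballs are not bit-by-bit identical, a per-file checksum
comparison shows where they differ. The following is a rough sketch using
sha256sum, with made-up function names (manifest, compare_tarballs). It
assumes GNU tar and coreutils, and that both tarballs have a single
top-level directory. diffoscope gives a much richer report; this only
shows the idea.

```shell
#!/bin/sh
# Sketch only: manifest and compare_tarballs are hypothetical helper names.

# Print a "checksum  ./path" line for every regular file in a tarball,
# with the top-level directory stripped, sorted by path.
manifest() {
    tmp=$(mktemp -d) || return 1
    tar -xf "$1" -C "$tmp" --strip-components=1 &&
        (cd "$tmp" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k2)
    rm -rf "$tmp"
}

# Compare two tarballs by content. Prints a diff of the two manifests;
# returns 0 only when every file name and checksum matches.
compare_tarballs() {
    a=$(mktemp) && b=$(mktemp) || return 1
    manifest "$1" > "$a"
    manifest "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}
```

Running "compare_tarballs upstream.tar.gz foo_1.0.orig.tar.gz" (placeholder
names) prints nothing when the contents match, and otherwise lists the
files that were removed, added or changed by the filtering.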
You will now be faced with the challenge of building this tarball. Your existing debian/rules makefile will not work any more since it assumed
the existence of the pre-generated scripts and vendored gnulib files.
So you have to add the required tools as Build-Depends: and update the debian/rules to build everything from source code.
For libntlm the essential diff between version 1.7-1, which used the upstream tarball with pre-generated content and gnulib code, and the latest version
1.8-3, which builds from a minimal source-only tarball, is small:
--- a/debian/control
+++ b/debian/control
@@ -6,6 +6,8 @@ Uploaders:
   Simon Josefsson <[email protected]>,
 Build-Depends:
   debhelper-compat (= 13),
+ git,
+ gnulib (>= 20240412~dfb7117+stable202401.20240408~aa0aa87-3~),
 Standards-Version: 4.6.2
 Section: libs
 Homepage: https://www.nongnu.org/libntlm/
--- a/debian/rules
+++ b/debian/rules
@@ -1,6 +1,16 @@
 #! /usr/bin/make -f

+include /usr/share/gnulib/debian/gnulib-dpkg.mk
+
 export DEB_BUILD_MAINT_OPTIONS = hardening=+all

 %:
-	dh $@ --builddirectory=build -X.la
+	dh $@ --without autoreconf --builddirectory=build
+
+pull:
+	./bootstrap --gnulib-srcdir=$(GNULIB_DEB_DEBIAN_GNULIB) --pull
+
+gen:
+	./bootstrap --gnulib-srcdir=$(GNULIB_DEB_DEBIAN_GNULIB) --gen
+
+execute_before_dh_auto_configure: dh_gnulib_clone pull dh_gnulib_patch gen
As you can see the essential part is to add a Build-Depends on the
gnulib Debian package to get the necessary gnulib code for building. We
also disable dh_aut