2. Set the entire environment to the environment specified in buildinfo
when doing a reproducible build. I think this is conceptually the
simplest, but it means that we should make every tool that builds
official Debian packages use the same environment variable logic so
that the buildinfo file completely captures the environment (without
leaking random, inappropriate things into buildinfo). It also means
effectively giving up on debian/rules build being a path for making a
reproducible build, since we don't have control over that environment,
but I think it will be hard to make that work anyway.
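As a rough illustration of what option 2 would mean for a rebuilder, here is a minimal Python sketch. It assumes the environment is recorded in a buildinfo "Environment:" field as indented NAME="value" lines; the sample text, the function names and the simplified quoting (no unescaping) are assumptions for illustration, not a description of any existing tool.

```python
import re
import subprocess

# Hypothetical excerpt of a .buildinfo file, for illustration only.
SAMPLE_BUILDINFO = '''Format: 1.0
Environment:
 DEB_BUILD_OPTIONS="parallel=4"
 LANG="C.UTF-8"
 SOURCE_DATE_EPOCH="1505690785"
Checksums-Sha256:
'''


def parse_recorded_environment(text):
    """Return the variables listed under an Environment: field as a dict.

    Simplification: escaped characters inside values are not unescaped.
    """
    env = {}
    in_field = False
    for line in text.splitlines():
        if line.startswith("Environment:"):
            in_field = True
            continue
        if in_field:
            if not line.startswith(" "):
                break  # an unindented line starts the next field
            match = re.fullmatch(r'\s+([A-Za-z_][A-Za-z0-9_]*)="(.*)"', line)
            if match:
                env[match.group(1)] = match.group(2)
    return env


def run_with_recorded_environment(cmd, buildinfo_text):
    """Run cmd with *only* the recorded variables set (option 2).

    Passing env= replaces the whole environment rather than extending
    it, so nothing from the caller's shell leaks into the build.
    """
    env = parse_recorded_environment(buildinfo_text)
    return subprocess.run(cmd, env=env, check=True)


recorded = parse_recorded_environment(SAMPLE_BUILDINFO)
```

The point of the sketch is the env= argument: the build sees exactly what was recorded and nothing else, which is why every official build tool would need to record the environment the same way.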
I personally lean towards 2, which is consistent with what's in Policy
right now, but I can see definite merits in 3. I believe the
reproducible builds project is currently sort of doing 1, but I have a
hard time seeing how to make that viable on the testing side.
On Sun 2017-09-17 16:26:25 -0700, Russ Allbery wrote:
I personally lean towards 2, which is consistent with what's in Policy
right now, but I can see definite merits in 3. I believe the
reproducible builds project is currently sort of doing 1, but I have a
hard time seeing how to make that viable on the testing side.
Thanks for raising this question, Russ!
I'm not sure that we should let lack of exhaustive testing push us away
from (1). (1) is in principle the right thing -- it's easy to make a
build reproducible if we tell people that they have to do exactly one
specific thing. But we generally want people to be able to run
heterogeneous systems, and not to force them into one particular
environment.
Consider someone who wants to see more logging from a build, for
example. There could be an environment variable that encourages the
toolchain to log more, but doesn't affect the binary objects created by
the build. By going with choices (2) or (3) we effectively dismiss even
considering the reproducibility of those builds, which seems like a
shame.
Does everything in policy need to be rigorously testable? or is it ok
to have Policy state the desired outcome even if we don't know how (or
don't have the resources) to test it fully today.
On 2017-09-18, Russ Allbery wrote:
Daniel Kahn Gillmor <[email protected]> writes:
On Sun 2017-09-17 16:26:25 -0700, Russ Allbery wrote:
Does everything in policy need to be rigorously testable? or is it ok
to have Policy state the desired outcome even if we don't know how (or
don't have the resources) to test it fully today.
I don't think everything has to be rigorously testable, but I do think
it's a useful canary. If I can't test something, I start wondering
whether that means I have problems with my underlying assumptions.
In particular, for (1), we have no comprehensive list of environment
variables that affect the behavior of tools, and that list would be
difficult to create. Many pieces of software add their own environment
variables with little coordination, and many of those variables could
possibly affect tool output.
There is a huge difference between variables that *might* affect the
build as an unintended input that gets stored in the resulting packages
in some manner, and variables that are designed to change the behavior
of parts of the build toolchain.
I consider unintended variables that affect the build output a bug, and
variables designed and intended to change the behavior of the toolchain
expected, reasonable behavior.
Daniel Kahn Gillmor <[email protected]> writes:
On Sun 2017-09-17 16:26:25 -0700, Russ Allbery wrote:
I personally lean towards 2, which is consistent with what's in Policy
right now, but I can see definite merits in 3. I believe the
reproducible builds project is currently sort of doing 1, but I have a
hard time seeing how to make that viable on the testing side.
Thanks for raising this question, Russ!
I'm not sure that we should let lack of exhaustive testing push us away
from (1). (1) is in principle the right thing -- it's easy to make a
build reproducible if we tell people that they have to do exactly one
specific thing. But we generally want people to be able to run
heterogeneous systems, and not to force them into one particular
environment.
Well... I would argue that the amount of time and effort that's gone
into this project shows that it's not that easy to make a build
reproducible even when telling people to do exactly one thing. :) But I
get your point.
Does everything in policy need to be rigorously testable? or is it ok
to have Policy state the desired outcome even if we don't know how (or
don't have the resources) to test it fully today.
I don't think everything has to be rigorously testable, but I do think
it's a useful canary. If I can't test something, I start wondering
whether that means I have problems with my underlying assumptions.
In particular, for (1), we have no comprehensive list of environment
variables that affect the behavior of tools, and that list would be
difficult to create. Many pieces of software add their own environment
variables with little coordination, and many of those variables could
possibly affect tool output.
I feel like the work for (1) and for (3) ends up being comparable; for (1)
we have to maintain a blacklist, and for (3) we have to maintain a
whitelist. But (3) is testable, whereas (1) is inherently aspirational
and will always have to be aspirational. We're endlessly going to be
discovering some other environment variable that changes tool output.
I'm also unsure that (1) is even what we want to claim. Do we really want
to say that builds are always reproducible if you don't change this
short list of environment variables, no matter what other environment
variables you set?
There's some appeal in this for the end user, but it feels very
frustrating for the package maintainer. At first glance, as a package
maintainer, I'd think I'd have to maintain a huge blacklist of
environment variables that I've discovered affect my toolchain
somewhere, and explicitly unset them all in debian/rules. This doesn't
feel like a good use of anyone's time (and may actually *break* other,
non-reproducibility-related things that people want to do with my
package).
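The blacklist/whitelist contrast above can be sketched as two predicates answering "is this variable claimed not to affect the build output?". This is only a reading of how the thread characterises options (1) and (3); the list contents and the made-up variable name are assumptions for illustration.

```python
def may_vary_under_option_1(name, blacklist):
    """Option (1): anything not on the blacklist is claimed to be harmless."""
    return name not in blacklist


def may_vary_under_option_3(name, whitelist):
    """Option (3): only whitelisted variables are claimed to be harmless."""
    return name in whitelist


# Hypothetical lists, for illustration only.
BLACKLIST = {"SOURCE_DATE_EPOCH", "DEB_BUILD_OPTIONS", "PATH"}
WHITELIST = {"TERM", "HOME", "LOGNAME", "USER"}

# A variable nobody has classified yet (made-up name).
NEW_VAR = "SOME_TOOL_VERBOSE"

# Under (1) the unknown variable is implicitly covered by the promise,
# so the promise can be broken by a variable no one has ever tested.
claimed_harmless_1 = may_vary_under_option_1(NEW_VAR, BLACKLIST)

# Under (3) the unknown variable is simply out of scope until someone
# adds it, so the set of things to test stays finite and enumerable.
claimed_harmless_3 = may_vary_under_option_3(NEW_VAR, WHITELIST)
```

With these definitions the testable surface under (3) is exactly the whitelist, while under (1) it is every name not on the blacklist, which is unbounded.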
On Mon, 18 Sep 2017 at 18:00:51 -0700, Vagrant Cascadian wrote:
[..]
I consider unintended variables that affect the build output a bug, and
variables designed and intended to change the behavior of the toolchain
expected, reasonable behavior.
There is a *huge* number of variables that are intended to change
behaviour, and may or may not affect the behaviour of this specific
package. Which of your categories are these in?
For example, basically any well-behaved programming language or
programming-language-like environment has an equivalent of PYTHONPATH,
PERL5LIB, PKG_CONFIG_PATH and similar variables, [..]
Similarly, there is an intractably huge number of environment variables
that can affect the result of Automake and make. Do you know about all
of them? Including RM, PC, AR, LOADLIBES (and those are just for make's
implicit rules)? [..]
I think the assumption has to be that every environment variable is
potentially intended to affect the build unless otherwise stated [..]
[..] It would be most useful if we were to identify a
restricted subset of environment variables for which there is consensus
that the variable is meant to be merely user preference and shouldn't
affect the build [..]
Perhaps those variables should be a whitelist, or perhaps there is
some wording for Policy that would identify them while excluding the
legitimately build-affecting ones - but either way I think the
assumption should be "there is a limited subset of environment
variables that are required to preserve reproducibility when varied,
and the rest are uninteresting".
I would suggest amending:
- a set of environment variable values; and
+ a set of reserved environment variable values; and
then later:
+ A "reserved" environment variable is defined as DEB_*, DPKG_*,
+ SOURCE_DATE_EPOCH, BUILD_PATH_PREFIX_MAP, variables listed by
+ dpkg-buildflags and other variables explicitly used by buildsystems to
+ affect build output, excluding any variables used by non-build
+ programs to affect their behaviour. Explicitly, this excludes TERM,
+ HOME, LOGNAME, USER [..]
[..] some other variables are used by non-build tools, such as LC_*,
USER, etc. Since they affect non-build programs, they may be set in a
developer's normal environment, so just running "debian/rules build"
will pick these up. The build should then stay the same despite these
other variables.
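The proposed definition of a "reserved" variable can be read as a simple predicate. The sketch below encodes only the names that appear in the proposal; the "variables listed by dpkg-buildflags" and other build-system variables are not enumerated in this thread, so they are passed in by the caller through a hypothetical `extra` parameter.

```python
from fnmatch import fnmatchcase

# Patterns and names taken from the proposed wording.
RESERVED_PATTERNS = (
    "DEB_*",
    "DPKG_*",
    "SOURCE_DATE_EPOCH",
    "BUILD_PATH_PREFIX_MAP",
)

# Variables the proposal explicitly excludes.
EXPLICITLY_EXCLUDED = {"TERM", "HOME", "LOGNAME", "USER"}


def is_reserved(name, extra=()):
    """Return True if name counts as "reserved" under the proposed wording.

    extra is an assumption of this sketch: a caller-supplied list of
    further patterns (for example the dpkg-buildflags variables).
    """
    if name in EXPLICITLY_EXCLUDED:
        return False
    patterns = RESERVED_PATTERNS + tuple(extra)
    return any(fnmatchcase(name, pattern) for pattern in patterns)
```

For instance, `is_reserved("DEB_BUILD_OPTIONS")` is true, `is_reserved("HOME")` is false, and `is_reserved("CFLAGS")` is only true if the caller lists CFLAGS in `extra`.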
[..]
(The last time I erroneously included PATH in the final "excluded" list
- because we have varied PATH but in a really trivial way on
tests.r-b.org for ages - but I now agree with you that we shouldn't
expect reproducibility when PATH is varied.)
There is a huge difference between variables that *might* affect the
build as an unintended input that gets stored in the resulting packages
in some manner, and variables that are designed to change the behavior
of parts of the build toolchain.
I consider unintended variables that affect the build output a bug, and
variables designed and intended to change the behavior of the toolchain
expected, reasonable behavior.
In practice, for the vast majority of packages in Debian, it takes a
relatively small number of environment variables to get fairly solid
reproducibility coverage... at least from what we've seen so far.
[..] It does mean that discovery of any new
such environment variable would require a change to our whitelist in
approach (3), so there would be some lag and the whitelist would become
long over time (with a corresponding testing load). But (3) does try to
achieve that use case without trying to anticipate any possible
environment variable setting. It lets us be reactive to
newly-discovered environment variables across which we want to stay
reproducible.
Does everything in policy need to be rigorously testable? or is it ok
to have Policy state the desired outcome even if we don't know how (or
don't have the resources) to test it fully today.
I don't think everything has to be rigorously testable, but I do think
it's a useful canary. If I can't test something, I start wondering
whether that means I have problems with my underlying assumptions.
[..]
[..]ends up on the blacklist then they would also have to fix their own package to be invariant under that envvar.
OTOH, developer reproducibility checkers (such as reprotest) can be a little bit more strict. I can imagine something like:
- reprotest runs 3 builds:
- build 0 with current env
- build 1 with current env + varying some "blacklist" envvars
- build 2 with current env + varying some "non-whitelist" envvars
If there are differences between build 1 and build 2, then reprotest
reports "unexpected envvar $XXX affected the build" and the developer
can then either submit it for inclusion on the "whitelist" or the
"blacklist" based on the Policy wording. If it
So over time, this way we can build up a blacklist and a whitelist. But
it shouldn't be in the original policy. And I don't think what I
suggested above is a particularly disruptive or surprising process,
especially since the "public" builders would only do the "looser"
interpretation so people aren't bothered by bogus "unreproducible"
reports.
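One way a checker could pin down which "$XXX" to report is to vary candidate variables one at a time against a reference build. The Python sketch below is an assumption about how such a check might be structured, not a description of what reprotest does; `build` is a hypothetical callable that takes an environment dict and returns something comparable (for example a hash of the artifacts), and `fake_build` with its LEAKY_VAR is a toy stand-in.

```python
import hashlib


def find_affecting_vars(build, base_env, candidates):
    """Return the candidate variables whose variation changes the build.

    build      -- hypothetical callable: env dict -> comparable result
    base_env   -- environment used for the reference build
    candidates -- mapping of variable name -> alternative value to try
    """
    reference = build(dict(base_env))
    affecting = []
    for name, value in candidates.items():
        varied = dict(base_env)
        varied[name] = value  # change exactly one variable per build
        if build(varied) != reference:
            affecting.append(name)
    return affecting


def fake_build(env):
    """Toy build: output depends on two variables and ignores the rest."""
    material = env.get("SOURCE_DATE_EPOCH", "") + "|" + env.get("LEAKY_VAR", "")
    return hashlib.sha256(material.encode()).hexdigest()


unexpected = find_affecting_vars(
    fake_build,
    {"SOURCE_DATE_EPOCH": "0"},
    {"TERM": "dumb", "LEAKY_VAR": "x"},
)
```

Varying one variable per build costs one build per candidate, so a real checker would more likely vary them in groups first and bisect only when a difference shows up.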