In this work, limitations with --chroot-mode=unshare became apparent and that led to Johannes, Jochen and me sitting down in Berlin pondering ideas on how to improve the situation. That is a longer story, but eventually Timo Röhling
asked the innocuous question of why we cannot just use schroot and make it work with namespaces.
There are two approaches to
managing an ephemeral build container using namespaces. In one approach,
we create a directory hierarchy of a container root filesystem and for
each command and hook that we invoke there, we create new namespaces on demand. In particular, there are no background processes when nothing is running in that container and all that remains is its directory
hierarchy. Such a container session can easily survive a reboot (unless stored on tmpfs). Both sbuild --chroot-mode=unshare and unschroot.py
follow this approach. For comparison, schroot sets up mounts (e.g. /proc)
when it begins a session and cleans them up when it ends. No such
persistent mounts exist in either sbuild --chroot-mode=unshare or unschroot.py.
While podman
and docker allow running unprivileged application containers, they still require privileged containers when you want to run systemd-as-pid-1.
Could we use a container framework that is also used outside the Debian bubble, rather than writing our own from first principles every time, and ending up with a single-maintainer project being load-bearing for Debian *again*? I had hoped that after sbuild's history with schroot becoming unmaintained, and then being revived by a maintainer-of-last-resort who
is one of the same few people who are critical-path for various other important things, we would recognise that as an anti-pattern that we
should avoid if we can.
At the moment, rootless Podman would seem like the obvious choice. As far
as I'm aware, it has the same user namespaces requirements as the unshare backends in mmdebstrap, autopkgtest and schroot (user namespaces enabled, setuid newuidmap, 65536 uids in /etc/subuid, 65536 gids in /etc/subgid).
Podman uses the same OCI images as Docker, so it can either pull from a trusted OCI registry, or use images that were built by importing a tarball generated by e.g. mmdebstrap or sbuild-createchroot. I assume that for
Debian we would want to do the latter, at least initially, to avoid
being forced to either trust an external registry like hub.docker.com
or operate our own.
On Tue, 25 Jun 2024 at 10:16:20 +0200, Helmut Grohne wrote:
In this work, limitations with --chroot-mode=unshare became apparent and that led to Johannes, Jochen and me sitting down in Berlin pondering
ideas on how to improve the situation. That is a longer story, but eventually Timo Röhling asked the innocuous question of why we cannot
just use schroot and make it work with namespaces.
I have to ask:
Could we use a container framework that is also used outside the Debian bubble, rather than writing our own from first principles every time, and ending up with a single-maintainer project being load-bearing for Debian *again*? I had hoped that after sbuild's history with schroot becoming unmaintained, and then being revived by a maintainer-of-last-resort who
is one of the same few people who are critical-path for various other important things, we would recognise that as an anti-pattern that we
should avoid if we can.
At the moment, rootless Podman would seem like the obvious choice. As far
as I'm aware, it has the same user namespaces requirements as the unshare backends in mmdebstrap, autopkgtest and schroot (user namespaces enabled, setuid newuidmap, 65536 uids in /etc/subuid, 65536 gids in /etc/subgid).
Podman uses the same OCI images as Docker, so it can either pull from a trusted OCI registry, or use images that were built by importing a tarball generated by e.g. mmdebstrap or sbuild-createchroot. I assume that for
Debian we would want to do the latter, at least initially, to avoid
being forced to either trust an external registry like hub.docker.com
or operate our own.
Persisting a container root filesystem between multiple operations comes
with some serious correctness issues if there are "hooks" that can modify
it destructively on each operation: see <https://bugs.debian.org/499014>
and <https://bugs.debian.org/994836>. As a result of that, I think the
only model that should be used in new systems is to have some concept of
a session (like schroot type=file, but unlike schroot type=directory)
so that those "hooks" only run once, on session creation, preventing
them from arbitrarily reverting/overwriting changes that are subsequently made by packages installed into the chroot/container (for example dbus' creation of the messagebus uid/gid in #499014, and exim4's creation of Debian-exim in #994836).
Simon McVittie writes:
Persisting a container root filesystem between multiple operations comes with some serious correctness issues if there are "hooks" that can modify it destructively on each operation: see <https://bugs.debian.org/499014> and <https://bugs.debian.org/994836>. As a result of that, I think the
only model that should be used in new systems is to have some concept of
a session (like schroot type=file, but unlike schroot type=directory)
so that those "hooks" only run once, on session creation, preventing
them from arbitrarily reverting/overwriting changes that are subsequently made by packages installed into the chroot/container (for example dbus' creation of the messagebus uid/gid in #499014, and exim4's creation of Debian-exim in #994836).
I'm not entirely sure that I'm following the nuances of this discussion,
so this may be irrelevant, but I think type=btrfs-snapshot provides the
ideal properties for container file systems. This unfortunately requires
file system support and therefore cannot be used unless you've already embraced a file system with subvolumes, but if you have, you get all of
the speed of a persistent container root file system with none of the correctness issues, because you get a fresh (and almost instant) clone of
a canonical root file system that is discarded after each build.
I use that in combination with a cron job to update the source subvolume daily to ensure that it's fully patched.
Unfortunately, there's no way that we can rely on this, but it would be
nice to continue to support it for those who are using a supported
underlying file system already.
Could we use a container framework that is also used outside the Debian bubble, rather than writing our own from first principles every time, and ending up with a single-maintainer project being load-bearing for Debian *again*? I had hoped that after sbuild's history with schroot becoming unmaintained, and then being revived by a maintainer-of-last-resort who
is one of the same few people who are critical-path for various other important things, we would recognise that as an anti-pattern that we
should avoid if we can.
At the moment, rootless Podman would seem like the obvious choice. As far
as I'm aware, it has the same user namespaces requirements as the unshare backends in mmdebstrap, autopkgtest and schroot (user namespaces enabled, setuid newuidmap, 65536 uids in /etc/subuid, 65536 gids in /etc/subgid).
Podman uses the same OCI images as Docker, so it can either pull from a trusted OCI registry, or use images that were built by importing a tarball generated by e.g. mmdebstrap or sbuild-createchroot. I assume that for
Debian we would want to do the latter, at least initially, to avoid
being forced to either trust an external registry like hub.docker.com
or operate our own.
podman is also supported as a backend by autopkgtest-virt-podman, Toolbx (podman-toolbox in Debian) and distrobox. autopkgtest's autopkgtest-build-podman does not yet support starting from a tarball
as described above, but it easily could (contributions welcome).
Or, if Podman is too "not invented here" for Debian's use, using rootless lxd/Incus is another option - although that introduces a dependency
on projects and formats that are rarely used outside the Debian/Ubuntu bubble, which risks them becoming another schroot (and also requires us to decide whether we follow Canonical's lxd or the community fork Incus post-fork, which could get somewhat political).
There are two approaches to
managing an ephemeral build container using namespaces. In one approach,
we create a directory hierarchy of a container root filesystem and for
each command and hook that we invoke there, we create new namespaces on demand. In particular, there are no background processes when nothing is running in that container and all that remains is its directory
hierarchy. Such a container session can easily survive a reboot (unless stored on tmpfs). Both sbuild --chroot-mode=unshare and unschroot.py
follow this approach. For comparison, schroot sets up mounts (e.g. /proc) when it begins a session and cleans them up when it ends. No such persistent mounts exist in either sbuild --chroot-mode=unshare or unschroot.py.
Persisting a container root filesystem between multiple operations comes
with some serious correctness issues if there are "hooks" that can modify
it destructively on each operation: see <https://bugs.debian.org/499014>
and <https://bugs.debian.org/994836>. As a result of that, I think the
only model that should be used in new systems is to have some concept of
a session (like schroot type=file, but unlike schroot type=directory)
so that those "hooks" only run once, on session creation, preventing
them from arbitrarily reverting/overwriting changes that are subsequently made by packages installed into the chroot/container (for example dbus' creation of the messagebus uid/gid in #499014, and exim4's creation of Debian-exim in #994836).
I don't know whether creating new namespaces multiple times (but without running external integration hooks the second and subsequent times)
will also lead to practical problems, but I note that outside the Debian bubble, everything that enters a new container environment seems to
operate by creating a process that encapsulates the container, and then either letting it run to completion interactively or non-interactively (`docker run`, etc.), or letting it run in the background (perhaps with
an init system or `sleep infinity` as its "payload" process) and then repeatedly injecting code into that pre-existing namespace
(either `docker exec`, etc., or something like ssh).
autopkgtest's Docker, Podman, lxc, lxd backends all operate by creating
a namespaced init or sleep process with `docker run` or equivalent, and
then injecting subsequent commands into the namespace that was created
for that long-running process with `docker exec` or equivalent.
I think unshare is the outlier here, and I think it would be good to
consider whether it really needs to be.
The more like other container managers a new container manager is, the
less likely it is to break reasonable expectations in future, like
schroot regularly does.
While podman
and docker allow running unprivileged application containers, they still require privileged containers when you want to run systemd-as-pid-1.
What do you mean by "privileged containers" exactly? Do you mean a system service that runs with CAP_SYS_ADMIN and other scary privileges in the
init namespace, like the typical use of dockerd, or are you also counting uses of the setuid newuidmap as being privileged?
If you are happy to use the setuid newuidmap (which I believe the unshare backends for schroot, mmdebstrap, autopkgtest also rely on) then my understanding is that "rootless" podman is essentially equivalent:
you need a setuid newuidmap, a range of 65536 uids in /etc/subuid,
a range of 65536 gids in /etc/subgid, and a kernel that will allow unprivileged users to create new user namespaces, but beyond that there
are no special privileges required.
Please see /usr/share/doc/podman/README.Debian for details of what it needs.
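The requirements listed above can be checked with a few lines of shell. This is only a sketch: the helper names subid_count and report_rootless_prereqs are made up for illustration, the file layout assumed is the name:start:count format described in subuid(5), and kernel.unprivileged_userns_clone is a Debian-specific sysctl that may be absent on other kernels.

```shell
#!/bin/sh
# Sum the subordinate id counts assigned to USER in FILE
# (lines look like "name:start:count", see subuid(5)).
subid_count() {
    awk -F: -v user="$1" '$1 == user { total += $3 } END { print total + 0 }' "$2"
}

# Report the prerequisites named in the text; prints only, changes nothing.
report_rootless_prereqs() {
    user=$(id -un 2>/dev/null || echo unknown)
    for f in /etc/subuid /etc/subgid; do
        if [ -r "$f" ]; then
            echo "$f: $(subid_count "$user" "$f") ids for $user (65536 wanted)"
        else
            echo "$f: not readable"
        fi
    done
    if command -v newuidmap >/dev/null 2>&1; then
        echo "newuidmap: $(command -v newuidmap)"
    else
        echo "newuidmap: not found"
    fi
    # Debian-specific knob (assumption: absent means not restricted this way).
    knob=/proc/sys/kernel/unprivileged_userns_clone
    if [ -r "$knob" ]; then
        echo "unprivileged_userns_clone: $(cat "$knob")"
    fi
}

report_rootless_prereqs
```

The script does not check for a cgroup manager or for the setuid bit on newuidmap; it only covers the items that are easy to read from files.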
For systemd-as-pid-1 specifically,
`autopkgtest-build-podman --init=systemd` and
`autopkgtest-virt-podman --init` demonstrate how this can be done, and
last time I tried, it was possible to run them unprivileged (other than needing access to the setuid newuidmap, as above). systemd is able to
detect that it's running in a container and turn off functionality like
udev that would only be appropriate in a VM or on bare metal, and podman knows how to tell systemd that it should do this.
I manage my chroots with schroot (but not via sbuild, for dog fooding purposes :), and use type=directory and union-type=overlay so that I
get a fast and persistent base, independent of the underlying filesystem, with fresh instances per session. (You can access the base via the source:<id> names.) I never liked the type=file stuff, as it's slow to
setup and maintain.
Same. So I implemented overlayfs support in pbuilder:
Guillem Jover writes:
I manage my chroots with schroot (but not via sbuild, for dog fooding purposes :), and use type=directory and union-type=overlay so that I get
a fast and persistent base, independent of the underlying filesystem,
with fresh instances per session. (You can access the base via the source:<id> names.) I never liked the type=file stuff, as it's slow to setup and maintain.
Ah, thank you, I didn't realize that existed. That sounds like a nice generalization of the file system snapshot approach.
I think that this is how the sbuild-debian-developer-setup script sets
up chroots.
For systemd-as-pid-1 specifically,
`autopkgtest-build-podman --init=systemd` and
`autopkgtest-virt-podman --init` demonstrate how this can be done, and
last time I tried, it was possible to run them unprivileged (other than
needing access to the setuid newuidmap, as above). systemd is able to
detect that it's running in a container and turn off functionality like
udev that would only be appropriate in a VM or on bare metal, and podman
knows how to tell systemd that it should do this.
This is very cool. Running autopkgtests in system containers without
being root (or incus-admin) very much is what I'd like to do. And it's
much better if I don't have to write my own container framework for
doing it. I couldn't get it to work locally yet (facing non-obvious
error messages).
Would someone be able to document (mail/wiki/blog/...) how to set up and
use podman for running autopkgtests?
I have to ask:
Could we use a container framework that is also used outside the Debian bubble, rather than writing our own from first principles every time, and ending up with a single-maintainer project being load-bearing for Debian *again*? [...]
Podman uses the same OCI images as Docker, so it can either pull from a trusted OCI registry, or use images that were built by importing a tarball generated by e.g. mmdebstrap or sbuild-createchroot. I assume that for
Debian we would want to do the latter, at least initially, to avoid
being forced to either trust an external registry like hub.docker.com
or operate our own.
I manage my chroots with schroot (but not via sbuild, for dog fooding purposes :), and use type=directory and union-type=overlay so that I
get a fast and persistent base, independent of the underlying filesystem, with fresh instances per session.
You can access the base via the source:<id> names
At least for me, building container images locally is a requirement. I
have no interest in using a container registry.
lxd/incus also was on my list, but my understanding is that they do not
work without their system services at all, and being able to operate containers (i.e. being incus-admin or the like) roughly becomes
equivalent to being full root on the system, defeating the purpose of the exercise.
I guess you understood my explanation differently than it was meant.
While the container is persisted into the filesystem, this is being done
for each package build individually. sbuild --chroot-mode=unshare and unschroot use a tarball as their source and opening the session amounts
to extracting it. At the end of the session, the tree is disposed. The session concept of schroot is being reused in unschroot and it very much behaves like a type=file chroot except that you can begin a session,
reboot and continue using it until you end it without requiring a system service to recover your sessions during boot.
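The session life cycle described here (extract a tarball, use the tree, dispose of it) can be sketched in a few lines. This is an illustration only, not how unschroot is implemented: the function names and the SESSION_BASE variable are made up, and the real tools extract inside a user namespace so that file ownership is preserved, which this sketch skips.

```shell
#!/bin/sh
# begin_session TARBALL: extract into a fresh directory and print its path.
# The session is nothing but that directory, so no background process or
# system service is needed to keep it alive. Point SESSION_BASE at a
# directory that is not on tmpfs if the session should survive a reboot.
begin_session() {
    session=$(mktemp -d "${SESSION_BASE:-${TMPDIR:-/tmp}}/session.XXXXXX") || return 1
    tar -xf "$1" -C "$session" || return 1
    # Any one-time setup would run here, once, not before every command.
    echo "$session"
}

# end_session DIR: dispose of the tree.
end_session() {
    rm -rf -- "$1"
}
```

Usage would look like s=$(begin_session minbase.tar), followed by any number of commands operating on "$s", and finally end_session "$s".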
The main difference to how everyone else does this is that in a typical sbuild interaction it will create a new user namespace for every single command run as part of the session. sbuild issues tens of commands
before launching dpkg-buildpackage and each of them creates new
namespaces in the Linux kernel (all of them using the same uid mappings, performing the same bind mounts and so on). The most common way to think
of containers is different: You create those namespaces once and reuse
the same namespace kernel objects for multiple commands part of the same session (e.g. installation of build dependencies and dpkg-buildpackage).
There are two ways of interacting with containers that use one set of
namespaces for their entire existence. One is to set up some IPC
mechanism and receive commands to be run inside (for instance spawning
a shell and piping commands into it, or driving the container via ssh).
The other is for an external process to join (setns) the existing
container namespaces and inject code into them (docker exec). That
latter approach has a history of vulnerabilities closely related to vulnerabilities in setuid binaries, because we are transitioning a process (and all of its context) from
outside the container into it and thus expose all of its context (memory maps, open file descriptors and so on) to contained processes. As such,
I think that an approach based on an IPC mechanism should be preferred.
I am not sure whether podman exec operates in this way, but a quick codesearch did not exhibit obvious uses of setns inside the podman
source code. Would anyone be able to tell how podman exec is
implemented here?
I think you really need one more non-trivial (but very commonly
available) privilege. You need a cgroup manager (such as systemd) that
allows creating and delegating a cgroup hierarchy to you.
Would someone be able to document (mail/wiki/blog/...) how to set up and
use podman for running autopkgtests? Thus far, I failed to figure out
how to plug a local Debian mirror (as opposed to a container registry)
into autopkgtest-build-podman. It is quite difficult to locate podman documentation that is applicable under the assumption that you don't
want to use any container registry.
We learned that sbuild --chroot-mode=unshare and unschroot spawn
a new set of namespaces for every command. What you point out as a
limitation also is a feature. Technically, it is a lie that the
namespaces are always constructed in the same way. During installation
of build depends the network namespace is not unshared while package
builds commonly use an unshared network namespace with no interfaces but
the loopback interface.
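To make the per-command model concrete, here is a rough sketch using unshare(1) from util-linux. It is not what sbuild or unschroot run: run_step and DRY_RUN are names invented for this example, --map-root-user maps only a single uid (the real tools use newuidmap with a 65536 id range), and /proc and the bind mounts are left out. By default the function only prints the command it would run.

```shell
#!/bin/sh
# run_step ROOT net|nonet COMMAND...
# Every call creates fresh namespaces; only the "nonet" variant also
# unshares the network namespace.
run_step() {
    root=$1 netmode=$2
    shift 2
    set -- chroot "$root" "$@"
    if [ "$netmode" = nonet ]; then
        set -- --net "$@"
    fi
    set -- unshare --user --map-root-user --mount --pid --fork "$@"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"    # print instead of executing
    else
        "$@"
    fi
}

# Hypothetical session: dependencies with network, the build without.
run_step /tmp/build-root net apt-get -y build-dep hello
run_step /tmp/build-root nonet dpkg-buildpackage -us -uc
```

Because nothing is shared between the two calls except the directory tree, choosing a different namespace setup for each step costs nothing extra.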
I have to ask:
Could we use a container framework that is also used outside the Debian bubble, rather than writing our own from first principles every time, and ending up with a single-maintainer project being load-bearing for Debian *again*? I had hoped that after sbuild's history with schroot becoming unmaintained, and then being revived by a maintainer-of-last-resort who
is one of the same few people who are critical-path for various other important things, we would recognise that as an anti-pattern that we
should avoid if we can.
Here's the Dockerfile/Containerfile to turn a sysroot tarball into an
OCI image (obviously it can be extended with LABELs and other
customizations, but this is fairly close to minimal):
FROM scratch
ADD sysroot.tar.gz /
CMD ["/bin/bash"]
On Tue, 25 Jun 2024 at 18:55:45 +0200, Helmut Grohne wrote:
The main difference to how everyone else does this is that in a typical sbuild interaction it will create a new user namespace for every single command run as part of the session. sbuild issues tens of commands
before launching dpkg-buildpackage and each of them creates new
namespaces in the Linux kernel (all of them using the same uid mappings, performing the same bind mounts and so on). The most common way to think
of containers is different: You create those namespaces once and reuse
the same namespace kernel objects for multiple commands part of the same session (e.g. installation of build dependencies and dpkg-buildpackage).
Yes. My concern here is that there might be non-obvious reasons why
everyone else is doing this the other way, which could lead to behavioural differences between unschroot and all the others that will come back to
bite us later.
For whole-system containers running an OS image from init upwards,
or for virtual machines, using ssh as the IPC mechanism seems
pragmatic. Recent versions of systemd can even be given an ssh public
key via the systemd.system-credentials(7) mechanism (e.g. on the kernel command line) to set it up to be accepted for root logins, which avoids needing to do this setup in cloud-init, autopkgtest's setup-testbed,
or similar.
For "application" containers like the ones you would presumably want
to be using for sbuild, presumably something non-ssh is desirable.
If you build an image by importing a tarball that you have built in
whatever way you prefer, minimally something like this:
$ cat > Dockerfile <<EOF
FROM scratch
ADD minbase.tar.gz /
EOF
$ podman build -f Dockerfile -t local-debian:sid .
then you should be able to use localhost/local-debian:sid
as a substitute for debian:sid in the examples given in autopkgtest-virt-podman(1), either using it as-is for testing:
$ autopkgtest -U hello*.dsc -- podman localhost/local-debian:sid
or making an image that has been pre-prepared with some essentials like dpkg-source, and testing in that:
$ autopkgtest-build-podman --image localhost/local-debian:sid
...
Successfully tagged localhost/autopkgtest/localhost/local-debian:sid
$ autopkgtest hello*.dsc -- podman autopkgtest/localhost/local-debian:sid
(tests run)
Adding a mode for "start from this pre-prepared minbase tarball" to all
of the autopkgtest-build-* tools (so that they don't all need to know
how to run debootstrap/mmdebstrap from first principles, and then duplicate the necessary options to make it do the right thing), has been on my
to-do list for literally years. Maybe one day I will get there.
From my point of view, this isn't actually necessary. I expect that many people would be fine drawing images from a container registry. Those
stubborn people like me will happily go the extra mile.
We could certainly also benefit from some syntactic sugar to make the automatic choice of an image name for localhost/* podman images nicer,
with fewer repetitions of localhost/.
podman is unlikely to provide you with a way to generate a minbase
tarball without first creating or downloading some sort of container
image in which you can run debootstrap or mmdebstrap, because you have
to be able to start from somewhere. But you can run mmdebstrap unprivileged in unshare mode, so that's enough to get you that starting point.
We learned that sbuild --chroot-mode=unshare and unschroot spawn
a new set of namespaces for every command. What you point out as a limitation also is a feature. Technically, it is a lie that the
namespaces are always constructed in the same way. During installation
of build depends the network namespace is not unshared while package
builds commonly use an unshared network namespace with no interfaces but the loopback interface.
I don't think podman can do this within a single run. It might be feasible
to do the setup (installing build-dependencies) with networking enabled; leave the root filesystem of that container intact; and reuse it as the
root filesystem of the container in which the actual build runs, this time with --network=none?
Or the "install build-dependencies" step (and other setup) could perhaps
even be represented as a `podman build` (with a Dockerfile/Containerfile, FROM the image you had as your starting point), outputting a temporary container image, in which the actual dpkg-buildpackage step can be invoked
by `podman run --network=none --rmi`?
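A rough sketch of that second idea follows. All names in it are placeholders chosen for this example (the base image tag, the temporary tag, the package name and the /build mount point), and it assumes the base image has deb-src entries so that apt-get build-dep works. The script writes the Containerfile and prints the two podman invocations rather than running them.

```shell
#!/bin/sh
# Placeholders (assumptions of this sketch, not names from any tool):
base=localhost/local-debian:sid      # image built from a local tarball
tmp_tag=localhost/hello-build:tmp    # throwaway image with build-dependencies
pkg=hello                            # source package to build
workdir=$(mktemp -d)

# Step 1: build-dependencies are installed while networking is available.
cat > "$workdir/Containerfile" <<EOF
FROM $base
RUN apt-get update && apt-get -y build-dep $pkg
EOF

# Step 2: the build itself runs without network; --rmi drops the
# temporary image once the container has exited.
echo "podman build -f $workdir/Containerfile -t $tmp_tag $workdir"
echo "podman run --network=none --rmi -v \$PWD:/build -w /build $tmp_tag dpkg-buildpackage -us -uc"
```

The printed commands are meant to be run from the unpacked source tree, which is bind-mounted into the container at /build.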
I am concerned about behavioural differences due to the reimplementation
from first principles aspect though. Jochen and Aurelien will know more
here, but I think we had a fair number of ftbfs due to such differences.
None of them was due to the architecture of creating namespaces for
each command; most of them were due to not having gotten containers
right in general. Some were broken packages, such as ones skipping tests
when detecting schroot.
If we move beyond containers and look into building
inside a VM (e.g. sbuild-qemu) we are in a difficult spot, because we
need e.g. systemd for booting, but we may not want it in our build environment. So long term, I think sbuild will have to differentiate
between three contexts:
* The system it is being run on
* The containment or virtualisation environment used to perform the
  build
* The system where the build is being performed inside the containment
  or virtualisation environment
I don't quite understand the need for a Dockerfile here. I suspect that
this is the obvious way that works reliably, but my impression was that
using podman import would be easier.
$ autopkgtest -U hello*.dsc -- podman localhost/local-debian:sid
This did not work for me. autopkgtest failed to create a user account.
I am more interested in providing isolation-container though as a number
of tests require that and I currently tend to resort to virt-qemu for
that. Sure enough, adding --init=systemd to autopkgtest-build-podman
just works and a system container can also be used as an application container by autopkgtest (so there is no need to build both), but
running the autopkgtest-virt-qemu --init also fails here in non-obvious
ways. It appears that user creation was successful, but the user
creation script is still printed in red.
Let me pose a possibly stupid suggestion. Much of the time when people interact with autopkgtest, there is a very limited set of backends and backend options people use frequently. Rather than making the options shorter, how about introducing an aliasing mechanism? Say I could have
some ~/.config/autopkgtest.conf, and whenever I run autopkgtest ... -- $BACKEND such that there is no autopkgtest-virt-$BACKEND, autopkgtest would consult that configuration file and, if a value is assigned to that name there, expand it to the
assigned value. Then, I can just record my commonly used backends and
options there and refer to them by memorable names of my own liking.
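Until something like that exists in autopkgtest, the idea can be approximated with a small shell helper. Everything here is hypothetical: autopkgtest reads no such file today, the "alias = backend options" line format is invented for this sketch, and alias names are assumed to consist of letters, digits and hyphens only.

```shell
#!/bin/sh
# expand_backend NAME [CONFIG]: print the backend command line for NAME.
# Assumed config format: one "alias = backend options..." entry per line.
expand_backend() {
    name=$1
    conf=${2:-${XDG_CONFIG_HOME:-$HOME/.config}/autopkgtest.conf}
    # A real backend wins; aliases are only consulted for unknown names.
    if command -v "autopkgtest-virt-$name" >/dev/null 2>&1 || [ ! -r "$conf" ]; then
        echo "$name"
        return
    fi
    expansion=$(sed -n "s/^[[:space:]]*${name}[[:space:]]*=[[:space:]]*//p" "$conf" | head -n 1)
    # Fall back to the name itself when no alias is defined.
    echo "${expansion:-$name}"
}
```

With a line such as "sidpod = podman --init localhost/local-debian:sid" in the file, a test run becomes autopkgtest hello*.dsc -- $(expand_backend sidpod).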
Automatic choice of images makes things more magic, which bears negative aspects as well.
Every time I run a podman container (e.g. when I run
autopkgtest) my ~/.local/share/containers grows. I think autopkgtest
manages to clean up in the end, but e.g. podman run -it ... seems to
leave stuff behind.
Of course, when I skip podman's image management and use --rootfs, I can
sidestep this problem by choosing my root location on a tmpfs, but
that's not how autopkgtest uses podman.
I don't think podman can do this within a single run. It might be feasible to do the setup (installing build-dependencies) with networking enabled; leave the root filesystem of that container intact; and reuse it as the root filesystem of the container in which the actual build runs, this time with --network=none?
Do I understand correctly that in this variant, you intend to use podman without its image management capabilities and rather just use --rootfs spawning two podman containers on the same --rootfs (one after another)
where the first one installs dependencies and the second one isolates
the network for building?
On Thu, 27 Jun 2024 at 11:46:51 +0200, Helmut Grohne wrote:
I don't quite understand the need for a Dockerfile here. I suspect that this is the obvious way that works reliably, but my impression was that using podman import would be easier.
Honestly, the need for a Dockerfile here is: I already knew how to build containers from a Dockerfile, and I didn't read the documentation for
the lower-level `podman import` because `podman build` can already do
what I needed.
I see this as the same design principle as why we encourage package maintainers to use dh, even when building trivial "toy" packages like
hello, and in preference to implementing debian/rules at a lower level
in trivial cases. To build a non-trivial container with multiple layers, you'll likely need a Dockerfile (or docker-compose, or some similar thing) *anyway*, so a typical user expectation will be to have a Dockerfile, and anyone building a container will likely already have learned the basics
of how to write one; and then we might as well follow the same procedure
in the trivial case, rather than having the trivial case be different and require different knowledge.
Do I understand correctly that in this variant, you intend to use podman without its image management capabilities and rather just use --rootfs spawning two podman containers on the same --rootfs (one after another) where the first one installs dependencies and the second one isolates the network for building?
Maybe that; or maybe use its image management, tell the first podman command not to delete the container's root filesystem (don't use --rm), and then there's probably a way to tell podman to reuse the resulting filesystem
with an additional layer in its overlayfs for the network-isolated run.
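One possible shape for the image-management variant, sketched under assumptions: it uses `podman commit` to turn the first container's filesystem into an image, which is only one candidate for the "way to reuse the resulting filesystem" and not necessarily the best one. The base image name, container name and build command are hypothetical.

```python
def layered_phase_commands(base_image, build_deps, build_cmd, name="builddeps"):
    """Return the podman command lines for a commit-based two-phase build."""
    snapshot = f"{name}-image"  # hypothetical name for the committed image
    return [
        # Phase 1: no --rm, so the container's filesystem survives the run.
        ["podman", "run", "--name", name, base_image,
         "apt-get", "install", "--yes", *build_deps],
        # Snapshot the stopped container as a new image.
        ["podman", "commit", name, snapshot],
        # Phase 2: run the build on top of the snapshot without networking.
        ["podman", "run", "--rm", "--network=none", snapshot, *build_cmd],
        # Remove the phase-1 container once it is no longer needed.
        ["podman", "rm", name],
    ]


# Hypothetical values, purely for illustration.
for argv in layered_phase_commands(
        "debian:unstable", ["debhelper"], ["dpkg-buildpackage", "-us", "-uc"]):
    print(" ".join(argv))
```

The second run gets its own writable layer on top of the committed image, which matches the "additional layer in its overlayfs" idea, at the cost of a commit step between the two runs.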
Please note that I am far from being an expert on podman or the
"containers" family of libraries that it is based on, and I don't
know everything it is capable of. Because Debian has a lot of pieces
of infrastructure we have built for ourselves from first principles,
I've had to spend time on understanding the finer points of sbuild,
schroot, lxc and so on, so that I can replicate failure modes seen on
the buildds and therefore fix release-critical bugs in the packages that
I've taken responsibility for (and occasionally also try to improve the infrastructure itself, for example #856877 which recently passed its
7th birthday). That comes with an opportunity cost: the time I spent
learning about schroot is time that I didn't spend learning about OCI.
One of the reasons I would like to have fewer Debian-specific pieces in
our stack is so that other Debian developers don't have to do what I
did, and can instead spend their time gaining transferrable knowledge
that will be equally useful inside and outside the Debian bubble (for
example the best ways to use OCI images, and OCI-based tools like
Docker and Podman, which have a lot of overlap in how they are used even though they are rather different behind the scenes).
But, if everybody is so excited about this, where are the sbuild contributors implementing this?
The excitement can probably also
be seen by there existing 13 independent software packages that do "debian package building in docker"
I imagine that one could whip up some kind of wrapper
that is building a container either from a tarball created via mmdebstrap or similar
using buildah, have it install all necessary build dependencies, and then use podman to run the actual build
I also briefly started playing with debcraft, which I really like from a usability perspective
I had the idea to build my Debian packages in a clean docker container instead of using cowbuilder etc for some time now.
On Thu, 27 Jun 2024 at 17:26:20 +0200, Johannes Schauer Marin Rodrigues wrote:
But, if everybody is so excited about this, where are the sbuild contributors
implementing this?
I'm sorry, consider it added to my list. As usual, there's no guarantee that I will get there within my lifetime, but I'll make sure to feel
suitably guilty about my failure to achieve it.
This is clearly not entirely true any more, because if it was, buildds would not be able to use sbuild's unshare backend - so perhaps now is the time to be proposing an sbuild podman backend, and I should probably be writing one instead of replying to this message.
I'm sorry that I have failed to provide a concrete solution to this problem, and I will try to do better in future.
There are lots of options for doing this, some of which are listed in <https://wiki.debian.org/SystemBuildTools#Package_build_tools>.
All of these have the same problem as cowbuilder, pbuilder, and any
other solution that is not sbuild + schroot: it isn't (currently) what
the production Debian buildds use, therefore it is entirely possible
(perhaps even likely, depending on what packages you maintain) that your package will build successfully and pass tests in your own local builder,
but then fail to build or fail tests on the buildds as a result of some
quirk of how schroot sets up its chroots, which is a worse-than-RC bug
making the package unreleasable.
Could you point me to some Debian Bug # or otherwise share examples of
cases when a build succeeded locally but failed on official Debian
builders due to something that is specific for sbuild/schroot?
I have never run in such a situation despite doing Debian packaging
for 10 years with fairly complex C++ software targeting all archs
Debian supports.
"Helmut" == Helmut Grohne <[email protected]> writes:
Could you point me to some Debian Bug # or otherwise share examples of
cases when a build succeeded locally but failed on official Debian
builders due to something that is specific for sbuild/schroot?
"Richard" == Richard Lewis <[email protected]> writes:
At the moment, rootless Podman would seem like the obvious choice. As far
as I'm aware, it has the same user namespaces requirements as the unshare
backends in mmdebstrap, autopkgtest and schroot (user namespaces enabled,
setuid newuidmap, 65536 uids in /etc/subuid, 65536 gids in /etc/subgid).
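To make the subuid/subgid part of that requirement concrete, here is a minimal sketch of a check, assuming the usual `name:start:count` line format of /etc/subuid and /etc/subgid. The function name and the sample data are invented for the example; it does not verify that user namespaces are enabled or that newuidmap is setuid.

```python
def subid_count(text, user):
    """Sum the subordinate ID counts allocated to `user` in subuid/subgid text."""
    total = 0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 3:
            continue  # skip malformed lines rather than guessing
        name, _start, count = fields
        if name == user and count.isdigit():
            total += int(count)
    return total


# Hypothetical file contents; on a real system, read /etc/subuid instead.
sample = "builder:100000:65536\nother:165536:1000\n"
print(subid_count(sample, "builder") >= 65536)
```

The same function can be pointed at the contents of /etc/subgid for the gid side of the requirement.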
As a datapoint, I use rootless podman containers extensively both for autopkgtest and as an sbuild backend (though the latter is affected by #1033352 for which I still need to implement a cleaner workaround).
I think the only problem I encountered was a corner case when passing in
a device into a container: at some point, autopkgtest runs su which uses
the setgroups() syscall, and group permissions get lost. The solution
was to set up the proper gidmaps. I documented my findings here [1].
Though this latter issue shouldn't be a problem on buildds, where
devices aren't passed in.
Specifically I'm concerned about what [advocating use of podman]
means for tests and if they
should be able to use unprivileged containers themselves to test things.
Relatedly it'd be great if we actually had a VM in-between us and the build.
How well does this setup nest? I had a lot of trouble trying to run the unshare backend within an unprivileged container as set up by systemd-nspawn, mostly with device nodes. In the end I had to give up and replaced the container with a full-blown VM. I understand that some of the things compose a little if the submaps are set up correctly, with fewer IDs allocated to the nested child. Is there a way to make this work properly, or would you always run into setup issues with device nodes at this point?
I'll be honest, I think building a new container backend makes no sense
at all.
There's a lot of work that has gone into systemd-nspawn, podman, docker, crun, runc, and the related ecosystems.
I think an approach that allowed sbuild to actually use a real container backend would be long-term more maintainable and would allow Debian's
DevOps practices to better align with the rest of the world.
I have some work I've been doing in this space which won't be useful to
you because it is not built on top of sbuild.
(Although I'd be happy to share under LGPL-3 for anyone interested.)
But I find that I disagree with the idea of writing a new container
runtime for sbuild so strongly that I can no longer use sbuild for
Debian work, so I started working on my own package building solution.
In terms of constructive feedback:
* I think your intuition that sbuild --chroot-mode=unshare is limiting is
good.
* I would move toward a persistent namespace approach because it is
more similar to broadly used container backends.
* overlayfs/fuse-overlayfs are how the rest of the world is solving
these problems (or snapshots and the like). Directories are kind of a
Debian-specific artifact that I find more and more awkward to deal with
as the rest of my work uses containers for CI/CD.
My initial experiments indicate that we're in
for a factor of two [slowdown], whereas we could get this down significantly
by using an overlayfs approach that we cannot shoehorn into podman.
podman upstream insists on CAP_SYS_ADMIN being a no-go, while systemd upstream insists on CAP_SYS_ADMIN being a requirement
I have reached the
conclusion that doing a persistent namespace requires a background
process and an IPC mechanism. (This requirement rules out podman/docker/crun/runc.)