"Sean" == Sean Whitton <[email protected]> writes:
The ftpmaster team have refused to trust uploads coming from the
tag2upload service. This GR is to override that decision.
Now, we have the following proposal on how to get t2u integrated. Note,
we are not entirely happy with it and do not think this is the best way forward, but given the current situation, it is a way that gets things untangled, and then we see what the future will bring.
So, in short: A t2u uploaded source package should consist of whatever
t2u produces (normal Debian source package) *plus* two additional files.
The first file contains client side generated data, but to *not*
overburden the client, this *only* consists of the output of `git
ls-files --format="%(objectmode) %(objectname) %(path)"` for the tag
that should be uploaded, signed by the DD/DM key - or something
similarly easily generated on client side. Exact format can be hashed
out between t2u people and ftpmaster during implementation.
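
To make the proposed first file concrete, here is a minimal sketch of how a consumer might parse it. It assumes the file holds one `<objectmode> <objectname> <path>` line per file, as the `--format` string above would produce, and that paths are not quoted or escaped. `ManifestEntry` and `parse_manifest` are hypothetical names; the exact format is still to be agreed.

```python
from typing import List, NamedTuple


class ManifestEntry(NamedTuple):
    mode: str         # e.g. "100644" for a regular file
    object_name: str  # hex object id of the blob
    path: str         # path relative to the repository root


def parse_manifest(text: str) -> List[ManifestEntry]:
    """Parse lines of the form '<objectmode> <objectname> <path>'.

    Assumption: no path quoting is in effect (a real format would
    probably want NUL-terminated records to be safe).
    """
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        # Split on the first two spaces only, since paths may contain spaces.
        mode, object_name, path = line.split(" ", 2)
        entries.append(ManifestEntry(mode, object_name, path))
    return entries
```
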
The second file consists of a shallow git clone of the repository for
the tag that t2u wants to upload, put into an appropriately named
tarball.
Obvious (ha) note: The above mentions a command and stuff, but the exact
way is up to implementation. It can even contain more data if deemed
useful, or add a file from t2u with a list of what it added/generated, for
example. Or - in the future - it might be completely replaced with another
implementation we all agree on, provided it has similar minimal basic
requirements as the thing proposed here. (Summarized as: a detached
signature from the DD/DM over the tree of the tag, and that tree/tag
data available in a file beside the upload.)
Thank you very much for putting this together! I know how hard it is to coordinate a bunch of voices and turn them into something concrete. This
is incredibly helpful and I really appreciate it.
After reading this over and thinking about it for a bit, I had a few questions just to make sure I fully understood the proposal.
Aigars Mahinovs <[email protected]> writes:
Correct me if I'm wrong, but I believe the intention is to have two technically redundant data points saved into the archive:
1) checksums of the contents of the shallow copy git tree in the
maintainer work folder (signed by the maintainer)
2) contents of the shallow copy git tree in the t2u server work folder (signed by t2u)
Oh! Did I misunderstand Joerg's second point entirely? By "the tag that
t2u wants to upload," I assumed that meant the tag the uploader signed or,
in other words, the state of the tree *before* t2u started doing its work that has the uploader signature attached.
On Sun, 30 Jun 2024 at 17:58, Russ Allbery <[email protected]> wrote:
The second file consists of a shallow git clone of the repository for
the tag that t2u wants to upload, put into an appropriately named
tarball.
Just to double check, to make sure I'm not missing some subtlety, it's
intentional that this file contains all of the same information as in
the first file, and the first file is just a subset of this same
information in a different form?
In other words, someone could verify the signature on the Git tag in
this file and then run the git ls-files command on the Git repository
and get exactly the same information as in the first file, so the first
file is technically redundant. I can think of some reasons why you
might want that, but it's a little surprising, so I wanted to make sure
that's intentional.
Correct me if I'm wrong, but I believe the intention is to have two technically redundant data points saved into the archive:
1) checksums of the contents of the shallow copy git tree in the
maintainer work folder (signed by the maintainer)
2) contents of the shallow copy git tree in the t2u server work folder (signed by t2u)
The only suggestion I would have here would be to have the shallow git
clone on the t2u side have a variable depth that is selected so that the commits in the resulting depth are sufficient for the source package construction, like in case of a rebase workflow you'd need to have git history deep enough to include all Debian patches and the last upstream commit.
On Sun, 30 Jun 2024 at 19:28, Russ Allbery <[email protected]> wrote:
Aigars Mahinovs <[email protected]> writes:
Correct me if I'm wrong, but I believe the intention is to have two technically redundant data points saved into the archive:
1) checksums of the contents of the shallow copy git tree in the maintainer work folder (signed by the maintainer)
2) contents of the shallow copy git tree in the t2u server work folder (signed by t2u)
Oh! Did I misunderstand Joerg's second point entirely? By "the tag that t2u wants to upload," I assumed that meant the tag the uploader signed or, in other words, the state of the tree *before* t2u started doing its work that has the uploader signature attached.
I do not see that in what either Joerg or I wrote. And I also don't
see much sense in that.
In contrast, having a tarball of the git state *before* t2u starts its
work would provide a tarball that *can* be verified against the
checksums from the first file. That will give you a clear data point -
t2u started its work with exactly the same workspace as the
maintainer signed. And will provide a frozen copy of that starting
workspace in the archive independent of the (more complex) dgit
service.
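
As an illustration of verifying the tarball against the checksums from the first file, here is a minimal sketch. It assumes the manifest has already been reduced to a mapping of path to hex blob id, that the repository uses git's SHA-1 object format, and that the tarball contents are available as a mapping of path to bytes. File modes are ignored for brevity, and `verify_tree` is a hypothetical name.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Object id git assigns to a blob in the SHA-1 object format."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def verify_tree(files: dict, manifest: dict) -> list:
    """Compare file contents against a signed manifest.

    files:    path -> bytes (contents unpacked from the tarball)
    manifest: path -> expected hex blob id
    Returns a list of human-readable problems, empty if everything matches.
    """
    problems = []
    for path in sorted(manifest.keys() | files.keys()):
        if path not in files:
            problems.append(f"missing from tarball: {path}")
        elif path not in manifest:
            problems.append(f"not in signed manifest: {path}")
        elif git_blob_sha1(files[path]) != manifest[path]:
            problems.append(f"content differs: {path}")
    return problems
```

An empty result would be the "clear data point" described above: the tree t2u started from matches what the maintainer signed.
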
On Sunday, June 30, 2024 1:45:15 PM EDT Aigars Mahinovs wrote:
On Sun, 30 Jun 2024 at 19:28, Russ Allbery <[email protected]> wrote:
Aigars Mahinovs <[email protected]> writes:
Correct me if I'm wrong, but I believe the intention is to have two technically redundant data points saved into the archive:
1) checksums of the contents of the shallow copy git tree in the maintainer work folder (signed by the maintainer)
2) contents of the shallow copy git tree in the t2u server work folder (signed by t2u)
Oh! Did I misunderstand Joerg's second point entirely? By "the tag that t2u wants to upload," I assumed that meant the tag the uploader signed or,
in other words, the state of the tree *before* t2u started doing its work that has the uploader signature attached.
I do not see that in what either Joerg or I wrote. And I also don't
see much sense in that.
In contrast, having a tarball of the git state *before* t2u starts its
work would provide a tarball that *can* be verified against the
checksums from the first file. That will give you a clear data point -
t2u started its work with exactly the same workspace as the
maintainer signed. And will provide a frozen copy of that starting workspace in the archive independent of the (more complex) dgit
service.
It's one at the point the maintainer signed the tag.
So, in short: A t2u uploaded source package should consist of whatever
t2u produces (normal Debian source package) *plus* two additional files.
The first file contains client side generated data, but to *not*
overburden the client, this *only* consists of the output of `git
ls-files --format="%(objectmode) %(objectname) %(path)"` for the tag
that should be uploaded, signed by the DD/DM key - or something
similarly easily generated on client side. Exact format can be hashed
out between t2u people and ftpmaster during implementation.
You describe the contents here, but not the semantics, and I'm not sure
that I fully understand what the intended semantics are (in other words,
what packages dak will accept under this proposal). Would source packages
that contain additional files not represented in this list of files and
hashes be accepted, for example?
Here's one specific concrete example for one workflow using tag2upload
(there are other variations): suppose I, as the uploader, tag a Git tree
that is patches-applied with no patches in debian/patches/*. I run the
above command and include that signed data somewhere where tag2upload can
get at it via the Git tag. tag2upload then turns that into a 3.0 (quilt)
package and uploads that package along with the information as requested,
including that list of files and hashes that I signed from the tree that I
tagged.
When dak sees the package, all of the files in debian/* in the source
package will have the same hashes as in the git ls-files output, but the
source package will have additional files in debian/patches/* that do not
exist in the git ls-files output. Some of the upstream files will have
hashes in the git ls-files output that match the contents of those files
after unpacking the source package (and thus applying patches), but will
not match the hashes of those files as they exist in the upstream
orig.tar.gz.
In this proposal, would dak be willing to accept such a package?
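
The question can be made concrete with a small sketch that classifies the differences between the signed listing and an unpacked source package. Both inputs are assumed to be mappings of path to hex hash, and the function name and result keys are hypothetical. It only reports the differences; whether any of them should cause a rejection is exactly the open policy question.

```python
def compare_manifests(signed: dict, unpacked: dict) -> dict:
    """Classify differences between two path -> hash mappings.

    signed:   listing the uploader signed (from the tagged tree)
    unpacked: listing computed from the unpacked source package
    """
    signed_paths, unpacked_paths = set(signed), set(unpacked)
    common = signed_paths & unpacked_paths
    return {
        "only_in_source_package": sorted(unpacked_paths - signed_paths),
        "only_in_signed_tree": sorted(signed_paths - unpacked_paths),
        "hash_differs": sorted(p for p in common if signed[p] != unpacked[p]),
    }
```

In the patches-applied example above, one would expect the generated files under debian/patches/ to show up as `only_in_source_package`, while files compared against the pristine upstream tarball would land in `hash_differs`.
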
The case of a repository that contains only the debian/* files poses
another set of complications, but I don't think we have to get into that
immediately. The above examples are probably enough to work through to
understand what the intended semantics of this manifest is.
The second file consists of a shallow git clone of the repository for
the tag that t2u wants to upload, put into an appropriately named
tarball.
Just to double check, to make sure I'm not missing some subtlety, it's
intentional that this file contains all of the same information as in the
first file, and the first file is just a subset of this same information
in a different form?
In other words, someone could verify the signature on the Git tag in this
file and then run the git ls-files command on the Git repository and get
exactly the same information as in the first file, so the first file is
technically redundant. I can think of some reasons why you might want
that, but it's a little surprising, so I wanted to make sure that's
intentional.
The intention is that enough gets uploaded and stored somewhere that dak
(or whoever later) can reconstruct what t2u did. And, obviously, if you
then follow the steps t2u does and use as input the shallow clone
(verified against the maintainer's sig), it really should get identical
output. (Maybe minus timestamps, but for the important part).
The case of a repository that contains only the debian/* files
poses another set of complications, but I don't think we have to
get into that immediately. The above examples are probably enough
to work through to understand what the intended semantics of this
manifest is.
I'm not entirely sure what is best to require here. I mean, the orig
source has to be somewhere, including on the maintainer's machine, so it
should be possible to include it in this without any extra large magic.
On 30.06.24 21:30, Aigars Mahinovs wrote:
The Debian developer/maintainer creates a signed git tag that contains
(in its message, presumably, to avoid adding new communication lines)
the file listing of the git checkout at the point of signing
(including file names, modes and short SHA checksum hashes). This
extra content is added at the end of the tag message,
OK, maybe I'm just not getting it, but the tag *already* contains the
file listing you want to add to the tag, implicitly: it refers to a commit
which refers to a tree which refers to exactly those files.
If it ever does not, then we'd all have _way_ worse problems than figuring out how to safely create a t2u tag.
So what would this actually buy us, in terms of additional safety?
On 01.07.24 12:46, Aigars Mahinovs wrote:
Yes and no. See what the git tag actually contains and what the GPG
signature actually signs is just the one hash of the commit object.
This commit object then refers to the other files of the repo, but the
GPG signature does not directly sign those.
So it signs them indirectly instead. I don't consider that to be a problem.
There's no material difference whether the tag signs a commit that
hashes a tree that (eventually) hashes the files, or a list of the
files plus their hashes, or a tarball of the files in question (except
that the way we do the latter is too brittle – it depends on the file
order and compression used).
The single advantage of including a file list would be if it included
the files' SHA256-or-better hashes, but given the difficulty of
finding *and* exploiting a SHA1 collision it's a judgment call whether
that's worth the effort.
If we do decide that a second hash is worth the effort, I *strongly*
recommend simply adding an (optional) field with the output of "git
ls-files -z | xargs -0 sha512sum | sort | sha512sum" to the tag. This
has the exact same security implications as a list of paths and their
sha512sum but is many orders of magnitude smaller.
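
For readers who prefer code to shell, the pipeline above can be sketched in Python roughly as follows. This is an approximation under stated assumptions: it takes the file contents as a path-to-bytes mapping instead of reading a checkout, it mimics the default `sha512sum` line format of hex digest, two spaces, then path, it sorts bytewise (as `sort` would under `LC_ALL=C`), and it does not reproduce the escaping `sha512sum` applies to unusual file names. `tree_digest` is a hypothetical name.

```python
import hashlib


def tree_digest(files: dict) -> str:
    """Single SHA-512 digest over the sorted per-file SHA-512 lines.

    files: path (str) -> content (bytes)
    """
    lines = []
    for path, content in files.items():
        file_hash = hashlib.sha512(content).hexdigest()
        # Default sha512sum output format: '<hex digest>  <path>'.
        lines.append(f"{file_hash}  {path}\n".encode())
    # Bytewise sort so the result does not depend on enumeration order.
    lines.sort()
    return hashlib.sha512(b"".join(lines)).hexdigest()
```

Because the per-file lines are sorted before the final hash, the result depends only on the set of paths and their contents, which is what makes a single field in the tag sufficient.
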
You can mitigate this by re-validating all commit hashes using a SHA1CD
git implementation before trusting a git repository. I have not seen
confirmation that 'git fsck' actually does that.
If some new attack implementation on SHA1 appears, that isn't detected
by your SHA1CD variant, your validation can be by-passed.
Firstly, you say a "shallow clone".
It is not straightforward to include *precisely* the set of commits
that are required to reproduce the output. The conversion might, in principle, go arbitrarily far into the maintainer's packaging branch;
and, if the conversion involves an external tool such as
git-debcherry, that tool probably won't currently report what
commit(s) it used - so would need to be modified.
I'm hoping the reason you say "shallow clone" is simply to avoid
bloat.
In that case, it's fairly simple: I find it difficult to imagine a
future workflow that includes the history *of the upstream branch*.
So the t2u server could exclude commits which are in the history of
the nominated upstream tag. That would generally do the right thing,
but it wouldn't *guarantee* not to include unwanted history. Would
that be OK ?
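
The exclusion rule described here amounts to a set difference over the commit graph, comparable in spirit to what `git rev-list <tag> ^<upstream-tag>` computes. A minimal sketch on an in-memory graph, assuming `parents` maps each commit id to its list of parent ids (all names are hypothetical):

```python
def reachable(parents: dict, start: str) -> set:
    """All commits reachable from `start`, including `start` itself."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def commits_to_ship(parents: dict, maintainer_tag: str, upstream_tag: str) -> set:
    """Commits in the maintainer tag's history that are not upstream history."""
    return reachable(parents, maintainer_tag) - reachable(parents, upstream_tag)
```

As noted above, this would generally do the right thing but cannot guarantee that no unwanted history is included, for example when packaging commits merge in something that is not an ancestor of the nominated upstream tag.
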
Secondly, the file listing. Thanks for the explanation. I'm still
not quite sure we understand why you want it.
Even so, I think I have a possible way to eliminate it, while still
giving you the property that dak (or a future audit) can know the file
list of the tree signed by the maintainer, without needing to actually
run git.
(I'm guessing that having dak not run git is why you don't think it's
good enough that one can verify the contents directly from the git tag
by running the git-ls-files rune.)
The git tag is itself a Merkle tree, containing the information you
need. So the hashes of all these things, and the filenames, are
already signed by the maintainer - that's the git tag. The reason
it's not readily verifiable without running git itself, is mostly
because getting the actual object texts out of git is very
complicated.
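
To illustrate the Merkle-tree point, here is a minimal sketch of how one directory level's tree id is derived from the ids of its entries, assuming git's SHA-1 object format. It is not the listing program under discussion, only an illustration of why a signature over the tag transitively covers file names and contents. `git_object_id` and `tree_id` are hypothetical names.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash of a git object: '<kind> <size>' + NUL + body, SHA-1 format."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(entries: list) -> str:
    """Id of a tree object for ONE directory level.

    entries: list of (mode, name, hex object id) tuples.
    """
    normalised = []
    for mode, name, hex_id in entries:
        # Tree objects store modes without a leading zero ("40000").
        mode = mode.lstrip("0")
        # Git sorts directory names as if they ended with '/'.
        sort_key = name.encode() + (b"/" if mode == "40000" else b"")
        normalised.append((sort_key, mode, name, hex_id))
    body = b""
    for _, mode, name, hex_id in sorted(normalised):
        body += mode.encode() + b" " + name.encode() + b"\x00"
        body += bytes.fromhex(hex_id)
    return git_object_id("tree", body)
```

Changing any file's content changes its blob id, which changes the id of every enclosing tree, then the commit id, and finally the object the signed tag names.
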
How about we (the tag2upload team):
The new listing program could be written in the language of your
choice. (I'm volunteering to write it.)
Honestly I think that sounds way more complicated than a "git ls-files
something" based file and process, and binds us more tightly to actual
git than ls-files does (one could easily have more fields in there, if
deemed necessary).
But I don't see anything so obviously wrong that it's a NO, so fine.
The new listing program could be written in the language of your
choice. (I'm volunteering to write it.)
While I personally love Rust nowadays, dak is Python, so Python it
will be. (While it won't be integrated into dak (so easier for others to
take), it should share the same language).
Sean, I think we should finish updating the design with these agreed
changes. Joerg, when we have done that, will you review it and make
sure we have properly captured what you think we've agreed ?