---
Ok! Round two.
Here's a summary of the changes from v1. For git-debpush:
- The pristine-tar checking code is now only run if this is a non-native
package (i.e., "if $upstream").
- The upstream version is used instead of the Debian-revised one.
- Unlike the old pristine-tar check, the code is run for every upload,
not just for the first (i.e., -1 or -0.1) revision. This
way, the t2u service can potentially handle the case where
a pristine-tar upload was intended, but no orig is available in the
archive yet. Please let me know whether this makes sense!
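A minimal bash sketch of deriving the upstream version as it appears in orig tarball file names. Those names never include the epoch, whereas `${version%-*}` on its own keeps it (`1:2.0-1` becomes `1:2.0`). Whether `$version` can carry an epoch at that point in git-debpush is an assumption to check. The function name `orig_upstream_version` is hypothetical.

```shell
# Hypothetical helper: map a Debian version to the upstream version
# used in orig tarball names (SOURCE_UVERSION.orig.tar.*).
orig_upstream_version () {
	local v=$1
	# Drop the epoch, if any. Without an epoch an upstream version may
	# not contain ':', so in that case this leaves v unchanged.
	v=${v#*:}
	# Drop the Debian revision, i.e. everything from the last '-'.
	# Without a '-' this also leaves v unchanged.
	v=${v%-*}
	printf '%s\n' "$v"
}
```

For example, `orig_upstream_version 1:2.0-1` prints `2.0`, and `orig_upstream_version 2.0-1-2` prints `2.0-1`, because only the text after the last hyphen is the Debian revision.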
diff --git a/git-debpush b/git-debpush
index e3a4ba39..78e42fb9 100755
--- a/git-debpush
+++ b/git-debpush
@@ -457,6 +457,30 @@ if $upstream; then
to_push+=("$upstream_tag")
fi
+# I obtain the commit ID at the time of the upload, so that I can be sure that
+# the tag2upload service generates the tarball with the expected pristine-tar
+# branch state
+pristine_tar_info=''
+if $upstream; then
+ uversion="${version%-*}"
+
+ if pristine_tar_commit=$(git rev-parse --verify --quiet 'refs/heads/pristine-tar'); then
+ pristine_tar_tarballs=$(git ls-tree -z --name-only -- 'refs/heads/pristine-tar' \
+ | grep -zF -- "${source}_${uversion}.orig.tar." \
+ | grep -zc -- "\.id$")
+
+ if [ "$pristine_tar_tarballs" -gt 1 ]; then
+ fail 'more than one pristine-tar orig'
+ fi
+
+ # If there's no tarball, the user probably stopped using pristine-tar a
+ # while ago, but didn't delete the branch. Just ignore it.
+ if [ "$pristine_tar_tarballs" -eq 1 ]; then
+ pristine_tar_info=" pristine-tar=$pristine_tar_commit"
+ fi
+ fi
+fi
+
#**** Useful sanity checks ****
@@ -2031,6 +2033,9 @@ END
"s=$suite",
"u=$t2u_upstreamc",
);
+ if (length $t2u_pristine_tar) {
+ push(@obtain_origs, "pristine_tar=$t2u_pristine_tar")
+ }
+# TODO: what about signature files?
I think it would be helpful to work on the spec in tag2upload(5)
before continuing too much with code. It'll make it easier to keep
the three of us on the same page.
- The pristine-tar checking code is now only run if this is a non-native
package (i.e., "if $upstream").
- The upstream version is used instead of the Debian-revised one.
ITYM your new code, right?
The old check already had both these properties.
- [..] the t2u service can potentially handle the case where
a pristine-tar upload was intended, but no orig is available in the
archive yet. Please let me know if this makes sense or not!
It might make sense, I'm not sure yet. Can you describe a concrete
example that would lead to this being helpful?
diff --git a/git-debpush b/git-debpush
index e3a4ba39..78e42fb9 100755
--- a/git-debpush
+++ b/git-debpush
@@ -457,6 +457,30 @@ if $upstream; then
to_push+=("$upstream_tag")
fi
+# I obtain the commit ID at the time of the upload, so that I can be sure that
+# the tag2upload service generates the tarball with the expected pristine-tar
+# branch state
+pristine_tar_info=''
+if $upstream; then
+ uversion="${version%-*}"
+
+ if pristine_tar_commit=$(git rev-parse --verify --quiet 'refs/heads/pristine-tar'); then
+ pristine_tar_tarballs=$(git ls-tree -z --name-only -- 'refs/heads/pristine-tar' \
+ | grep -zF -- "${source}_${uversion}.orig.tar." \
+ | grep -zc -- "\.id$")
+
+ if [ "$pristine_tar_tarballs" -gt 1 ]; then
+ fail 'more than one pristine-tar orig'
+ fi
+
+ # If there's no tarball, the user probably stopped using pristine-tar a
+ # while ago, but didn't delete the branch. Just ignore it.
+ if [ "$pristine_tar_tarballs" -eq 1 ]; then
+ pristine_tar_info=" pristine-tar=$pristine_tar_commit"
+ fi
+ fi
+fi
+
#**** Useful sanity checks ****
Can you explain why you've put this in at this point in the script? I
think that maybe it should go later, after all the sanity checks.
I take it you switched from invoking pristine-tar itself to calling git-ls-tree in order to use NUL termination? If so, maybe we should
make that change first to the existing check. Perhaps you could
prepare an MR to that effect.
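For comparison, here is a sketch of the counting step as a standalone bash function. It reads the NUL-terminated output of `git ls-tree -z --name-only`, so it can be exercised without a repository. Like the patch, it assumes each stored tarball has a companion `<tarball>.id` file. The function name `count_orig_ids` is hypothetical.

It differs from the patch in two small ways:

- The name must *start* with `SOURCE_UVERSION.orig.tar.`. With `grep -F`, a source named `foo` would also match `libfoo_...`.
- Zero matches is not an error. `grep -c` exits with status 1 when it counts nothing, so if the script runs under `set -e`, the patch's assignment would abort in exactly the "no tarball, just ignore it" case.

```shell
# Hypothetical helper: count pristine-tar .id files for one upstream
# version. Reads NUL-terminated file names on stdin and prints how
# many have the form SOURCE_UVERSION.orig.tar.COMP.id.
count_orig_ids () {
	local source=$1 uversion=$2
	local prefix="${source}_${uversion}.orig.tar."
	local name count=0
	while IFS= read -r -d '' name; do
		# The quoted prefix is matched literally at the start of the
		# name; the name must also end in ".id".
		case "$name" in
			"$prefix"*.id) count=$((count + 1)) ;;
		esac
	done
	printf '%s\n' "$count"
}
```

In git-debpush this could be fed by something like `git ls-tree -z --name-only refs/heads/pristine-tar | count_orig_ids "$source" "$uversion"`.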
@@ -2031,6 +2033,9 @@ END
"s=$suite",
"u=$t2u_upstreamc",
);
+ if (length $t2u_pristine_tar) {
+ push(@obtain_origs, "pristine_tar=$t2u_pristine_tar")
+ }
Generally we avoid parentheses on builtin operators and use poetry
style, so
    push @obtain_origs, "pristine_tar=$t2u_pristine_tar"
        if $t2u_pristine_tar;
+# TODO: what about signature files?
Do you think we could extract them and include them in the upload?
I think we can verify them by using the upstream key embedded in the
source package, right? And if that verification fails we should
probably abort the upload -- maintainers who choose to use tarball signatures had better make sure they verify.
Thanks. I've included some inline comments below.
On Sat 26 Jul 2025 at 02:12pm +02, Andrea Pappacoda wrote:
- Differently from the old pristine-tar check, the code is not run just
for the first (i.e., -1 or -0.1) revision, but for any upload. This
way, the t2u service can potentially handle the case where
a pristine-tar upload was intended, but no orig is available in the
archive yet. Please let me know if this makes sense or not!
It might make sense, I'm not sure yet. Can you describe a concrete
example that would lead to this being helpful?
Hi again!
I tried to add to the tag2upload.5 manpage the pristine-tar handling
design outlined in our discussions, which is inline below. Still, I have
a few questions:
What should we do with that upstream commit metadata?
On Sat 26 Jul 2025 at 03:56pm +02, Andrea Pappacoda wrote:
There's no particular reason, really; I just tried to put everything
pristine-tar-related in the same place. Thinking about it, these
checks can only go before obtaining the pristine_tar_info, because
I cannot reasonably get the pristine-tar info before first making
sure there's just one orig.
I would suggest it should go in the section marked "Gather git history information".
What I was thinking is that switching from pristine-tar(1) to git(1) is a logically distinct change from replacing
a check with embedding pristine-tar info in the tag. So they should be separate commits anyway, and we'd want to run the full test suite
against both of them. While we are still discussing the design, you could
get the first change out of the way with an MR now.
Do you mean whether dak does any verification? I don't know.
Great :)
I think that for the time being, publishing the signature without extra processing is the most appropriate solution. We can always
revise it later if needed.
On Sun 27 Jul 2025 at 04:11pm +02, Andrea Pappacoda wrote:
I tried to add to the tag2upload.5 manpage the pristine-tar handling
design outlined in our discussions, which is inline below. Still,
I have a few questions:
What should we do with that upstream commit metadata?
Sorry, but which metadata is that?
Trying to read your patch, I think the fact I don't use pristine-tar is really showing. Is the .id file defined somewhere? Is your knowledge
of the pristine-tar branch contents from reading a spec, or empirical?
The upstream parameter specifies the tag or branch [or commit, Ed.]
that contains the same content that is present in the tarball. The
name of the tree it points to will be recorded for later use by
pristine-tar checkout.
Glad we have someone who knows it better working on it.
I tried to add to the tag2upload.5 manpage the pristine-tar handling
design outlined in our discussions, which is inline below. Still, I have
a few questions:
What should we do with that upstream commit metadata? pristine-tar does
not need it, since it'll generate the tarball from the git tree id
stored in source_version.orig.tar.id. Still, we might want to make sure
that the pristine-tar tree matches the tree of the upstream commit.
I don't know how useful this would be, though, since the delta may
add or remove files. Also, what should we do
with tarballs whose contents are not identical to the git tree?
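Here is a hedged illustration of the equality check being considered. This bash sketch reads the `.id` file from a pristine-tar commit with `git show COMMIT:PATH`. It then compares the contents with the upstream commit's tree, obtained via `git rev-parse COMMIT^{tree}`. It assumes, as described in this thread, that the `.id` file holds exactly one git tree id. The function and argument names are hypothetical. It only compares tree ids, so it cannot tell whether the `.delta` would add or change files in the generated tarball.

```shell
# Hypothetical check: succeed only if the tree id recorded in the
# pristine-tar .id file equals the tree of the upstream commit.
#   $1: pristine-tar commit id
#   $2: path of the .id file in that commit (e.g. foo_1.0.orig.tar.xz.id)
#   $3: upstream commit id (the tag's "upstream" item)
check_id_matches_upstream () {
	local ptcommit=$1 idfile=$2 upstreamc=$3
	local recorded expected
	# Raw contents of the .id blob; $(...) strips the trailing newline.
	recorded=$(git show "$ptcommit:$idfile") || return 1
	# Tree object of the upstream commit.
	expected=$(git rev-parse --verify --quiet "$upstreamc^{tree}") || return 1
	[ "$recorded" = "$expected" ]
}
```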
In the text below, I assume that:
- We want to verify equality of upstreamc's tree and the one used by
pristine-tar.
- We allow binary deltas (i.e., the .delta file) to contain
modifications to files stored in the referenced tree, such as the
addition of configure scripts.
Here it is:
=item C<pristine-tar>=COMMITID
Identifies the state of the pristine-tar branch at the time of push, if present and containing data related to the current upstream version.
Something like
Names a commit containing pristine-tar metadata.
The commit must contain SOMETHING LIKE exactly one .id file with
SOME PROPERTIES OR OTHER. The .id file MUST SATISFY SOME
CONDITIONS THAT I DON'T UNDERSTAND.
The tag must also contain an C<upstream> item, and the tree named in
the .id file must be identical to that of the C<upstream> commit.
The pristine-tar commit may contain a SOMEHOW IDENTIFIABLE signature
file. The signature file MUST SATISFY REASONABLE CONDITIONS SUCH AS
ITS FILENAME BEING SANE. The signature file will then be published
together with the orig tarball. The signature file is treated as
pure data by the service (so will not be verified or even format
checked).
If an orig tarball needs to be (re)generated, the service will use
pristine-tar, using precisely the metadata in the .id file. The
service will check that the generated tarball MATCHES THE HASH IN
THE .ID FILE and that its contained tree is identical to SOMETHING.
The named pristine-tar commit must be reachable from the
C<pristine-tar> branch in the repository.
Ian.
Have you had a chance to look at the following?
On Mon 28 Jul 2025 at 08:19pm +01, Ian Jackson wrote:
Something like
Names a commit containing pristine-tar metadata.
The commit must contain SOMETHING LIKE exactly one .id file with
SOME PROPERTIES OR OTHER. The .id file MUST SATISFY SOME
CONDITIONS THAT I DON'T UNDERSTAND.
The tag must also contain an C<upstream> item, and the tree named
in the .id file must be identical to that of the C<upstream>
commit.
The pristine-tar commit may contain a SOMEHOW IDENTIFIABLE signature
file. The signature file MUST SATISFY REASONABLE CONDITIONS SUCH
AS ITS FILENAME BEING SANE.
The signature file will then be published together with the orig
tarball. The signature file is treated as pure data by the service
(so will not be verified or even format checked).
If an orig tarball needs to be (re)generated, the service will use
pristine-tar, using precisely the metadata in the .id file. The
service will check that the generated tarball MATCHES THE HASH IN
THE .ID FILE and that its contained tree is identical to SOMETHING.
The named pristine-tar commit must be reachable from the
C<pristine-tar> branch in the repository.
In practice, pristine-tar always stores the signature file as "orig_name.asc". So I think we could just specify this requirement here.
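If the rule is that the signature always sits next to the tarball as `<tarball>.asc`, the "sane filename" condition could be checked mechanically. This is a minimal bash sketch that assumes tarball names of the form `SOURCE_UVERSION.orig.tar.COMP`. The function name and the character set accepted for `COMP` are assumptions, not part of any agreed spec.

```shell
# Hypothetical check: succeed if $3 is an acceptable signature file
# name for the orig tarball of source package $1 at upstream version
# $2, i.e. SOURCE_UVERSION.orig.tar.COMP.asc with no directory part
# and COMP consisting only of lowercase letters and digits.
is_sane_orig_sig_name () {
	local source=$1 uversion=$2 name=$3
	local prefix="${source}_${uversion}.orig.tar."
	local re='^[[:lower:][:digit:]]+$'
	local comp
	case "$name" in
		*/*) return 1 ;;          # reject any path component
		"$prefix"*.asc) ;;        # right tarball, .asc suffix
		*) return 1 ;;
	esac
	# What lies between the prefix and ".asc" must be a bare
	# compression suffix such as gz, xz or bz2.
	comp=${name#"$prefix"}
	comp=${comp%.asc}
	[[ $comp =~ $re ]]
}
```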
If an orig tarball needs to be (re)generated, the service will use
pristine-tar, using precisely the metadata in the .id file. The
service will check that the generated tarball MATCHES THE HASH IN
THE .ID FILE and that its contained tree is identical to SOMETHING.
I'm not sure I get this part, but if you meant what I understood, then
it's wrong. The .id file does not contain the hash of the tarball; it contains a single line, which is the tree id, as mentioned
above. I'm honestly not sure where the hash verification happens, but *I believe* it's part of the reconstruction, when pristine-gz and co. are run, thanks to information stored in the .delta (VCDIFF) file.
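Taking that description at face value (one line holding a git tree id), the `.id` contents could be validated before anything else relies on them. This bash sketch accepts a full lowercase hex object name of either SHA-1 length (40 digits) or SHA-256 length (64 digits). The function name is hypothetical. Whether pristine-tar ever writes anything else into `.id` is not something this thread establishes.

```shell
# Hypothetical check: succeed, and print the id, if the given .id file
# contents are a single line holding a full lowercase hex git object
# name (40 digits for SHA-1, 64 for SHA-256 repositories).
parse_pristine_tar_id () {
	local contents=$1 id
	local re='^([0-9a-f]{40}|[0-9a-f]{64})$'
	# Tolerate one trailing newline; any other newline means more
	# than one line.
	id=${contents%$'\n'}
	case "$id" in
		*$'\n'*) return 1 ;;
	esac
	[[ $id =~ $re ]] || return 1
	printf '%s\n' "$id"
}
```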
One question remains unanswered: should we allow .delta files that modify
the tarball contents (i.e., do we want to allow generating tarballs
whose contents differ from the git tree)?
I've come back from a party and am a bit tipsy so I will read this
properly later, but:
Thanks for engaging with these questions!
I think in principle it might be a .sig.
So the .id contains the tree (git tree object) which uniquely
identifies the *contents* of the tarball.
But how does the pristine-tar information specify the precise hash of
the tarball itself? Does the .delta file say what the output hash is supposed to be?
I don't think I fully understand the implications. My default
position is that the answer should be "no" unless one of us *does* understand the implications :-).
Here is a different example, which may illustrate the "unexpected" results
this could lead to. The tarball is created with
a file containing "evil" content, while the upstream/latest branch
stores only the "good" content. Upon tarball checkout, the good
content gets replaced with the evil one:
I thought that the .delta files were mostly to cover, for example, the tarball containing autotools-generated files that aren't in git?
Isn't that a key use case?
Sean Whitton writes ("Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support"):
I thought that the .delta files were mostly to cover, for example, the
tarball containing autotools-generated files that aren't in git?
Isn't that a key use case?
Not according to Colin in the "want Jia Tan option" bug,
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1109423#15
Empty directories are a corner case, but git will consider them
treesame, so if we do the check in git, all will be well.
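The point about empty directories is easy to check. Git trees record files (plus symlinks and submodules), not directories as such, so an empty directory leaves no trace in the tree id. The small bash sketch below uses a throwaway repository and computes the tree id of a directory's contents with `git add -A` and `git write-tree`. The helper name is hypothetical. Because `git add -A` honours ignore rules, this is a demonstration, not a faithful tarball-to-tree import.

```shell
# Hypothetical helper: print the tree id git would record for the
# contents of directory $1, using a scratch repository so that no
# existing repository is touched.
tree_id_of_dir () {
	local dir=$1 scratch tree
	scratch=$(mktemp -d) || return 1
	git init -q "$scratch" || { rm -rf "$scratch"; return 1; }
	# Stage everything under $dir into the scratch index and write
	# that index out as a tree object.
	tree=$(
		cd "$dir" &&
		export GIT_DIR="$scratch/.git" GIT_WORK_TREE=. &&
		git add -A &&
		git write-tree
	) || { rm -rf "$scratch"; return 1; }
	rm -rf "$scratch"
	printf '%s\n' "$tree"
}
```

Two directories holding the same file, one of them with an extra empty subdirectory, should yield the same tree id, while changing the file's contents changes it.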