Hi,
I'd like to propose the following for portage:
- Only support one "secure" hash function (such as sha2, sha3, blake2, etc)
- Only generate and parse one hash function in Manifest files
- Remove support for multiple hash functions
In other words, what are we actually getting by having _both_ SHA2-512
and BLAKE2b for every file in every Manifest? It's not about file
integrity, since certainly a single hash handles that use case fine.
And it's not about security either, since for that we use gpg
signatures, and gpg signatures are carried out over a _single_ hash of
the plain text being hashed, so the security of the system reduces to
breaking SHA2-512 anyway. So, if it's not about file integrity and
it's not about security, what is it about?
I don't really care which one we use, so long as it's not already
broken or too obscure/new. So in other words, any one of SHA2-256,
SHA2-512, SHA3, BLAKE2b, BLAKE2s would be fine with me. Can we just
pick one and roll with it?
Jason
PS: there _is_ a good reason for recording the file size in Manifest
files as we do now: it's quicker to compare sizes on large files than
it is to read and hash the whole thing, so this gives us a "free" way
of noticing quick corruption.
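
The size-then-hash order described in the PS can be sketched roughly as follows. This is only an illustration, not Portage's actual code: the function name `verify_distfile` and its parameters are made up for the example, and SHA2-512 is picked arbitrarily as the single hash.

```python
import hashlib
import os


def verify_distfile(path, expected_size, expected_sha512):
    """Return True if the file matches the recorded size and SHA2-512 digest.

    Hypothetical helper for illustration; not part of Portage.
    """
    # Cheap check first: a stat() call, no file contents are read.
    if os.path.getsize(path) != expected_size:
        return False

    # Sizes match, so pay for reading and hashing the whole file.
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha512
```

A truncated download is rejected by the size comparison alone, so the expensive read only happens for files that are at least the right length.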
On Tue, 05 Apr 2022, Jason A Donenfeld wrote:
> - GPG signatures are already over the SHA512 of the plain text, so
> the security of the system already reduces to that. By choosing
> SHA512, we don't add more risk, whilst choosing something else means
> we're in trouble if either one has a problem.
The OpenPGP signature is for the top-level Manifest only. In case there
was any trouble, it would be trivial to change the hash algorithm used
for this.
In contrast to that, updating the hashes in all Manifest files is a
huge pain in the neck. Basically, you must download all distfiles, which
is not trivial. For example, think of fetch-restricted files. (I've
helped twice with updating Manifest files, so I believe I know what I'm
talking about. :)
> I don't really care which one we use, so long as it's not already
> broken or too obscure/new. So in other words, any one of SHA2-256,
> SHA2-512, SHA3, BLAKE2b, BLAKE2s would be fine with me. Can we just
> pick one and roll with it?
Back when we added BLAKE2b, the idea was to eventually remove SHA512
(the previous hash). However, this was rejected afterwards.
Hi Michal,
On Tue, Apr 05, 2022 at 02:49:12PM +0000, Michał Górny wrote:
>> I don't really care which one we use, so long as it's not already
>> broken or too obscure/new. So in other words, any one of SHA2-256,
>> SHA2-512, SHA3, BLAKE2b, BLAKE2s would be fine with me. Can we just
>> pick one and roll with it?
> Back when we added BLAKE2b, the idea was to eventually remove SHA512
> (the previous hash). However, this was rejected afterwards.
Maybe we should pick that back up? Do you remember the ultimate
rationale for rejecting it? Do you suppose those are still valid?
This was a topic in June 2021's Council meeting:
https://gitweb.gentoo.org/sites/projects/council.git/tree/meeting-logs/20210613-summary.txt#n33
https://gitweb.gentoo.org/sites/projects/council.git/tree/meeting-logs/20210613.txt#n137
Basically there was no great reason presented for making the change
and some (IMO specious) reasons for keeping multiple hashes. I don't
think anyone felt strongly enough about removing one hash to fight for
it.
On Tue, 05 Apr 2022, Jason A Donenfeld wrote:
Huh. Something not brought up there or https://bugs.gentoo.org/784710
is the fact that the _security_ of the system reduces to SHA-512 as
used by our GPG signatures.
By the way, we're not currently _checking_ two hash functions during src_prepare(), are we?
I'd like to propose the following for portage:
- Only support one "secure" hash function (such as sha2, sha3, blake2, etc)
- Only generate and parse one hash function in Manifest files
- Remove support for multiple hash functions
In other words, what are we actually getting by having _both_ SHA2-512
and BLAKE2b for every file in every Manifest?
On Tue, 05 Apr 2022, Jason A Donenfeld wrote:
> Huh. Something not brought up there or https://bugs.gentoo.org/784710
> is the fact that the _security_ of the system reduces to SHA-512 as
> used by our GPG signatures.
The hash algorithm would be the least of my concerns about the security
of these signatures.
IIUC, the secret signing key is stored on a machine that is connected to
the network (Infra, please correct me if I'm wrong). So there are other
more likely attack vectors than a preimage attack on a 512 bit hash
function.
> In other words, what are we actually getting by having _both_ SHA2-512
> and BLAKE2b for every file in every Manifest?
Implementations are often broken and we have to expect zero day attacks
on hashes and on signatures. Hence it does not hurt to have a second hash.
It is very likely that we can not trust in X for a while in the next
years, but it is very unlikely that two different implementations are affected.
Additionally calculating a second hash does not cost anything.
On Tue, Apr 5, 2022 at 12:30 PM Jason A. Donenfeld <[email protected]> wrote:
> By the way, we're not currently _checking_ two hash functions during
> src_prepare(), are we?
I don't know, but the hash-checking is definitely checked before src_prepare().
On 5 Apr 2022, at 22:13, Jonas Stein <[email protected]> wrote:
> Hi
>> I'd like to propose the following for portage:
>> - Only support one "secure" hash function (such as sha2, sha3, blake2, etc)
>> - Only generate and parse one hash function in Manifest files
>> - Remove support for multiple hash functions
> No, this has no benefit.
>> In other words, what are we actually getting by having _both_ SHA2-512
>> and BLAKE2b for every file in every Manifest?
> Implementations are often broken and we have to expect zero day attacks
> on hashes and on signatures. Hence it does not hurt to have a second hash.
> It is very likely that we can not trust in X for a while in the next
> years, but it is very unlikely that two different implementations are affected.
> Additionally calculating a second hash does not cost anything.
This matches my views and recollection. We could revisit it
if there was a passionate advocate (which it looks like there may well be).
While I wasn't against it before, I was sort of ambivalent given
we had no strong reason to, but I'm more willing now given
we're also cleaning out other Portage cruft at the same time.
On 6 Apr 2022, at 01:15, Jason A. Donenfeld <[email protected]> wrote:
Hi Sam,
On Wed, Apr 6, 2022 at 2:02 AM Sam James <[email protected]> wrote:
> This matches my views and recollection. We could revisit it
> if there was a passionate advocate (which it looks like there may well be).
>
> While I wasn't against it before, I was sort of ambivalent given
> we had no strong reason to, but I'm more willing now given
> we're also cleaning out other Portage cruft at the same time.
I think actually the argument I'm making this time might be subtly
different from the motions that folks went through last year.
Specifically, the idea last year was to switch to using BLAKE2b only.
I think what the arguments I'm making now point to is switching to
SHA2-512 only.
There are two reasons for this.
1) Security: since the GPG signatures use SHA2-512, then the whole
system breaks if SHA2-512 breaks. If we choose BLAKE2b as our only
hash, then if either SHA2-512 or BLAKE2b break, then the system
breaks. But if we choose SHA2-512 as our only hash, then we only need
to worry about SHA2-512 breaking.
2) Comparability: other distros use SHA2-512, as well as various
upstreams, which means we can compare our hashes to theirs easily.
A reason why some people might prefer BLAKE2b over SHA2-512 is a
performance improvement. However, seeing as right now we're opening
the file, reading it, computing BLAKE2b, closing the file, opening the
file again, reading it again, computing SHA2-512, closing the file, I
don't think performance is actually something people care about. Seen
differently, removing either one of them will already give us a
performance "boost" of sorts.
Jason
On 5 Apr 2022, at 22:13, Jonas Stein <[email protected]> wrote:
>> In other words, what are we actually getting by having _both_ SHA2-512
>> and BLAKE2b for every file in every Manifest?
> Implementations are often broken and we have to expect zero day attacks
> on hashes and on signatures. Hence it does not hurt to have a second hash.
I don't think this is the case. They're not broken often, it's a very very
big deal when they are, and we'd also have far bigger problems in such a
case (as already pointed out, TLS would be an issue, but also GPG
signatures, git commit hashes, ...).
> It is very likely that we can not trust in X for a while in the next
> years, but it is very unlikely that two different implementations are affected.
I don't think it is likely that e.g. SHA512 will be broken in the next few
years, no, but if it is going to be, we have far bigger issues and we'd
need to have double algorithms in our whole stack, which we don't have.
> Additionally calculating a second hash does not cost anything.
It does have a cost at both Manifest-generation time and emerge-time.
On Wed, 06 Apr 2022, Jason A Donenfeld wrote:
> I think actually the argument I'm making this time might be subtly
> different from the motions that folks went through last year.
> Specifically, the idea last year was to switch to using BLAKE2b only.
> I think what the arguments I'm making now point to is switching to
> SHA2-512 only.
Still, I think that if we drop one of the hashes then we should proceed
with the original plan. That is, keep the more modern BLAKE2B (which was
a participant of the SHA-3 competition [1]) and drop the older SHA512.
I also think that the argument about the OpenPGP signature isn't very
strong, because replacing that signature by another one using a
different hash is trivial. As I said before, replacing all Manifest
files in the tree isn't.
On Wed, 06 Apr 2022, Jason A Donenfeld wrote:
> Why? Then we're dependent on two things, either of which could break,
> rather than one.
See? If either of these should happen, then we'll be happy that we still
have both hashes in our Manifest files.
OTOH, if that argument is not relevant because the probability of both
is close to zero, then (from a security POV) it doesn't matter which of
the two hashes we remove.
On Tue, Apr 5, 2022 at 8:05 PM Sam James <[email protected]> wrote:
Our security fails currently if EITHER SHA2-512 or a hardened version
of SHA-1 are defeated. Our top gpg signature is bound to a git commit
record by SHA2-512, and the git commit record is bound to everything
else in the repository (including the manifest objects) by SHA-1,
because git hasn't transitioned away from that (as far as I'm aware it
is still a work in progress - the SHA-1 algorithm it uses is hardened
against known attacks).
I agree that this is an unlikely scenario, so it is a judgement call
as to whether the ease of recovery in the event of a failure is worth
the cost to maintain the second hash. I agree that we'd need double
algorithms in the whole stack to prevent a failure, but in the current
state we do have advantages for recovering from a failure after the
fact.
It seems that the likely scenario is that we get advance warning of
weaknesses in a hash function, but without a practical exploit being
readily available. In that case we could do a more orderly
transition. We'd still save time with the double hashed manifests,
and whether this makes a difference is hard to say.
On Wed, 06 Apr 2022, Jason A Donenfeld wrote:
> So I'll spell out the different possibilities:
> 1) GPG uses SHA-512. Manifest uses SHA-512 and BLAKE2b.
> 1a) Possibility: SHA-512 is broken. Result: system broken.
> 1b) Possibility: BLAKE2b is broken. Result: nothing.
> 2) GPG uses SHA-512. Manifest uses SHA-512.
> 2a) Possibility: SHA-512 is broken. Result: system broken.
> 2b) Possibility: BLAKE2b is broken. Result: nothing.
> 3) GPG uses SHA-512. Manifest uses BLAKE2b.
> 3a) Possibility: SHA-512 is broken. Result: system broken.
> 3b) Possibility: BLAKE2b is broken. Result: system broken.
> See how from a security perspective, (2) is not worse than (1), but
> (3) is worse than both (1) and (2)?
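
The case analysis above can be restated as a small truth-table sketch. It rests on two modelling assumptions that are not spelled out in the thread: the signature is forgeable exactly when the GPG hash is broken, and a Manifest entry is forgeable only when every hash listed for it is broken (since all listed hashes are checked). The function and variable names are invented for the illustration.

```python
def system_broken(gpg_hash, manifest_hashes, broken):
    """Model: the chain fails if the signature hash is broken, or if
    every hash protecting the Manifest entries is broken."""
    signature_forgeable = gpg_hash in broken
    manifest_forgeable = all(h in broken for h in manifest_hashes)
    return signature_forgeable or manifest_forgeable


# The three configurations from the list above.
scenarios = {
    "1": ["SHA-512", "BLAKE2b"],
    "2": ["SHA-512"],
    "3": ["BLAKE2b"],
}

for label, manifest_hashes in scenarios.items():
    for broken_hash in ("SHA-512", "BLAKE2b"):
        result = system_broken("SHA-512", manifest_hashes, {broken_hash})
        print(f"({label}) {broken_hash} broken -> system broken: {result}")
```

Under these assumptions only configuration (3) fails when BLAKE2b alone is broken, which matches the 3b line above.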
> No, you're still missing the point.
>
> If SHA-512 breaks, the security of the system fails, regardless of
> what change we make. This is because GnuPG uses SHA-512 for its
> signatures.
>
> So I'll spell out the different possibilities:
> 1) GPG uses SHA-512. Manifest uses SHA-512 and BLAKE2b.
score -1 + 0 = -1
> 2) GPG uses SHA-512. Manifest uses SHA-512.
score -1 + 0 = -1
> 3) GPG uses SHA-512. Manifest uses BLAKE2b.
score -1 + -1 = -2
> See how from a security perspective, (2) is not worse than (1), but
> (3) is worse than both (1) and (2)?
Yes, (2) is not worse than (1) for the overall security perspective.
Sort of. The security between infra and users relies on SHA2-512. The
security between devs and infra relies on SHA-1. I guess the "full
system" depends on both, but I've been focused on the more likely
issue of a community-run mirror serving bogus files.
Yea I see this argument, but I don't quite buy it. Maintaining two
sets of hashes for the unlikely event that one gets broken AND we
absolutely cannot incrementally transition gradually to an unbroken
one seems rather overblown.
Hi Matt,
On Tue, Apr 5, 2022 at 10:38 PM Matt Turner <[email protected]> wrote:
> On Tue, Apr 5, 2022 at 12:30 PM Jason A. Donenfeld <[email protected]> wrote:
>> By the way, we're not currently _checking_ two hash functions during
>> src_prepare(), are we?
> I don't know, but the hash-checking is definitely checked before src_prepare().
Er, during the builtin fetch phase. Anyway, you know what I meant. :)
Anyway, looking at the portage source code, to answer my own question,
it looks like the file is actually being read twice and both hashes
computed. I would have at least expected an optimization like:
    hash1_init(&hash1);
    hash2_init(&hash2);
    for chunk in file:
        hash1_update(&hash1, chunk);
        hash2_update(&hash2, chunk);
    hash1_final(&hash1, out1);
    hash2_final(&hash2, out2);
But actually what's happening is the even less efficient:
    hash1_init(&hash1);
    for chunk in file:
        hash1_update(&hash1, chunk);
    hash1_final(&hash1, out1);
    hash2_init(&hash2);
    for chunk in file:
        hash2_update(&hash2, chunk);
    hash2_final(&hash2, out2);
So the file winds up being opened and read twice. For huge tarballs like
chromium or libreoffice...
But either way you do it - the missed optimization above or the
unoptimized reality below - there's still twice as much work being
done. This is all unless I've misread the source code, which is
possible, so if somebody knows this code well and I'm wrong here,
please do speak up.
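
For reference, the single-pass variant sketched in the pseudocode above might look like this with Python's standard `hashlib`. This is a minimal sketch and not Portage's actual implementation; the function name `hash_file_once` and the 1 MiB chunk size are arbitrary choices for the example.

```python
import hashlib


def hash_file_once(path, algorithms=("sha512", "blake2b"), chunk_size=1 << 20):
    """Read the file a single time and feed each chunk to every hasher.

    Illustrative helper; name and defaults are assumptions, not Portage API.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

This saves the second open and read, but both digests are still computed over every byte, so the CPU cost of the second hash remains.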
On 4/5/2022 17:49, Jason A. Donenfeld wrote:
> Anyway, looking at the portage source code, to answer my own question,
> it looks like the file is actually being read twice and both hashes
> computed. [...]
Not to go off-topic, but where in Portage's source is this logic at? It seems like an easy fix for a slightly more efficient Portage.
Bump for my parent message, that I'm very surprised at the lack of
On Wed, Apr 06, 2022 at 02:15:02AM +0200, Jason A. Donenfeld wrote:
> 2) Comparability: other distros use SHA2-512, as well as various
> upstreams, which means we can compare our hashes to theirs easily.
Can we expand on this specific thread for a moment?
I was the author of GLEP59 about changing the Manifest hashes, and I
noted at the time, with references, that the effective strength of a set
of hashes is only that of the strongest hash.
One of my regrets from GLEP59 is that it's made it harder for use cases
outside of the normal user distfile workflow.
The use case that impacted me the most was being able to compare what our
distfiles were over time vs external sources, esp. if the file goes
missing or was fetch-restricted and we can't produce a new hash of it.
Maybe upstream only ever published SHA1/SHA256, and we only ever
calculated SHA512/BLAKE2b on the file. Since we never had hashes from
both sides at the same time, we cannot prove it was the same file.
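
The gap described here can be shown with a small sketch: two hash records can only be compared on algorithms they share. The function name and the dict-of-hex-digests layout are assumptions made for the example, not an existing tool or Manifest API.

```python
def same_file_evidence(ours, theirs):
    """Compare two {algorithm: hexdigest} records.

    Returns True or False when at least one algorithm is shared, and None
    when there is no overlap, i.e. nothing can be proven either way.
    """
    common = set(ours) & set(theirs)
    if not common:
        return None
    return all(ours[alg].lower() == theirs[alg].lower() for alg in common)


# Upstream published only SHA256; we recorded only SHA512 and BLAKE2B.
upstream = {"SHA256": "ab" * 32}
gentoo = {"SHA512": "cd" * 64, "BLAKE2B": "ef" * 64}
print(same_file_evidence(gentoo, upstream))
```

With no algorithm in common the function returns `None`, which is the situation described above: without the file itself, neither record can be checked against the other.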
We need to be able to ship one or more hashes to users, for the specific
use case of validating the distfiles they download.
As a developer, I'd like to be able to track the other hashes for a
file, without forcing ourselves to retain the file. This might be to
compare with upstream published hashes, or to compare with other
distros.
In fact it would be really nice to have a semi-automated pipeline to
plug in signed upstream hashes to our Manifests, and make it possible to
prove our new SHA512/BLAKE2B hash was taken over the correct input in
the first place, and there wasn't any subtle supply-chain attack early
in the packaging process.
Where would those hashes go? They don't need to be in the Manifest, or
at the very least they don't need to be distributed via rsync to users
(it only costs a small amount of bytes to do so).
Where else could they go?
- Commit messages could work.
- Git notes to a lesser degree.
- alternate repos?
> A reason why some people might prefer BLAKE2b over SHA2-512 is a
> performance improvement. However, seeing as right now we're opening
> the file, reading it, computing BLAKE2b, closing the file, opening the
> file again, reading it again, computing SHA2-512, closing the file, I
> don't think performance is actually something people care about. Seen
> differently, removing either one of them will already give us a
> performance "boost" of sorts.
Or just only verifying the "strongest" hash gives you that boost.
I do want to check into the code that you pointed out, because I'm
really sure much older versions of Portage did the CORRECT thing of only reading the file in a single pass.
On Wed, Apr 06, 2022 at 07:06:30PM +0200, Jason A. Donenfeld wrote:
> No, you're still missing the point.
>
> If SHA-512 breaks, the security of the system fails, regardless of
> what change we make. This is because GnuPG uses SHA-512 for its
> signatures.
Question directly for you Jason, because you make a professional study
of this: does the type of breakage/successful attack against
SHA-512 matter?
e.g. is it possible that some type of attack would only work against the Manifest entry, but NOT against the GPG signature's embedded SHA-512 (or
the opposite).
The best hypothetical idea I had was that there exists some large
special input that lets an attacker reset the output to an arbitrary
hash after their malicious payload: but it wouldn't fit in the GPG
signature space.