-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mathemtical aspects #1
Comments
Couple of remarks
|
The checksum (in case of minio) is computed per 64 MB object part in case of PUT and per object part in case of multipart PUT. However for the computed probabilities this does not matter. This is not part of the mathematical considerations - See comment to 2. point.
That's why the Choice of hash size paragraph says:
I don't not compute the probability for a collision of object hashes - I compute the probability for a collision of modified (bitrot occurred) object hashes. All numbers refer to objects which are already modified by bitrot. For example: Whether bitrot affects every second object or only
That is correct. Of course there is also a probability that this could happen (that would be a message forgery) but as long as the chosen cryptographic primitives satisfies the claimed security properties this probability is negligible. To show an example: So the probability for an undetected bitrot would exceed 50% after |
I see what you are saying, you are correct. I went back to my Excel sheet and did some extrapolations as to when a certain bit length would get a collision with the following results in terms of number of permutations:
So this seems in line with the results mentioned above, let's discuss it with the team and make a decision. BTW You don't think they use minio in space? :-) |
Yes, this numbers sounds reasonable. I don't know the probability for bitrot (I guess it should be relatively small). So I think 64 bit is too short, 128 bit would be acceptable and 256 would be the conservative choice. So 128bit would probably satisfy most users expectations but I would recommend 256 bit to be sure.
Oh - your right I completely forgot the ISS and all the satellites, my bad 😉 |
Requirements
Since we use a hash-based bitrot protection approach we need a (hash) function
f
mapping value of arbitrary length to a value of a fixed length. The functionf
must satisfy the following properties:f(x) = y
must be deterministic. That means thatf(x)
always evaluates toy
and not toz
at some point in timef
with k different randomly chosen inputsx1, ..., xk
should differ with probabilityP = e-((k * (k-1)) / (2 * 2n))
wheren
is length of the output values off
.The 1. requirement is kind of obvious - if
f
is not deterministic it is not possible to reliable verify the integrity of data. The 2. requirement is derived from the strong collision requirement. However in contrast to the general birthday problem we are not interested in "constructed" collisions. Therefore we define a requirement violation as a collisions of randomly chosen inputs. A constructed collision would imply and intelligent adversary exploiting a mathematical weakness of a given mappingf
to find collisions with a probabilityP' > P
. However bitrot protection is not designed to withstand an attacker because the adversary can simple modify the data and replace the stored hash value with the hash value of the modified data.Random Oracle and PRF
A cryptographic secure hash function - like BLAKE2b-512 - always satisfies our requirements since it provides strong collision resistance which does not restrict the input values to be chosen randomly.
A random oracle and following a PRF (like SipHash / HighwayHash) also satisfies our requirements since a PRF can be constructed from our function f` using the following scheme:
|ki| = |xi|
yi = f(ki ⊕ xi)
This implies that SipHash, HighwayHash and BLAKE2b can be used for bitrot protection because of their mathematical properties. (Since we discarded Poly1305 anyway I don't discuss it here further)
However SipHash and HigwayHash are PRFs and require a secret key. The secret key can be securely omitted if the PRF provides strong collision resistance for a fixed (but unkown) secret key which is the case for SipHash and HighwayHash.
Choice of hash size
The following list shows the number of data modifications that can happened before the probability for an undetected bitrot exceeds 50%:
For
k = 232 + 229 + 228
the probability for a collisionPc = 1-e-((k * (k-1)) / (2 * 2n)) = 0.506 ~= 50%
That would mean that the probability for one undetected bitrot is about 50% after 5100 million computations - roughly speaking 5 billion objects with bitrot in our case.For
k = 264 + 261 + 260
the probability for a collisionPc = 1-e-((k * (k-1)) / (2 * 2n)) = 0.506 ~= 50%
That would mean that the probability for one undetected bitrot is about 50% after 21905508 trillion computations - roughly speaking 21905508 trillion objects with bitrot in our case.For
k = 2128 + 2125 + 2124
the probability for a collisionPc = 1-e-((k * (k-1)) / (2 * 2n)) = 0.506 ~= 50%
That would mean that the probability for one undetected bitrot is about 50% after4.04×1038
computations - This is far beyond practical computation and can be treated as impossible.Notice that this numbers refer to all minio instances. So for example if we talk about 5 billion objects with bitrot than we mean 5 billion distributed over all minio instances all over the world.
Basically 50% chance of undetected bitrot on one minio server somewhere on the planet earth.
(Except aliens are using minio 😉)
The text was updated successfully, but these errors were encountered: