Yes, this sounds ridiculous. But in formal terms, it’s true. Unkeyed public hash functions like SHA512 can’t be truly collision-resistant, at least not according to the standard definitions we use to define the concept.

I know I sound like a crank, but me explain. In (paraphrased) formal terms, collision-resistance is usually defined something like this:

There exists no efficient algorithm that can output a ‘collision’ in the hash function.

Fact #1: we *know* that there exist collisions in SHA512. We don’t have to find them, their existence is a direct consequence of the pigeonhole principle.

Fact #2: if there are collisions, then we know that something like the following algorithm exists:

*am*saying is that it exists. Guaranteed, somewhere in the universe.

*knows*the algorithm, at least so far as we know. It turns out that expressing such concepts in mathematical notation is tricky. Try codifying this:

Xiaoyun Wang’s lab at Shandong University ran out of Nespresso pods last month and so (pending a restocking) we’re still mostly certain that SHA512 is collision-resistant, hmmkay?

It’s hard!

*nothing*to do with the way we use hash functions in real life — typically insisting that hash functions have keys. This kills off the argument above, because any algorithm would have to store a collision for every possible

*key*of the hash function, something that’s just not practical for an efficient algorithm to do.

Let me guess, nobody ever told you that your hash functions need keys to be collision-resistant? That’s right. __Because nobody uses them in practice__. *And we all agree never to talk about it.*