Hash Design and Goodhart's Law
SMHasher is a popular test suite created by Austin Appleby for testing hash functions for weaknesses and speed. There is also a more up-to-date fork.
SMHasher is widely used in the design and vetting of non-cryptographic hash functions, and such hashes often advertise that they pass the entire SMHasher test suite as a signal of hash quality.
This is a problem for hashes with large output sizes, and in this post we're going to explore why that is.
In the process of exploring that, we're also going to:
- Create a 128-bit hash function that cleanly passes SMHasher, while nevertheless having issues that are obvious from analysis.
- Identify a common issue with most currently published large-output non-cryptographic hashes.
Please note that this entire post is about hashes intended for use in non-adversarial conditions. Hashes that need to withstand attacks have additional considerations, and I'm not qualified to discuss those. [1]