Cessen's Ramblings

2024 - 07 - 10

Hash Design and Goodhart's Law

SMHasher is a popular test suite created by Austin Appleby for testing hash functions for weaknesses and speed. There is also a more up-to-date fork.

SMHasher is widely used in the design and vetting of non-cryptographic hash functions, and such hash functions often advertise that they pass the entire SMHasher test suite as a signal of hash quality.

This is a problem for hashes with large output sizes, and in this post we're going to explore why that is.

In the process of exploring that, we're also going to:

  • Create a 128-bit hash function that cleanly passes SMHasher, while nevertheless having issues that are obvious from analysis.
  • Identify a common issue with most currently published large-output non-cryptographic hashes.

Please note that this entire post is about hashes intended for use in non-adversarial conditions. Hashes that need to withstand attacks have additional considerations, and I'm not qualified to discuss those.¹

2022 - 07 - 09

Major version numbers may not be sacred, but backwards compatibility is.

Tom Preston-Werner, the creator of the SemVer version numbering standard, published an article a little bit ago titled Major Version Numbers are Not Sacred. The article is worth a read, but the basic argument (as best I can summarize it) is that we shouldn't be afraid of incrementing the major version number of our SemVer-adhering software projects to indicate breaking changes.

On the one hand, I agree with him: if SemVer is to be meaningful, then we obviously need to ensure that we increment the major version number any time we release a breaking change. Even if the breaking change isn't a big, sexy, marketable change. Even if it's just a change to a corner of the API that few, if any, people use. You still need to increment the major version number to indicate the backwards compatibility break. If you don't, then you're simply not adhering to SemVer. And there's nothing inherently wrong with not adhering to SemVer, but then you shouldn't claim to adhere to it.

On the other hand, he then goes on to argue (if I understand him correctly) that we should therefore be willing to increment the major version number willy-nilly. And I very much disagree with him there. It is specifically because SemVer ties API breakage to the major version number that SemVer-adhering projects should be hesitant about major version bumps. Not because the version number matters, but because backwards compatibility matters.

Of course, if your project is experimental, still in alpha, just for fun, etc., then this obviously doesn't apply. But for serious projects out of alpha/beta that are intended for real use, backwards compatibility matters a lot.

2020 - 07 - 27

Foreign Language Dictionaries

In this post I'm going to make the case that foreign language dictionaries (e.g. Japanese-English dictionaries) are actually thesauruses, not dictionaries, and that this has important implications for how you use them when learning a language.

2020 - 04 - 03

Ropey: Things I Would Do Differently

Ropey is a UTF-8 text rope library for Rust that I wrote and maintain. Initially I created it just for use in my own text editor, Led, but early on I decided to split it out into a separate library since it seemed generally useful. Last year I finally released Ropey 1.0, and since then I've had time to reflect on some of the design decisions I made, and what I do and don't like about Ropey.

To be clear, I don't expect to create a Ropey 2.0. At least, not any time soon. And even if I do, I will still maintain the 1.x release indefinitely. I strongly believe that stability is important for published code that you want others to use, and rapid-fire releasing new major versions with breaking changes is a great way to undermine that.

With that said, here are the things I would go back and change if I could.

2019 - 01 - 09

Rust Community Norms for Unsafe Code

I recently released Ropey 1.0, a text rope library for Rust. Ropey uses unsafe code internally, and its use of unsafe unsurprisingly came up in the 1.0 release thread on Reddit.

The ensuing discussion (especially thanks to Shnatsel) helped me significantly reduce the amount of unsafe code in Ropey with minimal (though not non-existent) performance degradation. But the whole thing nevertheless got me thinking about unsafe code and community norms around it, and I figured writing some of those thoughts down might be useful.

My hope is this post will be part of a community-wide discussion, and I would love for others (likely smarter than me) to write their thoughts on this topic as well. This post is simply my take on it.