Front Page Archive

Cessen's Ramblings

2020 - 04 - 03

Ropey: Things I Would Do Differently

Ropey is a utf8 text rope library for Rust that I wrote and maintain. Initially I created it just for use in my own text editor Led, but early on I decided to split it out into a separate library since it seemed generally useful. Last year I finally released Ropey 1.0, and since then I've had time to reflect on some of the design decisions I made, and what I do and don't like about Ropey.

To be clear, I don't expect to create a Ropey 2.0. At least, not any time soon. And even if I do, I will still maintain the 1.x release indefinitely. I strongly believe that stability is important for published code that you want others to use, and rapid-fire releasing new major versions with breaking changes is a great way to undermine that.

With that said, here are the things I would go back and change if I could.

Unicode scalar values are the primary indexing metric.

When I started Ropey, I had this grand plan to make a Unicode text rope that basically hides the underlying encoding, and just lets you deal with scalar values. Thus, all of Ropey's primary APIs, including its slicing APIs, operate in terms of Unicode scalar values rather than bytes.

I now believe that was the wrong decision.

First of all, it turns out that it's just not that useful in practice. It seems like a nice idea at first, because hey, Unicode scalar values are the atomic unit of Unicode text, right? But the reality is that because of e.g. graphemes, there isn't much of a practical difference between bytes and scalar values for any kind of real thing you might want to do. If you want to report character counts to the user, for example, you'll want to use some concept of graphemes, not scalar values. And if you want to insert text at a given location, any reasonably efficient indexing metric will do.

Second, it costs performance and space to keep track of scalar values. Every time you insert text, for example, Ropey has to scan the inserted text to count its scalar values. The same goes for removing text. And also internally when Ropey splits leaf nodes. Etc. And you have to store those scalar value counts in the nodes. And it's all just a bunch more code to maintain in Ropey for relatively little payoff.

Third, sometimes you need to operate in terms of bytes. There are a lot of situations where you can't reasonably be encoding agnostic. So Ropey needs to provide byte-oriented API's anyway. Thus, by using scalar values, all I really did was increase Ropey's API surface area without much practical application.

Long story short: if I were to do it over again, I would remove all the scalar value oriented APIs (the Chars iterator notwithstanding).

Having said all that, I don't think this was a huge blunder or anything. And it does make Ropey easier to jump into for casual use. But I don't think it pulls its weight.

Ropey includes split and concatenate operations.

One of the things that ropes are really good at is split and concatenate operations. These are even listed as fundamental operations on e.g the Wikipedia page about ropes. Thus, I implemented them for Ropey.

However, much like scalar value indexing, in practice these operations are pretty much useless for the kind of work Ropey is designed for. In a text editor, you rarely split a text buffer into two pieces, or merge two buffers together. And even in the rare cases where you need to do that, it's never done as a singular operation, so you can't really use these operations for it anyway.

And the down side of including these operations in Ropey is twofold:

  1. Again, increased API surface area.
  2. The code for these operations is surprisingly complicated because of the B-tree rebalancing that has to happen.

For both of those reasons, I wish I just hadn't included these operations in Ropey to begin with. It makes the API messier, and it's a bunch of gnarly code that I have to maintain now for no good reason.

The only up-side to having this code in Ropey is that it allows very large text insertions/removals to be done in O(M + log N) time (where M is the size of the text being inserted/removed) rather than O(M log N), because I re-use the split/concatenate code to special-case that. But small insertions/removals are the typical case, and very large insertions would still be reasonably fast (the constant factor on M is pretty low in both cases).

In short, I also don't think this code is pulling its weight.

Leaf nodes and internal nodes are the same size.

This is actually one that can be changed in a completely backwards compatible way, so I may yet change this in a future 1.x release. But doing so will be a pain, and I wish I had done this differently from the start.

The current design of Ropey requires that internal nodes and leaf nodes occupy roughly the same number of bytes as each other. I don't think this was a horrible decision, but it did box me in a bit, and I believe Ropey could be made even faster if this requirement were relaxed.

Specifically, if I want to play with the size of the leaf nodes to see how that affects performance, it necessarily also changes the internal node size. And vice-versa. And I have a hunch that the optimal sizes for each are different. However, even if they turn out to be the same, having the flexibility to make them different wouldn't cause any problems, and it makes experimentation easier down the line.

Final Thoughts

My general feeling about good code and good API design is that it's as much about what you don't include as it is about what you do include. And that feeling is certainly reflected in this post: none of these items are things I wish I had added, and two of them are things I wish I hadn't.

But despite what I've written above, I'm very happy with Ropey as a library over-all. Is it perfect? Nope. But I do think it's solidly good, and I believe it's a great choice for anyone building a text editor.