Cessen's Ramblings

2020-04-03

Ropey: Things I Would Do Differently

Ropey is a UTF-8 text rope library for Rust that I wrote and maintain. Initially I created it just for use in my own text editor Led, but early on I decided to split it out into a separate library since it seemed generally useful. Last year I finally released Ropey 1.0, and since then I've had time to reflect on some of the design decisions I made, and what I do and don't like about Ropey.

To be clear, I don't expect to create a Ropey 2.0. At least, not any time soon. And even if I do, I will still maintain the 1.x release indefinitely. I strongly believe that stability is important for published code that you want others to use, and rapid-fire releasing new major versions with breaking changes is a great way to undermine that.

With that said, here are the things I would go back and change if I could.

Unicode scalar values are the primary indexing metric.

When I started Ropey, I had this grand plan to make a Unicode text rope that basically hides the underlying encoding, and just lets you deal with scalar values. Thus, all of Ropey's primary APIs, including its slicing APIs, operate in terms of Unicode scalar values rather than bytes.

I now believe that was the wrong decision.

First of all, it turns out that it's just not that useful in practice. It seems like a nice idea at first, because hey, Unicode scalar values are the atomic unit of Unicode text, right? But the reality is that because of e.g. graphemes, there isn't much of a practical difference between bytes and scalar values for any kind of real thing you might want to do. If you want to report character counts to the user, for example, you'll want to use some concept of graphemes, not scalar values. And if you want to insert text at a given location, any reasonably efficient indexing metric will do.
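To make the distinction concrete, here's a small std-only Rust illustration (not Ropey's API) of why scalar values rarely line up with what users perceive as characters:

```rust
fn main() {
    // "é" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT:
    // one user-perceived character (grapheme), but...
    let s = "e\u{301}";
    assert_eq!(s.len(), 3);           // ...three UTF-8 bytes,
    assert_eq!(s.chars().count(), 2); // ...and two scalar values.

    // Neither count matches the "one character" a user sees, so for
    // user-facing counts you need grapheme segmentation regardless of
    // whether your index metric is bytes or scalar values.
}
```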

Second, it costs performance and space to keep track of scalar values. Every time you insert text, for example, Ropey has to scan the inserted text to count its scalar values. The same goes for removing text. And also internally when Ropey splits leaf nodes. Etc. And you have to store those scalar value counts in the nodes. And it's all just a bunch more code to maintain in Ropey for relatively little payoff.

Third, sometimes you need to operate in terms of bytes. There are a lot of situations where you can't reasonably be encoding agnostic. So Ropey needs to provide byte-oriented APIs anyway. Thus, by using scalar values, all I really did was increase Ropey's API surface area without much practical application.

Long story short: if I were to do it over again, I would remove all the scalar-value-oriented APIs (aside from the Chars iterator).

Having said all that, I don't think this was a huge blunder or anything. And it does make Ropey easier to jump into for casual use. But I don't think it pulls its weight.

Ropey includes split and concatenate operations.

One of the things that ropes are really good at is split and concatenate operations. These are even listed as fundamental operations on e.g. the Wikipedia page about ropes. Thus, I implemented them for Ropey.

However, much like scalar value indexing, in practice these operations are pretty much useless for the kind of work Ropey is designed for. In a text editor, you rarely split a text buffer into two pieces, or merge two buffers together. And even in the rare cases where you need to do that, it's never done as a singular operation, so you can't really use these operations for it anyway.

And the downside of including these operations in Ropey is twofold:

  1. Again, increased API surface area.
  2. The code for these operations is surprisingly complicated because of the B-tree rebalancing that has to happen.

For both of those reasons, I wish I just hadn't included these operations in Ropey to begin with. It makes the API messier, and it's a bunch of gnarly code that I have to maintain now for no good reason.

The only upside to having this code in Ropey is that it allows very large text insertions/removals to be done in O(M + log N) time (where M is the size of the text being inserted/removed) rather than O(M log N), because I re-use the split/concatenate code to special-case that. But small insertions/removals are the typical case, and very large insertions would still be reasonably fast (the constant factor on M is pretty low in both cases).

In short, I also don't think this code is pulling its weight.

Leaf nodes and internal nodes are the same size.

This is actually one that can be changed in a completely backwards compatible way, so I may yet change this in a future 1.x release. But doing so will be a pain, and I wish I had done this differently from the start.

The current design of Ropey requires that internal nodes and leaf nodes occupy roughly the same number of bytes as each other. I don't think this was a horrible decision, but it did box me in a bit, and I believe Ropey could be made even faster if this requirement were relaxed.

Specifically, if I want to play with the size of the leaf nodes to see how that affects performance, it necessarily also changes the internal node size. And vice-versa. And I have a hunch that the optimal sizes for each are different. However, even if they turn out to be the same, having the flexibility to make them different wouldn't cause any problems, and it makes experimentation easier down the line.
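As a sketch of what relaxing the requirement might look like (names and numbers are purely illustrative, not Ropey's actual internals), the two limits just become independent constants that can be tuned separately:

```rust
// Hypothetical, independently tunable node-size limits. In Ropey 1.x
// these are effectively coupled so both node kinds occupy roughly the
// same number of bytes; decoupling them allows experimenting with each.
const MAX_LEAF_BYTES: usize = 1024; // tune for text copy/scan costs
const MAX_CHILDREN: usize = 17;     // tune for fan-out / cache behavior

#[allow(dead_code)]
enum Node {
    Leaf(String),        // holds up to MAX_LEAF_BYTES of UTF-8 text
    Internal(Vec<Node>), // holds up to MAX_CHILDREN children
}

fn main() {
    // The two limits no longer need to imply similar byte footprints.
    assert_eq!(MAX_LEAF_BYTES, 1024);
    assert_eq!(MAX_CHILDREN, 17);
}
```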

Final Thoughts

My general feeling about good code and good API design is that it's as much about what you don't include as it is about what you do include. And that feeling is certainly reflected in this post: none of these items are things I wish I had added, and two of them are things I wish I hadn't.

But despite what I've written above, I'm very happy with Ropey as a library over-all. Is it perfect? Nope. But I do think it's solidly good, and I believe it's a great choice for anyone building a text editor.

2019-01-09

Rust Community Norms for Unsafe Code

I recently released Ropey 1.0, a text rope library for Rust. Ropey uses unsafe code internally, and its use of unsafe unsurprisingly came up in the 1.0 release thread on Reddit.

The ensuing discussion (especially thanks to Shnatsel) helped me significantly reduce the amount of unsafe code in Ropey with minimal (though not non-existent) performance degradation. But the whole thing nevertheless got me thinking about unsafe code and community norms around it, and I figured writing some of those thoughts down might be useful.

My hope is this post will be part of a community-wide discussion, and I would love for others (likely smarter than me) to write their thoughts on this topic as well. This post is simply my take on it.

Different Use-cases

I've gotten the impression that, on the whole, the Rust community's attitude towards unsafe code is something along the lines of, "Don't ever use it unless you're a wizard... and you're not a wizard." Unsafe code, it seems to me, is seen as this incredibly dangerous thing, and if it's in your crate or application then there's something wrong.

(Having said that, I'm not at all sure whether this impression is actually accurate, but I imagine if I've gotten that impression, many others likely have as well.)

In any case, I absolutely don't see unsafe code that way, and I think this is because my use-cases for Rust are different.

As I understand it, Rust more-or-less began its life at Mozilla with the goal of making browsers more secure. From that standpoint I largely agree with avoiding unsafe code. We've seen time and time again that memory safety bugs are both easy to introduce and a significant attack vector. Eliminating that whole class of bugs is good for security.

However, for my own projects I'm mostly uninterested in the web. I suspect this puts me in a minority, both within the Rust community and within the software development community as a whole. The vast majority of software developed today seems to be web-facing in some way or another. However, the software I'm most interested in writing runs locally on people's computers and is not web-facing. For example, my 3d renderer.

Now, I'm not saying there aren't security concerns with local software. Local software has a way of eventually finding itself on networks (for example, render farms in the case of 3d rendering). Nevertheless, the cost-benefit analysis is different: trading some risk of unsafety for better performance is almost always the right choice for something like an offline 3d renderer.

So for me Rust's memory safety guarantees aren't about security, they're about making software development easier. Tracking down memory safety bugs is a pain—in my experience, far more of a pain than fighting the borrow checker. But if I need to drop into unsafe code in a few places to squeeze more performance out of my renderer, I'm certainly going to do it.

Of course, there are a lot of use-cases where that is absolutely the wrong trade-off, and I suspect that includes most software being written today. But I think it's worth making this distinction: different software has (legitimately) different priorities. And I would like to see more acknowledgment of that in the Rust ecosystem.

Which leads me to the next part of this post...

Marking Crates

Ropey has a prominent notice in its readme about its use of unsafe code. I put it there precisely because of varying use-cases: people who are writing software where security is a high priority probably shouldn't use Ropey, or should only utilize it in a compartmentalized way. On the other hand, people who are writing high performance text editors that can handle crazy-huge files without breaking a sweat...? That's precisely what I designed Ropey for.

Ropey's readme notice—or something like it—is something I would like to see adopted as a norm in the Rust community. Different software has different needs, and I don't think a one-size-fits-all attitude towards unsafe code makes sense. But I do think we all need to be forthright about the priorities of our crates. In fact, that forthrightness probably shouldn't be limited to just the use of unsafe code.

So I guess what I'm saying is: I don't know exactly what I want this to look like, but I would like there to be some generally accepted way for crates to advertise what their priorities are. This would allow people to better choose crates with trade-offs and cost/benefit analysis appropriate to their own projects.

Safety Feature Flag

At the most recent Seattle Rust Meetup I had a great conversation about unsafety with Ivan (don't know his last name, alas, but thanks Ivan!) and he had an excellent idea for Ropey that I think may be more widely applicable: provide a feature flag that switches to an all-safe version of Ropey. In Ropey's case this is pretty straightforward: the unsafe code is well compartmentalized, and could be easily swapped out with safe (but slower / more memory hungry) equivalents based on a feature flag.
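A minimal sketch of how such a flag could work (the feature name and function are hypothetical, not Ropey's actual code): two implementations of the same function, selected by `cfg`, with the all-safe version as the default.

```rust
// With `--features unsafe_opt` the fast path compiles in; otherwise
// the all-safe fallback is used. Both have identical behavior.

#[cfg(feature = "unsafe_opt")]
fn copy_bytes(src: &[u8], dst: &mut [u8]) {
    // Fast path: one bounds check up front, then a raw memcpy.
    assert!(dst.len() >= src.len());
    unsafe {
        std::ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), src.len());
    }
}

#[cfg(not(feature = "unsafe_opt"))]
fn copy_bytes(src: &[u8], dst: &mut [u8]) {
    // Safe fallback: relies on slice bounds checks.
    dst[..src.len()].copy_from_slice(src);
}

fn main() {
    let src = [1u8, 2, 3];
    let mut dst = [0u8; 4];
    copy_bytes(&src, &mut dst);
    assert_eq!(&dst[..3], &[1, 2, 3]);
}
```

The nice property is that downstream users make the trade-off, not the crate author: a security-sensitive project builds with default features, a performance-hungry one opts in.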

I am seriously considering this for Ropey, but I think it can apply to other crates as well. And maybe having a standard flag for this would be good. It obviously wouldn't be useful for every crate, but it would allow crate consumers to make this trade-off themselves whenever a crate does provide the flag.

But Can't We Just Make Safe Code Faster?

One of the things I legitimately love about the Rust ecosystem is the push to get unsafe code compartmentalized into crates with safe APIs. Having a rich set of crates that are working hard to make unsafe code correct, and expose it safely for others, is super awesome.

However, this will never be able to cover all possible situations. Covering reasonably common cases is feasible, but eventually it becomes a game of whac-a-mole. There are always going to be things that won't be covered in the ecosystem, especially the more niche you get.

Moreover, some optimizations are specific to the needs of a holistic design. (John Carmack's talk about systems engineering explains this much better than I can, and is also just a great talk.) And sometimes these sorts of optimizations will require unsafe code, and because of their specificity won't be generally useful as a separate crate.

For these and other reasons, I believe there will always be legitimate cases for using unsafe code in one-off ways. The more we can shrink the number of such cases, the better, for sure. But it will never shrink to zero.

Wrapping Up

I don't mean any of this to be an "excuse" for people using unsafe code haphazardly. Even in my own case, I was able to significantly reduce (though not eliminate) the unsafe code in Ropey with a little prodding from Shnatsel on Reddit. When you're in a performance mindset it's easy to reach for the lowest level before you strictly need to.

But legitimate uses of unsafe code are inevitable, and I think a "know and broadcast your use-case" ethos would be healthier and more useful for the Rust community than a more-or-less "don't use unsafe code" attitude.

2018-12-12

Rust 2019 - It's the Little Things

(This post is in response to this call for posts on the Rust blog. Also, this post focuses on Rust as a language, because I feel like the crates ecosystem is something that will naturally grow and mature over time as more people use Rust for more use-cases. Also, in most respects I agree strongly with Jonathan Turner's Rust 2019 post, which has a similar flavor to this one.)

This might be an uncommon opinion—especially among those motivated enough to write a Rust 2019 post—but I actually think Rust is pretty much at a good place now. For the kinds of things that I want to do (e.g. my path tracer), there isn't much that Rust is lacking as a language. There are some fiddly things like "placement new" that could be useful, but nothing really major. And of course, well-designed new features are always welcome, they just don't seem particularly critical to me at this point.

In other words, I'm pretty much satisfied. Mission accomplished, as far as I'm concerned. I think the rest is just polish. Just the little things.

But... the little things aren't necessarily little in terms of effort and hours needed to accomplish them. I don't think any of the following is trivial. I mean "little" with respect to shininess and visibility/marketing optics. Also, little != unimportant here.

So without further delay, these are the things I would like to see Rust focus on in 2019.

Type-level Integers (a.k.a. Integers as Generic Arguments)

This paper cut is pretty easy to run into. I was at the Seattle Rust Meetup this month, and while helping out a new Rustacean we ended up stumbling over the list of impls for fixed-size arrays in the standard library. It's... silly. Tons of impls that are identical except for varying the length of the array. I felt like I had to apologize to the person I was helping, "Yeah... yeah, it's super stupid. Yeah, I'm sorry. Yeah, you're totally right. There are plans to improve the situation. No, I'm not sure when, or how far along it is. But you can work around it this way."

I realize that allowing values of (almost) any type as generic arguments is a seductive and appealing idea. But from my perspective, Rust already has type-level integer arguments, in the form of fixed-size arrays. It's just that users don't have access to them for their own types, nor for impls of their own traits over arrays. It's a lot like how Go has built-in generic maps, etc. but you can't make your own generic types in it. There's this sort of implicit acknowledgement in the language design that "this is useful to be able to do", but at the same time it doesn't let you do it yourself.
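The duplication looks something like this in user code. Without type-level integers, the standard workaround (and what the standard library itself did at the time) is a macro that stamps out one identical impl per array length; the trait here is just an illustration:

```rust
trait Summed {
    fn summed(&self) -> i64;
}

// One impl per supported length -- identical except for the length.
macro_rules! impl_summed_for_arrays {
    ($($n:expr),*) => {
        $(
            impl Summed for [i64; $n] {
                fn summed(&self) -> i64 {
                    self.iter().sum()
                }
            }
        )*
    };
}

// Every supported length must be listed explicitly: the "silly" part.
impl_summed_for_arrays!(0, 1, 2, 3, 4, 5, 6, 7, 8);

fn main() {
    assert_eq!([1i64, 2, 3].summed(), 6);
    // An [i64; 9] would not compile: no impl exists for length 9.
}
```

With type-level integers, a single `impl<const N: usize> Summed for [i64; N]` would replace the whole list.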

So I think that just focusing on pushing type-level integers across the finish line should be a goal for 2019, without all the complications of more general type-level values. It's a long-standing paper cut that feels like a missing feature, rather than a wish-list new feature. It makes Rust feel inconsistent, and has concrete, negative, practical impacts even in the standard library itself as per the link above.

Expanding Supported Architectures

This isn't directly interesting to me—I'm pretty happy in x86/x64 land. But moving more architectures to the higher support tiers seems like it would be good. Rust is a "systems" programming language, and even though people don't seem to agree what "systems programming" means any more than they agree on what "art" means, I nevertheless think having a rich set of well-supported architectures is worth aiming for in a language like Rust. It would be nice if people didn't have to reach for C or other memory unsafe languages just because they're targeting a different chipset.

Compile Times

This is a boring topic to bring up, but Rust's compile times still leave a lot to be desired. I am really pleased with the progress already made, and in no way mean to belittle that or the efforts that have led to that progress. It's more to say: more please! I think this is actually really important. Performance matters. In fact, caring about performance is one of the reasons for using Rust, right? So I think prioritizing performance and other not-so-shiny things (as opposed to e.g. new features) for 2019 might be worth considering.

In Short...

This is all to say: I think Rust is great, and I'm actually really happy with where it's at now. I realize that my use-cases are not the same as everyone else's (e.g. I don't personally have much use for async/await). But there are a lot of little things that I would like to see some focus put on. That is my hope for 2019: the year of polish. Not shiny new features, not even ergonomics improvements per se as in the previous cycle. But polish on technical items.

Rust 2021 Edition

As for the possible future Rust 2021 Edition... well, this may also be an uncommon opinion, but I hope we don't have one. Frankly, I hope we don't ever have another edition of Rust at all. I hope 2018 is the first and last one.

That may seem harsh, but let me clarify: Rust 2018 was absolutely a good thing, and I'm glad that we did it. But it was also a bad thing, in that it makes the world of Rust more confusing.

This was highlighted for me at the recent Seattle Rust Meetup, where a new user couldn't get his code to compile. I also struggled to figure out why while helping him. Eventually someone else came by and asked, "Is this in Rust 2018?" We checked, and indeed the new project was Rust 2018, but his code was written in Rust 2015 style in a way that caused an error in 2018 (I don't recall how anymore).

Rust 2018 is backwards compatible in the sense that the compiler can still build 2015 code, and even use both 2015 and 2018 crates together. This is an impressive feat, and a great way to move forward with breaking changes. But technological compatibility is not the only important kind of compatibility.

All of the articles and tutorials on Rust, all of the Stack Overflow questions and answers, and really anything written about Rust at all is now not only out of date but incompatible with the user's first out-of-the-box experience. We went through this once already with Rust 1.0. It's not a good thing. It's quite bad, actually. And I would really, really like to avoid doing it again.

So what I really meant above is not that we shouldn't ever have another edition, but rather that we should try not to. If there are good reasons and things we really want to improve, sure, let's go for it. But let's absolutely not take editions as a given thing, and certainly not as a regular thing.

Let's make new editions only when justified: when the benefits outweigh the drawbacks. Not as part of a cadence. And let's try to make them as infrequent as possible. If we can go ten years without a new edition, I think that should be viewed as a good thing. Editions are a tool we can reach for when needed, but they are a tool with a cost.

2018-10-11

Optimal Anki Settings - Take 2

I discovered a significant error in the simulations I ran in the previous post. So I'd like to present the corrected simulations, and also explain more deeply the assumptions and known weaknesses even in these fixed simulations. I also made the charts easier to use. However, I won't re-explain the background, so please see the previous post for that.

Summary for those who don't want to read the whole thing: Using an interval modifier near 140% is probably a reasonable default for language learning with Anki.

The Error(s)

The major mistake I made was failing to account for the additional reviews that you do after lapsing on a card. This, unsurprisingly, has a huge impact on the results. In fact, the amount of extra time you spend on a lapse has a far more significant impact than the initial time you spend creating and learning a new card.

I also had an error that Matt vs Japan caught in his own work, which is that calculating the total number of cards you know isn't quite as simple as just counting the cards in your deck. There's a formula that the SuperMemo people derived to get a correct approximation of known cards at any given point in time, and I am now using that. This also turns out to have a significant impact on the results.

Assumptions and Limitations

Before I jump into the updated results, I want to lay out more explicitly what the assumptions of my simulations are, as well as some limitations you should be aware of.

Anki Settings

I'll actually cover a few variations in the results, but the simulation's assumption about Anki settings is that you are pretty much using the defaults (aside from the Interval Modifier, of course).

The only exception is the lapse "new interval" setting, which determines how much a card's interval is reduced when you lapse on it. The Anki default is to reset the interval to zero, but that seems like a flagrantly bad choice to me: if you already successfully remembered it just one interval prior, there's no reason to drop you back to square one. Matt vs Japan uses 75% as his default, but that also seems not optimal. If your interval grows by, for example, 10x every time you get it right, then dropping it to 75% seems like it doesn't go far enough. Alternatively, if you're only growing by 1.1x, then 75% drops you back too far.

What I've come up with—and this is totally just hand-waving "this seems reasonable" territory—is to configure it so that lapsing twice will reverse a single success. So, for example, if getting a card right increases the interval 4x, then lapsing will halve the interval, because halving something twice results in 1/4 the size, reversing the 4x. For the curious, the way to calculate that is 1.0 / sqrt(N), where N is how much your interval is multiplied on success (4.0 in the example I just gave).

Other than that, it's all Anki defaults (except where noted in the variations). My reasoning for this is that it's not clear what effect all of the settings have on retention etc., so sticking to the defaults gives us some reasonable confidence that the formulas for the interval modifier will apply with some accuracy.

The Simulated User

There are two properties of the simulated user that affect the simulation:

  • How much time they spend creating and learning a new card.
  • How much time they spend per review.

However, for the graphs the only thing that actually matters is the ratio between these two items. Knowing their absolute magnitudes is unnecessary for calculating optimal efficiency, although it may still be interesting.

The assumption I've used is that creating a new card and doing the initial reviews to learn it take a combined time of 120 seconds (or two minutes). And each individual review after that takes 20 seconds. These seem like reasonable numbers to me. Moreover, except at bizarre extremes (e.g. new cards take almost no time) it doesn't appear that the ratio impacted the graphs significantly.
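As a toy illustration of those assumptions (the numbers are from this post; the function is mine, not part of the simulator):

```rust
// Simulated user's time model: 120 seconds to create and initially
// learn a card, 20 seconds per subsequent review.
fn total_study_seconds(new_cards: u64, reviews: u64) -> u64 {
    new_cards * 120 + reviews * 20
}

fn main() {
    // E.g. a day with 10 new cards and 90 reviews:
    let secs = total_study_seconds(10, 90);
    assert_eq!(secs, 3000); // 50 minutes of study

    // Scaling both costs by the same constant scales the total too,
    // which is why only the 120:20 ratio matters for the graphs.
    assert_eq!(total_study_seconds(10, 90) * 2,
               10 * 240 + 90 * 40);
}
```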

Length of the Simulation

For all the simulations in this post, I use a simulated time of 365 days (one year). Doing it for fewer days or more days does impact the results some, but basing things on one year seems reasonable to me. If your cards grow older than a year, it's not totally clear to me how much you're really getting out of them—at least in the context of language learning. And studying for significantly less than a year doesn't make sense for seriously learning a language.

General Limitations

I alluded to this earlier, but there are things that these simulations don't account for. The biggest one is that it's not at all clear how the following variables impact retention of cards:

  • Increasing/decreasing the number of reviews for initial learning of new cards.
  • Increasing/decreasing the number of additional reviews after lapses.
  • Increasing/decreasing the lapse "new interval" setting.

I mean, it's pretty clear that increasing the first two and decreasing the latter will improve retention, but it's not at all clear how to quantify that, or create formulas for them (or at least I don't know). This is also a limitation of Matt vs Japan's work, and the simulator that he's now using.

Because of that, the retention impacts of these factors are completely unaccounted for in these simulations. This means that the particular choice used for one of these factors, even though it affects the simulation, does not affect it accurately. For example, you could set the lapse new-interval setting to 100000% (1000x multiplier), so that whenever you lapse a card it will launch its interval into the stratosphere. From the simulation's perspective, that's a massive increase in efficiency, because it assumes the retention rate for that card still stays the same. But that's obviously false in reality—in reality that setting would be roughly equivalent to deleting all lapsed cards from your deck.

That's why I'm generally trying to stick to Anki's defaults, or at least "reasonable" settings. And that also means that all results from not only my simulations, but also from Matt or anybody else, should be treated as fuzzy guides, not as an exact science. There's a lot that we're not accounting for, and it's not totally clear how that affects the simulations.

Results

So with that out of the way, here are the results. As in the last post, here's how to interpret the graphs:

  • The vertical axis is the Interval Modifier setting in Anki.
  • The horizontal axis is your personal retention rate at (roughly) default Anki settings.
  • The white strip is 99+% optimal efficiency.
  • The light gray strip is 95+% optimal efficiency.

"Efficiency" in this case means "cards learned per hour of time studying". And studying includes both reviews and time spent creating and learning new cards.

All of these graphs are normalized, and therefore don't reflect efficiency differences between different graphs. They are also normalized individually within each vertical slice of pixels, and therefore also don't reflect differences in efficiency between personal retention rates. The latter is intentional, as it makes it easy to e.g. find your personal retention rate and determine what interval modifier results in your personal optimal efficiency. See the previous post for more details.

The "Reasonable" Graph

I actually have quite a few graphs this time, but this is the one I think most people should use:

It represents precisely the settings and simulated user I described earlier, and I think is a reasonable graph to work from. In particular, it assumes:

  • An average of two minutes spent creating + learning each new card.
  • An average of 20 seconds per review.
  • A single additional review per lapse.
  • A "max lapse" setting of 8 (cards are suspended when they exceed 8 lapses).
  • The 1.0 / sqrt(N) lapse new-interval setting I described earlier.

The main take-away is that an interval modifier of around 140% gets you 95% efficiency in almost the entire 75%-95% personal retention rate range. Which is pretty great! So I think this is probably a good default. But going up as high as 160% also seems quite reasonable. And if you have especially good retention, even as high as 200% might make sense.

But now that we have the "reasonable" graph out of the way, let's start playing with the settings!

Varying Max Lapses

The above graph changes the "max lapse" setting to 4.

And this one changes the "max lapse" setting to 12.

These graphs together illustrate something useful: increasing your max lapse setting beyond 8 doesn't make much difference in efficiency, but lower numbers definitely do!

It's also worth noting (although not illustrated in the normalized graphs) that decreasing your max lapse setting has a negative impact on your efficiency. In general, with a max lapse setting of 8 or higher, you're at 99+% of optimal efficiency, but a setting of zero can cut your efficiency by as much as a factor of 2, depending on your personal retention rate. A setting of 4 gives you 95+% of max efficiency.

As far as the simulation is concerned, increasing your max lapse setting always improves efficiency. I think this makes sense. Although it's not in these simulations, I also did some sims where each card had a slightly different difficulty (i.e. individual retention rate), and the variance between cards had to get pretty huge before it wasn't always beneficial to increase max lapses.

So my takeaway is this: beyond a max lapse setting of 8 doesn't really make a difference to efficiency. But feel free to max out the setting if it makes you feel better.

However, there is also a psychological component to studying, and culling out cards you're having a tough time with might make sense. In that case, a setting of 4 probably makes sense, since it's pretty low but still has a minimal impact on efficiency.

Varying Additional Lapse Reviews

This should be taken with a significant grain of salt, as per the "limitations" section earlier in the article. But here are a few graphs varying how many additional reviews are done when you lapse. The "reasonable graph" earlier is "one additional review".

^ Zero additional reviews.

^ Two additional reviews.

^ Three additional reviews.

The differences are pretty stark. Especially at zero additional reviews, it seems like you can get great efficiency with much larger interval modifiers! However, that seems very suspect to me, because the simulation isn't accounting for how these additional reviews may improve retention.

Having said that, because we don't know exactly how much those additional reviews help, any of these graphs are potentially as valid as the "reasonable" one that I presented. This is an open area for additional work. If anyone has any data about the impact of additional lapse reviews, I would be very interested to see it!

Varying Lapse New Interval

Finally, let's see how varying the lapse "new interval" setting (how much intervals are reduced on lapses) impacts things.

^ Matt's 75% setting.

^ Anki's default 0% setting.

As you can see, this setting also has a notable impact. But just like varying the additional lapse reviews, I have no idea how this impacts things in reality—this simulation doesn't account for important (but currently unknown) factors. So this also needs further study!

Wrap-up

I'm reasonably confident in the results from these simulations, except for the limitations and unknown factors I've described throughout this post. If anyone has any insight around those issues, I would be very interested to hear from you!

But for now, I think it is at least reasonable to go with an interval modifier of 140%. If you want to use my lapse "new interval" setting scheme, that corresponds to a lapse "new interval" setting of 53%.