2014 - 02 - 14

Ray Reordering and Breadth-First Ray Tracing

As I mentioned in my post about random number generation, I've been working on an experimental 3d renderer named "Psychopath".

I haven't posted much about it, but I would like to change that. Psychopath is easily the largest one-man personal project I've ever undertaken, and it's something I regularly get excited and obsessed about, disappearing into my own little programming world, often not eating for extended periods because I don't want to stop working on it.

So I'm going to start by writing about a core aspect of Psychopath that makes it different from most renderers I've seen out there: ray reordering.

Psychopath is what is commonly referred to as a "path tracer", which means (in over-simplified terms) that it is based on tracing rays of light through the scene, bouncing all over the place.

The traditional way to do this kind of ray tracing is to trace one ray through the scene at a time. This has the advantage that you only have to store one ray at a time in memory, and also has the advantage of a large body of research to draw from for making it as fast as possible. It has the disadvantage, however, of accessing the scene in a very incoherent way.

There are a couple of alternative ways of doing ray tracing, both of which are based around the idea of handling many rays at once, and both of which have many variations (some of which blur the distinction between the two).

The first is called "ray reordering". The basic idea behind ray reordering is to queue many rays to be traced, and adjust the order in which they are actually traced to improve access patterns of the scene. Sometimes this means just reordering the rays based on some criteria before any of them have been traced, and then tracing them in the typical one-at-a-time fashion. But typically ray reordering schemes are more complex, and involve partially tracing rays through the scene and pausing them at strategic points to be resumed later, allowing more fine-grained reordering.

The second is called "breadth-first ray tracing". Breadth-first ray tracing, like ray reordering, also deals with many rays at once. However, while ray reordering is still fundamentally dealing with tracing individual rays through the scene (just rearranged), breadth-first ray tracing deals with an entire batch of rays at every stage of the ray tracing process. So, for example, when traversing the acceleration structure of the scene, breadth-first ray tracing will test every relevant ray against a node when visiting it.

So what does this all have to do with Psychopath? Well, Psychopath has a weird quirk that most ray tracers don't have: Psychopath dynamically dices geometry into sub-pixel geometry before testing rays against them. And it doesn't do this is a preparation stage before rendering starts, it does it on-demand when a ray needs to be tested against a surface. This is much like the Reyes rendering architecture.

The first implementation of Psychopath accomplished this with standard one-ray-at-a-time ray tracing and a geometry cache. Whenever a surface was diced, the result was stored in a fixed-size LRU cache to be used again by later rays. That worked pretty well, but doing things that way has some problems. First, there was a hard performance cliff when the geometry cache wasn't large enough. Especially for more complex scenes, that could become a real problem. Second, since the cache is accessed separately for every single ray, there is a lot of fine-grained locking going on for multi-threading, causing a lot of thread contention.

So the second implementation of Psychopath used ray reordering to allow the cache access (and thus locking) to be amortized over many rays. I used the ray reordering scheme from the paper "Two-level ray tracing with Reordering for Highly Complex Scenes" by Hanika et al. This worked remarkably well at solving both issues. The ray reordering amortized not only cache access over many rays, but also cache misses. Even with a tiny cache that would normally slow rendering to a crawl, this new scheme rendered reasonably quickly.

This method worked really well, right up until I wanted to add object instancing. The trouble with object instancing is that it turns the scene acceleration structure from a tree into a DAG. And the reason that poses an issue is that traversing a DAG requires more state to be maintained. With ray reordering, you have to maintain traversal state for each of a large number of rays. When traversing a tree, you can make that state very small (e.g. this paper). But when traversing a DAG, you have to maintain a stack for each ray, which significantly limits the number of rays that you can hold in RAM at once.

This is where breadth-first ray tracing comes in. With breadth-first ray tracing, you can maintain just a single stack for your entire set of rays. This allows you to easily and efficiently handle instancing, among other things. There are down sides, however. In general, it tends to be notably slower than single-ray tracing or ray reordering (at least with the current research out there). But nevertheless, it seems to fit the goals of Psychopath better than single-ray tracing or ray reordering, so I've decided to go that direction and see how fast I can make it.

I have a basic implementation of breadth-first ray tracing in the Stream Tracing branch of Psychopath. It is indeed slower than ray reordering on most scenes (although there are some interesting exceptions). But I'm hoping I can narrow that gap in performance. We'll see.