Cessen's Ramblings

2014-02-14

Ray Reordering and Breadth-First Ray Tracing

As I mentioned in my post about random number generation, I've been working on an experimental 3d renderer named "Psychopath".

I haven't posted much about it, but I would like to change that. Psychopath is easily the largest one-man personal project I've ever undertaken, and it's something I regularly get excited and obsessed about, disappearing into my own little programming world, often not eating for extended periods because I don't want to stop working on it.

So I'm going to start by writing about a core aspect of Psychopath that makes it different from most renderers I've seen out there: ray reordering.

Psychopath is what is commonly referred to as a "path tracer", which means (in over-simplified terms) that it is based on tracing rays of light through the scene, bouncing all over the place.

The traditional way to do this kind of ray tracing is to trace one ray through the scene at a time. This has the advantage that you only have to store one ray at a time in memory, and also has the advantage of a large body of research to draw from for making it as fast as possible. It has the disadvantage, however, of accessing the scene in a very incoherent way.

There are a couple of alternative ways of doing ray tracing, both of which are based around the idea of handling many rays at once, and both of which have many variations (some of which blur the distinction between the two).

The first is called "ray reordering". The basic idea behind ray reordering is to queue many rays to be traced, and adjust the order in which they are actually traced to improve access patterns of the scene. Sometimes this means just reordering the rays based on some criteria before any of them have been traced, and then tracing them in the typical one-at-a-time fashion. But typically ray reordering schemes are more complex, and involve partially tracing rays through the scene and pausing them at strategic points to be resumed later, allowing more fine-grained reordering.
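As a toy illustration of the simpler "sort before tracing" flavor, here is a hypothetical sketch (not Psychopath's actual code) that groups queued rays by the octant of their direction, so that rays heading the same general way get traced back-to-back:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 o, d; };  // origin and direction

// Hypothetical sort key: the octant of the ray's direction. Rays heading
// in roughly the same direction tend to traverse similar parts of the
// scene, so grouping them improves the coherence of scene access.
uint32_t direction_octant(const Ray& r) {
    return (r.d.x < 0.0f ? 1u : 0u)
         | (r.d.y < 0.0f ? 2u : 0u)
         | (r.d.z < 0.0f ? 4u : 0u);
}

// Reorder a queued batch of rays before tracing them one at a time.
void reorder_rays(std::vector<Ray>& batch) {
    std::stable_sort(batch.begin(), batch.end(),
                     [](const Ray& a, const Ray& b) {
                         return direction_octant(a) < direction_octant(b);
                     });
}
```

Real schemes use much better keys (origin, direction, and intersection estimates), but even this crude bucketing gives you the basic idea: spend a little work up front sorting so the expensive scene access happens in a coherent order.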

The second is called "breadth-first ray tracing". Breadth-first ray tracing, like ray reordering, also deals with many rays at once. However, while ray reordering is still fundamentally dealing with tracing individual rays through the scene (just rearranged), breadth-first ray tracing deals with an entire batch of rays at every stage of the ray tracing process. So, for example, when traversing the acceleration structure of the scene, breadth-first ray tracing will test every relevant ray against a node when visiting it.
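The per-node batch test can be sketched roughly like this (hypothetical types and names, not Psychopath's actual code): all active rays are tested against a node's bounding box in one go, and the hitters are partitioned to the front of the array so that only they continue down to the children:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 o, d; };
struct BBox { Vec3 min, max; };

// Standard slab test of a single ray against a bounding box.
bool hits(const Ray& r, const BBox& b) {
    const float o[3]    = {r.o.x, r.o.y, r.o.z};
    const float d[3]    = {r.d.x, r.d.y, r.d.z};
    const float bmin[3] = {b.min.x, b.min.y, b.min.z};
    const float bmax[3] = {b.max.x, b.max.y, b.max.z};
    float tmin = 0.0f, tmax = 1e30f;
    for (int i = 0; i < 3; ++i) {
        const float inv = 1.0f / d[i];
        float t0 = (bmin[i] - o[i]) * inv;
        float t1 = (bmax[i] - o[i]) * inv;
        if (t0 > t1) std::swap(t0, t1);
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
    }
    return tmin <= tmax;
}

// The breadth-first step: test *all* active rays against one node's bounds,
// partitioning the hitters to the front of the array. The node's children
// are then visited with only that prefix, so the node is touched exactly
// once for the whole batch. Returns the number of rays that hit.
size_t filter_rays(std::vector<Ray>& rays, size_t active, const BBox& bounds) {
    auto mid = std::partition(rays.begin(), rays.begin() + active,
                              [&](const Ray& r) { return hits(r, bounds); });
    return static_cast<size_t>(mid - rays.begin());
}
```

The key property is that each node (and each piece of geometry) is visited once per batch instead of once per ray, which is exactly what you want when visiting a node is expensive.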

So what does this all have to do with Psychopath? Well, Psychopath has a weird quirk that most ray tracers don't have: it dynamically dices surfaces into sub-pixel geometry before testing rays against them. And it doesn't do this in a preparation stage before rendering starts; it does it on-demand when a ray needs to be tested against a surface. This is much like the Reyes rendering architecture.

The first implementation of Psychopath accomplished this with standard one-ray-at-a-time ray tracing and a geometry cache. Whenever a surface was diced, the result was stored in a fixed-size LRU cache to be used again by later rays. That worked pretty well, but doing things that way has some problems. First, there was a hard performance cliff when the geometry cache wasn't large enough. Especially for more complex scenes, that could become a real problem. Second, since the cache is accessed separately for every single ray, there is a lot of fine-grained locking going on for multi-threading, causing a lot of thread contention.
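A fixed-size LRU cache of the kind described can be sketched like so (the names and structure are hypothetical; Psychopath's actual geometry cache differs):

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

// Minimal fixed-capacity LRU cache, e.g. keyed by surface id, holding
// diced geometry. A recency-ordered list holds the entries; a hash map
// gives O(1) lookup into the list.
template <typename Key, typename Value>
class LRUCache {
public:
    explicit LRUCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns a pointer to the cached value, or nullptr on a miss.
    Value* lookup(const Key& key) {
        auto it = map_.find(key);
        if (it == map_.end())
            return nullptr;
        // Move the entry to the front (most recently used).
        // List iterators stay valid across splice.
        items_.splice(items_.begin(), items_, it->second);
        return &it->second->second;
    }

    // Inserts (or refreshes) an entry, evicting the least recently
    // used one if the cache goes over capacity.
    void insert(const Key& key, Value value) {
        Value* existing = lookup(key);
        if (existing) {
            *existing = std::move(value);
            return;
        }
        items_.emplace_front(key, std::move(value));
        map_[key] = items_.begin();
        if (items_.size() > capacity_) {
            map_.erase(items_.back().first);
            items_.pop_back();
        }
    }

private:
    std::size_t capacity_;
    std::list<std::pair<Key, Value>> items_;
    std::unordered_map<Key,
        typename std::list<std::pair<Key, Value>>::iterator> map_;
};
```

In the one-ray-at-a-time design, every single ray does a lookup (and possibly an insert) against a structure like this, and in a multi-threaded renderer each of those operations needs a lock, which is where the contention comes from.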

So the second implementation of Psychopath used ray reordering to allow the cache access (and thus locking) to be amortized over many rays. I used the ray reordering scheme from the paper "Two-Level Ray Tracing with Reordering for Highly Complex Scenes" by Hanika et al. This worked remarkably well at solving both issues. The ray reordering amortized not only cache access over many rays, but also cache misses. Even with a tiny cache that would normally slow rendering to a crawl, this new scheme rendered reasonably quickly.

This method worked really well, right up until I wanted to add object instancing. The trouble with object instancing is that it turns the scene acceleration structure from a tree into a DAG. And the reason that poses an issue is that traversing a DAG requires more state to be maintained. With ray reordering, you have to maintain traversal state for each of a large number of rays. When traversing a tree, you can make that state very small (e.g. this paper). But when traversing a DAG, you have to maintain a stack for each ray, which significantly limits the number of rays that you can hold in RAM at once.

This is where breadth-first ray tracing comes in. With breadth-first ray tracing, you can maintain just a single stack for your entire set of rays. This allows you to easily and efficiently handle instancing, among other things. There are downsides, however. In general, it tends to be notably slower than single-ray tracing or ray reordering (at least with the current research out there). But nevertheless, it seems to fit the goals of Psychopath better than single-ray tracing or ray reordering, so I've decided to go that direction and see how fast I can make it.
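The single shared stack can be sketched as follows (a hypothetical skeleton, not the actual Stream Tracing code): each entry records a node plus how many rays at the front of the in-place-partitioned ray array are still active at that node, so traversal memory scales with scene depth rather than with rays-in-flight times depth:

```cpp
#include <cstddef>
#include <vector>

struct Node;  // acceleration structure node (works for a tree or a DAG)

// One stack entry for the *entire* batch of rays: the node to visit next,
// plus how many rays at the front of the ray array are active there.
struct StackEntry {
    const Node* node;
    size_t active_rays;
};

// Per-ray DAG traversal would need a full stack like this for every ray
// in flight. Sharing one stack across the whole batch keeps memory
// proportional to traversal depth instead.
struct BatchTraversal {
    std::vector<StackEntry> stack;

    void push(const Node* n, size_t active) { stack.push_back({n, active}); }

    bool pop(StackEntry* out) {
        if (stack.empty())
            return false;
        *out = stack.back();
        stack.pop_back();
        return true;
    }
};
```

With per-ray state reduced to "where am I in the shared array", a DAG (i.e. instancing) stops being a memory problem: millions of rays can be in flight with only one stack between them.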

I have a basic implementation of breadth-first ray tracing in the Stream Tracing branch of Psychopath. It is indeed slower than ray reordering on most scenes (although there are some interesting exceptions). But I'm hoping I can narrow that gap in performance. We'll see.

2012-06-26

Pseudo-Random Number Generators

Holy crap it's been a long time since I've blogged here. And you know what? I'm fine with that. I'll blog when I actually have something I want to share. Like now, for example.

So, I've been pecking away at a 3d renderer that I've dubbed "Psychopath". I've put it up on GitHub: https://github.com/cessen/psychopath_cpp

Anyway, I'm discovering all sorts of nifty little things as I'm working on this, many of which are handy and general-purpose. The latest one is a simple pseudo-random number generator that is quite robust.

For most simulation applications (advanced 3d rendering can be considered such), the random functions included in e.g. the C and C++ standard libraries are insufficient. Moreover, it is important to be able to guarantee that the RNG you are using is consistent across platforms and satisfies specific properties of randomness. Therefore it is generally a good idea, in these applications, to include a custom RNG implementation and use that.

The go-to RNG for such purposes these days is the Mersenne Twister, and as such that is the RNG I've been using in Psychopath up to this point. But the code for it is quite complex. And as it turns out, there are other RNG's out there that are pretty much just as good, but have far simpler code.

Several such RNG's are covered in the short and accessible paper "Good Practice in (Pseudo) Random Number Generation for Bioinformatics Applications" by David Jones. I have opted to use his JKISS generator, which is a variation on George Marsaglia's KISS (Keep It Simple Stupid) RNG.

You can see my implementation here.

The generator has very good statistical properties, passing a large battery of intensive tests as outlined in the paper. In fact, it appears to be just as robust as the Mersenne Twister in almost every respect. The only respect in which it is less robust is period length. The Mersenne Twister has the enormous period 2^19937-1, whereas JKISS has a period of roughly 2^127. JKISS's period is still very large, however, and is more than adequate for most practical applications, 3d rendering included.
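For reference, JKISS itself is tiny. This is a sketch following the code listed in Jones's paper (the constants and default seeds are from the paper; note that y must never be seeded to zero, or its xorshift component would stay stuck at zero):

```cpp
#include <cstdint>

// JKISS, after David Jones's paper: a linear congruential generator,
// a xorshift generator, and a multiply-with-carry generator, summed.
struct JKISS {
    uint32_t x = 123456789;
    uint32_t y = 987654321;  // must never be zero
    uint32_t z = 43219876;
    uint32_t c = 6543217;

    uint32_t next() {
        x = 314527869u * x + 1234567u;            // LCG
        y ^= y << 5; y ^= y >> 7; y ^= y << 22;   // xorshift
        uint64_t t = 4294584393ULL * z + c;       // multiply-with-carry
        c = static_cast<uint32_t>(t >> 32);
        z = static_cast<uint32_t>(t);
        return x + y + z;
    }
};
```

Compare that to the several screenfuls of state-tempering code in a typical Mersenne Twister implementation; the simplicity is the whole point.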

For applications that require larger periods, several other simple-yet-robust generators are also listed in the paper, some of which have enormous periods exceeding even the Mersenne Twister.

2011-10-09

Middle Schools In Seattle

I have been helping one of Tiffany’s daughters with her math homework recently. She’s currently learning about ratios. It is no wonder to me that she needs help. Not because she isn’t bright. In fact, she is very bright and has a natural knack for math, I would say. Rather, the textbook is horrendous.

I am pretty decent at math, and I deal with ratios all the time in computer graphics and animation. But this textbook confuses me. And I already know the topics it is covering. How on earth is someone new to the material supposed to find it helpful? And from what I hear, the teachers are extremely hit-or-miss.

Good god. This makes me really angry. The only kids that are going to learn this math are the kids that go off and learn it from completely separate sources. And at that point, what is the point of the school system? It seems like all it’s doing then is taking up valuable time that the kids could otherwise spend actually learning things. And how are kids supposed to develop a love of learning if this is their experience of it? It is sabotaging these kids’ lives.

Public schools need to be better than this. Why aren’t we funding them well, again? Why are we throwing them bone scraps? How do we stop this?

Also, apparently students are only allowed to go to the bathroom during class 3 times a week. Not per class, but for all of school. 3 times a week. How is that okay? If someone needs to go, they need to go. And they only have 5 minutes to get between classes, and their lunch break is 30 minutes. It sounds to me as if they need bladders and colons of steel if they expect to eat and get to their classes on time. Not to mention that is just an oppressive environment. I seriously fear for these kids’ psychological health.

This makes me absolutely livid.

2011-05-26

DRCS For Content Creation

I love distributed revision control systems (DRCS). Git, Mercurial, Bazaar... fantastic things. And I use DRCS's all the time for code. They have a lot of benefits over centralized models.

Unfortunately, they are not quite so nice for content creation. And there are two main reasons for this:

  1. Media files are often large. A whole project of media files is typically huge. And a bunch of revisions of a project of media files is enormous. And that enormity takes a freakin' huge time to transfer over a network. All of the DRCS's that I am aware of force you to get a copy of the entire repo (except for Git, but the result of a partial clone is a repo that can't push/pull). This is pretty much a deal breaker.
  2. Changes to media files cannot be merged. In a distributed system this is particularly problematic. Most RCS's that are used for media (e.g. Perforce) have locking systems to prevent more than one person from editing a file at the same time, which prevents conflicts. This is not possible with DRCS's due to their nature, which makes the ability to merge changes critical.

Point #1 should be fairly straightforward to deal with if it is a design goal up-front.

Point #2, however, is a lot trickier, especially when you consider that there are a huge number of domain-specific and software-specific file formats out there. I wonder, however, if a plugin (or similar) system could make it workable. In that case, if you wanted to support a format, you would write a plugin that can diff, detect conflicts, and apply non-conflicting diffs for that format.

For example, let's take the simple case of lossless 24-bit RGB bitmaps (e.g. png's, bmp's, etc.). Writing a diff utility for them would actually be pretty painless. For each pixel in the image you just subtract each of the channels. Simple! And to apply the diff, you just add the diff. To detect conflicts, you take diffs of each changeset and compare to see if any pixels have non-zero values in both files. Should any conflict arise, an artist can take both versions into Photoshop or Gimp and combine the two versions however he or she likes.
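That per-pixel scheme is simple enough to sketch in a few lines (a toy illustration on raw 8-bit RGB buffers; using wrapping arithmetic makes diff-then-apply exact, with no clamping losses):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-channel diff of two same-sized 8-bit RGB buffers. Wrapping
// (mod 256) subtraction means apply_diff(a, image_diff(a, b))
// reproduces b exactly.
std::vector<uint8_t> image_diff(const std::vector<uint8_t>& a,
                                const std::vector<uint8_t>& b) {
    std::vector<uint8_t> d(a.size());
    for (size_t i = 0; i < a.size(); ++i)
        d[i] = static_cast<uint8_t>(b[i] - a[i]);  // wraps mod 256
    return d;
}

// Applying a diff is just the wrapping addition.
void apply_diff(std::vector<uint8_t>& img, const std::vector<uint8_t>& d) {
    for (size_t i = 0; i < img.size(); ++i)
        img[i] = static_cast<uint8_t>(img[i] + d[i]);
}

// Two changesets conflict if any pixel (three consecutive channels here)
// was modified in both diffs.
bool diffs_conflict(const std::vector<uint8_t>& d1,
                    const std::vector<uint8_t>& d2) {
    for (size_t p = 0; p + 2 < d1.size(); p += 3) {
        const bool in1 = (d1[p] | d1[p + 1] | d1[p + 2]) != 0;
        const bool in2 = (d2[p] | d2[p + 1] | d2[p + 2]) != 0;
        if (in1 && in2)
            return true;
    }
    return false;
}
```

A real plugin would work on decoded image data (after decompressing the png/bmp/etc.), but the core diff/apply/conflict logic really is about this simple.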

(As a happy side-effect this could also enable more efficient storage of image data in the repo, since only the diffs would need to be stored. This is especially the case considering that many image compression algorithms make straight diffing far less effective.)

Clearly, manual merges would be necessary at times, probably more often than with code files, especially if you consider common image-wide changes like color balance, levels, resizing, etc. But in theory a plugin could get pretty sophisticated about detecting and merging certain classes of differences.

Of course, images are a fairly simple example, and are also an example that is not so critical. It is not hugely painful for artists to manually merge flat image files if need be, even with a traditional revision control system.

The really interesting bits would be more complex file types. 3D animation files, for example. And that is where the real benefits would come in. You can imagine one person working on one part of an animation, and someone else working on another part, and as long as there are no conflicts, it merges automatically.

But what happens in the case of a conflict? With software-specific files, how could the user do a manual merge? It would be extremely painful, and in many cases not even practical or feasible. So that is kind of an open question, there. But I would hope that if such a DRCS were created, software developers of these other applications would start to be motivated to include a "merge" mode or some such thing that would highlight changes with some reasonable level of granularity, and allow users to compare and cherry-pick those changes.

The cool thing about a system like this would be that text files and source files would just become a special case. They are just another file type. In fact, a system like this would have broader implications than just content creation, because there is no reason for it to be limited to media-oriented file types.