Cessen's Ramblings

2011-05-26

DRCS For Content Creation

I love distributed revision control systems (DRCS). Git, Mercurial, Bazaar... fantastic things. And I use DRCS's all the time for code. They have a lot of benefits over centralized models.

Unfortunately, they are not quite so nice for content creation. And there are two main reasons for this:

  1. Media files are often large. A whole project of media files is typically huge. And a bunch of revisions of a project of media files is enormous. And that enormity takes a freakin' long time to transfer over a network. All of the DRCS's that I am aware of force you to get a copy of the entire repo (except for Git, but the result of a partial clone is a repo that can't push/pull). This is pretty much a deal breaker.
  2. Changes to media files cannot be merged. In a distributed system this is particularly problematic. Most RCS's that are used for media (e.g. Perforce) have locking systems to prevent more than one person from editing a file at the same time, which prevents conflicts. This is not possible with DRCS's, since there is no central authority to hold the locks, and that makes the ability to merge changes critical.

Point #1 should be fairly straightforward to deal with if it is a design goal up-front.

Point #2, however, is a lot trickier, especially when you consider that there are a huge number of domain-specific and software-specific file formats out there. I wonder, however, if a plugin (or similar) system could make it workable. In that case, if you wanted to support a format, you would write a plugin that can diff, detect conflicts, and apply non-conflicting diffs for that format.
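To make the plugin idea a bit more concrete, here is a minimal sketch of what such an interface might look like. Everything in it is made up for illustration: `FormatPlugin`, `KeyValuePlugin`, and `merge` are hypothetical names, not part of any existing DRCS, and the toy "format" is just a dictionary of settings.

```python
from abc import ABC, abstractmethod


class FormatPlugin(ABC):
    """Hypothetical per-format plugin: diff, detect conflicts, apply."""

    @abstractmethod
    def diff(self, old, new):
        """Return a format-specific description of how `new` differs from `old`."""

    @abstractmethod
    def conflicts(self, diff_a, diff_b):
        """Return True if the two diffs touch the same part of the file."""

    @abstractmethod
    def apply(self, base, diff):
        """Return a new version of `base` with `diff` applied."""


class KeyValuePlugin(FormatPlugin):
    """Toy format for illustration: a "file" is a dict of settings.

    Assumption: only added or changed keys are tracked; deletions are
    ignored to keep the sketch short.
    """

    def diff(self, old, new):
        return {k: v for k, v in new.items() if k not in old or old[k] != v}

    def conflicts(self, diff_a, diff_b):
        # Any key changed on both sides counts as a conflict.
        return bool(set(diff_a) & set(diff_b))

    def apply(self, base, diff):
        merged = dict(base)
        merged.update(diff)
        return merged


def merge(plugin, base, ours, theirs):
    """Merge two edits of `base`; return None if a manual merge is needed."""
    d_ours = plugin.diff(base, ours)
    d_theirs = plugin.diff(base, theirs)
    if plugin.conflicts(d_ours, d_theirs):
        return None
    return plugin.apply(plugin.apply(base, d_ours), d_theirs)


plugin = KeyValuePlugin()
base = {"a": 1, "b": 2}
clean = merge(plugin, base, {"a": 5, "b": 2}, {"a": 1, "b": 7})
clash = merge(plugin, base, {"a": 5, "b": 2}, {"a": 6, "b": 2})
```

The revision control system itself would only ever call `merge`; all of the format-specific knowledge stays inside the plugin.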

For example, let's take the simple case of lossless 24-bit RGB bitmaps (e.g. png's, bmp's, etc.). Writing a diff utility for them would actually be pretty painless. For each pixel in the image you just subtract each of the channels of the old version from those of the new one. Simple! And to apply the diff, you just add the diff. To detect conflicts, you take the diff of each changeset against their common ancestor and compare to see if any pixels have non-zero values in both diffs. Should any conflict arise, an artist can take both versions into Photoshop or Gimp and combine the two versions however he or she likes.
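Here is a rough sketch of that pixel arithmetic in plain Python. It assumes an image has already been decoded into a flat list of `(r, g, b)` tuples and that all versions have the same dimensions; the function names are made up for this example, and a real plugin would also need to handle decoding and resized images.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def _check_same_size(a: List[Pixel], b: List[Pixel]) -> None:
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")


def diff_image(old: List[Pixel], new: List[Pixel]) -> List[Pixel]:
    """Per-channel difference, new minus old (values range from -255 to 255)."""
    _check_same_size(old, new)
    return [tuple(n - o for o, n in zip(po, pn)) for po, pn in zip(old, new)]


def apply_diff(base: List[Pixel], diff: List[Pixel]) -> List[Pixel]:
    """Add a diff back onto a base image."""
    _check_same_size(base, diff)
    return [tuple(b + d for b, d in zip(pb, pd)) for pb, pd in zip(base, diff)]


def conflicting_pixels(diff_a: List[Pixel], diff_b: List[Pixel]) -> List[int]:
    """Indices of pixels that are non-zero in both diffs."""
    _check_same_size(diff_a, diff_b)
    return [
        i
        for i, (da, db) in enumerate(zip(diff_a, diff_b))
        if any(da) and any(db)
    ]


# Two artists edit different pixels of the same three-pixel image.
base = [(0, 0, 0), (10, 10, 10), (255, 255, 255)]
ours = [(5, 0, 0), (10, 10, 10), (255, 255, 255)]
theirs = [(0, 0, 0), (10, 10, 10), (250, 255, 255)]

d_ours = diff_image(base, ours)
d_theirs = diff_image(base, theirs)
conflicts = conflicting_pixels(d_ours, d_theirs)
if not conflicts:
    merged = apply_diff(apply_diff(base, d_ours), d_theirs)
```

Since the two diffs never touch the same pixel, the order in which they are applied does not matter.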

(As a happy side-effect this could also enable more efficient storage of image data in the repo, since only the diffs would need to be stored. This is especially valuable considering that many image compression algorithms make straight diffing of the compressed files far less effective.)

Clearly, manual merges would be necessary at times, probably more often than with code files, especially if you consider common image-wide changes like color balance, levels, resizing, etc. But in theory a plugin could get pretty sophisticated about detecting and merging certain classes of differences.

Of course, images are a fairly simple example, and are also an example that is not so critical. It is not hugely painful for artists to manually merge flat image files if need be, even with a traditional revision control system.

The really interesting bits would be more complex file types. 3D animation files, for example. And that is where the real benefits would come in. You can imagine one person working on one part of an animation, and someone else working on another part, and as long as there are no conflicts, it merges automatically.
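As a very rough illustration of how that might work, suppose an animation could be boiled down to a table of keyframes, keyed by `(object, channel, frame)`. That is a big simplifying assumption (real 3D files hold far more than keyframes), and `merge_animation` is a made-up name, but the three-way merge itself is short:

```python
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str, int]  # (object name, channel, frame number)
Anim = Dict[Key, float]

_MISSING = object()  # marks a keyframe that does not exist in a version


def changed_keys(base: Anim, edited: Anim) -> Set[Key]:
    """Keyframes that were added, removed, or given a new value."""
    keys = set(base) | set(edited)
    return {k for k in keys if base.get(k, _MISSING) != edited.get(k, _MISSING)}


def merge_animation(base: Anim, ours: Anim, theirs: Anim) -> Tuple[Anim, List[Key]]:
    """Three-way merge; conflicting keyframes keep their base value."""
    ours_changed = changed_keys(base, ours)
    theirs_changed = changed_keys(base, theirs)
    # A conflict is a keyframe both sides changed to different results.
    conflicts = {
        k
        for k in ours_changed & theirs_changed
        if ours.get(k, _MISSING) != theirs.get(k, _MISSING)
    }
    merged = dict(base)
    for source, keys in ((ours, ours_changed), (theirs, theirs_changed)):
        for k in keys - conflicts:
            if k in source:
                merged[k] = source[k]
            else:
                merged.pop(k, None)  # the keyframe was deleted on this side
    return merged, sorted(conflicts)


base = {("arm", "rot_x", 1): 0.0, ("arm", "rot_x", 24): 90.0, ("leg", "rot_x", 1): 0.0}

ours = dict(base)
ours[("arm", "rot_x", 24)] = 45.0    # one animator retimes the arm

theirs = dict(base)
theirs[("leg", "rot_x", 12)] = 30.0  # another adds a key on the leg

merged, conflicts = merge_animation(base, ours, theirs)
```

Unlike the pixel rule above, this sketch treats identical changes made by both people as agreement rather than as a conflict. Whether that is the right call would be up to each plugin.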

But what happens in the case of a conflict? With software-specific files, how could the user do a manual merge? It would be extremely painful, and in many cases not even practical or feasible. So that is kind of an open question. But I would hope that if such a DRCS were created, the developers of these other applications would start to be motivated to include a "merge" mode or some such thing that would highlight changes with some reasonable level of granularity, and allow users to compare and cherry-pick those changes.

The cool thing about a system like this would be that text files and source files would just become a special case. They are just another file type. In fact, a system like this would have broader implications than just content creation, because there is no reason for it to be limited to media-oriented file types.