Ayende @ Rahien

Refunds available at head office

RavenFS & Rsync

One of the things that I considered when starting to design RavenFS is “Do we really need this? Can’t we just use rsync?”

There were several reasons why I decided to go ahead with RavenFS. The first was that rsync isn’t really used on the Windows platform, for various reasons that I won’t get into.

The second is that rsync is meant to be a general purpose file synchronization tool. RavenFS is meant to be more than that.

More specifically, RavenFS is aimed specifically at distribution and synchronization of large files (tens of MB are considered small, hundreds of MB are common and multiple GBs are frequently used). It turns out that very large files mean a whole different kettle of fish. I am going to reference the thesis project of Andrew Tridgell quite frequently for the rest of this post (and you can find the link at the bottom of this post). Andrew is the author of rsync, so he probably knows a thing or two about synchronization problems.

In particular, he had thought about and discarded my approach for synchronizing files in his thesis:

This algorithm is very simple and meets some of the aims of our remote update algorithm, but it is useless in practice
The problem with it is that A can only find matches that are on block boundaries. If the file on A is the same as B except that one byte has been inserted at the start of the file then no block matches will be found and the algorithm will transfer the whole file.

This is absolutely correct. My approach, by design, suffer from the “inserting a single byte at beginning of file”. Why do I take such a stupid approach?

Put simply, because anything else is too expensive. The rsync approach is to compute a hash at byte boundaries, it gets away with that with having multiple hash functions, one of which is very cheap to compute (with higher number of collisions possible) and another that is more expensive to compute but has far lower probability of collisions.

Now, go up and look at the description of RavenFS. It is meant for very large files. Are you really going to perform an operation on every single byte on the file when the file is 2 GB in size?

This is where a very interesting property of file systems come into place. Let us assume that you have the following file, shown as a sequence of bytes:

[1,2,3,4]

And let us say that we want to add the byte 0 at the beginning of the file. For now, we will assume buffer size of 1, we will have to issue the following commands to the file system to do so:

  • Write 0 to position 1
  • Write 1 to position 2
  • Write 2 to position 3
  • Write 3 to position 4
  • Write 4 to position 5 <—increase the file size

In other words, inserting a value (vs modifying a value) is an O( File.Length – File.Position). In other words, the closer to the beginning of the file you are, the more expensive it is to insert a new value.

Ponder that while considering the cost of doing something like that on files that are big. Let us assume a more reasonable file size of 4,096 (the .NET default), and we realize that inserting a value into the beginning of a 500 MB file would require 128 thousand file system operations.

In practice, this isn’t a real issue because most of the time, when we are talking about such large files, we are talking about either files that are using fixed sized records ( no inserting whatsoever ) or appending data to the end of the file. That is mostly because there really isn’t any other choice for them, in order to preserve reasonable performance.

RavenFS is meant to take advantage on this aspect to make the “useless in practice” option a viable option.

On a separate issue, this quote made me laugh:

The lack of such a routine was quite a surprise and prompted me to look into the whole issue of parallel sorting, totally ignoring the fact that I only wanted a parallel sorting routine in order to solve a problem that didn’t really require sorting at all.

~ Andrew Tridgell http://samba.org/~tridge/phd_thesis.pdf

Comments

tobi
05/01/2011 01:34 PM by
tobi

I implemented a backup tool that works like rsync for fun myself. I noticed only in hindsight that either the whole file changes entirely (docx, zip, avi, ... no partial changes) or it changes in blocks (database, truecrypt). Thats for exactly the reason you gave - it is to expensive for the application itself to do it otherwise.

You pragmatism is what makes this blog interesting. You just jettisoned a part of the problem that is of high cost and of low value by not allowing insertions. Its good enough.

Bogdan nedelcu
05/01/2011 05:43 PM by
Bogdan nedelcu

I belive you considered using torrent server. Why aren't torents an option?

Ayende Rahien
05/02/2011 06:26 AM by
Ayende Rahien

John,

Yes, I have.

We had customers complain about issues with those.

I suspect that it had to do with the complexity involving security, domains, configuration, etc.

vigj
05/02/2011 07:05 AM by
vigj

I would pay for a tool like this...but has to be simple to use and efficient!

Scooletz
05/02/2011 10:36 AM by
Scooletz

Will the RavenFS be used as a synchronization tool for some db with an append only model? It looks like a perfect match.

Ayende Rahien
05/02/2011 11:26 AM by
Ayende Rahien

Schooletz,

I haven't considered this, but it certainly can, in a one way fashion.

Alex K
05/02/2011 06:49 PM by
Alex K

I think that with all the press that scalability is getting with all the various DBs and file systems from Amazon, Facebook, Google, MongoDB, etc. that this issue is not a solved one. So either they do have a solution and aren't sharing, or it's so hard that they can't solve it well enough for general case. Watching a one man team beat them would be fun, if nothing else :)

Davidsleeps
05/03/2011 01:02 AM by
Davidsleeps

Are you considering using it to create, sorry for the crappy term but a dropbox-killer?

Steve Py
05/03/2011 08:58 AM by
Steve Py

I guess it would depend on closely analyzing the pattern of modification of typical files the application would be serving. So long as files that use header information have a relatively static sized header (so that changes to the file size don't mean"early" page sizes are invalidated.)

The technology also lends itself to peer-to-peer updates where clients can query not only the master server for the most current pages, but each other as well to take advantage of available bandwidth.

Tapio Kulmala
05/06/2011 06:45 PM by
Tapio Kulmala

Have you considered "deduplication". It's an interesting idea and It could speed up synchronization of some files. It won't help with video files unless the FS or the receiving system already has a copy of the file somewhere else.

highscalability.com/.../...ical-deduplication.html

Ayende Rahien
05/07/2011 08:50 AM by
Ayende Rahien

Tapio,

Yes, of course. Please see the previous post about this topic

Comments have been closed on this topic.