Get thou out of my head, damn idea

Oct 08 2012

Sometimes I get ideas, and they just won’t leave my head no matter what I do.

In this case, I wanted to see what it would take to write a fully managed implementation of an event store.

I am not really interested in the event store itself; I care a lot more about the implementation idea that I had (I/O queues in append-only mode, if you care to know).

After giving it some thought, I managed to create a version that allows me to write the following code:

    var diskData = new OnDiskData(new FileStreamSource(), "Data");

    var data = JObject.Parse("{'Type': 'ItemCreated', 'ItemId': '1324'}");
    var sp = Stopwatch.StartNew();
    // 10,000 streams x 1,000 events each = 10 million events
    Parallel.For(0, 1000*10, i =>
        {
            var tasks = new Task[1000];
            for (int j = 0; j < 1000; j++)
            {
                tasks[j] = diskData.Enqueue("users/" + i, data);
            }
            // Block until every event for this stream has been written
            Task.WaitAll(tasks);
        });

    Console.WriteLine(sp.ElapsedMilliseconds);

Admittedly, it isn’t really interesting client code, but it is plenty good enough for what I need, and it allowed me to check something really interesting: just how far would I have to go to get really good performance? As it turned out, not that far.

This code writes 10 million events, and it does so in under 1 minute (on my laptop, with an SSD drive). Just to give you some idea, that is > 600 MB of events, and about 230 events per millisecond, or about 230 thousand events per second. Yes, that is 230,000 events / sec.

The limiting factor seems to be the disk, and I have some ideas on how to improve that. I am still getting only roughly 12 MB/s, so there is certainly room for improvement.
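As a rough sanity check, the rounded figures above can be played against each other. This is only a back-of-the-envelope sketch derived from the quoted numbers, not a measurement: 10 million events at 230,000 events/sec works out to about 43 seconds, about 60 bytes per event, and something in the same ballpark as the disk throughput quoted above.

```csharp
using System;

// Rounded figures quoted in the post; nothing here is measured.
const double events = 10_000_000;       // 10,000 streams x 1,000 events
const double eventsPerSecond = 230_000; // "about 230 thousand events per second"
const double totalBytes = 600e6;        // "> 600 MB", so this is a lower bound

double elapsedSeconds = events / eventsPerSecond;
double bytesPerEvent = totalBytes / events;
double megabytesPerSecond = totalBytes / 1e6 / elapsedSeconds;

Console.WriteLine($"elapsed:    ~{elapsedSeconds:F1} s");
Console.WriteLine($"event size: ~{bytesPerEvent:F0} bytes");
Console.WriteLine($"throughput: ~{megabytesPerSecond:F1} MB/s");
```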

How does this work? Here is the implementation of the Enqueue method:

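    public Task Enqueue(string id, JObject data)
    {
        var item = new WriteState
            {
                Data = data,
                Id = id
            };

        // Hand the item to the single writer thread
        writer.Enqueue(item);
        // Signal the writer thread that there is work to do
        hasItems.Set();
        // Completed only after the batch containing this item is flushed to disk
        return item.TaskCompletionSource.Task;
    }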

In other words, this is a classic producer/consumer problem.

The other side is reading the events from the queue and writing them to disk. There is just one thread doing that, and it is always appending to the end of the file. Moreover, because of the way it works, we gain the ability to batch a lot of events together into a stream of really nice IO calls that optimize the actual disk access. Only after we have finished a batch of items and flushed them to disk do we complete their tasks, so the fun part is that, for all intents and purposes, we preserve the transactional guarantees of the system. Once the task returned by Enqueue completes, we can be sure that the data is fully saved on disk.
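To make the consumer side concrete, here is a minimal sketch of what such a writer loop could look like. It is not the actual code from the spike. The AppendOnlyLog class name, the string payload (instead of JObject), the tab-separated line format, and the choice of ConcurrentQueue and ManualResetEventSlim for writer and hasItems are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Usage: enqueue 100 events, then wait until all of them are on disk.
var logPath = Path.Combine(Path.GetTempPath(), "events-sketch.log");
File.Delete(logPath);
using (var eventLog = new AppendOnlyLog(logPath))
{
    var pending = new Task[100];
    for (int i = 0; i < pending.Length; i++)
        pending[i] = eventLog.Enqueue("users/" + (i % 10), "{\"Type\":\"ItemCreated\"}");
    Task.WaitAll(pending); // returns only after the events were flushed
}
Console.WriteLine(File.ReadAllLines(logPath).Length); // prints 100

// Hypothetical stand-in for the post's WriteState (string payload, not JObject).
sealed class WriteState
{
    public string Id = "";
    public string Data = "";
    public readonly TaskCompletionSource<bool> TaskCompletionSource =
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
}

// Hypothetical name; a single background thread owns the file and only appends.
sealed class AppendOnlyLog : IDisposable
{
    private readonly ConcurrentQueue<WriteState> writer = new ConcurrentQueue<WriteState>();
    private readonly ManualResetEventSlim hasItems = new ManualResetEventSlim(false);
    private readonly FileStream file;
    private readonly Thread thread;
    private volatile bool disposed;

    public AppendOnlyLog(string path)
    {
        file = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.Read);
        thread = new Thread(WriteLoop) { IsBackground = true };
        thread.Start();
    }

    public Task Enqueue(string id, string data)
    {
        var item = new WriteState { Id = id, Data = data };
        writer.Enqueue(item);
        hasItems.Set();
        return item.TaskCompletionSource.Task;
    }

    private void WriteLoop()
    {
        var batch = new List<WriteState>();
        while (true)
        {
            hasItems.Wait();
            hasItems.Reset();

            // Drain everything queued so far into one batch of appends.
            while (writer.TryDequeue(out var item))
            {
                var bytes = Encoding.UTF8.GetBytes(item.Id + "\t" + item.Data + "\n");
                file.Write(bytes, 0, bytes.Length);
                batch.Add(item);
            }

            if (batch.Count > 0)
            {
                file.Flush(true); // one flush to disk for the whole batch
                // Only now are the callers told that their events are saved.
                foreach (var done in batch)
                    done.TaskCompletionSource.SetResult(true);
                batch.Clear();
            }

            if (disposed && writer.IsEmpty)
                return;
        }
    }

    public void Dispose()
    {
        disposed = true;
        hasItems.Set(); // wake the writer so it can drain and exit
        thread.Join();
        file.Dispose();
        hasItems.Dispose();
    }
}
```

The point of the design is that the cost of a flush is shared by every event that piled up while the previous flush was running, so the busier the producers are, the bigger the batches get. The sketch ignores error handling (a failed write should fault the pending tasks) and does not guard against calling Enqueue after Dispose.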

That was an interesting spike, and I wonder where else I would be able to make use of something like this in the future.

Yes, those are pretty small events, and yes, that is a fake test, but the approach seems to be very solid.

And just for fun, with absolutely no optimizations whatsoever, no caching, no nothing, I am able to load 1,000 events per stream in less than 10 ms.

Oren Eini

CEO of RavenDB