Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

time to read 4 min | 785 words

This can be a very short post, just: See CAP. Unfortunately, we have a lot of people who actually have experience in using distributed transactions, and have a good reason to believe that they work. The answer is that yes, they do, as long as you don’t run into some of the interesting edge cases.

By the way, that is not a new observation, see The Two Generals.

Allow me to demonstrate. Assume that we have a distributed system with the following actors:

[Diagram: Worker #1 and Worker #2, each with its own queue and transaction coordinator, talking to a shared database]

This is a fairly typical setup. You have a worker that pulls messages from a queue and reads from / writes to a database based on those messages. To coordinate between them, it uses a transaction coordinator such as MSDTC.

Transaction coordinators use a two-phase commit protocol (or sometimes a three-phase commit protocol) to ensure that either all the data is committed, or none of it is.

The general logic goes like this:

  • For each participant in the transaction, send a prepare message.
    • If the participant answered “prepared”, it is guaranteeing that the transaction can be committed.
  • If any of the participants failed on prepare, abort the whole transaction.
  • If all of the participants successfully prepared, send the commit message to all of them.

The actual details are a bit more involved, obviously, but that is pretty much it.
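
To make the flow concrete, here is a minimal sketch of the coordinator's decision logic. The IParticipant interface and its method names are hypothetical, for illustration only; a real coordinator such as MSDTC also writes its decisions to a durable log and has a recovery protocol on top of this:

using System.Collections.Generic;

// Hypothetical participant contract, for illustration only
public interface IParticipant
{
    bool Prepare();   // true means: "I guarantee I can commit this transaction"
    void Commit();
    void Rollback();
}

public static class TwoPhaseCommit
{
    public static bool TryCommit(IList<IParticipant> participants)
    {
        var prepared = new List<IParticipant>();

        // Phase 1: ask every participant to prepare
        foreach (var participant in participants)
        {
            if (!participant.Prepare())
            {
                // Any failure during prepare aborts the whole transaction
                foreach (var p in prepared)
                    p.Rollback();
                return false;
            }
            prepared.Add(participant);
        }

        // Phase 2: everyone is prepared, so the decision is to commit.
        // If we lose a participant between here and the end of the loop,
        // some participants have committed while others are stuck in the
        // prepared state - which is exactly the scenario explored next.
        foreach (var participant in prepared)
            participant.Commit();

        return true;
    }
}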

Now, let us take a look at an interesting scenario. Worker #1 is pulling (in a distributed transaction) a message from the queue, and based on that message, it modifies the database. Then it tells the transaction coordinator that it can commit the transaction. At this point, the TC sends the prepare message to the database and the queue. Both reply that they have successfully prepared the transaction to be committed. The TC sends a commit message to the queue, completing the transaction from its point of view. However, at this point, it is unable to communicate with the database.

What happens now?

Before I answer that, let us look at another such scenario. The TC needs to commit a transaction; it sends a prepare message to the database and receives a successful reply. However, before it manages to send a prepare message to the queue, the TC itself becomes unavailable.

Note that from the point of view of the database, the situation looks exactly the same. It got (and successfully replied to) a Prepare message, then it didn’t hear back from the transaction coordinator afterward.

Now, that is where it gets interesting. In an abstract world, we could just hold the pending transaction until the connection with the coordinator is restored, and we would actually get a “commit / abort” notification.

But we aren’t in an abstract world. When we have such a scenario, we are actually locking records in the database (because they are in the process of being modified). What happens when another client comes to us and wants to modify the same record?

For example, it is quite common to host the business logic, queue and transaction coordinator on the same worker instance, while the database is on a separate machine. That means that in the image above, if Worker #1 isn’t available, we recover by directing all the users to the 2nd worker. However, at that point, we have a transaction that was prepared, but not committed.

When the user continues to make requests to our system, the 2nd worker, which has its own queue and transaction coordinator, is going to try to access the user’s record. The same user whose record is currently locked because of the ongoing transaction.

If we just let it hang in this manner, we have essentially created a situation where the user’s data becomes utterly unavailable (at least for writes). In order to resolve that, transactions come with a timeout, so after the timeout has expired, we can roll back that transaction. Of course, that leads to a very interesting situation.

Let us go back to the first scenario we explored. In this scenario, the queue got both Prepare & Commit messages, while the database got just a Prepare message. The timeout has expired, and the database has rolled back the transaction.  In other words, as far as the queue is concerned, the transaction committed, and the message is gone. As far as the database is concerned, that transaction was rolled back, and never happened.

Of course, the chance that something like that can happen in one of your systems? Probably one in a million.

time to read 4 min | 683 words

I have an interesting situation that I am not sure how to resolve. We need to record the last request time for a RavenDB database. Now, this last request time is mostly used to show the user, and to decide when a database is idle and can be shut down.

As such, it doesn’t have to be really accurate (a skew of even a few seconds is just fine). However, under load, there are many requests coming in (in the case presented to us, 4,000 concurrent requests), and they all need to update the last request time.

Obviously, in such a scenario, we don’t really care about the actual value. But the question is, how do we deal with that? In particular, I want to avoid a situation where we do a lot of writes to the same value in an unprotected manner, mostly because it is likely to cause contention between cores.

Any ideas?

It is actually fine for us to go slightly back (so thread A at time T+1 and thread B at time T+2 running concurrently, and the end result is T+1), which is why I said that a race is fine for us. But what I want to avoid is any form of locking / contention.

I wrote the following test code:

using System;
using System.Collections.Generic;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        var threads = new List<Thread>();

        var holder = new Holder();

        var mre = new ManualResetEvent(false);

        for (int i = 0; i < 2500; i++)
        {
            var thread = new Thread(delegate()
            {
                mre.WaitOne();
                for (long j = 0; j < 500*1000; j++)
                {
                    // Deliberately unsynchronized write; a race here is acceptable
                    holder.Now = j;
                }
            });
            thread.Start();
            threads.Add(thread);
        }

        // Release all the threads at once, to maximize contention
        mre.Set();

        threads.ForEach(t => t.Join());

        Console.WriteLine(holder.Now);
    }
}

public class Holder
{
    public long Now;
}

And it looks like it is doing what I want it to. This creates a lot of contention on the same value, but it is also giving me the right value. And again, the value of right here is very approximate. The problem is that while I know how to write thread safe code, I’m not sure if this is a good way to go about doing this.

Note that this code (yes, even with 2,500 threads) runs quite fast, in under a second. Trying to use Interlocked.Exchange is drastically more expensive, and Interlocked.CompareExchange is even worse.
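
For comparison, here is a minimal sketch of the Interlocked variants; the unconditional exchange is what the comparison above refers to, while the monotonic CompareExchange loop is just one plausible way to write it, not necessarily the exact code I measured:

using System.Threading;

public static class LastRequestTime
{
    // Unconditional atomic write - measurably slower under this kind of contention
    public static void Set(Holder holder, long now)
    {
        Interlocked.Exchange(ref holder.Now, now);
    }

    // A CompareExchange loop that never moves the value backwards - slower
    // still, because contended threads have to retry
    public static void SetMonotonic(Holder holder, long now)
    {
        while (true)
        {
            long current = Interlocked.Read(ref holder.Now);
            if (now <= current)
                return;
            if (Interlocked.CompareExchange(ref holder.Now, now, current) == current)
                return;
        }
    }
}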

But it is just not sitting well with me.

time to read 2 min | 249 words

In my previous post, I did some rough “benchmarks” to see how message passing options behave. I got some great comments, and I thought I’d expand on that.

The baseline for this was a blocking queue, using which we managed to get:

145,271,000 msgs in 00:00:10.4597977 for 13,888,510 ops/sec

And the async BufferBlock, using which we got:

43,268,149 msgs in 00:00:10 for 4,326,815 ops/sec.

Using LMAX Disruptor we got a disappointing:

29,791,996 msgs in 00:00:10.0003334 for 2,979,100 ops/sec

However, it was pointed out that I can significantly improve this if I changed the code to be:

var disruptor = new Disruptor.Dsl.Disruptor<Holder>(() => new Holder(), new SingleThreadedClaimStrategy(256), new YieldingWaitStrategy(), TaskScheduler.Default);

After which we get a very nice:

141,501,999 msgs in 00:00:10.0000051 for 14,150,193 ops/sec

Another request I got was for testing this with a concurrent queue, which is actually what it is meant for. The code is the same as the blocking queue test; we just changed Bus<string> to ConcurrentQueue<string>, as in the sketch below.
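
A minimal sketch of that variant, reusing the same timing harness as the original Bus<string> benchmark:

using System;
using System.Collections.Concurrent;
using System.Diagnostics;

public static class ConcurrentQueueBench
{
    public static void Send(ConcurrentQueue<string> bus)
    {
        var sp = Stopwatch.StartNew();
        while (sp.Elapsed.TotalSeconds < 10)
        {
            for (int i = 0; i < 1000; i++)
            {
                bus.Enqueue("test");
            }
        }
    }

    public static void Read(ConcurrentQueue<string> bus)
    {
        int count = 0;
        var sp = Stopwatch.StartNew();
        while (sp.Elapsed.TotalSeconds < 10)
        {
            string msg;
            while (bus.TryDequeue(out msg))
            {
                count++;
            }
        }
        sp.Stop();

        Console.WriteLine("{0:#,#;;0} msgs in {1} for {2:#,#} ops/sec",
            count, sp.Elapsed, count / sp.Elapsed.TotalSeconds);
    }
}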
Using that, we got:

170,726,000 msgs in 00:00:10.0000042 for 17,072,593 ops/sec

And yes, this is pretty much just because I could. Any of those methods is quite significantly faster than anything close to what I actually need.
time to read 7 min | 1324 words

I got some replies about the async event loop post, mentioning LMAX Disruptor and performance. I decided to see for myself what the fuss was all about.

You can read about the LMAX Disruptor, but basically, it is a very fast single process messaging library.

I wondered what that meant, so I wrote my own messaging library:

using System.Collections.Generic;

public class Bus<T>
{
    Queue<T> q = new Queue<T>();

    public void Enqueue(T msg)
    {
        lock (q)
        {
            q.Enqueue(msg);
        }
    }

    public bool TryDequeue(out T msg)
    {
        lock (q)
        {
            if (q.Count == 0)
            {
                msg = default(T);
                return false;
            }
            msg = q.Dequeue();
            return true;
        }
    }
}

I think that you’ll agree that this is a thing of beauty and elegant coding. I then tested this with the following code:

public static void Read(Bus<string> bus)
{
    int count = 0;
    var sp = Stopwatch.StartNew();
    while (sp.Elapsed.TotalSeconds < 10)
    {
        string msg;
        while (bus.TryDequeue(out msg))
        {
            count++;
        }
    }
    sp.Stop();

    Console.WriteLine("{0:#,#;;0} msgs in {1} for {2:#,#} ops/sec", count, sp.Elapsed, (count / sp.Elapsed.TotalSeconds));
}

public static void Send(Bus<string> bus)
{
    var sp = Stopwatch.StartNew();
    while (sp.Elapsed.TotalSeconds < 10)
    {
        for (int i = 0; i < 1000; i++)
        {
            bus.Enqueue("test");
        }
    }
}

var bus = new Bus<string>();

ThreadPool.QueueUserWorkItem(state => Send(bus));

ThreadPool.QueueUserWorkItem(state => Read(bus));

The result of this code?

145,271,000 msgs in 00:00:10.4597977 for 13,888,510 ops/sec

Now, what happens when we use the DataFlow’s BufferBlock as the bus?

public static async Task ReadAsync(BufferBlock<string> bus)
{
    int count = 0;
    var sp = Stopwatch.StartNew();
    while (sp.Elapsed.TotalSeconds < 10)
    {
        try
        {
            await bus.ReceiveAsync(TimeSpan.FromMilliseconds(5));
            count++;
        }
        catch (TimeoutException)
        {
            // No message arrived within the 5ms window; keep polling
        }
        catch (TaskCanceledException)
        {
        }
    }
    sp.Stop();

    Console.WriteLine("{0:#,#;;0} msgs in {1} for {2:#,#} ops/sec", count, sp.Elapsed, (count / sp.Elapsed.TotalSeconds));
}

public static async Task SendAsync(BufferBlock<string> bus)
{
    var sp = Stopwatch.StartNew();
    while (sp.Elapsed.TotalSeconds < 10)
    {
        for (int i = 0; i < 1000; i++)
        {
            await bus.SendAsync("test");
        }
    }
}

What we get is:

43,268,149 msgs in 00:00:10 for 4,326,815 ops/sec.

I then decided to check what happens with the .NET port of the LMAX Disruptor. Here is the code:

public class Holder
{
public string Val;
}

internal class CounterHandler : IEventHandler<Holder>
{
public int Count;
public void OnNext(Holder data, long sequence, bool endOfBatch)
{
Count++;
}
}

static void Main(string[] args)
{
var disruptor = new Disruptor.Dsl.Disruptor<Holder>(() => new Holder(), 1024, TaskScheduler.Default);
var counterHandler = new CounterHandler();
disruptor.HandleEventsWith(counterHandler);

var ringBuffer = disruptor.Start();


var sp = Stopwatch.StartNew();
while (sp.Elapsed.TotalSeconds < 10)
{
for (var i = 0; i < 1000; i++)
{
long sequenceNo = ringBuffer.Next();

ringBuffer[sequenceNo].Val = "test";

ringBuffer.Publish(sequenceNo);
}
}
Console.WriteLine("{0:#,#;;0} msgs in {1} for {2:#,#} ops/sec", counterHandler.Count, sp.Elapsed, (counterHandler.Count / sp.Elapsed.TotalSeconds));
}

And the resulting performance is:

29,791,996 msgs in 00:00:10.0003334 for 2,979,100 ops/sec

Now, I’ll be the first to agree that this is really and absolutely not even close to being a fair benchmark. It is testing wildly different things. Disruptor is using a ring buffer, the BufferBlock isn’t, and the original Bus implementation just used an unbounded queue.

But that is a very telling benchmark as well, pretty much because it doesn’t matter. What I need this for is network protocol handling. As such, even assuming that every single byte is a message, we would have to go far beyond what any reasonable pipe can be expected to handle.

time to read 4 min | 656 words

I’m designing a new component, and I want to reduce the amount of complexity involved in dealing with it. This is a networked component, and after designing several such components, I wanted to remove one area of complexity: the use of explicitly concurrent code. Because of that, I decided to go with the following architecture:

[Diagram: network code reading messages into an in-memory queue, consumed by a single threaded event loop]

The network code just reads messages from the network and puts them in an in-memory queue. Then we have a single threaded event loop that simply goes over the queue and processes those messages.

All of the code that is actually processing messages is single threaded, which makes it oh so much easier to work with.

Now, I can do this quite easily with a BlockingCollection<T>, which is how I have usually done this sort of thing so far. It is simple, robust and easy to understand. It also ties down a full thread for the event loop, which can be a shame if you don’t get a lot of messages.
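
For reference, a minimal sketch of the BlockingCollection approach; ProcessMessage here is a stand-in for the real handler, just like in the async version below:

using System.Collections.Concurrent;
using System.Threading;

var queue = new BlockingCollection<object>();

// This loop ties down a dedicated thread for its entire lifetime
var eventLoop = new Thread(() =>
{
    // GetConsumingEnumerable blocks until a message arrives, and the
    // loop ends when queue.CompleteAdding() is called
    foreach (var msg in queue.GetConsumingEnumerable())
    {
        ProcessMessage(msg);
    }
});
eventLoop.Start();

// The network code simply calls queue.Add(msg) for each incoming message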

So I decided to experiment with async approaches. In particular, using the BufferBlock<T> from the DataFlow assemblies.

I came up with the following code:

var cts = new CancellationTokenSource();

var q = new BufferBlock<int>(new DataflowBlockOptions
{
    CancellationToken = cts.Token,
});

This just creates the buffer block, but the nice thing here is that I can set up a “global” cancellation token for all operations on it. The problem is that this actually generates bad exceptions (InvalidOperationException, instead of TaskCanceledException). Well, I’m not sure if bad is the right term, but it isn’t the one I would expect here, at least. If you pass a cancellation token directly to the method, you get the behavior I expected.

At any rate, the code for the event loop now looks like this:

private static async Task EventLoop(BufferBlock<object> bufferBlock, CancellationToken cancellationToken)
{
    while (true)
    {
        object msg;
        try
        {
            msg = await bufferBlock.ReceiveAsync(TimeSpan.FromSeconds(3), cancellationToken);
        }
        catch (TimeoutException)
        {
            NoMessagesInTimeout();
            continue;
        }
        catch (Exception)
        {
            // Cancellation (or a completed / faulted block) ends the loop
            break;
        }
        ProcessMessage(msg);
    }
}

And that is pretty much it. We have a good way to handle timeouts, and processing messages, and we don’t take up a thread. We can also be easily cancelled. I still need to run this through a lot more testing, in particular, to verify that this doesn’t cause issues when we need to debug this sort of system, but it looks promising.

RavenDB Events

time to read 1 min | 180 words

In the following months, there are going to be quite a few RavenDB Events, and I have been remiss in talking about them.

  • The Triangle RavenDB User Group is meeting in Raleigh. Jul 30, you can register here.
  • Mauro is going to be giving a 3-day course on RavenDB in London. Aug 11 – Aug 13, you can register here.
  • The Arizona RavenDB User Group is meeting in Scottsdale. Aug 12, you can register here.
  • We are going to be giving 2 full-day events in Sweden. Sep 18 – Sep 19, you can register here.
  • I’m also going to be speaking at NSB Conf in New York City. Sep 29 – Sep 30, you can register here.
  • RavenDB in Action should come out in October. You can purchase the early access copy here.
  • We are going to show up at Oredev with a lot of exciting stuff. Nov 4 – Nov 7, you can register here.

We also have a couple of surprises, but I’ll keep them for later.

time to read 2 min | 268 words

About 6 years ago, I started writing the DSLs in Boo book. It came out in 2010, and today I got an email saying that it is now officially out of print. It was never a hugely popular book, so I’m not really surprised, but it really got me thinking.

I got to build several DSLs for production during the time I was writing this book, but afterward, I pretty much pivoted hard to RavenDB, and didn’t do much with DSLs since. However, the knowledge acquired during the writing of this book has actually been quite helpful when writing RavenDB itself.

I’m not talking about the design aspects of writing a DSL, or the business decisions that are involved with that, although that is certainly a factor. I’m talking about the actual technical details of working with a language, a parser, etc.

In fact, you probably won’t see that, but RavenDB indexes and transformers are actually DSLs, and they use a lot of the techniques that I talk about in the book. We start with something that looks like C# code, but what ends up running is actually something far different. The Linq provider, too, relies heavily on those same techniques. We show you one thing but actually do something quite different under the covers.

It is interesting to see how the actual design of RavenDB was influenced by my own history and the choices I made in various places. If I wasn’t well versed in abusing a language, I would probably have had to go with something like CouchDB’s views, for example.

time to read 2 min | 280 words

We are getting closer & closer to a 3.0 release. And as I mentioned, we are doing a lot of UI work. I’m quite excited about that, even though I don’t think you’ll be aware of all of the changes that we are adding.

But here is one that will be very visible, the new theme for the studio. So far, we have gone with the default theme, and pushed the actual design for later. Now we have some options to consider:

Dark:

[screenshot: Dark theme]

Darkly:

[screenshot: Darkly theme]

Flatly:

[screenshot: Flatly theme]

Lumen:

[screenshot: Lumen theme]

RClassic:

[screenshot: RClassic theme]

Space lab:

[screenshot: Space lab theme]

What do you think is best?

time to read 1 min | 94 words

After the success of the RavenDB conference, we have teamed with Oredev to bring the RavenDB conference to Europe.


I’m very pleased to announce that we’ll be doing 2 full-day events, in Malmo and in Stockholm, in September. You can read all about it here.

We’re going to have talks by core team members, customer stories, and actual field experiences.

Looking forward to seeing you there…

time to read 2 min | 378 words

It is silly, but I just had a conversation with one of our developers about SQL Injection. In RavenDB we support replicating to a relational database, which obviously requires using SQL. We are doing things properly, with parameters and everything.

No chance for SQL Injection from there. Great, and that would have been the end of a very short blog post, if that was everything.

As it turned out, there is a significant performance difference between:

@p1 = 'users/1'
@p2 = 'users/2'

DELETE FROM Users WHERE Id IN (@p1,@p2)

And:

DELETE FROM Users WHERE Id IN ('users/1', 'users/2')

Enough that we added that as an option. The reason why is related to the vagaries of the database query optimizer, and isn’t really relevant here.

This is off by default, obviously. And we use parameters by choice & preference. But we still added a minimal “protection” by adding:

sqlValue.Replace("'", "''")

Considering that this isn’t meant for user input (it is for document ids), that is something that is annoying.
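
To make it concrete, here is a sketch of what generating the inlined statement might look like; the helper name and shape are hypothetical, not RavenDB’s actual code:

using System.Collections.Generic;
using System.Linq;

public static class SqlReplicationHelper
{
    // Hypothetical helper: inlines document ids as SQL string literals,
    // with the minimal quote-doubling "protection" mentioned above.
    // Assumes tableName comes from trusted configuration, not user input.
    public static string BuildDelete(string tableName, IEnumerable<string> documentIds)
    {
        var literals = documentIds.Select(id => "'" + id.Replace("'", "''") + "'");
        return "DELETE FROM " + tableName + " WHERE Id IN (" +
               string.Join(", ", literals) + ")";
    }
}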

Any suggestions on how to improve this?
