Ayende @ Rahien


RavenDB Changes API on the wire

I promised that I would talk about the actual implementation details of how RavenDB deals with changes after moving from SignalR to our own implementation.

First, let us examine the problem space. We need to be notified by the server whenever something interesting happens, and we don’t want to do active polling.

That leaves the following options:

  • TCP Connection
  • WebSockets
  • Long Polling
  • Streamed download

TCP connections won’t work here. We rely on HTTP for everything, and I like HTTP. It is easy to work with, there are great tools for it (thanks, Fiddler!), and you can debug, test, and scale it without major hurdles. Writing your own TCP socket server is a lot of fun, but debugging why something went wrong is not.

WebSockets would have been a great option, but they aren’t widely available yet, and they won’t work well without special servers, which I currently don’t have.

Long Polling is an option, but I don’t like it. It seems like a waste and I think we can do better.

Finally, we have the notion of a streamed download. This is basically the client downloading from the server, but instead of the entire response being downloaded in one go, the server sends events whenever it has something.

Given our needs, this is the solution that we chose in the end.

How it works is a tiny bit complex, so let us see if I can explain with a picture. This is the Fiddler trace that you see when running a simple subscription test:

[Image: Fiddler trace of a simple subscription test]

The very first thing that happens is that we make a request to /changes/events?id=CONNECTION_ID. The server keeps this connection open, and whenever it has something new to send to the client, it uses this connection. To get this to work, you have to turn off buffering in IIS (HttpListener doesn’t do buffering), and when running in Silverlight you have to disable read buffering. Once that is done, the client needs to read from the server asynchronously and raise an event whenever it gets a full response back.

For our purposes, we used new lines as the response marker, so we read from the stream until we get a new line, raise that event, and move on.

Now, HTTP connections are only good for one request/response. So we actually have a problem here: how do we configure this connection?
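To make the new-line framing concrete, here is a minimal sketch in Python (not the actual RavenDB client code, which is .NET). The class name `LineEventReader` and the sample payloads are made up for illustration. The point is only that chunks arrive at arbitrary boundaries, so the reader has to buffer until it sees a new line.

```python
from typing import Callable


class LineEventReader:
    """Buffers raw chunks and fires a callback for every complete line."""

    def __init__(self, on_event: Callable[[str], None]) -> None:
        self._buffer = b""
        self._on_event = on_event

    def feed(self, chunk: bytes) -> None:
        # A chunk may hold half an event or several events, so keep the
        # unfinished tail in the buffer until its new line shows up.
        self._buffer += chunk
        while b"\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\n", 1)
            text = line.decode("utf-8").strip()
            if text:  # ignore blank lines
                self._on_event(text)


# Hypothetical payloads, split at awkward places on purpose.
received = []
reader = LineEventReader(received.append)
for chunk in (b"change-1\ncha", b"nge-2\n", b"change-3"):
    reader.feed(chunk)
# "change-3" is still in the buffer because its new line has not arrived.
```

In a real client, `feed` would be called from the async read loop with whatever bytes the unbuffered response stream hands back.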

We use a separate request for that. Did you note that we have this “1/J0JP5” connection id? This is generated on the client (part always-incrementing number, part random) for each connection. The first part is a sequential id that is used strictly to help us debug things: “1st request, 2nd request” is a lot easier than J0JP5 or some guid.

We can then issue commands for this connection. In the sample above you can see the commands for watching a particular document and, finally, for stopping altogether.

This is what the events connection looks like:

[Image: the events connection, one change per line]

Each change will be a separate line.

Now, this isn’t everything, of course. We still have to deal with errors and network hiccups. We do that by aborting the events connection and retrying. On the server, we keep track of connections and of the pending messages for each connection, and if you reconnect within the timeout limit (a minute or so), you won’t miss any changes.
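The server-side bookkeeping can be sketched roughly like this. It is an illustration of the idea, not RavenDB's implementation: `ChangesRegistry`, its method names, and the 60-second default are all assumptions, and a real server would write straight to the open stream while a client is attached rather than queue everything.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, List


class ConnectionState:
    def __init__(self, now: float) -> None:
        self.pending: Deque[str] = deque()
        self.connected = False
        self.last_seen = now


class ChangesRegistry:
    """Keeps pending messages per connection id across short disconnects."""

    def __init__(self, timeout_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout_seconds
        self._clock = clock
        self._connections: Dict[str, ConnectionState] = {}

    def publish(self, connection_id: str, message: str) -> None:
        # Queue the change for a known connection, attached or not.
        state = self._connections.get(connection_id)
        if state is not None:
            state.pending.append(message)

    def drain(self, connection_id: str) -> List[str]:
        # Hand back everything queued so far and clear the queue.
        state = self._connections.get(connection_id)
        if state is None:
            return []
        messages = list(state.pending)
        state.pending.clear()
        return messages

    def connect(self, connection_id: str) -> List[str]:
        # Attach (or re-attach) a client and return what it missed.
        self._expire()
        now = self._clock()
        state = self._connections.setdefault(connection_id, ConnectionState(now))
        state.connected = True
        state.last_seen = now
        return self.drain(connection_id)

    def disconnect(self, connection_id: str) -> None:
        state = self._connections.get(connection_id)
        if state is not None:
            state.connected = False
            state.last_seen = self._clock()

    def _expire(self) -> None:
        # Forget connections that stayed away longer than the timeout.
        now = self._clock()
        stale = [cid for cid, s in self._connections.items()
                 if not s.connected and now - s.last_seen > self._timeout]
        for cid in stale:
            del self._connections[cid]
```

A client that drops and calls `connect` again with the same id inside the timeout gets the queued changes back; after the timeout it starts from a clean slate.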

If this sounds like the way SignalR works, that is no accident. I think that SignalR is awesome software, and I copied much of the design ideas off of it.


Posted By: Ayende Rahien


Comments

07/27/2012 10:18 AM by satish

HTTP streaming may need more resources if the server is holding the request, so it would be better if RavenDB released a best-practice document. It may be evil. Is it possible to switch to WebSockets if an implementation is available?

07/27/2012 10:20 AM by Ayende Rahien

Satish, Typically, you have one request per client, and it doesn't hold a great deal of resources, so I am not worried about it at all.

07/27/2012 11:13 AM by tobi

I hope the misspelling in "InitializeConnetion" doesn't go into production as you will be stuck with it like with "Referer" ;-)

07/27/2012 11:23 AM by Ayende Rahien

Tobi, What misspelling?

Oh, I see this now.

07/27/2012 12:08 PM by Olav

Interesting stuff. How do you detect dropped connections if nothing is going across? I mean how will the client know if the server has just disappeared?

07/27/2012 12:12 PM by Ayende Rahien

Olav, We have a heartbeat going on in there.

07/27/2012 12:26 PM by jonnii

This looks very similar to server-sent events in HTML5:

http://dsheiko.com/weblog/html5-and-server-sent-events/

07/27/2012 02:23 PM by Ayende Rahien

Jonnii, That is the idea, and it will be compatible with that when we release.

07/27/2012 03:28 PM by Shane

How many concurrent connections do you expect to handle, or be able to handle? I'm sure server hardware has a large impact, but on an average web server is this just background noise, or will the perf impact be noticeable? I'm speaking more in general terms for this approach and not specifically about how it works with RavenDB. I'm looking at 5k concurrent connections, likely to a single server, in my scenario and am wondering if this approach would be feasible.

07/27/2012 03:31 PM by Ayende Rahien

Shane, Oh, that shouldn't be an issue. It is really cheap to do this sort of thing. You might need special config to handle this (remove the default concurrent connection limits), but that would be about it. See how SignalR handles this.

07/27/2012 04:10 PM by john

Ayende, It's interesting that you decided to roll your own solution. Usually that happens at the moment a developer gets fed up with something and says "that's it, I'll write my own!"

As with everyone streaming data/notifications through HTTP, you face exactly the same challenges, nothing unique.

The question would then be, why not just use BOSH with ejabberd, an already proven/stable/scalable OS solution with a multitude of extensions?
Is it the same reason that brought you to writing your own solution?

07/27/2012 04:17 PM by Ayende Rahien

John, See my post about SignalR. I didn't feel like spending more time learning, testing, and integrating another library (with the potential of discarding it in the end) when I could spend a lot less time just building it myself.

07/29/2012 06:44 AM by Bill

Seems silly.. Why not just use WebSync from Frozen Mountain? Way more developed than what you're talking about.

07/30/2012 11:10 AM by Beyers

Seeing that you have to configure special settings for IIS ("turn off buffering"), does this mean anyone using shared hosting, where you do not have administrative privileges on IIS, will not be able to use this feature?

07/30/2012 05:03 PM by Ayende Rahien

Beyers, The custom config is done in code and requires no config changes on IIS. You can run this on shared hosting.
