Ayende @ Rahien

Refunds available at head office

RavenDB Multi GET support

One of the annoyances of HTTP is that it is not really possible to make complex queries easily. To be more exact, you can make a complex query fairly easily, but at some point you’ll hit the URI length limit, and worse, there is no easy way to make multiple queries in a single round trip.

I have been thinking about this a lot lately, because it is a stumbling block for a feature that is near and dear to my heart, the Future Queries feature that is so useful when using NHibernate.

The problem was that I couldn’t think of a good way of doing this. Well, I could think of how to do this quite easily, to be truthful. I just couldn’t think of a good way to make this work nicely with the other features of RavenDB.

In particular, it was hard to figure out how to deal with caching. One of the really nice things about RavenDB’s RESTful nature is that caching is about as easy as it can be. But since we need to tunnel requests through another medium for this to work, I couldn’t figure out how to make caching work in a nice fashion. And then I remembered that REST doesn’t actually have anything to do with HTTP itself; you can do REST on top of any transport protocol.

Let us look at how requests are handled in RavenDB over the wire:

GET http://localhost:8080/docs/bobs_address

HTTP/1.1 200 OK

{
  "FirstName": "Bob",
  "LastName": "Smith",
  "Address": "5 Elm St."
}

GET http://localhost:8080/docs/users/ayende

HTTP/1.1 404 Not Found

As you can see, that is two separate request/reply round trips.

To let RavenDB handle multiple requests in a single round trip, we built on top of exactly this request/reply model:

POST http://localhost:8080/multi_get

[
   { "Url": "http://localhost:8080/docs/bobs_address", "Headers": {} },
   { "Url": "http://localhost:8080/docs/users/ayende", "Headers": {} }
]

HTTP/1.1 200 OK

[
  { "Status": 200, "Result": { "FirstName": "Bob", "LastName": "Smith", "Address": "5 Elm St." }},
  { "Status": 404, "Result": null }
]
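To make the wire format concrete, here is a minimal Python sketch of the client side of this exchange. It serializes several GETs into the array posted to /multi_get, then pairs each sub-response with its request. It assumes only the field names shown above (Url, Headers, Status, Result) and that replies come back in request order. The helper names are hypothetical and are not RavenDB's client API.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical helpers for illustration; not RavenDB's actual client API.


def build_multi_get_body(urls: List[str],
                         headers: Optional[List[Dict[str, str]]] = None) -> str:
    """Serialize several GETs into one JSON array for POST /multi_get."""
    if headers is None:
        headers = [{} for _ in urls]
    if len(headers) != len(urls):
        raise ValueError("one header dict per URL is required")
    # Each entry mirrors the shape in the post: {"Url": ..., "Headers": {...}}
    return json.dumps([{"Url": u, "Headers": h} for u, h in zip(urls, headers)])


def parse_multi_get_response(body: str,
                             urls: List[str]) -> Dict[str, Tuple[int, Any]]:
    """Pair each sub-response with the request at the same position.

    Assumes the server answers in the same order the requests were sent.
    """
    replies = json.loads(body)
    if len(replies) != len(urls):
        raise ValueError("expected one reply per request")
    return {u: (r["Status"], r["Result"]) for u, r in zip(urls, replies)}
```

Positional pairing keeps the format this small. No request IDs are needed as long as the server preserves order.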

Using this approach, we can handle multiple requests in a single round trip.

You might not be surprised to learn that it was actually very easy to do: we just needed to add an endpoint and a way of executing the request pipeline internally. All very easy.
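The following Python sketch shows the shape of that server side. It assumes a hypothetical in-process handler that stands in for RavenDB's normal request pipeline. The multi_get endpoint loops over the sub-requests, runs each through that same handler, and collects status and result in order. All names here are illustrative, not RavenDB's server code.

```python
import json
from typing import Any, Callable, Dict, List, Tuple
from urllib.parse import urlsplit

# Hypothetical single-request pipeline: (path, headers) -> (status, result)
Handler = Callable[[str, Dict[str, str]], Tuple[int, Any]]


def make_docs_handler(docs: Dict[str, Any]) -> Handler:
    """Toy stand-in for the normal GET /docs/{id} pipeline (an assumption)."""
    def handle(path: str, headers: Dict[str, str]) -> Tuple[int, Any]:
        prefix = "/docs/"
        key = path[len(prefix):] if path.startswith(prefix) else None
        if key is not None and key in docs:
            return 200, docs[key]
        return 404, None  # headers are ignored in this toy handler
    return handle


def handle_multi_get(body: str, handle: Handler) -> str:
    """Run every sub-request through the same internal pipeline, in order."""
    replies: List[Dict[str, Any]] = []
    for entry in json.loads(body):
        path = urlsplit(entry["Url"]).path  # drop scheme/host; we are the host
        status, result = handle(path, entry.get("Headers", {}))
        replies.append({"Status": status, "Result": result})
    return json.dumps(replies)
```

Reusing the single-request pipeline is what keeps the endpoint thin. Each child request gets the same handling it would get as a standalone GET.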

The really hard part was with the client, but I’ll touch on that in my next post.

Posted By: Ayende Rahien

Comments

Steven Robbins, 08/18/2011 10:05 AM

Out of interest, does this gain enough compared to multiple pipelined http 1.1 requests to be worth the "cost" of losing upstream and/or reverse proxy caching that you'd get with GETs?

njy, 08/18/2011 10:51 AM

Oren, any inspiration taken from https://developers.facebook.com/docs/reference/api/batch/ ? Not that it is a bad thing really.

tobi, 08/18/2011 11:36 AM

With this approach you cannot rely on any built-in HTTP caching functionality (if you are using it), because the framework does not understand your custom protocol. It is basically like using a custom binary protocol: nothing built-in.

Brian Vallelunga, 08/18/2011 12:05 PM

When/if there's broader support for it, perhaps Raven could utilize Google's SPDY protocol in addition to HTTP. I believe multiple calls per connection are one of its main benefits.

Here's the link: http://www.chromium.org/spdy/spdy-whitepaper

Ayende Rahien, 08/18/2011 12:59 PM

Steven, That assumes that you have a proxy there. You don't usually have a proxy between the app server and the db server. We actually handle caching of this internally pretty well, so I don't think that is an issue. If you do have a proxy somewhere, you can make that decision on your own, based on real perf numbers.

Ayende Rahien, 08/18/2011 01:00 PM

Njy, No, this is the first time I've heard about it, but it is really about the only thing that makes sense, to be truthful.

Ayende Rahien, 08/18/2011 01:02 PM

Tobi, Not really, no. If you have a proxy involved, that would bypass it, yes. But RavenDB handles the protocol at both ends and makes it appear just like HTTP, and that includes caching. A child request can send a 304, for example, which will be correctly processed.
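Caching happens per child request rather than per batch. A client can therefore keep ETags keyed by URL, send If-None-Match on each entry, and answer a 304 from its own cache. The following is a minimal Python sketch of that idea. It assumes each sub-reply carries its own Headers dict with an ETag, which the examples in the post do not show. The MultiGetCache class is hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical client-side cache; not RavenDB's actual client API.
# Assumes each sub-reply may carry its own "Headers" dict with an "ETag".


class MultiGetCache:
    def __init__(self) -> None:
        # url -> (etag, cached result); keyed per sub-request, not per batch
        self._entries: Dict[str, Tuple[str, Any]] = {}

    def prepare(self, urls: List[str]) -> List[Dict[str, Any]]:
        """Build multi_get entries, adding If-None-Match for URLs already cached."""
        entries: List[Dict[str, Any]] = []
        for url in urls:
            headers: Dict[str, str] = {}
            cached = self._entries.get(url)
            if cached is not None:
                headers["If-None-Match"] = cached[0]
            entries.append({"Url": url, "Headers": headers})
        return entries

    def resolve(self, urls: List[str],
                replies: List[Dict[str, Any]]) -> List[Tuple[int, Any]]:
        """Map each sub-reply to (status, result), answering 304s from the cache."""
        results: List[Tuple[int, Any]] = []
        for url, reply in zip(urls, replies):
            status = reply["Status"]
            if status == 304 and url in self._entries:
                # Not modified: hand back the cached document as a normal 200
                results.append((200, self._entries[url][1]))
                continue
            etag = reply.get("Headers", {}).get("ETag")
            if status == 200 and etag is not None:
                self._entries[url] = (etag, reply["Result"])
            results.append((status, reply["Result"]))
        return results
```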

Ayende Rahien, 08/18/2011 01:04 PM

SPDY is interesting, but as this is the very first time I've heard about it, I guess it isn't really ready to be used yet.

Steven Robbins, 08/18/2011 01:13 PM

@Ayende no it doesn't, it just assumes you're not stopping HTTP's "built-in" caching support from working, should you want to use it.

You may have excellent caching in RavenDb, but that's missing the point - this breaks the semantics of the web and my question was is that "cost" worth it for whatever gains you are getting?

The answer may well be "yes", but that doesn't make the question "pointless" if you don't have a proxy :-)

Ayende Rahien, 08/18/2011 01:21 PM

Steven, I don't really understand what you mean here. When I am talking about RavenDB caching, I am talking about the low-level HTTP cache; we extended that to also support multi_get, but that is it. All the semantics are still the same. And the cache level is on the request, not on the batch.

Sure, you can't do that through a proxy, but that is the only limitation.

Brian Vallelunga, 08/18/2011 01:35 PM

SPDY was announced a while ago and in fact Google is already using it when you use Chrome and their services. Check out this thread:

http://groups.google.com/group/spdy-dev/browse_thread/thread/4c2396ecbc36b1c4

Chris Wright, 08/18/2011 04:26 PM

Two points against SPDY for RavenDB: 1. No .NET client. Ayende probably doesn't want to take the time to implement a new protocol stack. 2. The name makes me think "spidey", not "speedy".

Justin, 08/18/2011 04:57 PM

I am wondering the same thing as Steven: how is this better than the persistent/pipelined connections that HTTP 1.1 introduced?

I still see two requests and two responses; it's just the order of events that differs. Either way, with HTTP 1.1 only one TCP connection will be used.

SPDY allows for multiple concurrent requests per TCP connection, but this implementation of "http in http" encoded as json looks to be a FIFO just like HTTP 1.1 pipelining.

Perhaps I am just missing something.

Ayende Rahien, 08/18/2011 06:28 PM

Justin, While you have a single TCP connection using pipelined mode, it is still a request/reply mode. So you have to send a request, wait for the reply, etc. With multi GET, you send a single request to the server and get a single reply back, all at once.

This means that you save on round trips, not on connections.

Justin, 08/18/2011 07:13 PM

Looks to me like you're sending two requests and receiving two replies either way:

request1 request2 reply1 reply2

instead of:

request1 reply1 request2 reply2

Either way, the same amount of data is sent and received, a single TCP connection is used, and the same total time is spent. So what are the efficiency gains? Are the serialization/deserialization parts expensive to set up and tear down on the client/server?

The way it would be exposed in the client API could still look like a single request/reply; it's really just how the request/reply is encoded on the single TCP connection at that point.

It would be interesting to see how much faster your implementation is than plain pipelined HTTP 1.1, and where the extra overhead is coming from.

Colin Jack, 08/18/2011 09:22 PM

When rhino receives the batch, does it send off the contained HTTP requests?

tobi, 08/18/2011 10:10 PM

Ayende, I seemingly misunderstood your implementation. If you own the caching layer, you can of course interpret your response as you see fit. I thought you were using the caching layer of something else.

But if you own the caching layer, what was the problem to begin with? HTTP is just an implementation detail in this case.

Ayende Rahien, 08/18/2011 10:32 PM

Justin, The problem is saving the round trip. Let us take the case of grocery shopping as an example, you need to buy milk & sugar.

HTTP 1.x (making two requests using two separate TCP connections) means that you have to leave the house, get to the store, get the milk, pay, go home, put the milk in the fridge, go back to the store, get the sugar, pay, go home, put the sugar in the cupboard.

HTTP 1.1 (making two requests using a single tcp connection) means that you have to leave the house, go to the store, use the drive in window to get the milk, drive home, drop the milk off without getting out of the car, drive back to the store, get the sugar, drive home again.

Multi GET approach (making a single request) means that you go to the store, pick up milk & sugar and go home.

The major difference is that we only have to go to the server once, whereas even with HTTP 1.1, using a single TCP connection, you have to go to the db multiple times.
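A rough way to put numbers on the grocery analogy uses three quantities: n requests, a network round trip of rtt, and server work of s per request. Strictly sequential request/reply then costs n * (rtt + s), while one batched call costs about rtt + n * s. The following Python sketch is only a back-of-the-envelope model. It ignores payload size, serialization and connection setup, and is not a benchmark. True HTTP/1.1 pipelining, where the client does not wait between requests, would also land near rtt + n * s, which is the point Justin and Pedro raise.

```python
# Back-of-the-envelope latency model (a simplification, not a benchmark):
# n requests, rtt_ms per network round trip, server_ms of work per request.


def sequential_ms(n: int, rtt_ms: float, server_ms: float) -> float:
    """Request/reply one at a time: every request pays a full round trip."""
    return n * (rtt_ms + server_ms)


def single_batch_ms(n: int, rtt_ms: float, server_ms: float) -> float:
    """One multi_get call: one round trip, sub-requests handled one after another."""
    return rtt_ms + n * server_ms


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(n, sequential_ms(n, 2.0, 1.0), single_batch_ms(n, 2.0, 1.0))
```

Take five requests at a 2 ms round trip and 1 ms of server work each. The model gives 15 ms for the sequential case versus 7 ms for the batch. The gap grows with both n and rtt, and vanishes for a single request.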

Ayende Rahien, 08/18/2011 10:32 PM

Colin, There are no rhinos involved :-) And I don't understand the question

Ayende Rahien, 08/18/2011 10:33 PM

Tobi, The major difference is how you detect changes. We use the HTTP mechanisms for that: 304s, ETags, etc. We basically implemented HTTP caching as part of the RavenDB client API.

Vadi, 08/22/2011 04:01 AM

I really, really think HTTP calls at the db level don't make sense for a high-performance application, and I guess everyone wants their app to perform well.

One other problem: the code gets a lot messier here, and abstraction is tough to achieve.

Ayende Rahien, 08/22/2011 06:53 AM

I don't really follow you here. It isn't HTTP calls at the db level; HTTP is merely the transport for the calls, nothing more. And the question is how we can take advantage of that and make the best-performing db/app using it.

Pedro Félix, 08/28/2011 10:02 PM

Ayende,

With HTTP 1.1 pipelining, the second request can be sent without waiting for the first reply [RFC 2616, section 8.1.2.2]. Pipelining is more than only sharing the same connection.

-- GET #1 request -->
-- GET #2 request -->
<-- GET #1 response --
<-- GET #2 response --
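Pedro's diagram describes a particular ordering: with pipelining, the client writes every request before reading any response. The following Python sketch builds such a pipelined buffer, using the host and paths from the post's examples. A client could then send it with a single socket.sendall() call and read the responses back in request order. Response parsing is deliberately left out.

```python
from typing import List


def pipelined_requests(host: str, paths: List[str]) -> bytes:
    """Concatenate several HTTP/1.1 GET requests so they go out in one write.

    Nothing here waits for a response: all requests are queued up front,
    which is what distinguishes pipelining from plain keep-alive.
    """
    parts = []
    for path in paths:
        parts.append(
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "\r\n"  # blank line ends the headers of this request
        )
    return "".join(parts).encode("ascii")
```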

Ayende Rahien, 08/28/2011 06:43 PM

Pedro, Can you show me how this can be done in .NET ?

Pedro Félix, 08/30/2011 01:38 PM

1) On the client side, HttpWebRequest supports pipelining (see http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.pipelined.aspx). Note however, that this pipelining only starts after the client is assured that the server is HTTP/1.1 compliant, that is, after the first response.

2) HTTP.SYS and HttpListener also support pipelining. However, from my observations, it appears that the requests are delivered to the handlers (BeginGetContext callback) in sequential order. This means that the Nth request starts processing only after the (N-1)th request is completed.

Comments have been closed on this topic.