Ayende @ Rahien

Refunds available at head office

RavenDB & HTTP Caching

RavenDB’s Client API uses the session / unit of work model internally. That means that this code will only go to the database once:

session.Load<User>("users/1");
session.Load<User>("users/1");
session.Load<User>("users/1");

All three calls will also return the same instance. This is just the identity map at work; in NHibernate, it is also called the first level cache or the session level cache.
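As a rough illustration of the identity map idea (not the RavenDB client code, which is C#), here is a minimal Python sketch. The `Session` class, its `load` method and the `fake_fetch` callable are hypothetical names made up for this example; the only point is that repeated loads of the same id cost one round trip and hand back the same object.

```python
class Session:
    """Toy unit-of-work session with an identity map (hypothetical, not the RavenDB API)."""

    def __init__(self, fetch):
        self._fetch = fetch        # callable that performs the real database round trip
        self._identity_map = {}    # document id -> instance already loaded in this session

    def load(self, doc_id):
        # Go to the database only the first time an id is requested.
        if doc_id not in self._identity_map:
            self._identity_map[doc_id] = self._fetch(doc_id)
        return self._identity_map[doc_id]


round_trips = []  # records every simulated database call


def fake_fetch(doc_id):
    # Assumption: stands in for the actual network call to the database.
    round_trips.append(doc_id)
    return {"id": doc_id, "name": "example"}


session = Session(fake_fetch)
first = session.load("users/1")
second = session.load("users/1")
third = session.load("users/1")
```

After the three calls, `round_trips` holds a single entry and `first`, `second` and `third` are the same object.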

Having implemented that, the natural next question was: what about the second level cache? NHibernate’s second level cache is complicated (it takes an hour just to explain how exactly it works, and that is when skipping all the actual implementation details).

For a while, my response was that we don’t actually need that; RavenDB is fast enough that we don’t need caching. Except that I forgot about the Fallacies of Distributed Computing, the first three of which are:

  • The network is reliable.
  • Latency is zero.
  • Bandwidth is infinite.

Most specifically, caching can help with the third fallacy: when you are querying potentially large documents (or a large set of documents), you are going to spend most of your time on the network, sending bytes to and fro.

It is to avoid that that we actually need caching.

I was slightly depressed that I actually had to implement the same complicated logic as NHibernate for caching, so I dawdled in implementing this. And suddenly it dawned on me that as usual, I was being stupid.

RavenDB is REST based. One of the important parts of REST is that:

Cacheable
As on the World Wide Web, clients are able to cache responses. Responses must therefore, implicitly or explicitly, define themselves as cacheable or not to prevent clients reusing stale or inappropriate data in response to further requests. Well-managed caching partially or completely eliminates some client–server interactions, further improving scalability and performance.

RavenDB is an HTTP server, in the end. Why not use HTTP caching?

That required some thought, I’ll admit. It couldn’t be that simple, right?

HTTP caching is a somewhat complex topic; if you think it is not, talk to me after reading this 24-page document describing it. But in essence, I am actually using only a small bit of it.

Whenever RavenDB sends a response to a GET request (the only thing that can be safely cached), it adds an ETag header. The ETag header stands for Entity Tag, and it changes every time that the resource is changed.

RavenDB already generated ETags for documents and attachments; those are part of how we implement optimistic concurrency. Since we already had those, we could move to the next stage: have the client remember the responses for all GET requests, and when a new request is made for a URL that we have already fetched, have it add an If-None-Match header to the request.

RavenDB then checks whether the ETag that the client holds matches the ETag on the server, and if so, it generates a 304 Not Modified response. That instructs the client that it can use the cached response safely.
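The client side of that exchange can be sketched roughly as follows. This is a simplified Python illustration, not the actual code in the RavenDB client: `CachingClient` and `StubServer` are hypothetical names, the server is an in-memory stand-in rather than a real HTTP endpoint, and real HTTP allows more than this (weak validators, several ETags in one If-None-Match header).

```python
class StubServer:
    """In-memory stand-in for an HTTP server that supports ETags (hypothetical)."""

    def __init__(self):
        self.documents = {"/docs/users/1": ("etag-1", '{"name": "Ayende"}')}
        self.bodies_sent = 0   # how many full responses went over the "network"

    def get(self, url, headers):
        etag, body = self.documents[url]
        if headers.get("If-None-Match") == etag:
            return 304, {"ETag": etag}, None   # unchanged: no body is sent
        self.bodies_sent += 1
        return 200, {"ETag": etag}, body


class CachingClient:
    """Remembers GET responses by URL and revalidates them with If-None-Match."""

    def __init__(self, server):
        self._server = server
        self._cache = {}   # url -> (etag, body)

    def get(self, url):
        headers = {}
        cached = self._cache.get(url)
        if cached is not None:
            headers["If-None-Match"] = cached[0]   # send the ETag we already hold
        status, response_headers, body = self._server.get(url, headers)
        if status == 304:
            return cached[1]                       # safe to reuse the cached body
        self._cache[url] = (response_headers["ETag"], body)
        return body


server = StubServer()
client = CachingClient(server)
a = client.get("/docs/users/1")   # 200, full body transferred
b = client.get("/docs/users/1")   # 304, body served from the local cache
server.documents["/docs/users/1"] = ("etag-2", '{"name": "Oren"}')
c = client.get("/docs/users/1")   # ETag changed on the server, so a fresh 200
```

Note that every call still makes a request; what is saved is the response body, which is the bandwidth cost the post is concerned with.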

That was all we had to do to fully implement caching on the client. On the server side, we had to modify a few endpoints to properly generate an ETag and to return a 304 when the If-None-Match value sent by the client is still current. In RavenDB, this is handled very deep in the guts of the Client API, directly on top of the HTTP layer. It is on by default, and it should drastically reduce the amount of data sent across the network when the data hasn’t been modified.
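For the server side, a minimal sketch of the same idea might look like the Python below. It is an assumption-laden illustration and not how RavenDB's endpoints are written: `VersionedStore` and `handle_get` are hypothetical names, and the ETag here is simply a per-document version number, chosen so that the server can answer 304 without building the response body.

```python
import json


class VersionedStore:
    """Hypothetical in-memory store; every write bumps a per-document version."""

    def __init__(self):
        self._docs = {}   # key -> (version, document dict)

    def put(self, key, document):
        version = self._docs[key][0] + 1 if key in self._docs else 1
        self._docs[key] = (version, document)

    def etag(self, key):
        # Cheap to compute: taken from the stored version, not from the serialized body.
        return f'"{self._docs[key][0]}"'

    def get(self, key):
        return self._docs[key][1]


serialized = []   # records each time a response body is actually built


def handle_get(store, key, request_headers):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    current = store.etag(key)
    # Simplification: exact match against a single ETag value only.
    if request_headers.get("If-None-Match") == current:
        return 304, {"ETag": current}, None
    body = json.dumps(store.get(key))
    serialized.append(key)
    return 200, {"ETag": current}, body


store = VersionedStore()
store.put("users/1", {"name": "Ayende"})
status1, headers1, body1 = handle_get(store, "users/1", {})
status2, headers2, body2 = handle_get(store, "users/1", {"If-None-Match": headers1["ETag"]})
store.put("users/1", {"name": "Oren"})
status3, headers3, body3 = handle_get(store, "users/1", {"If-None-Match": headers1["ETag"]})
```

The ETag check runs before any serialization, so the second request costs a dictionary lookup and a comparison; only the first and third requests build a body.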

Please note that unlike NHibernate’s second level cache, we don’t need a distributed cache to ensure consistency. Each node has its own local cache, but all of them will always get valid results, thanks to RavenDB’s ETag checks. In fact, the biggest challenge was figuring out how to cheaply generate a valid ETag without performing the actual work for the request :)

Posted By: Ayende Rahien

Comments

01/11/2011 02:37 PM by Jason Meckley

So, http (2nd level) cache is on by default, just update the client and server binaries? That is frictionless :)

01/11/2011 02:41 PM by Brian Vallelunga

This is really great, but I have an implementation question. Where is the client cache stored? Is it in the DocumentSession object, DocumentStore, or somewhere else?

I ask because in a web application the DocumentSession will likely be created and destroyed per request, making the cache not too useful, unless it's a static property of the session that sticks around.

01/11/2011 02:43 PM by Ayende Rahien

This is stored at the AppDomain level; it is a static property.

01/11/2011 03:28 PM by Yuriy

Can cache size be somehow limited?

01/11/2011 03:34 PM by El Jobso

Yep Oren, welcome to The Internet ;-), a place that is essentially a Representational Resource State Transportation System.

All you need (well, 99.9%) is already there!

01/11/2011 03:45 PM by Ayende Rahien

Yuriy,

Internally this is a MemoryCache with the name:

"Raven.Client.Client.HttpJsonRequest.Cache"

You can configure this any way you want.

01/11/2011 09:43 PM by stephane

Ha, I was at a .NET user group session by Glenn Block about the next WCF API for HTTP endpoints, where he explained exactly that.

Wouldn't it be possible to use it? It is available on codeplex I think.

01/12/2011 01:20 AM by Cassio Tavares

Ayende, are you using any third party API to implement REST and JSON serialization?

Like stephane said, Glenn Block is heading a project to support REST over WCF. There is OpenRasta too, but I would like to hear more opinions.

I know that WCF REST doesn't support ETag yet but will in the future. OpenRasta probably already supports it.

01/12/2011 07:59 AM by Ayende Rahien

Stephane,

Can you send me a link to this?

01/12/2011 07:59 AM by Ayende Rahien

Cassio,

No, just standard HTTP calls from .NET.

01/12/2011 08:52 AM by Cassio Tavares

Glenn is working on this project

wcf.codeplex.com

His blog - http://codebetter.com/glennblock/

But I'm pretty sure ETag is not implemented

It is in preview version and lacks docs, but is open source. You can digest it in one morning. :)

01/12/2011 09:12 AM by Ayende Rahien

Cassio,

I have a working version, one that supports ETags and caching and everything.

It is not a burden to maintain, so I think I'll not use it.

01/12/2011 10:07 AM by Ayende Rahien

Glenn,

Take a look at how this is implemented in RavenDB:

github.com/.../HttpJsonRequest.cs

Maybe 20 lines of code, and it works. No configuration, no need to understand an extensibility mechanism.

01/12/2011 10:10 AM by blogs.msdn.com/gblock

Ayende, I was simply responding to the question of ETags. I wasn't necessarily saying you should take a dependency on it.

01/12/2011 10:13 AM by blogs.msdn.com/gblock

Just as a side note, the code for the processors is going to get much cleaner / less verbose.

01/12/2011 10:14 AM by Ayende Rahien

Glenn,

Am I mistaken, or is the code you posted the server code?

01/12/2011 05:44 PM by blogs.msdn.com/gblock

Yes, that is just an illustration of the server side of generating ETags.

Comments have been closed on this topic.