Designing the Entity Framework 2nd level cache
One of the things that I am working on is another commercial extension to EF, a 2nd level cache. At first, I thought to implement something similar to the way NHibernate does this, that is, to create two layers of caching, one for entity data and the second for query results where I would store only scalar information and ids.
That turned out to be quite hard. In fact, it turned out to be hard enough that I almost gave up on it. Sometimes I feel that extending EF is like hitting your head against the wall: eventually either you collapse or the wall falls down, but either way you are left with a headache.
At any rate, I eventually figured out a way to get EF to tell me about entities in queries and now the following works:
// will hit the DB
using (var db = new Entities(conStr))
{
    db.Blogs.Where(x => x.Title.StartsWith("The")).FirstOrDefault();
}

// will NOT hit the DB, will use cached data for that
using (var db = new Entities(conStr))
{
    db.Blogs.Where(x => x.Id == 1).FirstOrDefault();
}
The ability to handle such scenarios is an important part of what makes the 2nd level cache useful, since it means that you aren’t limited to just caching a query, but can perform far more sophisticated caching. It means better cache freshness and far fewer unnecessary cache cleanups.
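To make the two-layer idea a bit more concrete, here is a minimal sketch of an entity cache plus a query cache that stores only ids. Everything in it is an assumption made for illustration: the names SecondLevelCacheSketch, StoreQueryResult, TryGetEntity, TryGetQueryResult, EvictEntity and Blog are hypothetical, the query key is just a string, and nothing here is EF's API or the actual extension's code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical marker type standing in for a mapped entity class.
public sealed class Blog { }

// Hypothetical sketch: illustrative only, not part of EF or of the real extension.
public sealed class SecondLevelCacheSketch
{
    // Entity cache: (entity type, id) -> scalar column values of one entity.
    private readonly Dictionary<(Type, object), Dictionary<string, object>> _entities =
        new Dictionary<(Type, object), Dictionary<string, object>>();

    // Query cache: query key (text plus parameters) -> ids of the matching entities.
    private readonly Dictionary<string, List<object>> _queries =
        new Dictionary<string, List<object>>();

    // Called after a query went to the DB: remember each entity and the list of ids.
    public void StoreQueryResult(string queryKey, Type entityType,
        IEnumerable<KeyValuePair<object, Dictionary<string, object>>> rows)
    {
        var ids = new List<object>();
        foreach (var row in rows)
        {
            _entities[(entityType, row.Key)] = row.Value;
            ids.Add(row.Key);
        }
        _queries[queryKey] = ids;
    }

    // A lookup by id can be served by any earlier query that loaded the entity.
    public bool TryGetEntity(Type entityType, object id, out Dictionary<string, object> values)
    {
        return _entities.TryGetValue((entityType, id), out values);
    }

    // A cached query is usable only if every entity it refers to is still cached.
    public bool TryGetQueryResult(string queryKey, Type entityType,
        out List<Dictionary<string, object>> result)
    {
        result = new List<Dictionary<string, object>>();
        List<object> ids;
        if (!_queries.TryGetValue(queryKey, out ids)) return false;
        foreach (var id in ids)
        {
            Dictionary<string, object> values;
            if (!_entities.TryGetValue((entityType, id), out values))
            {
                result.Clear();
                return false; // one entity was evicted, so treat the query as a miss
            }
            result.Add(values);
        }
        return true;
    }

    // Evicts a single entity; queries that refer to it miss until it is reloaded.
    public void EvictEntity(Type entityType, object id)
    {
        _entities.Remove((entityType, id));
    }
}

public static class Program
{
    public static void Main()
    {
        var cache = new SecondLevelCacheSketch();
        var blog = new Dictionary<string, object> { { "Id", 1 }, { "Title", "The Blog" } };

        // First query (by title) went to the DB; its result is stored in both layers.
        cache.StoreQueryResult("Blogs where Title starts with 'The'", typeof(Blog),
            new[] { new KeyValuePair<object, Dictionary<string, object>>(1, blog) });

        // Second query (by id) is a different query, but the entity is already cached.
        Dictionary<string, object> cached;
        bool hit = cache.TryGetEntity(typeof(Blog), 1, out cached);
        Console.WriteLine(hit); // True: the id lookup is served from the entity cache
    }
}
```

The point of keeping only ids in the query layer is that a change to one entity touches a single entry in the entity cache, rather than forcing every cached query that happened to return it to be thrown away wholesale.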
Next, I need to handle partially cached queries, cached query invalidation and a bunch of other minor details, but the main hurdle seems to have been dealt with (I am willing to lay odds that I will regret this statement).
Comments
Mike,
There are two main problems with the EF CachingProvider.
a) It is very invasive.
b) It does caching in a very brute-force manner; that is, it handles all caching using the queries. Having separate entity and query caches makes the process significantly more efficient.
Ayende, it seems that this will be your second provider. Have you considered extracting the common infrastructure for them and open sourcing it?
Mike,
How will this be a second provider?
And I intend to make this into a commercial product.
My guess was that the first one was for EfProf.
Regarding open source vs. commercial product: I had to ask :). Actually, I am facing the task of implementing multitenancy in EF, and the only way is to use yet another plugged-in provider. I am just frustrated with the number of stupid classes that a minimal implementation of the provider requires. So I believe this is a good candidate for sharing. I hope I will be able to share this.
Hi, I think the most natural place for caching query results is the database server, not the application. And for some reason database servers don't generally do it (at least MS SQL doesn't). The rationale is that in frequently modified tables the cache would be trashed way too often; besides, it would work only if the application was sending mostly identical queries over and over, which is rarely the case. Why do you think caching query results in the application would help, especially when the cache is generic and built into the DB access interface?
Is EF not extensible, or are you just unaware of its API? NHibernate would of course be easier for you, considering you know a considerable amount about its guts.
Seems to me that if EF wasn't extensible, then you wouldn't be able to do any of these things, period. A 'takes a long while but I got there' sounds more like learning the API.
Rafal,
The problem is that the whole idea of a cache is to avoid hitting the DB server.
Much of the perf boost is the fact that you don't have to go and hit a remote server, and DB servers tend to be awfully busy, anyway.
Most DBs already do what you describe, by loading stuff to memory, so that isn't an issue, but the database has to also worry about things like ACID, which is a whole different kettle of fish.
And in most apps, you DO perform a lot of repeated queries. Take this blog, for instance.
We have the set of queries needed to show this page, and the set of pages that are being used at any one time. Caching those results would be very helpful in the long run, since you have a high cache hit probability.
Stephen,
EF has very few real extension points. The major one is the provider, but the problem is that by the time you reach the provider it is usually too late to do the sort of things that you want to do.
With NHibernate, I have many extension points, at various levels, so I can not only pick the level of granularity that I have but also the particular location for the extension. That means that extending NHibernate to do something is six orders of magnitude easier, even if I want to do something that NHibernate was never meant to do.
I'm not sure why you would bother implementing a cache inside an ORM. The most efficient cache is kept at the application response level (i.e. page output cache, DTO response, etc.).
If for some reason you want to cache at the data level, you're better off using a dedicated cache service or DHT, which can scale horizontally, be shared by multiple app servers, support expiration, etc.
Denis,
I don't think that you understand what I have in mind. You might want to read a bit about NHibernate's 2nd level cache to understand how this works.
Hmm, 6 orders of magnitude easier... If the required amount of time to accomplish task A is proportional to the involved complexity, and also assuming that the easiness of an action is directly related to its complexity, and let's further assume that the constant between required amount of time and complexity is 1, then a feature that is implemented in 1 minute in NH takes about a million minutes in EF, or almost 2 years.
Careful with the superlatives there :)
Frank,
Good catch :-), but you get my drift, I assume.
"and also assuming that easiness of an action is directly related to its complexity" Don't you mean inversely? You have lots of assumptions there ;)
"Don't you mean inversely?"
Nope, I merely assumed a relationship without specifying its kind. ;)