Ayende @ Rahien

Oren Eini, aka Ayende Rahien, is CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

time to read 2 min | 315 words

Continuing to shadow Davy’s series about building your own DAL, this post is about executing custom queries.

The post does a good job of showing how you can create a good API on top of what is pretty much raw SQL. If you do have to create your own DAL, or need to use SQL frequently, please use something like that rather than using ADO.Net directly.

ADO.Net is not a data access library, it is the building blocks you use to build a data access library.

A few comments on queries in NHibernate. NHibernate uses a more complex model for queries, obviously. We have a set of abstractions between the query and the generated SQL, because we support several query options and a lot more mapping options. But while a large amount of effort goes into translating the user’s intent to SQL, an almost equivalent amount of work goes into hydrating the results. Davy’s DAL supports only a very flat model, but with NHibernate you can specify eager load options, custom DTOs or just plain value queries.

In order to handle those scenarios, NHibernate tracks the intent behind each column, and knows whether a column set represents an entity, an association, a collection or a value. That goes through a fairly complex process that I’ll not describe here, but once the first stage of the hydration process is done, NHibernate has a second stage available. This is why you can write queries such as “select new CustomerHealthIndicator(c.Age, c.SickDays.size()) from Customer c”.

The first stage is recognizing that we aren’t loading an entity (just fields of an entity); the second is passing them to the CustomerHealthIndicator constructor. You can actually take advantage of the second stage yourself. It is called a result transformer, and you can provide your own and set it on a query using SetResultTransformer(…);
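To make the two stages concrete, here is a minimal sketch (in Python for brevity, with invented names — this is not NHibernate’s actual code): stage one hydrates each row into plain values, and stage two optionally hands those values to a user-supplied transformer, much like a constructor-based result transformer does.

```python
# Sketch of two-stage query hydration. Stage one produces flat values
# per row; stage two applies an optional, user-provided transformer.

class CustomerHealthIndicator:
    def __init__(self, age, sick_days):
        self.age = age
        self.sick_days = sick_days

def constructor_transformer(cls):
    # Second-stage transformer: pass the hydrated values to a constructor.
    def transform(row):
        return cls(*row)
    return transform

def run_query(rows, transformer=None):
    # Stage one: pretend each raw row was hydrated into plain values.
    hydrated = [tuple(row) for row in rows]
    # Stage two: apply the transformer, if one was set on the query.
    if transformer is None:
        return hydrated
    return [transformer(row) for row in hydrated]

results = run_query(
    [(34, 2), (51, 7)],
    transformer=constructor_transformer(CustomerHealthIndicator))
```

Without a transformer the caller gets flat tuples back; with one, each row is projected into a DTO, which is the same split NHibernate makes internally.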

time to read 2 min | 260 words

Continuing to shadow Davy’s series about building your own DAL, this post is about lazy loading. Davy’s lazy loading support is something that I had great fun reading; it is simple, elegant and beautiful to read.

I don’t have much to say about the actual implementation; NHibernate does things in much the same way (we can quibble about minor details such as who holds the identifier and other stuff, but they aren’t really important). A major difference between Davy’s lazy loading approach and NHibernate’s is that Davy doesn’t support inheritance. Inheritance and lazy loading play some nasty games with NHibernate’s implementation of lazy loaded entities.

While Davy can get away with loading the values into the same proxy object that he is using, NHibernate must load them into a separate object. Why is that? Let us say that we have a many to one association from AnimalLover to Animal. That association is lazy loaded, so we put an AnimalProxy (which inherits from Animal) as the value of the Animal property. Now, when we want to load the Animal property, NHibernate has to load the entity; at that point, it discovers that the entity isn’t actually an Animal, but a Dog.

Obviously, you cannot load a Dog into an AnimalProxy. What NHibernate does in this case is load the entity into another object (a Dog instance) and, once that instance is loaded, direct all method calls to the new instance. It sounds complicated, but it is actually quite elegant and transparent from the user’s perspective.
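The trick can be sketched like this (in Python for brevity; the Animal/Dog classes and the loader callback are invented for illustration, not NHibernate’s mechanism): the proxy satisfies the Animal type, but once the load reveals that the row is really a Dog, every member access is forwarded to the real Dog instance.

```python
# Sketch of a lazy proxy that forwards to a separately loaded instance,
# so the concrete type can differ from the proxied (base) type.

class Animal:
    def speak(self):
        return "..."

class Dog(Animal):
    def speak(self):
        return "woof"

class AnimalProxy(Animal):
    def __init__(self, loader):
        # Bypass our own attribute interception while initializing.
        object.__setattr__(self, "_loader", loader)
        object.__setattr__(self, "_target", None)

    def __getattribute__(self, name):
        if name in ("_loader", "_target"):
            return object.__getattribute__(self, name)
        # First real access triggers the load; afterwards everything
        # is forwarded to the concrete instance (which may be a Dog).
        target = object.__getattribute__(self, "_target")
        if target is None:
            target = object.__getattribute__(self, "_loader")()
            object.__setattr__(self, "_target", target)
        return getattr(target, name)

# The database row behind this association is actually a Dog.
lover_pet = AnimalProxy(loader=lambda: Dog())
assert isinstance(lover_pet, Animal)   # type-compatible before loading
```

The caller only ever sees an Animal, yet `lover_pet.speak()` runs the Dog behavior once the entity is loaded, which is the transparency described above.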

time to read 4 min | 760 words

Continuing to shadow Davy’s series about building your own DAL, this post is about the Session Level Cache.

The session cache (first level cache in NHibernate terms) exists to support one main scenario: in a single session, a row in the database is represented by a single instance. That means that the session needs to track all the instances that it loads and be able to search through them. Davy does a good job covering how it is used, and the implementation is quite similar to the way it is done in NHibernate.

Davy’s implementation uses a nested dictionary to hold instances per entity type. This is done mainly to support RemoveAllInstancesOf<TEntity>(), a method that is unique to Davy’s DAL. The reasoning for that method is interesting:

When a user executes a custom DELETE statement, there is no way for us to know which entities were actually removed from the database. But if any of those deleted entities happen to remain in the SessionLevelCache, this could lead to buggy application code whenever a piece of code tries to retrieve a specific entity which has already been removed from the database, but is still present in the SessionLevelCache. In order to deal with this scenario, the SessionLevelCache has a ClearAll and a RemoveAllInstancesOf method which you can use from your application code to either clear the entire SessionLevelCache, or to remove all instances of a specific entity type from the cache.

Personally, I think this is using an ICBM to crack eggshells. But I am probably being unfair. NHibernate has much the same issue; if you issue a delete via SQL or HQL queries, NHibernate doesn’t have a way to track what was actually deleted and deal with it. With NHibernate, this doesn’t tend to be a problem for the session level cache, mostly because of usage habits more than anything else. The session used to issue such a query rarely has to deal with loaded entities that were deleted by it (and if it does, the user needs to handle that by calling Evict() on the affected objects manually). NHibernate doesn’t try to support this scenario explicitly for the session cache. It does, however, support this very feature for the second level cache.

It makes sense, though. With NHibernate, in the vast majority of cases deletes are going to be done using NHibernate itself, rather than a special query. With Davy’s DAL, the usage of SQL queries for deletes is going to be much higher.
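To make the shape of that cache concrete, here is a minimal sketch (in Python for brevity; the method names loosely mirror Davy’s TryToFind / RemoveAllInstancesOf / ClearAll, but the bodies are invented): a nested dictionary of id-to-instance maps, one per entity type, which is exactly what makes removing all instances of one type cheap.

```python
# Sketch of a session level cache as a nested dictionary:
# {entity_type: {id: instance}}. One inner map per entity type makes
# remove_all_instances_of a single dictionary removal.

class SessionLevelCache:
    def __init__(self):
        self._store = {}

    def store(self, entity_type, entity_id, instance):
        self._store.setdefault(entity_type, {})[entity_id] = instance

    def try_to_find(self, entity_type, entity_id):
        # Returns None when the row was never loaded in this session.
        return self._store.get(entity_type, {}).get(entity_id)

    def remove_all_instances_of(self, entity_type):
        self._store.pop(entity_type, None)

    def clear_all(self):
        self._store.clear()

class Customer:
    pass

cache = SessionLevelCache()
c = Customer()
cache.store(Customer, 1, c)
assert cache.try_to_find(Customer, 1) is c   # same row, same instance
cache.remove_all_instances_of(Customer)
assert cache.try_to_find(Customer, 1) is None
```

The identity guarantee falls out directly: any lookup for an already-loaded row returns the very same object reference.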

Another interesting point in Davy’s post is the handling of queries:

When a custom query is executed, or when all instances are retrieved, there is no way for us to exclude already cached entity instances from the result of the query. Well, theoretically speaking you could attempt to do this by adding a clause to the WHERE statement of each query that would prevent cached entities from being loaded. But then you might have to add the cached entity instances to the resulting list of entities anyways if they would otherwise satisfy the other query conditions. Obviously, trying to get this right is simply put insane and i don’t think there’s any DAL or ORM that actually does this (even if there was, i can’t really imagine any of them getting this right in every corner case that will pop up).

So a good compromise is to simply check for the existence of a specific instance in the cache before hydrating a new instance. If it is there, we return it from the cache and we skip the hydration for that database record. In this way, we avoid having to modify the original query, and while we could potentially return a few records that we already have in memory, at least we will be sure that our users will always have the same reference for any particular database record.

This is more or less how NHibernate operates, and for much the same reasons. But there is a small twist. In order to ensure query coherency between the database queries and the in-memory entities, NHibernate will optionally try to flush all the changed items in the session level cache that may be affected by the query. A more detailed description of this can be found here. Davy’s DAL doesn’t do automatic change tracking, so this is not a feature that can easily be added to it.

time to read 4 min | 652 words

Continuing to shadow Davy’s series about building your own DAL, this post is about hydrating entities.

Hydrating entities is the process of taking a row from the database and turning it into an entity, while de-hydrating is the reverse process: taking an entity and turning it into a flat set of values to be inserted or updated.
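A minimal sketch of that round trip (in Python for brevity; the Customer class and the column mapping are made up for illustration):

```python
# Sketch of hydration / de-hydration: a flat row (column -> value)
# becomes an entity, and an entity is flattened back into column values.

class Customer:
    def __init__(self, customer_id=None, name=None):
        self.customer_id = customer_id
        self.name = name

# Mapping of database column -> entity property.
COLUMNS = {"Id": "customer_id", "Name": "name"}

def hydrate(row):
    entity = Customer()
    for column, prop in COLUMNS.items():
        setattr(entity, prop, row[column])
    return entity

def dehydrate(entity):
    return {column: getattr(entity, prop) for column, prop in COLUMNS.items()}

c = hydrate({"Id": 1, "Name": "ayende"})
assert dehydrate(c) == {"Id": 1, "Name": "ayende"}
```

The same column-to-property mapping drives both directions, which is why a mapping layer sits at the center of any DAL of this kind.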

Here, again, Davy has chosen to parallel NHibernate’s method of doing so, which allows us to take a look at a very simplified version and see what the advantages of this approach are.

First, we can see how the session level cache is implemented, with the check being done directly in the entity hydration process. Davy has some discussion about the various options that you can choose at that point: whether to just use the values from the session cache, to update the entity with the new values, or to throw if there is a change conflict.

NHibernate’s decision at this point was to assume that the entity we already have is correct, and to ignore any changes made to the database in the meantime. That turns out to be a good approach, because any optimistic concurrency checks that we might want will run when we commit the transaction, so there isn’t much difference from the result’s perspective, but it does simplify NHibernate’s behavior.

Next, there is the treatment of reference properties, what NHibernate calls many-to-one associations. Here is the relevant code (edited slightly so it can fit the blog width):

private void SetReferenceProperties<TEntity>(
	TableInfo tableInfo, 
	TEntity entity, 
	IDictionary<string, object> values)
{
	foreach (var referenceInfo in tableInfo.References)
	{
		if (referenceInfo.PropertyInfo.CanWrite == false)
			continue;
		
		object foreignKeyValue = values[referenceInfo.Name];

		if (foreignKeyValue is DBNull)
		{
			referenceInfo.PropertyInfo.SetValue(entity, null, null);
			continue;
		}

		var referencedEntity = sessionLevelCache.TryToFind(
			referenceInfo.ReferenceType, foreignKeyValue);
			
		if(referencedEntity == null)
			referencedEntity = CreateProxy(tableInfo, referenceInfo, foreignKeyValue);
								   
		referenceInfo.PropertyInfo.SetValue(entity, referencedEntity, null);
	}
}

There are a lot of things going on here, so I’ll take them one at a time.

You can see how the uniquing process works: if we already have the referenced entity loaded, we get it directly from the session cache, instead of creating a separate instance of it.

It also shows something that Davy promised to cover in a separate post: lazy loading. I had an early look at his implementation and it is pretty, so I’ll skip that for now.

This piece of code also demonstrates something very interesting: the lazy loaded inheritance many-to-one association conundrum, which I’ll touch on in a future post.

There are a few other implications of hydrating entities in this fashion. For a start, we are working with detached entities this way; the entity doesn’t have to hold a reference to the session (except to support lazy loading). It also means that our entities are pure POCOs; we handle everything completely externally to the entity itself.

It also means that if we would like to handle change tracking (which Davy’s DAL currently doesn’t do), we have a much more robust way of doing so, because we can simply dehydrate the entity and compare its current state to its original state. That is exactly how NHibernate does it. This turns out to be a far more robust approach, because it is safe in the face of methods modifying state internally, without going through properties or invoking change tracking logic.
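That snapshot-and-compare idea can be sketched as follows (in Python for brevity; the Account entity and helper names are made up). Note that a change made inside a method, without any property setter firing, is still detected:

```python
# Sketch of snapshot-based dirty checking: keep the originally hydrated
# values, then at flush time de-hydrate again and diff.

class Account:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        self.balance -= amount  # mutates state internally, no setter hook

def dehydrate(entity):
    return {"Balance": entity.balance}

snapshots = {}

def on_load(entity_id, entity):
    # Remember the state as it came out of the database.
    snapshots[entity_id] = dehydrate(entity)

def is_dirty(entity_id, entity):
    # Compare the current flattened state to the original snapshot.
    return dehydrate(entity) != snapshots[entity_id]

acc = Account(balance=100)
on_load(1, acc)
assert not is_dirty(1, acc)
acc.withdraw(25)
assert is_dirty(1, acc)
```

Because the comparison happens against dehydrated values rather than intercepted assignments, no cooperation from the entity is required, which is what keeps the entities pure POCOs.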

I also wanted to touch on a few things that make the NHibernate implementation of the same thing a bit more complex. NHibernate supports reflection optimization and multiple ways of actually setting the values on the entity; it also supports things like components and multi-column properties, which means that there isn’t a neat ordering between properties and columns of the kind that makes Davy’s code so orderly.

time to read 2 min | 224 words

Continuing the series about Davy’s build your own DAL, this post is talking about CRUD functionality.

One of the more annoying things about any DAL is dealing with the repetitive nature of talking to the data source. One of Davy’s stated goals in going this route is to totally eliminate that. CRUD functionality shouldn’t be something that you have to write for each entity; it is just something that exists and that you get for free whenever you use the DAL.

I think he was very successful there, but I also want to talk about his method of doing so. Not surprisingly, Davy’s DAL takes a lot of concepts from NHibernate, simplifies them a bit and then applies them. His approach to handling CRUD operations is reminiscent of how NHibernate itself works.

He divided the actual operations into distinct classes, called DatabaseActions, with things like FindAllAction, InsertAction and GetByIdAction.

This architecture gives you two major things: maintaining things is now far easier, and so is playing around with the action implementations. It is just a short step from Davy’s database actions to NHibernate’s events & listeners approach.
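The shape of that design can be sketched like so (in Python for brevity; the action names mirror Davy’s post, but the bodies and the in-memory “rows” are invented): each operation is a small single-purpose class with one execute method, so swapping or extending an operation never touches the others.

```python
# Sketch of the database-actions idea: one small class per CRUD
# operation, each with a single execute method, operating here on an
# in-memory list of rows instead of a real database.

class GetByIdAction:
    def __init__(self, table):
        self.table = table

    def execute(self, rows, entity_id):
        # Return the first row with a matching Id, or None.
        return next((r for r in rows if r["Id"] == entity_id), None)

class FindAllAction:
    def __init__(self, table):
        self.table = table

    def execute(self, rows):
        return list(rows)

rows = [{"Id": 1, "Name": "a"}, {"Id": 2, "Name": "b"}]
assert GetByIdAction("Customers").execute(rows, 2)["Name"] == "b"
assert len(FindAllAction("Customers").execute(rows)) == 2
```

Because each action is its own object, it is easy to see how NHibernate’s events and listeners are only a step away: you hang extra behavior off the same dispatch points.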

This is also a nice demonstration of the Single Responsibility Principle. It makes maintaining the software much easier than it would otherwise be.

time to read 4 min | 630 words

Continuing my shadowing of Davy’s Build Your Own DAL series, this post is shadowing Davy’s mapping post. 

Looking at the mapping, it is fairly clear that Davy has (wisely) made a lot of design decisions along the way that drastically reduce the scope of the work he had to deal with. The mapping model is attribute based and essentially supports a simple one to one relation between classes and tables. This makes things far simpler to deal with internally.

The choice has been made to fix the following:

  • Only support SQL Server
  • Attributed model
  • Primary key is always:
    • Numeric
    • Identity
    • Single Key

Using those rules, Davy created a really slick implementation. About the only complaint that I can make about it is that he doesn’t support a limit clause on selects.

Take a look at the code he has; it should give you a good idea of what is involved in mapping between objects and tables.

The really fun part about redoing things that we are familiar with is that we get to ignore all the other things that we don’t want to do which introduce complexity. Davy’s solution works for his scenario, but I want to expand a bit on the additional features that NHibernate has at that layer. It should give you some understanding on the task that NHibernate is solving.

  1. Inheritance: Davy’s DAL supports only Table Per Class inheritance. NHibernate supports 4 different inheritance models (plus mixed models that we won’t get into here).
  2. Eager loading: something that would add significantly to the complexity of the solution is the ability to load a Product with its Category. That requires that you be able to change the generated SQL dynamically and, more importantly, that you be able to read the results correctly. That is far from simple.
  3. Properties that span multiple columns: it seems like a simple thing, but in actuality it affects just about every part of the mapping layer, since it means that every property-to-column conversion has to take multiple columns into account. It is not so much complex as it is annoying, especially since you have to carry either the column names or the column indexes all over the place.
  4. Collections: they are complex enough on their own (think about the effort involved in syncing changes to a collection in an efficient manner), but the part that really kills you is trying to do eager loading with them. Oh, and I haven’t even mentioned one to many vs. many to many, and let us not get into the distinctions between the different types of collections.
  5. Optimistic concurrency: this is actually a feature that would be relatively easy to add, I think. At least, if all you care about is a single type of versioning / optimistic concurrency; NHibernate supports several (detailed here).
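Version-based optimistic concurrency, one of the schemes NHibernate supports, can be sketched with a plain version column (a Python/sqlite3 sketch, not NHibernate’s implementation): the UPDATE matches the row only if it still carries the version we loaded, and zero affected rows signals a concurrent change.

```python
# Sketch of version-column optimistic concurrency against sqlite3:
# the UPDATE guards on the version we originally loaded, and bumps it.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (Id INTEGER PRIMARY KEY, Name TEXT, Version INTEGER)")
conn.execute("INSERT INTO Product VALUES (1, 'widget', 1)")

def update_product(conn, product_id, name, loaded_version):
    cur = conn.execute(
        "UPDATE Product SET Name = ?, Version = Version + 1 "
        "WHERE Id = ? AND Version = ?",
        (name, product_id, loaded_version))
    if cur.rowcount == 0:
        # Nobody matched: the row was changed (or deleted) concurrently.
        raise RuntimeError("stale object: row was changed by someone else")

update_product(conn, 1, "gadget", loaded_version=1)  # succeeds, version -> 2
try:
    update_product(conn, 1, "gizmo", loaded_version=1)  # stale version
    stale = False
except RuntimeError:
    stale = True
assert stale
```

A timestamp column works the same way; the point is that the guard rides along in the WHERE clause, so no extra round trip or lock is needed.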

I could probably go on, but I think the point was made. As I said before, the problem with taking on something like this is that it either takes a day to get the basic functionality going or several months to really get it into shape to handle more than a single scenario.

This series of posts should give you a chance to appreciate what is going on behind the scenes, because you’ll have Davy’s work on the base functionality, and my comments on what is required to take it to the next level.

time to read 2 min | 334 words

Davy Brion, one of NHibernate’s committers, is running a series of posts discussing building your own DAL. The reasons for doing so vary; in Davy’s case, he has clients who are adamantly against using anything but the naked CLR. I have run into similar situations before. Sometimes it is institutional blindness, sometimes it is legal reasons, and sometimes there are actually good reasons for it, although that is rare.

Davy’s approach to the problem was quite sensible. Deprived of his usual toolset, he was unwilling to give up the advantages inherent to it, so he set out to build it himself. I did much the same when I worked on SvnBridge; I couldn’t use any of the existing OSS containers, but I wasn’t willing to build a large application without one, so I created one.

I touched on this topic in more detail in the past. But basically, a single purpose framework is significantly simpler than a general purpose one. That sounds simple and obvious, right? But it is actually quite significant.

You might not get the same richness that you will get with the real deal, but you get enough to get you going. In fact, since you have only a single scenario to cover, it is quite easy to get the features that you need out the door. You are allowed to cheat, after all :-)

In Davy’s case, he made the decision that using POCOs and not worrying about persistence all over the place were the most important things, and he set out to get them.

I am going to shadow his posts in the series, talking about the implications of the choices made and the differences between Davy’s approach and NHibernate’s. I think it is a good way to show the challenges inherent in building an OR/M.
