The false myth of encapsulating data access in the DAL

time to read 6 min | 1017 words

This is a question that I get routinely, both from random strangers and when I am at clients.

I would like to design a system/application using NHibernate. But I also want to so flexible that in future ,if I unplug the NHibernate and use ADO.NET Entity framework or other framework then my application should not
crash.

In short, I am completely opposed for even trying doing something like that.

It is based on flawed assumptions

A lot of the drive behind this is based on the historical drive built in the time where data access layers directly accessed a database using its own dialect, resulting in the need to create just such an encapsulation in order to support multiple databases.

The issue with this drive is that it is no longer a factor, all modern OR/Ms can handle multiple databases effectively. Moreover, modern OR/M are no longer just ways to execute some SQL and get a result back, which is how old style DAL were written. An OR/M takes on a lot more responsibilities, from things like change tracking to cache management, from ensuring optimistic concurrency to managing optimal communication with the database.

And those features matter, a lot. Not only that, but they are different between each OR/M.

It doesn’t work, and you’ll find that out too late

The main problem is that no matter how hard you try, there are going to be subtle and not so subtle differences between different OR/Ms, those changes can drastically affect how you build your application.

Here are a few examples, using NHibernate and EF.

Feature NHibernate Entity Framework

Futures Yes No

Batching Yes No

Transaction handling Requires explicit code Implicitly handled

Caching 1st & 2nd level caching 1st level caching only

This isn’t intended to be a NH vs. EF, and it doesn’t even pretend to be unbiased, I am simply pointing out a few examples of features that you can take advantage of which can greatly benefit you in one situation, which do not exists in another.

It has a high cost

In order to facilitate this data access encapsulation, you have to do one of two things:

Use the lowest common denominator, preventing you from using the real benefits of the OR/M in question.
Bleed those features through the DAL, allowing you to make use of those features, but preventing you from switching at a later time.

Either of those add complexity, reduce flexibility and creates confusion down the road. And in general, it still doesn’t work.

There are other barriers than the API

Here is an example from a real client, which insists on creating this encapsulation and hiding NHibernate inside their DAL. They run into the previously mentioned problems, where there are NHibernate features specifically designed to solve some of their problems, but which they have hard time to implement through their DAL.

Worse, from the migration perspective, most of the barrier for moving from NHibernate isn’t in the API. Their entity model make a heavy use on NHibernate’s <any/> feature, which large percentage other OR/Ms do not support. And that is merely the example that spring most forcibly to mind, there are others.

The real world doesn’t support it, even for the simplest scenarios

A while ago I tried porting the NerdDinner application to NHibernate. Just to point it out, that application have a single entity, and was designed with a nice encapsulation between the data access and the rest of the code. In order to make the port, I had to modify significant parts of the codebase. And that is about the simplest example that can be.

The role of encapsulation

Now, I know that some people would read this as an attack of encapsulation of data access, but that isn’t the case. By all mean, encapsulate to your heart’s content. But the purpose of this encapsulation is important. Trying to encapsulate to make things easier to work with, great. Trying to encapsulate so that you can switch OR/Ms? Won’t work, will be costly and painful.

So how do you move between OR/Ms?

There are reasons why some people want to move from one data access technology to the other. I was involved in several such efforts, and the approach that we used in each of those cases was porting, rather than trying to drop a new IDaataAccess implementation.

Tweet Share Share 73 comments

Tags:

Comments

30 Jul 2010
09:35 AM

evereq

Hi! :) While agree with most you say, can just add my few bits:

First - I think It CAN work! Check for example KIGG project! Ok, sure it's not NHibernate & EF example (and I am sure after somebody use NHibernate he will not switch to EF :) at least with current EF features), but say Linq to SQL and EF can be relatively easy switched!

And second - as always it's depends from requirements: if you build say CMS that you going to sell, or you build some kind of complex open source project that will benefit to have few different DALs - why not to implement this ? ;-) Developers that will use your CMS or your project will BENEFIT from this feature, because can use what they like or what they know or .... :D

Sure for most of projects agree with you - it's useless and I give my +1 to such posts, where people explain what better NOT to do :) (sure same like I give my +1 to posts where people explain what better TO DO)

30 Jul 2010
09:42 AM

Ayende Rahien

evereq,

Really?

Things that are possible in EF aren't possible in L2S, and vice versa.

Common example, in EF I can specify fetch paths per query, in L2S, it is per data context.

30 Jul 2010
10:04 AM

Stefan Wenig

Someone correct me if I'm wrong, but as far as I can remember this notion of being ORM-agnostic might come from Jimmy Nilsson's DDD book, which a lot of people in the .NET world use to get into DDD. That's unfortunate, because I believe it was never brought up in Evan's original book, so the association is plain wrong. POCOs can be a good idea, but hardly for that reason.

30 Jul 2010
10:05 AM

evereq

Migrations usually come from more simple ORM into latest (at least I don't see reasons to go back :D here)... If we consider your example, people will NOT migrate probably from EF into Linq to SQL. They or start from stretch to implement both (and then they have full freedom what you use from both ORM and how to build with say repositories and try to abstract away all ORM stuff, even possible ORM entities!) or start from more simple Linq to SQL (like was with KIGG) and add support for EF. And in worse case (I know you will not like, but... :D) they can just go to low level, say to SQL :D and made what they need (sometimes even more efficient than to try to fight with ORM!) It's again depends from goals you want to accomplish! If you goal will be SUPPORT few ORMs in DAL, I am sure you will done it and it will be not as bad as you may think :) So, think, don't say no if it will be goal, but sure avoid such complicity if you don't really need it!

30 Jul 2010
10:13 AM

Bogdan Marian

@Ayende

Just for the sake of DAL - what if I want to persist my data in another place, different than a relational database? For instance, in a object-oriented database, or just serialize the entities and store them somewhere on the disk... What about then ? NHibernate cannot handle this situation, can it ? Wouldn't I be entitled to abstract my DAL so that it can support such case ?

30 Jul 2010
10:13 AM

Felice Pollano

Ayende,

This is one of your best post! Actually I had to fight day by day with some IYetAnotherDefinitiveDataAccess creator.

30 Jul 2010
10:21 AM

Ayende Rahien

Stefan,

I run into that in many places that never read Jimmy's book

30 Jul 2010
10:25 AM

Ayende Rahien

Evereq,

Obviously there are problems if you move from the full featured product to the simpleton product, but that is actually goes beyond that.

There problem isn't with obvious features (like L2S M:N support), it is about things like how it handle change tracking, how you optimize things, how it handle things like SELECT N+1, etc.

Those are the things that are going to leak through, and you really want to be able to tweak those, because they are important.

30 Jul 2010
10:28 AM

Ayende Rahien

Bogdan,

It isn't going to work.

Consider how you work with a relational store vs. document store.

Something as simple a Post -}} Comments is structured in a radically different form.

Unless you are using the persistence store for key/value (the simplest form), it isn't going to work

30 Jul 2010
11:15 AM

Ralf Westphal

Hm... my feeling is there is a cause-effect confusion.

If the cause is "ORM", then the effect is "no flexibility". I agree that it is near impossible to encapusalte an ORM - and at the same time use the objects it loads/stores throughout an application´s codebase.

An ORM is a very, very pervasive dependency.

But if the cause is "encapsulate data access" (or "be flexible", "minimize dependencies"), then "ORM" cannot be the effect.

So it´s a question of principles and values. What does a project value high? If it´s encapsulation, independence of technologies etc., then they should not think about using an ORM.

And my guess is, that that´s the position of the CQRS guys. They don´t use nor need an ORM.

So ORMs are not necessary for data access. Maybe they are even overrated. They are a relict from focussing on RDBMS.

-Ralf

30 Jul 2010
11:50 AM

Stefan Wenig

Ayende: Might be. I find those cases annoying enough where people tell me I need to be ORM-agnostic because DDD says so. It does not.

Ralf: Neither ORMs nor RMDBSs are relicts. They are both very useful for a wide range of applications, especially two-tier enterprise apps. The problem seems to be that a lot of people look at lighthouse projects to get insights for their own architecture. While that's always interesting, you're guaranteed to make wrong decisions when you're just trying to copy ideas from the likes of Amazon.

Along the same line, you should not embrace a n-tier architecture when you can just build a 2-tier app. 2-tier can be a lot easier if you accept the limitations, and easier = time to market, cost, maintainability = money. Hardly a relict in my book.

30 Jul 2010
12:03 PM

Mythos

I thought a false myth was better than a real myth...

30 Jul 2010
13:10 PM

Alex Yakunin

Great post! I almost immediately recommended it in our own blog.

30 Jul 2010
14:16 PM

Christopher Bennage

I once had an employer who wanted to be able to switch out .NET with Java on the main application for the company. You know, just in case we had an anti-Microsoft client.

30 Jul 2010
15:05 PM

João P. Bragança

Also speaking from personal experience, Ayende is spot on. Trying to fight this is ultimately pointless, as I learned the hard way..

30 Jul 2010
15:17 PM

Frank Quednau

@Bogdan, I can't see how you could transparently support both a relational and an object database. When I use db40 significant parts of the app look quite different and stuff like what object identity actually is and what a change is, the fetch depth, transaction boundaries...how do you want to support that across different persistence mechanisms?

30 Jul 2010
17:10 PM

Vadim

What about for testing purposes?

Seems like it's much easier to write a test where you stub out a call to ISomeQuery.Fetch() rather than ISession.CreateCriteria().Add(...

30 Jul 2010
17:38 PM

Alex Yakunin

Vadim,

It's almost always easlier to test everything using real database. You can rollback any transaction in unit tests, regenerate the schema on each test fixture run, use some lightweight provider (e.g. SQLite) - nearly anything works.

30 Jul 2010
17:58 PM

Dominic Pettifer

Porting from ORM to ORM is one thing (Nhibernate to EF etc.). But what if I wanted to port my application to a NoSQL style database such as MongoDB. NHibernate understandably can't map to MongoDB because it isn't a RDBMS, it's a Document based database. Surely this is a good use case for a fully encapsulated DAL?

This is something I was planning to do to give clients maximum flexibility in choosing a database (they can go with a standard RDBMS, or use a NoSQL database, or even flat XML files), and I was going to use a fully decoupled/encapsulated Repository layer to achieve this. Although I've not attempted something like this before and am unaware of the challenges in mapping a DDD style Domain Model to a NoSQL database.

What are your thoughts?

30 Jul 2010
18:05 PM

Santos Ray Victorero, II

Sorry, I should have said IQueryable <t and not IEnumerable <t>

30 Jul 2010
18:24 PM

Leandro de los Santos

I disagree with this post, i do a model with DAL layer in mind and my model works fine in server side and client side. I think the problem isn't in the DAL idea, the problem is in the model and DAL interaction.

If you have a DAL layer is for abstraction, so you must make a model with the DAL idea.

This idea mean if you have a property "Client" in your Invoice Class, you must use a DAL for the invoce and another DAL for the Client. In this case you can do a migration, even separate the DAL and the model in differents machines (o even integrate a different system for the Client).

But if you do a DAL, and the invoce don't use the DAL for reading the client, your model assume that NH will load the Client Property... Then you lost time, you write a DAL just becouse you think is a good practice and don't understand what are you doing...

By the way... The invoce / client example is a real one...

30 Jul 2010
19:20 PM

Tim Mahy

@Leandro de los Santos:

it are systems like that, that are full of N+1 (or worse) situations

30 Jul 2010
20:14 PM

Vadim

@Alex,

Well, I'm not sure I want to be setting up db tests all over my app. This invariably involves populating the db and if your schema is complex makes your unit test setup time quite long. Compare that to abstracting the db access and stubbing out calls to it.

30 Jul 2010
20:38 PM

evereq

Ayende: maybe I don't get you, but this is how I see situation: I will give another example, in addition to CMS. Let's say you build DAL and use latest EF4 for now because it's just very easy and quick solution... Usually one does not even need to think a lot when he build DAL with EF! But for example you KNOW for sure that you will need to switch to some highly distributed "hashtable" storage like Amazon SimpleDb, Google BigTable or even Azure Table Storage. When you will do this "switch" you will 100% need to fight with very limited features set: ATS for example does not even support 'Skip' feature! But you just don't want to fight right now with all this "crap" and "complicity in simplicity" that exists around distributed databases! You want to build your DAL in a minutes, just to build say "prototype" of your application, focus on unique business logic instead! You don't want to build best DAL, just very basic! But you want to be flexible and be able to switch later with minimum effort. So if we read your post and if reader just take it "as is" he may think something like "it's just not works!" and use say EF entities in all Business layer and service layer, i.e. "couple" DAL powered by EF with other parts of app in situation where doing this will cause HUGE issues later! He will so couple all this together, that later it will be near to impossible to switch ORM (or go without ORM)! Yes, EF give you DAL in 5 minutes, but business logic that takes you say 6 months to develop will just be so coupled with ORM ....

Instead, in such situation reader probably should do:

a) Create another "abstraction" layer under ORM (under EF for example) same way like it's "promoted' in many DDD books: just select repository pattern will be ENOUGH to start.

b) instead of direct usage of EF entities / data context in BL / Service Layer (or sometimes I see even in ASP.NET MVC controllers!!!), he use Repositories and so have dependency only on repositories interfaces. What methods actually exists in repositories? GetOrders() for example! Inside, in implementation of it, you use full power of EF ORM now!

c) When he need to switch to another ORM he just implement another repositories and inject them, instead of repositories designed to use EF!

All complicity that dial with handling database related features going inside repositories implementation! If you build repositories for EF ORM you use ALL features you need from EF! If you build repositories that use Azure StorageClient Library directly you even not use any ORM at all etc, and instead write very sophisticated code for paging for example! But methods like GetOrders() that you use in your BL / SL will be not changed, so 6 months work will not gone because you just don't build this "thin" abstraction layer under ORM!

I don't want to name this approach (that actually highly promoted by DDD experts, i.e. to use repositories to "hide" ORM) as "best" that feet all! But in SOME situations it's do play it role and give ability to build applications that can "switch" ORM!

Sure building such "additional" layer of abstraction have as always own trade offs, like performance or time effort or ... a lot! But it's POSSIBLE! :) and projects like KIGG prove this!

This helps me before :), so hope this will helps somebody else :)

30 Jul 2010
21:20 PM

Frank Quednau

@evereq

just having a very shallow first look at kigg. Here, the domain and the repository is provided twice. Once for Linq2SQL and once for EF. Said ORM independence then comes at the price of multiplying a significant part of the codebase by the number of OR/Ms you want to support. As an example, introducing a new Domain Object means introducing an interface for that object, a class implementing it per ORM support and additional repository work. Can you explain to me why I would want to pay that price if the only thing I want is to construct is a digg-style website? I understand that a company wants to show off technologies, but this leads to sometimes pretty bizarre constructions.

Here, as consultants, making real-life projects, the predominant questions should not be whether this is technically possible but why on Earth a customer would want to do this. Because, everything is possible - as long as the customer is willing to pay for it.

30 Jul 2010
21:26 PM

Ayende Rahien

Dominic,

Won't work.

The models are different and the way you work with them, the type of queries you can execute, etc.

You literally can't write a real system that move between different storage mediums unless you treat it as a dumb key/value store (the lowest common ground)

30 Jul 2010
21:30 PM

Ayende Rahien

Santos,

Try doing that in a real application, then trying to move OR/Ms.

Things work differently in different OR/Ms. Something as simple as where you assign IDs for a new instance is going to break your code.

But more importantly, you can't write a real app on top of IEnumerable, you have to be able to control things like eager loading. That is different for every OR/M.

Moreover, the options that they offer are different. EF handle eager loading of multiple collections implicitly, while NH requires explicit work, for example.

Trying to port from EF to NH is going to cause massive problems even if you just port the API.

Remember, those are just examples off the top of my head. There are literally dozens such subtle and not so subtle differences.

30 Jul 2010
21:32 PM

Ayende Rahien

Leandro

In other words, you moved down to:

a) everything is key / value.

b) you can't take advantage of things like joins, eager loading, etc.

c) you are doing a whole lot more work that the ORM can do for you already.

d) you can't load a list of orders, because to load 25 orders, you would require 26 queries (SELECT N+1)

30 Jul 2010
21:44 PM

Eduardo

Spot on, Ayende.

I've run into this situation in at least three different projects.

Today we were trying to use Linq-to-hibernate to achieve exactly that, encapsulating hibernate at the persitence layer.

But guess what, since we need to be able to define fetching strategies dynamically, we are forced to let nhibernate dependencies propagate to the layer right on top of it. (ADOs holding dynamic queries, parameterized queries were externalized)

So, IMO, being able to completely encapsulate the ORM solution is exactly that, a myth.

30 Jul 2010
22:02 PM

Ayende Rahien

But for example you KNOW for sure that you will need to switch to some highly distributed "hashtable" storage like Amazon SimpleDb, Google BigTable or even Azure Table Storage.

Then you use a persistent hash table during development.

It make absolutely no sense to try building that on top of an OR/M.

Not only won't you be able to use any of the features, but it will actually be harder than anything else.

You want to build your DAL in a minutes, just to build say "prototype" of your application, focus on unique business logic instead!

You write something like this, and don't try to add complexity to your life:

public class PersistentHashTable

{

string directory;


public PersistentHashTable(string directory)

{

    this.directory = directory;

    if(Directory.Exists(directory)==false)

        Directory.Create(directory)

}


public void Save(string id, object entity)

{

    File.WriteAllText(Path.Combine(directory, id), 

        JObject.FromObject(entity).Tostring());

}


public T Read{T}(string id) where T : class

{

    var path = Path.Combine(directory, id);

    if(File.Exists(path) == false)

        return null;

    return new JsonSerializer().Deserialize{T}(File.ReadAllText(path));

}

}

just select repository pattern will be ENOUGH to start.

Won't _work_, because the access patterns are completely different.

Also, note that key/value store is about as basic as you can _get_. Anything beyond that and you get serious leakage of the persistence model to the application.

GetOrders() for example! Inside, in implementation of it, you use full power of EF ORM now!

Really? How are OrderLines retreived? Eagerly? Lazily?

When you move to a different model, is it going to kill you either way.

In a key/value store, you will have to make select n+1 queries, for example.

That totally ignore the problem that you have a tool that can remove all data access issues, and you just use it as another layer to create data access problems.

If you build repositories for EF ORM you use ALL features you need from EF!

No, you can't. Lazy loading, change tracking, optimistic concurrency, etc are all tied in.

30 Jul 2010
22:15 PM

tobi

This is an interesting discussion as I am currently moving from Linq2Sql to a different ORM. What was most painful for me (and what killed my spike for EF 4) was that the Linq support of Linq2Sql is really good - so I was relying on it (I would prefer NHibernate over everything else if it only had useful linq support. I could send in 3 bug reports from my spike that I got before I killed that spike too). I could rewrite half of the app if I was going to use EF4. I am currently evaluating the linq support of Dataobjects.Net which looks extremely good.

30 Jul 2010
23:00 PM

Leandro de los Santos

Ayende Rahien, yes, you loose all those capabilites, but the point is

Do you have a REAL requeriment who need a DAL?

If you have that requeriment, you pay the price. If not, you are doing some layer just for nothing.

In fact, a think if you develop a DAL layer, you don't have a ORM system, you use the ORM in the dal layer in a "helper" way, but your model / dal interaction must be like if no ORM in the system.

My point is, why you need a DAL layer?

For my, are few real cases who need this.

You have a real requirement for support multiples ORMs, or even ADO.NET.
You know that some part of your system can be replaced with other system (example: interact with a "invoice module" wich is your or a third party).
You want to run the DAL layer and the model layer in separate machines.
Multiple RDBMs supports only if you made SQL by hand, if not, today you don't need a DAL.

If you don't have this real DAL requeriments, you don't need this layer. It's the same for other arquitectured components. You don't program WCF services just in case...

You don't have a DAL layers just becouse you implement some sort of IGenericDAL <bussinesentity wich wrap Nhibernate. For me this are a nice helpers classes (really usefulls ones in most cases), but this doesn't mean you cover real DAL requeriments.

In others word, if someone have a real requeriments wich are covers with a DAL layer and when the time for make the changes come and can make this changes, if becouse the DAL / Model interaction was poolry designed, not becouse the encapsulating data access is a bad idea.

The bad idea is make an extra layer when you don't needed, if you want ORM benefits, then you don't need a DAL, just need some good programing practices like put the querys in one class and no in every view, etc...

Sorry for my bad english, hope be clear.

30 Jul 2010
23:08 PM

Ayende Rahien

Tobi,

Did you try NH 3.0?

30 Jul 2010
23:10 PM

Tobin Harris

Nicely put. I think that should let lots of developers "off the hook".

I used to strive for the super-clean goal of building the ultimate swappable DAL to isolate app code from the persistence mechanism and somehow abstract all it's flexibility. Waste of time. These days I'm happy to accept that if we need to change ORM, then that will come at a cheaper price than building crazy isolation layers to dance around!

30 Jul 2010
23:13 PM

Ayende Rahien

Do you have a REAL requeriment who need a DAL?

Yes, it is outlined here:

ayende.com/.../stealing-from-your-client.aspx

I don't believe in DAL, it is usually a waste of time and effort to try to write that, a remenant of time where you had very low level APIs.

That isn't the case anymore.

You have a real requirement for support multiples ORMs, or even ADO.NET.

Can you give me a real use case for that?

I can understand multiple _databases_, but that is something that the ORM handles.

You know that some part of your system can be replaced with other system (example: interact with a "invoice module" wich is your or a third party).

That isn't relevant, you put an interface there and forget about it. That is IInvoicingModule, not IInvoiceDAL, BTW.

You want to run the DAL layer and the model layer in separate machines.

Almost always a mistake.

ayende.com/.../...-castrate-your-architecture.aspx

You don't program WCF services just in case...

Tell that to Juval Lowey: "Every class should be a service"

I think that we are in agreement, then.

31 Jul 2010
01:10 AM

Leandro de los Santos

For me, today the only real one scenario when i choose a DAL layer (real DAL, no the IGenericDALWichIDoBecouseIReadSomewhere) is a app with client side procesing capabilities...

Like a (heavy) quotation system, and maybe only if i need the option for run in offline scenarios (bad connections, full availability)...

DAL is a good pattern wich allow me to share classes in servers, middle and client sides (obviously click once is a must for this)... Also allows me to implement client side cache, even a full load of some entitys without rewrite model logic.

You can do some tricky process like put queues for asynchronous messaging of Data Layer for low bandwith connections (of course, you can do the same without DAL, just a decent Service layer...)

Of course, i agree with you thinking DAL like a silver bullet for no ORM dependency is a myth, if you implement some interfaces but still use the ORM, you don't even respect the layer name (miss the A letter in this implementation). But i still believe in DAL for reals scenarios, even with the 3g everywhere...

So, don't blame the pattern, blame the bad implementations!!!

31 Jul 2010
05:58 AM

Alex Yakunin

@ Vadim:

This invariably involves populating the db and if your schema is complex makes your unit test setup time quite long.

Well, I represent a bit different "ORM camp", so in my particular case:

It's easy to automatically re-create the schema (1 LOC)
IMHO, it's better to write a code populating DB with test data rather then to populate the DB itself once. Reasoning is quite simple: test must run on generally any PC, and ideally - on different DBs.

Compare that to abstracting the db access and stubbing out calls to it.

I'd compare the approach described above:

No any plumbing code related to DAL abstraction
~ 1 LOC to recreate the DB
Let's say, 10-20 LOC per each table to populate it with the data, taking into account there is some shared stuff for this.

IMHO, that's much more clean and simple ;) If you use CI, even the fact that such tests run a bit longer is not a problem at all.

31 Jul 2010
06:27 AM

Alex Yakunin

@ Ayende: agree with all the posts.

Unbelievable, how people EVEN THINK (btw, it was just discussed seriously!) about possibility to abstract both ORM and e.g. BigTable in their particular application. Likely, they simply don't know how one of parts really works (e.g. BigTable), and what its proper usage implies.

Probably, Repository Pattern is one of the worst inventions from this point: it makes people (esp. with relatively small experience) believe all storages are quite similar - a huge myth, they only looks so.

In reality, this is nothing more that a simplification of abstractions to minimal common subset of features.

So, I have a question for guys pretending that ORM must be abstracted: can you name some widely used and relatively comlex application that really has an abstraction layer allowing to switch from one ORM to another with nearly zero code? Relatively complex means there are definitely more that 20 tables. Commercial application example would be even better.

That's really interesting, since I know only opposite examples of successful apps. This can be one of good cases showing how theory meets practice.

31 Jul 2010
06:31 AM

Alex Yakunin

Just reviewed my post. I'm sorry for some mistypings and my English.

with nearly zero code

I mean "nearly zero code beyond that layer (e.g. DAL)".

31 Jul 2010
06:34 AM

Alex Yakunin

Btw, speaking about NH3.0 and LINQ there: what's the replacement from NHibernateContext there? I tried to find this info, but so far without success.

31 Jul 2010
06:35 AM

Alex Yakunin

Sorry, "from" -> "for".

31 Jul 2010
06:35 AM

Anders Juul

Hi all,

Wonderful thought-provoking posts, thanks for joining in. I've learned a lot already - always aimed for ORM-agnostic design so far.

Anders, Denmark

31 Jul 2010
07:29 AM

evereq

Alex Yakunin:

Unbelievable, how people EVEN THINK (btw, it was just discussed seriously!) about possibility to abstract both ORM and e.g. BigTable in their particular application.

Why not to think? If you don't think about different stuff, you just become "automated" developer and hold "design patterns" book as your bible (many great developers does not actually like "design patterns" at all - they push us think only few predefined ways! I always want to THINK and Discuss DIFFERENT ideas!

Likely, they simply don't know how one of parts really works (e.g. BigTable), and what its proper usage implies.

Yep? How you know? ;-) Does proper usage of BigTable means for you that you should just use BigTable API in all your code, in BL, SL, Controllers???? Did you really think that SOMETIMES it can be good idea to abstract away it? Or at least TRY to do this? Again, NOT always make sense to do this - my +1 to 'Leandro de los Santos' and others who do believe that sometimes make sense to really build DAL, not just use ORM or hashtable API coupled inside most of application layers!

Probably, Repository Pattern is one of the worst inventions from this point: it makes people (esp. with relatively small experience) believe all storages are quite similar - a huge myth, they only looks so.

Storages ARE different! Nobody tell here that they are same! But in SOME projects it do make sense to focus this "differences" in NEW layer (very thin) of abstraction - repositories for example or providers or something else! It's always like this: you add another abstraction layer if you want to HIDE differences for some reason! And I don't see how to apply here "relatively small experience", sorry :)

In reality, this is nothing more that a simplification of abstractions to minimal common subset of features.

Why to minimal??? Still not get you!! You can use in Repository code for EF ALL features available in EF! You can use in Repository code for NHibernate ALL features available in NHibernate etc!

31 Jul 2010
09:00 AM

Frank Quednau

In reality, this is nothing more that a simplification of abstractions to minimal common subset of features.

Why to minimal??? Still not get you!!

Given 2 systems A and B with each a set of features, a third system using any of A or B interchangeably will only be able to use those features that are common to A or B. Any other solution will imply knowledge about A or B or both.

It is possible that you may be able to extract more commonalities between two systems than some other developer by finding adequate abstractions someone else could not think of, but then you may be just on your way to write the next great OR/M.

31 Jul 2010
09:10 AM

Frans Bouma

O/R mapper frameworks are not a solid abstraction, the abstraction is leaky because the services the O/R mapper provides are embedded into the application, no matter what you do. I therefore agree with the article.

People who still try to abstract the O/R mapper away should think about why they're spending their client's money (!) on doing something completely irrelevant. It's not as if the developer abstracts away the commercial grid they're using in the UI which dictates how things work as well.

Here's another one, in the same line of thought: if an o/r mapper is bleeding into the app anyway (as you use the services, so you don't have to write that yourself, you GAIN by this 'bleeding'!), is POCO really that important? After all, if the O/R mapper can't be swapped easily, if at all, why bother how the persistence classes are created, if the o/r mapper provides the services you want/need? POCO makes swapping easier, but it's still almost undoable, so it's a non-feature.

31 Jul 2010
09:20 AM

Ayende Rahien

DAL is a good pattern wich allow me to share classes in servers, middle and client sides (obviously click once is a must for this).

You never want to do that.

You end up with classes that have 3 responsabilities, act as entities from the DB, DTOS on the wire and UI bound.

That lead to very nasty code and is a huge violation of SRP.

31 Jul 2010
09:21 AM

Ayende Rahien

Alex,

You writer your own context based on your entities. Like you do with EF Code Only, frex.

31 Jul 2010
09:27 AM

Ayende Rahien

Did you really think that SOMETIMES it can be good idea to abstract away it?

Oh, you probably want to build some abstraction on top of BigTable, if only to make it easier to do stuff. But the abstraction you'll build are going to be different than the abstractions you'll have if you use a relational database.

That is the key point. The whole way you access the data is completely different.

You can use in Repository code for NHibernate ALL features available in NHibernate etc!

No, you can't. Because usage of those features is highlky dependant on where you are using things.

So either you end up with a repository with a lot of single use methods, or you use NH to define the context for operations.

Beside that, I think you are missing something important when you talk about using repository to move not only between different ORM (which is hard to impossible as it is) but also between different storage formats. Different storage formats offer different access patterns. And that leads to _different code_, not just different code calling the data store, but different code in how you call it.

31 Jul 2010
10:07 AM

evereq

Re Frank Quednau:

Given 2 systems A and B with each a set of features, a third system using any of A or B interchangeably will only be able to use those features that are common to A or B. Any other solution will imply knowledge about A or B or both.

Sorry, not agree :) It seems like here you think not about "abstractions", but instead about "inheritance".. more so, looks like for some reason you restrict yourself to only "is-a" inheritance model and forgot about "has-a" or / and aggregation!

The power of "abstraction" with repositories / providers that you use BOTH approaches (i.e. aggregation and inheritance) !

Developers tend to create very complex abstractions (sometimes too complex or just useless) and not only in DAL, but in whole a lot of layers! And it's always will be argumentation around specific situations like with DAL we have now... about say 30+ years ago, I think most of people was think that it's not possible to abstract away UI and so MVC help us now to do this! Same now happens with DAL, thanks to DDD. Your model should be simply "not aware" from persistence strategy you choice! And sorry, but ORM is just a part of all persistence we usually use in our projects! :)

To Ayende:

What I agree, that for some reason, .NET developers usually try to over complicate things when it's does not necessary and does not in requirements! Looking to say Grails or RoR frameworks, they does not even raise questions about switch ORM (while it's still possible there)... The issue here that Ayende have title for blog post "The false myth of encapsulating data access in the DAL" and say that it's "not work" and that's a bit not correct :) In most of web frameworks / platforms I take a look before, it's simply BY DEFAULT DAL encapsulate all data access inside :) But because such frameworks (like Django, Grails or RoR have usually just one main, default ORM nobody even try to switch it... or at least it's not common.)... in .NET world we have just too big variations and a lot of ways to build DAL: L2S, EF, NHibernate, ADO.NET, LLBLGen, ... and what is a bit issue (or maybe it's big +) that .NET framework itself does not "push" us to use any of them by default! Even latest ASP.NET MVC does not "push" us to use say EF! So that is why developers gain questions like in this blog post! :)

It is possible that you may be able to extract more commonalities between two systems than some other developer by finding adequate abstractions someone else could not think of, but then you may be just on your way to write the next great OR/M.

Yep EXACTLY, and this is great and this what drive us - developers :)

31 Jul 2010
10:34 AM

evereq

To Frank Quednau:

just having a very shallow first look at kigg. Here, the domain and the repository is provided twice. Once for Linq2SQL and once for EF. Said ORM independence then comes at the price of multiplying a significant part of the codebase by the number of OR/Ms you want to support. As an example, introducing a new Domain Object means introducing an interface for that object, a class implementing it per ORM support and additional repository work.

EXACTLY! see!? You get it in few minutes! Did I said somewhere it's not require code??? Sure you should add a whole a LOT of code for each supported ORM! :) But "common, base" code still small and nice! And you can hire few engineers each one build support of own ORM etc if needed.

Can you explain to me why I would want to pay that price if the only thing I want is to construct is a digg-style website? I understand that a company wants to show off technologies, but this leads to sometimes pretty bizarre constructions.

Ha :) It's question to KIGG people :) not to me :) As I see this this guys was so experienced that decide initially to build all "infrastructure" :) Maybe they just know that someday they will want EF support, instead of L2S (i.e. was time when community was not sure about L2S future at all) etc... In any case they do it NICE and it's WORKS :)

100% agree with you! You don't need any of this complex methods for a la Digg web site or even a la Gmail! :) See my previous comments where I explain WHERE to use it will be benefit :)

31 Jul 2010
10:48 AM

Ayende Rahien

Sure you should add a whole a LOT of code for each supported ORM!

ayende.com/.../stealing-from-your-client.aspx

That is not an acceptable solution.

You basically created a port, shouldered on a huge additional maintenance problem and going to have a lot more expensive solution.

In any case they do it NICE and it's WORKS :)

You can also swim the English channel, but most people take a boat or fly over.

The fact that you can make something work doesn't mean you should

31 Jul 2010
10:52 AM

evereq

To Frans Bouma:

People who still try to abstract the O/R mapper away should think about why they're spending their client's money (!) on doing something completely irrelevant. It's not as if the developer abstracts away the commercial grid they're using in the UI which dictates how things work as well.

Yes?? Do we speak about same things??? As for me, I will better end up attempting to abstract away ORM and spend client money on it, than some commercial GUI components (DevExpress for example) - it's usually useless to do this - you usually really change completely UI framework (say move to Silverlight or switch to jQuery Grid plug-in etc), than you just replace all DevExpress controls with Telerik or home build :D

I see before in my live one WinForms project (very BIG actually) where people try to abstract away all GUI controls, they build own "labels" and "textboxes", own Grid etc :D They spend on this 90% time of project, but they just want to "bill" client more :D In reality, I think abstract GUI controls is really hard, especially for Web! (because of Javascript mostly)... You should pick up one set (say Telerik) or few sets and just use them all around :)

31 Jul 2010
15:11 PM

Alex Yakunin

Planned to answer to evereq, but found Ayende and Frans already answered on nearly everything.

@evereq: I'd like to commend just few parts:

1) Examples you provided don't satisfy one of requirements with quire high probability: I mentioned there must be more then 20 tables. In short, all these examples show relatively simple system. I agree e.g. BlogEngine.NET is widely adopted. But believe me, it's quite far from real-life business application.

2) BigTable differs from nearly any database:

No transactions (= different update \ access pattern). No distributed transactions (read further).
Column (family) - oriented. This affects on access patterns.
Auto-partitioned. Affects on design, etc.
Provides access to historical values. There is no analogue in regular DBMS at all for this.
There are no indexes (in fact, you have just primary index there).
It stores byte streams (~= strings); there are no other data types.
No relation consistency is guaranteed even in scope of single BigTable instance. Obviously, there are no any warranties for different BigTable instances as well.
No constraints, no triggers, etc.
Sequential scan, that is frequent access pattern in case with regular DB, is almost unusable here - because of size. Standard solution is MapReduce.

So what do you think, can you, taking this into account, abstract both regular and such storage? :)

P.S. Ayende, thanks about NHibernateContext - we'll use this.

31 Jul 2010
15:14 PM

Alex Yakunin

@Frans, @Ayende - about POCO: I also noted that original statement in its extreme form implies "don't hunt for PI/POCO, if this isn't really necessary in other layers".

31 Jul 2010
15:23 PM

Alex Yakunin

You end up with classes that have 3 responsabilities, act as entities from the DB, DTOS on the wire and UI bound.

Fully agree.

On the other hand, I know people want this. That's probably the reason behind popularity of CSLA (their entities combine last 2 features).

We combine well DB+UI, + provide a special container allowing to serialize (=move) the state of such entities.

In both cases the main disadvantage is big learning curve: it's far from obvious how such entities really act in complex cases.

So probably, the most clean and obvious solution here is just avoidance of this (= follow SRP).

31 Jul 2010
15:24 PM

Ayende Rahien

Alex,

Don't get me started on CSLA. I haven't seen one good application using it.

31 Jul 2010
17:01 PM

Alex Yakunin

Well, me too, although I never seriously interested in its real life applications. I'm not saying that's what I like, it was just an example I know.

But it definitely has some popularity - partially, because of books.

Actually, I never heard you wrote about it. Did you? If so, I'd like to read ;)

31 Jul 2010
18:16 PM

Ayende Rahien

This is the closest:

ayende.com/.../Entity-vs.-Business-Object.aspx

I really don't know where to start with CSLA, it is focused on one set of problems, and does it in a very straightforward fashion.

I happen to disagree with just about all the decisions that are made there, but that is another issue

31 Jul 2010
19:02 PM

evereq

To Alex Yakunin:

1) Examples you provided don't satisfy one of requirements with quire high probability: I mentioned there must be more then 20 tables. In short, all these examples show relatively simple system. I agree e.g. BlogEngine.NET is widely adopted. But believe me, it's quite far from real-life business application.

I don't see big problem to "prolong" approach with Repositories or providers into even 200 tables in storage :) (sure maybe some changes will be needed, but generally - really don't see problem here)

2) BigTable differs from nearly any database:

.............

So what do you think, can you, taking this into account, abstract both regular and such storage? :)

I take not only this into account, but much more that I know about such engines... Luck of most of features that you list make it even more "easy" to use DDD with Repositories than in case if fully relational database!

I build DAL for both databases and from "client" point of view (i.e. from say business layer in our case) it's simply no difference exists what exactly resides inside repositories code! Say you build UserRepository and have here methods like Add, Remove, GetById, GetByUsername, GetAll, etc. What is the difference what resides inside repository for client (for business layer in our case)? Is it BigTable or filesystem or ATS or Oracle database??? No differences for CLIENT if you use repository via repository interface in client code!

Sure some problems still apply like Lazy loading or say transactions support, but all this you CAN done :) Sure it's a little bit "art" to build repositories interfaces right, so you not end with 1000s methods in each of say 20 repositories :) But it's just your "experience" counts here, not approach itself!

Btw, I just think up another example how people try to build actually complete, decoupled DAL and be able to switch ORM to some extend - look into LLBLGen - you can use here or build in ORM or switch into EF or L2S or even into NHibernate! Sure maybe you will need to add a bit of code, but this is just an approach that some can select if they need to be able to switch ORM! The only issues I see with this approach (i.e. code generation with LLBLGen) is a) it's not free, but it's not expensive if compare to developers time b) it's does not support (and maybe will never support) some "noSQL" databases.. But it's just situation right now... Maybe LLBLGen team figure out some "sophistical" ways to handle this!? If so - we will all end with LLBLGen as our "abstraction" layer UNDER ORM :)

Bfn, hope this helps! :)

01 Aug 2010
10:41 AM

Thomas Maierhofer

I will seize on Alex Yakunin's statements. To understand his position, one should keep in mind that DataObjects.NET has a very different approach to other ORM. DO.NET treads the underlying database as a indexing engine and not a SQL Server. Therefore DO.NET already acts as a full featured "Business Logic Engine" to arbitrary storage.

When Bigtable or whatever can be used as a storage provider for DO.NET, there is no need for an additional abstraction layer above DO.NET. DO.NET is the abstraction layer itself.

@Ralf Westphal:

CQRS is at first a optimization technique for reading data from a strongly normalized data storage. It doesn't help you writing data to data storage. We have implemented such an pattern ten years ago in C++ and called it "Query Cache". Old whine in new skins.

01 Aug 2010
20:11 PM

evereq

Thomas Maierhofer:

Ha!

"DO.NET treads the underlying database as a indexing engine and not a SQL Server"... Is it really what you want to tell or I just miss something??? You want to tread full featured Oracle 11g database same way as in memory hashtable??

Reading documentation on DataObjects.NET site I found it's looks like it's not exactly the road... so?

In any case, I think it's too much :) .... It what I and think many others really want to avoid :D because I think if you try to go this way you really use very small subset of features available in relational databases (even MySQL have whole a LOT of features!!!) and your performance / scalability will simply near the floor!

You should work with SQL Server as with SQL Server and be able to use ALL features in your DAL that give you performance benefits when work with relational database!

You should go to very limited set of features in "hashtable" + MapReduce storage engines, but use whole a lot of partitioning, distribution and parallel processing here!

That is why it's really seems better approach to try build some abstraction under ORM with manual design / coding (and probably with some code generation using T4 or LLBLGen), than to just skip all features away and use only small subset :D

More so as far as I understand DataObjects.NET try to COUPLE together Business Layer and Data Layer so deeply that even people who think "The false myth of encapsulating data access in the DAL" will be not happy! :) Or maybe I am wrong? ;-)

02 Aug 2010
07:15 AM

Thomas Maierhofer

evereq:

First of all an indexing engine has nothing to do with an hash table. If you look at the source code of DO4 you will find the so called RECORD SET ENGINE (RSE). This is one abstraction layer from the storage provider.

There is a memory storage provider and maybe it has some hash table to store the data internally. There are also storage providers for other storage types, especially several SQL Servers.

Second: Performance of Data storage.

The basic performance of every data storage is determined by CRUD operations. In case of transactional storage, the commit can be added to the CRUD operations. In case of strongly normalized data and IPC (except embedded databases) it is the ability to receive the request, materialize result and send it back to the client.

All Database Systems use in fact nearly the same techniques to drive this operations fast on OS level (B-Trees, paging, caching, transaction log, delayed commits, bla bla bla).

Some database systems like BerkleyDB, end here, and some SqlServers use such an engine as an backend. E.G. MySQL use BerkleyDB as a backend. Every SQL Server has internally such an component which is in fact an indexing engine.

I just wonder how an SQL server will scale and perform better than his underlying indexing engine? Out of the vast amount of features, there are four things that gain an performance benefit:

batching, stored Procedures, triggers and aggregates like sum and avg. To put imperative code blocks (batching, SP, triggers) on the SQL server itself, you reduce the IPC roundtrips and give some context information to the query optimizer. The same is true for aggregates, which are computed on the server.

In fact the DDL is mainly used to declare the business logic for the application. Todays SQL servers are overbloatet with this features. XML, geodata ... what is all put on the back of the poor SQL Server. And the server should always perform well.

In the last years many projects trend to implement an app server as a middle tier and want to implement the business logic in this tier. More and more code is shifted out of stored procedures and triggers and is put in the app server itself. Buisness logic is build on top of ORM frameworks. So it is more than consequent to get all that BL stuff out of the SQL server and put it into the app server.

And what keeps from the SQL server? The indexing engine! Why shouldn't a ORM threat every data storage as an indexing engine?

02 Aug 2010
09:49 AM

Thomas Maierhofer

One word to tools like LLBLGen:

If you want to use such tools you must have a completely defined schema. You must set up your referential integrity and have all your business logic defined in constraints stored procedures and triggers.

I know there are nice entity relationship tools to do this and so on. In fact you do the work always twice. Once on the SQL server and once on the mapper or app server. I have done this for 15 years, and i am fed up with it.

I use DataObjects.NET because it is able to generate and maintain the underlying database schema itself. It can also map to existing schema via legacy mode.

Now it becomes quite cool:

I define my persistent classes and my schema will be generated.
I have my model in C# and i have less training effort for all this tools
i have my complete business logic in C# and encapsulated into the persistent class.
I have an frictionless interplay between persistent and (normal) transient objects
I only need standard version control for my CS project, i mustn't maintain any SQL scripts.
I can use all this nifty VS tools like ReSharper, AnkSVN and so on to maintain my persistent classes.
I have a well defined upgrade path which is coded in C# and under version control.

And last but not least: I can use all this database tools like backup / restore, isql and so on the normal way.

Finally, there is nothing wrong in coupling the business and data layer. This is the key feature of every SQL server with this referential integrity, stored procedures and triggers thing. SQL servers have got more and more features to couple the business logic and the data layer. Calculated columns, Java on the database, .NET on the database and so on. Seamless integration of BL into the data layer. But on a ORM framework you blame it?

An SQL Server is the right choice in 2-tier applications or so called client-server computing. There should be no BL on the client, so it is best to put it on the server.

In 3-tier appications it is best to get all the BL stuff in the middle tier, the app server. No BL on the client and no BL on the database server. Then the app server does what it intend to do, grant access for clients and enforce the business rules.

I simply tell my workmate: Look at the class if you want to know what this persistent objects does. All BL is encapsulated in it.

You tell your workmate: you should look on the class we generated for EF and the BL we sculpted over it. Then you should look on the mapping. And don't forget about the triggers and stored procedures, they contain a lot of business logic too. And if you change something, keep all that stuff in perfect sync. Keep in mind some update scripts will also exist and you should maintain them. Be very careful - ok it is best when you don't touch anything at all.

I call this a house of cards - in my opinion it is a bug in the development process!

02 Aug 2010
09:55 AM

Martin Aatmaa

@ Thomas Maierhofer:

CQRS is at first a optimization technique for reading data from a strongly normalized data storage.

Correct. And it also simplifies (SRP) your object model by splitting the read behaviour from the write.

It doesn't help you writing data to data storage.

CQRS as a pattern by itself does not. However, CQRS combined with Event Sourcing does. It, in fact, turns your write side into a ridiculously simple affair: an append only, read only, event (property bag) dump.

And that's something that can be abstracted well behind a good old DAL.

02 Aug 2010
12:23 PM

Dave

Don't need encapsulation anymore? Well, in that case I've missed something. Last time I checked NHibernate could not access LDAP , XML or document databases..

To quote Ayende self: "There is no database, there's only persistent storage'.

We write a lot of implementation solutions. And every companies uses different data sources, so we need to be very flexible in how we access these data sources. It's called a Data (notice the lack of 'base') Access Layer. In 3 or 4 years that could mean that my DAL also need to be able to access Raven-DB for example.

However I agree that most solutions don't need a seperate DAL project. Most of them will never even change from database provider. However stating that encapsulating database access in a DAL is a myth is also not correct..

02 Aug 2010
14:41 PM

Thomas Maierhofer

Martin Aatmaa:

Event Sourcing can turn the write side into a simple affair, but it mustn't. Especially if you have a complex BL a lot of code is affected on "simple" writes. This code must be somewhere.

02 Aug 2010
15:15 PM

Dmitry

I definitely agree with the article. A lot of people think that repositories are a part of the DDD model for some reason. They are actually a part of the infrastructure and there is usually very little reason to make them generic.

POCO simply allows class modeling without thinking too much about the data access.

02 Aug 2010
15:57 PM

Ayende Rahien

Dave,

See my post today, you can move DAL between different storage abstraction. Not unless you have KV API

03 Aug 2010
07:20 AM

Dave

Ayende,

Our 'DAL' solution exists of at least three projects: An infrastructure project that contains the entities (poco) and interfaces (IEmployeeRepository) and a project that implements these DAL interfaces and could use NHibernate if those entities are stored and retrieved from a relational database.

The third project is not really a DAL project, it's more a DI module. We have written a DI abstraction (looks a bit like Microsoft ServiceLocator, but is also able to register those configurations in a generic way). This module says that IEmployeeRepository is handled by SqlEmployeeRepository. Actually this is how we did things before we wrote our own 'per-entity' data access framework (EntityDirector).

Because of this the main application doesn't know if NHibernate (or any other ORM framework is used for that matter) is used. So the use of any NHibernate related API is prohibited in any part of the solution, except for the SQL Encapsulation implementation.

Our situation is vary rare. We're not in control of where the data is located, how it's stored and in what configuration. In most cases employees need to be retrieved from a Active Directory (There's is no 'login', the Window Identity determines what a user can or can't do). Customer contacts are usually handled by the helpdesk department and each contact moment is logged in a helpdesk database. The customer (account information) itself often needs to be retrieved from an IBM DB2 database.

Our application uses all these resources (data sources) to aggregate all that information and present them to the employees. So every entity can in theory have a different data source.

I'm not sure what you mean with KV API.

03 Aug 2010
07:26 AM

Ayende Rahien

Dave,

KV == Key/Value store.

What you describe is by no means a unique situation, by the way.

The problem is that you are trying to treat it as a single application. Instead, create separate services that each talk to one of those data sources. I'll have a separate post about it soon

17 Aug 2010
12:43 PM

Alex Yakunin

Sorry, just came back... Bad that blog doesn't send any comment notifications to all the commenters.

@evereq:

Sure it's a little bit "art" to build repositories interfaces right, so you not end with 1000s methods in each of say 20 repositories :) But it's just your "experience" counts here, not approach itself!

Probably, you didn't get the point - that's exactly what Ayende writes about. Likely, you can design your repository for N different storages, but the overall complexity of this attempt would overwhelm the profit you get from this.

I.e. it's better to focus on just few similar storages (e.g. relational), rather then trying to support all of them.

And about "1000s methods in each of say 20 repositories": count of such methods isn't the main problem. The main problem is that some of them won't be efficient enough on all the storages. Moreover, in real life you'll end up with the case when may be just 30% of these 1000s of methods are supported by each particular repository impl. - simply because other 60% won't be efficient enough to be used at all.

In fact, you're getting the case when you're joining a set of completely different APIs into a single one by uniting the sets of methods. IMHO, this is even worse then using an intersection of these subsets :) - SRP gets completely broken.

I think it should be stressed this post isn't about possibility of implementing such DAL. It's about practical efficiency of implementing it.

I wrote, "I have a question for guys pretending that ORM must be abstracted: can you name some widely used and relatively comlex application that really has an abstraction layer allowing to switch from one ORM to another with nearly zero code? Relatively complex means there are definitely more that 20 tables. Commercial application example would be even better."

The fact there is still no acceptable answer to this question proves the idea of this post pretty well. Things become completely different, when theory meets practice.

17 Aug 2010
13:07 PM

Alex Yakunin

@Thomas Maierhofer:

Concerning DO.NET and support of indexing storages by it: well, that's not a complete true.

The main benefit DO has here is actually a technical possibility to deal with most of storages based on indexes - simply because DO "underatands" all the underlying concepts there and properly translates the abstractions.

On the other hand, this doesn't mean you can migrate your application built with DO to e.g. BigTable without necessity to change its model. Or. better to say, you can migrate it, but in 99% of cases the result you'll get won't be acceptable enough.

E.g. BigTable is column-oriented distributed storage. To use it effeciently, you must represent all the data in such fashion that most of queries will touch just a small range of keys in some limited number of BigTables (or their partitions). But nearly any SQL application doesn't ensure this: even simple index lookup requires join by PK, if all required values aren't stored in index, not speaking about e.g. inheritance queries.

So normally, you must implement certain changes in your model to migrate from SQL to some distributed storage, and the fact that framework supports both types of storages doesn't mean this won't be necessary. Underlying concepts are too different to hide the details by unifying APIs here.

But, as far as I can judge, distributed storages is may be the only really different case. If we're speaking about relatively compact storages, nearly all of them can be represented as index storages. E.g. Azure SQL with its 50Gb limit is one of examples of such storages (think why Microsoft simply can't support unlimited database size here). Although note that targeting it, you should anyway think about such aspects as possible multi-tenancy initially.

17 Aug 2010
13:19 PM

Alex Yakunin

"All Database Systems use in fact nearly the same techniques to drive this operations fast on OS level (B-Trees, paging, caching, transaction log, delayed commits, bla bla bla)."

That's true, but you missed the fact there is something different with distributed storages:

You simply can't afford yourself to run a query sequentially processing all the data there (or having even linear complexity relatively to database size). If you have such a query, you may consider it will never complete. Map-reduce won't solve the problem: the total load on system anyway stays the same with map-reduce, but since rate of such queries and the resources they need are both proportional to count of users (= data size), you get N^2 here, so it's just a question of time when they'll flood the system.
You can't efficiently run e.g. lookup joins here: each lookup will, likely, hit a different partition, so the total cost of this operation will be quite high.

And so on.

Comment preview

Comments have been closed on this topic.

Markdown turns plain text formatting into fancy HTML formatting.

Phrase Emphasis

*italic*   **bold**
_italic_   __bold__

Links

Inline:

An [example](http://url.com/ "Title")

Reference-style labels (titles are optional):

An [example][id]. Then, anywhere
else in the doc, define the link:
  [id]: http://example.com/  "Title"

Images

Inline (titles are optional):

![alt text](/path/img.jpg "Title")

Reference-style:

![alt text][id]
[id]: /url/to/img.jpg "Title"

Headers

Setext-style:

Header 1
========
Header 2
--------

atx-style (closing #'s are optional):

# Header 1 #
## Header 2 ##
###### Header 6

Lists

Ordered, without paragraphs:

1.  Foo
2.  Bar

Unordered, with paragraphs:

*   A list item.
    With multiple paragraphs.
*   Bar

You can nest them:

*   Abacus
    * answer
*   Bubbles
    1.  bunk
    2.  bupkis
        * BELITTLER
    3. burper
*   Cunning

Blockquotes

> Email-style angle brackets
> are used for blockquotes.
> > And, they can be nested.
> #### Headers in blockquotes
> 
> * You can quote a list.
> * Etc.

Horizontal Rules

Three or more dashes or asterisks:

---
* * *
- - - -

Manual Line Breaks

End a line with two or more spaces:

Roses are red,   
Violets are blue.

Fenced Code Blocks

Code blocks delimited by 3 or more backticks or tildas:

```
This is a preformatted
code block
```

Header IDs

Set the id of headings with {#<id>} at end of heading line:

## My Heading {#myheading}

Tables

Fruit    |Color
---------|----------
Apples   |Red
Pears	 |Green
Bananas  |Yellow

Definition Lists

Term 1
: Definition 1
Term 2
: Definition 2

Footnotes

Body text with a footnote [^1]
[^1]: Footnote text here

Abbreviations

MDD <- will have title
*[MDD]: MarkdownDeep

Oren Eini

Oren Eini

CEO of RavenDB