Ayende @ Rahien


That No SQL Thing: Document Databases – usages

I described what a document database is, but I haven’t touched on why you would want to use it.

The major benefit, of course, is that you are dealing with documents. There is little or no impedance mismatch between DTOs and documents. That means that, for most non-trivial scenarios, storing data in a document database is significantly easier than storing it in an RDBMS.
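To make the "little or no impedance mismatch" point concrete, here is a minimal sketch, assuming JSON as the document format. The DTO shape and its values are hypothetical. A relational model of the same data would typically need separate user, email and address tables (one plausible schema, assumed for comparison), while the document is just the serialized DTO.

```javascript
// A plain DTO as an application might define it (hypothetical shape and values).
const userDto = {
  id: 'users/1',
  name: 'ayende',
  emails: ['ayende@example.com'],
  address: { city: 'Example City', country: 'Example Country' },
};

// Storing: the document is simply the serialized DTO; no table/column mapping,
// no separate rows for the email list or the nested address.
const stored = JSON.stringify(userDto);

// Loading: deserializing gives back the same shape, nested parts included.
const loaded = JSON.parse(stored);

console.log(loaded.address.city); // Example City
```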

It is usually quite painful to design a good physical data model for an RDBMS, because the way the data is laid out in the database and the way that we think about it in our application are drastically different. Moreover, RDBMSs have this little thing called schemas, and modifying a schema can be a painful thing indeed.

Sidebar: One of the most common problems that I find when reviewing a project is that the first step (or one of the first) was to build the Entity Relationship Diagram, sinking a large commitment of time and effort into it before the project really starts and real-world usage tells us what we actually need.

The schemaless nature of a document database means that we don’t have to worry about the shape of the data we are using; we can just serialize things into and out of the database. It helps that the commonly used format (JSON) is both human-readable and easily managed by tools.
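As an illustration of "just serialize things into and out of the database", the sketch below is a toy in-memory store that keeps JSON text by id. `ToyDocumentStore` and its `put`/`get` methods are hypothetical and are not the API of Raven or any other real database. The point is that two documents of different shapes can live side by side without any schema change.

```javascript
// A toy in-memory "document store": ids map to JSON text.
// Hypothetical, for illustration only; not a real database API.
class ToyDocumentStore {
  constructor() {
    this.docs = new Map();
  }

  // No schema check: any JSON-serializable object is accepted as-is.
  put(id, doc) {
    this.docs.set(id, JSON.stringify(doc));
  }

  // Returns a fresh copy of the stored document, or null if the id is unknown.
  get(id) {
    const json = this.docs.get(id);
    return json === undefined ? null : JSON.parse(json);
  }
}

const store = new ToyDocumentStore();
// Two documents under the same "users/" prefix with different shapes.
store.put('users/1', { name: 'ayende' });
store.put('users/2', { name: 'alice', twitter: '@alice', roles: ['admin'] });

console.log(store.get('users/2').roles); // [ 'admin' ]
```

Adding the `roles` field to one user required no migration; older documents simply lack it.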

A document database doesn’t support relations, which means that each document is independent. That makes it much easier to shard the database than it would be in a relational database, because we don’t need to either store all relations on the same shard or support distributed joins.
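Because each document stands alone, choosing a shard can depend on nothing but the document id. The following is a minimal sketch, assuming an FNV-1a hash and plain modulo placement; both are choices made for the example, and real databases use their own sharding strategies. `shardFor`, `put` and `get` are hypothetical helpers.

```javascript
// FNV-1a, 32-bit: a simple, deterministic string hash (chosen for the example).
function fnv1a32(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// The shard depends only on the document id, never on other documents.
function shardFor(docId, shardCount) {
  return fnv1a32(docId) % shardCount;
}

// Three in-memory "shards" standing in for separate servers.
const shards = [new Map(), new Map(), new Map()];

// A put or a get touches exactly one shard: no related rows to co-locate,
// no distributed join needed to reassemble a document.
function put(doc) {
  shards[shardFor(doc.id, shards.length)].set(doc.id, doc);
}

function get(id) {
  return shards[shardFor(id, shards.length)].get(id) ?? null;
}

put({ id: 'orders/1', customer: 'customers/7', total: 42 });
put({ id: 'orders/2', customer: 'customers/9', total: 17 });
console.log(get('orders/1').total); // 42
```

With plain modulo placement, changing the shard count remaps most ids; consistent hashing is the usual remedy when shards are added over time.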

Finally, I like to think about document databases as a natural candidate for DDD and DDDish (DDD-like?) applications. When using a relational database, we are instructed to think in terms of Aggregates and always go through an aggregate. The problem is that this tends to produce very bad performance in many instances, as we need to traverse the aggregate's associations or rely on specialized knowledge in each context. With a document database, aggregates are quite natural and highly performant: the whole aggregate is a single document, after all.
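To illustrate the difference, here is a minimal sketch, assuming a hypothetical Order aggregate with order lines. In the relational layout (two in-memory arrays standing in for tables) the aggregate has to be reassembled from rows; in the document layout it is loaded with a single lookup. All names and data here are made up for the example.

```javascript
// Relational layout (hypothetical): the Order aggregate is split across two tables.
const ordersTable = [{ id: 1, customer: 'customers/7' }];
const orderLinesTable = [
  { orderId: 1, product: 'products/3', qty: 2, price: 10 },
  { orderId: 1, product: 'products/5', qty: 1, price: 25 },
];

// Loading the aggregate means fetching the parent row and stitching child rows back on.
function loadOrderRelational(id) {
  const row = ordersTable.find((o) => o.id === id);
  const lines = orderLinesTable
    .filter((l) => l.orderId === id) // the "join"
    .map(({ product, qty, price }) => ({ product, qty, price }));
  return { id: `orders/${row.id}`, customer: row.customer, lines };
}

// Document layout: the whole aggregate is one document, loaded in a single get.
const documents = new Map([
  ['orders/1', {
    id: 'orders/1',
    customer: 'customers/7',
    lines: [
      { product: 'products/3', qty: 2, price: 10 },
      { product: 'products/5', qty: 1, price: 25 },
    ],
  }],
]);

// Domain logic works on the aggregate the same way regardless of where it came from.
function orderTotal(order) {
  return order.lines.reduce((sum, l) => sum + l.qty * l.price, 0);
}

console.log(orderTotal(loadOrderRelational(1)));    // 45
console.log(orderTotal(documents.get('orders/1'))); // 45
```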

I’ll post more about this issue tomorrow.

Comments

Ayende Rahien, 04/19/2010 09:53 PM

Ralf,

We actually have a pretty strong client side API.

I would love to get your comments on it.

Mohammad Azam, 04/19/2010 10:30 PM

I think one of the other advantages of using a NoSQL database is that the data can be consumed by any framework, since it is kept in the form of a document.

This means you can store data using the .NET Framework and get the same data out using a Ruby or Python framework.

Michael J. Ryan, 04/19/2010 11:06 PM

@Mohammad, so long as the framework you are using doesn't get too abstract. I often find that interacting with a single system from different frameworks isn't so easy. Example: using memcached from .NET, Java, and PHP. Depending on how the key hashes are generated, you can get very different results.
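For illustration of the key-hashing problem, the sketch below models two hypothetical client libraries that share the same server list but hash keys differently (a naive character-code sum versus FNV-1a). Neither is the algorithm of any real memcached client; the server names are placeholders. Whenever the two pick different servers for a key, a value written by one client is a cache miss for the other.

```javascript
// Same server list for both clients (placeholder names).
const servers = ['cache-a:11211', 'cache-b:11211', 'cache-c:11211'];

// Hypothetical client 1: naive hash, sum of character codes.
function sumHash(key) {
  let h = 0;
  for (const ch of key) h = (h + ch.charCodeAt(0)) >>> 0;
  return h;
}

// Hypothetical client 2: FNV-1a, 32-bit.
function fnv1a32(key) {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Each client maps a key to a server with hash modulo server count.
const pick = (hash, key) => servers[hash(key) % servers.length];

// Count keys on which the two clients disagree about the server.
let mismatches = 0;
for (let i = 0; i < 100; i++) {
  const key = `user:${i}`;
  if (pick(sumHash, key) !== pick(fnv1a32, key)) mismatches++;
}
console.log(`${mismatches} of 100 keys land on different servers`);
```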

MF, 04/20/2010 01:24 AM

So how do you handle the case where you want to change the layout of a particular type of document? e.g. to support a new feature. Do you just have a process that upgrades all the documents in one go? Or do you end up with lots of conditionals in your deserialization code?

Ayende Rahien, 04/20/2010 01:53 AM

MF,

You'll have your answer in 3 days.

In short, you never have conditionals in deserialization code if you do it right.

Steve, 04/20/2010 04:55 AM

Good point Mohammad - since it's stored as JSON, it shouldn't much matter what the client is.

Ayende, would your client API support JavaScript calls?

I look forward to hearing more on the topic.

Hendry Luk, 04/20/2010 05:16 AM

The timing couldn't be better!

I've been exploring document DBs lately, not so much the technology itself as usage patterns: the best way to take advantage of it, and how to apply things that we have taken for granted in the RDBMS/ORM duo (e.g. transactions, n-level caching, lazy loading, join loading, and so on).

Looking forward to your next posts.

Hendry Luk, 04/20/2010 05:32 AM

Ah, by the way... anonymous classes in C# 3 and the new dynamic feature in C# 4 are what have made document DBs a natural fit with .NET applications... I mean, amazingly natural!

Given the current state of .NET language capabilities, it is inevitable that document DBs will now start gaining traction.

Rafal, 04/20/2010 06:11 AM

Aggregates are quite natural for document databases? How? They are not supported by the DB engine: if you want an aggregate, you have to do everything yourself. By this way of thinking, statistics and reporting are also 'natural' for document databases, provided that you bring in the missing data processing functionality.

Hendry Luk, 04/20/2010 06:24 AM

Mongo does have support for basic aggregate functions...

Complex aggregate operations can cause performance problems with huge data sets; they're normally solved using map/reduce across distributed processing power.
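For illustration, here is a minimal in-memory sketch of map/reduce spread over partitions, assuming hypothetical order documents split across three partitions (standing in for separate machines). Each partition maps and reduces locally, and the same reduce is then applied to the partial results. This mirrors the general technique, not the specific API of MongoDB, Raven, or any other product.

```javascript
// Documents split across partitions (hypothetical data).
const partitions = [
  [{ customer: 'c1', total: 10 }, { customer: 'c2', total: 5 }],
  [{ customer: 'c1', total: 7 }],
  [{ customer: 'c2', total: 3 }, { customer: 'c3', total: 1 }],
];

// Map: emit one { key, value } pair per document.
const map = (doc) => ({ key: doc.customer, value: { count: 1, sum: doc.total } });

// Reduce: combine values that share a key. Its output has the same shape as its
// input, so it can be re-applied to partial results coming from other partitions.
function reduce(pairs) {
  const acc = new Map();
  for (const { key, value } of pairs) {
    const prev = acc.get(key) ?? { count: 0, sum: 0 };
    acc.set(key, { count: prev.count + value.count, sum: prev.sum + value.sum });
  }
  return [...acc].map(([key, value]) => ({ key, value }));
}

// Each partition works locally; only the small partial results are combined.
const partials = partitions.map((docs) => reduce(docs.map(map)));
const result = reduce(partials.flat());
console.log(result); // c1: count 2, sum 17; c2: count 2, sum 8; c3: count 1, sum 1
```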

Frank Quednau, 04/20/2010 07:38 AM

As to working with aggregates, shouldn't an object database work equally well in this respect?

Ayende Rahien, 04/20/2010 08:42 AM

Steve,

Well, using jQuery's API, here is how you insert a document:

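$.ajax({
    type: 'PUT',                               // HTTP verb ('method' is only recognized from jQuery 1.9 on)
    url: 'http://localhost:8080/docs/users/',
    contentType: 'application/json',           // the request body is a JSON document...
    data: JSON.stringify({ name: 'ayende' }),  // ...so serialize it; a plain object would be form-encoded
    dataType: 'json'                           // parse the server's response as JSON
});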

The Web UI for Raven is composed solely of calls like this one.

And yes, there is even a wrapper around that to give you things like EditDocument, GetDocumentsPage, etc.

Ayende Rahien, 04/20/2010 08:43 AM

Rafal,

a) I was talking about DDD Aggregates, not Aggregation in general.

b) Most document databases have some support for aggregation. They call it map/reduce, but it is the same thing.

Ayende Rahien, 04/20/2010 08:45 AM

Frank,

Probably. I am not sure how you set things up in an object database to control the scope of storage and reduce the number of remote calls.

And as I am not an expert on OODB, I don't really know.

Ayende Rahien, 04/20/2010 01:30 PM

Wayne,

You might want to read the previous posts in the series, I am laying out a lot of information about how and why you want to use this.

Wayne, 04/20/2010 02:33 PM

I have just spent the day telling the CIO why a NoSQL/document database would be a bad place to store production reporting data at 30,000+ records per hour.

He has read some blog posts saying that NoSQL is the solution to all storage needs and has reduced or no maintenance costs. So everything has to be converted.

Ayende Rahien, 04/20/2010 03:00 PM

Wayne,

At a rate of 30,000 new records an hour, after a year you'll have 262,800,000 records.

I am not sure what you intend to do with them, but assuming that each row is 128 bytes in size (a GUID, a couple of dates, an IP, maybe a URL), you will have about 30 GB of data per year. I wouldn't worry about that.
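The back-of-the-envelope numbers above can be checked directly. This sketch only reproduces the arithmetic, using the 30,000 records/hour rate from the question and the assumed 128-byte row size; it says nothing about per-record storage overhead in any particular database.

```javascript
const recordsPerHour = 30000;
const recordsPerYear = recordsPerHour * 24 * 365; // 262,800,000 records
const bytesPerRecord = 128;                       // assumed row size from the comment above
const bytesPerYear = recordsPerYear * bytesPerRecord; // 33,638,400,000 bytes

console.log((bytesPerYear / 1e9).toFixed(1) + ' GB');      // 33.6 GB (decimal)
console.log((bytesPerYear / 2 ** 30).toFixed(1) + ' GiB'); // 31.3 GiB (binary)
```

Either way the result lands in the low 30s of gigabytes per year, consistent with the "about 30 GB" estimate.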

As for whether a NoSQL solution would be good or not, that is impossible to say without more data :-)

sebastien, 04/26/2010 02:58 PM

Can SQL Server's xml type, with the help of XML indexes and XQuery, be called document-oriented storage?

Ayende Rahien, 04/26/2010 03:08 PM

Sebastien,

It might, but it probably wouldn't do what you want.
