That No SQL Thing: Document Databases – usages
I described what a document database is, but I haven’t touched on why you would want to use it.
The major benefit, of course, is that you are dealing with documents. There is little or no impedance mismatch between DTOs and documents. That means that storing data in the document database is usually significantly easier than when using an RDBMS for most non trivial scenarios.
It is usually quite painful to design a good physical data model for an RDBMS, because the way the data is laid out in the database and the way that we think about it in our application are drastically different. Moreover, RDBMS has this little thing called Schemas. And modifying a schema can be a painful thing indeed.
Sidebar: One of the most common problems that I find when reviewing a project is that the first step (or one of them) was to build the Entity Relations Diagram, thereby sinking a large time/effort commitment into it before the project really starts and real world usage tells us what we actually need.
The schemaless nature of a document database means that we don’t have to declare the shape of the data up front; we can just serialize things into and out of the database. It helps that the commonly used format (JSON) is both human readable and easily managed by tools.
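As a rough illustration of that point, here is a minimal sketch that uses a plain in-memory `Map` as a stand-in for a document store. The `put` and `get` helpers, the document ids, and the field names are all hypothetical and are not the API of any particular database; the point is only that two documents with different shapes can be stored side by side without any schema change.

```javascript
// Hypothetical in-memory stand-in for a document store: ids mapped to JSON text.
const store = new Map();

// Serialize the object as-is; no table definition or column list is involved.
function put(id, doc) {
  store.set(id, JSON.stringify(doc));
}

// Deserialize back into a plain object, or undefined if the id is unknown.
function get(id) {
  const json = store.get(id);
  return json === undefined ? undefined : JSON.parse(json);
}

// Two documents with different shapes; the second adds fields the first lacks.
put('users/1', { name: 'ayende' });
put('users/2', { name: 'other', emails: ['other@example.com'], address: { city: 'Somewhere' } });

const first = get('users/1');
const second = get('users/2');
console.log(first.name, second.address.city);
```

The application still has to agree with itself on what a user document looks like; the database simply does not enforce it.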
A document database doesn’t support relations, which means that each document is independent. That makes it much easier to shard the database than it would be in a relational database, because we don’t need to either store all relations on the same shard or support distributed joins.
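To make the sharding argument concrete, here is a small sketch of routing by document id. The shard names and the hash function are assumptions chosen for illustration; real document databases each have their own sharding strategy. Because a document is self-contained, its id alone is enough to decide where it lives.

```javascript
// Hypothetical shard names; a real deployment would list server addresses here.
const shards = ['shard-0', 'shard-1', 'shard-2'];

// Simple deterministic string hash (a djb2 variant), kept as an unsigned 32-bit value.
function hashKey(key) {
  let hash = 5381;
  for (let i = 0; i < key.length; i++) {
    hash = ((hash * 33) ^ key.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// The same id always maps to the same shard, so reads and writes agree on placement.
function shardFor(documentId) {
  return shards[hashKey(documentId) % shards.length];
}

console.log(shardFor('users/1'), shardFor('orders/42'));
```

Nothing in this routing needs to know about other documents, which is exactly what a relational model with cross-table joins cannot promise.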
Finally, I like to think about document databases as a natural candidate for DDD and DDDish (DDD-like?) applications. In DDD, we are instructed to think in terms of Aggregates and always go through an aggregate. The problem is that, on top of a relational database, this tends to produce very bad performance in many instances, because we need either to traverse the aggregate's associations or to rely on specialized knowledge in each context. With a document database, aggregates are quite natural and highly performant; they are just the same document, after all.
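As a sketch of what "the aggregate is the document" can look like, consider a hypothetical Order aggregate. The field names and ids below are made up for illustration. The order and its lines are stored and loaded together, so working with the aggregate needs no joins or lazy loading.

```javascript
// Hypothetical Order aggregate: the root and its order lines live in one document.
const orderDocument = {
  id: 'orders/1',
  customer: { id: 'customers/7', name: 'ayende' },
  lines: [
    { product: 'products/3', quantity: 2, unitPrice: 10 },
    { product: 'products/9', quantity: 1, unitPrice: 25 },
  ],
};

// Behaviour on the aggregate works on data that is already in memory.
function orderTotal(order) {
  return order.lines.reduce((sum, line) => sum + line.quantity * line.unitPrice, 0);
}

// Round-trip through JSON to mimic a single store and a single load.
const serialized = JSON.stringify(orderDocument);
const loaded = JSON.parse(serialized);
console.log(orderTotal(loaded)); // prints 45
```

In a relational model the same aggregate would typically span an Orders table and an OrderLines table, and loading it would take a join or a second query.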
I’ll post more about this issue tomorrow.
More posts in "That No SQL Thing" series:
- (03 Jun 2010) Video
- (14 May 2010) Column (Family) Databases
- (09 May 2010) Why do I need that again?
- (07 May 2010) Scaling Graph Databases
- (06 May 2010) Graph databases
- (22 Apr 2010) Document Database Migrations
- (21 Apr 2010) Modeling Documents in a Document Database
- (20 Apr 2010) The relational modeling anti pattern in document databases
- (19 Apr 2010) Document Databases – usages
Comments
Ralf,
We actually have a pretty strong client side API.
I would love to get your comments on it.
I think one of the other advantages of using a NoSQL database is that the data can be consumed by any framework, since it is kept in the form of documents.
This means you can put data using .NET Framework and get the same data out using Ruby or Python framework.
@Mohammad, so long as the framework you are using doesn't go too abstract. I find that a lot of the time, interacting with a single system from different frameworks isn't so easy. Example: using memcached from .NET, Java, and PHP. Depending on how the key hashes are generated, you can get very different results.
So how do you handle the case where you want to change the layout of a particular type of document? e.g. to support a new feature. Do you just have a process that upgrades all the documents in one go? Or do you end up with lots of conditionals in your deserialization code?
MF,
You'll have your answer in 3 days.
In short, you never have conditional in deserialization code if you do it right.
Good point Mohammad - since it's stored as JSON, it shouldn't much matter what the client is.
Ayende, would your client api include calls from , ie. a javascript call ?
I look forward to hearing more on the topic
Timing can't be better!
I've been exploring document-db lately, less about the technology itself, instead mostly on usage pattern, the best way to take its benefit, and how to apply things that we have taken for granted in rdbms-orm duet (e.g. transactions, n-level cache, lazy-load, join-load, stuff like that).
Looking forward to your next posts
Ah btw... anonymous classes in C# 3 and the new dynamic feature in C# 4 are the things that have made document-db a natural fit for .NET applications... I mean, amazingly natural!
It is inevitable that document-db will now start gaining traction with the current state of .net language capability.
Aggregates are quite natural for document databases? How? They are not supported by the db engine - if you want an aggregate you have to do everything yourself. In this way of thinking also statistics and reporting are 'natural' for document databases - provided that you bring in the missing data processing functionality.
Mongo does have support for basic aggregate functions...
Complex aggregate operations can cause performance problem with huge data, they're normally solved using map/reduce across distributed processing power.
As to working with aggregates, shouldn't an object database work equally well in this respect?
Steve,
Well, using jQuery's API, here is how you insert a document:
$.ajax({
    method: 'PUT',
    dataType: 'json',
    url: 'http://localhost:8080/docs/users/',
    data: { name: 'ayende' }
});
The Web UI for Raven is composed solely of calls like this one.
And yes, there is even a wrapper around that to give you things like EditDocument, GetDocumentsPage, etc.
Rafal,
a) I was talking about DDD Aggregates, not Aggregation in general.
b) Most document databases have some support for aggregation. They call it map/reduce, but it is the same thing.
Frank,
Probably. I am not sure how you set things up in an object database to control the scope of storage and reduce the number of remote calls.
And as I am not an expert on OODB, I don't really know.
Wayne,
You might want to read the previous posts in the series, I am laying out a lot of information about how and why you want to use this.
I have just spent the day telling the CIO why a NoSQL/document database would be a bad place to store production reporting data (30,000+ records per hour).
He has read some blog posts saying that NoSQL is the solution to all storage needs and has reduced or no maintenance costs, so everything has to be converted.
Wayne,
At a rate of 30,000 new records an hour, after a year you'll have 262,800,000 records.
I am not sure what you intend to do with them, but assuming that each row is 128 bytes in size (guid, couple of dates, an ip, maybe a url), you will have about 30 GB of data per year. I wouldn't worry about that.
As for whether a NoSQL solution would be good or not, that is impossible to say without more data :-)
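The back-of-the-envelope estimate in this reply can be checked with a few lines of arithmetic. This is only a sketch: the 128-byte record size is the assumption stated in the reply, and the variable names are made up for illustration.

```javascript
// Inputs taken from the discussion; bytesPerRecord is the reply's own assumption.
const recordsPerHour = 30000;
const bytesPerRecord = 128;

// 24 hours a day, 365 days a year.
const recordsPerYear = recordsPerHour * 24 * 365;
const bytesPerYear = recordsPerYear * bytesPerRecord;

// Express the total in binary gigabytes (GiB).
const gibPerYear = bytesPerYear / 2 ** 30;

console.log(recordsPerYear);        // prints 262800000
console.log(gibPerYear.toFixed(1)); // prints 31.3
```

That is roughly 31 GiB of raw data per year before indexes and storage overhead, which is consistent with the "about 30 GB" figure.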
Can the XML type of SQL Server, with the help of XML indexes and XQuery, be called document-oriented storage?
Sebastien,
It might, but that probably wouldn't do what you want.