That No SQL Thing: The relational modeling anti pattern in document databases

Apr 20 2010

I am going to demonstrate the design of the data model in a document database for a typical blog application.

The following is my default sample data model, showing a very simple blog:

The absolutely wrong approach with a document database is to try to take the relational model and apply it on a document level. This is especially wrong because for a while, it might actually work. Let us say that we want to store the following:

var user = new User("ayende");
var blog = new Blog("Ayende @ Rahien", user) { Tags = {".NET", "Architecture", "Databases" } };
var categoryRaven = new Category("Raven");
var categoryNoSQL = new Category("NoSQL");
var post = new Post(blog, "RavenDB", "... content ...")  
{
    Categories  = { categoryRaven, categoryNoSQL },
    Tags = {"RavenDB", "Announcements" }
};
var comment = new Comment(post, "Great news");

PersistAll(user, blog, categoryRaven, categoryNoSQL, post, comment);

Interestingly enough, I need to use code to represent the data without tying it to a particular storage format.

The wrong approach to store the data would be to store each object as its own document, similar to the way we would store each object as its own row in a relational database. That wrong approach would look like this:

// users/ayende
{
   "type": "user",
   "name": "ayende"
}

// tags/1
{
   "name": ".NET"
}

// tags/2
{
   "name": "Architecture"
}

// tags/3
{
   "name": "Databases"
}
// tags/4
{
   "name": "RavenDB"
}
// tags/5
{
   "name": "Announcements"
}
// categories/1
{
    "name": "Raven"
}
// categories/2
{
    "name" : "NoSQL"
}
// blogs/1
{
   "type": "blog",
    "users": ["users/ayende"],
    "name": "Ayende @ Rahien",
    "tags": ["tags/1", "tags/2", "tags/3"]
}

// posts/1
{
    "blog": "blogs/1",
    "title": "RavenDB",
    "content": "... content ...",
    "categories": ["categories/1", "categories/2"],
    "tags" : ["tags/4", "tags/5"]
}

// comments/1
{
    "post": "posts/1",
    "content": "Great news"
}

I know that I am repeating myself here, but I have seen people miss the point before. Do NOT try to model a document database in this way.

The main reason that this is wrong is that a document database has no real support for doing joins, unions or any of the things that make such a model work effectively in a relational model.
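
To make that cost concrete, here is a minimal sketch of what rendering a single post involves when every reference has to be resolved by hand. The in-memory `store`, the `Get` helper and the comma-separated id fields are assumptions made for illustration, not any real document database API; each `Get` call stands in for one request to the server.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a document store: document id -> fields.
// Referenced ids are kept as comma-separated strings to keep the sketch small.
var store = new Dictionary<string, Dictionary<string, string>>
{
    ["tags/4"] = new() { ["name"] = "RavenDB" },
    ["tags/5"] = new() { ["name"] = "Announcements" },
    ["categories/1"] = new() { ["name"] = "Raven" },
    ["categories/2"] = new() { ["name"] = "NoSQL" },
    ["posts/1"] = new()
    {
        ["title"] = "RavenDB",
        ["categories"] = "categories/1,categories/2",
        ["tags"] = "tags/4,tags/5"
    }
};

var roundTrips = 0;

// Every call stands for one request to the database.
Dictionary<string, string> Get(string id)
{
    roundTrips++;
    return store[id];
}

// Load the post, then resolve each referenced document one lookup at a time.
var post = Get("posts/1");
var names = new List<string>();
foreach (var key in new[] { "categories", "tags" })
{
    foreach (var id in post[key].Split(','))
    {
        names.Add(Get(id)["name"]);
    }
}

// prints "RavenDB: Raven, NoSQL, RavenDB, Announcements (5 requests)"
Console.WriteLine($"{post["title"]}: {string.Join(", ", names)} ({roundTrips} requests)");
```

A relational database would answer the same question with one joined query; here the client does the join itself, one request per reference.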

Let us try to analyze the scenarios where we need this data, okay?

Main page: show list of blogs
Main page: show list of recent posts
Main page: show list of recent comments
Main page: show tag cloud for posts
Main page: show categories
Post page: show post and all comments
Post page: add comment to post
Tag page: show all posts for tag
Categories page: show all posts for category

I am going to analyze each of those scenarios using SQL (and the above model) and the current (and bad, smelly, nasty) document database model. I’ll have another post showing how to correctly model this in a document database; this post is about how not to do it.

Main page: show list of blogs

Using SQL, this is pretty easy:

select * from blogs

Using DocDB, this is also easy, since a built-in index lets us query for documents by their type:

docDb.Query<Blog>("DocumentsByType", query:"type:blog");

Main page: show list of recent posts

Using SQL, this is pretty easy:

select * from posts order by PostedAt desc

Using DocDB, we need to define our own index function, to allow us to sort on it. That is painless, even if I say so myself:

from doc in docs
where doc.type == "post"
select new {doc.posted_at}

And now we can query this using:

docDb.Query<Post>("Posts", orderBy:"-posted_at");

Main page: show list of recent comments

This is exactly the same as recent posts, so I’ll skip it.

Main page: show tag cloud for posts

Here the SQL grows interesting:

select Name, COUNT(*) as TagCount from tags
where ItemType = 'Posts'
group by Name

And with the document database we need to write a map/reduce index (“Did you just tell me to go @%$# myself?”)

from doc in docs
where doc.type == "post"
from tag in doc.tags
select new { tag, count = 1 }

from result in results
group result by result.tag into g
select new { tag = g.Key, count = g.Sum(x=>x.count) }
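
The two halves of that index can be tried out as plain LINQ over in-memory data. This is only a sketch: the tuple-shaped `docs` array is a hypothetical stand-in for stored documents, and a real index would be maintained by the database rather than recomputed on every call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory documents; only the fields the index reads are modelled.
var docs = new[]
{
    (type: "post", tags: new[] { "tags/4", "tags/5" }),
    (type: "post", tags: new[] { "tags/4" }),
    (type: "blog", tags: new[] { "tags/1", "tags/2", "tags/3" })
};

// Map: emit one (tag, count = 1) entry per tag on every post.
var mapped =
    from doc in docs
    where doc.type == "post"
    from tag in doc.tags
    select new { tag, count = 1 };

// Reduce: group the entries by tag and sum the counts.
var reduced =
    from result in mapped
    group result by result.tag into g
    select new { tag = g.Key, count = g.Sum(x => x.count) };

// prints "tags/4: 2" then "tags/5: 1"
foreach (var entry in reduced)
{
    Console.WriteLine($"{entry.tag}: {entry.count}");
}
```

Note that the result is keyed by tag id, not by tag name, because in this model the name lives in a separate document.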

And now that we have the index, we can get the values from it using:

var tagCloud = new TagCloud();
var tagIds = docDb.Query<TagAndCount>("TagsCloud", orderBy:"+count");
foreach(var tagId in tagIds)
{
    var tag = docDb.Get<Tag>(tagId.Tag);
    tagCloud.Add(tag.Name, tagId.Count);
}

Now this is ugly on many levels. First, we have the fairly complex index. Second, we have to merge the data ourselves at the client side. Third, we have to perform a SELECT N+1.

Yuck doesn’t begin to cover it. There are actually ways to handle this more nicely, by making a multi-get request, but I’ll not bother.
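
For readers who do want to see the multi-get idea, here is a rough sketch under stated assumptions. `GetMany` is a hypothetical helper, not a call from any real client library, and `tagDocs` and `indexRows` are made-up in-memory data standing in for the tag documents and the index results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins: tag documents by id, and the rows a tag-count index might return.
var tagDocs = new Dictionary<string, string>
{
    ["tags/4"] = "RavenDB",
    ["tags/5"] = "Announcements"
};
var indexRows = new[] { (tagId: "tags/5", count: 1), (tagId: "tags/4", count: 2) };

var requests = 0;

// Hypothetical multi-get: one request returns every requested document that exists.
Dictionary<string, string> GetMany(IEnumerable<string> ids)
{
    requests++;
    return ids.Distinct()
              .Where(id => tagDocs.ContainsKey(id))
              .ToDictionary(id => id, id => tagDocs[id]);
}

// One batched lookup replaces one Get per index row.
var namesById = GetMany(indexRows.Select(row => row.tagId));
var tagCloud = indexRows
    .Where(row => namesById.ContainsKey(row.tagId))
    .Select(row => (name: namesById[row.tagId], row.count))
    .ToList();

// prints "Announcements: 1" then "RavenDB: 2"
foreach (var (name, count) in tagCloud)
{
    Console.WriteLine($"{name}: {count}");
}
Console.WriteLine($"{requests} request(s) for tag documents"); // prints "1 request(s) for tag documents"
```

This removes the N+1, but the merge between counts and names still happens on the client.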

Main page: show categories

Exactly the same as show blogs, so I’ll skip it.

Post page: show post and all comments

Using stupid SQL:

select * from Posts where Id = 1

select * from Comments where PostId = 1

A more efficient method would be to use a join:

select * from Posts 
  join Comments 
    on Posts.Id = Comments.PostId
where Posts.Id = 1

With the doc db, we can do:

var post = docDb.Get<Post>(1);
var comments = docDb.Query<Comment>("CommentsByPost", query:"post_id:1", orderBy:"+posted_at");

Which, of course, requires us to define the comments by post index:

from doc in docs
where doc.type == "comment"
select new{doc.post_id, doc.posted_at}

Note that we have to make two calls here, because a document database has no notion of joins.

Post page: add comment to post

In SQL, it is a straightforward insert:

insert into comments (PostId, ... )
values(1, ...)

And with a document database, you can use:

docDb.Store(new Comment{ PostId = 1, ... });
docDb.SaveChanges();

Nothing much to look at here, using this flawed model.

Tag page: show all posts for tag

Using SQL, this is slightly more complex, because tags may be associated with blogs or with posts, so we need to do:

select * from Posts 
where Id in (
    select ItemId from tags
    where ItemType = 'Posts' and TagId = 1
)

Using a document database:

var posts = docDb.Query<Post>("PostsByTag", query:"tag:tags/1");

With the following index:

from doc in docs
where doc.type == "post"
from tag in doc.tags
select new {tag}
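
The index above fans out: it produces one entry per (post, tag) pair. A small in-memory sketch shows the effect. The `docs` array and its `id` field are hypothetical, and `ToLookup` only imitates what the database would do when answering the `tag:tags/4` query.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory documents, shaped like the posts above.
var docs = new[]
{
    (id: "posts/1", type: "post", tags: new[] { "tags/4", "tags/5" }),
    (id: "posts/2", type: "post", tags: new[] { "tags/4" }),
    (id: "blogs/1", type: "blog", tags: new[] { "tags/1" })
};

// The index emits one entry per (post, tag) pair, so a post with two tags is indexed twice.
var entries =
    from doc in docs
    where doc.type == "post"
    from tag in doc.tags
    select new { tag, doc.id };

// A lookup keyed by tag plays the role of the query "tag:tags/4".
var postsByTag = entries.ToLookup(e => e.tag, e => e.id);

// prints "posts/1, posts/2"
Console.WriteLine(string.Join(", ", postsByTag["tags/4"]));
```

Blog tags never reach the lookup, because the `where` clause filters out everything that is not a post.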

Categories page: show all posts for category

This is exactly like tags, so I’ll skip it.

As you’ve seen, by copying the relational model, we have created just the same sort of environment that we already had with an RDBMS, but now we are faced with the problem that a document database can’t do things that a relational database can. In my eyes, what we have done is a net loss. Oh, we may gain some small benefit by being schemaless, but that isn’t really that beneficial compared to the amount of effort that we have to go to by trying to be relational on a non-relational database.

Comments

20 Apr 2010
11:41 AM

Demis Bellot

I think your solution is very much tied to RavenDB because you're trying to use querying abilities built into the API.

In other NoSql db's that support rich data constructs (e.g. Redis, MongoDb etc), the way I'd achieve the same result is by storing posts in a sorted set ordered by date.

For anywhere I require a reference, I would create a 2-way link (i.e. bi-directional reference) storing the tag with the post and at the same time a list of posts with each tag the post has. That way you can query both indexes for the tags a particular post has as well as all posts matching a particular tag.

This just goes back to the general NoSQL mindset you should have where you need to know what querying functionality you require up front so at the time you're storing data you can update the 'query indexes' accordingly.

Another benefit you've missed is the speed increase which, depending on your situation, allows you to dispense with any caching logic or real-time/accuracy of the data you're displaying.

Antirez (the author of Redis) has an interesting post on his VM solution in which he talks about the Redis Virtual Machine implementation where Redis is like a 'Reversed Mysql+Memcached solution'. Where the data you frequently access is kept in memory and the Least Recently Used items are swapped out to disk:

antirez.com/post/redis-virtual-memory-story.html

20 Apr 2010
12:44 PM

Edin

I guess, we can expect another post explaining how to actually do correct design. If a DocDb is used the way you described above, I really don't see any benefit of it. OK, maybe speed.

I think that Demis kind of showed some directions by saying that posts could be stored in sorted set and ordered by date. Also, I like the idea of 2-way link between tags and posts.

20 Apr 2010
13:39 PM

Ayende Rahien

Demis,

You did notice all the warning about this being the WRONG way about it, right?

Yes, I am using Raven's querying ability to sort of make it work, but I don't really like the solution.

There is no need to do a bidirectional association here with tag/posts, by the way, I outline how to do this in tomorrow's post.

I talk a lot about setting up the indexes based on your needs then.

And I don't want to talk about advantages based on this model. It is like saying that getting punched in the gut is better than being shot; I would rather avoid both.

20 Apr 2010
13:39 PM

Ayende Rahien

Edin,

Of course you can, in fact, look at the future posts queue

20 Apr 2010
15:09 PM

Demis Bellot

@Ayende I see you've articulated it being the 'wrong' way to try to replace a RDBMS solution but my conclusion I've drawn from the wording of your original post is that blog posts do not belong in a NoSQL database but are rather best left in an RDBMS, not that a more ideal solution was already written and in your future post queue (which I still don't understand how people can blog so much that they have a future post-queue, as once a month takes enough time out of me :).

it is like saying getting punched in the gut is better than being shot, I would rather avoid both

I assume by this that you have a more appropriate solution forthcoming, I would still be doing bi-directional referencing in NoSQL databases but am looking forward to your proposed solution as well.

20 Apr 2010
20:22 PM

Tyler Burd

Ha. I mistook one sentence for "Oh, we may gain some small benefit by being shemales".

Can't wait to see the "right" way post.

20 Apr 2010
20:37 PM

Ayende Rahien

Tyler,

The right way certainly includes no shemales.

20 Apr 2010
21:46 PM

Frank Quednau

I prefer fembots to shemales, tbh.

Yes, that looked pretty ugly at times. Let's see what comes next.

29 Apr 2010
08:35 AM

Marco

I know that thinking "relational" is not the best approach here, but I'm wondering how you can handle an N to N relation between 2 document types which are independent (they are independent entities) but can be linked, and when they are linked then someone can put one or more comments explaining why they are linked together... is it something that RavenDB can solve or should I stick with the relational DB?

Maybe a mix of the two could be the best in my opinion, model the documents with RavenDB but handle the relations using a "relation document" used to express the relation and which contain the comments...

It sounds like an interesting idea ;)

29 Apr 2010
11:17 AM

Ayende Rahien

Marco,

You are thinking about graph databases.

I will probably add some support for Raven to do so, but that is in the future.

29 Apr 2010
13:11 PM

Marco

Oops, I didn't know about this kind of DB, shame on me.

They are exactly what I was looking for.

If you add this functionality to RavenDB then it will be just brilliant, looking forward to it.

Thanks a lot

Oren Eini

CEO of RavenDB
