Designing a document database: Views


Mar 12 2009


One of the more interesting problems with document databases is how you handle views. But a lot of people have already had trouble understanding what I mean by a document database (hint: I am not talking about a repository of Word docs), so I had better explain what I mean by this.

A document database stores documents. Those aren’t what most people would consider documents, however. They are not Excel or Word files. Rather, we are talking about storing data in a well-known format, but with no schema. Consider the case of storing an XML document or a JSON document. In both cases, we have a well-known format, but there is no required schema for it. That is, after all, one of the advantages of a document DB’s schema-less nature.
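To make the "well-known format, no schema" point concrete, here is a small sketch in Python. The documents and their field names are made up for illustration. The only claim is that they all parse as JSON even though no two of them share the same shape.

```python
import json

# Hypothetical documents; the field names are illustrative only.
raw_documents = [
    '{"type": "page", "title": "Home", "version": 1, "body": "Welcome"}',
    '{"type": "page", "title": "Home", "version": 2, "body": "Welcome!", "tags": ["start"]}',
    '{"type": "user", "name": "example-user"}',
]

# The format (JSON) is fixed, so every document parses the same way.
documents = [json.loads(raw) for raw in raw_documents]

# The shape is not fixed: each document carries its own set of fields.
field_sets = [sorted(doc.keys()) for doc in documents]
for fields in field_sets:
    print(fields)
```

Nothing in the store itself says that a page must have a `title`, which is exactly what makes querying awkward.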

However, trying to query on top of schema-less data can be… problematic. Unless you are talking about Lucene, which I would consider to be a document indexer rather than a document DB, although it can be used as one. Even with Lucene, you have to specify the things that you are actually interested in to be able to search on them.

So, what are views? Views are a way to transform a document into some well-known and well-defined format. For example, let us say that I want to use my DB to store wiki information. I can do this easily enough by storing the document as a whole, but how do I look up a page by its title? Trying to do this on the fly is a recipe for disastrous performance. In most document databases, the answer is to create a view. For RDBMS people, the closest equivalent to a DDB view is a materialized view.

I thought about creating it like this:

Please note that this is only to demonstrate the concept; actually implementing the above syntax requires either on-the-fly rewrites or C# 4.0.

The code above can scan through the relevant documents and, in a very clean fashion (I think), generate the values that we actually care about. Basically, we have now created a view called “pagesByTitleAndVersion”, indexed by title (ascending) and version (descending). We can now query this view for a particular value, and get it very quickly.
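As a rough sketch of what such a view amounts to, the Python below (not the C# syntax discussed in this post) projects page documents into rows ordered by title ascending and version descending, then looks a title up with a binary search. The document fields (`id`, `type`, `title`, `version`) and both helper functions are assumptions made for the example, not part of any real API.

```python
from bisect import bisect_left

# Hypothetical schema-less documents; only some of them are wiki pages.
documents = [
    {"id": 1, "type": "page", "title": "Home", "version": 1},
    {"id": 2, "type": "page", "title": "About", "version": 1},
    {"id": 3, "type": "page", "title": "Home", "version": 2},
    {"id": 4, "type": "user", "name": "example-user"},
]


def build_pages_by_title_and_version(docs):
    """Project page documents into rows sorted by title asc, version desc."""
    rows = [
        # Negating the version makes a plain ascending sort order it descending.
        (doc["title"], -doc["version"], doc["id"])
        for doc in docs
        if doc.get("type") == "page"
    ]
    rows.sort()
    return rows


def latest_version(view, title):
    """Return (version, document id) of the newest page with this title, or None."""
    # A one-element tuple sorts before every row with the same title,
    # so the search lands on the first (highest-version) row for that title.
    i = bisect_left(view, (title,))
    if i < len(view) and view[i][0] == title:
        _, negated_version, doc_id = view[i]
        return -negated_version, doc_id
    return None


view = build_pages_by_title_and_version(documents)
print(latest_version(view, "Home"))
```

The point of the projection is that the lookup touches only the small, sorted rows and never has to scan or parse the full documents.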

Note that this means that updating views happens as part of a background process, so there is going to be some delay between updating the document and updating the view. That is BASE for you :-)
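The delay can be illustrated with a toy background indexer. This is a sketch under assumptions: the in-memory `documents` store, the `title_view` dictionary, and the queue-based worker are hypothetical stand-ins, not how any particular document database implements it.

```python
import queue
import threading

documents = {}           # primary store: document id -> document
title_view = {}          # view: title -> document id
pending = queue.Queue()  # ids of documents whose view rows are out of date


def put_document(doc_id, doc):
    """Store the document immediately and defer the view update."""
    documents[doc_id] = doc
    pending.put(doc_id)


def index_worker():
    """Background loop that applies queued changes to the view."""
    while True:
        doc_id = pending.get()
        if doc_id is None:  # sentinel: stop the worker
            pending.task_done()
            break
        doc = documents[doc_id]
        if doc.get("type") == "page":
            title_view[doc["title"]] = doc_id
        pending.task_done()


worker = threading.Thread(target=index_worker, daemon=True)
worker.start()

put_document(1, {"type": "page", "title": "Home", "version": 1})
# At this point the view may or may not contain "Home" yet.

pending.join()  # block until the indexer has caught up
print(title_view)

pending.put(None)  # ask the worker to stop
worker.join()
```

The write returns as soon as the document is stored. A reader who queries the view before the worker catches up simply sees the older state.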

Another important thing is that this syntax is for projections only. Those are actually very simple to build. Well, simple is relative: there is going to be some very funky LINQ stuff going on in there, but from my perspective, it is fairly straightforward. The part that is going to be much harder to deal with is aggregation. I am going to deal with that separately, however.


Tags: Databases

Comments

12 Mar 2009
20:04 PM

Rafal

Ok, now I see I want it. Are you going to release it open-sourcely?

12 Mar 2009
20:09 PM

Rafal

And BTW, aren't views a good candidate to be based on RDBMS? It would save you implementation of joins, sorting, query capabilities... performance shouldn't be a problem since they could be optimized for querying.

12 Mar 2009
20:19 PM

meisinger

great series...

one question, however, why would you have to wait until C# 4.0?

what are they doing to the CLR or to the code base that would allow for this to happen?

i guess the bigger question really would be... why would you have to wait for anything to be able to execute that kind of code?

wouldn't the view be "updated" anytime that a document was persisted? and if so wouldn't the information needed to update the view already be in the context that you are working in?

or is the issue that the document db doesn't know how to update the view

i am starting to lose my mind

LOL

12 Mar 2009
20:35 PM

Ben Smith

Have you considered using a map/reduce style for defining views as per CouchDB (http://couchdb.apache.org/)? The view querying language used there is JavaScript, and documents are stored as JSON, as you're suggesting.

12 Mar 2009
21:00 PM

yug

Fascinating series... Thank you for sharing. It's not clear to me (from this post) where the updating of the view indexes is happening?

12 Mar 2009
21:37 PM

josh

@Rafal, i think you'd have to be concerned about changes to a document's title in the doc db if that view was stored in an RDBMS.

13 Mar 2009
00:07 AM

configurator

meisinger: Perhaps dynamic is necessary, because the documents are untyped and you are accessing their properties (Type, Title, Version)?

13 Mar 2009
01:28 AM

Ayende Rahien

Mike,

strong type sucks

13 Mar 2009
01:36 AM

Ayende Rahien

Rafal,

when it is built, probably

views aren't a good idea on top of an RDBMS. the view generation tends to take a lot of time in many scenarios. Remember that the data itself is schema-less, so a lot of the RDBMS advantages are just not there.

Ben,

Wait for it...

I think that I have a better solution than the javascript in couch.

Yug,

You don't see where updating the views happens because I haven't discussed it yet, wait for it...

13 Mar 2009
01:45 AM

Ayende Rahien

Configurator,

Exactly!

13 Mar 2009
01:50 AM

configurator

Ayende, for C# 3 wouldn't a Dictionaryish syntax be best?

doc["Type"], doc["Title"], doc["Version"]

The disadvantage with this syntax is that it is not strongly typed - but neither is the dynamic syntax.

13 Mar 2009
02:23 AM

configurator

Another option is to use some sort of duck typing for your docs

For that query, all docs must support a certain interface (with Type, Title and Version as properties). I.e. it is possible to create an interface for your DDB where it is queried as such:

DDB.GetDocumentsAs<TInterface>();

The interface as a type parameter causes a duck class to be generated by the DDB, mapping the given properties into their matching indexer.

13 Mar 2009
09:28 AM

Haxen

First, I like this series of posts!

What you are trying to accomplish reminds me a lot of Lotus Notes, which doesn't bring back good memories. ;-)

13 Mar 2009
11:39 AM

Ryan Roberts

This is a fascinating series of articles Oren, thank you.

I have a similar requirement to document database views coming up in my current work. We need view support for a system built on top of db4o to support various reporting scenarios and things like incremental search in a performant way. Lucene is my toy of choice for things like this, and we are already using it for complex queries. It will be great to see how you tackle your query language / criteria for this.

And tuples can't come soon enough for me.

13 Mar 2009
13:43 PM

huey

This is an interesting series, and now I finally grasp what it's all about :)

I have been reading about azure storage tables and one of the ideas is partitions -- queries over a single partition are fast, queries over multiple partitions are slow. The problem is that a partition that optimizes a query over one property might suck for a query over a different property.

A solution would be to replicate the data (or a reference to it) with different partitions optimized per property you want to query on. I have no idea if this is optimal (hoping some good use practices for azure storage are shown at MIX). The problem is that if you want to add a property or a new query, or even insert new data, these multiple partitions can easily get out of sync.

Seeing this post about views finally makes it all click. This is very interesting stuff with regards to scaling.

13 Mar 2009
17:18 PM

Ayende Rahien

configurator,

But that syntax is just ugly

13 Mar 2009
17:20 PM

Ayende Rahien

Ryan,

I am actually looking for someone to sponsor the development of this :-)

Oren Eini


CEO of RavenDB
