Ayende @ Rahien

Designing a document database: View syntax

Choosing Linq queries as the default syntax was not an accident. If you look at how CouchDB does things, you can see that using JavaScript as the query language can lead to some really irritating imperative-style coding. For example, look at this piece of code:

function(doc) {
  if (doc.type == "comment") {
    map(doc.author, {post: doc.post, content: doc.content});
  }
}

This works, and it allows for some really complicated solutions, but it comes with its own set of problems. Unlike CouchDB, I actually want to enforce a schema for the views, and I need to know that schema at view creation time. This is partly because of the storage engine choice, and partly because the imperative style makes it very easy to violate some of the behaviors that map/reduce requires, such as repeatability of the results (by querying a separate data source, for example).
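
To make the repeatability concern concrete, here is a minimal JavaScript sketch. The runView harness, the emit callback parameter, and the externalRatings lookup are all hypothetical names invented for illustration; none of them is CouchDB's actual API. The first map function depends only on the document it is given, while the second consults a separate data source, so indexing the same documents twice can produce different rows.

```javascript
// Hypothetical harness: runs a map function over documents and collects emitted rows.
function runView(mapFn, docs) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  for (const doc of docs) {
    mapFn(doc, emit);
  }
  return rows;
}

// Stand-in for a separate data source that changes independently of the documents.
const externalRatings = { alice: 1 };

// Repeatable: the output depends only on the document passed in.
function commentsByAuthor(doc, emit) {
  if (doc.type === "comment") {
    emit(doc.author, { post: doc.post, content: doc.content });
  }
}

// Not repeatable: the output also depends on externalRatings.
function commentsByRating(doc, emit) {
  if (doc.type === "comment") {
    emit(externalRatings[doc.author], { post: doc.post });
  }
}

const docs = [
  { type: "comment", author: "alice", post: 1, content: "Nice" },
  { type: "post", author: "bob", title: "Hello" },
];

const pureFirst = JSON.stringify(runView(commentsByAuthor, docs));
const impureFirst = JSON.stringify(runView(commentsByRating, docs));

// The external source changes; the documents do not.
externalRatings.alice = 5;

const pureSecond = JSON.stringify(runView(commentsByAuthor, docs));
const impureSecond = JSON.stringify(runView(commentsByRating, docs));

console.log(pureFirst === pureSecond);     // true
console.log(impureFirst === impureSecond); // false
```

Nothing in the imperative syntax stops someone from writing the second function, which is the kind of violation that is hard to rule out when the view is arbitrary code.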

Linq queries are not imperative. They are a really nice way of expressing set-based logic, while still allowing an almost embarrassingly complex set of problems to be expressed. More than that, Linq queries are strongly typed, provide me with a whole bunch of information, and allow me to do some really interesting things along the way, some of which we will talk about later. There is also the question of how easy it would be to utilize things such as PLinq, the fact that the extensibility story for the DB becomes much easier in this scenario, and that, at least in theory, the performance we are talking about here should be much better than a JavaScript-based solution.

Another property of Linq that I considered, much as I am loath to admit it in such a public forum, is the marketing aspect of it. A Linq-driven database is sure to get a lot of attention. You only have to look at the number of comments on the previous posts on this topic and compare the posts with Linq queries to those without. The difference is quite astounding.

All in all, that sounds like an impressive number of reasons to go with Linq.

The problem, of course, is that Linq implies C#, and I don’t really think that C# is the best language for doing language oriented programming. This time, however, we have the major advantage that the domain concepts that we want are already built into the language, so we don’t really need a lot of tweaking here to get things exciting.

I posted about the syntax before, but I don’t think that a lot of people actually got what I meant. Here is the entire view definition:

[image: the view definition]

It is not a snippet, and it is not a part of something larger that I am not showing. This is the view. And yes, it is not compilable on its own. Nor do I imagine that we will see people writing this code in Visual Studio. Or, at least, I imagine that it will be written there, but it will not stay there.

Much like in CouchDB today, you are going to have to create the view on the server, and you do that by creating a specially named document, which will contain this syntax as its content.

Internally, we are going to do some interesting things to it, but I think that I can stop for now after showing you just the first stage, what happens to the view code after preprocessing:

[image: the view code after preprocessing]

Readers of my book should recognize the pattern: I am using the notion of an Implicit Base Class here to get an executable class, which we can now compile and execute at will. Note that the query itself was modified to make it compilable. We can now proceed to do additional analysis of the actual query, generate the fixed schema out of it, and start doing the really interesting things that we want to do.
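
As a rough illustration of that preprocessing step, the sketch below (in JavaScript, to match the earlier example) wraps the text of a view definition in a class template. Everything specific here is an assumption: the ViewBase base class, the Execute method, the generated layout, and the pagesByTitle view are hypothetical, docs is assumed to be a member that the hypothetical base class provides, and the sketch does not attempt to rewrite the query itself. It only shows the Implicit Base Class idea: the user writes the query, and the generated class supplies everything around it.

```javascript
// Splits "<anything> <name> = <query>;" into a view name and a query body.
// The declared type is ignored, so "var x = ..." and "view x = ..." both work.
function parseViewDefinition(source) {
  const match = /^\s*\w+\s+(\w+)\s*=\s*([\s\S]*?);?\s*$/.exec(source);
  if (!match) {
    throw new Error("Expected: <type> <name> = <query>;");
  }
  return { name: match[1], query: match[2].trim() };
}

// Wraps the query in a hypothetical class that derives from an implicit base class.
function generateViewClass(source) {
  const { name, query } = parseViewDefinition(source);
  const className = name.charAt(0).toUpperCase() + name.slice(1);
  const indentedQuery = query
    .split("\n")
    .map((line) => "            " + line.trim())
    .join("\n");
  return [
    `public class ${className} : ViewBase`, // ViewBase is a made-up name
    "{",
    "    public override object Execute()", // so is Execute
    "    {",
    "        return",
    indentedQuery + ";",
    "    }",
    "}",
  ].join("\n");
}

// A made-up view definition, used only to exercise the sketch.
const viewSource = [
  "var pagesByTitle = from doc in docs",
  '    where doc.Type == "page"',
  "    select new { doc.Title };",
].join("\n");

const generated = generateViewClass(viewSource);
console.log(generated);
```

Because the wrapper is produced from a template, the view author never sees or writes the class declaration, and the server can compile the generated text and analyze the query inside it.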

But I had better leave those for another post…

Comments

03/16/2009 12:58 AM by configurator

Now I finally see what you mean. Here, you're actually using a DSL that looks like C# and compiles to C#.

Two things come to mind:

  1. I'm not sure the definition should be "var pagesByTitleAndVersion", but "view pagesByTitleAndVersion". That's because you're not generating a variable but a class, and 'var' is a bit confusing in this syntax.

  2. Whenever we want any complex check (that is, any check other than == or !=), we'd need to cast the return type. i.e. we'd need an example view to be:

var oldPages = from doc in docs
               where doc.Type == "page"
               where (DateTime)doc.CreationTime > DateTime.Now.AddDays(-3);

or something like that. Any way to easily remove that cast? It would be rather cumbersome on the otherwise elegant syntax you're using.

03/16/2009 04:51 AM by Ayende Rahien

Configurator,

  1. The code totally ignores the variable type, so feel free to put whatever you want there.

  2. Yes, changing the syntax would be pretty easy.

03/16/2009 09:59 AM by Rafal

Ayende, I've got a question about transaction management in your document db. Are you going to support transactions at all? Can they cover updates to multiple documents? What about distributed transactions?

03/16/2009 10:15 AM by Frank Quednau

Thanks for clarifying this. I suppose the "document" can carry arbitrary properties definable by some user of your DB. Could this not be expressed as an interface at runtime? If one provided some kind of editor to define the view, I could imagine that, once you have said interface for your document, the first expression could be compilable. Additionally, you know how to fill an empty interface with life (DynamicProxy2, etc.)?

Anyway, let's see what follows!

03/16/2009 10:23 AM by Ayende Rahien

Rafal,

I intend to support transactions, including over multiple docs in the same batch.

No DTC planned.

03/16/2009 10:24 AM by Ayende Rahien

Frank,

Documents don't have to share schema, though.

Trying to express this as interface would lead to a whole bunch of trouble.

03/16/2009 12:06 PM by yug

I'm still not sure of the original syntax. With Intellisense, ReSharper, etc., isn't the original syntax going to give you all sorts of visual issues in the editor? Why not just go with the dictionary syntax and save the hassle? Plus it stops the confusion of appearing to be strongly typed...

03/16/2009 12:40 PM by Rafal

Question: can a view contain more rows than the underlying document database? For example, assume an invoice database (each document is an invoice with the buyer's and seller's Tax ID). I want to create an index: Tax ID -> # of invoices, where the tax ID can belong to either the buyer or the seller. In the worst-case scenario, with unique tax IDs in every invoice, we'll have an index with 2N entries. What would the view syntax look like?

03/16/2009 01:24 PM by Ayende Rahien

Yug,

You missed the part about there being no editor?

03/16/2009 01:54 PM by Ayende Rahien

Rafal,

Look at the next post.
