Designing a document databaseView syntax

time to read 5 min | 973 words

The choice of using Linq queries as the default syntax was not an accident. If you look at how Couch DB is doing things, you can see that the choice of Javascript as the query language can cause some really irritating imperative style coding. For example, look at this piece of code:

function(doc) {
  if (doc.type == "comment") {
    map(doc.author, {post: doc.post, content: doc.content});
  }
}

This works, and it allows for some really complicated solutions, but it comes with its own set of problems. Unlike Couch DB, I actually want to enforce a schema for the views, and I need to be able to tell that schema at view creation time. This is partly because of the storage engine choice, and partly because the imperative style means that it is very easy to violate some of the map reduce required behaviors, such as repeatability of the results (by querying a separate data source, for example).

Linq queries are not imperative, they are a good way of expressing set based logic in a really nice way, while still allowing for an almost embarrassingly complex set of problems to be expressed with them. More than that, Linq queries are strongly typed, provide me with a whole bunch of information and allow me to do some really interesting things along the way, some of which we will talk about later. There is also the issue of how easy it would be to utilize such things as PLinq, or that the extensibility story for the DB becomes much easier with this scenario, or that at least in a theoretical perspective, the performance that we are talking about here should be much better than a Javascript based solution. 

Another property of Linq that I considered, much as I am loath to admit it in such a public forum is the marketing aspect of it. A linq-driven database is sure to get a lot of attention, you only have to look at the number of comment on the previous posts in this topic, compare those with linq queries to those without the linq queries. The difference is quite astounding.

All in all, it sounds like an impressive amount of reason to go with Linq.

The problem, of course, is that Linq implies C#, and I don’t really think that C# is the best language for doing language oriented programming. This time, however, we have the major advantage that the domain concepts that we want are already built into the language, so we don’t really need a lot of tweaking here to get things exciting.

I posted about the syntax before, but I don’t think that a lot of people actually got what I meant. Here is the entire view definition:

image 

It is not a snippet, and it is not a part of something larger that I am not showing. This is the view. And yes, it is not compliable on its own. Nor do I imagine that we will see people writing this code in Visual Studio. Or, at least, I imagine that it will be written there, but it will not stay there.

Much like in Couch DB today, you are going to have to create the view on the server, and you do that by creating a specially named document, which will contain this syntax as its content.

Internally, we are going to do some interesting things to it, but I think that I can stop now by just showing your the first stage, what happens to the view code after preprocessing it:

image

Readers of my book should recognize the pattern, I am using the notion of Implicit Base Class here to get us an executable class, which we can now compile and execute at will. Note that the query itself was modified, to make it compliable. We can now proceed to do additional analysis of the actual query, generate the fixed schema out of it, and start doing the really interesting things that we want to do.

But I have better leave those for another post…

More posts in "Designing a document database" series:

  1. (17 Mar 2009) What next?
  2. (16 Mar 2009) Remote API & Public API
  3. (16 Mar 2009) Looking at views
  4. (15 Mar 2009) View syntax
  5. (14 Mar 2009) Aggregation Recalculating
  6. (13 Mar 2009) Aggregation
  7. (12 Mar 2009) Views
  8. (11 Mar 2009) Replication
  9. (11 Mar 2009) Attachments
  10. (10 Mar 2009) Authorization
  11. (10 Mar 2009) Concurrency
  12. (10 Mar 2009) Scale
  13. (10 Mar 2009) Storage