Oren Eini

CEO of RavenDB, a NoSQL Open Source Document Database

time to read 3 min | 596 words

So far I have posted quite a few posts about building the document database. To be frank, the reason I did this is that the idea has been bouncing around in my head a lot recently, and sitting down and actually thinking it through has been great, especially since now I have the design dancing in my head, shiny & beautiful. Here is the full list, in case you missed anything:

  1. Schema-less databases
  2. Designing a document database
  3. Designing a document database: Storage
  4. Designing a document database: Scale
  5. Designing a document database: Authorization
  6. Designing a document database: Concurrency
  7. Designing a document database: Attachments
  8. Designing a document database: Replication
  9. Designing a document database: Views
  10. Designing a document database: Aggregation
  11. Challenge: C# Rewriting
  12. Designing a document database: Aggregation Recalculating
  13. Designing a document database: View syntax
  14. Designing a document database: Remote API & Public API

A few days ago I asked on twitter what people think: do I have this written up yet or not? Opinions seem to be divided on this score. Let me try to set the record straight. I have a lot of scattered code around this, yes. But it is not a project, it is a lot of tiny experiments to prove that one approach or another would work. This series of posts has required a lot of research. But I don’t have anything that is even remotely close to a working system.

I am estimating that it would take a month or two to take this from the drawing board to something that I would be willing to use in production*. That is full time work, by the way. It is likely that I can get something usable faster than that, depending on your definition of usable :-). Most of the challenge, as I see it now, is going to be in implementing the views. Everything else seems to be pretty straightforward.

That is somewhat of a problem. I don’t really want to spend several months (and the associated support costs afterward) building an open source project. The main issue is that while it is fun, there is simply no money in it, and I heard that eating is mandatory. On the other hand, I don’t really see something like this selling as a commercial package. This is infrastructure, and infrastructure has been commoditized. The ideal solution, from my point of view, is what we tried to do with Linq to NHibernate: getting a company, or several companies, to sponsor its development as an OSS project.

The motivation would be the same as usual: this is something that the aforementioned companies need, and are willing to pay for. It didn’t end up the way I expected with Linq to NHibernate, but it ended up very well after all, so I am happy about that.

Oh, and as an aside, if you want more posts in this series, do suggest a few topics that you want to hear about.

* Just to give you an idea about the complexity involved, I estimated Linq to NHibernate to be about 3 months.

time to read 3 min | 474 words

One of the greatest challenges that we face when we try to design an API is balancing power, flexibility, complexity, extensibility and version tolerance. There is a set of tradeoffs that we have to make in order to design a good API, and selecting the right tradeoffs impacts the way that users will work with the software.

One of the decisions that I made about the docs db is that it would actually have two published APIs. The first would be the remote API, accessible to clients, and the second would be a public API, accessible to people who want to extend the db. I consider extending the db to be a very common operation, for doing things like adding more functions for the views, supplying the sharding function, etc. I am assuming that most people who would want to write such extensions would be .Net devs, so I am just going to use my usual style here and expose some extension points so it will be possible to mix & match things.

For the remote API, we need a way to add / remove documents, get a document and query a view. I would like that part of the API to be accessible from any language, even if I don’t foresee it getting used much in other languages. The idea of being able to access it from JavaScript, and making the entire application hosted through the DB, is actually a very interesting one; see CouchApp for more details. Making it accessible to everyone is a pretty good idea, and I think that adopting a REST API similar to the one that CouchDB is using would be a good choice. Something that I would be extremely interested in, as a matter of fact, would be to make use of Astoria’s REST API, instead of having to write my own implementation. I think it bears investigating; on the minus side, Astoria expects some sort of schema from the backend implementation, but it would be an interesting avenue to explore.

Currently I am thinking about an API similar to:

  • POST or PUT to /docs[/id] – to create/update a document
  • POST to /bulk_docs – to create/update a batch of documents
  • POST or PUT to /views – to create / update a view definition
  • GET from /docs/id – to get a document
  • GET from /views/name – to get all view docs of particular view
  • GET from /views/name?key=foo – to get all view docs with matching key
  • GET from /views/name?start=foo&end=bar – to get all view docs within range

This seems to be a pretty simple proposition, and a fairly complete one.
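To make it concrete, here is a rough sketch of what talking to such an API could look like from a .Net client. Nothing here is a real client library; the host, document ids, Json payloads and view names are all invented for illustration:

using System;
using System.Net.Http;
using System.Threading.Tasks;

class DocDbClientSketch
{
    static async Task Main()
    {
        var client = new HttpClient { BaseAddress = new Uri("http://localhost:8080/") };

        // PUT /docs/id – create or update a document
        await client.PutAsync("docs/ayende",
            new StringContent(@"{ ""name"": ""Oren"", ""type"": ""user"" }"));

        // GET /docs/id – get the document back
        Console.WriteLine(await client.GetStringAsync("docs/ayende"));

        // GET /views/name?key=foo – all view docs with a matching key
        Console.WriteLine(await client.GetStringAsync("views/usersByName?key=Oren"));

        // GET /views/name?start=foo&end=bar – all view docs within a range
        Console.WriteLine(await client.GetStringAsync("views/usersByName?start=A&end=M"));
    }
}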

Thoughts?

time to read 2 min | 334 words

I was asked how we can handle more complex view scenarios. So let us take a look at how we can deal with them.

Joins

In a many to one scenario (post & comments), how can we get both of them in one call to the database? I am afraid that I am not doing anything new here, since the technique is actually described here. The answer is quite simple: you don’t. What you can do, however, is generate a view that will allow you to get both of them at the same time. For example:

[image: a view definition that outputs both posts and comments, keyed by post id and an IsPost flag]
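The original image is not available here; purely as an assumption about names and shapes, such a view might look something like this in the Linq based syntax used in this series (like all the view definitions in the series, it references an implicit docs source and is not compilable on its own):

// Sketch only: doc.type, doc.id and doc.post_id are invented names.
var postsWithComments =
    from doc in docs
    where doc.type == "post" || doc.type == "comment"
    let postId = doc.type == "post" ? doc.id : doc.post_id
    // Sort by post id, then put the post itself before its comments.
    orderby postId, doc.type == "post" descending
    select new { PostId = postId, IsPost = doc.type == "post", Item = doc };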

The key here is that views are always calculated in sort order, so what is actually happening is that we sort by post id, then by IsPost. Since false sorts higher than true, the actual post is always the first item, with its comments directly following it. This means that we can query for all of them in one DB call.

Returning more than a single result per row

To be fair, I hadn’t considered this, but it seems pretty obvious that it is needed. Here is the original request:

Question: can a view contain more rows than the underlying document database? For example: assume an invoice database (each document is an invoice with buyer's and seller's Tax ID). I want to create index: Tax ID -> #of invoices, where tax id can belong either to buyer or seller. In worst case scenario, unique tax IDs in every invoice, we'll have index with 2N entries. How view syntax would look like?

If I understand the problem correctly, this can be resolved using the following view definition:

[image: a view definition that outputs a row for each invoice’s buyer and seller tax ids]
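Again, the image is missing; here is a sketch of one way such a view could look, assuming invoice documents with buyer_tax_id and seller_tax_id fields (all the names are guesses). The second from clause is what lets a single invoice produce two view rows:

// Sketch only: each invoice emits one row per tax id involved.
var invoicesByTaxId =
    from doc in docs
    where doc.type == "invoice"
    from taxId in new[] { doc.buyer_tax_id, doc.seller_tax_id }
    select new { TaxId = taxId, InvoiceId = doc.id };

Counting the invoices per tax id would then be an aggregation on top of this view, which is covered elsewhere in the series.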

Thoughts?

time to read 5 min | 973 words

The choice of using Linq queries as the default syntax was not an accident. If you look at how Couch DB is doing things, you can see that the choice of Javascript as the query language can cause some really irritating imperative style coding. For example, look at this piece of code:

function(doc) {
  // Early CouchDB view syntax; newer CouchDB versions call this emit(key, value).
  if (doc.type == "comment") {
    // Emit the comment's author as the key, with the post and content as the value.
    map(doc.author, {post: doc.post, content: doc.content});
  }
}

This works, and it allows for some really complicated solutions, but it comes with its own set of problems. Unlike Couch DB, I actually want to enforce a schema for the views, and I need to be able to know that schema at view creation time. This is partly because of the storage engine choice, and partly because the imperative style makes it very easy to violate some of the behaviors that map reduce requires, such as repeatability of the results (by querying a separate data source, for example).

Linq queries are not imperative; they are a good way of expressing set based logic in a really nice way, while still allowing an almost embarrassingly complex set of problems to be expressed with them. More than that, Linq queries are strongly typed, provide me with a whole bunch of information and allow me to do some really interesting things along the way, some of which we will talk about later. There is also the issue of how easy it would be to utilize such things as PLinq, or that the extensibility story for the DB becomes much easier with this scenario, or that, at least from a theoretical perspective, the performance we are talking about here should be much better than a Javascript based solution.

Another property of Linq that I considered, much as I am loath to admit it in such a public forum, is the marketing aspect. A Linq-driven database is sure to get a lot of attention; you only have to look at the number of comments on the previous posts in this topic, comparing those with Linq queries to those without. The difference is quite astounding.

All in all, that is an impressive number of reasons to go with Linq.

The problem, of course, is that Linq implies C#, and I don’t really think that C# is the best language for doing language oriented programming. This time, however, we have the major advantage that the domain concepts that we want are already built into the language, so we don’t really need a lot of tweaking to get things going.

I posted about the syntax before, but I don’t think that a lot of people actually got what I meant. Here is the entire view definition:

[image: the complete view definition]
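The image has not survived; to give a feel for what is being described, here is a sketch of the kind of complete, standalone view definition meant here (the names are assumptions). This really is the whole file; docs is not defined anywhere, which is exactly the point made below:

// Sketch only: the entire content of a view document.
var pagesByTitle =
    from doc in docs
    where doc.type == "page"
    select new { doc.title, doc.id };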

It is not a snippet, and it is not part of something larger that I am not showing. This is the view. And yes, it is not compilable on its own. Nor do I imagine that we will see people writing this code in Visual Studio. Or, at least, I imagine that it will be written there, but it will not stay there.

Much like in Couch DB today, you are going to have to create the view on the server, and you do that by creating a specially named document, which will contain this syntax as its content.

Internally, we are going to do some interesting things to it, but I think that I can stop for now by just showing you the first stage: what happens to the view code after preprocessing it:

[image: the view code after preprocessing, wrapped in a class deriving from an implicit base class]
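The image is missing; here is a sketch of what the preprocessed output might look like, under the assumption of a base class along these lines (the class and member names are invented for illustration, not the actual design):

using System.Collections.Generic;
using System.Linq;

public abstract class AbstractViewDefinition
{
    // The preprocessor wraps the view query in an override of this method.
    public abstract IEnumerable<object> Execute(IEnumerable<dynamic> docs);
}

public class PagesByTitleView : AbstractViewDefinition
{
    public override IEnumerable<object> Execute(IEnumerable<dynamic> docs)
    {
        // The query was modified to make it compilable: docs is now a parameter.
        return from doc in docs
               where doc.type == "page"
               select new { Title = (string)doc.title, Id = (string)doc.id };
    }
}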

Readers of my book should recognize the pattern; I am using the notion of an Implicit Base Class here to get us an executable class, which we can now compile and execute at will. Note that the query itself was modified, to make it compilable. We can now proceed to do additional analysis of the actual query, generate the fixed schema out of it, and start doing the really interesting things that we want to do.

But I have better leave those for another post…

time to read 4 min | 775 words

One of the more interesting problems with document databases is the views, and in particular, how we are going to implement views that contain aggregation. In my previous post, I discussed the way we will probably expose this to the users. But it turns out that there are significant challenges in actually implementing the feature itself, not just in the user visible parts.

For projection views, the actual problem is very simple: when a document is updated or removed, all we have to do is delete the old view item and create a new one, if applicable.

For aggregation views, the problem is much harder, mostly because it is not clear what the result of adding, updating or removing a document may be. As a reminder, here is how we plan on exposing aggregation views to the user:

[image: the map/reduce aggregation view definition from the previous post]
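The original image is missing; as a sketch of the shape being described, counting comments per post (the names are assumptions, and docs and results are the implicit sources used throughout this series):

// Map emits one row per comment, Reduce sums the counts per post.
Map = from doc in docs
      where doc.type == "comment"
      select new { doc.post_id, count = 1 };

Reduce = from result in results
         group result by result.post_id into g
         select new { post_id = g.Key, count = g.Sum(x => x.count) };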

Let us inspect this from the point of view of the document database. Let us say that we have 100,000 documents already, and we introduce this view. A background process is going to kick off, to transform the documents using the view definition.

The process goes like this:

[diagram: the serial process – each document is passed through the view definition, one at a time, producing the view data]

Note that the process depicted above is a serial one, which isn’t really useful in the real world. Let us see why. I want to add a new document to the system; how am I going to update the view? Well… an easy option would be this:

[diagram: on every new document, the entire view is recalculated over all the documents]

I think you can agree with me that this is not a good thing to do from a performance perspective. Luckily for us, there are other alternatives. A more accurate representation of the process would be:

[diagram: the documents are split into batches, and map/reduce runs on each batch in parallel, producing separate partial reduced results]

We run the map/reduce process in parallel, producing a lot of separate reduced data points. Now we can do the following:

[diagram: the partial reduced results are re-reduced into the final result]

We take the independent reduced results and run a re-reduce process on them. That is why we have the limitation that map & reduce must return objects in the same shape: so we can use reduce on data that came from map or from reduce, without caring where it came from.

This also means that adding a document is a much easier task, all we need to do is:

[diagram: the new document alone is run through map and reduce, producing a single reduced result]

We get the single reduced result from the whole process, and now we can generate the final result very easily:

[diagram: the existing final result and the new reduced result are reduced together into the new final result]

All we have to do is run the reduce on the previous final result and the new result. The answer from that would be identical to the answer from running the full process on all the documents. Things get more interesting, however, when we talk about document updates or removals. Since an update is just a special case of an atomic document removal and addition, I am going to talk about document removal only.

Removing a document invalidates the final aggregation results, but it doesn’t necessarily mean recalculating the whole thing from scratch. Do you remember the partial reduce results that we mentioned earlier? Those are not only useful for parallelizing the work, they are also very useful in this scenario. Instead of discarding them when we are done, we are going to save them as well. They wouldn’t be exposed to the user in any way, but they are persisted, and they are going to be useful when we need to recalculate. The fun thing is that we don’t really need to recalculate everything. All we have to do is recalculate the batch that the removed document resided in, without that document. When we have the new batch, we can reduce the whole thing to a final result again.
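To make that concrete, here is a small self-contained sketch of the bookkeeping described above, for a toy view that counts documents per key. None of the names here come from the actual design; it only demonstrates the persisted partial reductions idea:

using System;
using System.Collections.Generic;
using System.Linq;

class IncrementalAggregationSketch
{
    record Row(string Key, int Count);

    // Map: each document contributes one row with a count of 1.
    static List<Row> Map(IEnumerable<string> docs) =>
        docs.Select(key => new Row(key, 1)).ToList();

    // Reduce: works on map output and on its own output alike (re-reduce).
    static List<Row> Reduce(IEnumerable<Row> rows) =>
        rows.GroupBy(r => r.Key)
            .Select(g => new Row(g.Key, g.Sum(r => r.Count)))
            .ToList();

    static void Main()
    {
        // The documents, split into batches; each batch keeps its partial
        // reduce result persisted alongside the view.
        var batches = new Dictionary<int, List<string>>
        {
            [0] = new() { "post/1", "post/1", "post/2" },
            [1] = new() { "post/2", "post/3" },
        };
        var partials = batches.ToDictionary(b => b.Key, b => Reduce(Map(b.Value)));

        // The final result is just a re-reduce over all the partials.
        var final = Reduce(partials.Values.SelectMany(p => p));

        // Removing a document: recalculate only its batch, then re-reduce.
        batches[1].Remove("post/3");
        partials[1] = Reduce(Map(batches[1]));
        final = Reduce(partials.Values.SelectMany(p => p));

        foreach (var row in final)
            Console.WriteLine($"{row.Key}: {row.Count}");
    }
}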

I am guessing that this is going to be… a challenging task to build, but from a design perspective, it looks pretty straightforward.

time to read 2 min | 285 words

I said that I would speak a bit about aggregations. On the face of it, aggregation looks simple, really simple. Continuing the same thread of design from before, we can have:

[image: a naive aggregation view, written as a single Linq statement]
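The image is missing; as an assumption about what it showed, something like the following single Linq statement, aggregating comments per post directly over the whole document set:

// Sketch only: a naive aggregation that needs every document to run.
var commentsPerPost =
    from doc in docs
    where doc.type == "comment"
    group doc by doc.post_id into g
    select new { post_id = g.Key, count = g.Count() };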

The problem is that while this is really nice, it doesn’t really work.

The problem is that using this approach, we are going to have to recalculate the view for the entire document set, a potentially very expensive operation. Now, technically I could solve that by rewriting the Linq statement, but it wouldn’t really help: code like this assumes that it knows all the state, and there is no way to regenerate that state in an incremental fashion.

Let us try a better approach:

[image: the aggregation view split into a Map part and a Reduce part]
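The image is gone; here is a sketch of the split form being described, using the same comment counting example as above (the names are assumptions). Note how map and reduce return objects of the same shape:

Map = from doc in docs
      where doc.type == "comment"
      select new { doc.post_id, count = 1 };

Reduce = from result in results
         group result by result.post_id into g
         select new { post_id = g.Key, count = g.Sum(x => x.count) };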

Thanks to Alex Yakunin for helping me simplify this.

What do we have now? We split the problem into two sections, the Map and the Reduce. Note that to simplify things, map and reduce must return objects in the same shape. That means that we don’t need an explicit re-reduce phase.

That is much easier to reason about, and it allows us to perform aggregation in a manner that is very easy to partition. I am probably going to have another post regarding the actual details of handling aggregations.

time to read 3 min | 516 words

One of the more interesting problems with document databases is how you handle views. But a lot of people already had some issues understanding what I mean by a document database (hint: I am not talking about a word docs repository), so I had better explain what I mean by this.

A document database stores documents. Those aren’t what most people would consider to be documents, however. It is not Excel or Word files. Rather, we are talking about storing data in a well known format, but with no schema. Consider the case of storing an XML document or a Json document. In both cases, we have a well known format, but there is no required schema for those. That is, after all, one of the advantages of a document db’s schema-less nature.

However, trying to query on top of schema-less data can be… problematic. Unless you are talking about Lucene, which I would consider to be a document indexer rather than a document DB, although it can be used as such. And even with Lucene, you have to specify the things that you are actually interested in to be able to search on them.

So, what are views? Views are a way to transform a document to some well known and well defined format. For example, let us say that I want to use my DB to store wiki information. I can do this easily enough by storing the document as a whole, but how do I look up a page by its title? Trying to do this on the fly is a recipe for disastrous performance. In most document databases, the answer is to create a view. For RDBMS people, a document DB view is roughly what an RDBMS calls a materialized view.

I thought about creating it like this:

[image: the pagesByTitleAndVersion view definition]

Please note that this is only to demonstrate the concept; actually implementing the above syntax requires either on the fly rewrites or C# 4.0.
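As a sketch of the concept (with invented property names, and the above caveat very much in force):

// Sketch only: index wiki pages by title ascending, version descending.
var pagesByTitleAndVersion =
    from doc in docs
    where doc.type == "page"
    orderby doc.title, doc.version descending
    select new { doc.title, doc.version, doc.id };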

The code above can scan through the relevant documents and, in a very clean fashion (I think), generate the values that we actually care about. Basically, we have now created a view called “pagesByTitleAndVersion”, indexed by title (ascending) and version (descending). We can now query this view for a particular value, and get it in a very quick manner.

Note that this means that updating views happens as part of a background process, so there is going to be some delay between updating the document and updating the view. That is BASE for you :-)

Another important thing is that this syntax is for projections only. Those are actually very simple to build. Well, simple is relative; there is going to be some very funky Linq stuff going on in there, but from my perspective, it is fairly straightforward. The part that is going to be much harder to deal with is aggregation. I am going to deal with that separately, however.

time to read 3 min | 454 words

In a previous post, I asked about designing a document DB, and brought up the issue of replication, along with a set of questions that affect the design of the system:

  • How often should we replicate?
    • As part of the transaction?
    • Backend process?
    • Every X amount of time?
    • Manual?

I think that we can assume that the faster we replicate, the better. However, there are costs associated with this. I think that a good way of doing replication would be to post a message on a queue for the remote replication machine, and have the queuing system handle the actual process. This makes it very simple to scale, and creates a distinction between the “start replication” part and the actual replication process. It also allows us to handle spikes in a very nice manner.
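A sketch of that separation, with an invented queue abstraction standing in for whatever queuing system would actually be used:

public interface IMessageQueue
{
    void Send(object message);
}

public class ReplicateDocumentMessage
{
    public string DocId { get; set; }
}

public class ReplicationTrigger
{
    private readonly IMessageQueue queue;

    public ReplicationTrigger(IMessageQueue queue) => this.queue = queue;

    public void OnDocumentWritten(string docId)
    {
        // The write path only enqueues the "start replication" part;
        // a background consumer performs the actual replication,
        // which is also what absorbs spikes.
        queue.Send(new ReplicateDocumentMessage { DocId = docId });
    }
}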

  • Should we replicate only the documents?
    • What about attachments?
    • What about the generated view data?

We don’t replicate attachments, since those are out of scope.

Generated view data is a more complex issue, mostly because we have a trade off here of network payload vs. CPU time. Since views are by their very nature stateless (they can only use the document data), running the view on the source machine or on the replicated machine would result in exactly the same output. I think that we can safely ignore the view data, treating it as something that we can regenerate. CPU time tends to be far less costly than network bandwidth, after all.

Note that this assumes that view generation is the same across all machines. We discuss this topic more extensively in the views part.

  • Should we replicate to all machines?
    • To specified set of machines for all documents?
    • Should we use some sharding algorithm?

I think that a sharding algorithm would be the best option: given a document, it will give a list of machines to replicate to. We can provide a default implementation that replicates to all machines, or to secondaries and tertiaries.
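A sketch of what that extension point might look like (the interface and names are assumptions); the default implementation simply returns every known machine:

using System;
using System.Collections.Generic;

public interface IReplicationShardingStrategy
{
    // Given a document id, return the machines it should replicate to.
    IEnumerable<Uri> GetDestinations(string docId);
}

// Default: replicate every document to all known machines.
public class ReplicateToAll : IReplicationShardingStrategy
{
    private readonly IReadOnlyList<Uri> machines;

    public ReplicateToAll(IReadOnlyList<Uri> machines) => this.machines = machines;

    public IEnumerable<Uri> GetDestinations(string docId) => machines;
}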

time to read 2 min | 344 words

In a previous post, I asked about designing a document DB, and brought up the issue of attachments, along with a set of questions that need to be handled:

  • Do we allow them at all?

We pretty much have to; otherwise we will have users sticking them into the document directly, resulting in very inefficient use of space (binary data in Json format sucks).

  • How are they stored?
    • In the DB?
    • Outside the DB?

Storing them in the DB will lead to very high database sizes. And there is the simple question of whether a Document DB is the appropriate storage for BLOBs. I think that there are better alternatives for that than the Document DB. Things like Rhino DHT, S3, the file system, CDN, etc.

  • Are they replicated?

Out of scope for the document db, I am afraid. That depends on the external storage that you choose.

  • Should we even care about them at all? Can we apply SoC and say that this is the task of some other part of the system?

Yes we can and we should.

However, we still want to be able to add attachments to documents. I think we can resolve this pretty easily by adding the notion of document attributes. That would allow us to attach external information to a document, such as the attachment URLs. Those should be used for things that are related to the actual document, but are conceptually separate from it.

An attribute would be a typed key/value pair, where both key and value contain strings. The type is an additional piece of information, containing the type of the attribute. This will allow us to do things like add relations, specify attachment types, etc.
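In code, the idea might look like this minimal sketch (the names, and the example values in the comments, are invented):

public class DocumentAttribute
{
    public string Key { get; set; }    // e.g. "attachment"
    public string Value { get; set; }  // e.g. "http://cdn.example.com/invoices/42.pdf"
    public string Type { get; set; }   // e.g. "application/pdf", or "relation"
}

// Usage sketch: point a document at an externally stored attachment.
// doc.Attributes.Add(new DocumentAttribute
// {
//     Key = "attachment", Value = url, Type = contentType
// });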

time to read 2 min | 331 words

This is actually a topic that I hadn’t considered upfront. Now that I do, it looks like a bit of a hornet’s nest.

In order to have authorization, we must first support authentication. And that brings a whole bunch of questions of its own. For example, which auth mechanism to support? Windows auth? Custom auth? If we have auth, don’t we also need to support sessions? But sessions are expensive to create, so do we really want that?

For that matter, would we need to support SSL?

I am not sure how to implement this, so for now I am going to assume that magic happened and it got done. Because once we have authentication, the rest is very easy.

By default, we assume that any user can access any document. We also support only two operations: Read & Write.

Therefore, we have two pre-defined attributes on the document: read & write. Those attributes may contain a list of users that may read/write the document. If either the read or write permission is set, then only the authorized users may view it.

The owner of the document (the creator) is the only one allowed to set permissions on a document. Note that write permission implies read permission.

In addition to that, an administrator may not view or write documents that they do not own, but they are allowed to change the owner of a document to the administrator account, at which point they can change the permissions. Note that there is no facility to assign ownership away from a user, only to take ownership if you are the admin.
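A sketch of those rules in code, assuming the read/write attributes are simple user lists, with an empty list meaning no restriction (all names are illustrative):

using System.Collections.Generic;

public class DocumentPermissionsSketch
{
    public string Owner;               // the creator of the document
    public List<string> Read = new();  // empty = unrestricted
    public List<string> Write = new(); // empty = unrestricted

    // Write permission implies read permission.
    public bool CanRead(string user) =>
        user == Owner
        || (Read.Count == 0 && Write.Count == 0)
        || Read.Contains(user)
        || Write.Contains(user);

    public bool CanWrite(string user) =>
        user == Owner
        || (Read.Count == 0 && Write.Count == 0)
        || Write.Contains(user);

    // Only the owner may set permissions; an admin must take ownership
    // first, and can change the permissions afterwards.
    public bool CanSetPermissions(string user) => user == Owner;

    public void TakeOwnership(string adminUser) => Owner = adminUser;
}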

There is a somewhat interesting problem here related to views. What sort of permissions should we apply there? What about views which aggregate over multiple documents with different security requirements? I am not sure how to handle this yet, and I would appreciate any comments you have on the matter.
