Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 2 min | 211 words

Here is a snippet from a blog post describing a lecture I gave yesterday at Ural State University:

Imagine him [Ayende] giving a public lecture at the Ural State University and demonstrating one of the numerous code snippets he prepared. Suddenly a guy (I think his name is Alex) interrupts him and tries to point out an error in the code. Unfortunately, Alex fails to express himself in English, and instead just mumbles incomprehensibly. After two more attempts, he gives up and explains the bug to the audience — in Russian. But before anyone has a chance to translate it, Oren smiles and says: “Oh right! You are absolutely correct, I have to insert a break statement here!”. Now, granted, Oren is a great talker and has no problem understanding people; but that was unbelievable even for him, because I can swear that: 1) he doesn't know a single word of Russian, 2) the guy who spotted the problem didn't use a single word of English (like break or foreach). Truth to be told, the whole situation was even scary a bit.

There was a roar of laughter in the audience when I did that, and it took me a while to understand why.

time to read 3 min | 516 words

One of the more interesting problems with document databases is how you handle views. But a lot of people have had trouble understanding what I mean by document database (hint: I am not talking about a Word docs repository), so I had better explain what I mean by this.

A document database stores documents. Those aren’t what most people would consider documents, however; we are not talking about Excel or Word files. Rather, we are talking about storing data in a well known format, but with no schema. Consider the case of storing an XML document or a JSON document. In both cases we have a well known format, but there is no required schema for those. That is, after all, one of the advantages of a document DB’s schema-less nature.

However, trying to query on top of schema-less data can be… problematic. Unless you are talking about Lucene, which I would consider to be a document indexer rather than a document DB, although it can be used as such. Even with Lucene, you have to specify the things that you are actually interested in to be able to search on them.

So, what are views? Views are a way to transform a document into some well known and well defined format. For example, let us say that I want to use my DB to store wiki information. I can do this easily enough by storing the document as a whole, but how do I look up a page by its title? Trying to do this on the fly is a recipe for disastrous performance. In most document databases, the answer is to create a view. For RDBMS people, a document DB view is roughly what an RDBMS calls a materialized view.

I thought about creating it like this:

[image: the original code snippet defining the view]

Please note that this is only to demonstrate the concept; actually implementing the above syntax requires either on-the-fly rewrites or C# 4.0.

The code above can scan through the relevant documents and, in a very clean fashion (I think), generate the values that we actually care about. Basically, we have now created a view called “pagesByTitleAndVersion”, indexed by title (ascending) and version (descending). We can now query this view for a particular value, and get it very quickly.
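The actual snippet was C# with some funky Linq; purely to illustrate the map-style view concept, here is a rough Python sketch, where documents are plain dicts and all the names (`pages_by_title_and_version`, `build_view`) are mine, not part of any real API:

```python
def pages_by_title_and_version(doc):
    """Map a wiki-page document to a view entry; skip unrelated documents."""
    if doc.get("type") != "page":
        return None
    # Key: title ascending, version descending (negated so a plain sort works).
    return ((doc["title"], -doc["version"]), {"id": doc["id"]})

def build_view(docs, map_fn):
    """Run the map function over all documents and sort entries by key."""
    entries = [e for e in (map_fn(d) for d in docs) if e is not None]
    return sorted(entries, key=lambda e: e[0])

docs = [
    {"id": "1", "type": "page", "title": "Home", "version": 2},
    {"id": "2", "type": "page", "title": "Home", "version": 3},
    {"id": "3", "type": "user", "name": "oren"},
]
view = build_view(docs, pages_by_title_and_version)
# The latest version of "Home" sorts first; the user document is excluded.
```

Querying the view is then just a lookup/scan over the sorted entries rather than a scan over raw documents.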

Note that this means that updating views happens as part of a background process, so there is going to be some delay between updating the document and updating the view. That is BASE for you :-)

Another important thing is that this syntax is for projections only. Those are actually very simple to build. Well, simple is relative; there is going to be some very funky Linq stuff going on in there, but from my perspective it is fairly straightforward. The part that is going to be much harder to deal with is aggregation. I am going to deal with that separately, however.

time to read 3 min | 454 words

In a previous post, I asked about designing a document DB, and brought up the issue of replication, along with a set of questions that affect the design of the system:

  • How often should we replicate?
    • As part of the transaction?
    • Backend process?
    • Every X amount of time?
    • Manual?

I think that we can assume that the faster we replicate, the better. However, there are costs associated with this. I think that a good way of doing replication would be to post a message on a queue for the remote replication machine, and have the queuing system handle the actual process. This makes it very simple to scale, and creates a distinction between the “start replication” part and the actual replication process. It also allows us to handle spikes in a very nice manner.
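A minimal sketch of that decoupling, using an in-process queue and a background worker in place of a real queuing system (the network send is simulated by appending to a list):

```python
import queue
import threading

replication_queue = queue.Queue()
replicated = []  # stands in for the remote replica

def replication_worker():
    """Background process: drain the queue and ship documents to the replica."""
    while True:
        doc = replication_queue.get()
        if doc is None:  # shutdown signal
            break
        replicated.append(doc)  # in reality: send over the network
        replication_queue.task_done()

worker = threading.Thread(target=replication_worker)
worker.start()

# The write path just enqueues a message and returns immediately,
# so write spikes are absorbed by the queue, not by the writer.
for i in range(3):
    replication_queue.put({"id": i})

replication_queue.put(None)
worker.join()
```

With a durable queue (MSMQ, etc.) the same shape survives restarts of either machine.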

  • Should we replicate only the documents?
    • What about attachments?
    • What about the generated view data?

We don’t replicate attachments, since those are out of scope.

Generated view data is a more complex issue, mostly because we have a trade-off here of network payload vs. CPU time. Since views are by their very nature stateless (they can only use the document data), running the view on the source machine or the replicated machine would result in exactly the same output. I think that we can safely ignore the view data, treating it as something that we can regenerate. CPU time tends to be far less costly than network bandwidth, after all.

Note that this assumes that view generation is the same across all machines. We discuss this topic more extensively in the views part.

  • Should we replicate to all machines?
    • To specified set of machines for all documents?
    • Should we use some sharding algorithm?

I think that a sharding algorithm would be the best option: given a document, it will give a list of machines to replicate to. We can provide a default implementation that replicates to all machines, or to secondaries and tertiaries.
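One way such an algorithm might look, sketched in Python (hash-based placement is my assumption here, not a stated design decision):

```python
import hashlib

def shard_targets(doc_id, machines, replicas=2):
    """Given a document id, return the list of machines to replicate to.
    Hash the id onto a ring of machines and take the next `replicas` nodes."""
    digest = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
    start = digest % len(machines)
    return [machines[(start + i) % len(machines)] for i in range(replicas)]

def replicate_to_all(doc_id, machines):
    """The default implementation: every document goes to every machine."""
    return list(machines)

machines = ["node-a", "node-b", "node-c"]
targets = shard_targets("users/1", machines)
```

The important property is determinism: the same document id always yields the same target list, so every node agrees on placement without coordination.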

time to read 2 min | 344 words

In a previous post, I asked about designing a document DB, and brought up the issue of attachments, along with a set of questions that need to be handled:

  • Do we allow them at all?

We pretty much have to; otherwise we will have users sticking them into the document directly, resulting in very inefficient use of space (binaries in JSON format suck).

  • How are they stored?
    • In the DB?
    • Outside the DB?

Storing them in the DB will lead to very high database sizes. And there is the simple question of whether a document DB is the appropriate storage for BLOBs. I think that there are better alternatives for that than the document DB: things like Rhino DHT, S3, the file system, a CDN, etc.

  • Are they replicated?

Out of scope for the document DB, I am afraid. That depends on the external storage that you choose.

  • Should we even care about them at all? Can we apply SoC and say that this is the task of some other part of the system?

Yes we can and we should.

However, we still want to be able to add attachments to documents. I think we can resolve this pretty easily by adding the notion of document attributes. That would allow us to add external information to a document, such as attachment URLs. Those should be used for things that are related to the actual document, but are conceptually separate from it.

An attribute would be a typed key/value pair, where both key and value contain strings. The type is an additional piece of information, containing the type of the attribute. This will allow us to do things like add relations, specify attachment types, etc.
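As a sketch of the shape of the idea (the type names and helper below are illustrative, not a committed design):

```python
from dataclasses import dataclass

@dataclass
class Attribute:
    """A typed key/value pair; key and value are both strings."""
    type: str   # what kind of attribute this is, e.g. "attachment", "relation"
    key: str
    value: str

# Attach an external attachment URL and a relation to a document.
doc_attributes = [
    Attribute("attachment", "diagram", "http://example.com/blobs/42"),
    Attribute("relation", "parent", "pages/1"),
]

def attachments_of(attributes):
    """Pull out just the attachment attributes for a document."""
    return [a for a in attributes if a.type == "attachment"]
```

The document itself stays pure JSON; the attributes hang off it as metadata the DB understands but does not interpret.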

time to read 2 min | 331 words

This is actually a topic that I haven’t considered upfront. Now that I do, it looks like it is a bit of a hornet’s nest.

In order to have authorization we must first support authentication. And that brings a whole bunch of questions of its own. For example, which auth mechanism to support? Windows auth? Custom auth? If we have auth, don’t we need to also support sessions? But sessions are expensive to create, so do we really want that?

For that matter, would we need to support SSL?

I am not sure how to implement this, so for now I am going to assume that magic happened and it got done. Because once we have authorization, the rest is very easy.

By default, we assume that any user can access any document. We also support only two operations: Read & Write.

Therefore, we have two pre-defined attributes on the document: read & write. Those attributes may contain a list of users that may read from / write to the document. If either the read or the write permission is set, then only the authorized users may access it.

The owner of the document (the creator) is the only one allowed to set permissions on a document. Note that write permission implies read permission.

In addition to that, an administrator may not view or write to documents that they do not own, but they are allowed to change the owner of a document to the administrator account, at which point they can change the permissions. Note that there is no facility to assign ownership away from a user, only to take ownership if you are the admin.
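The rules above can be sketched as follows; this is an illustration of the described semantics, not the actual implementation, and the function names are mine:

```python
def can_write(doc, user):
    """Write is allowed if no write list is set, or the user is listed;
    the owner can always write."""
    writers = doc.get("write")
    if writers is None:
        return True
    return user in set(writers) | {doc["owner"]}

def can_read(doc, user):
    """Read is allowed if no restriction is set or the user is listed.
    Write permission implies read permission."""
    readers = doc.get("read")
    writers = doc.get("write")
    if readers is None and writers is None:
        return True  # unrestricted by default
    return user in set(readers or []) | set(writers or []) | {doc["owner"]}

def take_ownership(doc, admin_user):
    """An admin cannot read others' restricted documents, but may take
    ownership, after which they can change the permissions."""
    doc["owner"] = admin_user
    return doc

doc = {"id": "pages/1", "owner": "alice", "read": ["bob"], "write": ["carol"]}
```

So `bob` can read but not write, `carol` can do both, and an admin only gains access by first becoming the owner.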

There is a somewhat interesting problem here related to views. What sort of permissions should we apply there? What about views which are aggregated over multiple documents with different security requirements? I am not sure how to handle this yet, and I would appreciate any comments you have in the matter.

time to read 2 min | 299 words

In my previous post, I asked about designing a document DB, and brought up the issue of concurrency, along with a set of questions that affect the design of the system:

  • What concurrency alternatives do we choose?

We have several options. Optimistic and pessimistic concurrency are the most obvious ones. Merge concurrency, such as the one implemented by Rhino DHT, is another. Note that we also have to handle the case where we have a conflict as a result of replication.

I think that it would make a lot of sense to support optimistic concurrency only. Pessimistic concurrency is a scalability killer in most systems. As for conflicts as a result of replication, Couch DB handles these using merge concurrency, which may be a good idea after all. We can probably support both of them pretty easily.

It does cause problems with the API, however. A better approach might be to fail reads of documents with multiple versions, and force the user to resolve them using a different API. I am not sure if this is a good idea or a time bomb. Maybe returning the latest version, along with a flag that indicates that there is a conflict? That would allow you to ignore the issue.
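One way that “latest plus conflict flag” read might look, sketched under the assumption that a conflicted document simply has more than one live version:

```python
def read_document(versions):
    """Return (latest_version, has_conflict). `versions` is the list of live
    versions of a document; more than one means a replication conflict."""
    if not versions:
        raise KeyError("document not found")
    latest = max(versions, key=lambda v: v["version"])
    return latest, len(versions) > 1

# Normal case: a single version, no conflict.
doc, conflict = read_document([{"version": 1, "body": "a"}])

# Conflict case: two replicas wrote concurrently; the caller gets the
# latest body, and may ignore the flag or go resolve the conflict.
doc2, conflict2 = read_document([{"version": 1, "body": "a"},
                                 {"version": 2, "body": "b"}])
```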

  • What about versioning?

In addition to the document ID, each document will have an associated version. The document ID is a UUID, which means that it can be generated on the client side. Each document is also versioned by the server accepting it. The version syntax follows this format: [server guid]/[increasing numeric id]/[time].

That will ensure global uniqueness, as well as giving us all the information that we need for the document version.
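Generating such a version string is straightforward; a sketch of the stated format:

```python
import itertools
import time
import uuid

SERVER_GUID = uuid.uuid4()      # fixed for the lifetime of a server instance
_counter = itertools.count(1)   # increasing numeric id, per server

def next_version():
    """Build a version string: [server guid]/[increasing numeric id]/[time]."""
    return f"{SERVER_GUID}/{next(_counter)}/{int(time.time())}"

v1 = next_version()
v2 = next_version()
```

The server guid makes versions from different servers impossible to collide, the counter orders versions from the same server, and the timestamp is informational.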

time to read 3 min | 455 words

In my previous post, I asked about designing a document DB, and brought up the issue of scale, along with a set of questions that affect the design of the system:

  • Do we start from the get go as a distributed DB?

Yes and no. I think that we should start from the get go assuming that a database is not alone, but we shouldn’t burden it with the costs that are associated with this. I think that simply building replication should be a pretty good start, which means that we can push more of the smarts regarding distribution into the client library. Simpler server-side code usually means goodness, so I think we should go with that.

  • Do we allow relations?
    • Joins?
    • Who resolves them?

Joins are usually not used in a document DB. They are very useful, however. The problem is how we resolve them, and who does so. This is especially true when we consider that a joined document may reside on a completely different server. I think that I am going to stick closely to the actual convention in other document databases, that is, joins are not supported. There is another idea that I am toying with, the notion of document attributes, which may be used to record this, but that is another aspect altogether. See the discussion about attachments for more details.

  • Do we assume data may reside on several nodes?

Yes and no. The database only cares about data that is stored locally; while it may reference data on other nodes, we don’t care about that.

  • Do we allow partial updates to a document?

That is a tricky question. The initial answer is yes, I want this feature. The complete answer is that while I want this feature, I am not sure how I can implement this.

Basically, this is desirable since we can use it to reduce the amount of data we send over the network. The problem is that we run into an interesting issue of how to express that partial update. My current thinking is that we can compute a diff of the initial JSON version vs. the updated JSON version, and send that. That is problematic, since there is no standard way of actually diffing JSON. We can just throw it into a string and compare that, of course, but that exposes us to JSON formatting differences that may cause problems.
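Just to illustrate the idea (not a committed design): diffing the *parsed* JSON rather than the raw text sidesteps the formatting problem. A naive recursive sketch:

```python
import json

def diff(old, new, path=""):
    """Return a flat list of (path, old_value, new_value) changes between
    two parsed JSON values. Comparing parsed data means whitespace and
    key ordering in the raw JSON text cannot produce spurious changes."""
    if type(old) is not type(new) or not isinstance(old, dict):
        return [] if old == new else [(path, old, new)]
    changes = []
    for key in set(old) | set(new):
        changes.extend(diff(old.get(key), new.get(key), f"{path}/{key}"))
    return changes

a = json.loads('{"title": "Home", "version": 1}')
b = json.loads('{ "title" : "Home",  "version": 2 }')  # different formatting
```

A real solution would also need sensible semantics for arrays and deletions, which is exactly where it stops being simple.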

I think that I am going to mark this issue as: postponed.

time to read 2 min | 385 words

In a previous post, I asked about designing a document DB, and brought up the issue of storage, along with a set of questions that need to be handled:

  • How do we physically store things?

There are several options, from building our own persistence format to using an RDBMS. I think that the most effective option would be to use Esent. It is small, highly efficient, requires no installation, and is very simple to use. It also neatly resolves a lot of the other questions that we have to ask.

  • How do we do backups?

Esent already has the facilities to do that, so we have very little to worry about here.

  • How do we handle corrupted state?

See above; Esent is also pretty good at doing auto recovery, which is a nice plus.

  • Where do we store the views?
    • Should we store them in the same file as the actual data? 

I think not; I think that the best alternative is to have a file per view. That should make things such as backing up just the DB easier, not to mention that it will reduce contention internally. Esent is built to handle that, but it is better to do it this way regardless. All the data (including logs & temp dirs) should reside inside the same directory.

Crash recovery on startup should be enabled. Transactions should probably avoid crossing file boundaries. It is important that the files include a version table, which will allow us to detect invalid versions (this caused a whole bunch of problems with RDHT until we fixed it).

  • Are we transactional?

Yes, we are transactional, but only for document writes. We are not transactional for document + views, for example, since view generation is done as a background service.

  • Do we allow multi document operation to be transactional?

Yes, depending on the operation. We allow submission of several document writes/deletes at the same time, and they will succeed or fail as a single unit. Beyond that, no.
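The all-or-nothing semantics of such a batch can be sketched as staging every operation first and committing only if all of them succeed (with Esent this would be a real transaction; the in-memory copy below is just a stand-in):

```python
def write_batch(store, operations):
    """Apply a batch of puts/deletes as a single unit: if any operation
    fails, none of them are applied to the store."""
    staged = dict(store)  # work on a copy; commit by swapping the contents
    for op, doc_id, doc in operations:
        if op == "put":
            staged[doc_id] = doc
        elif op == "delete":
            if doc_id not in staged:
                raise KeyError(f"cannot delete missing document {doc_id}")
            del staged[doc_id]
        else:
            raise ValueError(f"unknown operation {op}")
    store.clear()
    store.update(staged)  # commit point: all operations become visible at once

store = {"pages/1": {"title": "Home"}}
write_batch(store, [("put", "pages/2", {"title": "About"}),
                    ("delete", "pages/1", None)])
```

A batch containing a bad delete raises before the commit point, leaving the store exactly as it was.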

time to read 4 min | 674 words

A while ago I started experimenting with building my own document DB, based on the concepts that Couch DB uses. As it turns out, there isn’t really much to it, at a conceptual level. A document DB requires the following features:

  • Store a document
  • Retrieve document by id
  • Add attachment to document
  • Replicate to a backup server
  • Create views on top of documents

The first two requirements are easily handled, and should generally take less than a day to develop. Indeed, after learning about the Esent database, it took me very little time to create this. I should mention, as an interesting limitation of the DB, that I made the decision to accept only documents in JSON format. That makes some things very simple, specifically views and partial updates.

There are several topics here that are worth discussing, because they represent non-trivial issues. I am going to raise them here as questions, and answer them in future posts.

Storage:

  • How do we physically store things?
  • How do we do backups?
  • How do we handle corrupted state?
  • Where do we store the views?
    • Should we store them in the same file as the actual data? 
  • Are we transactional?
  • Do we allow multi document operation to be transactional?

Scale:

  • Do we start from the get go as a distributed DB?
  • Do we allow relations?
    • Joins?
    • Who resolves them?
  • Do we assume data may reside on several nodes?
  • Do we allow partial updates to a document?

Concurrency:

  • What concurrency alternatives do we choose?
  • What about versioning?

Attachments:

  • Do we allow them at all?
  • How are they stored?
    • In the DB?
    • Outside the DB?
  • Are they replicated?
  • Should we even care about them at all? Can we apply SoC and say that this is the task of some other part of the system?

Replication:

  • How often should we replicate?
    • As part of the transaction?
    • Backend process?
    • Every X amount of time?
    • Manual?
  • Should we replicate only the documents?
    • What about attachments?
    • What about the generated view data?
  • Should we replicate to all machines?
    • To specified set of machines for all documents?
    • Should we use some sharding algorithm?

Views:

  • How do we define views?
  • How do we define the conversion process from a document to a view item?
  • Do views have a fixed schema?
  • How often do we update views?
  • How do we remove view items from the DB when the origin document has been removed?

There are some very interesting challenges relating to doing the views. Again, I am interested in your opinions about this.

There are several other posts, detailing my current design, which will be posted spaced about a day apart from one another. I’ll post a summary post with all the relevant feedback as well.
