Designing a document database: Scale


In my previous post, I asked about designing a document DB and raised the issue of scale, along with a set of questions that affect the design of the system:

  • Do we start from the get-go as a distributed DB?

Yes and no. I think we should assume from the get-go that a database is not alone, but we shouldn't burden it with the costs associated with that. Building replication into the server should be good enough, which means we can push the smarts about distribution into the client library. Simpler server-side code is usually a good thing, so I think we should go with that.
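To make "pushing the smarts into the client library" a bit more concrete, here is a minimal sketch, assuming documents are partitioned across servers by hashing their ids. `ShardedClient`, `server_for`, and the server URLs are hypothetical names for illustration. The servers in this sketch know nothing about each other; all routing decisions happen in the client.

```python
import hashlib


class ShardedClient:
    """Hypothetical client that decides which server holds a document.

    The servers themselves know nothing about each other; all of the
    distribution logic lives here, in the client library.
    """

    def __init__(self, servers):
        # servers: list of server addresses (hypothetical URLs)
        if not servers:
            raise ValueError("at least one server is required")
        self.servers = list(servers)

    def server_for(self, doc_id):
        # Hash the document id so the same id always maps to the same server.
        digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]


client = ShardedClient(["http://db1:8080", "http://db2:8080", "http://db3:8080"])
print(client.server_for("users/1"))
```

Plain modulo hashing moves most documents when a server is added or removed, so a real client might prefer consistent hashing. The point of the sketch is only that this logic can live outside the server.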

  • Do we allow relations?
    • Joins?
    • Who resolves them?

Joins are not usually used in a document DB, although they are very useful. The problem is how, and by whom, they are resolved, especially when a joined document may reside on a completely different server. I am going to stick closely to the convention of other document databases: joins are not supported. There is another idea I am toying with, the notion of document attributes, which may be used to record relationships, but that is another aspect altogether. See the discussion about attachments for more details.

  • Do we assume data may reside on several nodes?

Yes and no. The database only cares about data that is stored locally. A document may reference data on other nodes, but the database doesn't care about that.
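Taken together, no server-side joins and no concern for data on other nodes mean that following a reference is the client's job. A minimal sketch, assuming a hypothetical `node:document-id` reference format: `LocalStore` accepts any document, including one that points at a node or document that does not exist, and the hypothetical `load_with_reference` helper performs the second lookup on the client side.

```python
class LocalStore:
    """Hypothetical single-node document store.

    It stores whatever it is given. A reference to another document is
    just a string field; the store never checks where (or whether) that
    document exists.
    """

    def __init__(self, name):
        self.name = name
        self._docs = {}

    def put(self, doc_id, doc):
        self._docs[doc_id] = dict(doc)

    def get(self, doc_id):
        doc = self._docs.get(doc_id)
        return dict(doc) if doc is not None else None


def load_with_reference(stores, doc_id, home, ref_field):
    """Client-side lookup: fetch a document, then follow one reference.

    stores maps node names to LocalStore instances. References are assumed
    (hypothetically) to look like "node-name:document-id".
    """
    doc = stores[home].get(doc_id)
    if doc is None:
        return None, None
    ref = doc.get(ref_field)
    if ref is None:
        return doc, None
    node, _, ref_id = ref.partition(":")
    target = stores.get(node)
    related = target.get(ref_id) if target is not None else None
    return doc, related


nodes = {"db1": LocalStore("db1"), "db2": LocalStore("db2")}
nodes["db2"].put("customers/7", {"name": "Acme"})
nodes["db1"].put("orders/1", {"total": 42, "customer": "db2:customers/7"})
# db1 accepts this even though node db3 is unknown to it.
nodes["db1"].put("orders/2", {"total": 5, "customer": "db3:customers/9"})

order, customer = load_with_reference(nodes, "orders/1", "db1", "customer")
```

A dangling reference is not an error from the server's point of view; the client simply gets `None` for the related document and decides what to do about it.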

  • Do we allow partial updates to a document?

That is a tricky question. The initial answer is yes, I want this feature. The complete answer is that while I want this feature, I am not sure how to implement it.

This is desirable because it reduces the amount of data we send over the network. The problem is how to express that partial update. My current thinking is that we can compute a diff between the original JSON version and the updated JSON version, and send that. That is problematic, since there is no standard way of diffing JSON. We could serialize both versions to strings and compare those, of course, but that exposes us to JSON formatting differences (key order, whitespace) that may cause problems.
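To illustrate the difference between comparing strings and comparing structure, here is a minimal sketch of a structural diff over parsed JSON. The `(op, path, value)` change format and the `json_diff`/`apply_patch` names are assumptions for illustration, not a standard. Objects are diffed key by key, while arrays and scalars are replaced wholesale, which is exactly the kind of choice that has no single obvious answer.

```python
import copy
import json


def json_diff(old, new, path=()):
    """Structural diff between two parsed JSON values.

    Returns a list of (op, path, value) tuples. Objects are compared key
    by key; any other change (including arrays) replaces the whole value.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        changes = []
        for key in old:
            if key not in new:
                changes.append(("remove", path + (key,), None))
        for key, value in new.items():
            if key not in old:
                changes.append(("add", path + (key,), value))
            else:
                changes.extend(json_diff(old[key], value, path + (key,)))
        return changes
    # Require the same type so that JSON true and 1 are not treated as equal.
    if type(old) is type(new) and old == new:
        return []
    return [("replace", path, new)]


def apply_patch(doc, changes):
    """Apply changes produced by json_diff to a copy of doc."""
    result = copy.deepcopy(doc)
    for op, path, value in changes:
        if not path:
            # The root itself changed (e.g. the document is not an object).
            result = copy.deepcopy(value)
            continue
        parent = result
        for key in path[:-1]:
            parent = parent[key]
        if op == "remove":
            del parent[path[-1]]
        else:  # "add" or "replace"
            parent[path[-1]] = copy.deepcopy(value)
    return result


original_text = '{"title": "Scale", "tags": ["db"], "meta": {"views": 10, "draft": true}}'
reordered_text = '{"meta":{"draft":true,"views":10},"tags":["db"],"title":"Scale"}'
updated_text = '{"meta":{"draft":false,"views":10},"title":"Scale","tags":["db"]}'

original = json.loads(original_text)
reordered = json.loads(reordered_text)
updated = json.loads(updated_text)

# As text, the reordered document looks different from the original...
text_differs = original_text != reordered_text
# ...but structurally nothing changed.
no_changes = json_diff(original, reordered)
# The real update becomes one small change instead of the whole document.
changes = json_diff(original, updated)
patched = apply_patch(original, changes)
```

Here `changes` holds a single `replace` of `meta.draft`, which is what would go over the network, and applying it to the original reproduces the updated document.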

I think I am going to mark this issue as postponed.

More posts in "Designing a document database" series:

  1. (17 Mar 2009) What next?
  2. (16 Mar 2009) Remote API & Public API
  3. (16 Mar 2009) Looking at views
  4. (15 Mar 2009) View syntax
  5. (14 Mar 2009) Aggregation Recalculating
  6. (13 Mar 2009) Aggregation
  7. (12 Mar 2009) Views
  8. (11 Mar 2009) Replication
  9. (11 Mar 2009) Attachments
  10. (10 Mar 2009) Authorization
  11. (10 Mar 2009) Concurrency
  12. (10 Mar 2009) Scale
  13. (10 Mar 2009) Storage