Designing a document database: Concurrency
In my previous post, I asked about designing a document DB and brought up the issue of concurrency, along with a set of questions that affect the design of the system:
- What concurrency alternatives do we choose?
We have several options. Optimistic and pessimistic concurrency are the most obvious ones. Merge concurrency, such as the one implemented by Rhino DHT, is another. Note that we also have to handle the case where we have a conflict as a result of replication.
I think that it would make a lot of sense to support optimistic concurrency only. Pessimistic concurrency is a scalability killer in most systems. As for conflicts as a result of replication, CouchDB handles this using merge concurrency, which may be a good idea after all. We can probably support both of them pretty easily.
It does cause problems with the API, however. A better approach might be to fail reads of documents with multiple versions and force the user to resolve them using a different API. I am not sure if this is a good idea or a time bomb. Maybe return the latest version along with a flag that indicates there is a conflict? That would allow you to ignore the issue.
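To make the two ideas above concrete, here is a minimal in-memory sketch, assuming a single toy store: writes use optimistic concurrency (a write must name the version it was based on, or it is rejected), and reads of a document that replication left with several live revisions return the latest one plus a conflict flag instead of failing. The class and method names (`DocumentStore`, `receive_replicated`, `conflicting_revisions`) are hypothetical, and versions are simplified to integers rather than the full version format described below.

```python
import uuid
from dataclasses import dataclass


class ConcurrencyError(Exception):
    """Raised when a write is based on a version that is no longer current."""


@dataclass
class Revision:
    body: dict
    version: int  # simplified integer version, not the real version format


@dataclass
class ReadResult:
    body: dict
    version: int
    conflicted: bool  # True when replication left more than one live revision


class DocumentStore:
    def __init__(self):
        # doc_id -> list of live revisions; more than one means a replication conflict
        self._docs = {}

    def put(self, doc_id, body, expected_version=None):
        """Optimistic concurrency: reject the write unless it is based on the latest version."""
        revisions = self._docs.get(doc_id, [])
        current = max((r.version for r in revisions), default=None)
        if expected_version != current:
            raise ConcurrencyError(
                f"expected version {expected_version}, current is {current}")
        new_version = 1 if current is None else current + 1
        # A successful write replaces every live revision, which also resolves any conflict.
        self._docs[doc_id] = [Revision(dict(body), new_version)]
        return new_version

    def receive_replicated(self, doc_id, body, version):
        """Hypothetical replication hook: keep the incoming revision next to the local ones."""
        self._docs.setdefault(doc_id, []).append(Revision(dict(body), version))

    def get(self, doc_id):
        """Return the latest revision plus a conflict flag instead of failing the read."""
        revisions = self._docs[doc_id]
        latest = max(revisions, key=lambda r: r.version)
        return ReadResult(dict(latest.body), latest.version, len(revisions) > 1)

    def conflicting_revisions(self, doc_id):
        """Separate API for inspecting every live revision of a conflicted document."""
        return list(self._docs[doc_id])


store = DocumentStore()
doc_id = uuid.uuid4()  # client-generated ID
v1 = store.put(doc_id, {"title": "draft"})
store.put(doc_id, {"title": "final"}, expected_version=v1)
```

A caller that ignores the `conflicted` flag keeps working, while a caller that cares can fetch `conflicting_revisions` and write a resolved body back with the version it read, which collapses the conflict in this sketch.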
- What about versioning?
In addition to the document ID, each document will have an associated version. The document ID is a UUID, which means that it can be generated on the client side. Each document is also versioned by the server that accepts it. The version follows this format: [server guid]/[increasing numeric id]/[time].
That will ensure global uniqueness, as well as giving us all the information that we need for the document version.
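A minimal sketch of a server issuing versions in the [server guid]/[increasing numeric id]/[time] format, assuming the time part is a UTC ISO 8601 timestamp (the post does not say how time is encoded). `VersionGenerator` and `parse_version` are hypothetical names. Uniqueness comes from the GUID plus counter pair; the timestamp is informational. The counter here lives only in memory, so a real server would have to persist it to keep it increasing across restarts.

```python
import threading
import uuid
from datetime import datetime, timezone


class VersionGenerator:
    """Issues versions shaped like [server guid]/[increasing numeric id]/[time]."""

    def __init__(self, server_id=None):
        # Each server is identified by its own GUID.
        self.server_id = server_id if server_id is not None else uuid.uuid4()
        # In-memory counter; a real server would have to persist it across restarts.
        self._counter = 0
        self._lock = threading.Lock()

    def next_version(self):
        with self._lock:
            self._counter += 1
            n = self._counter
        # Assumed time encoding: UTC ISO 8601, which contains no '/' characters.
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
        return f"{self.server_id}/{n}/{stamp}"


def parse_version(version):
    """Split a version string back into (server GUID, counter, timestamp)."""
    server_id, counter, stamp = version.split("/")
    return uuid.UUID(server_id), int(counter), stamp
```

Keeping the timestamp free of `/` is what lets `parse_version` split on that separator safely; a different time encoding would need a different separator or a length-prefixed layout.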
More posts in "Designing a document database" series:
- (17 Mar 2009) What next?
- (16 Mar 2009) Remote API & Public API
- (16 Mar 2009) Looking at views
- (15 Mar 2009) View syntax
- (14 Mar 2009) Aggregation Recalculating
- (13 Mar 2009) Aggregation
- (12 Mar 2009) Views
- (11 Mar 2009) Replication
- (11 Mar 2009) Attachments
- (10 Mar 2009) Authorization
- (10 Mar 2009) Concurrency
- (10 Mar 2009) Scale
- (10 Mar 2009) Storage
Comments
When you say versioned, do you mean that we can rewind/fast forward in time to a particular version?
No, I mean that you know what version a document is, useful for things like optimistic concurrency.
So do you version at the record/document level or at the field level? If you want to merge changes without overwriting someone else's update, you need a way to determine what changed at the field level. Maybe you don't care that much, or you just make the user reload the latest version before allowing them to commit changes (which can be a bad user experience).
No, I do not. I track this at field level.
This is something like the way SVN tracks those changes.
So you track what changes on the client/app side, and send only those changes along with the relevant version?
I am not sure that I understand what you mean.
What I intend to do is actually create a very simple system. If you update a document with a version that is not the latest, I am going to reject the update.
OK. I guess I was throwing partial updates in there too. If the user experience of requiring the document to be reloaded when it is not the latest, before accepting changes, is acceptable, then good. It is, by far, much easier to implement.
The server rejecting an update does not translate directly into the user experience. It is up to the client code to know how to handle this rejection in a way that pleases the user and is acceptable to the business logic.