The five requirements for the design of all major RavenDB features

time to read 3 min | 534 words

We started some (minor) design work for the next set of features for RavenDB (as we discussed in the roadmap), and a few interesting things came out of that. In particular, the concept of the five pillars that any major feature needs to stand on.

By major I mean something that impacts the persistent state of the system as a whole. For example, attachments, CmpXchg, revisions and conflicts are quite obviously in this category, while a query is local and transient.

Here they are, in no order of importance:

  • Client API
  • Cluster
  • Backup
  • Studio
  • Disaster Recovery

The client API is how a feature is exposed to clients, obviously. This can be explicit, as in the case of attachments, or more subtle, as with CmpXchg, which can be used either through the low-level calls or directly from RQL.
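To make the compare-exchange semantics concrete, here is a minimal in-memory sketch of the idea. The class and method names are my own invention for illustration; this is not the actual RavenDB client API.

```python
class CompareExchangeStore:
    """Toy in-memory model of compare-exchange (CmpXchg) semantics.

    Each key carries a monotonically increasing index; a put succeeds
    only when the caller supplies the index it last observed.
    """

    def __init__(self):
        self._items = {}  # key -> (value, index)

    def get(self, key):
        # Returns (value, index); index 0 means "not yet created".
        return self._items.get(key, (None, 0))

    def put(self, key, value, expected_index):
        _, current_index = self.get(key)
        if expected_index != current_index:
            return False, current_index  # someone else won the race
        self._items[key] = (value, current_index + 1)
        return True, current_index + 1


store = CompareExchangeStore()
ok, index = store.put("users/ayende", "admin", expected_index=0)    # creates the value
stale_ok, _ = store.put("users/ayende", "guest", expected_index=0)  # stale index, rejected
```

The point of the index is exactly what makes this a cluster-level feature: a caller can only change a value if it proves it has seen the latest version.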

The cluster is how a particular feature operates in the cluster. In the case of attachments, it means that attachments flow across the network as part of the replication behavior between nodes. For CmpXchg, it means that the values are stored directly in the cluster state machine and are managed by the Raft cluster. The actual way it works matters less than the fact that we have thought through the implications of the feature in a distributed environment.
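Here is a toy model of why compare-exchange values belong in a single cluster-wide state machine rather than in per-node storage (all names are illustrative, not RavenDB internals):

```python
# Two nodes, each with an independent local store: both compare-exchange
# operations "succeed" locally, leaving the cluster with conflicting values.
node_a = {"users/ayende": None}
node_b = {"users/ayende": None}

def local_cmpxchg(store, key, expected, new):
    # Swap in the new value only if the current one matches what we expected.
    if store.get(key) != expected:
        return False
    store[key] = new
    return True

conflict = (local_cmpxchg(node_a, "users/ayende", None, "admin")
            and local_cmpxchg(node_b, "users/ayende", None, "guest"))

# A single authoritative state machine (which is what Raft provides)
# serializes the operations, so only the first one can win.
cluster_state = {"users/ayende": None}
first = local_cmpxchg(cluster_state, "users/ayende", None, "admin")
second = local_cmpxchg(cluster_state, "users/ayende", None, "guest")
```

With per-node stores, both writers believe they won (`conflict` is true); against the single state machine, the second writer is correctly rejected.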

Backup is subtle. It is easy to implement a feature and forget that we actually need to support backup and restore until very late in the game. RavenDB has a few backup strategies (full snapshot or regular backup), and this also includes migrating data from another instance, long-term retention behavior, etc. The feature needs to work across all of them.

The studio refers to how we are actually going to expose a feature to the user in the studio. A good example of where we failed is the CmpXchg values, which are currently not exposed in the studio (there are endpoints for that, but we haven't gotten around to it). We are feeling the lack, and it is on the fast track for the next minor release. If a feature isn't in the studio, how do we expect a user to discover, manage or work with it?

Finally, we have disaster recovery. We take data integrity very seriously, and one of the things we do is make sure that even in the case of disk failure or some other data corruption, we can still get the data out. This is done by laying out the data on disk in such a way that there are multiple ways to access it. First, by reading the data normally and assuming a valid structure; this is what we usually do. Second, by reading one byte at a time and still being able to reconstruct the data, even if some parts of it have been corrupted. This requires us to plan in advance how we store the data for a feature, so we can support recovery.
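The byte-at-a-time recovery idea can be sketched with self-describing records: frame each one with a marker, a length, and a checksum, then scan the raw bytes for anything that still validates. The framing below is invented for illustration; it is not RavenDB's (Voron's) actual on-disk format.

```python
import struct
import zlib

MAGIC = b"\xde\xad\xbe\xef"  # hypothetical record marker, not the real layout

def write_record(buf, payload):
    # marker | 4-byte length | 4-byte CRC32 | payload
    return buf + MAGIC + struct.pack("<II", len(payload), zlib.crc32(payload)) + payload

def scan_recover(data):
    """Walk the raw bytes one at a time, recovering every record whose
    marker, length, and checksum still line up, skipping damaged regions."""
    recovered, i = [], 0
    while i <= len(data) - len(MAGIC) - 8:
        if data[i:i + 4] == MAGIC:
            length, crc = struct.unpack_from("<II", data, i + 4)
            payload = data[i + 12:i + 12 + length]
            if len(payload) == length and zlib.crc32(payload) == crc:
                recovered.append(payload)
                i += 12 + length
                continue
        i += 1  # no valid record here, advance a single byte and retry
    return recovered

data = b""
for doc in (b"users/1", b"users/2", b"users/3"):
    data = write_record(data, doc)

corrupted = bytearray(data)
corrupted[0] = 0x00  # damage the first record's marker

print(scan_recover(bytes(corrupted)))  # → [b'users/2', b'users/3']
```

Because every record carries enough information to validate itself, losing one region of the file costs only the records in that region, not everything after it.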

There are other concerns as well, anything from monitoring to debugging to performance, but usually they aren't as important at the design phase of a feature.