RavenDB 4.0: The etag simplification


A seemingly small change in RavenDB 4.0 is the way we implement the etag. In RavenDB 3.x and earlier we used a 128-bit number, divided into 8 bits of type, 56 bits of restart counter and 64 bits of changes within the current restart period. Visually, this looks like a GUID: 01000000-0000-0018-0000-000000000002.
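To make the old layout concrete, here is a minimal Python sketch that splits the textual form into its three fields. It assumes the hex digits are written most significant first, in the order type, restarts, changes; `parse_legacy_etag` is a hypothetical helper for illustration, not part of any RavenDB API.

```python
def parse_legacy_etag(text: str) -> dict:
    """Split a GUID-looking 3.x etag into type / restarts / changes.

    Assumption: the 32 hex digits encode one 128-bit big-endian number,
    laid out as 8 bits of type, 56 bits of restarts, 64 bits of changes.
    """
    hex_digits = text.replace("-", "")
    if len(hex_digits) != 32:
        raise ValueError("expected 32 hex digits (128 bits)")

    value = int(hex_digits, 16)
    high = value >> 64                 # type + restarts
    changes = value & ((1 << 64) - 1)  # low 64 bits

    return {
        "type": high >> 56,                    # top 8 bits
        "restarts": high & ((1 << 56) - 1),    # next 56 bits
        "changes": changes,
    }


parsed = parse_legacy_etag("01000000-0000-0018-0000-000000000002")
print(parsed)
```

Under that assumption, the example etag above reads as type 1, 24 restarts, and 2 changes since the last restart.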

The advantage of this format is that it is always increasing, very cheap to handle and requires very little persistent data. The disadvantages are that it is very big and not very human readable, and because the change counter resets on every restart, you can’t make meaningful deductions about the relative distance between any two etags.

In RavenDB 4.0 we shifted to a single 64-bit number for all etag calculations. That means that we can just expose a long (no need for the Etag class), which is more natural for most usages. This decision also means that we need to store a lot less information, and etags are one of those things that we go over a lot. A really nice side effect, which was totally planned, is that we can now take two etags, subtract them, and get a pretty good idea about the range that needs to be traversed.
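As a small illustration of the subtraction, the sketch below treats etags as plain integers. The function name `etag_gap` is made up for this example, and the result is best read as an upper bound: several etags may belong to successive writes of the same item.

```python
def etag_gap(from_etag: int, to_etag: int) -> int:
    """Upper bound on the number of changes between two 64-bit etags.

    Hypothetical helper: with a single ever-increasing counter, the
    distance between two etags is just their difference.
    """
    if from_etag > to_etag:
        raise ValueError("from_etag must not be greater than to_etag")
    return to_etag - from_etag


# A consumer that has processed up to etag 1_000, while the database
# is at etag 1_250, has at most 250 changes left to look at.
print(etag_gap(1_000, 1_250))
```

With the old format the same question has no cheap answer, because the change counter started over after every restart.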

Another important decision is that everything uses the same etag range. Documents, revisions, attachments and everything else share the same etag sequence, which makes it very simple to scan through and find the relevant item based on a single number. This makes it very easy to implement replication, for example, because the wire protocol and persistence format remain the same.
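The idea of a shared range can be modeled with a toy in-memory store. This is only a sketch of the concept, not how RavenDB persists data: `ToyStore`, `write` and `changes_after` are invented names, and the item kinds are plain strings.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy model: every write, whatever its kind, takes the next etag."""
    last_etag: int = 0
    etags: list = field(default_factory=list)   # sorted, one per write
    items: list = field(default_factory=list)   # (etag, kind, key)

    def write(self, kind: str, key: str) -> int:
        self.last_etag += 1
        self.etags.append(self.last_etag)
        self.items.append((self.last_etag, kind, key))
        return self.last_etag

    def changes_after(self, etag: int) -> list:
        """Everything written after `etag`, in etag order, for all kinds."""
        start = bisect.bisect_right(self.etags, etag)
        return self.items[start:]


store = ToyStore()
store.write("document", "users/1")
store.write("attachment", "users/1/photo.png")
store.write("revision", "users/1")

# A replication-style consumer only has to remember one number.
for change in store.changes_after(1):
    print(change)
```

Because the consumer tracks a single number rather than one position per item type, resuming after a disconnect is just another call to `changes_after` with the last etag it saw.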

I hadn’t planned to write about this, as it seemed like too small a topic for a post, but there was some interest in it on the mailing list, and after enumerating all the reasons, it suddenly seems like it isn’t such a small topic.

Update: I forgot to mention, a really important factor in this decision is that we can do this:

[image]

So we can give detailed information and expected timeframes easily.
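A rough sketch of the kind of estimate a single numeric etag makes possible is shown here. Everything in it is an assumption for illustration: the function name, the parameters, and the idea that throughput is known as a steady items-per-second figure.

```python
def estimate_progress(start_etag: int, current_etag: int,
                      last_etag: int, items_per_second: float):
    """Return (percent_done, eta_seconds) for a task walking the etag range.

    Hypothetical helper: assumes a constant processing rate and treats
    the etag distance as the amount of work remaining.
    """
    if not start_etag <= current_etag <= last_etag:
        raise ValueError("expected start_etag <= current_etag <= last_etag")
    if items_per_second <= 0:
        raise ValueError("items_per_second must be positive")

    total = last_etag - start_etag
    done = current_etag - start_etag
    remaining = last_etag - current_etag

    percent = 100.0 if total == 0 else 100.0 * done / total
    eta_seconds = remaining / items_per_second
    return percent, eta_seconds


percent, eta = estimate_progress(0, 750, 1_000, items_per_second=50.0)
print(f"{percent:.1f}% done, about {eta:.0f}s remaining")
```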
