RavenDB 4.0The etag simplification

time to read 2 min | 357 words

A seemingly small change in RavenDB 4.0 is the way we implement the etag. In RavenDB 3.x and previous we used a 128 bits number, that was divided into 8 bits of type, 56 bits of restarts counter and 64 bits of changes within the current restart period. Visually, this looks like a GUID:  01000000-0000-0018-0000-000000000002.

The advantage of this format is that it is always increasing, very cheap to handle and requires very little persistent data. The disadvantage is that it is very big, not very human readable and the fact that the number of changes reset on every restart means that you can’t make meaningful deduction about relative sizes between any two etags.

In RavenDB 4.0 we shifted to use a single 64 bits number for all etag calculations. That means that we can just expose a long (no need for the Etag class) which is more natural for most usages. This decision also means that we need to store a lot less information, and etags are one of those things that we go over a lot.  A really nice side affect which was totally planned is that we can now take two etags and subtract them and get a pretty good idea bout the range that needs to be traversed.

Another important decision is that everything uses the same etag range. So documents, revisions, attachments and everything share the same etag, which make it very simple to scan through and find the relevant item just based on a single number. This make it very easy to implement replication, for example, because the wire protocol and persistence format remain the same.

I haven’t thought to write about this, seemed like too small a topic for post, but there was some interest about it in the mailing list, and enumerating all the reasons, it suddenly seems like it isn’t such a small topic.

Update: I forgot to mention, a really important factor of this decision is that we can do do this:

image

So we can give detailed information and expected timeframes easily.

More posts in "RavenDB 4.0" series:

  1. (30 Oct 2017) automatic conflict resolution
  2. (05 Oct 2017) The design of the security error flow
  3. (03 Oct 2017) The indexing threads
  4. (02 Oct 2017) Indexing related data
  5. (29 Sep 2017) Map/reduce
  6. (22 Sep 2017) Field compression