Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969

, @ Q c

Posts: 6,128 | Comments: 45,550

filter by tags archive

Optimization story: GetNextIdentityValueWithoutOverwritingOnExistingDocuments

time to read 16 min | 3026 words

A customer had a problem. They were mostly using the RavenDB HiLo algorithm for saving documents to the database, which is very fast & cheap. That client, however, chose to use the identity method. Which means that RavenDB will assign the value.

This is usually used if you need to have sequential values. The identity is actually being managed internally by RavenDB, and that works perfectly fine.

Except… What happens when you enter replication to the mix. The documents with the identity values are replicated to the secondary server, and there we don’t have the identity value, we just have the docs being written with their full id. (users/1, users/2, users/3, etc).

So far, so good. But what happens when you have a failover and you need to write to the secondary, and you use the identity? Well, RavenDB ain’t stupid, and it won’t overwrite the users/1 document. Instead, it will search for the next available opening from the smallest identity value generated and use that. The code looks like this:

   1: private long GetNextIdentityValueWithoutOverwritingOnExistingDocuments(string key, 
   2:     IStorageActionsAccessor actions, 
   3:     TransactionInformation transactionInformation)
   4: {
   5:     long nextIdentityValue;
   6:     do
   7:     {
   8:         nextIdentityValue = actions.General.GetNextIdentityValue(key);
   9:     } while (actions.Documents.DocumentMetadataByKey(key + nextIdentityValue, transactionInformation) != null);
  10:     return nextIdentityValue;
  11: }

This works, great. Except when you have large number of documents that have already been written. Instead of the brute force search, we now use the following approach:

   1: public long GetNextIdentityValueWithoutOverwritingOnExistingDocuments(string key,
   2:     IStorageActionsAccessor actions,
   3:     TransactionInformation transactionInformation,
   4:     out int tries)
   5: {
   6:     long nextIdentityValue = actions.General.GetNextIdentityValue(key);
   8:     if (actions.Documents.DocumentMetadataByKey(key + nextIdentityValue, transactionInformation) == null)
   9:     {
  10:         tries = 1;
  11:         return nextIdentityValue;
  12:     }
  13:     tries = 1;
  14:     // there is already a document with this id, this means that we probably need to search
  15:     // for an opening in potentially large data set. 
  16:     var lastKnownBusy = nextIdentityValue;
  17:     var maybeFree = nextIdentityValue*2;
  18:     var lastKnownFree = long.MaxValue;
  19:     while (true)
  20:     {
  21:         tries++;
  22:         if(actions.Documents.DocumentMetadataByKey(key + maybeFree, transactionInformation) == null)
  23:         {
  24:             if (lastKnownBusy + 1 == maybeFree)
  25:             {
  26:                 actions.General.SetIdentityValue(key, maybeFree);
  27:                 return maybeFree;
  28:             }
  29:             lastKnownFree = maybeFree;
  30:             maybeFree = Math.Max(maybeFree - (maybeFree - lastKnownBusy) / 2, lastKnownBusy + 1);
  32:         }
  33:         else
  34:         {
  35:             lastKnownBusy = maybeFree;
  36:             maybeFree = Math.Min(lastKnownFree, maybeFree*2);
  37:         }
  38:     }
  39: }

This can figure out the first free item in a range of billion documents in under 100 tries, which I am pretty sure if good enough.


Jonas H

Stop the presses, it's binary search!


But isn't it possible to retrieve max value from the index?


Ayende, from some recent discussions on ravendb's forums about generating sequential with no gaps invoice numbers due to government regulations, does this provide an easy solution for that?


This reminds me of the days with MS Access (client database),

Event OnDuplicateSaveException (key)

while true {

keyAdded=0; key++; loop ++ try { SaveWithNewKey(key) break; } catch { if (loop > maxKeyAddition) throw system exception }




When in a disconnected replicated enviroment how does Raven handle unique id's ?

Like: 2 servers, one in the cloud and 1 at the office.


Rafal, whose to say there's a single index whatsoever? You can use ravendb without a single index ever existing.

Edward 3

Why maybeFree = nextIdentityValue*2 ?

Could this not cause huge gaps when ids are like 5 and more digits?

nit picking corner:
Move line 13 to 7 and remove line 10

Frank Quednau

Some kind of nested interval algorithm...I doubt there will be a "hole" .

Is access to this algorithm serialized?


@dotnetchris I was thinking about the 'primary key' index in the Esent database. On a second thought, this index is probably sorted in lexicographic order so getting max numeric value could be a problem.

Ayende Rahien

Chris, That is certainly the way to go about this, sure.

Ayende Rahien

Edward, 2 machines, that have a partition among them? Failover?

There are many scenarios.

Ayende Rahien

Edward, The scenario we had for that is that we had millions of docs already there, we want to get high ASAP. We have ids with seven digits, and this code does NOT generate holes.

Ayende Rahien

Frank, This is not serialized, but the calls are transitionally safe.


@Ayende, about replication:

3 servers, master-master scenario. They all have the same data (no partioning although it could be possible) and changes are happening on every server.

1 server at the office with an internet connection which can disconnect, The software at the office is connected to this database.

1 server in the field with an internet connection which can disconnect, The software at this location is connected to this database.

1 server in the cloud for consumers and also office employees.

These servers are replicating each other. When a connection is reconnected the servers are syncing each other.

It is like MS SQL merge replication: http://msdn.microsoft.com/en-us/library/ms152746(v=sql.105).aspx

Ayende Rahien

Edward, Okay, in that scenario, what you want to do is to make sure that you are using Raven/ServerPrefix, which will make sure that each of them has its own prefix for the docs. That would mean that you won't have conflicts when creating new ids.


Sounds solid. How is that serverprefix calculated/determined? is it unique? stays it unique?

Ayende Rahien

Edward, You set it up, and you need to ensure that is is unique per all servers


Okay clear, thanks.

Comment preview

Comments have been closed on this topic.


  1. The worker pattern - one day from now

There are posts all the way to May 30, 2016


  1. The design of RavenDB 4.0 (14):
    26 May 2016 - The client side
  2. RavenDB 3.5 whirl wind tour (14):
    25 May 2016 - Got anything to declare, ya smuggler?
  3. Tasks for the new comer (2):
    15 Apr 2016 - Quartz.NET with RavenDB
  4. Code through the looking glass (5):
    18 Mar 2016 - And a linear search to rule them
  5. Find the bug (8):
    29 Feb 2016 - When you can't rely on your own identity
View all series


Main feed Feed Stats
Comments feed   Comments Feed Stats