Optimization story: GetNextIdentityValueWithoutOverwritingOnExistingDocuments

time to read 16 min | 3026 words

A customer had a problem. They were mostly using the RavenDB HiLo algorithm for saving documents to the database, which is very fast & cheap. That client, however, chose to use the identity method. Which means that RavenDB will assign the value.

This is usually used if you need to have sequential values. The identity is actually being managed internally by RavenDB, and that works perfectly fine.

Except… What happens when you enter replication to the mix. The documents with the identity values are replicated to the secondary server, and there we don’t have the identity value, we just have the docs being written with their full id. (users/1, users/2, users/3, etc).

So far, so good. But what happens when you have a failover and you need to write to the secondary, and you use the identity? Well, RavenDB ain’t stupid, and it won’t overwrite the users/1 document. Instead, it will search for the next available opening from the smallest identity value generated and use that. The code looks like this:

   1: private long GetNextIdentityValueWithoutOverwritingOnExistingDocuments(string key,

   2:     IStorageActionsAccessor actions,

   3:     TransactionInformation transactionInformation)

   4: {

   5:     long nextIdentityValue;

   6:     do

   7:     {

   8:         nextIdentityValue = actions.General.GetNextIdentityValue(key);

   9:     } while (actions.Documents.DocumentMetadataByKey(key + nextIdentityValue, transactionInformation) != null);

  10:     return nextIdentityValue;

  11: }

This works, great. Except when you have large number of documents that have already been written. Instead of the brute force search, we now use the following approach:

   1: public long GetNextIdentityValueWithoutOverwritingOnExistingDocuments(string key,

   2:     IStorageActionsAccessor actions,

   3:     TransactionInformation transactionInformation,

   4:     out int tries)

   5: {

   6:     long nextIdentityValue = actions.General.GetNextIdentityValue(key);

7:

   8:     if (actions.Documents.DocumentMetadataByKey(key + nextIdentityValue, transactionInformation) == null)

   9:     {

  10:         tries = 1;

  11:         return nextIdentityValue;

  12:     }

  13:     tries = 1;

  14:     // there is already a document with this id, this means that we probably need to search

  15:     // for an opening in potentially large data set.

  16:     var lastKnownBusy = nextIdentityValue;

  17:     var maybeFree = nextIdentityValue*2;

  18:     var lastKnownFree = long.MaxValue;

  19:     while (true)

  20:     {

  21:         tries++;

  22:         if(actions.Documents.DocumentMetadataByKey(key + maybeFree, transactionInformation) == null)

  23:         {

  24:             if (lastKnownBusy + 1 == maybeFree)

  25:             {

  26:                 actions.General.SetIdentityValue(key, maybeFree);

  27:                 return maybeFree;

  28:             }

  29:             lastKnownFree = maybeFree;

  30:             maybeFree = Math.Max(maybeFree - (maybeFree - lastKnownBusy) / 2, lastKnownBusy + 1);

31:

  32:         }

  33:         else

  34:         {

  35:             lastKnownBusy = maybeFree;

  36:             maybeFree = Math.Min(lastKnownFree, maybeFree*2);

  37:         }

  38:     }

  39: }

This can figure out the first free item in a range of billion documents in under 100 tries, which I am pretty sure if good enough.

Tweet Share Share 18 comments

Tags:

raven

Comments

30 Oct 2012
10:33 AM

Jonas H

Stop the presses, it's binary search!

30 Oct 2012
12:38 PM

Rafal

But isn't it possible to retrieve max value from the index?

30 Oct 2012
14:16 PM

dotnetchris

Ayende, from some recent discussions on ravendb's forums about generating sequential with no gaps invoice numbers due to government regulations, does this provide an easy solution for that?

30 Oct 2012
16:34 PM

Edward

This reminds me of the days with MS Access (client database),

Event OnDuplicateSaveException (key)

while true {

keyAdded=0; key++; loop ++ try { SaveWithNewKey(key) break; } catch { if (loop > maxKeyAddition) throw system exception }

}

30 Oct 2012
16:36 PM

Edward

When in a disconnected replicated enviroment how does Raven handle unique id's ?

Like: 2 servers, one in the cloud and 1 at the office.

30 Oct 2012
16:55 PM

dotnetchris

Rafal, whose to say there's a single index whatsoever? You can use ravendb without a single index ever existing.

30 Oct 2012
17:00 PM

Edward 3

Why maybeFree = nextIdentityValue*2 ?

Could this not cause huge gaps when ids are like 5 and more digits?

nit picking corner:
Move line 13 to 7 and remove line 10

30 Oct 2012
19:47 PM

Frank Quednau

Some kind of nested interval algorithm...I doubt there will be a "hole" .

Is access to this algorithm serialized?

30 Oct 2012
23:18 PM

Rafal

@dotnetchris I was thinking about the 'primary key' index in the Esent database. On a second thought, this index is probably sorted in lexicographic order so getting max numeric value could be a problem.

31 Oct 2012
13:14 PM

Ayende Rahien

Chris, That is certainly the way to go about this, sure.

31 Oct 2012
13:15 PM

Ayende Rahien

Edward, 2 machines, that have a partition among them? Failover?

There are many scenarios.

31 Oct 2012
13:17 PM

Ayende Rahien

Edward, The scenario we had for that is that we had millions of docs already there, we want to get high ASAP. We have ids with seven digits, and this code does NOT generate holes.

31 Oct 2012
13:17 PM

Ayende Rahien

Frank, This is not serialized, but the calls are transitionally safe.

31 Oct 2012
17:32 PM

Edward

@Ayende, about replication:

3 servers, master-master scenario. They all have the same data (no partioning although it could be possible) and changes are happening on every server.

1 server at the office with an internet connection which can disconnect, The software at the office is connected to this database.

1 server in the field with an internet connection which can disconnect, The software at this location is connected to this database.

1 server in the cloud for consumers and also office employees.

These servers are replicating each other. When a connection is reconnected the servers are syncing each other.

It is like MS SQL merge replication: http://msdn.microsoft.com/en-us/library/ms152746(v=sql.105).aspx

01 Nov 2012
08:01 AM

Ayende Rahien

Edward, Okay, in that scenario, what you want to do is to make sure that you are using Raven/ServerPrefix, which will make sure that each of them has its own prefix for the docs. That would mean that you won't have conflicts when creating new ids.

01 Nov 2012
15:40 PM

Edward

Sounds solid. How is that serverprefix calculated/determined? is it unique? stays it unique?

01 Nov 2012
15:52 PM

Ayende Rahien

Edward, You set it up, and you need to ensure that is is unique per all servers

02 Nov 2012
13:29 PM

Edward

Okay clear, thanks.

Comment preview

Comments have been closed on this topic.

Markdown turns plain text formatting into fancy HTML formatting.

Phrase Emphasis

*italic*   **bold**
_italic_   __bold__

Links

Inline:

An [example](http://url.com/ "Title")

Reference-style labels (titles are optional):

An [example][id]. Then, anywhere
else in the doc, define the link:
  [id]: http://example.com/  "Title"

Images

Inline (titles are optional):

![alt text](/path/img.jpg "Title")

Reference-style:

![alt text][id]
[id]: /url/to/img.jpg "Title"

Headers

Setext-style:

Header 1
========
Header 2
--------

atx-style (closing #'s are optional):

# Header 1 #
## Header 2 ##
###### Header 6

Lists

Ordered, without paragraphs:

1.  Foo
2.  Bar

Unordered, with paragraphs:

*   A list item.
    With multiple paragraphs.
*   Bar

You can nest them:

*   Abacus
    * answer
*   Bubbles
    1.  bunk
    2.  bupkis
        * BELITTLER
    3. burper
*   Cunning

Blockquotes

> Email-style angle brackets
> are used for blockquotes.
> > And, they can be nested.
> #### Headers in blockquotes
> 
> * You can quote a list.
> * Etc.

Horizontal Rules

Three or more dashes or asterisks:

---
* * *
- - - -

Manual Line Breaks

End a line with two or more spaces:

Roses are red,   
Violets are blue.

Fenced Code Blocks

Code blocks delimited by 3 or more backticks or tildas:

```
This is a preformatted
code block
```

Header IDs

Set the id of headings with {#<id>} at end of heading line:

## My Heading {#myheading}

Tables

Fruit    |Color
---------|----------
Apples   |Red
Pears	 |Green
Bananas  |Yellow

Definition Lists

Term 1
: Definition 1
Term 2
: Definition 2

Footnotes

Body text with a footnote [^1]
[^1]: Footnote text here

Abbreviations

MDD <- will have title
*[MDD]: MarkdownDeep

Oren Eini

Oren Eini

CEO of RavenDB