Ayende @ Rahien

Refunds available at head office

The last RavenDB feature of the year, bulk inserts

One of the most exciting new features that got into RavenDB 2.0 is the notion of bulk inserts. Unlike the “do batches in a loop” approach, we created an optimized, hand crafted code path that cuts out most of the cost of the standard RavenDB saves (which do a lot, but come at a price).

In particular, we made sure that we can parallelize the operation between the client and the server, so we don’t have to build the entire request in memory on the client and then wait for it all to arrive in memory on the server before we can start the operation. Instead, we have a fully streamed operation from end to end.

Here is what the API looks like:

using (var bulkInsert = store.BulkInsert())
{
    for (int i = 0; i < 1000*1000; i++)
    {
        bulkInsert.Store(new User {Name = "Users #" + i});
    }
}

This uses a single request to the server to do all the work. And here are the results:

image

This API has several limitations:

  • You must provide the id on the client side (in this case, generated via hilo).
  • It can't take part in DTC transactions.
  • If you want updates, you need to explicitly say so (otherwise it would throw).
  • Put triggers will execute, but AfterCommit will not.
  • This bypasses the in-memory indexing pre fetching layer.
  • Changes() will not be raised for documents inserted using bulk-insert.
  • There isn't a single transaction for the entire operation; rather, the work is done in batches and each batch is transactional on its own.

This is explicitly meant to drop a very large number of records into RavenDB very fast, and it does this very well, typically an order of magnitude or more faster than the “batches in a loop” approach.

A note about the last limitation, though. The whole idea here is to reduce, as much as possible, the costs of actually doing a bulk insert. That means that we can’t keep a transaction spanning millions of items open. Instead, we periodically flush the transaction buffer throughout the process. Assuming the default batch size of 512 documents, that means that an error in one of those documents will cause the entire batch of 512 to be rolled back, but will not roll back previously committed batches.

This is done to reduce the transaction log size and to make sure that even during a bulk insert operation, we can index the incoming documents while they are being streamed in.
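For completeness, both the batch size and the update behavior mentioned above are controlled through the options passed to BulkInsert(). This is a sketch from memory of the 2.x client API, so the exact option names may differ:

using (var bulkInsert = store.BulkInsert(options: new BulkInsertOptions
{
    BatchSize = 512,        // documents per server-side transactional batch
    CheckForUpdates = true  // opt in to overwriting existing documents instead of throwing
}))
{
    // bulkInsert.Store(...) as before
}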


Feature intersection is killing me, referenced document indexing

I mentioned before that the hard part in building RavenDB now isn’t the actual features that we add, it is the intersection of features that is causing problems.

Case in point, let us look at the new referenced document indexing, which allows you to index data from a related document and have RavenDB automatically keep it up to date. This was a feature that was requested quite often. Implementing it was complex, but straightforward. We now track which documents are referenced by each document, and we know how to force reindexing of a document if a document it references has changed.

So far, so good. It was actually quite easy for us to force re-indexing; all we had to do was force the referencing document’s etag to change, and the indexing code would pick that up and re-index it. Simple & easy.

Except… we use Etags for a lot more than just indexing. For example, we use etags for replication.

Now, imagine, if you will, two nodes set up as master/master. Both nodes have an index that uses LoadDocument to refer to another document.

We are now in a stable state, and both nodes have all documents. We modify a document, which causes that document to be replicated to the second node. That triggers (on both servers) re-indexing of the referencing document. And that, in turn, causes both servers to want to replicate the new “change” to the other one. What is worse, RavenDB is smart enough to detect that this isn’t a conflict, so what we actually get is an infinite distributed loop.

Or, another case: prefetching. As you probably know, an important optimization in RavenDB is the ability to prefetch documents from disk so we don’t have to wait for them. We even augment that by putting incoming documents directly into the prefetching queue, never needing to hit the disk throughout the process.

Except that when we designed prefetching, there was never the idea of having holes in the middle. But touching a document (updating its etag) causes just that. Let us assume that we have three documents (items/1, items/2, items/3).

We are saving items/1 and items/3 as part of our standard work. items/1 is being referenced by items/2. That means that on disk, we would have the following etags: (4 – items/1, 5 – items/2, 6 – items/3). However, the prefetching queue will have just (4 – items/1, 6 – items/3). This is a hole, and we didn’t use to have those (we might have gaps, but there were never any documents in those gaps). So we had to re-write the prefetching behavior to accommodate that (along the way, we made it much better, but still).

Then there were issues relating to optimizations; it turned out that allowing a lot of holes was also not a good idea, so we changed our etag implementation to reduce the chance of holes, and…

It is interesting work, but it can be quite a hurdle when we want to do a new feature.

And then there are the really tough questions. When we load another document during the indexing of a document, what operation should we pass to the read trigger that decides whether we can or cannot see this document? Is it the Index operation, which means that you won’t be able to load versioned documents? Or is it the Load document operation, which would allow us to read versioned documents, but brings up the question of how to deal with this situation? Add a new option? And make each read trigger choose its own behavior?

It is a sign of maturity, and I really like the RavenDB codebase, but it is increasing in complexity.


Riddle me this, why won’t this code work?

The following code will not result in the expected output:

using(var mem = new MemoryStream())
{
    using(var gzip = new GZipStream(mem, CompressionMode.Compress, leaveOpen:true))
    {
        gzip.WriteByte(1);
        gzip.WriteByte(2);
        gzip.WriteByte(1);
        gzip.Flush();
    }
    
    using (var gzip = new GZipStream(mem, CompressionMode.Compress, leaveOpen: true))
    {
        gzip.WriteByte(2);
        gzip.WriteByte(1);
        gzip.WriteByte(2);
        gzip.Flush();
    }

    mem.Position = 0;

    using (var gzip = new GZipStream(mem, CompressionMode.Decompress, leaveOpen: true))
    {
        Console.WriteLine(gzip.ReadByte());
        Console.WriteLine(gzip.ReadByte());
        Console.WriteLine(gzip.ReadByte());
    }


    using (var gzip = new GZipStream(mem, CompressionMode.Decompress, leaveOpen: true))
    {
        Console.WriteLine(gzip.ReadByte());
        Console.WriteLine(gzip.ReadByte());
        Console.WriteLine(gzip.ReadByte());
    }
}

Why? And what can be done to solve this?


RavenDB Memory Issue, if I told you once, I told you a million times

One of the really annoying things about doing production readiness testing is that you often run into the same bug over & over again. In this case, we have fixed memory obesity issues over and over again.

Just recently, we had the following major issues that we had to deal with:

Overall, not fun.

But the saga ain’t over yet. We had a test case, we figured out what was going on, and we fixed it, damn it. And then we went to production and figured out that we didn’t fix it after all. I’ll spare you the investigative story; suffice to say that we finally ended up figuring out that we were to blame for optimizing for a specific scenario.

In this case, we had done a lot of work to optimize for very large batches (the import scenario), and we set the Lucene merge factor at a very high level (way too high, as it turned out). That was perfect for batching scenarios, but not so good for non-batching scenarios. It resulted in us having to hold a lot of Lucene segments in memory. Segments aren’t expensive, but they each have their own data structures. That works, sure, but when you start having tens of thousands of those, we are back in the previous story, where relatively small objects come together in unexpected ways to kill us in nasty ways. Reducing the merge factor meant that we would keep only a very small number of segments, and that avoided the problem entirely.

The best thing about this? I had to chase a bunch of false leads and ended up fixing what would have been a separate memory leak that would have gone unnoticed otherwise.

And now, let us see if stopping work at a quarter to six in the morning is conducive to proper rest. Excuse me, I am off to bed.


Tooling shout out: .NET Memory Profiler

image

To start with, I don’t have any association with them; I got nothing (no money, free license, promise of goodwill or anything else at all) from SciTech Software (the creators of .NET Memory Profiler).

This tool has been instrumental in figuring out our recent memory issues. I have tried dotTrace Memory, JustTrace and WinDBG, but this tool outshone them all and was able to point us quite quickly to the root cause that we had to deal with, and from there, it was quite easy to reach a solution.

Highly recommended.


RavenDB Feature of the Year: Indexing related documents

I am pretty sure that this feature is going to be at the very top of the chart when people talk about 2.0 features that they love. This is a feature that we didn’t plan for in 2.0. But we got held up by the memory issues, and I really needed to do something awesome rather than trawl through GBs of dump files. So I decided to give myself a little gift and do a big cool feature as a reward.

Let us imagine the following scenario:

image

We want to search for invoices based on the customer name. That one is easy enough to do, because you can use the approach outlined here: first do a search on the customer name, then do a search based on the customer id. In most cases, this actually results in better UX, because you have the chance to do more to find the right customer.

That said, a common requirement that also pops up is the need to sort based on the customer name. And that is where things get complex. You need to do things like multi map reduce, and it gets hairy (or you go bald, depending on whether you tear at your hair often or not).

Let us look at another example:

image

I want to look for courses that have students named Oren. There are solutions for that, but they aren’t nice.

Here is where we have the awesome feature, indexing related documents:

image

And now we can query things like this:

image

And obviously, we can do all the rest, such as sort by it, do full text searching, etc.
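Since the screenshots don’t reproduce here, this is roughly what such an index and query look like. The class and property names (Invoice, CustomerId, Customer.Name) and the query value are made up for illustration:

public class Invoices_ByCustomerName : AbstractIndexCreationTask<Invoice>
{
    public Invoices_ByCustomerName()
    {
        Map = invoices => from invoice in invoices
                          let customer = LoadDocument<Customer>(invoice.CustomerId)
                          select new { CustomerName = customer.Name };
    }
}

// query (and sort) on the related customer's name
var invoices = session.Advanced.LuceneQuery<Invoice>("Invoices/ByCustomerName")
    .WhereEquals("CustomerName", "ACME Corp")
    .OrderBy("CustomerName")
    .ToList();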

What about the more complex example? Students & Courses? This is just as easy:

image

And then we can query it on:

image
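In the same spirit, a sketch of the courses/students version, where the related documents come from a collection of ids (again, the class names are made up):

public class Courses_ByStudentName : AbstractIndexCreationTask<Course>
{
    public Courses_ByStudentName()
    {
        Map = courses => from course in courses
                         from studentId in course.Students
                         let student = LoadDocument<Student>(studentId)
                         select new { StudentName = student.Name };
    }
}

var courses = session.Advanced.LuceneQuery<Course>("Courses/ByStudentName")
    .WhereEquals("StudentName", "Oren")
    .ToList();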

But wait! Yes, I know what you are thinking. What about updates? RavenDB will take care of all of that for you behind the scenes. When the referenced document changes, the value will be re-indexed automatically, meaning that you will get the updated value easily.

image

image

This feature is going to deal with a lot of pain points that people currently have, and I am so excited I can barely sit.


RavenDB Memory Issue, The Process

Well, we got it. Dear DB, get your hands OFF my memory (unless you really need it, of course).

The actual issue was so hard to figure out because it was not a memory leak. It exhibited all of the signs of one, sure, but it was not.

Luckily for RavenDB, we have a really great team, and the guy who provided the final lead is Arek, from AIS.PL, who does a really great job. Arek managed to capture the state in a way that showed that a lot of the memory was held by the OptimizedIndexReader class; to be accurate, about 2.45GB of it. That made absolutely no sense, since OIR is a relatively cheap class, and we don’t expect to have many of them.

Here is the entire interesting part of the class:

   2: public class OptimizedIndexReader<T> where T : class
   3: {
   4:     private readonly List<Key> primaryKeyIndexes;
   5:     private readonly byte[] bookmarkBuffer;
   6:     private readonly JET_SESID session;
   7:     private readonly JET_TABLEID table;
   8:     private Func<T, bool> filter;
   9:  
  10:     public OptimizedIndexReader(JET_SESID session, JET_TABLEID table, int size)
  11:     {
  12:         primaryKeyIndexes = new List<Key>(size);
  13:         this.table = table;
  14:         this.session = session;
  15:         bookmarkBuffer = new byte[SystemParameters.BookmarkMost];
  16:     }

As you can see, this isn’t something that looks like it can hold 2.5GB. Sure, it has a collection, but the collection isn’t really going to be that big. It may get to a few thousand items, but it is capped at around 131,072 or so. And the Key class is also small. So that can’t be it.

There was a… misunderstanding in the way I grokked the code. It wasn’t one OIR with a collection of 131,072 items; the situation was a lot more involved. When using map/reduce indexes, we would have as many of those readers as we had (keys times buckets). When talking about large map/reduce indexes, that meant that we might need tens of thousands of the readers to process a single batch. Now, each of those readers would usually contain just one or two items, so that wasn’t deemed to be a problem.

Except that we have this thing on line 15. BookmarkMost is actually 1,001 bytes. With the rest of the reader, let us call this an even 1KB. And we had up to 131,072 of those around, per index. Now, we weren’t going to hang on to those guys for a long while, just until we were done indexing. Except… since this took up a lot of memory, it also meant that we would create a lot of garbage for the GC to work on, which would slow everything down and result in us needing to process larger and larger batches. As the size of the batches increased, we would use more and more memory. And eventually we would start paging.
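To put some rough numbers on it (my back-of-the-envelope, not figures taken from the dump): 131,072 readers at roughly 1KB each is about 128MB per map/reduce index per batch, so holding on to the readers for somewhere around twenty such index batches at once is enough to account for the 2.45GB that showed up in the profiler.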

Once we started paging, we were basically in slowville, carrying around a lot of memory that we didn’t really need. If we were able to complete the batch, all of that memory would instantly turn into garbage, and we could move on. But if we had another batch with just as much work to do…

And what about prefetching? Well, as it turned out, we had our own problems with prefetching, but they weren’t related to this. Prefetching simply made things so fast that it served the data to the map/reduce index at a rate high enough to expose this issue, ouch!

We probably still need to go over some things, but this looks good.


RavenDB 2.0 StopShip bug: Memory is nice, let us eat it all.

In the past few days, it sometimes felt like RavenDB is a naughty boy who wants to eat all of the cake and leave none for others.

The issue is that under a certain set of circumstances, RavenDB’s memory usage would spike until it consumed all of the memory on the machine. We were pretty sure what the root cause was: the prefetching data was killing us, as proven by the fact that when we disabled it, we seemed to be operating fine. And we did find quite a few such issues. And we got them fixed.

And still the problem persists… (picture torn hair and head banging now).

To make things worse, in our standard load tests, we couldn’t see this problem. It was our dog fooding tests that actually caught it. And it only happened after a relatively long time in production. That sucked, a lot.

The good news is that I eventually sat down and wrote a test harness that could pretty reliably reproduce this issue. That narrowed down things considerably. This issue is related to map/reduce and to prefetching, but we are still investigating.

Here are the details:

  • Run RavenDB on a machine that has at least 2 GB of free RAM.
  • Run Raven.SimulatedWorkLoad; it will start writing documents and creating indexes.
  • After about 50,000 – 80,000 documents have been imported, you’ll see memory rise rapidly, until it uses as much free memory as you have.

On my machine, it got to 6 GB before I had to kill it. I took a dump of the process memory at around 4.3GB, and we are analyzing this now. The frustrating thing is that the act of taking the mem dump dropped the memory usage to 1.2GB.

I wonder if we aren’t just creating so much garbage that the GC just lets us consume all available memory. The problem with that is that it gets so bad that we start paging, and I don’t think the GC should allow that.

The dump file can be found here (160MB compressed), if you feel like taking a stab at it. Now, if you’ll excuse me, I need to open WinDBG and see what I can find.


Goodbye, 2012: Our end of year discount starts now!

Well, as the year draws to a close, it is that time again; I got older, apparently. Yesterday marked my 31st trip around the sun.

To celebrate, I decided to give the first 31 people a 31% discount for all of our products.

This offer applies to:

This also applies to our support & consulting services.

All you have to do is to use the following coupon code: goodbye-2012

Enjoy the end of the year, and happy holidays.

Think about production, silly!

We just finished doing a big optimization in RavenDB, and one of the things that we needed to do was to store additional (internal) information so we could act upon it later on. If you must know, we now keep track of stats during indexing and can select the appropriate indexing approach based on the amount of data that we have available.

The details about this aren’t that important. What is important is that this is a piece of data that is used by RavenDB to make decisions. That means that just about the worst thing that we could possibly do is leave things in this state:

Think about what will happen in production, when you have an annoyed (and tired) ops team trying to figure out what is going on. Having a black box is the worst thing that you could possibly do, because you give the admin absolutely nothing to work with. And remember, you are going to be the one on call when the support phone rings.

One of the very final touches that we did was to add a debug endpoint that exposes those details to the user, so we can actually inspect them at runtime, and in production. We have a lot of those: some are intended for monitoring purposes, such as the /admin/stats or the /databases/db-name/stats endpoints; some are meant for troubleshooting, such as the /databases/db-name/logs?type=error endpoint; and some are purely for debugging purposes, such as /databases/db-name/indexes/index/name?debug=keys, which gives you the stats about all the keys in a map/reduce index.
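For illustration (assuming the default port of 8080 and a database actually named db-name), hitting those endpoints is just an HTTP GET away:

using (var client = new WebClient())
{
    // overall server statistics
    Console.WriteLine(client.DownloadString("http://localhost:8080/admin/stats"));

    // recent errors for a specific database
    Console.WriteLine(client.DownloadString("http://localhost:8080/databases/db-name/logs?type=error"));
}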

Trust me, you are going to need those, at some point.


Production Cloud Profiling With Uber Prof

With Uber Prof 2.0 (NHibernate Profiler, Entity Framework Profiler, Linq to SQL Profiler, LLBLGen Profiler) we are going to bring you a new era of goodness.

In 1.0, we gave you a LOT of goodness for the development stage of building your application, but now we are able to take it a few steps further. Uber Prof 2.0 supports production profiling, which means that you can run it in production and see what is going on in your application now!

To make things even more interesting, we have also done a lot of work to make sure that this works on the cloud. For example, go ahead and look at this site: http://efprof.cloudapp.net/

This is a live application that doesn’t really do anything special, I’ll admit. But the kicker is when you go to this URL: http://efprof.cloudapp.net/profiler/profiler.html

image

This is EF Prof, running on the cloud, and giving you the results that you want, live. You can read all about this feature and how to enable it here, but I am sure that you can see the implications.

Optimizations gone wild, O(N!) memory leaks

So, after doing so much work on the indexing optimization, it turned out that we had a bug. I assume that you remember this optimization, right?

image

In which we were able to prefetch data from the disk and not have to wait for it at all. This all worked beautifully when running on data sets that included simple indexes. But the moment we had map/reduce indexes, something bad happened. That something bad was that we kept missing the batch that we had loaded (this relates to how we load & find the appropriate batches).

We do all the lookups by etag, and map/reduce adds gaps in the etags. Which meant that we kept missing the etag and had to start loading things up again. And because whenever we load something we also start loading the next batch…

Here is what the memory looked like:

image

Yup, for every batch we loaded the next 5 batches, for a total of O(N!) items in memory for everything.

Now, we had some cleanup routines, but we did NOT expect to have that much, so we would recover, eventually, but usually not before we consumed all the memory.

Oops!


RavenDB indexing optimizations, Step III–Skipping the disk altogether

Coming back a bit, before prefetching, we actually had something like this:

image

 

With the new prefetching, we can parallelize the indexing & the I/O fetching. That is good, but the obvious optimization is to not go to the disk at all. We already have the documents we want in memory, so why not send them directly to the prefetching queue?

image

As you can see, we didn’t need to even touch the disk to get this working properly. This gives us a really big boost in terms of how fast we can index things. Also note that because we already have those docs in memory, we can still merge separate writes into a single indexing batch, reducing the cost of indexing even further.
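Here is a minimal sketch of the idea (not the actual RavenDB code; WriteToStorage is a placeholder, and JsonDocument stands in loosely for whatever the real document representation is):

public class DocumentWriter
{
    // the same in-memory queue the indexing code drains before it falls back to disk
    private readonly ConcurrentQueue<JsonDocument> prefetchingQueue;

    public DocumentWriter(ConcurrentQueue<JsonDocument> prefetchingQueue)
    {
        this.prefetchingQueue = prefetchingQueue;
    }

    public void Put(JsonDocument doc)
    {
        WriteToStorage(doc);           // the durable write, exactly as before
        prefetchingQueue.Enqueue(doc); // and hand the same instance to the indexer,
                                       // so the next indexing batch never touches the disk
    }

    private void WriteToStorage(JsonDocument doc)
    {
        // placeholder for the actual storage write
    }
}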


RavenDB indexing optimizations, Step II–Pre Fetching

Getting deeper into our indexing optimization routines, when we last left it, we had the following system:

image

This was good because it was able to predictively decide when to increase the batch size and smooth over spikes easily. But note where we have the costs?

The next step was this:

image

Prefetching, basically. What we noticed is that we were spending a lot of time just loading the data from the disk, so we changed our behavior to allow us to load things while we are indexing. That way, on the next indexing batch, we will usually find all of the data we need already in memory and ready to rock.
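As a rough sketch of the shape of it (not the actual implementation; LoadDocumentsAfterAsync, IndexBatch and lastIndexedEtag are made-up placeholders):

void IndexLoop(int batchSize)
{
    var prefetchTask = LoadDocumentsAfterAsync(lastIndexedEtag, batchSize);
    while (true)
    {
        var batch = prefetchTask.Result;   // the documents we started loading on the previous iteration
        if (batch.Count == 0)
            break;

        // kick off the I/O for the next batch right away...
        prefetchTask = LoadDocumentsAfterAsync(batch[batch.Count - 1].Etag, batchSize);

        // ...while the indexes chew on the current one
        IndexBatch(batch);
    }
}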

This gave us a pretty big boost in how fast we can index things, but we aren’t done yet. In order to make this feature viable, we had to do a lot of work. For starters, we had to make sure we wouldn’t take too much memory and wouldn’t impact other aspects of the database, etc. Interesting work, all around, even if I am just focusing on the high level optimizations. There is still a fairly obvious optimization waiting for us, but I’ll discuss that in the next post.


RavenDB indexing optimizations, Step I–dynamic batches

One of the major features in RavenDB was a significant improvement in indexing speed. I thought it would be a good idea to discuss this in detail.

Here is a very simple example of how RavenDB used to handle indexing.

The blue boxes are inserts, and the green and red ones are showing the indexing work done.

image

 

This is simplified, of course, but that is a good way to show exactly what is going on there. In particular, you can see that it takes quite a long time for us to index everything.

The good thing here is that the actual cost we have is per index, so we want to batch things properly. Note that this behavior is already somewhat optimized. Most of the time, you don’t have a few calls with thousands of documents. Usually we are talking about many calls each saving a small number of documents. This approach would balance those things out, because we can merge many save calls into a single indexing run.

However, that still took too much time, so we introduced the idea of dynamically changing the batch size (the reason we have batches is to limit the amount of RAM we use, and to allow us to respond more quickly in general).

So we changed things to do this:

image

Note that we increase the batch size as we notice that we have more things to index. The batch size will automatically grow all the way to 128K docs, depending on a whole host of factors (load, speed, memory, number of indexes, etc).
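A sketch of that policy might look like the following (the starting size, thresholds and the memory check here are all made up; the real heuristics weigh load, speed, memory and the number of indexes):

private int batchSize = 512;                  // initial batch size
private const int MaxBatchSize = 128 * 1024;  // the 128K docs cap mentioned above

private void AdjustBatchSize(int docsStillPending, long availableMemoryInMB)
{
    if (docsStillPending > batchSize && availableMemoryInMB > 1024)
        batchSize = Math.Min(batchSize * 2, MaxBatchSize);  // falling behind and memory to spare: grow
    else if (docsStillPending < batchSize / 2)
        batchSize = Math.Max(batchSize / 2, 512);           // caught up: shrink back down
}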

Since the cost is mostly per batch, we actually got a not insignificant improvement from this approach. But we can do better, as we will see in the next post.


Microsoft patterns & practices Symposium 2013

A while ago I had to promise myself that I wouldn’t be traveling so much (translated: all the time). Which is probably why I will not be at this event.

There are some quite interesting talks scheduled there. In particular, I would note:


RavenDB 1.0 Hot Fix release

It has been a while since we released the last 1.0 build (960). In the meantime, we have done a lot of work on RavenDB to get it ready for 2.0.

But we also maintained it, fixed bugs, and in general learned a lot from things that happened to people in production. Just before we release 2.0, we decided to create a hot fix release, which contains a lot of the bug fixes that went into the product as a result of actual production experience from our existing 1.0 customers.

The bloody JSON.NET problem. The new hot fix build (992) uses Newtonsoft.Json 4.0.8. If you need to use the 4.5 version, you will need to either keep using build 975 or move to 2.0 (which resolved the problem completely). I am sorry about this, but there is really very little that we can do about it at this point.

Update: We had a problem in the studio in 990, so we fixed that in 992, please upgrade to that.

Changes:

  • Transactions:
    • Will only try to delete transactions from my resource manager
    • Allowing multiple concurrent startups
    • Fixing transaction rollback error
    • Avoiding recovering transactions when EnlistInDistributedTransactions= false
    • Making sure we use the appropriate database, whether it is the default one or not
    • Making sure that we can use the proper database when we recover from a failure using DTC
    • RavenDB-620 Ensure dealing with recovered transactions properly
    • Notify DTC about recovery completed AFTER we finish processing all the failed transactions
    • Moving the way we are handling storage of the recovery information: instead of storing and recovering on the server, we store / recover locally
    • Making sure that creating a transaction in parallel isn't going to cause issues
    • RavenDB-529 NonAuthoritativeInformation does not consider the transaction timeout
    • Fixing an issue with local transaction identifier always being thought of as the same
  • Replication:
    • Avoiding NRE during conflict resolution
    • Proper cleanup of the transactions
    • Making it possible to disable compression for replication (using Raven/Replication/DisableCompression config option). Useful if you need to replicate to an older version of RavenDB
  • Database:
    • Making sure that deletes are properly atomic in multi threading scenarios
    • Making sure that we can get an error on optimistic concurrency delete inside transaction using munin
    • Moving Munin to Snapshot isolation mode
    • Making sure that all of Munin's operations run within a Read context
    • Better safety using Munin
    • Proper handling of disposing of the database tasks; will handle failed dbs nicely
    • Make sure that we have safe shutdown sequence for transactional storage
    • Making sure that we will cleanup the write marker after we did the complete index cleanup
  • IIS:
    • Database initialization is now happening on a separate thread, so a request timeout should not cause it to die midway
    • More robust disposal sequence
    • More robust init sequence, will force disposal of the resources created during the db ctor
    • Making sure that we play more nicely with the way ASP.Net calls us during app domain shut down
    • Making sure that we register to the TenantDatabaseModified.Occured from ASP.Net as well
    • Make sure that the idle timeout and the shutdown timeout were reasonable values for what we need during shutdowns

Barring hot fixes for additional critical bugs, this is the last RavenDB 1.0 build.


Debugging memory issues with RavenDB using WinDBG

We had gotten a database export that showed a pretty rough case for RavenDB: a small set of documents (around 7K) that fans out through multiple map/reduce indexes into around 70K to 180K entries (depending on which index is used).

As you can imagine, this puts quite a load on the system. I tried the usual methods (dotTrace, hand picking, etc.), and we did get some good results from that; we found some pretty problematic issues along the way, and it was good. But we still had the RavenDB process taking way too much memory. That meant that we had to pull out the big guns: WinDBG.

I took a dump of the process when it was using about 1 GB of memory, then I loaded that dump into WinDBG (6.2.9200.20512 AMD64).

I loaded SOS:

.loadby sos clr

Then, the first thing to do was to try to see what is going on with the threads:

image

As you can see, we have a small number of threads, but nothing to write home about. The next step is to see if we have anything very big in the heap:

!dumpheap -stat

image

The first number is the method table address, the second is the count, and the third is the total size. The problem is that all of that combined doesn’t come anywhere near as much memory as we are taking.
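For reference (these commands are not from the original screenshots, just the standard SOS drill), had a particular type dominated that histogram, the follow-up would have been:

!dumpheap -mt <method table address>      (list the individual instances of the suspicious type)
!gcroot <object address>                  (find out what is keeping one of those instances alive)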

I guess it is possible that we hold a lot of data in the stack, especially since the problem is likely caused by indexing.

I decided to map all of the threads and see what they are doing.

image

This switches me to thread #1, and we can see that we are currently waiting. Dumping the stack reveals:

image

This seems to be the debugger thread. Let us look at the rest:

  • 2 – Finalizer
  • 3 – Seems to be an inactive thread pool thread.
    image
  • 6 – appears to be an esent thread:
    image
    I am not sure what this is doing, and I am going to ignore this for now.
  • 7 – also esent thread:
    image

And… I got tired of this, and decided that I wanted to do something more productive, which is to selectively disable things in RavenDB until I find something that drops the memory usage enough to be worthwhile.

Whack-a-mole might not be a great debugging tool, but it is the essence of binary search. Or so I tell myself to make my conscience sleep more easily.

For reference, you can look at the dump file here.


Managing RavenDB Document Store startup

RavenDB’s document store is your main access point to the database. It is strongly recommended that you have just a single instance of the document store per server you are accessing. That usually means that you have to implement a singleton, with all the double checked locking nonsense that is involved in that. I was giving a RavenDB course last week and I stumbled upon a very nice coding pattern:

public static class Global
{
    private static readonly Lazy<IDocumentStore> theDocStore = new Lazy<IDocumentStore>(()=>
        {
            var docStore = new DocumentStore
                {
                    ConnectionStringName = "RavenDB"
                };
            docStore.Initialize();

            //OPTIONAL:
            //IndexCreation.CreateIndexes(typeof(Global).Assembly, docStore);

            return docStore;
        });

    public static IDocumentStore DocumentStore
    {
        get { return theDocStore.Value; }
    }
}

This is very readable code, and it handles pretty much all of the threading stuff for you, without obscuring what you really want to do.
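Usage is then the obvious thing; the store is shared, while the sessions stay cheap and short lived:

using (var session = Global.DocumentStore.OpenSession())
{
    var user = session.Load<User>("users/1");
    // ... work with the session ...
    session.SaveChanges();
}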

And what about when you have multiple servers? How do you handle it then? Same idea, taken one step further:

public static class Global
{
    private readonly static ConcurrentDictionary<string, Lazy<IDocumentStore>> stores = 
        new ConcurrentDictionary<string, Lazy<IDocumentStore>>();
    
    public static IDocumentStore GetDocumentStoreFor(string url)
    {
        return stores.GetOrAdd(url, CreateDocumentStore).Value;
    }

    private static Lazy<IDocumentStore> CreateDocumentStore(string url)
    {
        return new Lazy<IDocumentStore>(() =>
            {
                var docStore = new DocumentStore
                    {
                        ConnectionStringName = url
                    };
                docStore.Initialize();

                //OPTIONAL:
                //IndexCreation.CreateIndexes(typeof(Global).Assembly, docStore);

                return docStore;
            });
    }
}

This is nice, easy and the correct way to handle things.
