Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 5,972 | Comments: 44,523

filter by tags archive

Things we learned from production, part III–singleton thinking makes long queues


One of the more interesting things that we had to learn in production was that we aren’t an only child. It is a bit more complex than that, and I am not explaining well, let me start at the beginning.

Usually, when we work on RavenDB, we work within the scope of a single database, all of our efforts are usually scoped to that. That means that when we worked on the multi database feature for RavenDB, we actually focused on the process of loading a single database up in the air. We considered how multiple databases will interact, and we made sure that they are isolated from one another, but that was about it.

In particular, as mentioned in the previous post, starting up and shutting down were done sequentially, on a per database basis. In order to prevent issues, we had a lock on the initialize database part of the process, so two requests to the same database will not result in the same database being loaded twice.

I mentioned that we were thinking on a single database mindset, right?

Can you guess what happened?

  • Request for DB #1 – lock acquired, starting up
    • Request for DB #1 – waiting for lock to release
    • Request for DB #1 – waiting for lock to release
    • Request for DB #1 – waiting for lock to release
  • DB initialized, lock released
  • All requests are now freed and can be processed.

What happen when we have multiple databases, however?

  • Request for DB #1 – lock acquired, starting up
    • Request for DB #1 – waiting for lock to release
    • Request for DB #2 – waiting for lock to release
    • Request for DB #3 – waiting for lock to release
  • DB initialized, lock released
  • Request for DB #2 – lock released, lock acquired, starting up
    • Request for DB #3 – waiting for lock to release

You guessed it, we actually had a global lock for starting (or disposing, for that matter) databases. That meant that a single db that took time to start would impact other databases.

More importantly, it would means that other requests, which were waiting for that database to load and then had to load their own database, had far less time to actually do the processing they needed. Which meant that they were far more likely to run into the request time limit and be aborted by IIS. Which left them in an inconsistent state. Which was a nightmare to figure out.

We resolved this issue by making sure that the lock is now handled only on the same database, and that we won’t lock forever, if after a while we still don’t have the db, we will error early and give you a 503 Service Unavailable error until the db is ready to rock.


Comments

Gene Hughson

The Highlander rule...if there can be only one, expect lots of mayhem ;-)

Andrew

Would be great ayende if you started tagging your posts with the build number that the changes came into effect

jesús

I'm afraid I have fallen in the same singleton thinking while developing the update cascade bundle https://github.com/jesuslpm/UpdateCascadeBundle. Now I need to support multiple databases. I guess there will be an instance of IStartupTask, AbstractPutTrigger and so on per database ?

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. Production postmortem (5):
    29 Jul 2015 - The evil licensing code
  2. Career planning (6):
    24 Jul 2015 - The immortal choices aren't
  3. API Design (7):
    20 Jul 2015 - We’ll let the users sort it out
  4. What is new in RavenDB 3.5 (3):
    15 Jul 2015 - Exploring data in the dark
  5. The RavenDB Comic Strip (3):
    28 May 2015 - Part III – High availability & sleeping soundly
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats