Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969


RavenDB Sharding: Enabling shards for existing database


A question came up in the mailing list: how do we enable sharding for an existing database? I’ll deal with data migration in this scenario in a later post.

The scenario is that we have a very successful application, and we start to feel the need to move the data to multiple shards. Currently all the data is sitting in the RVN1 server. We want to add RVN2 and RVN3 to the mix. For this post, we’ll assume that we have the notion of Customers and Invoices.

Previously, we accessed the database using a simple document store:

var documentStore = new DocumentStore
{
	Url = "http://RVN1:8080",
	DefaultDatabase = "Shop"
};

Now we want to move to a sharded environment, so we want to write it like this. Existing data is going to stay where it is, and new data will be sharded according to geographical location.

var shards = new Dictionary<string, IDocumentStore>
{
	{"Origin", new DocumentStore {Url = "http://RVN1:8080", DefaultDatabase = "Shop"}},//existing data
	{"ME", new DocumentStore {Url = "http://RVN2:8080", DefaultDatabase = "Shop_ME"}},
	{"US", new DocumentStore {Url = "http://RVN3:8080", DefaultDatabase = "Shop_US"}},
};

var shardStrategy = new ShardStrategy(shards)
	.ShardingOn<Customer>(c => c.Region)
	.ShardingOn<Invoice> (i => i.Customer);

var documentStore = new ShardedDocumentStore(shardStrategy).Initialize();

This wouldn’t actually work. We are going to have to do a bit more. To start with, what happens when we don’t have a 1:1 match between region and shard? That is when the translator becomes relevant:

.ShardingOn<Customer>(c => c.Region, region =>
{
    switch (region)
    {
        case "Middle East":
            return "ME";
        case "USA":
        case "United States":
        case "US":
            return "US";
        default:
            return "Origin";
    }
})

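We basically say that we map several region values into a single shard. But that isn’t enough. Newly saved documents are going to have the shard prefix in their ids, so a new customer and invoice saved in the US shard will show up with ids that start with the shard id:

[image: new documents in the US shard, with the shard prefix in their ids]

But existing data, which was created without sharding, doesn’t have this prefix.

[image: existing documents, without a shard prefix in their ids]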

So we need to take some extra effort to let RavenDB know about them. We do this using the following two functions:

 Func<string, string> potentialShardToShardId = val =>
 {
     var start = val.IndexOf('/');
     if (start == -1)
         return val;
     var potentialShardId = val.Substring(0, start);
     if (shards.ContainsKey(potentialShardId))
         return potentialShardId;
     // this is probably an old id, let us use it.
     return "Origin";

 };
 Func<string, string> regionToShardId = region =>
 {
     switch (region)
     {
         case "Middle East":
             return "ME";
         case "USA":
         case "United States":
         case "US":
             return "US";
         default:
             return "Origin";
     }
 };

We can then register our sharding configuration like so:

  var shardStrategy = new ShardStrategy(shards)
      .ShardingOn<Customer, string>(c => c.Region, potentialShardToShardId, regionToShardId)
      .ShardingOn<Invoice, string>(x => x.Customer, potentialShardToShardId, regionToShardId); 

That takes care of handling both new and old ids, and lets RavenDB understand how to query things in an optimal fashion. For example, a query on all invoices for ‘customers/1’ will only hit the RVN1 server.
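To make the id handling concrete, here is a minimal, RavenDB-free sketch of the same two translators, run against a few sample values. The class name, the method names and the sample ids are mine, not part of the RavenDB API, and the set of shard ids stands in for the keys of the `shards` dictionary above.

```csharp
using System;
using System.Collections.Generic;

public static class ShardTranslators
{
    // Stand-in for the keys of the `shards` dictionary (assumption: same three ids).
    private static readonly HashSet<string> ShardIds =
        new HashSet<string> { "Origin", "ME", "US" };

    // Same logic as potentialShardToShardId: a known prefix wins,
    // anything else with a slash is treated as an old, unprefixed id.
    public static string FromDocumentId(string id)
    {
        var start = id.IndexOf('/');
        if (start == -1)
            return id;
        var prefix = id.Substring(0, start);
        return ShardIds.Contains(prefix) ? prefix : "Origin";
    }

    // Same logic as regionToShardId.
    public static string FromRegion(string region)
    {
        switch (region)
        {
            case "Middle East":
                return "ME";
            case "USA":
            case "United States":
            case "US":
                return "US";
            default:
                return "Origin";
        }
    }

    public static void Main()
    {
        Console.WriteLine(FromDocumentId("US/customers/1")); // US
        Console.WriteLine(FromDocumentId("customers/1"));    // Origin
        Console.WriteLine(FromRegion("United States"));      // US
        Console.WriteLine(FromRegion("France"));             // Origin
    }
}
```

The second call is the interesting one: ‘customers/1’ has a slash, but “customers” is not a shard id, so the id is routed to the Origin shard.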

However, we aren’t done yet. New customers that don’t belong to the Middle East or USA will still go to the old server, and we don’t want any modification to the id there. We can tell RavenDB how to handle it like so:

var defaultModifyDocumentId = shardStrategy.ModifyDocumentId;
shardStrategy.ModifyDocumentId = (convention, shardId, documentId) =>
{
    if(shardId == "Origin")
        return documentId;

    return defaultModifyDocumentId(convention, shardId, documentId);
};

That is almost the end. There is one final issue that we need to deal with: the old documents, created before we used sharding, don’t have the required sharding metadata. We can fix that using a store listener. So we have:

 var documentStore = new ShardedDocumentStore(shardStrategy);
 documentStore.RegisterListener(new AddShardIdToMetadataStoreListener());
 documentStore.Initialize();

Where the listener looks like this:

 public class AddShardIdToMetadataStoreListener : IDocumentStoreListener
 {
     public bool BeforeStore(string key, object entityInstance, RavenJObject metadata, RavenJObject original)
     {
         if (metadata.ContainsKey(Constants.RavenShardId) == false)
         {
             metadata[Constants.RavenShardId] = "Origin";// the default shard id
         }
         return false;
     }

     public void AfterStore(string key, object entityInstance, RavenJObject metadata)
     {
     }
 }

And that is it. I know that there seems to be quite a lot going on in here, but it basically can be broken down to three actions that we take:

  • Modify the existing metadata to add the sharding server id via the listener.
  • Modify the document id convention so documents on the old server won’t have a designation (optional).
  • Modify the sharding configuration so we’ll understand that documents without a shard prefix actually belong to the Origin shard.

And that is pretty much it.

Why RavenDB isn’t written in F#, or the cost of the esoteric choice


In a recent post, a commenter suggested that using F# rather than C# would dramatically reduce the code size (measured in number of lines).

My reply to that was:

F# would also lead to a lot more complexity, reduced participation in the community, harder to find developers and increased costs all around.

And the data to back up this statement:

[images: job listing counts for C# developers and for F# developers]

Nitpicker corner: Now, I realize that this is a sensitive topic, so I’ll note that this isn’t meant to be a scientific observation. It is a data point that amply demonstrates my point. I’m not going to run a full bore study. And yes, those numbers are about jobs, not people, but I’m assuming that the numbers are at least roughly comparable.

The reply to this was:

You have that option to hire cheaper developers. I think that the cheapest developers usually will actually increase your costs. But if that is your way, then I wish you good luck, and I accept that as an answer. How about "a lot more complexity"?

Now, let me try to explain my thinking. In particular, I would strongly disagree with the “cheapest developers” mentality. That is very far from what I’m trying to achieve. You usually get what you pay for, and trying to save on software development costs when your product is software is pretty much the definition of penny wise and pound foolish.

But let us ignore such crass terms as money and look at availability. There are fewer than 500 jobs for F# developers (with salary ranges that imply that there aren’t a whole lot of F# developers queuing up for those jobs). There are tens of thousands of jobs for C# developers, and again, the salary range suggests that there isn’t a dearth of qualified candidates that would cause demand to raise the costs. From those numbers, and my own experience, I can say the following.

There are a lot more C# developers than there are F# developers. I know that this is a stunning conclusion, likely to shatter the worldview of certain people. But I think that you would find it hard to refute that. Now, let us try to build on this conclusion.

First, there was the original point, that F# leads to a reduced number of lines. I’m not going to argue that, mostly because software development isn’t an issue of who can type the most. The primary costs of development are design, testing, debugging, production proofing, etc. The act of actually typing is pretty unimportant.

For fun, I checked out the line count numbers for similar projects (RavenDB & CouchDB). The count of lines in the Raven.Database project is roughly 100K. The count of lines in the CouchDB src folder is roughly 45K. CouchDB is written in Erlang, which is another functional language, so we are at least not comparing apples to camels here. We’ll ignore things like different feature sets, different platforms, and the different languages for now, and just say that an F# program can deliver with 25% of the lines of code of a comparable C# program.

Note that I’m not actually agreeing with this statement, I’m just using this as a basis for the rest of this post. And to (try to) forestall nitpickers: it is easy to show great differences in development time and lines of code in specific cases where F# is ideally suited to the task. But we are talking about general purpose usage here.

Now, for the sake of argument, we’ll even assume that the cost of F# development is 50% of the cost of C# development. That is, that the reduction in line count actually has a real effect on the time and efficiency. In other words, if an F# program is a quarter of the size of a similar C# program, we won’t assume that each of its lines takes four times as long to write.

Where does this leave us? It leaves us with a potential pool of people to hire that is vanishingly small. What are the implications of writing software in a language that has fewer people familiar with it?

Well, it is harder to find people to hire. That is true not only for people that you hire “as is”. Let us assume that you’re going to give those people additional training after hiring them, so they would know F# and can work on your product. An already steep learning curve has just become that much steeper. Not only that, but this additional training means that the people you hire are more expensive (there is an additional period in which they are only learning). In addition to all of that, it will be harder to hire people, not just because you can’t find people already experienced with F#, but because people don’t want to work for you.

Most developers at least try to pay attention to the market, and they make a very simple calculation. If I spend the next 2 – 5 years working in F#, what kind of hirability am I going to have in the end? Am I going to be one of those trying to get the < 500 F# jobs, or am I going to be in the position to find a job among the tens of thousands of C# jobs?

Now, let us consider another aspect of this: the community around a project. I actually have a pretty hard time finding any significant F# OSS projects. But leaving that aside, the number of contributors, and the ability of users to go into your codebase and look for themselves, is a major advantage. We have had users skip the bug report entirely and just send us a Pull Request for an issue they ran into; others have contributed (significantly) to the project. That is possible only because there is a wide appeal. If the language is not well known, the number of people that are going to spend the time and do something with it is going to be dramatically lower.

Finally, there is the complexity angle. Consider any major effort required. Recently, we have been working on porting RavenDB to Linux. Now, F# works on Linux, but anyone that we would go to in order to help us port RavenDB to Linux would have this additional (rare) requirement: the need to understand F# as well as Linux & Mono. Any problem that we would run into would have to first be simplified to a C# sample so it could be readily understood by people who aren’t familiar with F#, etc.

To go back to the beginning, using F# might have reduced the line count, but it wouldn’t reduce the time to actually build the software and it would limit the number of people that can participate in the project, either as employees or Open Source contributors.

A new blog look & feel


This blog has been running continuously for over 10 years now. And while I think that the level of content has improved somewhat over the years (certainly my command of English did), I’m afraid that we never really touched the design.

This blog theme was taken from (if I recall properly) the dasBlog default skin with some color shifting to make it a bit more orange. And I kept that look for the past 10 years, even when we moved between various blogging platforms. This has grown tiring, and more to the point, the requirements that we have today are not nearly the same as before.

Hence, the new design. This includes responsive design, a mobile friendly layout and improvements to just about every little bit of the user experience.

One major feature is the introduction of series, which will allow readers to easily go through an entire related series of posts without them (or me) having to do anything.

I would appreciate any feedback you have.

Boldly & confidently fail, it is better than the alternative


Recently I had the chance to sit with a couple of the devs in the RavenDB Core Team to discuss “keep & discard” habits*.

The major problem we now have with RavenDB is that it is big. And there are a lot of things going on there that you need to understand. I ran the numbers, and it turns out that the current RavenDB contains:

  • 835,000 Lines of C#
  • 67,500 Lines of TypeScript
  • 87,500 Lines of HTML

That is divided into many areas of functionality, but that is still a big chunk of stuff to go through. And that is ignoring things that require that we understand additional components (like Esent, Lucene, etc). What is more, there is a lot of expertise involved in understanding what is going on in terms of the full picture. For example: “We limit this value here because too much of it would result in high memory consumption under this set of circumstances.”

The problem is that it takes time, and sometimes a lot of it, to get a good understanding of how things all come together. In order to handle that, we typically assign new devs issues from all around the code base. The idea isn’t so much to give them a chance to become an expert in a particular field, but to make sure that they get the general idea of how the code is structured and how the project comes together.

Over time, people tend to gravitate toward a particular area (M** is usually the one handling the SQL Replication stuff, for example), but that isn’t fixed (T fixed the most recent issue there), and the areas of responsibility shift (M is doing a big task, we don’t want to disturb him, let H do that).

Anyway, back to the discussion that we had. What I realized is that we have a problem. Most of our work is either new features or fixing issues. That means that nearly all the time, we don’t really have any fixed template to give developers “here is how you do this”. A recent example was an issue where invoking smuggler with a particular set of filters would result in very high cost. The task was to figure out why, and then fix this. But the next task for this developer is to do sharded bulk insert implementation.

I’m mentioning this to explain a part of the problem. We don’t see a lot of “exactly the same as before”, and a new dev on the team leans on the other members quite heavily initially. That is expected, of course, and encouraged. But we identified a key problem in the process. Because the other team members also don’t have a ready made answer, they need to dig into the problem before they can offer assistance, which sometimes (all too often, to be honest) leads to a “can you slide the keyboard my way?” and taking over the hunt. The result is that the new dev does learn, but a key part of the process is missing: finding out what is going on.

We are going to ask both sides of this interaction to keep track of that, and stop it as soon as they realize that this is what is going on.

The other issue that was raised was the issue of fear. RavenDB is a big system, and it can be quite complex. It is quite a reasonable apprehension: what if I break something by mistake?

Here it comes back to the price of failure. Trying something out means that at worst you wasted a work day, nothing else. We are pretty confident in our QA process and system, so we can allow people to experiment. Analysis paralysis is a much bigger problem. And I wasn’t being quite accurate: trying the wrong thing isn’t wasting a day, because you learned what doesn’t work, and hopefully also why.

“I have not failed. I've just found 10,000 ways that won't work.”
Thomas A. Edison

* Keep & discard is a literal translation of a term that is very common in the IDF. After most activities, there is an investigation performed, and one of the first questions asked is what we want to keep (good things that happened that we need to preserve for the next time we do this) and what we need to discard (bad things that we need to watch out for).

** The actual people are not relevant for this post, so I’m using letters only.

My view on crowd funding


After my previous post, I was asked what I’m thinking about the notion of crowd funding, which is currently all the rage.

The answer is complicated. I’m focusing right now on things like Kickstarter and its siblings, because I’m familiar with how they work. The basic premise is pretty great. You have some idea (usually a product) that requires initial capital and has some well known market. By directly contacting the target audience, you can get the seed money, judge demand and have very low risk overall. The “investors” put in a small amount of money, the loss of which they can tolerate without hardship. The project gets money for very little effort and gets great marketing along the way.

This is great, if you are doing a product. Something that can be sold. For instance, let us say that we want to do a major feature, like adding time series capabilities to RavenDB. Let us say that we start a Kickstarter campaign for this, asking for 150,000 USD and promising backers that they’ll get a free license out of early sponsorship.

I’ll get into the exact costs associated with this option in a bit. But before we go there, remember the premise of my previous post. It isn’t money to build a specific product. It is money that is required to purchase something for the business itself. Of course, buying that cool car will raise morale and I have a spreadsheet that says that it will increase the effectiveness of the team by 17.4% (although it will decrease parking space by 37%). So it makes sense to go with that, from a business perspective. However, there is very little that I can do to actually make people want to back a “we want a cool car” notion. At least, I don’t think so, but the internet does have some dark corners.

Back to the notion of using this to build products. There is a very basic problem here. RavenDB isn’t targeting individuals. It is a database platform, and most of our customers are businesses or enterprises. That leads to a very different mindset. Speculative investment in something like this is going to be much rarer, harder and fraught with issues. An Open Source project can do that, since it makes sense for a business to invest in a project that it is using, but there are very few who actually manage to do that. A quick search of Kickstarter doesn’t show any major open source project soliciting funds there.

Kickstarter makes sense for personal stuff, things that you actually get to hold, or need to buy. Something of some scarcity. Doing this for commercial software makes very little sense, and for open source, it is an even bigger problem. For open source projects that depend on donations, usually you have a valid commercial reason for people to donate (Linux, Wikipedia, etc).

I’m open to contrarian points of view, mind. But I don’t think that crowd funding is applicable for the kind of things that I would want to use it for.

Funding options


This is a divergence from my usual discussion on technical stuff. In this post, I want to talk about money. In particular, how you get it from other people. Note that I am neither an expert nor qualified to talk about the subject matter, this post came out of a lot of scribbled notes and is mostly meant to serve as a way to lay down a line of thought. All numbers are made up, and while I would like such a car, it would be mostly to inflict it on the employee of the month.

There are many cases in the lifecycle of a business where you need more cash than you currently have (or are willing to spend outright).

A common scenario is when you start a business, or when you want to expand it. For our discussion, we’ll use the example of the following drool worthy car:

I consider such a piece of art priceless, but let us say that I managed to convince the owner to sell it to me for the nice sum of 1,000,000$.

Unfortunately, I don’t have 1,000,000$. I only have 650,000$. So long, beautiful car, it was very nice to know you, but it is just not possible. Except that there is this thing where people give you a lump sum of money, and you give it back over time (although usually more than you got).

Funding is important for businesses in the same manner that breathing is for people. There are typically several ways to fund a business:

  • Direct cash infusion – That is usually how most businesses start. The amount of money put into the business depends on what it needs to do. A web developer would need the money to buy a laptop and a Starbucks loyalty card, so that is easy. For a restaurant, you need enough money for rent, employees, equipment, etc. The smaller the amount you need to put into the business to kick start it, the easier it is to just use your own savings to do so.
  • Partners – This is pretty much the same as the previous one, but instead of having only one person do that, you have multiple people and more savings to dip into.
  • Angels/Investors – Those are people who for various reasons would give you money. Sometimes this is because they are related to you, but oftentimes it is a calculated move, investing some money in a business in order to get a stake in it and cash it in afterward.
  • Government development loan / grant – Sometimes you can get this, and they usually have both very good terms, and really strict rules, regulations and hoops to jump through.
  • Bank / credit loan – Well, you are presumably familiar with that. You get a loan, pay interest, mortgage some assets, etc.
  • Self funded – Your business is making more money than it is spending, therefore you have money to spend on the business.

The best choice is self funding, because that means that you are profitable and aren’t beholden to someone. The others really depend on personal preferences. Here are mine:

  • Direct cash infusion – That works for starting a business with low starting overhead costs (see, single developer shop). It might also be viable if you have a lot of personal wealth that you can put into the business, but personally, I like to think about the money flowing in the other direction. Otherwise that is an indication that there is something strange going on here.
  • Partners – I used to work at a place that was owned by 7 founding members + 1 “silent partner”. I still remember when the entire company got an email from a co-CEO that was basically: “You are forbidden to discuss project X or anything related to it with the other co-CEO”. That left an… impression, shall we say. Also, this is again something that you would usually do in the beginning. Bringing a partner into an existing business implies one of a few things. You are in big trouble (either personally or the business) and need a cash infusion that you can’t/won’t supply, or you are doing really well and people are flocking to join you.
  • Investors/Angels – This is very similar to the previous point, with the caveat that investors usually aren’t going to meddle in the day to day affairs, nor are they going to shoulder any burden. They are there to provide the money, some expertise/networking but that is basically it. They do create a pretty huge amount of bureaucracy, reports, compliance, etc. The investors need to know that you aren’t blowing away their money, after all.
  • Government development loan / grant – This is pretty much the same as the previous one, only the investor is the government. If you thought that investors generated a lot of paperwork, you were mistaken.

The remaining two options are self funding and getting a loan. Now, assuming that no one else buys this magnificent car, I can put some numbers in Excel and predict that in a couple of years, I’ll have enough money to buy it outright. So all I need to do is ask the owner to not sell it to anyone, hope that my cash flow remains according to projections, hope the price doesn’t change and just wait.

Of course, that means that I can’t lift morale by making this the official company car in the meantime. I’m losing quite a lot of amusing moments by waiting, and that is assuming that it is still possible in two years. Of course, if in two years I would have the money to do so, I’m not so sure that I would still want to just purchase it directly. That would mean having no money at all. And that is kind of scary, because salaries need to be paid, and this car doesn’t look like it has a good gas/mileage ratio.

So the option that we have left is taking a loan. The nice thing about doing that is that we can mortgage the actual asset that we are buying, this magnificent car. Now, the bank may not value it as much as I do, so they are going to give it a price of only 900,000$, and then they are going to only agree to fund 80% of that, which gives us 720,000$.

In other words, that means that we need to pony up 280,000$ (the 1,000,000$ price minus the 720,000$ loan), which is much more reasonable, and leaves us with a bit of a free cash cushion. That leads to a few interesting observations:

  • The loan amount and the money we already have are comparable. That means that the bank is going to be much nicer to us than if we wanted to borrow much more money than we already have (on the assumption that if we got this amount of money once, we’ll be able to get it again to pay them).
  • There is a valid asset to mortgage, which reduces the loan risk (and thus gets us better terms).
  • The current interest environment is at an all-time low, which means that this is a great time to borrow money (and a bad time to try to save).

This means that this is a much simpler deal than going to a bank with a business plan and hoping that they will believe that we can make it. Now, let us get down to the financial details.

An offer from bank A is for an interest rate of 4%. That gives us a monthly payment, over ten years, of 7,290$.

An offer from bank B is for an interest rate of 4.25%, which gives a monthly payment of 7,375$.

That is a simple number game, and we are pretty much done at this point, right? Almost, but let us project this over 10 years, and see where that puts us.

Bank A: Total amount of interest paid is 154,800$

Bank B: Total amount of interest paid is 165,000$
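For reference, those figures are consistent with the standard fixed-rate loan payment formula, assuming the 720,000$ that the bank agrees to fund as the principal, monthly compounding and 120 equal payments. This is my reconstruction of the arithmetic, not the banks’ actual quote:

```latex
M = P \cdot \frac{r}{1 - (1 + r)^{-n}}, \qquad
r = \frac{\text{annual rate}}{12}, \quad n = 120, \quad P = 720{,}000

\text{Bank A: } r = \frac{0.04}{12} \;\Rightarrow\; M \approx 7{,}290, \qquad
120 \cdot 7{,}290 - 720{,}000 = 154{,}800

\text{Bank B: } r = \frac{0.0425}{12} \;\Rightarrow\; M \approx 7{,}375, \qquad
120 \cdot 7{,}375 - 720{,}000 = 165{,}000
```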

In other words, the total difference is 10,200$. That means that while it is still a numbers game, it isn’t just the interest rate. The reason is that we now need to consider a lot more aspects. For example, Bank B may have an easier loan approval process, or require less security, or value the car higher than bank A. Bank A doesn’t allow early cash out, while bank B does, or a million and one other differences.

The question now becomes whether the other stuff beyond the raw interest rate can be quantified, and whether it is worth more than 10,000$.

As I said earlier in this post, this is mostly settling things in my mind. Feel free to ignore this post altogether.

RavenDB 3.5 Features: Data Exploration


RavenDB is doing a pretty great job for being a production database, in fact, we have designed it upfront to only have features that make sense to have for robust production systems.

In particular, we don’t have any form of ad-hoc queries. A query always hits an index, so it is very fast. Even what we call dynamic queries in RavenDB are actually creating an index behind the scene. This is pretty awesome for normal production usage, but it does have some limitations when you want to explore the data. This can be because you are a developer trying to find a particular something, and you just want to quickly fire off random queries. You don’t care about the costs, and you don’t want to generate indexes. Or you can be an admin that needs to get a particular report from the system and you want to play around with the details until you get everything right.

In order to serve those needs, RavenDB 3.5 is going to have a really nice feature, explicit data exploration.

For example, let us say that I want to count the number of unique words in all of my posts, I can do it using the following:

[image: the data exploration screen in the RavenDB studio, showing the Linq statement and its results]

Note that the actual query is pretty meaningless, and I’m writing this at 1AM with a baby nearby that makes funny noises, so the Linq statement there works, but can probably be better.
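The screenshot isn’t reproduced here, so as a rough illustration only, this is the kind of Linq statement I have in mind, written as plain in-memory Linq. It is not the query from the screenshot and it doesn’t go through the studio; the `Post` class, its `Body` property and the list of separators are assumptions made for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document shape, for illustration only.
public class Post
{
    public string Body { get; set; }
}

public static class UniqueWords
{
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };

    // Counts distinct words across all posts, ignoring case.
    public static int Count(IEnumerable<Post> posts)
    {
        return posts
            .SelectMany(p => (p.Body ?? string.Empty)
                .Split(Separators, StringSplitOptions.RemoveEmptyEntries))
            .Select(word => word.ToLowerInvariant())
            .Distinct()
            .Count();
    }

    public static void Main()
    {
        var posts = new[]
        {
            new Post { Body = "Sharding is fun." },
            new Post { Body = "Sharding is hard, sharding is useful" },
        };

        // sharding, is, fun, hard, useful
        Console.WriteLine(Count(posts)); // 5
    }
}
```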

The point here is to demo what is going on. We write a simple Linq statement, and can run it against our database, and then gather the results back. It is like having LinqPad directly inside the RavenDB studio. In fact, that is the number one scenario that we envision for this feature, replacing LinqPad usage by having a native capability.

Now, some caveats. As you can see, you can select to limit the query duration as well as the number of documents it will operate on. That gives us a quick way to explore the data without putting too much load on the server. You can even take the output here and throw it directly to Excel. “Sam, can you give me a breakdown of orders this year by month and country? Just email me the Excel spreadsheet”.

Note that this is intended as a studio feature; it isn’t something that we provide an API for. It is there for admins or developers who are figuring things out, not something that you want to use in production.

Work stealing in the presence of startup / shutdown costs


I mentioned that we have created our own thread pool implementation in RavenDB to handle our specific needs. A common scenario that ended up quite costly for us was the notion of parallelizing similar work.

For example, I have 15,000 documents to index. That means that we need to go over each of the documents and apply the indexing function. That is an embarrassingly parallel task. So that is quite easy. One easy way to do that would be to do something like this:

foreach(var doc in docsToIndex)
	ThreadPool.QueueUserWorkItem(_ => IndexFunc(new[]{doc}));

Of course, that generates 15,000 entries for the thread pool, but that is fine.

Except that there is an issue here: we need to do stuff to the results of the indexing. Namely, write them to the index. That means that even though we can parallelize the work, we still have a non-trivial amount of startup & shutdown costs. Just running the code like this would actually be much slower than running it in single threaded mode.

So, let us try a slightly better method:

foreach(var partition in docsToIndex.Partition(docsToIndex.Length / Environment.ProcessorCount))
	ThreadPool.QueueUserWorkItem(_ => IndexFunc(partition));

If my machine has 8 cores, then this will queue 8 tasks to the thread pool, each indexing just under 2,000 documents. Which is pretty much what we have been doing until now.

Except that this means that we have to incur the startup/shutdown costs a minimum of 8 times.

A better way is here:

ConcurrentQueue<ArraySegment<JsonDocument>> partitions = docsToIndex.Partition(docsToIndex.Length / Environment.ProcessorCount);
for(var i = 0; i < Environment.ProcessorCount; i++) 
{
	ThreadPool.QueueUserWorkItem(_ => {
		ArraySegment<JsonDocument> first;
		// nothing left to do, exit early without paying the startup / shutdown costs
		if(partitions.TryDequeue(out first) == false)
			return;

		IndexFunc(Pull(first, partitions));
	});
}

// yields the documents of the first partition, then keeps pulling partitions until the queue is empty
IEnumerable<JsonDocument> Pull(ArraySegment<JsonDocument> first, ConcurrentQueue<ArraySegment<JsonDocument>> partitions )
{
	while(true)
	{
		for(var i = 0; i < first.Count; i++)
			yield return first.Array[i+first.Offset];

		if(partitions.TryDequeue(out first) == false)
			break;
	}
}

Now something interesting is going to happen, we are scheduling 8 tasks, as before, but instead of allocating 8 static partitions, we are saying that when you start running, you’ll get a partition of the data, which you’ll go ahead and process. When you are done with that, you’ll try to get a new partition, in the same context. So you don’t have to worry about new startup/shutdown costs.

Even more interesting, it is quite possible (and common) for the work to be done by the time we end up executing some of those tasks. (All the indexing is already done but we still have a task for it that didn’t get a chance to run.) In that case we exit early, and incur no costs.

The fun thing about this method is what happens under load when you have multiple indexes running. In that case, we’ll be running this for each of the indexes. It is quite likely that each core will be running a single index. Some indexes are going to be faster than the others, and complete first, consuming all the documents that they were told to index. That means that the tasks belonging to those indexes will exit early, freeing those cores to run the code relevant for the slower indexes, which haven’t completed yet.

This gives us dynamic resource allocation. The more costly indexes get to run on more cores, while we don’t have to pay the startup / shutdown costs for the fast indexes.
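To play with the idea outside of RavenDB, here is a self-contained sketch of the same pattern. It is not our actual thread pool: it uses `Task.Run` and the standard `ConcurrentQueue<T>.TryDequeue`, and the `Partition` helper, the `Run` method and the partition size of 256 are made up for the example. I’m using partitions smaller than “documents / cores” so that a worker that finishes early has something left to pull.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class WorkStealingSketch
{
    // Hypothetical helper: split an array into segments of at most `size` items.
    public static ConcurrentQueue<ArraySegment<T>> Partition<T>(T[] items, int size)
    {
        var queue = new ConcurrentQueue<ArraySegment<T>>();
        for (var offset = 0; offset < items.Length; offset += size)
            queue.Enqueue(new ArraySegment<T>(items, offset, Math.Min(size, items.Length - offset)));
        return queue;
    }

    // Yields the first segment, then keeps pulling segments until the queue is empty.
    private static IEnumerable<T> Pull<T>(ArraySegment<T> first, ConcurrentQueue<ArraySegment<T>> partitions)
    {
        var current = first;
        do
        {
            for (var i = 0; i < current.Count; i++)
                yield return current.Array[current.Offset + i];
        } while (partitions.TryDequeue(out current));
    }

    // Schedules `workers` tasks; returns how many of them actually invoked indexFunc.
    public static int Run<T>(T[] docs, int workers, int partitionSize, Action<IEnumerable<T>> indexFunc)
    {
        var partitions = Partition(docs, partitionSize);
        var invocations = 0;
        var tasks = new Task[workers];
        for (var i = 0; i < workers; i++)
        {
            tasks[i] = Task.Run(() =>
            {
                ArraySegment<T> first;
                if (partitions.TryDequeue(out first) == false)
                    return; // all the work is already taken: exit early, no costs

                Interlocked.Increment(ref invocations);
                indexFunc(Pull(first, partitions)); // one startup / shutdown per worker
            });
        }
        Task.WaitAll(tasks);
        return invocations;
    }

    public static void Main()
    {
        var docs = Enumerable.Range(0, 15000).ToArray();
        long processed = 0;
        var invocations = Run(docs, Environment.ProcessorCount, 256, batch =>
        {
            // the startup cost would be paid here
            long local = 0;
            foreach (var doc in batch)
                local++;
            Interlocked.Add(ref processed, local);
            // the shutdown cost (writing to the index) would be paid here
        });
        Console.WriteLine(processed + " documents, " + invocations + " IndexFunc invocations");
    }
}
```

Every document is handed to exactly one worker, and the number of `indexFunc` invocations is never more than the number of workers, no matter how many partitions there are.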
