Index Prioritization in RavenDB: Problems
Every now and then we get a request for index prioritization in RavenDB. The requests are usually in the form of:
I have an index (or a few indexes) that are very important, and I would like them to be update before any other indexes.
That is really nice request, but it ignores a lot of actual really hard implementation details.
In any prioritization scheme, there is the risk of starvation. If index A has to complete before index B, what happens if we have enough writes that index A is always busy? That means that index B will never get to index, and it will fall further & further behind. There are well known algorithms to handle this scenario, mostly from the OS thread scheduling point of view.
You could incrementally increase the priority of an index every time that you skipped updating it, until at some point it has higher priority than all the other indexes and gets its moment in the sun. That is workable if all you are working with are threads, and there isn’t a significantly different execution environment for a thread to run.
For RavenDB indexes, there is actually a major difference in the execution environment depending on when you are running. We have a lot of optimizations inside RavenDB to avoid IO, in particular, we do a lot of work so indexes do not have to wait for their input, we do parallel IO, optimized insert hooks, and a whole bunch of stuff like that. All of those assume that you all of the indexes are going to run together, however.
We already have the feature that a very slow index will be allowed to run while the rest of the indexes are keeping up, but that is something that we really try to avoid (we give it a grace period of 3/4 as much time as all of the other indexes combined). That is because the moment you have out of sync indexes, all that hard work is basically going to be wasted. You are going to be needing to load the documents to be indexed multiple times, creating more load on the server. Keeping the documents that were already indexed waiting in memory for the low priority index to work on is also not a good idea, since that is going to cause RavenDB to consume potentially a LOT more memory.
I have been thinking about this for a while, but it isn’t an easy decision. What do you think?
The issue of starvation makes a lot of sense to me. I don't want a low priority index to get neglected forever.
When you say a LOT more memory, what are we talking about here? I think people would take the trade off if they understood the risks and were willing to pay for the memory. I've seen people say they have 20 gigs of memory or higher on some of their ravendb servers.
It definitely is a hard problem, but if it was easy then everyone would be doing it :).
Khalid, LOT more memory means that we don't know when to release memory. If you have three low prio indexes, all of them indexing at different rates, how do you tell when to release memory held for them. Let us assume that they are 20K, 40K, 60K documents behind the std indexes. That means that we have to keep 60K documents in memory (and remember that we still get new writes) to allow for in memory indexing without additional IO. It also means that those indexes are going to get bigger and bigger batches, which impact CPU & IO as well.
So what if you had three lanes of documents in memory. High, Normal, Low. You would duplicate the document into each lane, and let the indexes consume those documents in that particular lane.
Low might remain in memory for a bit, but you could clear out normal and high as you process them assuming they process as quickly as they do now or even more quickly.
At most you would peak at 3 times as much memory as you use now.
What do you think?
Khalid, What happens when you have 3 low indexes, running at different speed? Could you /should you run high & low in parallel? What happens under low mem conditions?
Those are tough questions.
Different speed / same priority Is this much different then how indexes work today? All the indexes now are the same priority, but more often than not, one index finishes before the other (unless the studio is lying to me).
Parallel : Yes you would run them in parallel, because each priority would have a separate set of documents. So they could run in parallel without influencing each of the other sets. IO and Memory is another issue.
Low memory conditions This is probably a implementation decision, if you wanted to drop all the documents from the low index queue and reload them into memory later you could do that. You could also have Low priority indexes limited by IO. Meaning, you read them from disk sequentially and put the pressure on CPU and IO (I don't like that as I type it).
Interesting issue. Looks like you are quite often trying to do operating system's job in scheduling, caching, I/O and memory management. Maybe it would be a good idea to let the OS handle these issues? For example, you could make indexing an external process (one process per index) that would run in its own pace. The db files are probably already cached by the OS so this wouldn't hurt the IO performance very much.
Isn't this similar to the idea of priority queues on a service bus?
My understanding was that is a generally not a good idea. Instead of prioritizing you could setup a dedicated service bus for the high priority workflows. could the same approach be used for indexing?
replicate data to another instance and have that instance manage specific index(es). Priority is then implicitly defined because the ravendb instance is only responsible for that index (or set of indexes).
Forgive me for being simple about this but if the question is how to support priority indexing for a small number of very important indexes without impacting other behavior in the index then the answer seems simple to me - move the high priority index/queries/data to their own instances of Raven.
If there is that much activity and the consistency of the index is that important (i.e. worth spending $$$) than why not give features that are so important their own db? If that is too expensive then maybe the consistency of the index isn't as important as thought, or maybe the data doesn't need to be written as often, or a bulk operation done less frequently would suffice?
Granted, it isn't as sexy as solving the race condition but . . .
What about injecting some logic at write time that defers indexing of a low priority index when a new document is added? I'm not familiar with the internals, but this would be similar to pretending the document didn't exist (for specific indexes) until a later point when either the system has capacity or a dynamically expired period expires (it couldn't be static as you'd run into the same problem, just a but later). At this point you simulate the write with metadata indicating which indexes are already aware of the changes. For this "write", the indexing subsystem executes skipping the high priority, already updated indexes.
As I said I'm not sure this approach makes sense given the optimizations internal to RavenDB. It just allows one to throttle the indexing events for writes without (I'm guessing) fiddling too much with internal optimisation.
Rafal, How would using multiple processes help any? Sure, I could let the OS handle scheduling, but it doesn't really change anything in terms of the other constraints that I've got. It really complicates memory management, because now I have to deal with multiple processes and manage all of them (remember, if we start paging, perf drop like a rock), so we can't let the OS do that. It doesn't help for I/O, since I would either need to duplicate the I/O or share it among multiple processes, which is a decidedly non trivial task.
Jason, Pretty much, yes. That is the current recommendation.
Gregory, Sure, that is what you would want to do, but then you get into situations about who is the master, what about conflicts, data aliasing issues, etc.
Rohland, We actually came up with something similar to that idea, keep an eye on the queue, it will show up in a few days.
@Ayende I'm talking about letting the OS manage memory and you are afraid that you would have to manage all that memory yourself - this is quite opposite from what I was trying to say. Unix systems are quite good at sharing memory pages between processes, so maybe Windows can do the same - then you would avoid the penalty of duplicated cache and I/O. And besides, its much easier to terminate a worker process and return all resources to the OS than trying to garbage-collect unused CLR objects from a running application.
... but after giving it a second thought, I don't think this would work well enough, there is too much concurrent writing going on and it needs to be managed quite carefully
I guess I explained it poorly because it looks to me that Jason and I made the same suggestion. (great minds thinking alike and such) You liked his suggestion but had issues with mine . . . need to work on my language skills.
Rafal, Sure, Windows has the idea of shared memory. But it is non trivial to handle that properly. Especially if you want to share that to multiple threads. And you can't really share .NET objects properly.
Yeah, it's quite unfortunate that Json objects need to be represented in two distinct forms - a serialized block of bytes in the storage and parsed in-memory document representation with all mem overhead, fragmentation and caching/sharing problems.