RavenDB 3.5 whirl wind tour: Digging deep into the internals
So far I have talked mostly about the visible parts of what we did in RavenDB 3.5, the stuff that has a user interface and is easy to talk about. In this post, I'm going to dive a bit into the stuff that goes on in the core, which no one usually notices except us, except when it breaks.
RavenDB Lucene Parser
A frequent cause for complaints about RavenDB is the fact that the Lucene Query Parser relies on exceptions for control flow. That means that if you are debugging a test that uses RavenDB, you are likely to be stopped by a LookAheadSuccessException. This is handled internally, but the default Visual Studio configuration will stop on all thrown exceptions, which has caused more than one person to assume that there is an actual error and post a question to the mailing list.
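To make the problem concrete, here is a minimal illustration of the exception-for-control-flow pattern. This is toy code, not Lucene's actual implementation: the lookahead "succeeds" by throwing a sentinel exception that the parser immediately catches, so a debugger set to break on all thrown exceptions stops on every query even though nothing is wrong.

```csharp
// Illustrative only - not Lucene's actual code. The lookahead signals success by
// throwing a sentinel exception, which is caught right away. The parse works fine,
// but the debugger sees a first-chance exception on every query.
using System;

public class LookAheadSuccessException : Exception { }

public class ToyParser
{
    private readonly string _input;
    public ToyParser(string input) { _input = input; }

    private bool LookAhead(Func<bool> rule)
    {
        try
        {
            if (rule())
                throw new LookAheadSuccessException(); // control flow, not an error
            return false;
        }
        catch (LookAheadSuccessException)
        {
            return true; // the "success" path is reached via an exception
        }
    }

    public bool StartsWithField() => LookAhead(() => _input.Contains(":"));
}

public static class Demo
{
    public static void Main()
    {
        var parser = new ToyParser("Name:oren");
        Console.WriteLine(parser.StartsWithField()); // True, after a thrown-and-caught exception
    }
}
```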
But the reason we decided to implement our own parser wasn't the issue of exceptions. It was performance and stability. RavenDB doesn't actually use the Lucene syntax for queries, we have extended it in several ways (for example, the @in&lt;Group&gt;: (Relatives ,Friends) syntax). Those extensions to the syntax were implemented primarily as pre and post processing over the raw query string using regular expressions. And you know the saying about that. Under profiling, it turned out that a significant amount of time was spent in this processing, and in particular, in those regexes.
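Here is a hypothetical sketch of what that kind of regex pre-processing looks like, rewriting the @in&lt;Field&gt;: (a, b) extension into plain OR clauses before the string reaches the stock parser. The pattern and the rewrite rule are illustrative, not RavenDB's actual code:

```csharp
// Hypothetical sketch of regex-based pre-processing over the raw query string.
// Not RavenDB's actual implementation.
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class InClauseRewriter
{
    private static readonly Regex InClause = new Regex(
        @"@in<(?<field>\w+)>:\s*\((?<values>[^)]*)\)",
        RegexOptions.Compiled);

    public static string Rewrite(string query)
    {
        return InClause.Replace(query, match =>
        {
            var field = match.Groups["field"].Value;
            var values = match.Groups["values"].Value
                .Split(',')
                .Select(v => v.Trim())
                .Where(v => v.Length > 0);

            // "@in<Group>: (Relatives ,Friends)"  =>  "(Group:Relatives OR Group:Friends)"
            return "(" + string.Join(" OR ", values.Select(v => field + ":" + v)) + ")";
        });
    }
}
```

Run every query through a stack of replacements like this and the cost adds up, which is exactly what the profiler showed.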
So we wrote our own parser. That gives us an extremely efficient parser, no exceptions during parsing, and a formal grammar that we can stick to. If you care, you can read the full grammar here.
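The difference in approach can be shown with a tiny sketch, assuming a made-up FieldQuery type. A hand-rolled parser reports failure through return values rather than exceptions, and can handle the extended syntax in a single pass over the input instead of layering regexes on top of it:

```csharp
// A minimal sketch of the exception-free style, not RavenDB's actual grammar or types.
public class FieldQuery
{
    public string Field;
    public string Term;
}

public static class SimpleQueryParser
{
    // Parses "Field:Term" without ever throwing for control flow.
    public static bool TryParseFieldQuery(string input, out FieldQuery query)
    {
        query = null;
        var colon = input.IndexOf(':');
        if (colon <= 0 || colon == input.Length - 1)
            return false; // malformed - signalled by a return value, not an exception

        query = new FieldQuery
        {
            Field = input.Substring(0, colon).Trim(),
            Term = input.Substring(colon + 1).Trim()
        };
        return true;
    }
}
```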
Explicit thread management for indexing
In RavenDB 3.0, we relied on the standard .NET ThreadPool for index execution. This has led to some interesting issues related to thread starvation, especially when you have many concurrent requests that take up threads. The fact that the .NET ThreadPool has a staggered growth pattern also has an impact here, in terms of how well we can actually scale out indexing work.
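A small stand-alone experiment (not RavenDB code) makes the staggered growth visible: once all of the pool's minimum worker threads are blocked, additional work items have to wait while new worker threads are injected only gradually.

```csharp
// Demonstrates the ThreadPool's staggered growth. Queue more blocking items than the
// minimum worker thread count and watch the later items start noticeably later.
using System;
using System.Diagnostics;
using System.Threading;

public static class ThreadPoolGrowthDemo
{
    public static void Main()
    {
        ThreadPool.GetMinThreads(out var minWorkers, out _);
        Console.WriteLine($"Min worker threads: {minWorkers}");

        var stopwatch = Stopwatch.StartNew();
        using var done = new CountdownEvent(minWorkers + 8);

        for (int i = 0; i < minWorkers + 8; i++)
        {
            int id = i;
            ThreadPool.QueueUserWorkItem(_ =>
            {
                Console.WriteLine($"item {id} started at {stopwatch.ElapsedMilliseconds}ms");
                Thread.Sleep(2000); // simulate long, blocking indexing work
                done.Signal();
            });
        }

        done.Wait(); // the pool grows in steps, so the tail items wait a long time
    }
}
```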
By creating our own thread pool, dedicated to our own stuff, we are able to do things that you can't do with the global thread pool. For example, we can respond to CPU pressure by reducing the priority of the indexing thread pool threads, so we'll prefer to process requests over doing background work. We also get more predictable behavior around indexing batches, and can abandon an index midway through a batch to ensure liveness for the entire indexing process.
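A minimal sketch of the idea, assuming a hypothetical IndexingThreadPool type, is shown below. Because we own the threads, lowering their priority under CPU pressure is a one-line operation, something the global ThreadPool simply doesn't expose:

```csharp
// A minimal sketch, not RavenDB's implementation: a dedicated pool of indexing threads
// whose priority can be lowered so request-processing threads win the scheduler.
using System;
using System.Collections.Concurrent;
using System.Threading;

public class IndexingThreadPool : IDisposable
{
    private readonly BlockingCollection<Action> _work = new BlockingCollection<Action>();
    private readonly Thread[] _threads;

    public IndexingThreadPool(int threadCount)
    {
        _threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            _threads[i] = new Thread(Worker)
            {
                IsBackground = true,
                Name = "Indexing #" + i
            };
            _threads[i].Start();
        }
    }

    public void Queue(Action indexingWork) => _work.Add(indexingWork);

    // Called when CPU pressure is detected: requests keep ThreadPriority.Normal,
    // indexing drops below them and effectively yields the CPU.
    public void ReducePriority()
    {
        foreach (var thread in _threads)
            thread.Priority = ThreadPriority.BelowNormal;
    }

    public void RestorePriority()
    {
        foreach (var thread in _threads)
            thread.Priority = ThreadPriority.Normal;
    }

    private void Worker()
    {
        foreach (var action in _work.GetConsumingEnumerable())
            action();
    }

    public void Dispose() => _work.CompleteAdding();
}
```

In the real thing, the triggers for lowering priority and the batch management are obviously more involved; the point is that owning the threads is what makes those knobs available at all.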
And, almost as important, the fact that we have our own thread pool for indexing means that we can now much more easily report on it and monitor it. Which makes our lives much easier in production.
As a reminder, we have the RavenDB Conference in Texas in a few months, which would be an excellent opportunity to see RavenDB 3.5 in all its glory.
More posts in "RavenDB 3.5 whirl wind tour" series:
- (25 May 2016) Got anything to declare, ya smuggler?
- (23 May 2016) I'm no longer conflicted about this
- (19 May 2016) What did you subscribe to again?
- (17 May 2016) See here, I got a contract, I say!
- (13 May 2016) Deeper insights to indexing
- (11 May 2016) Digging deep into the internals
- (09 May 2016) I'll have the 3+1 goodies to go, please
- (04 May 2016) I’ll find who is taking my I/O bandwidth and they SHALL pay
- (02 May 2016) You want all the data, you can’t handle all the data
- (29 Apr 2016) A large cluster goes into a bar and order N^2 drinks
- (27 Apr 2016) I’m the admin, and I got the POWER
- (25 Apr 2016) Can you spare me a server?
- (21 Apr 2016) Configuring once is best done after testing twice
- (19 Apr 2016) Is this a cluster in your pocket AND you are happy to see me?
Comments
I would dispute the value of reducing indexing thread priority. Indexing might be cache-hungry, and pushing it off and on the scheduler may end up reducing the overall throughput.
If your high-request-intensity spike is not very short, indexing will still proceed to completion in background, but at much higher total cost.
You might want to consider pausing indexing activity instead of spreading it thinner.
Mihailik, By reducing the priority, as long as there is enough activity on the system to keep requests processing, indexing will effectively be paused. If there is spare capacity to run some stuff, then the OS is already going to schedule the indexing work on the same set of cores anyway. And we don't want to play with this too much, because a high spike that pauses indexing needs something to resume it, and if you have a spike every 5 seconds, that could keep indexing paused forever.