The design of RavenDB 4.0Making Lucene reliable
I don’t like Lucene. It is an external dependency that works in somewhat funny ways, and the version we use is a relatively old one that has been mostly ported as-is from Java. This leads to some design decisions that are questionable (for example, using exceptions for control flow in parsing queries), or just awkward (by default, an error in merging segments will kill your entire process). Getting Lucene to run properly in production takes quite a bit of work and effort. So I don’t like Lucene.
We have spiked various alternatives to Lucene multiple times, but it is a hard problem, and most solutions that we look at lead toward pretty much the same approach that Lucene does it.By now, we have been working with Lucene for over eight years, so we have gotten good in managing it, but there are still quite a bit of code in RavenDB that is decided to managing Lucene’s state, figuring out how to recover in case of errors, etc.
Just off the top of my head, we have code to recover from aborted indexing, background processes that takes regular backups of the indexes, so we’ll be able to restore them in the case of an error, etc. At some point we had a lab of machines that were dedicated to testing that our code was able to manage Lucene properly in the presence of hard resets. We got it working, eventually, but it was hard. And we still get issues from users that into trouble because Lucene can tie itself into knots (for example, a disk full error midway through indexing can corrupt your index and require us to reset it). And that is leaving aside the joy of I/O re-ordering does to you when you need to ensure reliability.
So the problem isn’t with Lucene itself, the problem is that it isn’t reliable. That led us to the Lucene persistence format. While Lucene persistent mode is technically pluggable, in practice, this isn’t really possible. The file format and the way it works are very closely tied to the idea of files. Actually, the idea of process data as a stream of bytes. At some point, we thought that it would be good to implement a Transactional NTFS Lucene Directory, but that idea isn’t really viable, since that is going away.
It was at this point that we realized that we were barking at the entirely wrong tree. We already have the technology in place to make Lucene reliable: Voron!
Voron is a low level storage engine that offers ACID transactions. All we need to do is develop VoronLuceneDirectory, and that should handle the reliability part of the equation. There are a couple of details that needs to be handled, in particular, Voron needs to know, upfront, how much data you want to write, and a single value in Voron is limited to 2GB. But that is fairly easily done. We write to a temporary file from Lucene, until it tells us to commit. At which point we can write it to Voron directly (potentially breaking it to multiple values if needed).
Voila, we have got ourselves a reliable mechanism for storing Lucene’s data. And we can do all of that in a single atomic transaction.
When reading the data, we can skip all of the hard work and file I/O and serve it directly from Voron’s memory map. And having everything inside a single Voron file means that we can skip doing things like the compound file format Lucene is using, and chose a more optimal approach.
And with a reliable way to handle indexing, quite large swaths of code can just go away. We can now safely assume that indexes are consistent, so we don’t need to have a lot of checks on that, startup verifications, recovery modes, online backups, etc.
Improvement by omission indeed.