Awesome RavenDB feature of the Day, Compression
RavenDB stores JSON documents. Internally on disk, we actually store the values in BSON format. This works great, but there are occasions where users are storing large documents in RavenDB.
In those cases, we have found that compressing those documents can drastically reduce the on-disk size of the documents.
Before we go on, we have to explain what this is for. It isn’t actually disk space that we are trying to save, although that is a nice benefit. What we are actually trying to do is reduce the IO cost we incur when loading / saving documents. By compressing the documents before they hit the disk, we save valuable IO time (at the expense of relatively bountiful CPU time). Reducing the amount of IO we use has a nice impact on performance, and it means that we can fit more documents in the page cache without running out of room.
And yes, it does reduce the total disk size, but the major thing is the IO cost.
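Just to give a rough feel for the numbers, here is a small sketch. This is not RavenDB’s actual code, and gzip here is purely a stand-in for whatever the bundle really uses; the sample document is made up for illustration.

import gzip
import json

# A hypothetical "large" document, roughly in the spirit of what users store in RavenDB.
doc = json.dumps({
    "Name": "Order/1",
    "Lines": [{"Product": "products/%d" % i, "Quantity": i, "Notes": "repeat " * 20}
              for i in range(200)],
}).encode("utf-8")

compressed = gzip.compress(doc)

print("original:   %d bytes" % len(doc))
print("compressed: %d bytes (%.0f%% of the original)"
      % (len(compressed), 100.0 * len(compressed) / len(doc)))

With repetitive documents like this, the payload can shrink to a fraction of its original size, and that translates directly into fewer pages to read and write.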
Note that we only support compression for documents, not for indexes. The reason for that is quite simple: for indexes, we do a lot of random reads, whereas with documents, we almost always read or write the whole thing.
Because of that, we would have needed to break the index apart into manageable chunks (and thus allow random reads), but that would pretty much ensure a poor compression ratio. We ran some tests, and it just wasn’t worth the effort.
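If you want to see why chunking hurts the ratio, here is a similar sketch (again with gzip standing in purely for illustration) that compresses the same data once as a whole and once in small independent chunks, the way an index would have to be broken apart to allow random reads:

import gzip
import json

# The same kind of repetitive JSON data, used purely for illustration.
data = json.dumps([{"Product": "products/%d" % i, "Notes": "repeat " * 20}
                   for i in range(500)]).encode("utf-8")

# Compress the whole blob at once, the way a document is handled.
whole = len(gzip.compress(data))

# Compress independent 4 KB chunks, so any single chunk could be read on its own.
chunk_size = 4 * 1024
chunked = sum(len(gzip.compress(data[i:i + chunk_size]))
              for i in range(0, len(data), chunk_size))

print("whole blob:  %d bytes compressed" % whole)
print("4 KB chunks: %d bytes compressed" % chunked)

Each chunk pays its own header overhead and starts with an empty dictionary, so the chunked version comes out noticeably larger than compressing the whole thing in one go.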
A final thought: this feature is going to be available for RavenDB Enterprise only.
I am not showing any code because the only thing you need to do to get it to work is use:
<add key="Raven/ActiveBundles" value="Compression"/>
And everything works, just a little bit smaller on disk.
Comments
Just a random idea: what about using the NTFS feature to compress the complete raven-data-directory?
Considering how small documents tend to be, maybe word list compression with a shared dictionary would be something to consider.
Just an idea. Drawbacks are obvious.
Very cool.
What might be useful is the ability to selectively choose what type/size of document takes advantage of compression. For very small documents compression/decompression might just be a waste. Do you have a minimum threshold for compression? Is it configurable?
Fabian, That is a separate issue, and happens at a different layer, with a different perf profile.
Starfish, That is a bit complex to implement, and we usually see docs in the tens of KB range and up.
Rohland, Those things are handled internally.