Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969


Awesome RavenDB feature of the Day, Compression

time to read 2 min | 387 words

RavenDB stores JSON documents. Internally on disk, we actually store the values in BSON format. This works great, but there are occasions where users are storing large documents in RavenDB.

In those cases, we have found that compressing those documents can drastically reduce the on-disk size of the documents.

Before we go on, we have to explain what this is for. It isn’t actually disk space that we are trying to save, although that is a nice benefit. What we are actually trying to do is reduce the IO cost we incur when loading / saving documents. By compressing the documents before they hit the disk, we can save valuable IO time (at the expense of using relatively bountiful CPU time). Reducing the amount of IO we use has a nice impact on performance, and it means that we can fit more documents in our page cache without running out of room.

And yes, it does reduce the total disk size, but the major thing is the IO cost.
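To give a feel for the tradeoff, here is a small standalone sketch (my own made-up example, not RavenDB's internal code) that gzips a repetitive JSON document and compares the sizes. The document shape and numbers are invented for the demo; the point is simply how many fewer bytes end up hitting the disk:

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

class CompressionSketch
{
    static void Main()
    {
        // Build a fairly large, repetitive JSON document, the kind that
        // benefits most from compression (the shape is made up for this demo).
        var json = new StringBuilder("{\"Lines\":[");
        for (var i = 0; i < 1000; i++)
        {
            if (i > 0) json.Append(',');
            json.Append("{\"Product\":\"Widget\",\"Quantity\":3,\"Price\":9.99}");
        }
        json.Append("]}");

        var raw = Encoding.UTF8.GetBytes(json.ToString());

        byte[] compressed;
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
                gzip.Write(raw, 0, raw.Length);
            compressed = output.ToArray();
        }

        // Fewer bytes hitting the disk means fewer IO operations, and more
        // documents fit in the page cache.
        Console.WriteLine("Raw:        {0:N0} bytes", raw.Length);
        Console.WriteLine("Compressed: {0:N0} bytes", compressed.Length);
    }
}

On repetitive JSON like this, the compressed output is far smaller than the raw bytes, and that difference is paid for with CPU time we usually have to spare.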

Note that we only support compression for documents, not for indexes. The reason for that is quite simple: for indexes, we are doing a lot of random reads, whereas with documents, we almost always read or write the full thing.

Because of that, we would have needed to break the index apart into manageable chunks (and thus allow random reads), but that would pretty much ensure a poor compression ratio. We ran some tests, and it just wasn’t worth the effort.
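To illustrate why chunking hurts, here is another small sketch (again my own illustration, not our actual test code) that compresses the same buffer once as a whole and once in independent 4KB chunks. Every chunk restarts the compression dictionary and pays the gzip header overhead again, so the chunked total comes out noticeably larger:

using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

class ChunkedCompressionSketch
{
    static byte[] Gzip(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
                gzip.Write(data, 0, data.Length);
            return output.ToArray();
        }
    }

    static void Main()
    {
        // Repetitive data standing in for an index file (made up for this demo).
        var data = Encoding.UTF8.GetBytes(
            string.Concat(Enumerable.Repeat("users/1234 Oren Eini ravendb compression ", 10000)));

        // Compressing the whole thing at once gets the best ratio...
        var whole = Gzip(data).Length;

        // ... but to keep supporting random reads we would have to compress
        // small chunks independently, and each chunk rebuilds its dictionary
        // from scratch and pays the header overhead again.
        var chunked = 0;
        for (var offset = 0; offset < data.Length; offset += 4096)
        {
            var size = Math.Min(4096, data.Length - offset);
            var chunk = new byte[size];
            Buffer.BlockCopy(data, offset, chunk, 0, size);
            chunked += Gzip(chunk).Length;
        }

        Console.WriteLine("Whole file: {0:N0} bytes", whole);
        Console.WriteLine("4KB chunks: {0:N0} bytes", chunked);
    }
}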

A final thought: this feature is going to be available for RavenDB Enterprise only.

I am not showing any code because the only thing you need to do to get it to work is to use:

<add key="Raven/ActiveBundles" value="Compression"/>

And everything works, just a little bit smaller on disk. :-)
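For completeness, here is what the client side looks like, just to show that nothing changes there. This is a minimal sketch using the standard RavenDB .NET client; the Order class and the URL are made up for the example:

using Raven.Client.Document;

class Program
{
    // A made-up document class, just for the example.
    class Order
    {
        public string Customer { get; set; }
        public decimal Total { get; set; }
    }

    static void Main()
    {
        using (var store = new DocumentStore { Url = "http://localhost:8080" }.Initialize())
        using (var session = store.OpenSession())
        {
            // Nothing compression-specific here; the bundle does its work on
            // the server when the document is written to disk.
            session.Store(new Order { Customer = "customers/1", Total = 42m });
            session.SaveChanges();
        }
    }
}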


Comments

Fabian Wetzel

Just a random idea: what about using the NTFS feature to compress the complete raven-data-directory?

Starfish

Considering how small documents tend to be, maybe word list compression with shared dictionary would be something to consider.

Just an idea. Drawbacks are obvious.

Rohland

Very cool.

What might be useful is the ability to selectively choose what type/size of document takes advantage of compression. For very small documents compression/decompression might just be a waste. Do you have a minimum threshold for compression? Is it configurable?

Ayende Rahien

Fabian, That is a separate issue, and it happens at a different layer, with a different perf profile.

Ayende Rahien

Starfish, That is a bit complex to implement, and we usually see docs in the tens of KB range and up.

Ayende Rahien

Rohland, Those things are handled internally.


Comments have been closed on this topic.
