Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

Some thoughts about compression & storage


One of the advantages that keeps showing up with leveldb is the notion that it compresses the data on disk by default. Since reading data from disk is way more expensive than the CPU cost of compression & decompression, that is a net benefit.

Or is it? In the managed implementation we are currently working on, we chose to avoid this for now, for a very simple reason. By storing the compressed data on disk, you cannot just give a user the memory mapped buffer and be done with it; you actually have to decompress the data yourself, then hand the user a buffer of the decompressed memory. In other words, instead of having a single read only buffer that the OS will manage / page / optimize for you, you are going to have to allocate memory over & over again, and you'll pay the cost of decompressing again and again.
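A minimal sketch of the contrast (not RavenDB's actual API, just an illustration in Python): an uncompressed file can be handed out as a read-only memory-mapped view that the OS pages and caches, while compressed storage forces a fresh allocation plus CPU work on every read.

```python
import gzip
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"some stored value")

# Uncompressed storage: the OS pages the file in and can cache it;
# the user gets a zero-copy read-only view of the mapped memory.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        view = memoryview(mm)
        assert bytes(view) == b"some stored value"
        view.release()
os.remove(path)

# Compressed storage: every read must allocate a new buffer and
# spend CPU decompressing before the user can see the data.
compressed = gzip.compress(b"some stored value")
buf = gzip.decompress(compressed)   # new allocation, every single read
assert buf == b"some stored value"
```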

I think that it would be better to have the client make that decision. They can send us data that is already compressed, so we won't need to do anything else, and we would still be able to just hand them a buffer of data. Sure, it sounds like we are just moving the cost around, doesn't it? But what actually happens is that you have a better chance to do optimizations. For example, if I am storing the data compressed via gzip and exposing it over the wire, I can just stream the results from the storage directly to the HTTP stream, without having to do anything about it. It can be decompressed on the client.
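The streaming idea above can be sketched like so (a hedged illustration, not the actual server code; the `stored` and `response` names are hypothetical stand-ins for the storage buffer and the HTTP response stream): because the bytes on disk are already gzip, the server copies them verbatim and only the client pays for decompression.

```python
import gzip
import io
import shutil

# As-stored bytes: the client already handed us gzip-compressed data.
stored = io.BytesIO(gzip.compress(b'{"name": "ayende"}'))

# Stand-in for the HTTP response stream. In a real server we would also
# send a "Content-Encoding: gzip" header so the client knows to decompress.
response = io.BytesIO()
shutil.copyfileobj(stored, response)   # raw pass-through, no CPU spent

# The client decompresses on its end:
body = gzip.decompress(response.getvalue())
assert body == b'{"name": "ayende"}'
```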

On the other hand, if I have storage level decompression, I am paying the cost of reading the compressed data from disk, then allocating a new buffer, decompressing the data, then going right ahead and compressing it again for sending over the wire.



This is good thinking, Oren.

Catalin Pop

What about compression at the file system level? How will that affect performance? You can still use memory mapped files yet also get compression. (Of course you lose the ability to stream compressed documents.)

http://en.wikipedia.org/wiki/DriveSpace ;)

Ayende Rahien

Catalin, I am not sure here; I am pretty sure this is something that would require a lot of perf testing.


Hopefully all the clients (and versions) would be using the same compression type/level/etc.

Ayende Rahien

Chris, Why? Each client can choose their own compression algorithm.


Sure, they can. Would they be able to read data stored by other clients in compressed format if their compression format choice is different?

Ayende Rahien

Chris, You seem to be thinking about this in the form of a DB that is available for many clients. Currently I am talking about this in the context of an embedded library that you use in your project. There is one client there, the code that is using it, and it is fit for purpose.

Daniel Lang

Interesting. How would you handle indexing then?

Ayende Rahien

Daniel, We do compression in RavenDB above the storage layer right now.

Daniel Lang

I know, that's why I'm confused about that. If the compression is to be done on the client, how could the server index documents if it's not able to read the content? Obviously, I must have missed something, right?

Ayende Rahien

Daniel, We are talking about different clients. I am talking about the client of the storage layer, you are talking about the db client.

Daniel Lang

Oh, thanks. Now it makes sense.

Michael R. Schmidt

Another possible advantage to compression on the client is that the data is compressed over the wire. Smaller data transfer over the network.
