Dealing with large documents (100+ MB)

RavenDB can handle large documents. There is a limit to the size of a document in RavenDB, but at 2 GB it isn’t a practical one. I had to go and check whether the limit was 2 GB or 4 GB, because in practice it doesn’t matter.

That said, having large documents is something that you should be wary of. They work, but they have very high costs.

I ran some benchmarks on this a while ago, and the results are interesting. Let’s consider a 100 MB document. Parsing time for that should be around 4 to 5 seconds. That ignores the fact that there are also memory costs. For example, you can have a JSON document that is parsed to 50(!) times the size of the raw text. That is 5 GB of memory to handle a single 100 MB document. And that is just the parsing cost; there are others. Reading 100 MB from most disks will take about a second, assuming that the data is sequential. Assuming you have a 1 Gbit/s network, all of which is dedicated to this single document, you can push it over the network in 800 ms or so.
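
To make the arithmetic above easy to replay with your own numbers, here is a minimal back-of-the-envelope sketch. The function name and parameters are mine, and the 100 MB/s disk throughput is an assumption derived from "about a second for 100 MB". Treat the output as rough estimates, not benchmark results.

```python
def estimate_costs(doc_mb, expansion=50, disk_mb_per_s=100, network_gbit_per_s=1):
    """Rough costs of handling a single document of doc_mb megabytes.

    expansion:          assumed worst-case ratio of parsed size to raw JSON text
    disk_mb_per_s:      assumed sequential read throughput
    network_gbit_per_s: link speed, fully dedicated to this one document
    """
    doc_bits = doc_mb * 8 * 1_000_000
    return {
        "parsed_memory_gb": doc_mb * expansion / 1000,
        "disk_read_s": doc_mb / disk_mb_per_s,
        "network_ms": doc_bits * 1000 / (network_gbit_per_s * 1_000_000_000),
    }


costs = estimate_costs(100)
for name, value in costs.items():
    print(f"{name}: {value}")
```

With the defaults, a 100 MB document works out to 5 GB of parsed memory, 1 second of disk time, and 800 ms on the wire, matching the figures above.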

Dealing with such documents is hard and awkward. If you accidentally issue a query on a bunch of those documents and get a page of 25 of them, you just got a query result that is 2.5 GB in size. With documents of this size, you are also likely to want to modify multiple pieces at the same time, so you’ll need to be very careful about concurrency control as well.
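
To illustrate the concurrency point, here is a small sketch of optimistic concurrency using a hypothetical in-memory store. `VersionedStore` and `ConflictError` are invented for this example and are not RavenDB's API. The idea is only to show that when everything lives in one document, two writers touching unrelated pieces still collide.

```python
class ConflictError(Exception):
    """Raised when a document changed since the caller loaded it."""


class VersionedStore:
    """Hypothetical in-memory store used for illustration only."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def load(self, doc_id):
        version, body = self._docs[doc_id]
        return version, dict(body)  # shallow copy so each caller edits its own view

    def save(self, doc_id, body, expected_version):
        current_version = self._docs.get(doc_id, (0, None))[0]
        if current_version != expected_version:
            raise ConflictError(
                f"{doc_id}: expected v{expected_version}, found v{current_version}"
            )
        self._docs[doc_id] = (current_version + 1, body)
        return current_version + 1


store = VersionedStore()
store.save("orders/1", {"shipping": "pending", "billing": "pending"}, expected_version=0)

# Two writers load the same version and touch unrelated fields.
version_a, doc_a = store.load("orders/1")
version_b, doc_b = store.load("orders/1")
doc_a["shipping"] = "sent"
doc_b["billing"] = "paid"

store.save("orders/1", doc_a, expected_version=version_a)  # first writer wins
try:
    store.save("orders/1", doc_b, expected_version=version_b)
    conflict_detected = False
except ConflictError:
    conflict_detected = True  # second writer has to reload and retry
```

The second writer only changed the billing field, but it still has to reload the whole document and try again. With a 100 MB document, each retry pays the full load and parse cost once more.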

In general, at those sizes, you stop treating this as a simple document and move to a streaming approach, because anything else is too costly to make much sense.
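
As a rough sketch of what a streaming approach can look like, the snippet below reads one record at a time instead of parsing everything up front. It assumes the data is laid out as newline-delimited JSON, which is my assumption and not something the scenario guarantees. A single large JSON object would need an incremental parser instead. The `quantity` field and the function names are made up for the example.

```python
import io
import json


def iter_records(stream):
    """Yield one parsed record at a time from a newline-delimited JSON stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def total_quantity(stream):
    """Aggregate over the stream while holding only one record in memory."""
    return sum(record["quantity"] for record in iter_records(stream))


# A tiny in-memory stand-in for a large file or network stream.
sample = io.StringIO(
    '{"sku": "a", "quantity": 2}\n'
    '{"sku": "b", "quantity": 3}\n'
)
total = total_quantity(sample)
```

Memory use here is bounded by the largest single record, not by the size of the whole input.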

A better alternative is to split this document up into its component parts. You can then interact with each one of them independently.
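
Here is one way such a split might look, as a sketch only. It assumes the bulk of the document is a single list field, and the ID scheme (`parent/field/index`) is invented for the example. How you actually partition a document depends on your model.

```python
def split_document(doc_id, doc, collection_field):
    """Split one large document into a small root plus one document per item.

    Returns a dict mapping document IDs to document bodies. The root keeps
    only the IDs of its parts, and each part points back to its parent.
    """
    parts = {}
    child_ids = []
    for index, item in enumerate(doc[collection_field], start=1):
        child_id = f"{doc_id}/{collection_field}/{index}"
        parts[child_id] = dict(item, parent=doc_id)
        child_ids.append(child_id)

    root = {key: value for key, value in doc.items() if key != collection_field}
    root[collection_field] = child_ids
    parts[doc_id] = root
    return parts


big = {
    "customer": "customers/7",
    "lines": [{"sku": "a", "quantity": 2}, {"sku": "b", "quantity": 3}],
}
docs = split_document("orders/1", big, "lines")
```

After the split, loading or modifying a single line touches one small document, and two writers working on different lines no longer contend for the same document.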

It is the difference between driving an 18-wheeler and driving a family car. You can pack a whole lot more onto the 18-wheeler, but it gets pretty poor mileage and it is very awkward to park. You aren’t going to want to use it for going to the grocery store.