The importance of a data format: Part V – The end result

Jan 13 2016


So far I have written about the problem we had and the requirements for the solution, then did a deep dive into the actual implementation, and finally talked about the improvement in performance, which was a nice double-digit percentage gain. There are a couple of tricks there that I still want to talk about, but that is pretty much it.

Except that this actually misses the entire point of this exercise. What the blittable format gives us is immediate access to any property without the need to parse the whole thing. How important is that for us?

Well, for one specific scenario, that is actually quite important. Let us imagine that we have the following RavenDB index:

from c in docs.Companies
select new
{
	c.Name,
	c.Overview
}

Here are the results, counting purely the time to load about 18,000 documents and get the relevant data:

I removed all I/O from the benchmark, and I'm testing only the cost of loading documents already saved in RavenDB and getting those properties from them.

And that is what I'm talking about :)
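To make the idea of "immediate access to any property" concrete, here is a toy sketch in Python. It is not the blittable format itself (this post does not show its layout); it only assumes a made-up layout in which a small table of offsets sits in front of the data, so a reader can jump straight to one value. The names `encode` and `read_property`, and the flat string-only document, are assumptions made for illustration.

```python
import struct

HEADER = struct.Struct("<I")     # number of properties
ENTRY = struct.Struct("<IIII")   # name_offset, name_len, value_offset, value_len


def encode(doc):
    """Pack a flat dict of string values into the toy layout (hypothetical)."""
    offset = HEADER.size + ENTRY.size * len(doc)  # data area starts after the table
    table, data = [], []
    for key, val in doc.items():
        name, value = key.encode("utf-8"), val.encode("utf-8")
        table.append(ENTRY.pack(offset, len(name), offset + len(name), len(value)))
        data.append(name + value)
        offset += len(name) + len(value)
    return HEADER.pack(len(doc)) + b"".join(table) + b"".join(data)


def read_property(buf, wanted):
    """Return one property's value, decoding only that value's bytes."""
    view = memoryview(buf)  # slicing a memoryview does not copy the buffer
    target = wanted.encode("utf-8")
    (count,) = HEADER.unpack_from(view, 0)
    for i in range(count):
        name_off, name_len, val_off, val_len = ENTRY.unpack_from(
            view, HEADER.size + i * ENTRY.size
        )
        if view[name_off:name_off + name_len] == target:
            return view[val_off:val_off + val_len].tobytes().decode("utf-8")
    return None  # property not present


company = {"Name": "Acme", "Overview": "Makes anvils", "Notes": "x" * 10_000}
blob = encode(company)
print(read_property(blob, "Name"))
```

In this sketch the large `Notes` value is never decoded when we ask for `Name`; only the offset table and the requested bytes are touched. A JSON text document, by contrast, generally has to be scanned from the start before a given property can be found.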


More posts in "The importance of a data format" series:

(25 Jan 2016) Part VII – Final benchmarks
(15 Jan 2016) Part VI – When two orders of magnitude aren't enough
(13 Jan 2016) Part V – The end result
(12 Jan 2016) Part IV – Benchmarking the solution
(11 Jan 2016) Part III – The solution
(08 Jan 2016) Part II – The environment matters
(07 Jan 2016) Part I – Current state problems

Comments

13 Jan 2016
10:44 AM

Dejan

Is this already part of RavenDB or is it yet to come in some of the future versions?

13 Jan 2016
11:15 AM

Oren Eini

Dejan, those are changes that we are making for RavenDB 4.0. We are at a very early stage yet.

13 Jan 2016
15:07 PM

dap_infinity

Very interesting series and work you are doing here. I really appreciate how you document and walk through things like this. With regard to naming the format, as discussed in the comments on your earlier posts, how about the "Odin" format?

From Wikipedia: "The Prose Edda explains that Odin is referred to as 'raven-god' due to his association with Huginn and Muninn. In the Prose Edda and the Third Grammatical Treatise, the two ravens are described as perching on Odin's shoulders." I look forward to seeing how the new format progresses in Raven.

13 Jan 2016
15:59 PM

Oren Eini

Dap, Odin format, I like that. Certainly a possibility.

14 Jan 2016
11:57 AM

Edward

Stunning. How did the memory perform? Can you say something about that?

14 Jan 2016
13:40 PM

Oren Eini

Edward, I'm not sure that I understand the question. By "the memory" do you mean the blit format?

14 Jan 2016
14:40 PM

Ryan Heath

This blit format will only work with Voron, since with Esent you do not have a memory-mapped file?

// Ryan

14 Jan 2016
14:59 PM

Oren Eini

Ryan, yes, with 4.0 we are going to be working only on Voron. We actually can use blittable in Esent; we'll just need to read the full document into memory (vs. using the already mmapped data).

14 Jan 2016
15:54 PM

Edward

I mean the memory cost of accessing those properties, blit vs. JSON.

P.S. I like the name Blit.

14 Jan 2016
21:35 PM

Oren Eini

Edward, there is no memory cost. That is pretty much the point. We don't need to materialize anything here. When you actually need to read the values, then we have to actually generate the string / value. But that is only at the end, and we'll have code that will make sure that we can go from the document to Lucene with minimal cost.

21 Jan 2016
11:18 AM

Frank S

It seems it would be good for the community if we could install this from NuGet as a JSON reader/writer. I am running some .NET code which could use this. It is on Mono/Unix though, where I can't use RavenDB.

21 Jan 2016
21:21 PM

Oren Eini

Frank, this actually runs on Linux, but it isn't really suitable for general-purpose consumption. It is meant for RavenDB's internal use cases.

Oren Eini

CEO of RavenDB