The importance of a data format: Part V – The end result
So far I have written about the problem we had and the requirements for the solution, then did a deep dive into the actual implementation, and finally talked about the improvement in performance, which was a nice double-digit percentage gain. There are a couple of tricks there that I still want to talk about, but that is pretty much it.
Except that this actually misses the entire point of this exercise. What the blittable format gives us is immediate access to any property without the need to parse the whole thing. How important is that for us?
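To make "immediate access to any property" concrete, here is a minimal Python sketch of the general idea under a made-up layout. The layout is a property count, then a table of offsets sorted by property name, then the raw bytes. It is an assumption for illustration only, not RavenDB's actual blittable format, and `build` and `get_property` are hypothetical names. A lookup binary-searches the offset table and returns a view into the buffer without decoding any other property.

```python
from __future__ import annotations

import struct

# Hypothetical layout for illustration only (NOT RavenDB's actual blittable format):
#   u32 count
#   count * (u32 name_off, u32 name_len, u32 val_off, u32 val_len), sorted by name
#   raw UTF-8 bytes for all names and values
HEADER = struct.Struct("<I")
ENTRY = struct.Struct("<IIII")


def build(props: dict[str, str]) -> bytes:
    """Serialize a flat dict of string properties into the hypothetical layout."""
    items = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in props.items())
    data_start = HEADER.size + ENTRY.size * len(items)  # offsets are absolute
    table, data = bytearray(), bytearray()
    for name, val in items:
        name_off = data_start + len(data)
        data += name
        val_off = data_start + len(data)
        data += val
        table += ENTRY.pack(name_off, len(name), val_off, len(val))
    return HEADER.pack(len(items)) + bytes(table) + bytes(data)


def get_property(buf: memoryview, name: str) -> memoryview | None:
    """Binary-search the sorted offset table; return a zero-copy view of the value."""
    key = name.encode("utf-8")
    lo, hi = 0, HEADER.unpack_from(buf, 0)[0]
    while lo < hi:
        mid = (lo + hi) // 2
        name_off, name_len, val_off, val_len = ENTRY.unpack_from(
            buf, HEADER.size + mid * ENTRY.size
        )
        candidate = bytes(buf[name_off:name_off + name_len])  # copies only this key
        if candidate == key:
            return buf[val_off:val_off + val_len]  # no other property is decoded
        if candidate < key:
            lo = mid + 1
        else:
            hi = mid
    return None


doc = memoryview(
    build({"Name": "Alfreds Futterkiste", "Overview": "Sells goods", "Phone": "030-0074321"})
)
print(bytes(get_property(doc, "Overview")).decode("utf-8"))  # Sells goods
```

In this sketch a lookup costs a handful of key comparisons (logarithmic in the number of properties) and never touches the other values. Parsing the whole JSON text to reach one field, by contrast, has to walk every byte of it.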
Well, for one specific scenario, that is actually quite important. Let us imagine that we have the following RavenDB index:
from c in docs.Companies select new { c.Name, c.Overview }
Here are the results, counting purely the time to load about 18,000 documents and get the relevant data:
I removed all I/O from the benchmark, and I'm testing only the cost of loading documents already saved in RavenDB and getting those properties from them.
And that is what I'm talking about.
More posts in "The importance of a data format" series:
- (25 Jan 2016) Part VII – Final benchmarks
- (15 Jan 2016) Part VI – When two orders of magnitude aren't enough
- (13 Jan 2016) Part V – The end result
- (12 Jan 2016) Part IV – Benchmarking the solution
- (11 Jan 2016) Part III – The solution
- (08 Jan 2016) Part II – The environment matters
- (07 Jan 2016) Part I – Current state problems
Comments
Is this already part of RavenDB or is it yet to come in some of the future versions?
Dejan, those are changes that we are doing for RavenDB 4.0. We are at a very early stage yet.
Very interesting series and work you are doing here; I really appreciate how you document and walk through things like this. With regard to naming the format, as raised in the comments on your earlier posts, how about the "Odin" format?
From Wikipedia: "The Prose Edda explains that Odin is referred to as "raven-god" due to his association with Huginn and Muninn. In the Prose Edda and the Third Grammatical Treatise, the two ravens are described as perching on Odin's shoulders." I look forward to seeing how the new format progresses in Raven.
Dap, Odin format, I like that. Certainly a possibility.
Stunning. How did memory usage perform? Can you say something about that?
Edward, I'm not sure that I understand the question. By "the memory" do you mean the blit format?
Will this blit format only work with Voron, since with Esent you do not have a memory-mapped file?
// Ryan
Ryan, yes. With 4.0, we are going to be working only on Voron. We could actually use blittable with Esent; we'd just need to read the full document into memory (vs. using the already mmapped data).
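Here is a rough Python sketch of the distinction in that reply. It assumes a reader that only needs a sliceable byte buffer. The same `read_slice` then works whether the bytes were first copied into process memory (as the reply says Esent would require) or come straight from a memory-mapped file (as with Voron). The function names and the tiny test file are hypothetical.

```python
import mmap
import os
import tempfile


def read_slice(buf, off: int, length: int) -> bytes:
    """Read a field from any sliceable buffer: in-memory bytes or an mmap."""
    return bytes(buf[off:off + length])


def load_in_memory(path: str) -> bytes:
    # Copy the whole stored document into process memory before reading it.
    with open(path, "rb") as f:
        return f.read()


def load_mapped(path: str) -> mmap.mmap:
    # Map the file instead; the OS pages bytes in on access, nothing is copied up front.
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"Alfreds FutterkisteSells goods")
    mapped = load_mapped(path)
    print(read_slice(load_in_memory(path), 19, 11))  # b'Sells goods'
    print(read_slice(mapped, 0, 19))  # b'Alfreds Futterkiste'
    mapped.close()
    os.remove(path)
```

The reading code is identical in both cases. The difference is only whether the full document had to be copied before the first read.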
I mean the memory cost of accessing those properties blit vs json.
P.S. I like the name Blit.
Edward, there is no memory cost. That is pretty much the point: we don't need to materialize anything here. When you actually need to read the values, we have to generate the string / value, but that only happens at the end, and we'll have code to make sure we can go from the document to Lucene with minimal cost.
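The "materialize only at the end" idea can be sketched as a small lazy wrapper. The sketch assumes the document bytes stay alive (for example, in a memory-mapped file) for as long as the wrapper does. `LazyValue` is a hypothetical name for illustration, not a RavenDB type.

```python
from __future__ import annotations


class LazyValue:
    """Hypothetical (buffer, offset, length) reference that decodes only on demand."""

    __slots__ = ("_buf", "_off", "_len", "_cached")

    def __init__(self, buf: memoryview, off: int, length: int) -> None:
        self._buf, self._off, self._len = buf, off, length
        self._cached: str | None = None  # no string exists yet

    @property
    def materialized(self) -> bool:
        return self._cached is not None

    def __str__(self) -> str:
        # The string is allocated here, the first time a consumer actually needs it.
        if self._cached is None:
            raw = self._buf[self._off:self._off + self._len]
            self._cached = bytes(raw).decode("utf-8")
        return self._cached


# An index-like projection keeps references into the document bytes...
doc = memoryview(b"Alfreds FutterkisteSells goods")
entry = {"Name": LazyValue(doc, 0, 19), "Overview": LazyValue(doc, 19, 11)}
print(entry["Name"].materialized)  # False
# ...and only the final step (e.g. writing terms to an index) produces strings.
print(str(entry["Name"]), entry["Name"].materialized)  # Alfreds Futterkiste True
```

Until `str()` is called, each wrapper holds only a buffer reference and two integers. The decoded string is then cached, so repeated reads don't allocate again.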
It seems it would be good for the community if we could NuGet-install this as a JSON reader/writer. I am running some .NET code that could use this. It runs on Mono/Unix, though, where I can't use RavenDB.
Frank, this actually runs on Linux, but it isn't really suitable for general-purpose consumption. It is meant for RavenDB's internal use cases.