A few posts ago, I talked about the problem of having unstable ids, in particular, ids that can be reused. That leads to quite a lot of complexity, as anyone who ever had to deal with Lucene documents ids knows.
So we are willing to pay something toward stable ids, the questions is what?
One way of doing that is to just store the physical id (unstable) and a virtual id (stable) in a B+Tree (actually, a pair of them, since you’ll need to refer to them back and forth). That means that for the most part, internally to the engine, we’ll use the physical id (with its nice property of having O(1) access time), but externally we’ll expose the stable virtual id (probably sequential numbering, since that is easiest).
Note that I still want to use the physical ids, I’ll discuss exactly why that is important in my next post, for now, let us just say that it is an important component of ensuring high performance for large datasets.
The problem with using B+Tree is that the cost of finding the virtual <—> physical id mapping is O(logN), which for 10 million nodes and 100 million edges is 23 & 24 respectively. Except that this isn’t the real cost function for B+Tree.
Assuming that we have 255 items per page, we actually would need to do 4 page lookups, and a total of 54 comparisons to find the right value. For the edges, we would need 5 page look ups and over 60 comparisons. Note that this isn’t an issue on its own, but it is an issue when we are talking about having this kind of cost in the hot path of the application. And this is very likely going to be in the hot path.
Oh, there are ways around it, we can only translate back and forth at the edges of the database, so internally we’ll always use the physical address, and only translate it out when we are done. But that is hard to actually do properly, since you need the virtual address for a whole lot of stuff all over the place.
We can steal the idea of page translation tables from the processor. Something like this:
Effectively, we’ll lazy allocate segments of pages and pull them together into a hierarchy. So finding out the physical address of id 84 would involve looking at the root, finding the next page down with mod operation, and so forth until we find the right value and check there. This has the advantage of being simple, O(1) and obvious. It is also pretty good in terms of space saving, since the virtual id can be “stored” without taking any space (it is the position of the physical id in the “array” we created.
This has one drawback, there is no way to recover space. Because the indexer into this data structure is meaningful, we can’t just compact things. Once space is allocated, that is it. Now, to be fair, the cost in size here for all 100 million edges is about 0.75 GB, so not meaningful in the long run, but if we have a busy database that always write and delete, we have no way to recover the space.
The “proper” answer, by the way, is to implement an external hash table. That has the property of O(1), can grow and shrink as the amount of data changes. I’m not presenting it here mostly because it is something that we haven’t yet had the need to implement in Voron, so it isn’t something that I can just show and move on. Beside, it is fun to explore all the wrong ways of doing something.
More posts in "Database Building 101" series:
- (25 Aug 2016) Graph querying over large datasets
- (24 Aug 2016) Stable node ids
- (22 Aug 2016) High level graph operations
- (19 Aug 2016) Graphs aren’t toys
- (18 Aug 2016) The cost of graph storage
- (17 Aug 2016) The cost of graphing
- (16 Aug 2016) The storage layer
- (15 Aug 2016) Let us graph this for real