Database Building 101The cost of graph storage
Like any storage system, Voron needs to store some metadata about our data, and sometimes this can be very surprising to people.
Let’s look at each of the items that we store in turn.
- Node data is stored in a table.
- Edge data is stored in a table.
- The edge itself is stored in a B+Tree containing fixed size trees.
A table does a bunch of stuff, including reserving some space on the disk, but we don’t have dynamic tables here, so those costs are relatively fixed.
The cost per item, however, depends on the size of the data. If the data size is less than 2036 bytes, then the cost of storing the data is a 4 bytes overhead. If, however, the size of the data item is higher than 2036, we round it up to 4KB section.
In other words, storing ten million nodes, which measure 1KB in size, will cost us about 40 MB of overhead (compared to roughly 10 GB of data). However, if the size of the data is 2KB, we need to store them in a full page. The reason for this, by the way, is that we need to balance the cost of insert and the cost of update. So we only modify things on page boundary (in this case, 4KB). If the value is small, we pack multiples of them in a single page, but beyond a certain size, we just allocate dedicated pages for them, and live with a bit of free space in the end.
More interesting is the storage of the edge data, actually. A B+Tree costs a minimum of 4KB, and we have one of these per each of the edge types that we have. In practice, we don’t expect there to be a high number of edge types, and we can readily ignore this as fixed size costs. In most cases, I would be stunned to hear that there is more than a single 4KB page for all your edges types (should be enough for a hundred or so).
What isn’t fixed size is the number of fixed size tree (one per source node) and the number of entries in the fixed size trees (one per connection). The whole reason we have fixed size trees is that they allow us to make efficient use of storage by making assumptions. You can see this in their usage. A fixed size tree has an int64 as the key, and you need to specify upfront how much space you need for the values. That makes it very simple to write.
Fixed size trees actually have two forms, they can be embedded or they can be free floating. That mostly depends on their size. If they are embedded, they are stored inside the parent tree, but if they are too large, we store them in their own page. Embedded usage takes 6 bytes per fixed size tree, we have 8 bytes for the key, and the entry size itself (which in our case is also 8 bytes). So a total of 16 bytes per entry inside the fixed size tree.
What this means, in practice, is that up until the moment a node has more than 254 connections, it can be stored as embedded value. When it goes beyond that, we’ll spill over to a full page and start taking space at 4KB increments.
One thing that I didn’t mention is that we store the fixed size trees (keyed by their source node ID), inside a parent B+Tree. Here we need to track a lot more information (keys and values have different sizes, etc). The overhead cost per entry in a B+Tree is 14 bytes. Add to that the 8 bytes for the source node id, and it comes to 22 bytes per source node.
Given all of those numbers, if we had a graph with 10 million nodes and each node was connected to a 10 other nodes in average, and each node/edge was 0.5KB in size, we would have:
- 5 GB – Nodes data – 10,000,000
- 50 GB – Edges data – 100,000,000
- 80 MB – overhead data for nodes & edges.
- 1.75 GB – edges information about all nodes.
Note that in such a graph, we have 10 million nodes, but a hundred million edges. All of which can fit comfortably into RAM on a good server, and give you blazing good performance.