Voron & Graphs
One of our guys is having fun playing with graph databases, and we had a serious discussion on how we can use Voron for that.
No, we don’t have any plans to do a graph database. This is purely one of the guys playing with something that interest him.
For the purpose of this post, I’m only concerned with having the ability to store graph data and read them efficiently. I don’t care for the actual graph operations.
Let us look at the following graph:
Here is how we can define this in code:
var michael = db.CreateNode(); michael["Name"] = "Michael"; var graphs = db.CreateNode(); graphs["Name"] = "Graphs"; var edge = michael.RelatesTo(graphs, db.Relationship("Plays With"));
Now, how would we go about implementing something like this? Well, with Voron, that is pretty easy.
We’ll start with defining a Nodes tree, which is going to using an incremental 64 bits integer for the key, and a JSON object for the value. This means that on CreateNode, we’ll just allocate the id for it, and just have the node itself as a JSON object that can be as complex as you want.
We also have relationships, and here it gets a bit complex, a relationship is always from a node to a node, and it has a specific type. Because the types of relationships tend to be very few, we will limit them to 65,536 relationship types. I think that this would be more than enough. As a result, I can quickly get the id of a relationship type. This leads us to having another tree in Voron, the RelationshipTypes tree, with a key that is the string name of the relationship and the value is just an incremental short. The reason we need to do this will be obvious shortly.
After we have the relationship type, we need to record the actual relationships. That means that we need to consider how we want to record that. Relationships can have their own properties, so the actual relationship is going to be another JSON object as the value in a tree. But what about the key for this tree? The question here is how are we going to work with this? What sort of queries are we going to issue. Obviously, in a graph database, we are going to follow relationships a lot. And the kind of questions we are going to ask are almost always going to be “from node X, find all outgoing relations of type Y”. So we might as well do this properly.
The key for the relations tree would be 18 bytes, the first 8 bytes are the source node id, the next 2 bytes are the relationship type and the last 8 bytes are the destination node id. That means that on the disk, the data is actually sorted first by the node id, then by the relationship type. Which make the kind of queries that I was talking about very natural and fast.
And that is pretty much it. Oh, you’re going to need metadata tree for things like the last relationship type id, and probably other stuff. But that is it, when speaking from the point of view of the storage.
The overall structure is:
Nodes - (Key: Int64, Val: JSON)
RelationshipTypes – (Key: string, Val: Int16)
Relationships ( Key: Int64, Int16, Int64, Val: JSON)
And on top of that you’ll be able to write any sort of graph logic.
Comments
Cool. There is a growing .Net graph community. Frontenac.Blueprints is a graph API based on Blueprints from the Java world, but uses a Gremlinq as the query language rather than Gremlin. As the name suggests, it is based on LINQ.
The guy behind Frontenac is also building an open source graph database based on Esent called Grave.
VelocityDB has a native graph API, and it also supports Frontenac.
CloudGraph.com is supposed to release a .Net graph db later this year.
Would be cool if Voron.Graph would support Frontenac.Blueprints. https://github.com/Loupi/Frontenac
Every time you say "no, we don't have a plan to implement x" a couple months later I see a post saying "so we went ahead and decided to implement x".
Isn't it a little ironic that you're performing normalization on a document database? BTW, how would you handle reverse relationship lookup (find me all 'incoming' nodes that point to a selected node)? A secondary index?
Frontenac.Blueprints looks very cool, I will definitely take a look
Rafal , How is this a normalization? Voron is a key-value store, and Voron.Graph will store two types of values in it.
About reverse relationship lookup - the project is very much in its beginnings, and lots of things are not yet fleshed out, but most likely it will indeed be a secondary index.
Rafal, That isn't normalization. This isn't a relational concept. I don't think that reverse relationship is a good idea, if you need that, store that as a the reverse edges.
Well, maybe it's not a relational normalization, but you split your KV database into 3 separate stores in order to reduce redundancy, and you introduce a composite key to simulate table columns. Not a relational concept?
Rafal, Not, it isn't a relational concept. It looks the same because both RDMBS and Voron rely on B+Trees.
Hey Michael,
How will you do to make the traverse? I already tried to build a graph DB using Raven, even Ayende some time ago :), but when it comes to navigate through the nodes and maintain the performance, I failed. What I really wanted to do is to allow relationships between documents, so we can have the goods of the two worlds.
I am playing a lot with neo4j and its cypher api is great. So, for curiosity, how could you do the following, for example, find the friends of my friends that are my friends?
Daniel, Traversal of that is done by looking at an initial node, finding all the edges of a particular type, following to the related nodes, etc... That is pretty fast to do in this model.
Low level operation for friends of my friends who aren't my friends (note that there should be API to cover that for you...). But the idea is this:
var node = graphDb.Get(initialNodeId); var edges = graphDB.EdgesFor(node, "Friends"); var friends = edges.Select(x=>x.To).ToHashSet();
var friendsOfMyFriends = new HashSet<string>();
foreach(var f in friends) { var edges = graphDb.EdgesFor(f, "Friends"); foreach(var friendOfFriend in edges.Select(x=>x.To)) { if(friends.Contains(friendOfFriend)) continue; friendsOfMyFriends.Add(friendOfFriend); } }
Also note that this assumes that those are NOT network calls, and that the DB optimize itself based on usage. This will result in a pretty fast operation overall.
Daniel, Traversal will work pretty much as Ayende has described, using standard BFS/DFS. In my implementation I've also added an ability to filter traversal by edge type. (for example you can look here https://github.com/myarichuk/Voron.Graph/blob/master/Voron.Graph/Algorithms/Search/BreadthFirstSearch.cs)
The speed here comes from the way it is modeled and from using Voron storage engine - its operations have very low overhead and Voron in general is very fast.
Comment preview