Voron & Graphs

time to read 4 min | 660 words

One of our guys is having fun playing with graph databases, and we had a serious discussion on how we can use Voron for that.

No, we don’t have any plans to do a graph database. This is purely one of the guys playing with something that interest him.

For the purpose of this post, I’m only concerned with having the ability to store graph data and read them efficiently. I don’t care for the actual graph operations.

Let us look at the following graph:

Here is how we can define this in code:

var michael = db.CreateNode();
michael["Name"] = "Michael";

var graphs = db.CreateNode();
graphs["Name"] = "Graphs";

var edge = michael.RelatesTo(graphs, db.Relationship("Plays With"));

Now, how would we go about implementing something like this? Well, with Voron, that is pretty easy.

We’ll start with defining a Nodes tree, which is going to using an incremental 64 bits integer for the key, and a JSON object for the value. This means that on CreateNode, we’ll just allocate the id for it, and just have the node itself as a JSON object that can be as complex as you want.

We also have relationships, and here it gets a bit complex, a relationship is always from a node to a node, and it has a specific type. Because the types of relationships tend to be very few, we will limit them to 65,536 relationship types. I think that this would be more than enough. As a result, I can quickly get the id of a relationship type. This leads us to having another tree in Voron, the RelationshipTypes tree, with a key that is the string name of the relationship and the value is just an incremental short. The reason we need to do this will be obvious shortly.

After we have the relationship type, we need to record the actual relationships. That means that we need to consider how we want to record that. Relationships can have their own properties, so the actual relationship is going to be another JSON object as the value in a tree. But what about the key for this tree? The question here is how are we going to work with this? What sort of queries are we going to issue. Obviously, in a graph database, we are going to follow relationships a lot. And the kind of questions we are going to ask are almost always going to be “from node X, find all outgoing relations of type Y”. So we might as well do this properly.

The key for the relations tree would be 18 bytes, the first 8 bytes are the source node id, the next 2 bytes are the relationship type and the last 8 bytes are the destination node id. That means that on the disk, the data is actually sorted first by the node id, then by the relationship type. Which make the kind of queries that I was talking about very natural and fast.

And that is pretty much it. Oh, you’re going to need metadata tree for things like the last relationship type id, and probably other stuff. But that is it, when speaking from the point of view of the storage.

The overall structure is:

Nodes - (Key: Int64, Val: JSON)

RelationshipTypes – (Key: string, Val: Int16)

Relationships ( Key: Int64, Int16, Int64, Val: JSON)

And on top of that you’ll be able to write any sort of graph logic.

Tweet Share Share 11 comments

Tags:

raven

Comments

22 May 2014
16:40 PM

Vidar

Cool. There is a growing .Net graph community. Frontenac.Blueprints is a graph API based on Blueprints from the Java world, but uses a Gremlinq as the query language rather than Gremlin. As the name suggests, it is based on LINQ.

The guy behind Frontenac is also building an open source graph database based on Esent called Grave.

VelocityDB has a native graph API, and it also supports Frontenac.

CloudGraph.com is supposed to release a .Net graph db later this year.

Would be cool if Voron.Graph would support Frontenac.Blueprints. https://github.com/Loupi/Frontenac

22 May 2014
19:29 PM

Gilligan

Every time you say "no, we don't have a plan to implement x" a couple months later I see a post saying "so we went ahead and decided to implement x".

22 May 2014
21:35 PM

Rafal

Isn't it a little ironic that you're performing normalization on a document database? BTW, how would you handle reverse relationship lookup (find me all 'incoming' nodes that point to a selected node)? A secondary index?

23 May 2014
07:48 AM

Michael Yarichuk

Frontenac.Blueprints looks very cool, I will definitely take a look

23 May 2014
08:02 AM

Michael Yarichuk

Rafal , How is this a normalization? Voron is a key-value store, and Voron.Graph will store two types of values in it.

About reverse relationship lookup - the project is very much in its beginnings, and lots of things are not yet fleshed out, but most likely it will indeed be a secondary index.

23 May 2014
08:44 AM

Ayende Rahien

Rafal, That isn't normalization. This isn't a relational concept. I don't think that reverse relationship is a good idea, if you need that, store that as a the reverse edges.

23 May 2014
09:41 AM

Rafal

Well, maybe it's not a relational normalization, but you split your KV database into 3 separate stores in order to reduce redundancy, and you introduce a composite key to simulate table columns. Not a relational concept?

23 May 2014
10:47 AM

Ayende Rahien

Rafal, Not, it isn't a relational concept. It looks the same because both RDMBS and Voron rely on B+Trees.

27 May 2014
03:19 AM

Daniel Ferreira Monteiro Alves

Hey Michael,

How will you do to make the traverse? I already tried to build a graph DB using Raven, even Ayende some time ago :), but when it comes to navigate through the nodes and maintain the performance, I failed. What I really wanted to do is to allow relationships between documents, so we can have the goods of the two worlds.

I am playing a lot with neo4j and its cypher api is great. So, for curiosity, how could you do the following, for example, find the friends of my friends that are my friends?

27 May 2014
11:26 AM

Ayende Rahien

Daniel, Traversal of that is done by looking at an initial node, finding all the edges of a particular type, following to the related nodes, etc... That is pretty fast to do in this model.

Low level operation for friends of my friends who aren't my friends (note that there should be API to cover that for you...). But the idea is this:

var node = graphDb.Get(initialNodeId); var edges = graphDB.EdgesFor(node, "Friends"); var friends = edges.Select(x=>x.To).ToHashSet();

var friendsOfMyFriends = new HashSet<string>();

foreach(var f in friends) { var edges = graphDb.EdgesFor(f, "Friends"); foreach(var friendOfFriend in edges.Select(x=>x.To)) { if(friends.Contains(friendOfFriend)) continue; friendsOfMyFriends.Add(friendOfFriend); } }

Also note that this assumes that those are NOT network calls, and that the DB optimize itself based on usage. This will result in a pretty fast operation overall.

27 May 2014
20:51 PM

Michael Yarichuk

Daniel, Traversal will work pretty much as Ayende has described, using standard BFS/DFS. In my implementation I've also added an ability to filter traversal by edge type. (for example you can look here https://github.com/myarichuk/Voron.Graph/blob/master/Voron.Graph/Algorithms/Search/BreadthFirstSearch.cs)

The speed here comes from the way it is modeled and from using Voron storage engine - its operations have very low overhead and Voron in general is very fast.

Comment preview

Comments have been closed on this topic.

Markdown turns plain text formatting into fancy HTML formatting.

Phrase Emphasis

*italic*   **bold**
_italic_   __bold__

Links

Inline:

An [example](http://url.com/ "Title")

Reference-style labels (titles are optional):

An [example][id]. Then, anywhere
else in the doc, define the link:
  [id]: http://example.com/  "Title"

Images

Inline (titles are optional):

![alt text](/path/img.jpg "Title")

Reference-style:

![alt text][id]
[id]: /url/to/img.jpg "Title"

Headers

Setext-style:

Header 1
========
Header 2
--------

atx-style (closing #'s are optional):

# Header 1 #
## Header 2 ##
###### Header 6

Lists

Ordered, without paragraphs:

1.  Foo
2.  Bar

Unordered, with paragraphs:

*   A list item.
    With multiple paragraphs.
*   Bar

You can nest them:

*   Abacus
    * answer
*   Bubbles
    1.  bunk
    2.  bupkis
        * BELITTLER
    3. burper
*   Cunning

Blockquotes

> Email-style angle brackets
> are used for blockquotes.
> > And, they can be nested.
> #### Headers in blockquotes
> 
> * You can quote a list.
> * Etc.

Horizontal Rules

Three or more dashes or asterisks:

---
* * *
- - - -

Manual Line Breaks

End a line with two or more spaces:

Roses are red,   
Violets are blue.

Fenced Code Blocks

Code blocks delimited by 3 or more backticks or tildas:

```
This is a preformatted
code block
```

Header IDs

Set the id of headings with {#<id>} at end of heading line:

## My Heading {#myheading}

Tables

Fruit    |Color
---------|----------
Apples   |Red
Pears	 |Green
Bananas  |Yellow

Definition Lists

Term 1
: Definition 1
Term 2
: Definition 2

Footnotes

Body text with a footnote [^1]
[^1]: Footnote text here

Abbreviations

MDD <- will have title
*[MDD]: MarkdownDeep

Oren Eini

Oren Eini

CEO of RavenDB

Voron & Graphs

Comments

Comment preview

FUTURE POSTS

RECENT SERIES

RECENT COMMENTS

Syndication

Main feed
Comments feed

Oren Eini

CEO of RavenDB

Related posts that you may find interesting:

Comments

Comment preview

Markdown formatting

Phrase Emphasis

Links

Images

Headers

Lists

Blockquotes

Horizontal Rules

Manual Line Breaks

Fenced Code Blocks

Header IDs

Tables

Definition Lists

Footnotes

Abbreviations

FUTURE POSTS

RECENT SERIES

RECENT COMMENTS

Syndication