Graphs in RavenDB: Recursive queries
Graph queries, as I have discussed them so far, give you the ability to search for patterns. On the right, you can see the family tree of the royal family of Great Britain going back a few hundred years. That makes for an interesting subject for practicing graph queries.
A good example of a question we might want to ask is who the royal grandparent of Elizabeth II is. We can do that using:
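To make the shape of such a fixed pattern concrete, here is a minimal sketch in plain Python rather than in RavenDB's query language. The `PARENTS` map, the `ROYALS` set and the `royal_grandparents` helper are all assumptions made for illustration. The point is only that the pattern is exactly two hops long, with a royal required at each hop.

```python
# Toy slice of the family tree: child -> parents (illustrative, incomplete).
PARENTS = {
    "Elizabeth II": ["George VI", "Elizabeth Bowes-Lyon"],
    "George VI": ["George V", "Mary of Teck"],
}

# Assumed flag for this sketch: who counts as a royal in the toy data.
ROYALS = {"Elizabeth II", "George VI", "George V", "Mary of Teck"}


def royal_grandparents(person):
    """Match the fixed pattern: person -> royal parent -> royal grandparent."""
    found = []
    for parent in PARENTS.get(person, []):
        if parent not in ROYALS:
            continue  # the middle hop must be a royal as well
        for grandparent in PARENTS.get(parent, []):
            if grandparent in ROYALS:
                found.append(grandparent)
    return found


print(royal_grandparents("Elizabeth II"))
```

The two nested loops are the whole story: one loop per edge in the pattern, and no way to say "some number of hops".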
This is great, and nicely demonstrates how we can scan for specific patterns in the graph. However, it is limited by its rigidity. For example, let’s say that I want to find someone in the family tree, but I’m not sure about the exact nature of the relationship.
“We are not amused” comes to mind, but off the top of my head and without consulting the chart, I don’t think that I would be able to figure it out. Luckily, I don’t have to; I can ask RavenDB to be so kind as to tell me.
Note the use of the recursive element here. We are asking RavenDB to start at a particular document and go up the parents, trying to find an unamused royal. The recursion portion of the query can be zero to six steps in size and should abort as soon as we have any match. Following the zero to six parents, there should be a parent that is both a royal and unamused.
The Cypher syntax for what they call variable length queries is reminiscent of regular expressions, and I don’t mean that in a complimentary manner. Looking at the query above, you might have noticed that there is a distinct difference between it and the first one. The recursive query will go up the Parents link, regardless of whether that parent is royal or not. RavenDB Graph Queries have what I believe to be a unique feature. The recursive pattern isn’t limited to a single step and can be as complex as you like.
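The semantics described here (zero to six recursive steps, stop on the first match, then one more hop to the matching parent) can be sketched in plain Python. This is not RavenDB's implementation or syntax; `PARENTS`, the `ROYAL` and `UNAMUSED` flags, and `find_lazy` are hypothetical names used only to illustrate the idea on a single line of descent.

```python
# Toy data: one line of descent, child -> parents (illustrative only).
PARENTS = {
    "Elizabeth II": ["George VI"],
    "George VI": ["George V"],
    "George V": ["Edward VII"],
    "Edward VII": ["Victoria"],
}
ROYAL = {"Elizabeth II", "George VI", "George V", "Edward VII", "Victoria"}
UNAMUSED = {"Victoria"}  # assumed flag, invented for this sketch


def find_lazy(start, min_steps, max_steps, is_match):
    """Walk up the parents depth-first and return the first matching path.

    A path matches when, after min_steps..max_steps recursive hops,
    the next parent satisfies is_match.
    """
    def walk(path):
        steps = len(path) - 1  # recursive hops taken so far
        if steps >= min_steps:
            for parent in PARENTS.get(path[-1], []):
                if is_match(parent):
                    return path + [parent]  # abort on the first match
        if steps < max_steps:
            for parent in PARENTS.get(path[-1], []):
                result = walk(path + [parent])
                if result is not None:
                    return result
        return None

    return walk([start])


def unamused_royal(person):
    return person in ROYAL and person in UNAMUSED


print(find_lazy("Elizabeth II", 0, 6, unamused_royal))
```

With this toy data the walk takes three recursive hops and then finds Victoria as the next parent. Lowering `max_steps` to 2 makes the same call return `None`, which is the kind of cutoff the upper bound is there to provide.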
For example, let’s ensure that we are only going to go up the chain of the royal parents.
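As a rough illustration of what restricting the recursive step buys you, here is a plain Python sketch (not RavenDB syntax). It does a breadth-first walk, so the first hit is the one with the fewest hops, and it takes a `step_ok` filter that decides which parents the recursion may pass through. The tree, the names in it, and the `shortest_path` helper are all made up for the example.

```python
from collections import deque

# Hypothetical tree: child -> parents. Two routes lead from "start" to "queen".
PARENTS = {
    "start": ["commoner", "duke"],
    "commoner": ["queen"],
    "duke": ["king"],
    "king": ["queen"],
}
ROYAL = {"duke", "king", "queen"}
UNAMUSED = {"queen"}  # assumed flag, invented for this sketch


def shortest_path(start, min_steps, max_steps, step_ok, is_match):
    """Breadth-first search: the first match found uses the fewest hops."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        steps = len(path) - 1  # recursive hops taken so far
        for parent in PARENTS.get(path[-1], []):
            if steps >= min_steps and is_match(parent):
                return path + [parent]
        if steps < max_steps:
            for parent in PARENTS.get(path[-1], []):
                if step_ok(parent):  # only recurse through allowed parents
                    queue.append(path + [parent])
    return None


def target(person):
    return person in ROYAL and person in UNAMUSED


any_parent = shortest_path("start", 0, 6, lambda p: True, target)
royal_only = shortest_path("start", 0, 6, lambda p: p in ROYAL, target)
print(any_parent)
print(royal_only)
```

Without the filter, the shortest route goes through the commoner. With the filter, that route is never explored and the answer is the longer, all-royal chain.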
The recursive element has a few knobs that you can tweak. The minimum and maximum distance are the obvious ones, but the results criteria for the recursion are also interesting. In this query, we use shortest instead of lazy. This will make RavenDB work a bit harder and find the shortest recursive path that matches the query, whereas lazy stops on the first one that matches. The following options are available:
- Lazy – stop on the first pattern that matches. Good for: “Am I related to Victoria?”
- Shortest – find the shortest path that matches the pattern. Good for: “How am I related to Victoria?”
- Longest – find the longest path that matches the pattern. Good for: “For how many generations has Victoria’s family been royals?”
- All – find all the paths that match the pattern. Good for: finding every line of descent when you have multiple paths in your ancestry to Victoria.
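The difference between the four options is easiest to see side by side. The sketch below is plain Python, not RavenDB's implementation: it enumerates matching paths lazily with a generator and then applies each strategy to the same toy tree. The tree, `matching_paths`, and `recursive_query` are hypothetical names, and a real engine would presumably avoid enumerating everything where it can.

```python
# Hypothetical tree: child -> parents. The long route is listed first on purpose.
PARENTS = {
    "start": ["duke", "commoner"],
    "duke": ["king"],
    "king": ["queen"],
    "commoner": ["queen"],
}


def matching_paths(start, min_steps, max_steps, is_match):
    """Yield matching paths in depth-first order, one at a time."""
    def walk(path):
        steps = len(path) - 1  # recursive hops taken so far
        for parent in PARENTS.get(path[-1], []):
            if steps >= min_steps and is_match(parent):
                yield path + [parent]
            if steps < max_steps:
                yield from walk(path + [parent])

    return walk([start])


def recursive_query(strategy, start, min_steps, max_steps, is_match):
    """Apply one of the four result strategies to the matching paths."""
    if strategy not in {"lazy", "shortest", "longest", "all"}:
        raise ValueError(f"unknown strategy: {strategy}")
    paths = matching_paths(start, min_steps, max_steps, is_match)
    if strategy == "lazy":
        return next(paths, None)  # stop at the first match, explore no further
    found = list(paths)  # the other strategies need every match
    if strategy == "all":
        return found
    if not found:
        return None
    pick = min if strategy == "shortest" else max
    return pick(found, key=len)


def is_queen(person):
    return person == "queen"


for strategy in ("lazy", "shortest", "longest", "all"):
    print(strategy, recursive_query(strategy, "start", 0, 6, is_queen))
```

In this toy tree, lazy returns the route through the duke simply because it is explored first, while shortest returns the route through the commoner. That is the trade-off in miniature: lazy answers "is there a path?" cheaply, and the other options pay for a fuller search.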
More posts in "Graphs in RavenDB" series:
- (08 Nov 2018) Real world use cases
- (01 Nov 2018) Recursive queries
- (31 Oct 2018) Inconsistency abhorrence
- (29 Oct 2018) Selecting the syntax
- (26 Oct 2018) What’s the role of the middle man?
- (25 Oct 2018) I didn’t mean to build this feature!
- (22 Oct 2018) Query results
- (21 Sep 2018) Graph modeling vs. document modeling
- (20 Sep 2018) Pre-processing the queries
- (19 Sep 2018) The query language
- (18 Sep 2018) The overall design
Comments
Awesome! This graph query language is a huge step forward for RavenDB! But be aware, some people will abuse it. You can write a query so complex that it would be slow. It happens on relational database systems too; you are not alone.
Jesús, Oh, I'm well aware that it will be abused. I'm afraid that there isn't much that we can do about it. Placing limits to prevent abuse is going to hurt legitimate uses. We are working on exposing what is going on so a user can understand what is costing them, though.
I'm not really into graphs, but this gets my attention because of the similarity to relational DB joins. SQL doesn't have recursion (which I don't condemn), and so there's always some arbitrary limit on how many times you can traverse the same join. But since you're free to decide about the Raven query language, maybe you could introduce some concepts (like a transitive closure for reachability tests) without uglifying them with arbitrary limits like the recursion depth. And then have some simple syntax for that, for example Elizabeth -> Parents for actual parents and Elizabeth ==> Parents for all ancestors.
Rafal, The problem here is that you need to have these limits, otherwise you might be forced to scan a LOT of data. Six degrees of separation holds, for example, and not specifying a limit on the graph may force a LOT of traversals.
That's true, but in this case the limit is arbitrary, for performance reasons, and transitive closure is one of the ways you can check reachability without that limit, provided that you pre-compute the closure and save it in your database. Which might make sense, because you'll be processing the data anyway.