Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969

, @ Q j

Posts: 6,738 | Comments: 48,777

filter by tags archive

RavenDB Node.JS client updated (now supporting subscription)

time to read 1 min | 85 words

The node.js RavenDB client had a milestone release last month, which I only now got to talking about. The most important factor about this release is that now node.js users can use subscriptions to process data from RavenDB.

Here is how you can do this:

This is one of the most powerful abilities RavenDB have, you will continuously get new orders in your script and be able to process them. This make it very easy to build features such as data pipelines, background processing, etc.

Graphs in RavenDBReal world use cases

time to read 6 min | 1032 words

imageI talked a lot about how graph queries in RavenDB will work, but one missing piece of the puzzle is how they are going to be used. I’m going to use this post to discuss some of the options that the new features enables. We’ll take the following simple model, issue tracking. Let’s imagine that a big (secret) project is going on and we need to track down the tasks for it. On the right, you have an image of the permissions graph that we are interested in.

The permission model is pretty standard I think. We have the notion of users and groups. A user can be associated to one or more groups. Groups memberships are hierarchical. An issue’s access is controlled either by giving access to a specific users or to a group. The act of assigning a group will also allow access to all the group’s parents and any user that is associated to any of them.

Here is the issue in question:

image

Given the graph on the right, let’s see, for each user, how we can find out what issues they have access to.

We’ll start with Sunny, who is named directly in the issue as allowed access. Here is the query to find the issues on which we are directly named.

image

This was easy, I must admit. We could also write this without any need for graph queries, because it is so simple:

image

It gets a bit trickier when we have to look at Max. Unlike Sunny, Max isn’t directly named. Instead, he is a member in a group that is directly named in the issue. Let’s see how we can query on issues that Max has access to:

image

This is where things gets start to get interesting. If you’ll look into the match clause, you’ll see that we have arrows that go both left and right. We are telling RavenDB that we want to find issues that have the same group (denoted as g in the query) as the user. This kind of query you already can’t express with RavenDB right now without the graph query syntax. Another way to write this query is to use explicit clauses, like so:

image

In this case, all the arrows go in a single direction, but we have two clauses that are being and together. These two queries are exactly the same, in fact, RavenDB will translate the one with arrows going both ways to the one with the and.

The key here is that between any two clauses with an and, RavenDB will match the same alias to the same value.

So we now can find issues for Max and Sunny, but what about all the rest of the people? We can, of course, do this manually. Here is how we can find issues for people a group removed from the issue.

image

This query gives us all the issues for Nati that are one group removed from him. That works, but it isn’t how we want to do things. It is a good exercise in setting out the structure, though. Because instead of hard coding the pattern, we are now going to use recursion.

image

The key for this query is the recursive element in the third line. We moved from the issue to its groups, and then we recurse from the issues’ groups to their parents. We allow empty recursion and we follow all the paths in the graph.

On the other side, we go from the user and try to find a match from the user’s group to any of the groups that end the query. In Nati’s case, we went from project-x group to team-nati, which is a match, so we can return this issue.

Here is the final query that we can use, it is a bit much, even if it is short, so I will deconstruct it below:

image

We use a parameterized query here, to make things easier. We start from a user and find an issue if:

  • The issue directly name the user as allowed access. This is the case for Sunny.
  • The issue’s groups (or their parents) match the groups that the user belong to (non recursively, mind).

In the case of Max, for example, we have a match because the recursive element allows a zero length path, so the project-x group is allowed access and Max is a member of it.

In the case of Nati, on the other hand, we have to go from project-x to team-nati to get a match.

If we’ll set $uid to users/23, which is Pheobe, all the way to the left in the graph, we’ll also have a match. We’ll go from project-x to execs to board and then find a match.

Snoopy, on the other hand, doesn’t have access. Look carefully at the direction of the arrows in the graph. Snoopy belongs to the r-n-d group, and that groups is a child of exces, but the query we specified only go up to the parents, not the children, so Snoopy is not included.

I hope that this post gave you some ideas about what kind of use cases are enabled by graph queries in RavenDB. I would love to hear about any scenarios you have for graph queries so we can play with them and see how they are answered by RavenDB.

Graphs in RavenDBRecursive queries

time to read 3 min | 565 words

imageGraph queries as I discussed them so far gives you the ability to search for patterns. On the right, you can see the family tree of the royal family of Great Britain going back a few hundred years. That make for an interesting subject for practicing graph queries.

A good example we might want to ask is who is the royal grand parent of Elizabeth II. We can do that using:

image

This is great, and nicely demonstrate how we can scan for specific patterns in the graph. However, it is limited by its rigidity. For example, let’s say that I want to find someone in the family tree and I’m not sure about the exact nature of the relationship?

“We are not amused” comes to mind, but off the top of my head and without consulting the chart, I don’t think that I would be able to figure it out. Luckily, I don’t have to, I can ask RavenDB to be so kind and tell me.

image

Note the use of the recursive element here. We are asking RavenDB to start in a particular document and go up the parents, trying to find an unamused royal. The recursion portion of the query can be zero to six steps in size and should abort as soon as we have any match. Following the zero to six parents, there should be a parent that is both a royal an unamused.

The Cypher syntax for what they call variable length queries is reminiscent of regular expressions, and I don’t mean that in a complimentary manner. Looking at the query above, you might have noticed that there is a distinct difference between it and the first one. The recursive query will go up the Parents link, regardless of whatever that parent is royal or not. RavenDB Graph Queries has what I believe to be a unique feature. The recursive pattern isn’t limited to a single step and can be as complex as you like.

For example, let’s ensure that we are only going to go up the chain of the royal parents.

image

The recursive element has a few knows that you can tweak. The minimum and maximum distance, for example, are obvious examples, but the results criteria for the recursion is also interesting. In this query, we use the shortest, instead of the lazy. This will make RavenDB work a bit harder and find the shortest recursive path that matches the query, where as lazy stops on the first one that matches. The following options are available:

  • Lazy – stop on the first pattern that matches. Good for: “Am I related to Victoria?”
  • Shortest – find the shortest path that match the pattern. Good for: “How am I related to Victoria?”
  • Longest – find the longest path that match the pattern. Good for: “For how many generations has Victoria’s family been royals?”
  • All – find all the paths that match the pattern. Good for if you have multiple paths in your ancestry to Victoria.

Graphs in RavenDBInconsistency abhorrence

time to read 3 min | 462 words

In my previous post, I discussed some options for changing the syntax of graph queries in RavenDB from Cypher to be more in line with the rest of the RavenDB Query Language. We have now completed that part and can see the real impact it has on the overall design.

In one of the design review, one of the devs (who have built non trivial applications using Neo4J) complained that the syntax is now much longer. Here are the before and after queries to compare:

image

The key, from my perspective, is that the new form is more explicit and easier to read after the fact. Queries tend to grow more complex over time, and they are being read a lot more often than written). As such, I absolutely want to lean toward being readable over being terse.

The example above just show the extra characters that you need to write. Let’s talk about something that is a bit more complex:

image

Now we have a lot more text, but it is a lot easier to understand what is going on. Focus especially on the Lines edge, where we can very clearly separate what constitute the selection on the edge, the filter on the edge and what is the property that contains the actual linked document id.

The end result is that we now have a syntax that is a lot more consistent and approachable. There are other benefits, but I’ll show them off in the next post.

A major source of annoyance for me with this syntax was how to allow anonymous aliases. In the Cypher syntax we used, you could do something like:

image

There is a problem with how to express this kind of syntax of anonymous aliases with the Collection as alias mode. I initially tried to make it work by saying that we’ll look at the rest of the query and figure it out. But that just felt wrong. I didn’t like this inconsistency. I want a parse tree that I can look at in isolation and know what is going on. Simplifying the language is something that pays dividends over time, so I eventually decided that the query above will look this with the next syntax:

image

There is a lot of precedence of using underscore as the “I don’t care” marker, so that works nice and resolves any ambiguities in the syntax.

Graphs in RavenDBSelecting the syntax

time to read 5 min | 981 words

When we started building support for graph queries inside RavenDB, we looked at what is the state of the market in this regard. There seems to be two major options: Cypher and Gremlins. Gremlins is basically a fluent interface that represent a specific graph pattern while Cypher is a more abstract manner to represent the graph query. I don’t like Gremlins, and it doesn’t fit into the model we have for RQL, so we went for the Cypher syntax. Note the distinction between went for Cypher and went for Cypher syntax.

One of the major requirements that we have is fitting in into the pre-existing Raven Query Language, but the first concern we had was just getting started and getting some idea about our actual scenarios. We are now at the point where we have written a bunch of graph queries and got a lot more experience in how it mesh into the overall environment. And at this point, I can really feel that there is an issue in meshing Cypher syntax into RQL. They don’t feel the same at all. There are a lot of good ideas there, make no mistake, but we want to create something that would flow as a cohesive whole.

Let’s look at some of our queries and how we can better express them. The one I talked to about the most is this:

image

Let see what we have here:

  • match is the overall clause that apply a graph pattern query to the dataset.
  • () – is an indication of a node in the graph.
  • [] – is an indication of an edge.
  • a:Dogs, l:Likes and b:Dogs – this is an alias and a path specification.
  • -[]-> – is an indication of an edge between two nodes
  • (expression) – is a filter on a node or an edge

I’m ignoring the select statement here because it is just the usual RQL select statement.

The first thing that keeps biting us is the filter in (a:Dogs (id() = 'dogs/arava')), I keep being tripped by missing the closing ), so that has got to go. Luckily, is it very obvious what to do here:

image

We use an explicit where clause, instead of the () to express the inline filter. This fits a lot more closely with how the rest of RQL works.

Now, let’s look at the aliases: (b:Dogs). The alias:Collection syntax is pretty foreign to RQL, we tend to use the Collection as alias syntax. Let’s see how that would look like, shall we?

image

This looks a lot more natural to me, and it is a good fit into RQL in general. This syntax does bring a few things to the table. In particular, look a the edge. In Cypher, an anonymous edge would be: [:Likes], and using this method, we will have just [Likes].

However, as nice as this syntax is, we still run into a problem. The query above is actually just a shorthand way to write the full query, which looks like so:

image

In fact, we have two queries here, to show off the actual problem we have in parsing. In the first case, we have a match clause the only refers to explicit with statement. On the second case, we have a couple of explicit with statements, but also an implicit with edges expression (the Likes).

From the point of view of the parser, we can’t distinguish those two. Now, we can absolutely say that if the edge expression contains a single name, we’ll simply look for an edge with that name and otherwise assume that this is the path that will be used.

But this seems to be error prone, because you might have a small typo or remove a edge statement and get a completely different (and unexpected) meaning. I thought about adding some sort of prefix to help tell an alias from an implicit definition, but that looks very ugly, see:

image 

And on the other hand, I really like the –[Likes]-> syntax in general. It is a lot cleaner and easier to read.

At this point, I don’t have a solution for this. I think we’ll go with the mode in which we can’t tell what the query is meant to say just from the parser, and look at the explicit with statements to figure it out (with the potential for mistakes that I pointed out earlier) until we can figure out something better.

One thing that I’m thinking about is that the () and [] which help distinguish between nodes and edges, aren’t actually required for us if we have an explicit statement. So we can write it like so:

image

In this manner, we can tell, quite easily, if you meant to define an implicit edge / node or refers to an explicitly defined alias. I’m not sure whatever this would be a good idea, though.

Another issue we have to deal with is:

image

Note that in this case, we have a filter expression on the edge as well. Applying the same process we have done so far, we get:

image

The advantages here is that this is very clear and obvious about what is going on. The disadvantage is that this takes quite a bit longer to express.

Graphs in RavenDBWhat’s the role of the middle man?

time to read 2 min | 279 words

imageAn interesting challenge with implementing graph queries is that you sometimes get into situations where the correct behavior is counter intuitive.

Consider the case of the graph on the right and the following query:

image

This will return:

  • Source: Arava, Destination: Oscar

But what would be the value of the Edge property? The answer to that is… complicated.  What we actually return is the edge itself. Let’s see what I mean by that.

image

And, indeed, the value of Edge in this query is going to be dogs/oscar.

image

This isn’t very helpful if we are talking about a simple edge like this. After all, we can deduce this from the Src –> Destination pair. This gets more interesting when the edge is more complex. Consider the following query:

image

What do you this should be the output here? In this case, the edge isn’t the Product property, it is the specific line that match the filter on the edge. Here is what the result looks like:

image

As you can imagine, knowing exactly what edge led you from one document to another can be very useful when you look at the query results.

Graphs in RavenDBI didn’t mean to build this feature!

time to read 2 min | 258 words

I was busy working on the implementation on filtering in graph queries, as discussed in my previous post. What I ended up implementing is a way for the user to tell us exactly how to handle the results. The actual query we ended up with is this:

image

And the key part here is the where clause, were we state that a and c cannot be the same dog. This also matches the behavior of SQL, and for that reason allow (predictably), that’s a good idea.

However, I didn’t just implement inequity, I implement full filtering capabilities, and you can access anything in the result. Which means that this query is now also possible:

image

I’ll let you a moment to analyze this query in peace. Try to de-chyper it (pun intended).

What this  query is doing is to compare the actual sale price and the regular price of product on a particular order, for products that match a particular set of categories.

This is a significant query because, for the first time in RavenDB, you have the ability to perform such a query (previous, you would have had to define a specific index for this query).

In other words, what graph query filtering brings to the table is joins. And I did not set out to build this feature and I’m feeling very strange about it.

Graphs in RavenDBQuery results

time to read 2 min | 381 words

imageWe run into an interesting design issue when building graph queries for RavenDB. The problem statement is fairly easy. Should a document be allowed to be bound to multiple aliases in the query results, or just one? However, without context, the problem statement in not meaningful, so let’s talk about what the actual problem is. Consider the graph on the right. We have three documents, Arava, Oscar and Phoebe and the following edges:

  • Arava Likes Oscar
  • Phoebe Likes Oscar

We now run the following query:

image

This query asks for a a dog that likes another dog that is liked by a dog. Another way to express the same sentiment (indeed, how RavenDB actually considers this type of query) is to write it as follows:

image

When processing the and expression, we require that documents that match to the same alias will be the same. Given the graph that we execute this on, what would you consider the right result?

Right now, we have the first option, in which a document can be match to multiple different alias in the same result, which would lead to the following results:

image

Note that in this case, the first and last entries match A and C to the same document.

The second option is to ensure that a document can only be bound to a single alias in the result, which would remove the duplicate results above and give us only:

image

Note that in either case, position matters, and the minimum number of results this query will generate is two, because we need to consider different starting points for the pattern match on the graph.

What do you think should we do in such a case? Are there reasons to want this behavior or that and should it be something that the user select?

DZone Webinar: Migrating from RavenDB 2.5 to 4.1 in 36,000 Locations

time to read 1 min | 87 words

On Oct 18, Rodrigo is going to talk about how a quick service restaurant chain has moved 36,000 locations (and millions of instances) from RavenDB 2.5 to 4.1.

In this session you will learn:

  • Why RDI Software chose RavenDB for its quick service restaurant platform?
  • All the hurdles that massive-scale deployments add to the mix
  • How the migration to Version 4.0 enables the future vision of our platform?
  • What were the trade-offs we had to make in moving from one version to the next?

Please register for the webinar in advance.

Graphs in RavenDBGraph modeling vs. document modeling

time to read 3 min | 445 words


One of the most important design decisions we made with RavenDB is not forcing users to explicitly create edges between documents. Instead, the edges are actually just normal properties on the documents and can be used as-is. This means that pretty much any existing RavenDB database can immediately start using graph operations, you don’t need to do anything.

The image below shows an order, using the RavenDB’s sample Northwind dataset. The highlighted portions mark the edges from this document. You can use these to traverse the graph by hopping from document to document.

image

For example, using:

image

This is easy to do, migrates well (zero cost to do so, yeah!) and usually matches nicely with what you already have. It does lead to an interesting observation. Typically, in a graph DB, you’ll model everything as a graph. But with RavenDB, you don’t need to do that.

In particular, let’s take a look at the Neo4J’s rendition of Northwind:

image

As you can see, everything is modeled as a node / edge. This is the only thing you could model it as. With RavenDB, you would typically use a domain driven model. In this case, it means that a value object, like an OrderLIne, will not have its own concrete existence. Either as a node or an edge. Instead, it will be embedded inside its root aggregate (the order).

Note that this is actually quite interesting, because it means that we need to be able to provide the ability to query on complex edges, such as the order lines. Here is how works:

image

This will give us all the discount products sold in London as well as their discount rate.

Note that in here, unlike previous queries, we use an named alias for the edge. In this case, it gives us the ability to access it properties and project the line’s Discount property to the user. This means that you can have a domain model with strong cohesion and locality, following the domain driven design principles while still being able to run arbitrary graph queries on it.  Combining this with the ability to pull data from indexes (including map/reduce) ones, you have a lot of things that you can do that used to be very hard but now are easy.

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. Graphs in RavenDB (11):
    08 Nov 2018 - Real world use cases
  2. Challenge (54):
    28 Sep 2018 - The loop that leaks–Answer
  3. Reviewing FASTER (9):
    06 Sep 2018 - Summary
  4. RavenDB 4.1 features (12):
    22 Aug 2018 - MongoDB & CosmosDB Migration Wizards
  5. Reading the NSA’s codebase (7):
    13 Aug 2018 - LemonGraph review–Part VII–Summary
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats