Graphs in RavenDBQuery results
We run into an interesting design issue when building graph queries for RavenDB. The problem statement is fairly easy. Should a document be allowed to be bound to multiple aliases in the query results, or just one? However, without context, the problem statement in not meaningful, so let’s talk about what the actual problem is. Consider the graph on the right. We have three documents, Arava, Oscar and Phoebe and the following edges:
- Arava Likes Oscar
- Phoebe Likes Oscar
We now run the following query:
This query asks for a a dog that likes another dog that is liked by a dog. Another way to express the same sentiment (indeed, how RavenDB actually considers this type of query) is to write it as follows:
When processing the and expression, we require that documents that match to the same alias will be the same. Given the graph that we execute this on, what would you consider the right result?
Right now, we have the first option, in which a document can be match to multiple different alias in the same result, which would lead to the following results:
Note that in this case, the first and last entries match A and C to the same document.
The second option is to ensure that a document can only be bound to a single alias in the result, which would remove the duplicate results above and give us only:
Note that in either case, position matters, and the minimum number of results this query will generate is two, because we need to consider different starting points for the pattern match on the graph.
What do you think should we do in such a case? Are there reasons to want this behavior or that and should it be something that the user select?
More posts in "Graphs in RavenDB" series:
- (08 Nov 2018) Real world use cases
- (01 Nov 2018) Recursive queries
- (31 Oct 2018) Inconsistency abhorrence
- (29 Oct 2018) Selecting the syntax
- (26 Oct 2018) What’s the role of the middle man?
- (25 Oct 2018) I didn’t mean to build this feature!
- (22 Oct 2018) Query results
- (21 Sep 2018) Graph modeling vs. document modeling
- (20 Sep 2018) Pre-processing the queries
- (19 Sep 2018) The query language
- (18 Sep 2018) The overall design
Comments
Here is related corner case which I'm thinking about. Let's have following graph:
1. Arava likes Oscar.
2. Arava likes Arava (well, he/she is a narcissist :-) )
Now I want following query:
With "1 entity to multiple alias matching", this query shall have this output:
Arava , Oscar
Without such matching, there is no result - and this is a case which I'm considering wrong.
I may be showing my relational leanings but I'd prefer to have all of the results (presuming there's an easy way to take on something along the line of
a <> c
as an additional part of the query, if I don't want the "duplicates").Say that the query is asymmetric, such that there's a further link hanging off of
c
. And say that, for the purposes of following that link tod
, I do wanta
to be treated identically to any otherc
. How do I express that ifa
andc
are automatically precluded from being the same?is the problem the same "document" or is the problem that the first resultset is showing the same relationship, which is NOT what the query is trying to find? In relational sql, when doing a self join we would prevent the same row by adding e.g. where a.id != b.id. However, sometimes the same row can match itself, depending on the columns being compared.
Jan, Thanks for the interesting query. Right now, this query will return:
The problem I have is with the second result, which make sense, but is probably not expected
Damien, What we are considering is adding a
where
clause after thematch
, which will allow you to define exclusions like that. I'm not sure that I'm following the point on asymmetric query.I believe it makes more sense to return all the results. If we need, we can always add a "where a.Id <> b.Id" clause.
Peter, See my example to Jan, which may explain it better. Yes, the problem is that it is matched to itself
Okay, so for the "asymmetric" query, I was thinking something along the lines of "Find all dogs which are two likes removed from dogs that Arava likes". if we express that as
(a:Dogs) -[:Likes] -> (b:Dogs) <-[:Likes]- (c:Dogs) <- [:Likes] - (d:Dogs)
.d
contains the dogs that are two likes removed, except those that like Arava, if we auto-filter to preventc
being equal toa
.Damien, Oh, I see. Yes. The end result is that we must implement some manner to allow the user to filter these.
where a.Id <> b.Id
is not very convenient butwhere a.Id > b.Id
is more general for de-duping so that triples can follow the same pattern: a.Id > b.Id > c.Id etc.wqw, I'm not sure that I follow why you would want
a.Id > b.Id
here. In particular, note that in RavenDB, ids are usually strings without lexical sorting, so I don't see how greater then would helpReturning all results would mean:
Which is a rather clear violation of the "principle of least astonishment". In my view, it's imperative that features behave as closely as possible to the human logical understanding of the feature and not necessarily with the widest mathematical understanding. The problem with allowing A to alias B is in multiple operand queries there would have to be lot's of filters added, which can lead to the number of filter clauses to become factorial to the number of operands:
IE:
(a:Dogs) -[:Likes] -> (b:Dogs) <-[:Likes]- (c:Dogs) <- [:Likes] - (d:Dogs) where a != c and a != d and b != d
IF someone whishes to express
(a:Dogs) -[:Likes] -> (b:Dogs) <-[:Likes]- (c:Dogs)
as a can euqual to b then it can do it(a:Dogs) -[:Likes] -> (b:Dogs) <-[:Likes]- (c:Dogs) or (a:Dogs) -[:Likes] -> (b:Dogs) <-[:Likes]- (a:Dogs)
but this query makes little sense to me.Pop, A better example might be something like:
(e:Employees)-[:Manager]->(m:Employees (Nice = true) )
How would you handle the case where you have a self managed employee that is nice?
@Oren
by default, I would like
e
andm
to be exclusive but for other purposes, I would like some form of set algebraic operators to indicate relationships between a and m, something like 'e supersetof m', which I personally think it's more expressive than boolean where conditions between elements (But this is my opinion, I could be biased here).Pop,
The problem is that users, I believe, will expect to match this behavior:
select e.Id, m.Id from Employees e join Employees m where e.Manager = m.Id
@Ayende,
While the expectation is true, the self-managed employee is usually the case of bugs in software. Whenever it happens to have a self-managed employee there's usually some bugs to go along with it. Some of which I've encountered myself:
While I fully agree about the expectation of the orthogonality of the feature: graph query should behave like Linq counterpart, Is still think to work with multiple relationships over a set, it a nuisance. I don't think there's a wrong or right way from the general sense, just a matter of preference for various types of users.
Pop, I'm not sure if you mean that graph queries should behave like Linq or should not, but I'm pretty sure that writing the above query with Linq would produce the same result, unless we explicitly do something to prevent it, no?
Yes, Makes sense. If the relationship exists it should appear in the Query.
Comment preview