Self “joins” in RavenDB
A customer had an interesting problem on the mailing list. He had the following model:
public class Case
{
    public string Id { get; set; }
    public string[] CasesCitedByThisCase { get; set; }
    public string Name { get; set; }
    // other interesting properties
}
And he needed to be able to query and sort by the case's properties, but also by its importance. A case's importance is determined by how many other cases cite it.
This seems hard to do, because you cannot access other documents during standard indexing, so you have to use map/reduce to make this work. The problem is that a plain map/reduce is not enough here; you need two disparate sets of information from the documents.
This is why we have multi-map, and we can create the appropriate index like this:
public class Cases_Search : AbstractMultiMapIndexCreationTask<Cases_Search.Result>
{
    public class Result
    {
        public string Id { get; set; }
        public int TimesCited { get; set; }
        public string Name { get; set; }
    }

    public Cases_Search()
    {
        AddMap<Case>(cases =>
            from c in cases
            select new
            {
                c.Id,
                c.Name,
                TimesCited = 0
            });

        AddMap<Case>(cases =>
            from c in cases
            from cited in c.CasesCitedByThisCase
            select new
            {
                Id = cited,
                Name = (string)null,
                TimesCited = 1
            });

        Reduce = results =>
            from result in results
            group result by result.Id into g
            select new
            {
                Id = g.Key,
                Name = g.Select(x => x.Name).FirstOrDefault(x => x != null),
                TimesCited = g.Sum(x => x.TimesCited)
            };
    }
}
We are basically doing two passes on the cases: one to get the actual case information, the other to get the information about the cases it cites. We then take all of that information and reduce it together, resulting in the output that we wanted.
The really nice thing about this? This being a RavenDB index, all of the work is done once, and queries on this are going to be really cheap.
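To make this concrete, here is a minimal sketch of how such an index might be queried from the client, sorting cases by how often they are cited. The store variable and the exact sorting behavior are assumptions (older client versions may require explicitly configuring sorting for the TimesCited field), so treat this as illustrative rather than canonical:

using (var session = store.OpenSession())
{
    // Query the map/reduce results directly, ordering by the computed citation count.
    var mostCited = session.Query<Cases_Search.Result, Cases_Search>()
        .OrderByDescending(x => x.TimesCited)
        .Take(10)
        .ToList();

    foreach (var result in mostCited)
    {
        Console.WriteLine("{0} ({1}) was cited {2} times", result.Name, result.Id, result.TimesCited);
    }
}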
Comments
Hi Ayende,
cool stuff. And published totally at the right time. I just have this "problem" ahead of me.
One interesting question would be if it is possible to count not only the immediate children, but all children below the current item. There are a few problems with that approach, because this only works if you have a directed, loop-free graph. But if you take an org chart as an example, it is not possible that one person is both above and below another person in the hierarchy.
Do you also have a solution to count all descendants?
Thanks Andreas
Andreas, If you have all of the parents in each doc, that is easy. Otherwise, it gets complex and probably not worth it.
Ayende, I was not trying to have a node have multiple parents, but rather to leave the structure as you showed it, only instead of counting only immediate children I'd like to count all children below a node.
A
|-- B
\-- C
    |-- D
    \-- E
So B, D and E would have a child count of 0.
C has a child count of 2.
A has a child count of 4.
Would that be possible to achieve with map/reduce?
Thanks Andreas
Sorry, the comment engine reformatted the picture of the nodes
A has children B and C.
C has children D and E.
B, D and E have no children.
Here's another interesting case. Say I have some kind of hierarchical data, and I want to show its tree. So when accessing a page /a/b/c/d we'd show a tree that shows all the parentage (a, b, and c), and d and its children.
Suppose we model our documents so that they only have the Children property and not a Parent (or Ancestry), could you create a Map/Reduce index that returns the ancestry for an object?
Andreas, You would have to have all of the parents available in the document. So:
e -> a, b, c, d
d -> a, b, c
c -> a, b
b -> a
a
If you don't have this, then no, you would have to do multiple passes, and we don't allow it.
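For what it's worth, if the model were changed so that each document carried the full list of its ancestors, counting all descendants becomes the same kind of multi-map/reduce as the citation count above. The Node class and its Ancestors property here are hypothetical, purely to illustrate the shape:

public class Node
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string[] Ancestors { get; set; } // all ancestors, not just the direct parent (hypothetical model)
}

public class Nodes_DescendantCount : AbstractMultiMapIndexCreationTask<Nodes_DescendantCount.Result>
{
    public class Result
    {
        public string Id { get; set; }
        public int Descendants { get; set; }
    }

    public Nodes_DescendantCount()
    {
        // Each node contributes a zero count for itself...
        AddMap<Node>(nodes => from n in nodes
                              select new { n.Id, Descendants = 0 });

        // ...and a count of one to every one of its ancestors.
        AddMap<Node>(nodes => from n in nodes
                              from ancestor in n.Ancestors
                              select new { Id = ancestor, Descendants = 1 });

        Reduce = results => from result in results
                            group result by result.Id into g
                            select new { Id = g.Key, Descendants = g.Sum(x => x.Descendants) };
    }
}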
Configurator, No need for map/reduce here, you could do that, but you would require multiple queries (O(H), where H is the height).
The question is, could you prepare a map/reduce index so that it can be done in a single query, e.g. the results would be { document: "docs/17", parents: [ "docs/4", "docs/9", "docs/13" ] }
Configurator, If the Children property on the parents was recursive (So it contains ALL children, not just direct ones), yes, easily. Otherwise, no.
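As a rough sketch of what that could look like under that assumption (a recursive Children property containing all descendants, which is not part of the original model), something along these lines would produce the { document, parents } shape Configurator asked for:

public class HierarchyNode
{
    public string Id { get; set; }
    public string[] Children { get; set; } // all descendants, not just direct children (hypothetical model)
}

public class Nodes_Ancestry : AbstractIndexCreationTask<HierarchyNode, Nodes_Ancestry.Result>
{
    public class Result
    {
        public string Document { get; set; }
        public string[] Parents { get; set; }
    }

    public Nodes_Ancestry()
    {
        // Each node emits one entry per (recursive) child, naming itself as a parent.
        Map = nodes => from n in nodes
                       from child in n.Children
                       select new { Document = child, Parents = new[] { n.Id } };

        // Grouping by child collects all of its ancestors into a single array.
        Reduce = results => from result in results
                            group result by result.Document into g
                            select new
                            {
                                Document = g.Key,
                                Parents = g.SelectMany(x => x.Parents).ToArray()
                            };
    }
}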
Just curious about how re-indexing works in RavenDB.
Say I delete a Case, or update a Case (add/remove CasesCitedByThisCase entries) - do you re-run map/reduce on the whole Case document collection?
Ajai
Hi Ayende,
thanks for the clarification. You are right, as usual, that this requirement would indicate a needed change in the model itself. A change in a single node could otherwise trigger a lot of changes in the index, and that really would not be performant at all.
Since Configurator was also interested in some sort of hierarchical information... how would you tackle such a task so it could be implemented with RavenDB?
The classical example in SQL is having an org-chart (person and his/her manager) and the usage of a hierarchical function. You can query for a single node and have all its children or all its parents in the "flat" output.
How would that translate to RavenDB? A recursive function in SQL is no "index" of any sort, but rather a function running on the server to "iterate" over the hierarchical layers and aggregate the data.
How would you perform that sort of task in RavenDB?
Thanks Andreas
Ajai, We would re-run things for the specific case that was updated/deleted.
Andreas, RavenDB contains builtin support for recursion. Here is an example of one such index:
from ex in examples
from ov in ex.Overrides
let ancestry = Recurse(ov, x => x.Parent)
select new
{
    ex.Id,
    ov.OwnerId,
    Name = ov.OverriddenValues["Name"] ??
           (ancestry.Any(x => x.OverriddenValues["Name"] != null)
                ? ancestry.First(x => x.OverriddenValues["Name"] != null).OverriddenValues["Name"]
                : ex.Name),
    Description = ov.OverriddenValues["Description"] ??
           (ancestry.Any(x => x.OverriddenValues["Description"] != null)
                ? ancestry.First(x => x.OverriddenValues["Description"] != null).OverriddenValues["Description"]
                : ex.Description)
}
You could do this in the transform results and load the entire hierarchy along with the child object.
Hi Ayende
Not understanding this correctly.
Say:
C1 cites C2 & C3
C4 cites C2
Run map/reduce:
C2 -> TimesCited = 2
C3 -> TimesCited = 1
When C1 is deleted, how can indexing work by reprocessing just the deleted record?
Do we not need to rerun map/reduce on the whole collection to recompute TimesCited for all the cases? Or are intermediate results somehow also stored?
Any pointers on the implementation or link to relevant source is much appreciated...
Thanks
Ajai
Ajai, We store some of the intermediate results, exactly for this purpose. It basically works by saying:
Delete C1, figure out what part of the map/reduce would be affected, re-run those parts.
Thanks Ayende
That is super cool! Will try to dig into the implementation...
Ajai
Hi Ayende,
man.... I must admit you never cease to amaze with RavenDB. I never thought that recursive actions would be part of the concept.
Awesome!!!
Is there any source I could read more about those things?
Thanks Andreas
Andreas, The mailing list is the best place for that. We have a BIG documentation push coming up.
This sort of stuff is trivially simple with graph databases. If the relationships between records are equally as important as the record contents, I would recommend you check out graph db's and some of the algorithms that they make simple.
Oren: I know you were looking at implementing a graph db on top of Raven at some point, but I understand that you dropped it due to market size? I wish you had continued with that.
Jack, I could have created a bad impl of a graph db, or created a really awesome doc db.