Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

RavenDB and complex tagging


In the RavenDB mailing list, we got a question about tagging. In this case, the application needs the following:

1. Tags have identity  ("set" has a different meaning if I'm talking math, music or sports).

2. I want to know who tagged what and when.

3. I want to do this once, as a service, so I don’t need to have ids in each document I want to tag. In my app, there are many such document types.

Let us see how we can approach this in RavenDB. We are going to do it like so:


Note that because tags have identity, we store only the tag id inside the tagged object, along with the required information about who & when it was tagged.
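The original illustration of the document layout is not reproduced here. As a minimal sketch of what such documents could look like, here they are as plain Python dictionaries rather than RavenDB client code. All ids and field names (`Tags`, `Id`, `By`, `When`, `Name`, `Description`) are assumptions made for illustration:

```python
# Tag documents are entities with their own ids, so "set" in math and
# "set" in music can be two different tags. (Hypothetical sample data.)
tags = {
    "tags/1": {"Name": "set", "Description": "math"},
    "tags/2": {"Name": "set", "Description": "music"},
}

# The album stores only the tag id, plus who applied the tag and when.
album = {
    "Id": "albums/1",
    "Title": "Example album",
    "Tags": [
        {"Id": "tags/2", "By": "users/1", "When": "2010-01-01"},
        {"Id": "tags/2", "By": "users/7", "When": "2010-02-15"},
    ],
}


def who_tagged(doc, tag_id):
    """Return (user, date) pairs for every time tag_id was applied to doc."""
    return [(t["By"], t["When"]) for t in doc["Tags"] if t["Id"] == tag_id]
```

Because the who and when live next to the tag id, a call such as `who_tagged(album, "tags/2")` can answer the second requirement without touching the tag documents at all.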

Now, let us try to have some fun with this. Let us say that I want to be able to show, given a specific album, all the albums that have any of the same tags as the specified album.

We start by defining the following index:


Note that the naming convention matches what we would expect using the default Linq convention, so we can easily query this index using Linq.
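The index definition itself is not reproduced in this copy. Conceptually, the map step emits, for each album, the ids of all the tags on it. A rough in-memory sketch of that idea in Python follows; this is not RavenDB's index syntax, and the field name `Tags_Id` and the document shape are assumptions:

```python
def map_album(album):
    """Map step: one index entry per album, listing the distinct tag ids on it."""
    return {
        "Id": album["Id"],
        "Tags_Id": sorted({t["Id"] for t in album["Tags"]}),
    }


# Hypothetical sample documents.
albums = [
    {"Id": "albums/1", "Tags": [
        {"Id": "tags/1", "By": "users/1", "When": "2010-01-01"},
        {"Id": "tags/2", "By": "users/2", "When": "2010-01-03"},
        {"Id": "tags/1", "By": "users/3", "When": "2010-01-05"},
    ]},
    {"Id": "albums/2", "Tags": [
        {"Id": "tags/2", "By": "users/1", "When": "2010-01-04"},
    ]},
]

index_entries = [map_album(a) for a in albums]
```

The same tag applied by several users collapses to a single id per album in this sketch, since the query only cares whether a tag is present.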

And now we want to query it, which, assuming that we are starting with albums/1, will look like:


This translates to “show me all of the albums that share any of the specified tags, except albums/1”.
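The query code is also missing from this copy. The logic it describes can be sketched in plain Python over in-memory documents; this is not the RavenDB Linq API, and the function name and document shape are assumptions:

```python
def albums_sharing_tags(albums, album_id):
    """Ids of all albums that share at least one tag with album_id, excluding it."""
    by_id = {a["Id"]: a for a in albums}
    wanted = {t["Id"] for t in by_id[album_id]["Tags"]}
    return [
        a["Id"]
        for a in albums
        if a["Id"] != album_id and wanted & {t["Id"] for t in a["Tags"]}
    ]


# Hypothetical sample documents.
albums = [
    {"Id": "albums/1", "Tags": [{"Id": "tags/1"}, {"Id": "tags/2"}]},
    {"Id": "albums/2", "Tags": [{"Id": "tags/2"}]},
    {"Id": "albums/3", "Tags": [{"Id": "tags/3"}]},
]

related = albums_sharing_tags(albums, "albums/1")
```

Here `albums/2` matches because it carries `tags/2`, while `albums/3` shares nothing with `albums/1` and is left out.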

And this is pretty much it, to be fair. Oh, if you want to show the tag names you’ll have to include the actual tag documents, but there really isn’t anything complex going on.
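To make the "include the tag documents" step concrete, here is a small Python sketch that collects the tag ids referenced by an album, fetches those tag documents in one batch, and resolves the names. The dictionary `tag_store` stands in for the database, and every name here is hypothetical rather than part of the RavenDB client:

```python
# Stand-in for the stored tag documents (hypothetical sample data).
tag_store = {
    "tags/1": {"Name": "rock"},
    "tags/2": {"Name": "live"},
    "tags/3": {"Name": "jazz"},
}

album = {
    "Id": "albums/1",
    "Tags": [
        {"Id": "tags/2", "By": "users/1", "When": "2010-01-01"},
        {"Id": "tags/1", "By": "users/4", "When": "2010-01-09"},
    ],
}


def load_tag_names(album, store):
    """Fetch the referenced tag documents once, then resolve each tag's name."""
    referenced = {t["Id"] for t in album["Tags"]}
    included = {tag_id: store[tag_id] for tag_id in referenced}
    return [included[t["Id"]]["Name"] for t in album["Tags"]]


names = load_tag_names(album, tag_store)
```

Only the tag documents the album actually references are fetched, which is the point of including related documents instead of loading them one by one.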

But what about the 3rd requirement?

Well, it isn’t really meaningful. You can move this Tags collection to the layer supertype, but if you want to be able to do nice tagging with RavenDB, this is probably the easiest way to go.



I thought you don't like when document child elements have ids...


What if 5000 users add 5 tags to a single doc? Isn't the album doc becoming pretty large (and slow to load/deserialize) with a Tags array of 25000 items then?


@alexidsa - the reason ids are needed is that tag names can be ambiguous. If I add the tag "rock", am I talking about geology or music? Imagine you're a newspaper, with multiple sections (music, outdoors, etc). Searching for "rock" without context can return incorrect results.

The ids in my case point to a Term which has a name, like "rock", but also a Vocabulary ("sports", "entertainment", or whatever) which helps to provide context and disambiguate the tag name.


@henry - definitely a consideration. In my case, I don't anticipate there being more than 5 per document.

But you are correct in that this can drive the decision of how documents are structured. For instance, I have a few document types which can be voted on, and there is a much higher probability that the number of votes could exceed the amount I'd like to keep in a single document. So voting information is split into a separate document.

Ayende Rahien

alexidsa, That is a reference to another document, not an internal id.


But with such design you have to perform a join/distinct operation to display tags for an album. Imho it would be better to modify the structure of the document like:

Tags: [ { Id: 'tags/1', Name: 'Tag 1', Tagged: [{By: 'user/1', When: '2010-01-01'}, ...]}, (... and so on) ]

Ayende Rahien

Rafal, I won't have to do ANY join/distinct operation. Those are relational concepts. I can include the related docs, and that is a cheap operation.


That's nice - I thought the include function works only for single doc references.

Chanan Braunstein

In this scheme, what is in the Tag document? Does it have a "purpose" here? Why not just store the tag name in the album instead of the id and remove the need for the tag document? I am guessing it has something to do with the first requirement, but I am not getting it.

Ayende Rahien

Chanan, For example, it might have fields like: "Name", "Description", "IsAdult", etc. Users might want to follow a tag, and you want to track how many are doing so. It is an actual entity in the system

Daniel Lang

I would denormalize the label here as well. A tag's label is something that almost never changes, and when it does, it is OK to have a batch operation that goes through all the albums and updates the label accordingly.


Is it also possible to return the count of matching tags from the index?

Ayende Rahien

John, You already have the list of tags, just count them.
