Ayende @ Rahien

Refunds available at head office

RavenDB and complex tagging

On the RavenDB mailing list, we got a question about tagging. In this case, the application needs the following:

1. Tags have identity ("set" has a different meaning depending on whether I'm talking about math, music or sports).

2. I want to know who tagged what and when.

3. I want to do this once, as a service, so I don’t need to have ids in each document I want to tag. In my app, there are many such document types.

Let us see how we can approach this in RavenDB. We are going to do it like so:

[Images: sample tag and album documents]

Note that because tags have identity, we store only the tag id inside the tagged object, along with the required information about who tagged it and when.
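As a rough illustration of that shape, here is a minimal sketch using plain Python dictionaries in place of RavenDB documents. The field names (`Name`, `Description`, `Title`, `TaggedBy`, `TaggedAt`) and all the sample values are assumptions made for the example, not taken from the original images.

```python
# Tag documents, keyed by document id. Two tags share the name "set"
# but have different identities. (All values are hypothetical.)
tags = {
    "tags/1": {"Name": "set", "Description": "a collection of distinct elements (math)"},
    "tags/2": {"Name": "set", "Description": "the songs played at one performance (music)"},
}

# The tagged document stores only the tag id, plus who tagged it and when.
album = {
    "Id": "albums/1",
    "Title": "Live Set",
    "Tags": [
        {"Id": "tags/2", "TaggedBy": "users/1", "TaggedAt": "2012-07-04T09:38:00"},
    ],
}
```

Because the album holds a reference rather than the tag's name, the two "set" tags stay unambiguous, and the who/when information lives next to the reference.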

Now, let us try to have some fun with this. Let us say that I want to be able to show, given a specific album, all the albums that have any of the same tags as the specified album.

We start by defining the following index:

[Image: index definition]

Note that the naming convention matches what we would expect using the default Linq convention, so we can easily query this index using Linq.
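Since the index definition itself isn't reproduced here, the following is only a conceptual sketch of what such an index does: it projects each album into a flat entry listing the ids of its tags. It is plain Python, not RavenDB's index syntax, and the field name `Tags_Id` (the path Tags, then Id, joined with an underscore) is an assumption about the naming convention the text refers to.

```python
def map_album(album):
    """Project one album document into the flat entry an index might store.

    `Tags_Id` is an assumed field name, not confirmed by the original post.
    """
    return {
        "album_id": album["Id"],  # hypothetical key pointing back to the source document
        "Tags_Id": [tag["Id"] for tag in album["Tags"]],
    }


# Hypothetical sample document.
sample_album = {
    "Id": "albums/1",
    "Tags": [
        {"Id": "tags/1", "TaggedBy": "users/1", "TaggedAt": "2012-07-01T10:00:00"},
        {"Id": "tags/2", "TaggedBy": "users/2", "TaggedAt": "2012-07-02T11:30:00"},
    ],
}

entry = map_album(sample_album)
```

Only the tag ids end up in the entry; the who/when details stay in the document because the query does not need them.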

And now we want to query it, which, assuming that we are starting with albums/1, will look like:

[Image: query]

This translates to “show me all of the albums that share any of the specified tags, except albums/1”.
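The logic of that query can be sketched in memory as follows. This is plain Python over a list of dictionaries, not a RavenDB query; the function name `albums_sharing_tags` and the sample data are hypothetical.

```python
def albums_sharing_tags(albums, source_id):
    """Return ids of albums that share at least one tag with the source album."""
    by_id = {album["Id"]: album for album in albums}
    wanted = {tag["Id"] for tag in by_id[source_id]["Tags"]}
    return [
        album["Id"]
        for album in albums
        # Skip the source album itself, then keep any album whose tag ids
        # overlap with the source album's tag ids.
        if album["Id"] != source_id
        and wanted & {tag["Id"] for tag in album["Tags"]}
    ]


# Hypothetical sample data; the who/when fields are omitted for brevity.
albums = [
    {"Id": "albums/1", "Tags": [{"Id": "tags/1"}, {"Id": "tags/2"}]},
    {"Id": "albums/2", "Tags": [{"Id": "tags/2"}]},
    {"Id": "albums/3", "Tags": [{"Id": "tags/3"}]},
    {"Id": "albums/4", "Tags": [{"Id": "tags/1"}, {"Id": "tags/3"}]},
]

related = albums_sharing_tags(albums, "albums/1")
# related == ["albums/2", "albums/4"]
```

In the database the index does this matching, so no scan over all albums is needed; the sketch only shows which documents should come back.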

And this is pretty much it, to be fair. Oh, if you want to show the tag names you’ll have to include the actual tag documents, but there really isn’t anything complex going on.
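Conceptually, including the tag documents means fetching the referenced documents alongside the album so the names can be shown without a separate lookup per tag. A minimal in-memory sketch, where a plain dictionary called `store` stands in for the database and `load_album_with_tags` is a hypothetical helper (this is not the RavenDB client API):

```python
def load_album_with_tags(store, album_id):
    """Fetch an album together with every tag document it references."""
    album = store[album_id]
    tag_docs = {ref["Id"]: store[ref["Id"]] for ref in album["Tags"]}
    return album, tag_docs


# Hypothetical store contents.
store = {
    "tags/1": {"Name": "set"},
    "tags/2": {"Name": "live"},
    "albums/1": {
        "Id": "albums/1",
        "Tags": [
            {"Id": "tags/1", "TaggedBy": "users/1", "TaggedAt": "2012-07-04T09:38:00"},
            {"Id": "tags/2", "TaggedBy": "users/2", "TaggedAt": "2012-07-04T14:27:00"},
        ],
    },
}

album, tag_docs = load_album_with_tags(store, "albums/1")
tag_names = [tag_docs[ref["Id"]]["Name"] for ref in album["Tags"]]
```

The album still stores only ids; the names are resolved from the tag documents at display time.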

But what about the 3rd requirement?

Well, it isn’t really meaningful. You can move this Tags collection to a layer supertype, but if you want to be able to do nice tagging with RavenDB, this is probably the easiest way to go.

Posted By: Ayende Rahien

Comments

07/04/2012 09:38 AM by alexidsa

I thought you don't like when document child elements have ids...

07/04/2012 02:27 PM by henry

What if 5000 users each add 5 tags to a single doc? Wouldn't the album doc become pretty large (and slow to load/deserialize) with a Tags array of 25000 items?

07/04/2012 03:35 PM by ccollie

@alexidsa - the reason ids are needed is that tag names can be ambiguous. If I add the tag "rock", am I talking about geology or music? Imagine you're a newspaper with multiple sections (music, outdoors, etc.). Searching for "rock" without context can return incorrect results.

The ids in my case point to a Term which has a name, like "rock", but also a Vocabulary ("sports", "entertainment", or whatever) which helps to provide context and disambiguate the tag name.

07/04/2012 03:39 PM by ccollie

@henry - definitely a consideration. In my case, I don't anticipate there being more than 5 per document.

But you are correct that this can drive the decision of how documents are structured. For instance, I have a few document types which can be voted on, and there is a much higher probability that the number of votes could exceed the amount I'd like to keep in a single document. So voting information is split into a separate document.

07/04/2012 11:08 PM by Ayende Rahien

alexidsa, That is a reference to another document, not an internal id.

07/05/2012 08:04 AM by Rafal

But with such a design you have to perform a join/distinct operation to display tags for an album. IMHO it would be better to modify the structure of the document like:

Tags: [ { Id: 'tags/1', Name: 'Tag 1', Tagged: [{By: 'user/1', When: '2010-01-01'}, ...]}, (... and so on) ]

07/05/2012 08:29 AM by Ayende Rahien

Rafal, I won't have to do ANY join/distinct operation. Those are relational concepts. I can include the related docs, and that is a cheap operation.

07/05/2012 09:51 AM by Rafal

That's nice - I thought the include function works only for single doc references.

07/05/2012 01:28 PM by Chanan Braunstein

In this scheme, what is in the Tag document? Does it have a "purpose" here? Why not just store the tag name in the album instead of the id and remove the need for the tag document? I am guessing it has something to do with the first requirement, but I am not getting it.

07/05/2012 01:50 PM by Ayende Rahien

Chanan, for example, it might have fields like "Name", "Description", "IsAdult", etc. Users might want to follow a tag, and you want to track how many are doing so. It is an actual entity in the system.

07/07/2012 05:32 PM by Daniel Lang

I would denormalize the label here as well. A tag's label is something that almost never changes, and when it does, it is OK to have a batch operation that goes through all the albums and updates the label accordingly.

07/07/2012 09:45 PM by John

Is it also possible to return the count of matching tags from the index?

07/07/2012 11:17 PM by Ayende Rahien

John, You already have the list of tags, just count them.
