Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969




RavenDB and complex tagging

time to read 3 min | 418 words

On the RavenDB mailing list, we got a question about tagging. In this case, the application needs:

1. Tags have identity ("set" has a different meaning if I'm talking about math, music or sports).

2. I want to know who tagged what and when.

3. I want to do this once, as a service, so I don't need to have ids in each document I want to tag. In my app, there are many such document types.

Let us see how we can approach this in RavenDB. We are going to do it like so:

[Images: the album and tag documents]

Note that because tags have identity, we store only the tag id inside the tagged object, along with the required information about who & when it was tagged.
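The images of the documents did not survive. As a rough sketch of the shape described above, here are the documents modeled as Python dataclasses. The field names (`tag_id`, `tagged_by`, `tagged_at`, and the hypothetical `vocabulary` field borrowed from a comment below) and the sample ids are assumptions, not the original RavenDB classes.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Tag:
    """A tag is a document of its own, so it has an id (e.g. "tags/1")."""
    id: str
    name: str
    vocabulary: str  # hypothetical field that disambiguates equal names


@dataclass
class TagReference:
    """What a tagged document stores: the tag id plus who tagged it and when."""
    tag_id: str
    tagged_by: str
    tagged_at: datetime


@dataclass
class Album:
    id: str
    title: str
    tags: list[TagReference] = field(default_factory=list)


# Two distinct tags can share the name "set" because identity comes from the id.
set_music = Tag("tags/1", "set", "music")
set_math = Tag("tags/2", "set", "math")

album = Album("albums/1", "Some Album")
album.tags.append(TagReference(set_music.id, "users/1", datetime(2011, 9, 1)))
```

The album never copies the tag's name; it only points at the tag document, which is what keeps "set" (music) and "set" (math) apart.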

Now, let us try to have some fun with this. Let us say that I want to be able to show, given a specific album, all the albums that have any of the same tags as the specified album.

We start by defining the following index:

[Image: the index definition]

Note that the naming convention matches what we would expect using the default Linq convention, so we can easily query this index using Linq.
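The index definition was also an image. Conceptually, a map index over the albums emits one entry per tag reference, and the (Lucene-backed) index then acts as a lookup from tag id to the albums carrying it. The in-memory Python model below only illustrates that idea; it is not RavenDB's index API, and the `Tags`/`Id` field names and sample documents are assumptions.

```python
from collections import defaultdict

# Hypothetical album documents: only the tag ids matter for the index.
albums = {
    "albums/1": {"Tags": [{"Id": "tags/1"}, {"Id": "tags/2"}]},
    "albums/2": {"Tags": [{"Id": "tags/2"}]},
    "albums/3": {"Tags": [{"Id": "tags/3"}]},
}


def map_album(doc_id, doc):
    """Map step: emit one (tag id, album id) entry per tag the album carries."""
    for tag in doc["Tags"]:
        yield tag["Id"], doc_id


def build_index(docs):
    """Collect the map output into an inverted index: tag id -> album ids."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for tag_id, album_id in map_album(doc_id, doc):
            index[tag_id].add(album_id)
    return index


index = build_index(albums)
print(sorted(index["tags/2"]))  # ['albums/1', 'albums/2']
```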

And now we want to query it, which, assuming that we are starting with albums/1, will look like:

[Image: the query]

This translates to "show me all of the albums that share any of the specified tags, except albums/1".
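The semantics of that query can be sketched in plain Python: collect the tag ids of the starting album, take the union of the albums indexed under any of them (an OR across tags), and drop the starting album. The data and the `related_albums` function are hypothetical; this mirrors what the query asks for, not how the RavenDB client expresses it.

```python
from collections import defaultdict

# Hypothetical album documents; only the tag ids are needed for this query.
albums = {
    "albums/1": {"Tags": [{"Id": "tags/1"}, {"Id": "tags/2"}]},
    "albums/2": {"Tags": [{"Id": "tags/2"}]},
    "albums/3": {"Tags": [{"Id": "tags/3"}]},
    "albums/4": {"Tags": [{"Id": "tags/1"}, {"Id": "tags/3"}]},
}


def related_albums(docs, album_id):
    """Ids of albums sharing at least one tag with album_id, excluding album_id."""
    # Stand-in for the index: tag id -> ids of albums carrying that tag.
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for tag in doc["Tags"]:
            index[tag["Id"]].add(doc_id)

    wanted = {tag["Id"] for tag in docs[album_id]["Tags"]}
    matches = set()
    for tag_id in wanted:      # "any of the specified tags" is an OR
        matches |= index[tag_id]
    matches.discard(album_id)  # "... except albums/1"
    return sorted(matches)


print(related_albums(albums, "albums/1"))  # ['albums/2', 'albums/4']
```

albums/3 is left out because it shares no tag with albums/1.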

And this is pretty much it, to be fair. Oh, if you want to show the tag names you'll have to include the actual tag documents, but there really isn't anything complex going on.
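An include asks the server to send the referenced documents along with the results, so the client can resolve tag names locally instead of issuing a lookup per tag. The function below only mimics that behavior with dictionaries; `load_with_includes`, the field names and the data are hypothetical, not the RavenDB client API.

```python
# Hypothetical tag and album documents, keyed by document id.
tags = {
    "tags/1": {"Name": "rock", "Vocabulary": "music"},
    "tags/2": {"Name": "jazz", "Vocabulary": "music"},
}
albums = {
    "albums/2": {"Tags": [{"Id": "tags/2", "By": "users/1", "When": "2011-09-01"}]},
    "albums/4": {"Tags": [{"Id": "tags/1", "By": "users/2", "When": "2011-09-02"},
                          {"Id": "tags/2", "By": "users/2", "When": "2011-09-02"}]},
}


def load_with_includes(album_docs, tag_docs, album_ids):
    """Return the requested albums plus each referenced tag document, once."""
    results = [album_docs[album_id] for album_id in album_ids]
    # A set of ids: a tag referenced by several albums is still fetched once.
    referenced = {ref["Id"] for doc in results for ref in doc["Tags"]}
    included = {tag_id: tag_docs[tag_id] for tag_id in referenced}
    return results, included


results, included = load_with_includes(albums, tags, ["albums/2", "albums/4"])
for doc in results:
    # Tag names are resolved locally, with no further lookups per tag.
    print([included[ref["Id"]]["Name"] for ref in doc["Tags"]])
```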

But what about the 3rd requirement?

Well, it isn't really meaningful. You can move this Tags collection to a layer supertype, but if you want to be able to do nice tagging with RavenDB, this is probably the easiest way to go.
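Moving the collection into a layer supertype means every taggable document class inherits the same Tags property and tagging code, while the tag references still live inside each document. A minimal Python sketch; `Taggable`, `Article`, the `tag` method and the field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TagReference:
    tag_id: str
    tagged_by: str
    tagged_at: datetime


@dataclass
class Taggable:
    """Hypothetical layer supertype: subclasses inherit the Tags collection."""
    id: str
    tags: list[TagReference] = field(default_factory=list)

    def tag(self, tag_id: str, user_id: str, when: datetime) -> None:
        """Record that user_id applied tag_id at the given time."""
        self.tags.append(TagReference(tag_id, user_id, when))


@dataclass
class Album(Taggable):
    title: str = ""


@dataclass
class Article(Taggable):
    headline: str = ""


album = Album("albums/1", title="Some Album")
album.tag("tags/1", "users/1", datetime(2011, 9, 1))
article = Article("articles/1", headline="Some Headline")
```

The tagging code is written once, but each document type still carries its own tag references, which is what lets the index and query above work without a separate tagging store.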


Comments

alexidsa

I thought you didn't like it when document child elements have ids...

henry

What if 5000 users add 5 tags to a single doc? Isn't the album doc becoming pretty large (and slow to load/deserialize) with a Tags array of 25000 items then?

ccollie

@alexidsa - the reason ids are needed is that tag names can be ambiguous. If I add the tag "rock", am I talking about geology or music? Imagine you're a newspaper, with multiple sections (music, outdoors, etc). Searching for "rock" without context can return incorrect results.

The ids in my case point to a Term which has a name, like "rock", but also a Vocabulary ("sports", "entertainment", or whatever) which helps to provide context and disambiguate the tag name.

ccollie

@henry - definitely a consideration. In my case, I don't anticipate there being more than 5 per document.

But you are correct in that this can drive the decision of how documents are structured. For instance, I have a few document types which can be voted on, and there is a much higher probability that the number of votes could exceed the amount I'd like to keep in a single document. So voting information is split into a separate document.

Ayende Rahien

alexidsa, That is a reference to another document, not an internal id.

Rafal

But with such a design you have to perform a join/distinct operation to display tags for an album. IMHO it would be better to modify the structure of the document like:

Tags: [ { Id: 'tags/1', Name: 'Tag 1', Tagged: [{By: 'user/1', When: '2010-01-01'}, ...]}, (... and so on) ]

Ayende Rahien

Rafal, I won't have to do ANY join/distinct operation. Those are relational concepts. I can include the related docs, and that is a cheap operation.

Rafal

That's nice - I thought the include function works only for single doc references.

Chanan Braunstein

In this scheme, what is in the Tag document? Does it have a "purpose" here? Why not just store the tag name in the album instead of the id and remove the need for the tag document? I am guessing it has something to do with the first requirement, but I am not getting it.

Ayende Rahien

Chanan, For example, it might have fields like: "Name", "Description", "IsAdult", etc. Users might want to follow a tag, and you want to track how many are doing so. It is an actual entity in the system

Daniel Lang

I would denormalize the label here as well. A tag's label is something that almost never changes, and when it does, it is OK to have a batch operation that goes through all the albums and updates the label accordingly.

John

Is it also possible to return the count of matching tags from the index?

Ayende Rahien

John, You already have the list of tags, just count them.


