The design of RavenDB’s attachments


Originally posted at 1/6/2011

I got a question on attachments in RavenDB recently:

I know that RavenDb allows for attachments. Thinking in terms of facebook photo albums - would raven attachments be suitable?

And one of the answers from the community was:

We use attachments and it works ok. We are using an older version of  RavenDB (Build 176 unstable), and the thing I wish would happen is that attachments were treated like regular documents in the DB. That way you could query them just like other documents. I am not sure if this was changed in newer releases, but there was talk about it being changed.

If I had to redesign again, I would keep attachments out of the DB cause they are resources you could easily off load to a CDN or cloud service like Amazon or Azure. If the files are in your DB, that makes it more work to optimize later.

In summary: You could put them in the DB, but you could also put ketchup on your ice cream. :)

I thought that this is a good point to stop and explain a bit about the attachment support in RavenDB. Let us start from the very beginning.

The only reason RavenDB has attachment support is that we wanted to support the notion of Raven Apps (see Couch Apps), which are completely hosted in RavenDB. That was the original impetus. Since then, attachments have evolved quite nicely. Attachments in RavenDB can have metadata, are replicated between nodes, can be cascade deleted when their document is deleted, and are HTTP cacheable.

One feature that was requested several times was automatically turning a binary property into an attachment in the client API. I vetoed that feature for several reasons:

  • It makes things more complicated.
  • It doesn’t actually give you much.
  • I couldn’t think of a good way to explain the rules governing this without it being too complex.
  • It encourages storing large binaries in the same place as the actual document.

Let us talk in concrete terms here, shall we? Here is my model class:

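using System.Drawing; // for Bitmap

public class User
{
  public string Id { get; set; }
  public string Name { get; set; }
  public byte[] HashedPassword { get; set; } // small binary value
  public Bitmap ProfileImage { get; set; }   // potentially large binary value
}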

From the point of view of the system, how is it supposed to make a distinction between HashedPassword (16 – 32 bytes, should be stored inside the User document) and ProfileImage (1 KB – 2 MB, should be stored as a separate attachment)?

What is worse, and the main reason why attachments are clearly separated from documents, is that there are some things that we don’t want to store inside our document, because that means that:

  • Whenever we pull the document out, we have to pull the image as well.
  • Whenever we index the document, we need to load the image as well.
  • Whenever we update the document we need to send the image as well.

Do you sense a theme here?

There is another issue: whenever we update the user, we invalidate all the cached user data. But when we are talking about large files, changing the password doesn’t mean that you need to invalidate the cached image. For that matter, I really want to be able to load all the images separately and concurrently. If they are stored in the document itself (or even if they are stored as an external attachment, with client magic to make it appear that they are in the document), you can’t do that.
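To make the separation concrete, here is a minimal sketch of the User model with the image moved out of the document. The ProfileImageKey property and the key format are hypothetical names chosen for illustration; they are not part of the RavenDB client API.

```csharp
public class User
{
    public string Id { get; set; }
    public string Name { get; set; }

    // Small (16 - 32 bytes), so it stays inside the User document.
    public byte[] HashedPassword { get; set; }

    // Hypothetical: key of a separately stored attachment,
    // e.g. "users/1/profile-image". The image bytes never travel
    // with the document when it is loaded, indexed or updated.
    public string ProfileImageKey { get; set; }
}
```

With this shape, changing the password rewrites only the small document, and the image can be fetched and cached on its own.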

You might be familiar with this screen:

[image: screenshot]

If we store the image in the Animal document, we run into all of the problems outlined above.

But if we store only a URL reference to the image, we can then:

  • Load all the images on the form concurrently.
  • Take advantage of HTTP caching.
  • Only update the images when they are actually changed.

Overall, that is a much nicer system all around.
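As a rough sketch of the concurrent loading described above, assume the Animal document carries only an ImageUrl (a hypothetical property name) and that the caller supplies the function that actually fetches the bytes. ImageLoader and LoadAllAsync are illustrative names as well, not RavenDB API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class Animal
{
    public string Id { get; set; }
    public string Name { get; set; }

    // Hypothetical: URL of the image, stored instead of the image bytes.
    public string ImageUrl { get; set; }
}

public static class ImageLoader
{
    // Starts one fetch per animal, then awaits all of them together,
    // so the images load concurrently rather than one after another.
    public static async Task<Dictionary<string, byte[]>> LoadAllAsync(
        IEnumerable<Animal> animals, Func<string, Task<byte[]>> fetch)
    {
        // ToDictionary is eager, so every fetch is already running here.
        var pending = animals.ToDictionary(a => a.Id, a => fetch(a.ImageUrl));
        await Task.WhenAll(pending.Values);
        return pending.ToDictionary(p => p.Key, p => p.Value.Result);
    }
}
```

With an HttpClient instance named client, the fetch delegate could be url => client.GetByteArrayAsync(url). Whether a given image is served from cache is then decided at the HTTP layer, independently of the Animal document.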