The design of RavenDB’s attachments
Originally posted at 1/6/2011
I got a question on attachments in RavenDB recently:
I know that RavenDb allows for attachments. Thinking in terms of facebook photo albums - would raven attachments be suitable?
And one of the answers from the community was:
We use attachments and it works ok. We are using an older version of RavenDB (Build 176 unstable), and the thing I wish would happen is that attachments were treated like regular documents in the DB. That way you could query them just like other documents. I am not sure if this was changed in newer releases, but there was talk about it being changed.
If I had to redesign again, I would keep attachments out of the DB cause they are resources you could easily off load to a CDN or cloud service like Amazon or Azure. If the files are in your DB, that makes it more work to optimize later.
In summary: You could put them in the DB, but you could also put ketchup on your ice cream. :)
I thought this was a good point at which to stop and explain a bit about the attachment support in RavenDB. Let us start from the very beginning.
The only reason RavenDB has attachment support is that we wanted to support the notion of Raven Apps (see Couch Apps), which are completely hosted in RavenDB. That was the original impetus. Since then, attachments have evolved quite nicely. Attachments in RavenDB can have metadata, are replicated between nodes, can be cascade deleted when their document is deleted, and are HTTP cacheable.
One feature that was requested several times was to have the client API automatically turn a binary property into an attachment. I vetoed that feature for several reasons:
- It makes things more complicated.
- It doesn’t actually give you much.
- I couldn’t think of a good way to explain the rules governing this without it being too complex.
- It encourages storing large binaries in the same place as the actual document.
Let us talk in concrete terms here, shall we? Here is my model class:
public class User
{
    public string Id { get; set; }
    public string Name { get; set; }
    public byte[] HashedPassword { get; set; }
    public Bitmap ProfileImage { get; set; }
}
From the point of view of the system, how is it supposed to distinguish between HashedPassword (16 – 32 bytes, should be stored inside the User document) and ProfileImage (1 KB – 2 MB, should be stored as a separate attachment)?
What is worse, and the main reason why attachments are clearly separated from documents, is that there are some things that we don’t want to store inside our document, because that means that:
- Whenever we pull the document out, we have to pull the image as well.
- Whenever we index the document, we need to load the image as well.
- Whenever we update the document we need to send the image as well.
Do you sense a theme here?
There is another issue: whenever we update the user, we invalidate all of the cached user data. But when we are talking about large files, changing the password doesn’t mean that you need to invalidate the cached image. For that matter, I really want to be able to load all the images separately and concurrently. If they are stored in the document itself (or even if they are stored as an external attachment, with client magic to make it appear that they are in the document), you can’t do that.
You might be familiar with this screen:
If we store the image in the Animal document, we run into all of the problems outlined above.
But if we store only a URL reference to the image in the document, we can then:
- Load all the images on the form concurrently.
- Take advantage of HTTP caching.
- Only update the images when they are actually changed.
Overall, that is a much nicer system all around.
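To make the URL reference approach concrete, here is a minimal sketch. It is not RavenDB client API code: the Animal class shape, the ImageLoader helper and the example URLs are all assumptions made up for illustration. The document holds only a URL, and the images are fetched concurrently through whatever fetch function you hand in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical document shape: the image is referenced by URL, not embedded,
// so loading, indexing or updating the document never touches the image bytes.
public class Animal
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string ImageUrl { get; set; }
}

// Hypothetical helper, not part of any RavenDB API.
public static class ImageLoader
{
    // Starts one fetch per animal up front, then awaits all of them together.
    public static async Task<Dictionary<string, byte[]>> LoadAllAsync(
        IEnumerable<Animal> animals,
        Func<string, Task<byte[]>> fetch)
    {
        // ToDictionary is eager, so every fetch has started before we await any.
        var pending = animals.ToDictionary(a => a.Id, a => fetch(a.ImageUrl));
        await Task.WhenAll(pending.Values);
        return pending.ToDictionary(p => p.Key, p => p.Value.Result);
    }
}

public static class Program
{
    public static async Task Main()
    {
        var animals = new[]
        {
            // The URLs are placeholders, not real endpoints.
            new Animal { Id = "animals/1", Name = "Arava", ImageUrl = "http://localhost:8080/static/animals/1.jpg" },
            new Animal { Id = "animals/2", Name = "Oscar", ImageUrl = "http://localhost:8080/static/animals/2.jpg" },
        };

        // Stand-in fetch so the sketch runs without a server. In real code you
        // could pass HttpClient.GetByteArrayAsync and let HTTP caching do its job.
        Func<string, Task<byte[]>> fakeFetch = async url =>
        {
            await Task.Delay(10);
            return new byte[url.Length];
        };

        var images = await ImageLoader.LoadAllAsync(animals, fakeFetch);
        foreach (var pair in images)
            Console.WriteLine($"{pair.Key}: {pair.Value.Length} bytes");
    }
}
```

Because the fetch is passed in as a delegate, changing the animal's name or any other field never forces the image to be sent or re-downloaded; only a changed image URL (or a changed image behind it) does.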
Comments
I agree with what you were saying. I don't know if my comment was unclear in any way, but I meant that if you do decide to have attachments in your DB instance, it would be great if RavenDB created a standard Attachment/File document that corresponded to each attachment, kind of like metadata but more visible than metadata. At the very least it would be great to have a Raven index that could show you the attachments.
You are right about replicating attachments, but why even do it to start with (from a developer's standpoint)? Depending on the size of your attachments, this could end up being costly to replicate. If you stored files in a system separate from the DB, it just becomes way more flexible for you when the time comes to refactor the storage aspect of your application.
I think we have the same understanding, but I just learned it the hard way. :)
Khalid,
Attachments already have metadata, and I am not sure what having a document referencing an attachment would give you.
The ability to index on the attachments?
What is the use case behind this?
For replication, we need to replicate attachments if we are replicating documents, because it is not good to tell users "yes, we support attachments, oh, but not for replication".
The ability to index on the attachments? Yes
What is the use case behind this? If I created a Wiki and allowed arbitrary attachments (google groups), I would probably end up creating a "File" model within my domain. I am stating that the notion of a file is probably standard enough that Raven could have a default File document, so you wouldn't have to add it to your domain every time you start an application that deals with files. If you want a different "File", create one that fits your domain better.
standard file (IMO).
File {
}
something like RavenFile ravenFile = Session.Attachments.Store(file); or RavenFile ravenFile = Session.Attachments.GetFileInfo(key);
some examples.
I agree with you here it would be silly. What I meant, was I wouldn't store attachments in the database cause that functionality does exist. I wouldn't want it to replicate large files, but I know it will. So to avoid the replicating of large attachments, the only option is not to put them into the database.
Khalid,
That sounds like something that can be done safely externally. And if it can be done safely & easily externally, I would much rather have it done that way.
It reduces the amount of complexity that I have to deal with.
Thanks for posting this; as one of the people who frequently requested this, your explanation of the situation from a technical perspective really helped to fill in the gaps.
I completely agree with you that it can be safely and easily done externally to RavenDB. Also much easier for you since you (Oren) don't have to maintain it.
Ultimately if I had to choose again, I would have created a concept of a File in my application and stored the binary files separately from RavenDB.
I guess to clarify, RavenDB has attachments but your recommendation (mine as well) is not to build your application (.NET) storage on attachments. The attachments were to facilitate storage for web based apps (HTML5, Javascript) that ran on the client. Is that correct?
Khalid,
Yes, that is the case.
It might be even more accurate to state that attachments are mostly there as light weight option if you have really simplistic needs.
You should put "attachments are mostly there as light weight option if you have really simplistic needs." In H1 tags on the documentation page. That sums it up perfectly in a short and sweet statement.
I noticed you talking about the concept of Raven Apps. Is this something that is still on the roadmap, and / or do you have any plans on releasing documentation / samples?? (I can't find any information on the RavenDB website at the moment)
Cheers
Sam
Khalid,
I don't think so.
They can be used for additional stuff, and I don't want to be too limiting
Sam,
It just works. We have a JS API available that you can use, and that is about it :-)