Attachments, RavenFS and scoping out the market
RavenFS is a pretty cool technology. It was designed to handle both very large files across geographically distributed environments and large numbers of small files in a single datacenter. It has some really cool features, such as the ability to run metadata searches, delta replication, etc. And yet, pretty much all our customers are using it primarily as a way to handle a small set of binaries, typically strongly related to their documents. We also got a lot of feedback and worries from customers about the deprecation of attachments.
This post is intended to lay out some of our thoughts regarding this feature. And the idea is that we are going with the market. We are going to merge RavenFS back into RavenDB.
Instead of having files with metadata, we’ll reverse things: you’ll have documents with attachments. Let us consider the simplest example that I could conceive of: users and profile pictures.
You are going to store the user’s information in the “users/1” document. And then you need to store the profile pic somewhere. You’ll be able to do that by pushing it into RavenDB as an attachment. An attachment is always going to be tied to a specific document, and if that document is deleted, all its attachments will also be deleted. So in this case, we’ll have a “profile.png” attachment on “users/1”.
Of course, you don’t have just a single profile picture, you also have a thumbnail of that. So after the user has uploaded their pic and you attached that to the user’s document, we’ll have an offline process to generate the thumbnail and attach that as well to the document.
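To make the ownership rule concrete, here is a minimal in-memory sketch of a document with two attachments and the cascading delete. It is a toy model, not the RavenDB client API: the `ToyStore` class, its method names, the “thumbnail.png” name, and the sample user data are all assumptions made up for illustration.

```python
class ToyStore:
    """In-memory stand-in for a database (hypothetical, not the RavenDB API)."""

    def __init__(self):
        self.documents = {}    # doc_id -> document body (dict)
        self.attachments = {}  # (doc_id, attachment name) -> bytes

    def put_document(self, doc_id, body):
        self.documents[doc_id] = body

    def put_attachment(self, doc_id, name, data):
        # An attachment is always tied to a specific, existing document.
        if doc_id not in self.documents:
            raise KeyError(f"no such document: {doc_id}")
        self.attachments[(doc_id, name)] = data

    def delete_document(self, doc_id):
        # Deleting the document also deletes all of its attachments.
        del self.documents[doc_id]
        for key in [k for k in self.attachments if k[0] == doc_id]:
            del self.attachments[key]


store = ToyStore()
store.put_document("users/1", {"name": "Jane"})
store.put_attachment("users/1", "profile.png", b"full-size image bytes")
# Later, an offline process attaches the generated thumbnail.
store.put_attachment("users/1", "thumbnail.png", b"thumbnail bytes")

attached_before = sorted(name for (_doc, name) in store.attachments)
store.delete_document("users/1")
```

After the delete, both the document and its two attachments are gone, which is the behavior described above.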
Documents will have a metadata flag that indicates whether they have attachments, and if they do, the metadata will contain the list of those attachments. So loading the document will be enough to let you peek at all its attachments; loading an attachment, however, will be a separate operation. Naturally, you’ll always be able to access an attachment directly. Attachments won’t have metadata of their own or the ability to be searched; instead, you can define your indexes on documents, as you normally do, and go from there to the attachments you want.
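As a rough sketch of how that could look from the outside, the snippet below keeps the attachment list in the document’s metadata and loads the attachment bytes in a separate call. The `metadata`, `has_attachments`, and `attachments` keys, the function names, and the sample data are assumptions for illustration only; the actual metadata layout is not specified here.

```python
# Toy model: documents carry metadata, attachment bytes live elsewhere.
documents = {"users/1": {"name": "Jane", "metadata": {}}}
blobs = {}  # (doc_id, attachment name) -> bytes


def put_attachment(doc_id, name, data):
    """Store the bytes and record the attachment in the document metadata."""
    blobs[(doc_id, name)] = data
    meta = documents[doc_id]["metadata"]
    names = meta.setdefault("attachments", [])
    if name not in names:
        names.append(name)
    meta["has_attachments"] = True


def load_document(doc_id):
    """Loading the document returns its metadata, but no attachment bytes."""
    return documents[doc_id]


def load_attachment(doc_id, name):
    """Loading the attachment content is a separate operation."""
    return blobs[(doc_id, name)]


put_attachment("users/1", "profile.png", b"full-size image bytes")
put_attachment("users/1", "thumbnail.png", b"thumbnail bytes")

user = load_document("users/1")
listed = user["metadata"]["attachments"]          # peek at what is attached
picture = load_attachment("users/1", listed[0])   # fetch the bytes only when needed
```

The point of the split is that a document load stays cheap even when the attachments are large.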
Adding, deleting, or modifying an attachment will also update the etag of the document it is attached to (since it updates the document metadata). The attachment will receive the same etag as its document at the time of modification, and will be replicated in the same manner. Obviously, only new attachments will be replicated whenever the document is updated. A conflict on an attachment is also a conflict on the document, and will be resolved however the document conflict is resolved.
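One way to picture the etag behavior is with a single increasing counter shared by documents and attachments. The sketch below is a toy model under that assumption: the counter, the function names, and the idea of picking attachments to replicate by comparing against the last replicated etag are illustrative guesses, not a description of the real replication protocol.

```python
import itertools

_next_etag = itertools.count(1)  # toy etag generator: 1, 2, 3, ...
doc_etags = {}                   # doc_id -> etag
attachment_etags = {}            # (doc_id, attachment name) -> etag


def put_document(doc_id):
    doc_etags[doc_id] = next(_next_etag)


def put_attachment(doc_id, name):
    # Changing an attachment changes the document metadata, so the document
    # gets a new etag and the attachment is stamped with that same etag.
    etag = next(_next_etag)
    doc_etags[doc_id] = etag
    attachment_etags[(doc_id, name)] = etag


def attachments_to_replicate(doc_id, last_replicated_etag):
    # Only attachments modified after the last replicated etag are sent.
    return sorted(
        name
        for (owner, name), etag in attachment_etags.items()
        if owner == doc_id and etag > last_replicated_etag
    )


put_document("users/1")
put_attachment("users/1", "profile.png")
checkpoint = doc_etags["users/1"]          # pretend replication ran here
put_attachment("users/1", "thumbnail.png")

pending = attachments_to_replicate("users/1", checkpoint)
```

In this model, `pending` contains only the thumbnail, because the profile picture was already covered by the earlier checkpoint.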
Because attachments reside in the same location as documents, we can now have a transaction that spans both a document and an attachment (not necessarily on the same document, mind), which will make things easier for our users.
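A minimal sketch of what such a transaction buys you, assuming an all-or-nothing commit: writes to a document and to an attachment (here on a different document) are staged together and either both land or neither does. `ToyTransaction` and its methods are hypothetical names for this illustration and say nothing about how RavenDB implements transactions internally.

```python
class ToyTransaction:
    """Stages document and attachment writes, then applies them together."""

    def __init__(self, documents, attachments):
        self._documents = documents
        self._attachments = attachments
        self._doc_writes = {}
        self._attachment_writes = {}

    def put_document(self, doc_id, body):
        self._doc_writes[doc_id] = body

    def put_attachment(self, doc_id, name, data):
        self._attachment_writes[(doc_id, name)] = data

    def commit(self):
        # Validate everything first, so a failure leaves shared state untouched.
        for doc_id, _name in self._attachment_writes:
            if doc_id not in self._documents and doc_id not in self._doc_writes:
                raise KeyError(f"no such document: {doc_id}")
        self._documents.update(self._doc_writes)
        self._attachments.update(self._attachment_writes)


documents = {"users/1": {"name": "Jane"}}
attachments = {}

# One transaction: a new document plus an attachment on a different document.
tx = ToyTransaction(documents, attachments)
tx.put_document("users/2", {"name": "John"})
tx.put_attachment("users/1", "profile.png", b"image bytes")
tx.commit()

# A transaction with one invalid write applies none of its writes.
bad = ToyTransaction(documents, attachments)
bad.put_document("users/3", {"name": "Ann"})
bad.put_attachment("users/404", "profile.png", b"image bytes")
try:
    bad.commit()
    rolled_back = False
except KeyError:
    rolled_back = True
```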