Attachments, RavenFS and scoping out the market
RavenFS is a pretty cool technology. It was designed to handle both very large files across geographically distributed environments and a large number of small files in a single datacenter. It has some really cool features, such as the ability to run metadata searches, delta replication, etc. And yet, pretty much all our customers are using it primarily as a way to handle a small set of binaries, typically strongly related to the documents. We also got a lot of feedback / worries from customers about the deprecation of attachments.
This post is intended to lay out some of our thoughts regarding this feature. And the idea is that we are going with the market. We are going to merge RavenFS back into RavenDB.
Instead of having files with metadata, we’ll reverse things: you’ll have documents with attachments. Let us consider the simplest example that I could conceive: users and profile pictures.
You are going to store the user’s information in the “users/1” document. And then you need to store the profile pic somewhere. You’ll be able to do that by pushing it into RavenDB as an attachment. An attachment is always going to be tied to a specific document, and if that document is deleted, all its attachments will also be deleted. So in this case, we’ll have a “profile.png” attachment on “users/1”.
Of course, you don’t have just a single profile picture, you also have a thumbnail of that. So after the user has uploaded their pic and you attached that to the user’s document, we’ll have an offline process to generate the thumbnail and attach that as well to the document.
Documents will have a metadata flag that will indicate whether they have attachments, and if they do, the metadata will contain the list of attachments they have. So loading the document will be enough to let you peek at all its attachments; however, loading an attachment will be a separate operation. You’ll always be able to access an attachment directly, naturally. Attachments won’t have metadata or the ability to be searched. Instead, you can define your indexes on documents, as you normally do, and go from there to the attachments you desire.
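To make that model concrete, here is a minimal in-memory sketch. It is not the RavenDB client API: the class, the method names, and the metadata keys (`has_attachments`, `attachments`) are all hypothetical, chosen only to illustrate attachments that are owned by a document, listed in its metadata, loaded separately, and removed together with it.

```python
class ToyStore:
    """Toy in-memory model of documents with attachments (hypothetical, not RavenDB)."""

    def __init__(self):
        self.docs = {}         # doc id -> {"data": dict, "metadata": dict}
        self.attachments = {}  # (doc id, attachment name) -> bytes

    def put_document(self, doc_id, data):
        self.docs[doc_id] = {"data": data, "metadata": {"has_attachments": False}}

    def put_attachment(self, doc_id, name, content):
        # An attachment is always tied to an existing document.
        if doc_id not in self.docs:
            raise KeyError(f"no such document: {doc_id}")
        self.attachments[(doc_id, name)] = content
        meta = self.docs[doc_id]["metadata"]
        names = meta.setdefault("attachments", [])
        if name not in names:
            names.append(name)
        meta["has_attachments"] = True

    def load_document(self, doc_id):
        # Loading the document exposes attachment names, not their contents.
        return self.docs[doc_id]

    def get_attachment(self, doc_id, name):
        # Fetching the binary itself is a separate operation.
        return self.attachments[(doc_id, name)]

    def delete_document(self, doc_id):
        # Deleting the document also deletes everything attached to it.
        del self.docs[doc_id]
        for key in [k for k in self.attachments if k[0] == doc_id]:
            del self.attachments[key]


store = ToyStore()
store.put_document("users/1", {"name": "Ryan"})
store.put_attachment("users/1", "profile.png", b"full-size image bytes")
store.put_attachment("users/1", "thumbnail.png", b"thumbnail bytes")
listed = store.load_document("users/1")["metadata"]["attachments"]
```

In this sketch, `listed` holds the two attachment names after a plain document load, and the bytes are only fetched when `get_attachment` is called.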
Adding / deleting / modifying an attachment will also update the etag of the document it is attached to (since it updates the document metadata). The attachment will receive the same etag as its document at the time of modification, and will be replicated in the same manner. Obviously, only new attachments will be replicated whenever the document is updated. A conflict on an attachment is also a conflict on the document, and will be resolved based on however the document conflict is resolved.
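The etag behavior can be sketched with a single increasing counter. This is an illustration under assumptions, not RavenDB's implementation: the class and method names are hypothetical, etags are modeled as plain integers, and "replicate only what is new" is modeled as "attachments whose etag is higher than the last one the destination has seen".

```python
import itertools


class EtagModel:
    """Toy model: a shared counter hands out etags (hypothetical, not RavenDB)."""

    def __init__(self):
        self._next_etag = itertools.count(1)
        self.doc_etags = {}         # doc id -> etag
        self.attachment_etags = {}  # (doc id, attachment name) -> etag

    def put_document(self, doc_id):
        self.doc_etags[doc_id] = next(self._next_etag)
        return self.doc_etags[doc_id]

    def put_attachment(self, doc_id, name):
        # Changing an attachment touches the document metadata, so the
        # document gets a new etag and the attachment receives the same one.
        etag = next(self._next_etag)
        self.doc_etags[doc_id] = etag
        self.attachment_etags[(doc_id, name)] = etag
        return etag

    def attachments_to_replicate(self, doc_id, last_seen_etag):
        # Only attachments modified after the destination's last etag are sent.
        return sorted(
            name
            for (owner, name), etag in self.attachment_etags.items()
            if owner == doc_id and etag > last_seen_etag
        )


model = EtagModel()
model.put_document("users/1")
profile_etag = model.put_attachment("users/1", "profile.png")
thumb_etag = model.put_attachment("users/1", "thumbnail.png")
pending = model.attachments_to_replicate("users/1", last_seen_etag=profile_etag)
```

Here the destination already has everything up to the etag of "profile.png", so only "thumbnail.png" remains to be sent.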
Because attachments reside in the same location as documents, we can now have a transaction that spans both a document and attachment (not necessarily to the same document, mind), which will make things easier on our users.
Comments
Any update on the interface and what it will look like? How does it compare to IDatabaseCommands.Put/GetAttachment, which are marked obsolete at the moment?
// Ryan
Heh, live and learn.
When Raven switched to RavenFS, I felt it was a little jarring -- disconnected from documents -- but seemingly more open-ended.
However, attachments fit my scenarios better. Attachments for me are always a small set of binaries and each attachment is always connected to a document. I've never used RavenFS for anything but this. So, I'm cool with this change back to attachments.
Cool. What's the tentative roadmap/release that will have this?
I actually really like it the way it is now. We use RavenFS heavily to store binary or large non-structured text files with lots of searching based on metadata. They are loosely coupled to documents in that sets of documents might share certain properties with the metadata of certain sets of files but that's it. But I suppose it's 6 of one vs. a half dozen of the other; we can move searchable properties out of metadata and into a document with attachments.
Ryan, The User Interface? Consider attachments in gmail, that is probably what we'll do.
From API perspective, the low level API is here: https://github.com/ravendb/ravendb/blob/v4.0/test/FastTests/Client/Attachments/AttachmentsCrud.cs#L33
But we'll also have something directly in the session, that will take part of the same tx.
David, This is going to be in 4.0
Will there be at least an option for a standalone RavenFS? We currently use RavenFS as it is intended right now, with fairly loose ties to the various document stores.
Going down the attachments route would require creating a document just so a file can be attached.
A better option might be tighter integration between a document store and a file store.
Hassan, This scenario will still be supported, yes. Effectively, you'll have a single document that contains all your attachments, and then you can have as many of those as plain binary data.
Sigh! it took me a couple of hours to migrate from attachments to RavenFS, because you said attachments would be removed. And now you want to bring attachments back?
I like the association of attachments with documents - that's exactly how I was always doing this. I just used the doc-Id / attachment-Id / file-path for this (e.g. docs/123 -> docs/123/images/small.jpg). I also like that you will be able to update documents and attachments in a single transaction - that was always something, that bothered me a little bit.
What I don't like: The metadata on the attachments / RavenFS files was quite useful (storing checksums, content-type information and stuff like that). And I'm not sure I like that adding/deleting/modifying attachments will also touch the associated document. (Maybe I would then end up having two docs: docs/123 and docs/123/attachments.meta, with only the latter having attachments.)
If I could make a wish, then it would be simple and straightforward attachments as they are right now, but with support for having them in the same transaction as the documents.
eTobi, It took several years to build RavenFS, I'm not happy that it isn't getting the amount of usage we expected it to. And yes, I'm aware that it isn't a good place to be in terms of changes.
Checksum & content type are natively stored for each attachment. Other metadata per attachment can be stored in the parent doc metadata easily enough. The /attachments trick will work well if you want to do this separation, and it brings you most of the way toward the same behaviors with attachments.
This sounds like a simpler approach, and I prefer it if bulk export / import picks these up in the same way as RavenDB attachments. It's a bit of a shame because we have used RavenFS in a number of places, but that's life. We will probably use attachments even more than RavenFS if it's simpler and has fewer moving parts.
What is the situation with cloning? This is important to us. I assume that if you clone a document the attachments are also cloned, but if you then update them it doesn't impact the original document?
Ian, What do you mean bulk import / export picks it up? It would be part of standard import / export, yes. Attachments go through replication and are managed just the same.
What do you mean by cloning? Attachments are actually stored internally in 2 places. There is the attachment data itself, which is stored with its hash, then you have references from the doc to the attachment by hash (de-dup). Duplicating the document will result in new entries for the attachments, and updating them will impact only the new doc.
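A rough sketch of that two-part layout, under stated assumptions: the names are hypothetical, SHA-256 is picked only for illustration (the hash actually used is not stated here), and unreferenced data is never cleaned up.

```python
import hashlib


class DedupStore:
    """Toy content-addressed attachment storage (hypothetical, not RavenDB internals)."""

    def __init__(self):
        self.blobs = {}  # content hash -> bytes, stored once per distinct content
        self.refs = {}   # (doc id, attachment name) -> content hash

    def put(self, doc_id, name, content):
        digest = hashlib.sha256(content).hexdigest()  # assumed hash function
        self.blobs.setdefault(digest, content)        # de-dup identical content
        self.refs[(doc_id, name)] = digest

    def get(self, doc_id, name):
        return self.blobs[self.refs[(doc_id, name)]]

    def copy_refs(self, src_doc_id, dst_doc_id):
        # Duplicating a document adds new reference entries, not new data.
        for (owner, name), digest in list(self.refs.items()):
            if owner == src_doc_id:
                self.refs[(dst_doc_id, name)] = digest


dedup = DedupStore()
dedup.put("users/1", "profile.png", b"original picture")
dedup.copy_refs("users/1", "users/2")
blobs_after_copy = len(dedup.blobs)
dedup.put("users/2", "profile.png", b"edited picture")
```

After the copy both documents point at the same stored bytes. Updating the attachment on "users/2" repoints only that document's reference, so "users/1" still resolves to the original content.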
Sorry for the delay... at the moment, we move tenants between environments by bulk export / import (often through Studio currently) and the RavenFS files need to be done separately, which is why having it work as attachments will be one less thing to manage.
Cloning: I mean copying a document in RavenDB by setting Id to null, evict / store & save changes. With attachments the copying of these references will be taken care of by the sound of it, rather than us having to manage this separately with calls to RavenFS.
In terms of migration to RavenDB 4.0, if we went through the process of moving away from RavenFS now and started using the previous RavenDB attachments engine, will it be backwards compatible? I would like to get ahead of the game now because we want to get RavenDB 4.0 up and running asap to get the performance benefits (especially indexing, which is painful for us). Or do we need to wait and do it against a new client API for attachments? Whilst slightly nervous about a switch to Voron / new stuff, we are excited by all the great work you & team have been doing and keen to get the benefits.
On that note, do you have updated timescales for beta / RTM at this stage so we can plan ahead?
Ian, Cloning in this manner will not work for attachments, no. What you are doing is creating a whole new document. The API makes it easy, sure, but the server gets a whole new document. You could do that with the attachments as well, though, by saving them as part of the new cloning feature. It will be done in the same transaction.
The new API will not be backward compatible.
The last alpha was just released, and we are now gearing up for the beta. The general idea is to do that in May.