What is new in RavenDB 3.0: RavenFS
A frequent request from RavenDB users was the ability to store binary data. Be that actual documents (PDF, Word), images (user’s photo, accident images, medical scans) or very large items (videos, high resolution aerial photos).
RavenDB can do that, sort of, with attachments. But attachments were never a first class feature in RavenDB.
With RavenFS, files now have first class support. Here is a small screenshot; a detailed description of how it works follows below.
The Raven File System exposes a set of files, which are binary data with a specific key. However, unlike a simple key/value store, RavenFS does much more than just store the binary values.
It was designed upfront to handle very large files (multiple GBs) efficiently, at both the API and storage layers. It goes so far as to find common data patterns in distinct files (or even within the same file) and store a single reference to them, instead of duplicating the information. RavenFS is a replicated and highly available system; updating a file will only send the changes made to that file between the two nodes, not the full file. This lets you update very large files and replicate only the changes. This works even if you upload the file from scratch; you don't have to deal with that manually.
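The "send only the changes" idea can be sketched with a simple fixed-size block signature scheme, in the spirit of rsync. This is an illustrative sketch only, not the actual RavenFS algorithm (which works at its storage layer with its own chunking); `DeltaSyncSketch` and its block size are invented for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical sketch of delta replication: hash the blocks the other
// node already has, then send only the blocks it is missing.
public static class DeltaSyncSketch
{
    // Hash each fixed-size block of the file the destination already holds.
    public static HashSet<string> ComputeSignatures(byte[] data, int blockSize)
    {
        var sigs = new HashSet<string>();
        using (var md5 = MD5.Create())
        {
            for (int i = 0; i < data.Length; i += blockSize)
            {
                int len = Math.Min(blockSize, data.Length - i);
                sigs.Add(Convert.ToBase64String(md5.ComputeHash(data, i, len)));
            }
        }
        return sigs;
    }

    // Return only the blocks of the updated file whose hashes the
    // destination does not already know; those bytes must be sent.
    public static List<byte[]> BlocksToSend(byte[] newData, int blockSize, HashSet<string> knownSigs)
    {
        var toSend = new List<byte[]>();
        using (var md5 = MD5.Create())
        {
            for (int i = 0; i < newData.Length; i += blockSize)
            {
                int len = Math.Min(blockSize, newData.Length - i);
                var sig = Convert.ToBase64String(md5.ComputeHash(newData, i, len));
                if (!knownSigs.Contains(sig))
                    toSend.Add(newData.Skip(i).Take(len).ToArray());
            }
        }
        return toSend;
    }
}
```

With a scheme like this, appending a few bytes to a multi-GB file only costs the transfer of the blocks that actually changed, which is the effect the post describes.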
Files aren’t just binary data. Files have metadata associated with them, and that metadata is available for searching. If you want to find all of Joe’s photos from May 2014, you can do that easily. The client API was carefully structured to give you full functionality even when sitting in a backend server: you can stream a value from one end of the system to the other without having to do any buffering.
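Conceptually, metadata search is just filtering files by their key/value pairs. Here is a tiny self-contained sketch in plain C# (no RavenFS types involved; the file names and metadata keys are invented for illustration) of what a query like "Joe's photos from May 2014" selects:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Self-contained illustration of metadata-driven lookup: files are
// located by querying their key/value metadata, not by path.
public static class MetadataSearchSketch
{
    // Given a map of file name -> metadata, return the names of files
    // whose metadata contains the given key with the given value.
    public static List<string> FindFiles(
        Dictionary<string, Dictionary<string, string>> files,
        string key, string value)
    {
        return files
            .Where(f => f.Value.TryGetValue(key, out var v) && v == value)
            .Select(f => f.Key)
            .OrderBy(name => name)
            .ToList();
    }
}
```

In RavenFS itself this filtering happens server-side over indexed metadata, as the client code below shows.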
Let us see how this works from the client side, shall we?
var fileStore = new FilesStore
{
    Url = "http://localhost:8080",
    DefaultFileSystem = "Northwind-Assets"
};

using (var fileSession = fileStore.OpenAsyncSession())
{
    var stream = File.OpenRead("profile.png");
    var metadata = new RavenJObject
    {
        {"User", "users/1345"},
        {"Formal", true}
    };
    fileSession.RegisterUpload("images/profile.png", stream, metadata);
    await fileSession.SaveChangesAsync(); // actually upload the file
}

using (var fileSession = fileStore.OpenAsyncSession())
{
    var file = await fileSession.Query()
        .WhereEquals("Formal", true)
        .FirstOrDefaultAsync();

    var stream = await fileSession.DownloadAsync(file.Name);
    using (var localFile = File.Create("profile.png"))
    {
        await stream.CopyToAsync(localFile);
    }
}
First of all, you start by creating a FilesStore, similar to RavenDB’s DocumentStore, and then open a session. RavenFS is fully async; we don’t provide any sync API. The common scenario involves large files, where blocking operations are simply not going to cut it.
Now we upload a file to the server. Note that at no point do we need to actually have the file in memory: we open a stream to the file and register that stream to be uploaded. Only when we call SaveChangesAsync will we actually read from that stream and write to the file store. You can also see that we are specifying metadata on the file; later, we are going to be searching on that metadata. The result of the search is a FileHeader object, which is useful if you want to show the user a list of matching files. To actually get the contents of a file, you call DownloadAsync. Here, again, we don’t load the entire file into memory; instead, you get a stream for the contents of the file that you can send to its final destination.
A pretty simple and highly efficient process, overall.
RavenFS also has all the usual facilities you need from a data storage system, including full & incremental backups, full replication and high availability features. And while it has the usual file system folder model, to encourage familiarity, the most common usage is actually as a metadata driven system, where you locate a desired file by searching its metadata.
More posts in "What is new in RavenDB 3.0" series:
- (24 Sep 2014) Meta discussion
- (23 Sep 2014) Operations–Optimizations
- (22 Sep 2014) Operations–the nitty gritty details
- (22 Sep 2014) Operations–production view
- (19 Sep 2014) Operations–the pretty pictures tour
- (19 Sep 2014) SQL Replication
- (18 Sep 2014) Queries improvements
- (17 Sep 2014) Query diagnostics
- (17 Sep 2014) Indexing enhancements
- (16 Sep 2014) Indexing backend
- (15 Sep 2014) Simplicity
- (15 Sep 2014) JVM Client API
- (12 Sep 2014) Client side
- (11 Sep 2014) The studio
- (11 Sep 2014) RavenFS
- (10 Sep 2014) Voron
Comments
What license is RavenFS? I found the github repository, but there's no license info in there...
This is, I mean, yeah, this is pretty impressive. A lot.
Matthijs, There isn't a separate repository for RavenFS any more. What you are looking at is a very old remnant. RavenFS is licensed under the same license as RavenDB.
In your code example you're not using the metadata object while storing so I don't think this will work like this. Also it doesn't compile I guess (missing a semicolon at the end).
Koen, Thanks, I updated the code sample.
This is very cool. Will there be any tool to upgrade/migrate existing attachments in Raven 2.5 to RavenFS?
Great news - I've been eagerly awaiting this feature. There are many things we're planning to do with it :-)
"If you want to find all of Joe’s photos from May 2014, you can do that easily."
Damn it, iCloud.
This looks really great! Have you thought about how you will price this on RavenHQ?
Mike, Yes, there is such a tool, it is part of the 3.0 release dist.
Olav, This is just the 3.0 release status. The RavenHQ stuff and especially billing is something that will be handled separately.
Cool. Just wondering, if my client already purchased Raven DB 2.x, will they need to purchase a new license for 3.0?
Mike, That depends on the type of license they purchased. If they went with the subscription model, they can just upgrade and there is no issue with versioning. If your client purchased a one time license, that requires a new license purchase (we do provide a 15% discount).
What about file versioning, will RavenFS support that as well?
Steven, Currently we haven't implemented automatic versioning. Considering the fact that we are looking at handling very large files, that is something that we wanted the user to have a choice about. It is probable we'll add that once we have enough customer feedback.
Ayende, What would happen if the file was edited at the same time at different sites? How does RavenFS handle this?
Matt, That would generate a conflict, just like in RavenDB. You would be asked to resolve that conflict, and everything would go on as usual.
Thanks for the quick response. Did you ever consider the ability to 'lock' a file?
Matt, When you upload a file to a node, that is locked _on that node_. We don't do distributed locks, however.
Have you considered adding WebDav support?
Rik, Not at the moment, no.
This is going to be a killer feature I think, even on projects using an sql database.
I've been on a few projects recently where we wanted file storage like this but things like amazon weren't an option
Ayende, do you see RavenFS replacing the need for, say, Azure blob storage, or AWS's S3 or other storage option? If so, up to what point? What would be use case threshold where one might say, "O.K., we've outstripped RavenFS's ability to meet demand; time to move to Azure or AWS"?
Eric, RavenFS is there to provide more than just blob storage. To start with, it is replicated and has rich metadata capabilities (including searching).
The use case is quite different. A common use case can be distributing large files across multiple nodes (data bus), storing information in a way that allows you to do fast queries on their metadata, and avoiding moving the data to a remote location.
Where are the files stored? And can we control where? Just thinking that where I have the RavenDB server may not have the ability to store gigabytes of data.
Shmueli, We store them in a data directory, inside a single file on the file system. You can control where that happens, yes.
Ayende, can I implement sharding with RavenFS? How?
Nannez, That is probably something that we need to discuss over email. It can be done, yes, but you need to handle the distribution yourself right now.