What is new in RavenDB 3.0RavenFS

time to read 5 min | 802 words

A frequent request from RavenDB users was the ability to store binary data. Be that actual documents (PDF, Word), images (user’s photo, accident images, medical scans) or very large items (videos, high resolution aerial photos).

RavenDB can do that, sort of, with attachments. But attachments were never a first class feature in RavenDB.

With RavenFS, files now have first class support. Here is a small screen shot, I’ve a detailed description of how it works below.

image

The Raven File System exposes a set of files, which are binary data with a specific key. However, unlike a simple key/value store, RavenFS does much more than just store the binary values.

It was designed upfront to handle very large files (multiple GBs) efficiently at API and storage layers level. To the point where it can find common data patterns in distinct files (or even in the same file) and just point to it, instead of storing duplicate information. RavenFS is a replicated and highly available system, updating a file will only send the changes made to the file between the two nodes, not the full file. This lets you update very large files, and only replicate the changes. This works even if you upload the file from scratch, you don’t have to deal with that manually.

Files aren’t just binary data. Files have metadata associated with them, and that metadata is available for searching. If you want to find all of Joe’s photos from May 2014, you can do that easily. The client API was carefully structured to give you full functionality even when sitting in a backend server, you can stream a value from one end of the system to the other without having to do any buffering.

Let us see how this works from the client side, shall we?

var fileStore = new FilesStore()
{
    Url = "http://localhost:8080",
    DefaultFileSystem = "Northwind-Assets",
};

using(var fileSession = fileStore.OpenAsyncSession())
{
    var stream = File.OpenRead("profile.png");
    var metadata = new RavenJObject
    {
        {"User", "users/1345"},
        {"Formal": true}
    };
    fileSession.RegisterUpload("images/profile.png", stream, metadata);
    await fileSession.SaveChangesAsync(); // actually upload the file
}

using(var fileSession = fileStore.OpenAsyncSession())
{
    var file = await session.Query()
                    .WhereEquals("Formal", true)
                    .FirstOrDefaultAsync();

    var stream = await session.DownloadAsync(file.Name);

    var file = File.Create("profile.png");

    await stream.CopyToAsync(file);
}

First of all, you start by creating a FileStore, similar to RavenDB’s DocumentStore, and then create a session. RavenFS is fully async, and we don’t provide any sync API. The common scenario is using for large files, where blocking operations are simply not going cut it.

Now, we upload a file to the server, note that at no point do we need to actually have the file in memory. We open a stream to the file, and register that stream to be uploaded. Only when we call SaveChangesAsync will we actually read from that stream and write to the file store. You can also see that we are specifying metadata on the file. Later, we are going to be searching on that metadata. The results of the search is a FileHeader object, which is useful if you want to show the user a list of matching files. To actually get the contents of the file, you call DownloadAsync. Here, again, we don’t load the entire file to memory, but rather will give you a stream for the contents of the file that you can send to its final destination.

Pretty simple, and highly efficient process, overall.

RavenFS also has all the usual facilities you need from a data storage system, including full & incremental backups, full replication and high availability features. And while it has the usual file system folder model, to encourage familiarity, the most common usage is actually as a metadata driven system, where you locate a desired file based searching.

More posts in "What is new in RavenDB 3.0" series:

  1. (24 Sep 2014) Meta discussion
  2. (23 Sep 2014) Operations–Optimizations
  3. (22 Sep 2014) Operations–the nitty gritty details
  4. (22 Sep 2014) Operations–production view
  5. (19 Sep 2014) Operations–the pretty pictures tour
  6. (19 Sep 2014) SQL Replication
  7. (18 Sep 2014) Queries improvements
  8. (17 Sep 2014) Query diagnostics
  9. (17 Sep 2014) Indexing enhancements
  10. (16 Sep 2014) Indexing backend
  11. (15 Sep 2014) Simplicity
  12. (15 Sep 2014) JVM Client API
  13. (12 Sep 2014) Client side
  14. (11 Sep 2014) The studio
  15. (11 Sep 2014) RavenFS
  16. (10 Sep 2014) Voron