RavenDB Webinar: Aggregation just jumped a grade or two…
In tomorrow’s Webinar, we will discuss how to handle dynamic aggregation using RavenDB. A new feature in 2.5, it is meant to give you more options for reporting queries, including complex aggregation, dynamic selection, etc.
You can register here: https://www2.gotomeeting.com/register/789291530
The difference between Ordering & Boosting
This seems to be a pretty common issue, with people getting the two confused. As an example, let us take the users in Stack Overflow:
Here, we want to get all the users in descending order of reputation.
But what happens when we want to do an actual search? For example, we want to get users by tag. Perhaps we want to find someone who knows some RavenDB.
Here is the data that we have to work with:
Now, when searching, we want to be able to do the following: find users that match the tags we specified, are relevant, and show up in reputation order.
And that is where it kills us. Relevancy & order are pretty much exclusive. Before we can explain that, we need to understand that order is absolute, but relevancy is not. If I have 10,000 tags, whether or not I have a particular tag means very little. But if I have 10 tags, having that tag or not is a lot more significant. You want to talk with an expert in a specific field, not just someone who is a jack of all trades.
Now, it might be that you want to apply some boost factor to users with high reputation, because there are people who are jacks of all trades and masters of most. That is the difference between boosting and ordering.
Ordering is absolute, while boosting is a factor applied against the relative relevancy of the current query.
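To make the distinction concrete, here is a small sketch in JavaScript (the language RavenDB uses for server-side scripts). It is not how RavenDB or Lucene actually score documents: the relevance formula (matches divided by the square root of the user's tag count, loosely echoing Lucene's length normalization), the log-scale reputation boost, and the sample users are all assumptions for illustration.

```javascript
// Hypothetical users: an expert with few tags, a jack of all trades, and a non-match.
const users = [
  { name: "specialist", reputation: 5000, tags: ["ravendb", "lucene"] },
  { name: "generalist", reputation: 90000, tags: ["ravendb", ...Array.from({ length: 99 }, (_, i) => `tag${i}`)] },
  { name: "unrelated", reputation: 120000, tags: ["java"] },
];

// Relevance is relative to the query: a match counts for less when the user has many tags.
function relevance(user, queryTags) {
  const matches = user.tags.filter(t => queryTags.includes(t)).length;
  return matches / Math.sqrt(user.tags.length);
}

// Ordering: an absolute sort key applied to every match, regardless of how relevant it is.
function orderByReputation(queryTags) {
  return users
    .filter(u => relevance(u, queryTags) > 0)
    .sort((a, b) => b.reputation - a.reputation)
    .map(u => u.name);
}

// Boosting: reputation only scales the relevance score, so it nudges rather than dictates.
function boostByReputation(queryTags, weight = 0.1) {
  return users
    .map(u => ({ name: u.name, score: relevance(u, queryTags) * (1 + weight * Math.log10(u.reputation)) }))
    .filter(u => u.score > 0)
    .sort((a, b) => b.score - a.score)
    .map(u => u.name);
}
```

For the query `["ravendb"]`, ordering puts the high-reputation generalist first, while boosting keeps the specialist on top and merely lets reputation adjust the scores.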
How not to deal with Replication Lag
Because RavenDB replication is async in nature, there is a period of time between the moment a write is committed on the master and the moment it is visible to clients reading from the secondaries.
A user requested that we provide a low latency solution to that. The idea was that the master server would notify the secondaries that a write happened, and they would then mark all reads of those documents as dirty until replication caught up.
Implementation wise, everything needed is already in place. We have the Changes API, which is an easy way to get changes from a database, and we have the ability to return a 203 Non-Authoritative Information response, so it looks easy.
In theory, it sounds reasonable, but this idea just doesn’t hold water. Let us talk about normal operations. Even with the “low latency” notifications (and replication is about as low latency as it gets already), we have to deal with a window of time between the write completing on the master and the notification arriving on the secondaries. In fact, it is the exact same window as with replication. Sure, if you have a high replication load, that might be different, but those cases tend to be rare (high write load, very big documents, etc).
But let us assume that this is really the case. What about failures?
Let us assume servers A & B and client C. Client C makes a write to A, A notifies B, and when C reads from B, it gets a 203 response until A replicates to B. All nice & dandy. But what happens when A can’t talk to B? Remember, a server being down is the easiest scenario; the hard part is when both A & B are operational, but can’t talk to one another. RavenDB is designed to gracefully handle network splits and merges, so what would happen in this case?
Client C writes to A, but A can’t notify B or replicate to it. Client C reads from B, but since B got no notification about a change, it returns a 200 OK response, which means that this is the latest version. Problem.
This is actually a bigger problem than you might think. If we support the notifications under the standard scenario, users will make assumptions about them. They will have separate code paths for non-authoritative responses, for example. But as we have seen, we have a window of time where the reply would say it is authoritative even though it isn’t (a very short one, sure, but still), and under failure scenarios we will outright lie.
It is better not to have this “feature” at all, and let the user handle that on his own (and there are ways to handle that, reading from the master for important stuff, for example).
RavenDB Clusters & Write Assurances
RavenDB handles replication in an async manner. Let us say that you have 5 nodes in your cluster, set to use master/master replication.
That means that when you call SaveChanges(), the value is saved to a single node, and then replicated to the other nodes. But what happens when you have safety requirements? What happens if a node goes down after the call to SaveChanges() completed, but before it replicated the information out?
In other systems, you have the ability to specify a W factor: the number of nodes a value must be written to before it is considered “safe”. In RavenDB, we decided to go a similar route. Here is the code:
await session.StoreAsync(user);
await session.SaveChangesAsync(); // save to one of the nodes

var userEtag = session.Advanced.GetEtagFor(user);

// wait until at least one replica has replicated up to (or past) this etag
var replicas = await store.Replication.WaitAsync(etag: userEtag, replicas: 1);
As you can see, we now have a way to actually wait until replication is completed. We will ping all of the replicas, waiting to see that replication has matched or exceeded the etag that we just wrote. You can specify the number of replicas that are required for this to complete.
Practically speaking, you can specify a timeout, and if the nodes aren’t reachable, you will get an error about that.
This gives you the ability to handle write assurances very easily. And you can choose how to handle this, on a case by case basis (you care to wait for users to be created, but not for new comments, for example) or globally.
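The waiting logic can be sketched as follows, in JavaScript. This is not the RavenDB client's implementation: `waitForReplicas`, the `getLastEtag` method on each replica, the polling interval, and the use of plain numbers as etags are all hypothetical simplifications. The idea is the one described above: poll every replica, count how many have reached the written etag, and fail once the timeout expires.

```javascript
// Polls replicas until `required` of them report an etag >= `etag`, or the timeout expires.
// Each replica is assumed (hypothetically) to expose an async getLastEtag().
async function waitForReplicas(replicas, etag, required, timeoutMs = 1000, pollMs = 10) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const etags = await Promise.all(
      // An unreachable replica counts as "not caught up" rather than failing the whole wait.
      replicas.map(r => r.getLastEtag().catch(() => -Infinity))
    );
    const caughtUp = etags.filter(e => e >= etag).length;
    if (caughtUp >= required) return caughtUp;
    if (Date.now() >= deadline) {
      throw new Error(`Only ${caughtUp} of ${required} required replicas reached etag ${etag}`);
    }
    await new Promise(resolve => setTimeout(resolve, pollMs));
  }
}
```

An unreachable replica simply never counts as caught up, so with a timeout the caller gets an error instead of waiting forever.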
RavenDB & Locking indexes
One of the things that we keep thinking about with RavenDB is how to make it easier for you to run in production.
To that end, we introduce a new feature in 2.5, Index Locking. This looks like this:
But what does this mean, to lock an index?
Well, let us consider a production system, in which you have the following index:
from u in docs.Users select new { Query = new[] { u.Name, u.Email, u.Email.Split('@') } }
After you go to production, you realize that you actually needed to also include the FullName in the search queries as well. You can, obviously, do a full deployment from scratch, but it is generally so much easier to just fix the index definition on the production server, update the index definition on the codebase, and wait for the next deploy for them to match.
This works, except that in many cases, RavenDB applications call IndexCreation.CreateIndexes() on startup. This means that on the next startup of your application, the change you just made will be reverted. Index locking allows you to lock an index against changes, either by silently ignoring changes to this index, or by raising an error when someone tries to modify it.
It is important to note that this is not a security feature; you can unlock the index at any time. It is there to help operations, that is all.
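The two locked behaviors can be sketched as a small decision function in JavaScript. The mode names (`Unlock`, `LockedIgnore`, `LockedError`) are our labels for the behaviors described above, and `putIndex` / `setLockMode` are hypothetical stand-ins for what happens on the server when `IndexCreation.CreateIndexes()` pushes a definition; an in-memory `Map` plays the part of the database.

```javascript
const IndexLockMode = Object.freeze({
  Unlock: "Unlock",
  LockedIgnore: "LockedIgnore",
  LockedError: "LockedError",
});

// In-memory stand-in for the server's index definitions (hypothetical).
const indexes = new Map();

function putIndex(name, definition) {
  const existing = indexes.get(name);
  if (existing && existing.definition !== definition) {
    if (existing.lockMode === IndexLockMode.LockedIgnore) {
      return "ignored"; // keep the hand-fixed production definition, silently
    }
    if (existing.lockMode === IndexLockMode.LockedError) {
      throw new Error(`Index '${name}' is locked and cannot be modified`);
    }
  }
  indexes.set(name, { definition, lockMode: existing ? existing.lockMode : IndexLockMode.Unlock });
  return "updated";
}

function setLockMode(name, lockMode) {
  indexes.get(name).lockMode = lockMode; // not a security feature: anyone can unlock again
}
```

With `LockedIgnore`, a redeploy that still carries the old definition leaves the hand-fixed production index alone; with `LockedError`, the same redeploy fails loudly instead.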
Better patching API for RavenDB: Creating New Documents
A while ago we introduced the ability to send js scripts to RavenDB for server side execution. And we have just recently completed a nice improvement on that feature, the ability to create new documents from existing ones.
Here is how it works:
store.DatabaseCommands.UpdateByIndex("TestIndex",
    new IndexQuery { Query = "Exported:false" },
    new ScriptedPatchRequest { Script = script }
).WaitForCompletion();
Where the script looks like this:
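for (var i = 0; i < this.Comments.length; i++) {
    // create a standalone document for each embedded comment
    PutDocument('comments/', {
        Title: this.Comments[i].Title,
        User: this.Comments[i].User.Name,
        By: this.Comments[i].User.Id
    });
}
// mark the source document so the Exported:false query skips it next time
this.Exported = true;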
This will create a new comment document for each of the embedded comments.
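To see what the script does without a server, it can be run locally against a plain object. In this sketch `PutDocument` is a stub that records documents, the assumption that a key ending in '/' gets a generated numeric suffix is ours, and the post document is made up; the script body is the one above, with the flag named `Exported` to match the `Exported:false` query.

```javascript
// Stub for the server-side PutDocument: records documents instead of storing them.
const created = [];
let nextId = 1;
function PutDocument(key, doc) {
  const id = key.endsWith("/") ? key + nextId++ : key; // assumed: trailing '/' means "generate an id"
  created.push({ id, doc });
  return id;
}

// The patch script, wrapped in a function so that `this` is the document being patched.
function patch() {
  for (var i = 0; i < this.Comments.length; i++) {
    PutDocument('comments/', {
      Title: this.Comments[i].Title,
      User: this.Comments[i].User.Name,
      By: this.Comments[i].User.Id
    });
  }
  this.Exported = true;
}

// A hypothetical post with two embedded comments.
const post = {
  Exported: false,
  Comments: [
    { Title: "First!", User: { Name: "Ayende", Id: "users/1" } },
    { Title: "Nice post", User: { Name: "Oren", Id: "users/2" } },
  ],
};
patch.call(post);
```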
RavenDB Map/Reduce optimizations
So I was diagnosing a customer problem, which required me to write the following:
This is working on a data set of about half a million records.
I took a peek at the stats and I saw this:
You can ignore everything before 03:23, this is the previous index run. I reset it to make sure that I have a clean test.
What you can see is that we start out mapping & reducing values, and initially this is quite expensive. But very quickly we recognize that we are reducing a single value, we switch strategies to a more efficient method, and suddenly there is very little cost involved. In fact, the entire process took about 3 minutes from start to finish, and very quickly we got to the point where our bottleneck was actually the maps pushing data our way.
That is pretty cool.
The state of Rhino Mocks
I was asked to comment on the current state of Rhino Mocks. The current codebase is located here: https://github.com/hibernating-rhinos/rhino-mocks
The last commit was 2 years ago, and I am no longer monitoring the mailing list, actively or passively.
From my perspective, Rhino Mocks is done. Done in the sense that I don’t have any interest in extending it, done in the sense that I don’t really use mocking any longer.
If there is anyone in the community who wants to step in and take charge as the Rhino Mocks project leader, I would love that. Failing that, the code is there, it works quite nicely, but that is all I am going to be doing with it for the time being and the foreseeable future.
RavenDB 2.5 Features: Import data to Excel
I wonder what it says about RavenDB that we spend time doing Excel integration.
At any rate, we have the following documents inside RavenDB:
And we want to get this data into Excel. Not only that, but we want this to be something more than just a flat file. We want something that will auto update itself.
We start by defining the shape of the output, using a transformer.
Then we go and visit the following URL:
- http://localhost:8080/databases/MusicBox – The server & database that we are querying.
- streams/query/Raven/DocumentsByEntityName?query=Tag:Albums – Stream the results of querying the index Raven/DocumentsByEntityName for all Tag:Albums (effectively, give me all the albums).
- resultsTransformer=Albums/ShapedForExcel – transform the results using the specified transformer.
- format=excel – output this in a format that excel will find easy to understand
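Those pieces can be assembled programmatically. A small JavaScript sketch using the standard `URL` API; the `excelStreamUrl` helper is hypothetical, and note that `URLSearchParams` percent-encodes the `:` and `/` in the parameter values, which we assume the server decodes like any other query string.

```javascript
// Builds the streaming query URL from its parts (helper name is hypothetical).
function excelStreamUrl(server, database, index, query, transformer) {
  const url = new URL(`${server}/databases/${database}/streams/query/${index}`);
  url.searchParams.set("query", query);
  url.searchParams.set("resultsTransformer", transformer);
  url.searchParams.set("format", "excel");
  return url.toString();
}

const url = excelStreamUrl(
  "http://localhost:8080",
  "MusicBox",
  "Raven/DocumentsByEntityName",
  "Tag:Albums",
  "Albums/ShapedForExcel"
);
```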
The output looks like this:
Now, let us take this baby and push this to Excel. We create a new document, and then go to the Data tab, and then to From Text:
We get a File Open dialog; we paste the previous URL as the source, then hit Enter.
Next comes the import wizard; just hit Next on the first page.
We mark the input as comma delimited, and then hit finish.
We now need to select where it would go on the document:
And now we have the data inside Excel:
We aren’t done yet, we have the data in, now we need to tell Excel to refresh it:
Click on the connections button, where you’ll see something like this:
Go to Properties:
- Uncheck Prompt for file name on refresh
- Check Refresh data when opening the file
Close the file, go to your database and change something. Open the file again, and you can see the new values in there.
You have now created an Excel file that can automatically pull data from RavenDB and give your users immediate access to the data in a format that they are very comfortable with.
Raven’s Storage: Memtables are tough
Memtables are conceptually a very simple thing. You have the list of values that you were provided, as well as a skip list for searches.
Complications:
- Memtables are meant to be used concurrently.
- We are going to have to hold all of our values in memory. And I am really not sure that I want to be doing that.
- When we switch between mem tables (and under write conditions, we might be doing that a lot), I want to immediately clear the used memory, not wait for the GC to kick in.
The first thing to do was to port the actual SkipList from the leveldb codebase. That isn’t really hard, but I had to make sure that assumptions made for the C++ memory model are valid for the CLR memory model. In particular, .NET doesn’t have AtomicPointer, but Volatile.Read / Volatile.Write are a good replacement, it seems. I decided to port the one from leveldb because I don’t know what assumptions other list implementations have made. That was the first step in order to create a memtable. The second was to decide where to actually store the data.
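For readers who have not met one, here is a minimal, single-threaded skip list in JavaScript. It is a generic textbook version, not a port of leveldb's `SkipList`: there is no concurrency handling here, and the maximum height of 12 and branching factor of 4 are borrowed from leveldb's defaults as an assumption.

```javascript
const MAX_HEIGHT = 12;
const BRANCHING = 4;

class SkipList {
  constructor(compare = (a, b) => (a < b ? -1 : a > b ? 1 : 0)) {
    this.compare = compare;
    this.head = { key: undefined, next: new Array(MAX_HEIGHT).fill(null) };
    this.height = 1;
  }

  // Each extra level is kept with probability 1/BRANCHING.
  randomHeight() {
    let h = 1;
    while (h < MAX_HEIGHT && Math.floor(Math.random() * BRANCHING) === 0) h++;
    return h;
  }

  // For each level, returns the last node whose key is < key.
  findPrevious(key) {
    const prev = new Array(MAX_HEIGHT).fill(this.head);
    let node = this.head;
    for (let level = this.height - 1; level >= 0; level--) {
      while (node.next[level] && this.compare(node.next[level].key, key) < 0) {
        node = node.next[level];
      }
      prev[level] = node;
    }
    return prev;
  }

  insert(key) {
    const prev = this.findPrevious(key);
    const height = this.randomHeight();
    if (height > this.height) this.height = height; // prev[] for new levels is already the head
    const node = { key, next: new Array(height).fill(null) };
    for (let level = 0; level < height; level++) {
      node.next[level] = prev[level].next[level]; // link the new node first...
      prev[level].next[level] = node;             // ...then publish it
    }
  }

  contains(key) {
    const candidate = this.findPrevious(key)[0].next[0];
    return candidate !== null && this.compare(candidate.key, key) === 0;
  }

  *keys() {
    for (let node = this.head.next[0]; node; node = node.next[0]) yield node.key;
  }
}
```

In a concurrent version, `insert` must fully link the new node's own `next` pointers before publishing it into `prev[level]`, so a reader never sees a half-built node. That publication ordering is what leveldb's `AtomicPointer` provides, and presumably what `Volatile.Write` provides in the .NET port.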
Here is the most important method for that part:
public void Add(ulong seq, ItemType type, Slice key, Stream val)
The problem is that we cannot just keep a reference to these. We have to copy those values into memory that we control. Why is that? Because the user is free to change the Stream contents or the Slice’s array as soon as we return from this method. By the same token, we can’t just batch this stuff in managed memory, again because of the LOH. The way this is handled in leveldb never made much sense to me, so I am going to drastically change that behavior.
In my implementation, I decided to do the following:
- Copy the keys to our own buffer, and keep them inside the skip list. This is what we will use for actually doing searches.
- Change the SkipList to keep track of values, as well as the key.
- Keep the actual values in unmanaged memory, instead of managed memory. That avoids the whole LOH issue, and gives me immediate control over when the memory is disposed.
This took some careful coding, because I explicitly want to give up on the GC for this. That means that I need to make damn sure that I don’t have bugs that would generate memory leaks.
Each memtable allocates 4MB of unmanaged memory and writes the values to it. Note that you can write more than 4MB (for example, by writing a very large value, one whose length exceeds the 4MB limit). At that point, we allocate more unmanaged memory, and hand the memtable over to compaction.
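The allocation scheme can be sketched in JavaScript, with a `Uint8Array` standing in for the 4MB unmanaged block, since JavaScript has no unmanaged memory. `ValueArena`, the handle shape, and the `needsCompaction` flag are hypothetical; giving an oversized value its own buffer is our reading of "allocate more unmanaged memory".

```javascript
const ARENA_SIZE = 4 * 1024 * 1024; // 4MB per memtable

class ValueArena {
  constructor(size = ARENA_SIZE) {
    this.blocks = [new Uint8Array(size)];
    this.size = size;
    this.used = 0;
    this.needsCompaction = false;
  }

  // Copies the caller's bytes, so they are free to reuse their buffer once we return.
  add(bytes) {
    if (this.used + bytes.length > this.size) {
      // Over the limit: allocate extra memory for this value and flag the memtable for compaction.
      this.blocks.push(Uint8Array.from(bytes));
      this.needsCompaction = true;
      return { block: this.blocks.length - 1, offset: 0, length: bytes.length };
    }
    this.blocks[0].set(bytes, this.used);
    const handle = { block: 0, offset: this.used, length: bytes.length };
    this.used += bytes.length;
    return handle;
  }

  // Returns a view over the stored bytes, without copying them again.
  read(handle) {
    return this.blocks[handle.block].subarray(handle.offset, handle.offset + handle.length);
  }

  // Only drops references; unlike freeing unmanaged memory in .NET, the GC still decides when it is reclaimed.
  dispose() {
    this.blocks = [];
  }
}
```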
The whole thing is pretty neat, even if I say so myself.
Raven’s Storage: Understanding the SST file format
This is an example of an SST that stores:
- tests/0000 –> values/0
- tests/0001 –> values/1
- tests/0002 –> values/2
- tests/0003 –> values/3
- tests/0004 –> values/4
As well as a bloom filter for rapid checks.
If you are wondering about the binary format, that is what this post is all about. We actually start from the end: the last 48 bytes of the file are dedicated to the footer.
The footer format is:
- The last 8 bytes of the file are a magic number: 0xdb4775248b80fb57ul – this means that we can quickly identify whether this is an SST file or not.
Here is what this number looks like broken down to bytes:
- The other 40 bytes are dedicated to the metadata handle and the index handle.
Those are two pairs of longs, encoded using 7-bit encoding. In our case, here is what they look like:
Let us see if we can parse them properly:
- 100
- 38
- 143, 1 = 143
- 14
Note that in order to encode 143 properly, we needed two bytes, because it is higher than 127 (each byte carries 7 bits of data, and we use the high bit to indicate whether there are more bytes to read). The first two values are actually the metadata handle (position: 100, count: 38), the second are the index handle (position: 143, count: 14).
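Here is a small JavaScript sketch of the decoding, applied to the handle bytes listed above, along with the magic number check. The helper names are ours. The magic bytes used to exercise `hasMagic` are just 0xdb4775248b80fb57 written little-endian (57 fb 80 8b 24 75 47 db), which is what the footer's last 8 bytes should contain.

```javascript
// Decodes one 7-bit encoded integer: low 7 bits carry data, the high bit means "more bytes follow".
function readVarint(bytes, pos) {
  let result = 0;
  let shift = 0;
  for (;;) {
    const b = bytes[pos++];
    result += (b & 0x7f) * 2 ** shift; // multiply instead of << to stay correct past 31 bits
    if ((b & 0x80) === 0) return { value: result, pos };
    shift += 7;
  }
}

// Reads the two (position, count) handles at the start of the footer.
function readHandles(bytes) {
  const values = [];
  let pos = 0;
  for (let i = 0; i < 4; i++) {
    const r = readVarint(bytes, pos);
    values.push(r.value);
    pos = r.pos;
  }
  return {
    metaIndex: { position: values[0], count: values[1] },
    index: { position: values[2], count: values[3] },
  };
}

// The footer ends with the magic number, stored little-endian.
const MAGIC = 0xdb4775248b80fb57n;
function hasMagic(fileBytes) {
  const view = new DataView(fileBytes.buffer, fileBytes.byteOffset, fileBytes.byteLength);
  return view.getBigUint64(fileBytes.byteLength - 8, true) === MAGIC;
}

// 143 needs two bytes: 143 & 0x7f = 15 with the high bit set, then 1 (1 * 128 + 15 = 143).
const handles = readHandles(Uint8Array.from([100, 38, 143, 1, 14]));
```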
We will start by parsing the metadata block first:
You can see the relevant portions in the image.
The actual interesting bits here are the first three bytes. Here we have:
- 0 – the number of shared bytes with the previous key (there is no previous key, which is why it is zero).
- 25 – the number of non shared bytes (in this case, the full key, which is 25 bytes).
- 2 – the size of the value; in this case, the value is the handle of the filter data in the file.
You can read the actual key name on the left, “filter.BuiltinBloomFilter”, and then we have the value:
You can probably guess that this is the filter handle (position: 82, count: 18).
The rest of the data are two 4-byte integers. Those are the restart array (positions 130–133) and the restart count (positions 134–137). Restarts are a very important concept for reducing the size of the SST, but I’ll cover them in depth when talking about the actual data, not the metadata.
Next, we have the actual filter data itself, which looks like this:
This is actually a serialized bloom filter, which allows us to quickly decide if a specified key is here or not. There is a chance for errors, but errors can only be false positives, never false negatives. This turns out to be quite useful down the road, when we have multiple SST files and need to work with them in tandem. Even so, we can break it down in more detail:
- The first 8 bytes are the actual serialized bloom filter bits.
- The 9th byte is the k value in the bloom filter algorithm. The default value is 6.
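- The last value (11) is the lg value. It is used by the filter block rather than by the bloom filter algorithm itself: it means there is one filter for every 2^11 = 2KB of data block offsets.
- The rest of the data is a bit more interesting. The 4 bytes preceding the 11 (those are 9,0,0,0) are the offset at which the array of filter offsets begins, which is also where the actual filter data ends (after 9 bytes).
- The four zeros preceding that are the offset of the first (and only) filter within the filter data.
Honestly, you can think about this as a black box; knowing that this is the filter data is probably enough.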
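Even as a black box, the mechanism is easy to sketch. Below is a minimal bloom filter in JavaScript, laid out the way described above (bit array first, k in the byte after it) and using leveldb-style double hashing. The FNV-1a hash is a stand-in chosen for brevity, not the hash leveldb uses, so the bits will not match the dump in this post.

```javascript
// FNV-1a 32-bit hash over the key's UTF-8 bytes (a stand-in, not leveldb's hash).
function hash32(key) {
  let h = 0x811c9dc5;
  for (const b of new TextEncoder().encode(key)) {
    h ^= b;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function createFilter(keys, bitsPerKey = 10) {
  const k = Math.max(1, Math.min(30, Math.floor(bitsPerKey * 0.69))); // 10 bits per key gives k = 6
  let bits = Math.max(64, keys.length * bitsPerKey);
  const bytes = Math.ceil(bits / 8);
  bits = bytes * 8;
  const filter = new Uint8Array(bytes + 1);
  filter[bytes] = k; // k is stored right after the bit array
  for (const key of keys) {
    let h = hash32(key);
    const delta = ((h >>> 17) | (h << 15)) >>> 0; // rotate right by 17 bits
    for (let j = 0; j < k; j++) {
      const bitPos = h % bits;
      filter[bitPos >> 3] |= 1 << (bitPos & 7);
      h = (h + delta) >>> 0;
    }
  }
  return filter;
}

// May return true for a key that was never added (false positive), never false for one that was.
function keyMayMatch(filter, key) {
  const bits = (filter.length - 1) * 8;
  const k = filter[filter.length - 1];
  let h = hash32(key);
  const delta = ((h >>> 17) | (h << 15)) >>> 0;
  for (let j = 0; j < k; j++) {
    const bitPos = h % bits;
    if ((filter[bitPos >> 3] & (1 << (bitPos & 7))) === 0) return false;
    h = (h + delta) >>> 0;
  }
  return true;
}

const keys = ["tests/0000", "tests/0001", "tests/0002", "tests/0003", "tests/0004"];
const filter = createFilter(keys);
```

With 5 keys at 10 bits per key, the 50 bits round up to the 64-bit minimum, giving 8 bytes of filter bits plus the k byte (6), the same sizes as in the dump.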
Now that we are done with the filter, we have to look at the actual index data. This is located at 143, but we know that the metadata block actually ended at 100 + 38 = 138, so why the gap? The answer is that after each block, we have a block type (1 byte; compressed or not, basically) and the block CRC (4 bytes), which is used to determine if the file has been corrupted.
Back to the index block: it starts at 143 and goes on for 14 bytes, looking like this:
The last 4 bytes (a 32 bit int equal to 1) are the number of restarts that we have for this block. The 4 bytes preceding them (a 32 bit int equal to 0) are the restart array: the offset, within the index block, of that single restart point.
In this case, we have just one restart, and the offset of that is 0. Hold on about restarts, I’ll get there. Now let us move to the beginning of the index block. We have the following three bytes: 0,1,2.
Just like in the meta block case, those are (shared, non shared, value size), all 7-bit encoded ints. That means that there is no shared data with the previous key (because there is no previous key), the non shared data is 1 byte and the value size is 2. If you memorized your ASCII table, the key byte, 117, is lower case ‘u’. An index key does not have to be an actual key from the data; it only has to sort at or after the last key of the block it points to, and ‘u’ (one past the ‘t’ of ‘tests/0004’) is the shortest such key. The actual value is a block handle, this time for the actual data associated with this index. In this case, a block with position: 0 and count: 70.
Let us start analyzing this data. 0, 10, 8 tells us shared, non shared, and value size. Indeed, the next 10 bytes spell out ‘tests/0000’ and the 8 after that are ‘values/0’. And what about the rest?
Now we have 9,1,8. Shared is 9, non shared 1 and value size is 8. We take the first 9 bytes of the previous key, giving us ‘tests/000’ and append to it the non shared data, in this case, byte with value 49 (‘1’ in ASCII), giving us a full key of ‘tests/0001’. The next 8 bytes after that spell out ‘values/1’. And the rest is pretty much just like it.
Now, I promised that I would talk about restarts. You can see how we can use the shared/non shared data to effectively compress data. However, that has a major hidden cost: in order to figure out what a key is, we need to read all the previous keys.
In order to avoid this, we use the notion of restarts. Every N keys (by default, 16), we will have a restart point and put the full key into the file. The offsets of those restart points are stored in the restart array at the end of the block, so we can binary search over them, jump straight to the nearest restart, and decode at most about N keys from there.
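The shared-prefix encoding and the restart lookup fit in one short JavaScript sketch. `encodeBlock`, `decodeEntry`, and `seek` are hypothetical helpers that follow the layout described above: a (shared, non shared, value size) varint header per entry, the full key at every restart (every 16 entries by default), and a trailer of little-endian 32-bit restart offsets followed by the restart count. Keys are assumed to be ASCII so that JavaScript string comparison matches byte order.

```javascript
// 7-bit varint helpers: low 7 bits are data, the high bit means "another byte follows".
function writeVarint(out, value) {
  while (value >= 0x80) {
    out.push((value & 0x7f) | 0x80);
    value = Math.floor(value / 128);
  }
  out.push(value);
}

function readVarint(bytes, pos) {
  let result = 0;
  let shift = 0;
  for (;;) {
    const b = bytes[pos++];
    result += (b & 0x7f) * 2 ** shift;
    if ((b & 0x80) === 0) return [result, pos];
    shift += 7;
  }
}

// Encodes sorted [key, value] string pairs with prefix compression and restart points.
function encodeBlock(entries, restartInterval = 16) {
  const enc = new TextEncoder();
  const out = [];
  const restarts = [];
  let prev = new Uint8Array(0);
  entries.forEach(([k, v], i) => {
    const key = enc.encode(k);
    const value = enc.encode(v);
    let shared = 0;
    if (i % restartInterval === 0) {
      restarts.push(out.length); // restart point: the full key is stored
    } else {
      while (shared < prev.length && shared < key.length && prev[shared] === key[shared]) shared++;
    }
    writeVarint(out, shared);
    writeVarint(out, key.length - shared);
    writeVarint(out, value.length);
    out.push(...key.subarray(shared), ...value);
    prev = key;
  });
  // Trailer: each restart offset, then the restart count, as little-endian 32-bit ints.
  for (const n of [...restarts, restarts.length]) {
    out.push(n & 0xff, (n >>> 8) & 0xff, (n >>> 16) & 0xff, (n >>> 24) & 0xff);
  }
  return Uint8Array.from(out);
}

// Decodes one entry, borrowing the first `shared` bytes from the previous key.
function decodeEntry(block, pos, prevKey) {
  let shared, nonShared, valueSize;
  [shared, pos] = readVarint(block, pos);
  [nonShared, pos] = readVarint(block, pos);
  [valueSize, pos] = readVarint(block, pos);
  const key = new Uint8Array(shared + nonShared);
  key.set(prevKey.subarray(0, shared));
  key.set(block.subarray(pos, pos + nonShared), shared);
  pos += nonShared;
  return { key, value: block.subarray(pos, pos + valueSize), next: pos + valueSize };
}

// Finds the first entry with key >= target: binary search over the restart points
// (their keys are stored in full), then a short linear scan from the chosen restart.
function seek(block, target) {
  const dec = new TextDecoder();
  const view = new DataView(block.buffer, block.byteOffset, block.byteLength);
  const numRestarts = view.getUint32(block.length - 4, true);
  const restartsStart = block.length - 4 - 4 * numRestarts;
  const restartAt = i => view.getUint32(restartsStart + 4 * i, true);

  let lo = 0;
  let hi = numRestarts - 1;
  while (lo < hi) { // find the last restart whose key is < target
    const mid = (lo + hi + 1) >> 1;
    const { key } = decodeEntry(block, restartAt(mid), new Uint8Array(0));
    if (dec.decode(key) < target) lo = mid;
    else hi = mid - 1;
  }

  let pos = restartAt(lo);
  let prevKey = new Uint8Array(0);
  while (pos < restartsStart) {
    const entry = decodeEntry(block, pos, prevKey);
    const key = dec.decode(entry.key);
    if (key >= target) return [key, dec.decode(entry.value)];
    prevKey = entry.key;
    pos = entry.next;
  }
  return null; // every key in the block is < target
}
```

Encoding just the five entries from this post yields the same headers we decoded by hand: 0,10,8 for tests/0000 and 9,1,8 for tests/0001.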
And… that is pretty much it. This is the full and unvarnished binary dump of an SST. Obviously real ones are going to be more complex, but they all follow the same structure.
Raven’s Storage: Reading a Sorted String Table
When reading an SST, we have to deal with values of potentially large sizes. I want to avoid loading anything into managed memory if I can possibly avoid it. That, along with other considerations, has led me to use memory mapped files as the underlying abstraction for reading from the table.
Along the way, I think that I made the following assumptions:
- Work in little endian only.
- Can work in 32 bits, but 64 bits are preferred.
- Doesn’t put pressure on the .NET memory manager.
In particular, I am worried about users wanting to store large values, or a large number of keys.
As it turned out, .NET’s memory mapped files are a pretty good answer for what I wanted to do. Sure, it is a bit of a pain with regards to how to handle things like WinRT / Silverlight, etc. But I am mostly focused on server side for now. And I have some ideas on how to provide the mmap illusion on top of regular streams for platforms that don’t support it.
The fact that an SST is written by a single thread, and is immutable once written, has drastically simplified the codebase. Although I have to admit that looking at hex dumps to figure out that I wrote to the wrong position is a bit of a bother, but more on that later. A lot of the code is basically the leveldb code, tweaked for .NET use. One important difference that I made was with regards to the actual API.
- Keys are assumed to be small. Most of the time, less than 2KB in size, and there are optimizations in place to take advantage of that. (It will still work with keys bigger than that, but will consume more memory).
- In general, through the codebase I tried to put major emphasis on performance and memory use even at this early stage.
- Values are not assumed to be small.
What does the last one mean?
Well, to start with, we aren’t going to map the entire file into our memory space. It might be big enough to start fragmenting our virtual address space, but mostly there is simply no need. We always map just a single block at a time, and usually we never bother to read the values into managed memory, instead accessing the data directly from the memory mapped file.
Here is an example of reading the key for an entry from the SST:
As you can see, we need to read just the key itself to memory, the value itself is not even touched. Also, there is some buffer management going on to make sure that we don’t need to re-allocate buffers as we are scanning through the table.
When you want to get a value, you call CreateValueStream, which gives you a Stream that you can work with. Here is how you use the API:
This is part of the internal codebase; we are storing data inside the SST that will later help us optimize things, but that is something I’ll touch on at a later point in time.
Except for the slight worry that I am going to have to change the underlying approach from memory mapped files to streams if I need to run it outside the server/client, this is very cool.
Next on my list is to think about how to implement the memtable, again, without impacting the managed memory too much.
This is debugging, old school
Raven.Storage just passed its first “test”
Take a look at the code below. It actually completed as expected, and was working beautifully. As I probably mentioned, the architecture of this is really nice, and I think I was able to translate it into .NET code in a way that is both idiomatic and useful. It is 4:30 AM now, and I think that it is bedtime for me. But I just couldn’t leave this alone.
RavenDB Course Updates
I am updating the RavenDB Course for 2.5. It is now a 3-day affair that includes quite a lot of stuff. It was quite cramped at 2 days, so moving to 3 days will allow us to do a lot more, and more interesting, stuff.
I’ll be giving the first 3-day RavenDB Course in about two weeks in London. Currently this is what I have:
Can you think of additional stuff that you’d like me to cover?
Raven’s Storage: Building a Sorted String Table
Applying the lessons from the leveldb codebase isn’t going to be a straightforward task. I already discussed why we can’t just use leveldb directly, so I won’t go into that.
I decided to see if I can implement a leveldb inspired storage for RavenDB. And the first thing that I wanted to try is to build an SST. That seems like an easy first step. SSTs are just sorted data in a file, and they require very little to write. As mentioned, we can’t just do a line by line port of the code, as easy as that would be. The way leveldb manages memory is… not really suitable for what we can / should do in .NET.
- I want this to be an efficient .NET implementation.
- It is inspired, but not intended to be compatible with leveldb.
The leveldb API deals with byte arrays all over the place. This makes sense for a C++ codebase, but it is horrible for a .NET application, especially when we expect a lot of our values to be large enough to hit the LOH. That means fragmentation, and pain down the road. Instead, our API uses ArraySegment<byte> for keys, and Stream for values. Since keys are expected to be relatively small, I don’t foresee this being a problem. And the values are streams, so they are easily handled without introducing any cost from the API.
Another thing that leveldb does quite a lot is batch things in memory for a while. It may be the current block, it may be the current data block, it may be the index block, but it does so quite often. That works nicely for C++ apps with expected small values, but not so much for our expected use case. So I want to avoid as much as possible holding items in managed memory. Here is the API for creating an SST:
using System.IO;
using System.Text;

var options = new StorageOptions();
using (var file = File.Create("test.sst"))
// Path.GetTempFileName() already creates the file, so open it with FileMode.Create (CreateNew would throw)
using (var temp = new FileStream(Path.GetTempFileName(), FileMode.Create, FileAccess.ReadWrite,
    FileShare.None, 4096, FileOptions.DeleteOnClose | FileOptions.SequentialScan))
{
    var tblBuilder = new TableBuilder(options, file, temp);

    for (int i = 0; i < 100; i++)
    {
        var key = "tests/" + i.ToString("0000");
        tblBuilder.Add(key, new MemoryStream(Encoding.UTF8.GetBytes(key)));
    }

    tblBuilder.Finish();
}
As you can see, we use two streams here: one to actually write the table, and a temporary stream that we use to write the index block while we are working, which we then merge back into the actual SST file. Note that after building the table, the temp file can be deleted (indeed, we marked it as DeleteOnClose, so that happens automatically).
That part was easy, all it required was simple I/O for generating the file. The more interesting part is going to be reading the values out.
State of the Raven - Operational & Maintenance tasks and features sneak peeks
Reviewing LevelDB: Part XVIII–Summary
Well, I am very happy to have reached the conclusion of this blog post series. Besides being one of the longest that I have done, it actually stretched my ability to count using roman numerals.
In summary, I am quite happy that I spent the time reading all of this code. The LevelDB codebase is really simple, when you grok what it actually does. There is nothing there that would totally stun a person. What there is there, however, is a lot of accumulated experience in building these sorts of things.
You see this all over the place, in the format of the SST, in the way compaction is working, in the ability to add filters, write merging, etc. The leveldb codebase is a really good codebase to read, and I am very happy to have done so. Especially since doing this in C++ is way out of my comfort zone. It was also interesting to read what I believe is idiomatic C++ code.
Another observation about leveldb is that it is a hard core C++ product. You can’t really take it and use the same approach in a managed environment. In particular, efforts to port leveldb to Java (https://github.com/dain/leveldb/) are going to run into hard issues with problems like managing memory. Java, like .NET, has issues with allocating large byte arrays, and even from the brief look I took, working with leveldb on Java using the codebase as it is would likely put a lot of pressure on the memory manager.
Initially, I wanted to use leveldb as a storage engine for RavenDB. Since I couldn’t get it compiling & working on Windows (and yes, that is a hard requirement. And yes, it has to be compiling on my machine to apply), I thought about just porting it. That isn’t going to be possible. At least not in the trivial sense. Too much work is required to make it work properly.
Yes, I have an idea. No, you don’t get to hear it at this time.
RavenDB’s transaction merging
I like reading interesting code bases. And in this case, I have been reading through the leveldb codebase for a while. Most of the time, I am concentrating on grokking how other people are doing things. And the knowledge that I glean from those is only relevant a while later.
In this case, I was able to apply leveldb’s knowledge in very short order. We got a complaint about the speed of RavenDB under high transactional writes. The customer was seeing about 100 – 200 writes/sec with multi threaded inserts, using a separate session per document. This simulates pretty well the behavior of a web application that does a write per request.
The problem was that we basically had many write requests all competing on the disk. Since all transactions need to call fsync, that meant that we were limited by the number of fsync calls that we could make on the physical medium (more or less, it is a bit more complex than that, but that works).
There really isn’t much that I can do about it when the transactions are sequential. But there might be when we have parallel transactions. Instead of making them wait for one another, I took a leaf from the leveldb codebase and decided to merge them. I re-wrote the code to use the following approach:
pendingBatches.Enqueue(pending);
// wait until we are at the head of the queue, or someone else has already committed our batch
while (currentTask.Completed() == false && pendingBatches.Peek() != pending)
{
    batchCompletedEvent.Wait();
}
if (currentTask.Completed())
    return pending.Result;

lock (putSerialLock)
{
    try
    {
        batchCompletedEvent.Reset();
        var sp = Stopwatch.StartNew();
        TransactionalStorage.Batch(actions =>
        {
            int totalCommands = 0;
            PendingBatch currentBatch;
            while (totalCommands < 1024 && // let us not overload the transaction buffer size too much
                   pendingBatches.TryDequeue(out currentBatch))
            {
                batches++;
                if (ProcessBatch(currentBatch) == false)
                    // we only go on committing transactions as long as we are successful, if there is an error,
                    // we abort and do the rest in another cycle
                    break;
                actions.General.PulseTransaction();
                totalCommands += currentBatch.Commands.Count;
            }
        });
    }
    finally
    {
        // wake up the waiters, either to collect their results or to become the next leader
        batchCompletedEvent.Set();
    }
}
As you can see, this code is very similar to the leveldb one. We queue the transaction to be executed and then we check if we are the first in the queue. If so, we will execute that transaction and continue executing all available transactions.
The key here is that because we merge those transactions, we can benefit from only calling fsync once, at the end of the global operation.
This code is nice because it allows us to take advantage of load on our system: the more load, the more efficiently we can batch things. But at the same time, if there isn’t any load, we don’t pay anything for it.
Note that I limited the amount of work that can be done in merged transactions, because we don’t want the first transaction, the one that is actually doing the work for all of the others, to wait for too long. This is a really pretty way of doing this, especially since it doesn’t even require a background thread, which is how I usually solved this issue.
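The same idea can be sketched in JavaScript with promises. This is not RavenDB's code: `GroupCommitter` and `commitBatch` are hypothetical, the event loop replaces the lock and the reset event, and, as a simplification, a failed commit rejects the whole merged batch instead of stopping and leaving the rest for another cycle.

```javascript
// Group commit: whoever finds no write in progress becomes the leader and commits
// everything queued so far with a single commit (one fsync); later arrivals just wait.
class GroupCommitter {
  constructor(commitBatch, maxPerBatch = 1024) {
    this.commitBatch = commitBatch; // async (commands) => void, assumed to end with one fsync
    this.maxPerBatch = maxPerBatch; // caps how many commands share a single commit
    this.queue = [];
    this.writing = false;
  }

  submit(command) {
    return new Promise((resolve, reject) => {
      this.queue.push({ command, resolve, reject });
      if (!this.writing) this.drain();
    });
  }

  async drain() {
    this.writing = true;
    try {
      while (this.queue.length > 0) {
        const batch = this.queue.splice(0, this.maxPerBatch);
        try {
          await this.commitBatch(batch.map(p => p.command));
          batch.forEach(p => p.resolve());
        } catch (err) {
          batch.forEach(p => p.reject(err)); // simplification: the merged batch fails together
        }
      }
    } finally {
      this.writing = false;
    }
  }
}
```

Submitting 25 writes at once costs two commits: the first writer commits alone, and the 24 that queued up while it was busy share the second. With no contention, every write is simply its own batch.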
Oh, and the results?
On my machine, without this change, we get about 400 writes / second. Afterward, with 25 threads, we get over 1,100 writes / second.
Reminder: RavenDB Webinar Tomorrow
We will cover:
- New features for JS Scripting
- Replication topologies
- How to handle maintenance tasks using RavenDB
- Sneak peek at reporting queries
You can register here: https://www2.gotomeeting.com/register/118743562
Searching for a lease in time & space
For some reason, there are a lot of real estate / rental people using RavenDB. I can tell you that I did not see that coming. However, that does bring up some interesting decisions.
In particular, at one client, we had the need to search for a lease. A lease can be searched by many interesting properties: for example, the unit number, the internal code, or the leaser’s name.
And here we got an interesting bug report.
Jane Smith leased an apartment from us in Dec 2012. In Feb 2013, she got married and changed her name to Jane Smith-Smyth. We need to allow searches on both names to find the appropriate lease.
Now, remember, you can’t go and change the lease document. That is a legal document that is frozen. Any change to that would invalidate it. (To be rather more accurate, you can modify the document, but there are certain fields that are frozen after the lease is signed.)
Luckily, this was really easy to do, using RavenDB’s referenced document feature:
from lease in docs.Leases
select new
{
    Leasers = lease.Leasers.Select(x=>x.Name)
        .Union(lease.Leasers.Select(x=>LoadDocument(x.Id).Name))
        .Distinct()
}
And now we can support changes in the names, while maintaining the sanctity of the frozen fields.
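To make the index’s output concrete, here is a Python sketch of what the map produces for a single lease. It emulates `LoadDocument` with a dictionary lookup; the document shapes (`Leasers` entries with `Id` and `Name`, person documents with `Name`) follow the index above, while the ids and the `load_document` and `map_lease` helpers are illustrative assumptions.

```python
# Hypothetical in-memory documents; the ids are illustrative.
people = {
    "people/1": {"Name": "Jane Smith-Smyth"},  # current name, editable
}
lease = {
    "Id": "leases/1",
    "Leasers": [{"Id": "people/1", "Name": "Jane Smith"}],  # frozen at signing
}


def load_document(doc_id):
    """Stand-in for LoadDocument inside a RavenDB index."""
    return people[doc_id]


def map_lease(lease):
    frozen = [x["Name"] for x in lease["Leasers"]]
    current = [load_document(x["Id"])["Name"] for x in lease["Leasers"]]
    # Union + Distinct: keep the first occurrence of each name, preserving order.
    return {"Id": lease["Id"], "Leasers": list(dict.fromkeys(frozen + current))}


print(map_lease(lease)["Leasers"])  # ['Jane Smith', 'Jane Smith-Smyth']
```

The frozen name on the lease is never touched; the searchable name list simply gains whatever the person document currently says.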
Sadly, this is still not enough. We actually need to keep track of all of the names that the leaser had during the leasing period.
Jane Smith-Smyth decided that it is a stupid name and changed her name to Jane Smite.
Now we need to support multiple names per leaser, while still keeping the name that was recorded on the lease itself. It looks like this:
from lease in docs.Leases
select new
{
    Leasers = lease.Leasers.Select(x=>x.Name)
        .Union(lease.Leasers.SelectMany(x=>LoadDocument(x.Id).Names))
        .Distinct()
}
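The required changes are in the second projection: Select became SelectMany, and Name became Names, so every name the leaser ever had ends up in the index.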
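A Python sketch of the effect on searching: once the map flattens every historical name (the `SelectMany`), a lookup by any of them finds the lease. The data, the `Names` history list, and the `search` helper are illustrative assumptions; a real index would also tokenize and analyze the names rather than compare exact strings.

```python
# Hypothetical documents: the person keeps a history of names.
people = {
    "people/1": {"Names": ["Jane Smith", "Jane Smith-Smyth", "Jane Smite"]},
}
leases = [
    {"Id": "leases/1", "Leasers": [{"Id": "people/1", "Name": "Jane Smith"}]},
]


def map_lease(lease):
    frozen = {x["Name"] for x in lease["Leasers"]}
    # SelectMany: flatten every historical name of every leaser.
    history = {name for x in lease["Leasers"] for name in people[x["Id"]]["Names"]}
    return {"Id": lease["Id"], "Leasers": frozen | history}


index = [map_lease(lease) for lease in leases]


def search(name):
    """Exact-match lookup against the indexed names."""
    return [entry["Id"] for entry in index if name in entry["Leasers"]]


print(search("Jane Smite"))        # ['leases/1']
print(search("Jane Smith-Smyth"))  # ['leases/1']
```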
Exporting Data from RavenDB, the new way
In RavenDB 2.5, we provide an easy way to grab all the data from the database, regardless of the usual paging limit.
I had the chance to experiment with that recently during a course. Here is the code we wrote. For fun, we made it use the async API:
I am pretty happy about that. We have stuff that streams all the way from RavenDB to the end client.
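The course code is not reproduced here, but the idea of exporting everything as a stream, without paging, can be sketched with a Python async generator. This is not the RavenDB client API: `fetch_all_documents` is a hypothetical stand-in for the server-side stream, and each row is written out as soon as its document arrives instead of being collected first.

```python
import asyncio
import csv
import io


async def fetch_all_documents(count):
    """Hypothetical stand-in for a server stream: yields documents one at a time."""
    for i in range(1, count + 1):
        await asyncio.sleep(0)  # yield control, as real network I/O would
        yield {"Id": f"users/{i}", "Name": f"User {i}"}


async def export_csv(out, count):
    writer = csv.writer(out)
    writer.writerow(["Id", "Name"])
    written = 0
    # No Skip/Take paging: consume the stream until it ends,
    # holding only the current document in memory.
    async for doc in fetch_all_documents(count):
        writer.writerow([doc["Id"], doc["Name"]])
        written += 1
    return written


buffer = io.StringIO()
rows = asyncio.run(export_csv(buffer, 5000))
print(rows)  # 5000
```

In a web application the `StringIO` would be replaced by the response stream, which is what lets the data flow from the database to the end client without being buffered in between.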
Querying your way to madness: the Facebook timeline
Facebook certainly changed the way we are doing things. That hasn’t always been for the best, as can be seen from the amount of time humanity as a whole spends watching cat videos.
One of the ways that Facebook impacted our professional lives is that a lot of people look at it as a blueprint of how they want their application to be. I am not going to touch on whether that is a good thing or not; suffice to say that this is a well-known model that is very easy for a lot of users to grasp.
It is also a pretty hard model to actually design and build. I recently had a call from a friend who was tasked with building a Facebook-like timeline. Like most such tasks, we have the notion of a social network, with people following other people. I assume that this isn’t actually YASNP (yet another social network project), but I didn’t check too deeply.
The question was how to actually build the timeline. The first thing that most people would try is something like this:
var user = session.Load<User>(userId);
var timelineItems =
    session.Query<Item>()
        .Where(x=>x.Source.In(user.Following))
        .OrderByDescending(x=>x.PublishedAt)
        .Take(30)
        .ToList();
Now, this looks good, and it would work, as long as you have a small number of users and no one follows a lot of people. And as long as you don’t have a lot of items. And as long as you don’t have to do any additional work. When any of those assumptions is broken… well, welcome to unpleasantville, population: you.
It can’t work. And I don’t care what technology you are using for storage. You can’t create a good solution using queries for something like the timeline.
Nitpicker corner:
- If you have users that are following a LOT of people (and you will have those), you are likely to get into problems with the query.
- The more items you have, the slower this query becomes, since you need to sort them all before you can return results. And you are likely to have a LOT of them.
- You can’t really shard this query nicely or easily.
- You can’t apply additional filtering in any meaningful way.
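A rough Python model of the query-time approach shows where the cost goes: every item from every followed source has to be found and sorted before the first 30 can be returned, so the work grows with the number of matching items rather than with the page size. `Item` and `pull_timeline` are illustrative names, not RavenDB types.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Item:
    source: str
    published_at: int  # a plain number standing in for a timestamp


def pull_timeline(items: List[Item], following: List[str], take: int = 30) -> Tuple[List[Item], int]:
    """Query-time timeline: returns the newest `take` items and how many had to be sorted."""
    followed = set(following)
    matches = [it for it in items if it.source in followed]     # scans every item
    matches.sort(key=lambda it: it.published_at, reverse=True)  # sorts every match
    return matches[:take], len(matches)


items = [Item(source=f"users/{i % 100}", published_at=i) for i in range(10_000)]
top, sorted_count = pull_timeline(items, [f"users/{i}" for i in range(50)])
print(len(top), sorted_count)  # 30 5000
```

Only 30 items come back, but 5,000 had to be sorted to find them, and any per-follower filtering would have to be folded into that same scan.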
Let us consider the following scenario. Let us assume that I care for this Rohit person. But I really don’t care for Farmville.

And then:

Now, try to imagine doing this sort of thing in a query. For fun, assume that I do care for Farmville updates from some people, but not from all. That is what I meant when I said that you want to do meaningful filtering.
You can’t do this using queries. Not in any real way.
Instead, you have to turn it around. You would do something like this:
var user = session.Load<User>(userId);
var timelineItems = session.Query<TimeLineItems>()
    .Where(x=>x.ForUser == userId)
    .OrderBy(x=>x.Date)
    .ToList();
Note how we structure this. There is a set of TimeLineItems objects, which store a bit of information about a set of items. Usually we would have one per user per day. Something like:
- users/123/timeline/2013-03-12
- users/123/timeline/2013-03-13
- users/123/timeline/2013-03-14
That means that we get well-scoped values: we only need to search a single set of items (easily sharded, with a well-known id, which means that we can also just load them by id, instead of querying for them).
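Because the ids follow a known pattern, the reader never has to search for its timeline: it can compute which documents it needs and load them by id. A small Python sketch of that naming scheme, assuming exactly one document per user per day in the `users/{id}/timeline/{yyyy-MM-dd}` format from the list above (the helper name is hypothetical):

```python
from datetime import date, timedelta
from typing import List


def timeline_doc_ids(user_id: str, last_day: date, days: int) -> List[str]:
    """Ids of a user's per-day timeline documents, newest day first."""
    return [
        f"{user_id}/timeline/{(last_day - timedelta(days=n)).isoformat()}"
        for n in range(days)
    ]


print(timeline_doc_ids("users/123", date(2013, 3, 14), 3))
# ['users/123/timeline/2013-03-14', 'users/123/timeline/2013-03-13', 'users/123/timeline/2013-03-12']
```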
Of course, that means that you have to have something that builds those timeline documents. That is usually an async process that runs whenever a user updates something. It goes something like this:
public void UpdateFollowers(string itemId)
{
    var item = session.Include<Item>(x=>x.UserId)
        .Load(itemId);

    var user = session.Load<User>(item.UserId);

    // user.Followers is a list of ids of documents, each holding a batch of followers.
    // We assume that we might have a LOT of followers, so we use this technique
    // to avoid loading them all into memory at once
    // http://ayende.com/blog/96257/document-based-modeling-auctions-bids
    foreach(var followersDocId in user.Followers)
    {
        NotifyFollowers(followersDocId, item);
    }
}

public void NotifyFollowers(string followersDocId, Item item)
{
    var followers = session.Include<FollowersCollection>(x=>x.Followers)
        .Load(followersDocId);

    foreach(var follower in followers.Followers)
    {
        var user = session.Load<User>(follower);
        if(user.IsMatch(item) == false)
            continue;
        AddToTimeLine(follower, item);
    }
}
As you can see, we batch the operation: we load the followers one batch at a time and, based on each follower’s settings, decide whether to add the item to their timeline or not.
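The fan-out can be modeled in memory as well. This Python sketch is not a line-by-line port of the C# code: the dictionaries stand in for documents, `FOLLOWER_BATCHES`, `USERS`, `TIMELINES`, and the helper names are hypothetical, and follower filtering is reduced to a simple set of muted sources. What it does show is the shape of the work: followers are visited one batch document at a time, and each accepted item lands in that follower’s per-day timeline document.

```python
from collections import defaultdict

# Hypothetical documents: a user's followers are split into batch documents.
FOLLOWER_BATCHES = {
    "users/1/followers/1": ["users/2", "users/3"],
    "users/1/followers/2": ["users/4"],
}
USERS = {
    "users/1": {"followers": ["users/1/followers/1", "users/1/followers/2"]},
    "users/2": {"muted": set()},
    "users/3": {"muted": {"users/1"}},  # user 3 filters out user 1's items
    "users/4": {"muted": set()},
}
# Per-day timeline documents: document id -> list of item ids.
TIMELINES = defaultdict(list)


def is_match(follower_id, item):
    return item["user_id"] not in USERS[follower_id]["muted"]


def add_to_timeline(follower_id, item):
    TIMELINES[f"{follower_id}/timeline/{item['day']}"].append(item["id"])


def notify_followers(batch_doc_id, item):
    # Only one batch of followers is held in memory at a time.
    for follower_id in FOLLOWER_BATCHES[batch_doc_id]:
        if is_match(follower_id, item):
            add_to_timeline(follower_id, item)


def update_followers(item):
    author = USERS[item["user_id"]]
    for batch_doc_id in author["followers"]:
        notify_followers(batch_doc_id, item)


update_followers({"id": "items/42", "user_id": "users/1", "day": "2013-03-14"})
print(dict(TIMELINES))
# {'users/2/timeline/2013-03-14': ['items/42'], 'users/4/timeline/2013-03-14': ['items/42']}
```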
Note that this has a lot of implications. Different people will see the item show up in their timeline at different times (but usually very close to one another). Your memory usage is kept low, because you are only processing some of the followers at any given point in time. For users with a LOT of followers, and there will be some like those, you might want to build special code paths, but this should be fine even at its current stage.
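The `IsMatch` call in the C# code is where per-follower filtering, like the Farmville example earlier, would live. The Python sketch below is one hypothetical way to model it: each user keeps a map from hidden apps to the sources whose updates from that app should still show up. The rule shape and every name here are assumptions, not part of RavenDB.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Item:
    source: str
    app: Optional[str] = None  # e.g. "Farmville" for game updates


@dataclass
class User:
    # Hypothetical rule: app name -> sources whose updates from that app still show up.
    hidden_apps: Dict[str, Set[str]] = field(default_factory=dict)

    def is_match(self, item: Item) -> bool:
        allowed_sources = self.hidden_apps.get(item.app)
        if allowed_sources is None:
            return True  # no rule for this app (or not an app update): show it
        return item.source in allowed_sources


me = User(hidden_apps={"Farmville": {"users/best-friend"}})
print(me.is_match(Item("users/rohit")))                         # True
print(me.is_match(Item("users/rohit", app="Farmville")))        # False
print(me.is_match(Item("users/best-friend", app="Farmville")))  # True
```

Because the check runs once per follower at write time, a rule like “Farmville only from these people” costs a dictionary lookup instead of a clause that every timeline query would have to evaluate.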
What about post factum operations? Let us say that I want to start following a new user. This requires special treatment: you have to read the latest timeline items from the newly followed user and merge them into the existing timeline. Likewise when you stop following someone, or add a new filter.
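Both operations can be done without re-sorting everything. Adding a followed user is a merge of two lists that are already sorted newest-first, and removing one is a filter. A Python sketch using `heapq.merge`, assuming each timeline entry is a `(timestamp, item_id, source)` tuple; the function names and the 30-entry cap are illustrative.

```python
import heapq
from itertools import islice


def merge_new_source(timeline, new_items, limit=30):
    """Merge two newest-first lists of (timestamp, item_id, source), keeping `limit` entries."""
    merged = heapq.merge(timeline, new_items, key=lambda entry: entry[0], reverse=True)
    return list(islice(merged, limit))


def remove_source(timeline, source):
    """Drop every entry that came from `source`, e.g. after unfollowing."""
    return [entry for entry in timeline if entry[2] != source]


timeline = [(50, "items/5", "users/a"), (30, "items/3", "users/b"), (10, "items/1", "users/a")]
new_user_items = [(40, "items/4", "users/c"), (20, "items/2", "users/c")]

merged = merge_new_source(timeline, new_user_items)
print([item_id for _, item_id, _ in merged])
# ['items/5', 'items/4', 'items/3', 'items/2', 'items/1']
print([item_id for _, item_id, _ in remove_source(merged, "users/a")])
# ['items/4', 'items/3', 'items/2']
```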
It is a lot more work than just changing the query, sure. But you can get things done this way, and you cannot get anywhere with the query-only approach.
RavenDB Indexing: An off the cuff stat
I am currently teaching a RavenDB Course, and we were just talking about indexing. In particular, Search Indexes, like the one below:
After we defined this guy, we took a look at the stats.
As you can see, indexing 1 million documents took just over 2 minutes (with full text search enabled). More interestingly, you can see how rapidly we ramped up the number of items being indexed at once, which let us finish indexing everything faster.
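Why ramping up helps can be shown with a toy model. The sketch below is a hypothetical doubling policy, written only to illustrate the effect; it is not RavenDB’s actual tuning logic, and the starting size of 128 and the 16,384 ceiling are arbitrary assumptions.

```python
def batch_sizes(total_docs, initial=128, maximum=16384):
    """Hypothetical auto-tuning: double the batch size after each full batch, up to a cap."""
    sizes, remaining, size = [], total_docs, initial
    while remaining > 0:
        batch = min(size, remaining)
        sizes.append(batch)
        remaining -= batch
        if batch == size:  # the batch was full, so there is more work: grow
            size = min(size * 2, maximum)
    return sizes


sizes = batch_sizes(1_000_000)
print(sizes[:8])   # [128, 256, 512, 1024, 2048, 4096, 8192, 16384]
print(len(sizes))  # 68
```

With these made-up numbers, 1 million documents take 68 batches, where a fixed batch of 128 would need 7,813, so any fixed per-batch overhead is paid far less often.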
Quite nice.
RavenDB signs of maturity?
- We are working on Excel Integration with RavenDB.
- I just told a customer: “That is a bug, but you can set this config value and it will fix the issue.”