Oren Eini, CEO of RavenDB, a NoSQL Open Source Document Database

time to read 5 min | 835 words

After the last three parts, I think that I have a good high level understanding of what is going on in the Noise codebase. As a reminder, I’m reviewing 48d808375e541eca383826ea928272976bce064d.

I love reading code, especially stuff that I’m not familiar with, such as a new platform or doing something novel. The Noise codebase is quite interesting, and it is a very early stage in the project. That said, when looking at what it is doing and comparing to what I’m doing in my day job, a few things popped up.

I couldn’t help but think that this is a very similar model to the default Azure DocumentDB. All the data is indexed, and the values are shredded as they go in. The usage model is very similar. The nice thing about this kind of model is that it gives you a lot of freedom and flexibility, but the downside is that if you need more, there aren’t any alternatives. Storing large JSON documents inside Noise is likely to cause problems, especially if you typically work with the full document. If you are just throwing things into it and then pulling back pieces, especially if the documents you write are relatively small, it is likely to be a much better fit.

Noise is making some clever use of the LSM nature of RocksDB to do cleanups, and I really like the elegance of the indexing engine. The query model is also interesting, in that it gives you a lot of freedom to operate: select the right fields, define the appropriate aggregation, etc. But queries are limited (no computation), and aggregation queries are going to run at a cost of O(N) with respect to their results. Given that this is intended to be embedded, and apparently into low end devices, that might very well be the right decision.

A side effect of the shredding is that you need to reconstruct documents all the time, since they were broken into their component parts. Things like scanning results by one field while returning another look like they can generate O(N log N) queries in RocksDB. I’m not sure how well that can scale with regards to queries. On the one hand, there are very clear layers between the pieces of the code, which is great, but that also has some built-in issues with regards to how much performance you can squeeze out of the system. I don’t have a feeling for Rust code performance, nor do I think that the project is ready at this early stage for performance analysis, but it is a concern. Then again, I did just spend the past two years head down in major performance work, so that might very well be the reason. And it is entirely possible that the code will perform significantly better when the project is done.

One last thing to note: when I tried building a replacement for Lucene, one of the annoying things that popped up was that it is really hard to replace Lucene efficiently and still be better in terms of time / space / complexity. Noise in particular is storing all the data at least twice (once as terms to be searched, once as the actual document data). This makes it very easy to do certain operations (such as update a document), but it also means that the cost of updating a document is now doubled. We first need to index the new document, then we need to re-index the old document, compare between the two and then write the changes. I guess that it is possible that reducing the amount of work to be written would help matters, but the cost of indexing is likely to dominate.

I would expect a Noise index to be significantly larger than the Lucene equivalent, because Lucene does a lot to reduce file size (in quite evil & nasty ways, to be fair).

Another thing that I don’t think is good is the reliance on a query language in an embedded solution. A much better idea, in my opinion, would be to expose the raw documents and let the client handle them. If it is running in the same memory space, there is no extra cost. The moment you add a query language, it opens up a huge number of features / issues. I’m talking specifically about the notion of “return” here. The idea is really cool: you can project exactly what you want from the document, and that allows the engine to only pull the relevant data.

At any rate, it is really getting late now, so I’ll close with the note that Noise is a really cool codebase to read. Even if you don’t know Rust, going ahead and reading something non-trivial is a really good exercise in learning new things. And I’m really interested in what Noise is going to do next.

time to read 5 min | 837 words

After the previous two parts, I think that I have a good high level understanding of what is going on in the Noise codebase. As a reminder, I’m reviewing 48d808375e541eca383826ea928272976bce064d.

Now I’m reading through the JsonValue file. The first interesting thing here is:

[image: the JsonValue enum definition]

Given all the code I have read so far, I think that this is a really elegant way to handle JSON, both in terms of how you can define such a recursive structure in 7 lines of code and in how it allows you to express relatively complex scenarios quite nicely.
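For reference, this is roughly the shape such a recursive JSON value takes in Rust (my own sketch from memory, not the exact Noise definition):

// A recursive JSON value: arrays and objects contain more JsonValue instances.
// Variant names here are my guess, not the actual Noise definition.
enum JsonValue {
    Null,
    True,
    False,
    Number(f64),
    String(String),
    Array(Vec<JsonValue>),
    Object(Vec<(String, JsonValue)>),
}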

One thing that bugs me is that there are a lot of things like this:

[image: a string allocation sized for worst case escaping]

This is done to have enough space for escaping. The problem is that this wastes quite a bit of space. The last few blog posts I had were about 2KB of text each. Imagine doing a batch of a few thousand of them; just these kinds of calls can take up a lot of extra memory. Now, I’m used to managed memory, where this kind of thing hangs around and needs to be collected. This code is in Rust, and it will clean itself up as soon as it is out of scope. But given that we hold on to it for a non-trivial amount of time, I think that it is a concern. It might actually be cheaper to go over the string multiple times to figure out exactly how many escape chars we need.
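The pattern in question is roughly this (a sketch of the general idea with my own names, not the actual Noise code):

// Reserve for the worst case, where every single character needs escaping.
// For a 2KB value this allocates 4KB up front, even if nothing needs escaping.
fn escape(value: &str) -> String {
    let mut escaped = String::with_capacity(value.len() * 2);
    for c in value.chars() {
        match c {
            '"' => escaped.push_str("\\\""),
            '\\' => escaped.push_str("\\\\"),
            _ => escaped.push(c),
        }
    }
    escaped
}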

One of the first optimizations we did when we wrote our own JSON parser was to get rid of the costly escape char handling. It literally took more time than anything else, because we kept having to scan the string and break it over and over again.

Another issue that popped up is here:

[image: comparison code for two JSON objects]

A silent assumption here is that the order of properties in the two objects is the same. Two identical objects that have different property order will sort unequal. That is likely to be a surprise to the user.
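To illustrate, if properties are kept as an ordered list of pairs and compared element by element, you get this behavior (my own toy example, not Noise code):

fn main() {
    // Properties stored in document order and compared element by element.
    let a = vec![("age", "31"), ("name", "John")];
    let b = vec![("name", "John"), ("age", "31")];
    assert_ne!(a, b); // same logical object, different property order: unequal
}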

The next file is parser.rs, which is over 1,300 lines of code (almost 20% of the entire codebase of Noise!). This is pretty typical parsing code: low level, nasty and quite tedious. I skimmed it briefly, but I don’t think that there is anything really important going on there. Its main purpose is to take a query string and produce a filter. Moving on.

The next file is query.rs, and I’m going to assume that this will lead me to also jump to filter.rs, which I previously skipped.

The code here starts using terms that I’m familiar with from Lucene. We’ll start where the money is, in Query::get_matches, which starts by building the query itself:

[image: Query::get_matches building the query]

Basically, create a snapshot of the index, parse the query. We need to pass the snapshot to the parser so it can pass it on to the filters that it builds internally. The order of operations in this code also determines the required order of appearances of the clauses in the Noise query syntax.

The rest of the method goes over the query and builds it, setting up ordering, aggregation and such. There is close cooperation between query.rs, filter.rs and snapshot.rs. The query is a high level concept of everything that must happen; the filter is a much lower level one, just finding the right stuff. On top of that, the query layer adds boosting, scoring, ordering and aggregation. The filter uses the snapshot to access the stored information.

For example, the AllDocsFilter is using the snapshot.new_all_docs_iterator, which is defined as:

[image: the new_all_docs_iterator definition]

Basically, scan all the values with the prefix of S, which I think stands for sequence number.
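Conceptually, that iterator is just a prefix scan over a single ordered key space, something like this (using an ordered map as a stand-in for RocksDB, since I haven’t traced the actual snapshot API):

use std::collections::BTreeMap;

// Stand-in for the RocksDB key space: seek to the first key with the "S" prefix
// and keep going until the prefix no longer matches.
fn all_docs<'a>(kv: &'a BTreeMap<String, String>) -> impl Iterator<Item = (&'a String, &'a String)> {
    kv.range("S".to_string()..)
        .take_while(|(k, _)| k.starts_with('S'))
}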

More complex stuff can be seen in the Scorer, as a good example. We have terms prefixed with C, which I think stands for Count, which gives the number of times that a particular term appears in a particular position. And K is the number of times a particular property has shown up in general. This seems to be the basis of a tf-idf calculation. Those are merged together using the RocksDB merge command and the custom merge policy that we looked at in index.rs.
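Assuming those counts really are the term and property frequencies, the classic formula they would feed into looks something like this (a generic tf-idf sketch, not the actual Noise scorer):

// tf:    how many times the term appears in this field (the C entries, per my reading)
// df:    how many fields contain the term at all (the K entries, per my reading)
// total: total number of indexed fields
fn tf_idf(tf: f64, df: f64, total: f64) -> f64 {
    if df == 0.0 {
        return 0.0;
    }
    tf * (total / df).ln()
}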

At this point I looked at the clock and realized that it is now after midnight and that I have gotten to the most interesting parts that I wanted to see; the rest are pretty familiar details, since a lot of this follows the Lucene construction.

I’m going to have another blog post about my thoughts about this project, which will show up tomorrow.

time to read 6 min | 1169 words

In my previous post, I started going over the Noise project code. We got to the indexer, but there is still a lot to do. As a reminder, I’m reviewing 48d808375e541eca383826ea928272976bce064d.

The indexer code handed off a lot of work to the shredder, so it is a good thing that this is my next target for review. At a glance, it looks like Noise is using rustc_serialize to do the actual JSON parsing. I’m guessing it makes sense at this point, but given that RavenDB 4.0 has rolled its own JSON parser, and given that Noise is accepting the data as strings, I’m going to assume that JSON parsing costs are going to be high on any profiler output.

The first thing to see in the file is the shredder struct:

[image: the shredder struct definition]

The existing_key_value_to_delete is very interesting, since it implies that this is how deletes & updates are handled.

It looks like this is working really closely with the key builder. The first method on the file is:

[image: one of the add_*_entries methods on the shredder]

There is a whole bunch of stuff like that, with roughly the same pattern (add_bool_null_entries, add_stemmed_entries, etc). Because of the close cooperation of those two, I’m going to be reading them in parallel. The KeyBuilder struct looks like:

[image: the KeyBuilder struct definition]

At this point, I’m guessing that what is going on is that the shredder goes over the JSON document and adds the property path to the KeyBuilder. But given add_number_entries, I don’t see how that would work. We are adding the document sequence number to the end of the key path, and storing the value of the property in the value. That is… strange. The KeyBuilder data contains the property paths and the current position in the array (possibly nested), so that much is clear; its usage, not so much at this stage.

Because this is confusing, I’ll start from the shred function and go from there, to see exactly what is going on.

[image: the shred function]

This takes the string, passes it to the JSON parser and starts iterating over it. Okay, going over the code, that seems easier to follow now. This parses the JSON one token at a time. When it encounters a property, it adds that to the KeyBuilder. So the KeyBuilder is basically the path into the current value in the document that we are parsing.
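In other words, the KeyBuilder behaves like a stack of path segments that tracks where we are as the tokens stream by. My mental model of it is something like this (not the actual implementation):

// Rough mental model of KeyBuilder: a stack of property names / array offsets
// that is pushed and popped as the parser descends into the document.
struct KeyBuilder {
    path: Vec<String>,   // property path segments, with "$" marking an array level
    array_pos: Vec<u64>, // current index for each open array
}

impl KeyBuilder {
    fn push_key(&mut self, name: &str) {
        self.path.push(name.to_string());
    }

    fn pop_key(&mut self) {
        let _ = self.path.pop();
    }

    fn push_array(&mut self) {
        self.path.push("$".to_string());
        self.array_pos.push(0);
    }

    fn next_array_item(&mut self) {
        if let Some(i) = self.array_pos.last_mut() {
            *i += 1;
        }
    }
}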

This bugs me, though:

[image: single letter key prefixes used by the shredder]

I’m following what is going on, and it makes perfect sense. But all those special key prefixes that are scattered are not documented anywhere that I could see, and as I mentioned previously, single letter prefixes are hard on readability and I don’t think they add much, given that RocksDB is using prefix compression anyway.

There is already a PR to fix this, which is likely to be applied by the time you are reading this blog post, by the way. It changes the prefixes to be constants, improving code readability.

A lot of the functionality seems to be inside maybe_add_value, see the call sites:

[image: the maybe_add_value call sites]

It looks like all the numbers are recorded as floats (given that JSON uses double precision IEEE floating point, I guess that makes sense). You can see the different prefixes that are given for the values.

The tests at the bottom of the file complete the picture, and if I understand this properly, given the following JSON:

{ "name":"John", "age":31, "city":"New York", “hobbies”: [“reading”, “dogs”] }

The shredder is going to output the following entries, given a document sequence of 555:

  • W.age!f31#555,
  • W.hobbies$!dogs#555,1
  • W.hobbies$!reading#555,0
  • W.name!john#555,

I think that the code calls this a key path, and the general format is (white space added):

W <property path> ! <value> # <document sequence> , <array path>

A critical observation is that an array is denoted as a $ in the property path, and there are as many indexes in the array path as there are $ signs in the property path.  I’m currently ignoring the full text search aspect of things, since it is not important for our purposes. With the understanding of what is being generated, let’s move to seeing what is actually going on.
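Putting the format together, building such a key from the pieces the KeyBuilder tracks would look roughly like this (my reconstruction from the tests, not the actual code):

// W <property path> ! <value> # <document sequence> , <array path>
// e.g. path ".hobbies$", value "reading", seq 555, array path [0]
//      => "W.hobbies$!reading#555,0"
fn build_key(path: &str, value: &str, seq: u64, array_path: &[u64]) -> String {
    let positions: Vec<String> = array_path.iter().map(|i| i.to_string()).collect();
    format!("W{}!{}#{},{}", path, value, seq, positions.join(","))
}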

Let us go back to merge_existing_doc, and see how that works. This is called after we shredded a document, so we have done all of the interesting work with regards to parsing the document and building the tokens, splitting the data, etc. At this point, we check if we already have this document, and if so, we load all the existing values and compare them to the current version. The nice thing is that we get to optimize. If a certain value didn’t change, we don’t need to do anything at all. On the other hand, if a value was there and is no longer in the current version, we need to delete it. That is effectively how updates are handled.

With deletes, we do the same, but in reverse: we reconstruct the document, find all the indexed values, and then we remove them. Finally, we have add_all_to_batch, which takes all the in-memory work that was done and applies it to the RocksDB batch, pretty much finishing all the work. I don’t like that some methods hold on to state (shredded_key_values / existing_key_value_to_delete) and then flush it, while others manipulate the RocksDB batch directly; it seems like there should be one approach or the other.

Coming back to the issue that I had with the compaction routine, we have this code, which adds a number to the entries:

[image: writing the number value to the entries]

Now, if I’m reading things properly, if the value is zero (not uncommon by any means), that will result in 8 bytes of zeros being written as the value. Combine this with this:

[image: the compaction filter zero value check]

And now I think that this will cause that value to be removed during compaction, resulting in data corruption. No idea if this is actually the case, since I’m just going over the code, and not running / testing it thoroughly.

It turns out that I’m not reading this properly (pun intended). I missed the negation on the first line of compaction_filter. The C & K values are the only things that will be removed during compaction, not the other way around, so no data corruption. Just me being reminded that there is a reason I use “== false” instead of “!”: because I keep missing them.
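So the corrected reading of the filter is roughly this (my paraphrase of the logic, not the actual function):

// Only the count entries (C / K prefixes) are candidates for removal, and only
// when their merged counter has dropped to zero; everything else is always kept.
fn keep_during_compaction(key: &[u8], value: &[u8]) -> bool {
    let is_count_entry = key.starts_with(b"C") || key.starts_with(b"K");
    if is_count_entry == false {
        return true; // regular entries are never dropped by the filter
    }
    // interpret the value as a little endian i32 counter; zero means no longer referenced
    match value.get(..4) {
        Some(bytes) => i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) != 0,
        None => true,
    }
}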

time to read 11 min | 2097 words

I ran across this post by Damien Katz, and was immediately intrigued, for several reasons. The last time I did a deep review of Damien’s code was about a decade ago, when I went over the CouchDB source code in an attempt to learn Erlang. What ended up happening was that I wrote RavenDB.

Note: This is pretty much my thoughts as I’m going through the code, it isn’t meant to be a guide or anything except a record of what went through my head as I read the code.

Another reason is that this is obviously a subject matter that is near & dear to my heart. Searching is something that we do all the time, and I would dearly love to read some code that isn’t Lucene that does so. Finally, the code is written in Rust, which I have a passing interest in, so I consider this a win all around. You can find the project site here and the code is here. I’m reviewing 48d808375e541eca383826ea928272976bce064d.

Some things are obvious from the get go, just by skimming through the docs. This is a small project, less than 10,000 lines of code. That is great, since it means that I can review it in one sitting. I also don’t really care that much for the parser and format that they use; I’m mostly interested in how they are storing and searching the data. The storage is done via RocksDB, which is interesting, because it imposes certain limitations on the solution (in particular, a single key space), so it is interesting to see how the data is stored. Most importantly, I really want to see how this is handling updates, which is what I find to be the hardest problem for a search implementation.

As usual, I’m going to start reading code in file name order, without any relation to the current structure of the code, and jump around from there.

In aggregates.rs, we find this piece of code, which allows you to run an aggregation. The init / action / extract are the set of operations that happen in the aggregation process.

[image: the AggregateFunImpls definition]

Right now I don’t know what JsonValue is, but it is interesting to read just from the signatures. The action function can get a reference to a mutable instance, an immutable instance and a reference to an immutable instance. I’m assuming that the mutable instance is the one that is being modified to hold the aggregated value.

The aggregation itself is implemented by defining an enum that has a get_fun_impls method that initializes the AggregateFunImpls based on the specific aggregation value required.

Sum is defined as two functions, sum_init and sum, without any final step. The actual implementation is:

[image: the sum_init and sum implementations]

That is fun to read, because it is pretty simple, although the syntax in the if let line is really confusing to me (first time in the code, and I’m not a Rust dev). Basically, it says that we’ll take the JsonValue instance, convert it from a JsonValue to a number instance, and then bind it to the same variable name as the original one.

If I was writing it in C it would probably look like:

JsonValue* existing = …;
int* existing = extract_number(existing); /* shadows the previous 'existing', which C wouldn't actually allow */
*existing += new;

That is leaving aside the fact that the “new” that is used here and the “new” in the function arguments are not the same (well, they are conceptually, but not the same type, etc). I find this confusing, but I know that this is fairly common in Rust code.
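For contrast, the Rust shape of that shadowing is something like this (a simplified sketch with my own minimal JsonValue, not the exact Noise code):

enum JsonValue {
    Number(f64),
    Null,
}

fn sum(existing: &mut JsonValue, new: &JsonValue) {
    // Both names are shadowed: the inner `existing` and `new` are the numbers
    // inside the enum, not the JsonValue arguments of the same names.
    if let (JsonValue::Number(existing), JsonValue::Number(new)) = (existing, new) {
        *existing += *new;
    }
}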

The rest of the file consists of pretty similar constructs for other aggregation operations, and the next is error.rs, which is just the boilerplate code that Rust requires. The next file is filters.rs, which seems really interesting, but it doesn’t make sense at the current stage of the code, because it refers to a lot of other things that I haven’t checked yet, so I’m moving to the next one, index.rs. Hopefully that is where the juicy bits start.

Indeed, the interesting bits start when we call open, where we open up a rocksdb instance and start playing around. There is some interesting error handling here:

 

[image: error handling in the open method]

In particular, note that if the open failed because the database didn’t exist, it is created, and a header version is added. I wholeheartedly approve, because such early planning will pay dividends down the road. It is important to note that assert statements in Rust remain in release builds; only debug_assert will be removed in release builds.
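A quick illustration of the difference (header_version is just a stand-in variable here, not the actual Noise check):

fn main() {
    let header_version = 1;

    // assert! stays in the binary in both debug and release builds
    assert!(header_version == 1, "unexpected header version");

    // debug_assert! is compiled out when building with --release
    debug_assert!(header_version == 1, "unexpected header version");
}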

The index has the ability to create a snapshot, based on the RocksDB Snapshots. I wrote a blog post on how that works on LevelDB (which RocksDB is based on).

[image: the snapshot creation code]

I do wonder about the unwrap call here. If I understand correctly, this would cause a panic if I access an instance that isn’t open. That might be bad if you have a query hitting the server during shutdown, I’m guessing. I assume that this is an oversight, kept because it makes the code easier to consume.

The add method there is where stuff really starts to happen.

[image: the add method]

This is quite interesting, because you pass it a json string, and a batch (a Noise concept that basically wraps a RocksDB batch).

Actually, thinking about this, there is something that might be tricky here. RocksDB allows concurrent batches, and while it has support for optimistic concurrency, I’m guessing (at this point it is a guess) that this isn’t used by Noise, and that can lead to severe problems if you are trying to add to the index from concurrent threads. This is important because parallel indexing is pretty much the first optimization step that anyone will try, and Lucene calls it out explicitly as a best practice.

The JSON is parsed by a class called shredder; I’m assuming that this is the JSON parser used by Noise.

The actual behavior is quite interesting.

[image: creating the shredder and shredding the document]

We create a new shredder, then set it on the JSON string. The “let … if let” syntax gave me pause for a bit. It is elegant, in a crazy sort of way.

But the really interesting bit here is that this code says quite a lot about the way Noise works. First, we now know that Noise has stable document ids, since we can see it being used when we check if the document is already in the index. That means that there is update support, which is pretty cool. Although I do wonder why the values are merged back in. For deletion, or are they just creating a large piece?

If there is no id, Noise uses a guid. I would usually call out that this might be a problem (because down the line, given enough work, it will create a lot of compaction work around the files), but I’m suspecting that not specifying an id is going to be a pretty rare occurrence, so that isn’t important.

Finally, we add it to the batch:

[image: adding the shredded document to the batch]

That means that the shredder is probably a very important piece in the system. I’ll continue to read the index code for now and see what I can figure out, before jumping to that code.

Next, we have delete, and the key part of that method is:

[image: the key part of the delete method]

So we have delete support, and comparing that to the update process is very interesting. They are different, which means that updates aren’t just delete/insert. That is interesting.

Next we have gather_doc_fields, which gets a string document id. What happens from there is interesting. Let us assume that our document id is “users/1”.

Noise searches RocksDB for a value named “Iusers/1” <- note the I prefix, which I assume stands for id. That happens in fetch_seq. If there is such a key, its value will be a string representation of an int64, which is parsed and returned. That seq is the internal sequence number, I’m assuming. And I’m guessing it is used for most operations. Let us assume that the seq number returned is 1234.

gather_doc_fields then does another RocksDB search, for “V1234#”. This is done via a KeyBuilder, which Noise is using to generate keys based on the depth of the search, I’m guessing. Haven’t really looked into that yet.
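So the lookup is a two step affair: id to sequence number, then a prefix scan on the sequence number. Roughly like this (using an ordered map as a stand-in for RocksDB, since I haven’t traced the exact calls):

use std::collections::BTreeMap;

fn gather_doc_fields(kv: &BTreeMap<String, String>, doc_id: &str) -> Option<Vec<(String, String)>> {
    // Step 1: "I" + document id => internal sequence number (stored as a string)
    let seq: u64 = kv.get(&format!("I{}", doc_id))?.parse().ok()?;

    // Step 2: prefix scan over "V<seq>#" to collect all the shredded fields
    let prefix = format!("V{}#", seq);
    let fields = kv
        .range(prefix.clone()..)
        .take_while(|(k, _)| k.starts_with(&prefix))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect();
    Some(fields)
}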

The code to collect the keys for a document sent me into a bit of a wild goose chase, in the sense that I couldn’t figure out what was going on here:

[image: the code that collects the keys for a document]

RocksDB is a key/value database; it doesn’t support multiple values per key. It took me a while to figure out that this is a (I’m guessing inefficient) way to clone the value from database-owned memory to memory controlled by Noise.

The next method, flush, just updates the internal data (version and last sequence number) and submits the write batch. That concludes it as far as I’m concerned: unless something really strange is going on, indexing in Noise must be done in a single-threaded manner. Note that there isn’t anything wrong with that (this is what we ended up doing in RavenDB 4.0), but it is something that needs to be stated out loud, because it is contrary to the expected practice. Then again, I haven’t seen anything on indexing speed or any sign of benchmarking yet; maybe the team didn’t get there at this point.

The last things on this file that are of interest are the RocksDB management functions. The compaction_filter will keep any key that starts with K or C; if the key doesn’t start with them, then the value is interpreted as an int32, and if it is 0, the entry is discarded. This is interesting, because the HDB key (the header for the db) is not one of the protected values, and the reason it isn’t thrown out is that Noise is using a little endian value here, and accidentally the int64 value (1 in this case) is interpreted as a non-zero int32. I’m assuming that the same holds true for the rest of the values, but it seems more like accidental behavior. There is also the fact that we already know that there are some values (sequence numbers) whose value is a stringified int64. Those will also pass the “not zero int32” check, but even if it works, it looks wrong.
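The byte level reason the header survives, as far as I can tell, is easy to demonstrate on its own (a standalone illustration, not Noise code):

fn main() {
    // The HDB value is an int64 with the value 1, stored little endian.
    let header: u64 = 1;
    let bytes = header.to_le_bytes(); // [1, 0, 0, 0, 0, 0, 0, 0]

    // The compaction filter only looks at the first 4 bytes as an int32.
    let as_i32 = i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    assert_eq!(as_i32, 1); // non zero, so the header entry survives by accident
}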

[image: the key comparison code]

So basically, we have a few more key prefixes, with a certain as yet unknown meaning. If they are those values, they should be compared using the KeyBuilder comparison; otherwise, just compare them as is. I don’t like the single letter prefixes. RocksDB uses prefix compression, so I don’t think that using more readable prefixes would cost anything.

Finally, we have the sum_merge, which is called by RocksDB when you make a Merge call. This is an interesting approach to handling updates, and what it basically does in the case of Noise is allow merging multiple calls to the same value and returning their sum total. I’m assuming that the value used here is the ref count or some such, given the interplay between sum_merge and compaction_filter.

That is it for the index.rs file. There are actually tests there for the index, which is interesting. I don’t like having tests in the same file, but I think it is common in Rust.

The only thing of note there is that we find some interesting information about the format of the keys:

The document:

[image: the test document]

Is translated into the following:

[image: the keys generated for the test document]

That is the job of the KeyBuilder, I think, but I’ll leave digging into that to another post.

time to read 2 min | 351 words

A user just reported that after a short time of using RavenDB, all requests started to hang and eventually timed out. We are talking about making several dozen requests in the space of a minute or so, as a user operating against a web application.

That was a cause of concern, obviously, and something that was very strange. Luckily, the user was able to send us a stripped down version of his application and we were able to reproduce this state locally. On that particular application only. We couldn’t reproduce this anywhere else.

In fact, we fixed the problem and still don’t know how to reproduce it. The underlying issue is that we are making a GET request using HttpClient and specifying HttpCompletionOption.ResponseHeadersRead in the SendAsync method. Now, the server might return us a 304 response, in which case we’ll use the cached value we have. If we make a few requests like that, all future requests will hang. In order to fix this problem we need to dispose the response, or read the (empty) content of the response.

Now, the crazy thing is that I can probably speculate pretty accurately on why this is going on, we have fixed it, but I can’t reproduce this problem except within the application the user sent us. So no independent reproduction.

The likely issue is that HttpClient considers the request open until we read all of its content (even if there is none), and that as we make more requests that get the same 304 and therefore never touch the content of the response, we have more and more requests that are considered to be open, up to the maximum number of requests that are allowed against the server concurrently. At that point, we are waiting for a timeout, and it looks like we are hanging.

All very reasonable and perfectly understandable (if annoying, because “you only need to dispose it sometimes” is a trap waiting to happen), but I don’t know why we couldn’t reproduce it.

time to read 2 min | 357 words

A seemingly small change in RavenDB 4.0 is the way we implement the etag. In RavenDB 3.x and previous we used a 128-bit number that was divided into 8 bits of type, 56 bits of restart counter and 64 bits of changes within the current restart period. Visually, this looks like a GUID: 01000000-0000-0018-0000-000000000002.

The advantage of this format is that it is always increasing, very cheap to handle and requires very little persistent data. The disadvantages are that it is very big and not very human readable, and the fact that the change counter resets on every restart means that you can’t make meaningful deductions about the relative distance between any two etags.

In RavenDB 4.0 we shifted to using a single 64-bit number for all etag calculations. That means that we can just expose a long (no need for the Etag class), which is more natural for most usages. This decision also means that we need to store a lot less information, and etags are one of those things that we go over a lot. A really nice side effect, which was totally planned, is that we can now take two etags, subtract them and get a pretty good idea about the range that needs to be traversed.

Another important decision is that everything uses the same etag range. So documents, revisions, attachments and everything else share the same etag, which makes it very simple to scan through and find the relevant item based on a single number. This makes it very easy to implement replication, for example, because the wire protocol and persistence format remain the same.

I hadn’t thought to write about this; it seemed like too small a topic for a post, but there was some interest about it in the mailing list, and enumerating all the reasons, it suddenly seems like it isn’t such a small topic.

Update: I forgot to mention, a really important factor in this decision is that we can now do this:

[image: progress notification based on the etag range]

So we can give detailed information and expected timeframes easily.

time to read 3 min | 493 words

In my previous post I introduced data subscriptions and mentioned that there is more to them than just being able to get a push based reliable stream of documents as they change. The problem is that RavenDB will send us the current document, and if the document has been modified multiple times quickly enough, we’ll only get it once. What is more, we are getting the document in our client code, but all we know is that it was changed, not what changed.

With RavenDB 4.0 we now have versioned subscriptions, working alongside the versioning feature. First, we define that a particular collection will have versioning enabled:

[image: enabling versioning for a collection]

And now we can make use of versioned subscriptions.

In this case, you can see that we make use of the Versioned<User> type, which indicates to RavenDB that we are interested in a versioned subscription. Instead of sending us just the modified document, we’ll get both the Previous and Current versions of the document. In fact, we’ll be called with the Previous / Current version of the document on each change. You might have noticed the null checks in the subscription code; this is because when a document is created, we’ll get it with a null Previous value, and when a document is deleted, we’ll get it with a null Current value. If the document has been deleted and recreated, we’ll be called first with a Previous instance and a null Current, and then a null Previous and a Current instance.

In other words, you are now able to track the entire history of a document inside RavenDB, and make decisions based on that. Like regular subscriptions, we have the ability to script a lot of the logic, like so:

What this subscription will do is analyze all the changes on a user, and then send us the user document as it was when the user was banned.

It is important to note that this doesn’t require you to be monitoring the subscription as it happens; you can do this at any point, and you’ll get the historical data. For that matter, this is also a highly available solution. If a client goes down, it (or another client) can resume from where it left off, and if the server goes down, the client will transparently fail over to a replica without any user or admin involvement, running from where it left off.

We have only started looking into the implications of this feature, but the potential for analytics on the changes is already quite obvious. We are going to send you the data in the order it was generated, so you can build a model of changes as it makes sense in your domain, without having to go to the trouble of manually keeping track of everything.

time to read 3 min | 516 words

I’ll be talking about this feature more once the UI for it is complete, but this feature just landed in our v4.0 branch and it is so awesome that I can’t help talking about it right away.

In RavenDB 4.0 we have taken the idea of subscriptions and pushed it up a few notches. Data subscriptions give you a reliable, push based method to get documents from RavenDB. You set up a subscription, and then you open it, and RavenDB will stream to you all the documents that are relevant to your subscription. New documents will be sent immediately, and failures are handled and retried automatically. Subscriptions are a great way to build all sorts of background jobs.

In RavenDB 3.x their main strength was that they gave you a reliable, push based stream of documents, but in RavenDB 4.0, we decided that we want more. Let us take it in stages; here is the most basic subscription usage I can think of:

This is subscribing to all User documents, and RavenDB will first go through all the User documents, sending them to us, and then keep the connection alive and send us the document whenever a User document is updated. Note that we aren’t talking about just a one time thing. If I modify a document once an hour, I’ll be getting a notification on each change. That allows us to hook this up to jobs, analytics, etc.

The really fun thing here is that this is resilient to failure. If the client maintaining the subscription goes down, it can reconnect and resume from where it left off. Or another client can take over the subscription and continue processing the documents. In RavenDB 4.0, we now also have high availability subscriptions. That means that if a server goes down, the client will simply reconnect to a sibling node and continue operating normally, with no interruption in service.

But you aren’t limited to just blindly getting all the documents in a collection. You can apply a filter, like so:

In this manner, we’ll now only get notified about active users, not all of them. This filtering allows you to handle some really complex scenarios. If you want to apply logic to the stream of changed documents, you can, getting back only the documents that match whatever logic you have defined.

But the script can do more than just filter, it can also transform. Let us say that we want to get all the active users, but we don’t need the full document (which may be pretty big); we just want a few fields from it.

In this manner, you can select just the right documents, and just the right values you need from the document and process them in your subscription code.

There is actually more, which I’ll cover in the next post, but I’m so excited about this feature that I’m not even going to wait for the next publishing slot and will push this immediately. You can read it here.

time to read 5 min | 866 words

Part of the reason for RavenDB was that I wanted a database that actually takes how it is being used into account, and provides good support for common usage scenarios. Making the process of moving between Dev –> UAT –> Production easier is a big part of that.

Typically, databases don’t handle that directly, but let the users figure it out. You can see that in the plethora of SQL schema deployment and versioning options that you have to deal with.

With RavenDB, for indexes in particular, we made the process very easy. Indexes are defined in code, deployed alongside the application and versioned in the exact same manner, in the exact same system.

But as the feature set of RavenDB grows, we need to consider the deployment scenario in additional places. We recently started talking about the development cycle of ETL processes, subscriptions, backups and external replication. The last two are fairly rare in development / UAT scenarios, so we’ll ignore them for now. They are typically only ever set up & used in production. Sometimes you test them on a dedicated instance, but it doesn’t make sense to deploy a backup configuration in most cases. External replication is basically just destination + credentials, so there isn’t really all that much to track or deploy.

ETL processes and subscriptions, on the other hand, can contain quite a bit of logic in them. An ETL process that feeds into a reporting database might be composed of several distinct pieces, each of them feeding some part of the data to the reporting db. If the reporting needs change, we’ll likely need to update the ETL process as well, which means that we need to consider exactly how we’ll do that. Ideally, we want a developer to be able to start working on the ETL process on their own machine, completely isolated. Once they are done working, they can check their work into the code repository and move on to other tasks. At some future time, this code will get deployed, which will set up the right ETL process in production.

That is a really nice story, and it is how we are dealing with indexes, but it doesn’t actually work for ETL processes. The problem is that ETL is typically not the purview of the application developer; it is in the hands of the operations team, or maybe it is owned by the team that owns the reports. Furthermore, changes to the ETL process are pretty common and typically happen outside the release cycle of the application itself. That means that we can’t tie this behavior to the code. Unlike indexes, which have a pretty tight integration with the code that is using them, ETL is a background kind of operation, with little direct impact, so it can’t be tied to the application code the way indexes are. Even with indexes, we have measures in place that prevent (lock) the index definition, so an administrator can update the index definition on the fly without the application overwriting it with the old version of the index.

Subscriptions are more of a middle ground. A subscription is composed of a client side application that processes the data and some server side logic related to filtering and shaping it. On the one hand, it makes a lot of sense for the subscribing application to control its subscription, but an admin wanting to update the subscription definition is a very likely scenario, maybe as a result of a data change, or input from the business. We can update the server side code without re-deployment, and that is usually a good idea.

To make matters a bit more complex, we also have to consider secrets management. ETL processes, in particular, can contain sensitive information (connection strings). So we need to figure out a way to have the connection string, but not have the connection string. In other words, if I write a new ETL process and deploy it to production, I need to be sure that I don’t need to remember to update the connection string from my local machine to the production database. Or, much worse, if I’m taking the ETL from production, I don’t want to accidentally also get the production connection string. That means that we need to use named connection strings, and rely on the developer / admin to set them up properly across environments.

I would really appreciate any feedback you have about how to handle this scenario.

Both ETL processes and subscriptions are just JSON documents of not too much complexity, so actually moving them around between servers isn’t hard; it is the process of doing so that we are trying to flesh out. I should also mention that we are probably just going to make sure that there is a process to handle that, not something that is mandated, because some companies have very different deployment models that we need to fit into. This is us trying to think about the best way to handle the most common scenario with as little friction as possible.

time to read 1 min | 96 words

I have just completed writing the 3rd chapter for Inside RavenDB 4.0, and the full book is available here (all three chapters of it). It is just under 100 pages in size, and I’m afraid that at this rate this is going to be a massive tome.

The content so far covers setting up RavenDB 4.0 and doing basic CRUD operations, then goes on to deal with modeling documents inside RavenDB.

Any and all feedback is welcome.
