Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:


+972 52-548-6969


Posts: 6,633 | Comments: 48,370


I WILL have order: How Lucene sorts query results

time to read 3 min | 537 words

In this series of posts, I am going to take a look at a single feature across several search engine libraries. Given three documents, sort them by State and then by City. This is a pretty trivial query, but there is a lot that is going on behind the scenes that needs to happen for this to actually work. Let’s look at how this is implemented, shall we?

The first library to look at is Lucene, because it is so prevalent. Here is the relevant code that I’m executing:
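The original C# snippet didn’t survive the export. As a stand-in, here is a minimal Python sketch of the logical query (the documents and field values are invented for illustration; they are chosen so that two documents share the same State, matching the ordinals discussed next):

```python
# Three example documents and the query we want to answer:
# sort by State, then by City. The original post issues the
# equivalent query against Lucene in C#.
docs = [
    {"Id": 0, "City": "New York", "State": "NY"},
    {"Id": 1, "City": "Buffalo", "State": "NY"},
    {"Id": 2, "City": "Chicago", "State": "IL"},
]

# Logical equivalent of: new Sort(new SortField("State"), new SortField("City"))
results = sorted(docs, key=lambda d: (d["State"], d["City"]))
print([d["Id"] for d in results])  # [2, 1, 0]
```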

A key part of the way Lucene executes sorting is this piece of code:


As you can see, we ask the reader (a single file in a Lucene directory) to get the list of field values and matches for a particular field.

In this case, what this means is that doc #0 has the value in lookup[2], doc #1 does as well, and doc #2 has the value in lookup[1]. This means that when we compare, we can do it using the following code:
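The comparison code itself is missing from the export; a Python sketch of the StringIndex structure and the ordinal comparison might look like this (field values are the illustrative ones from the example above):

```python
# Sketch of Lucene's FieldCache StringIndex for the State field:
# 'lookup' holds the distinct terms in sorted order (slot 0 is
# reserved for "no value"), and 'order' maps doc id -> lookup slot.
lookup = [None, "IL", "NY"]   # sorted distinct terms
order = [2, 2, 1]             # doc #0 -> NY, doc #1 -> NY, doc #2 -> IL

def compare(doc_a, doc_b):
    # Comparing two docs is just comparing their ordinals;
    # no string comparison is needed within a single reader.
    return order[doc_a] - order[doc_b]

print(compare(0, 2))  # 1 (positive: doc #0 "NY" sorts after doc #2 "IL")
```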


And this is called for each field independently, like so:


All of which is pretty simple and straightforward. There is a nice optimization here: in most cases, if the readerGen is the same, we can compare the ordinals directly, without comparing the actual string values.
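A sketch of that optimization (the slot/readerGen naming follows Lucene’s field comparator, but this Python version is only illustrative):

```python
# Ordinals are only comparable if both slots were filled from the
# same reader generation; otherwise fall back to the string values.
def compare_slots(gen, ords, values, slot_a, slot_b):
    if gen[slot_a] == gen[slot_b]:
        return ords[slot_a] - ords[slot_b]   # cheap path: integer compare
    # slow path: compare the actual strings
    if values[slot_a] == values[slot_b]:
        return 0
    return -1 if values[slot_a] < values[slot_b] else 1
```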

The problem here is that we need to hold arrays in memory. In particular, I’m talking about FieldCache.GetStringIndex() (and its related friends). The way Lucene stores the values on disk means that on first read, it needs to reconstruct the terms from the index. Here is the core of the work that is done in GetStringIndex.

As you can see, this rips through the entire file, reading each term and then getting all the documents for a particular term. The code is quite clever, because we don’t need to compare anything, we know that we are sorted, so we can take advantage of that when detecting the ordinals.
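The GetStringIndex code itself isn’t reproduced here. A Python sketch of the pass it performs might look like this (the real implementation iterates Lucene’s term enumeration; this only models the same single pass over sorted terms):

```python
# Walk every term of the field in sorted order, assign it the next
# ordinal, and record that ordinal for every document containing it.
def build_string_index(terms_to_docs, max_doc):
    # terms_to_docs: iterable of (term, [doc ids]) in sorted term
    # order, which is how Lucene stores the term dictionary on disk.
    lookup = [None]            # slot 0 reserved for "no value"
    order = [0] * max_doc      # one entry per document in the segment
    for term, doc_ids in terms_to_docs:
        lookup.append(term)    # terms arrive sorted: no comparisons needed
        ordinal = len(lookup) - 1
        for doc in doc_ids:
            order[doc] = ordinal
    return lookup, order

lookup, order = build_string_index([("IL", [2]), ("NY", [0, 1])], 3)
print(lookup, order)  # [None, 'IL', 'NY'] [2, 2, 1]
```

Note the per-document `order` array: this is exactly the allocation discussed below.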

What this code isn’t very helpful about, though, is the fact that it allocates a lot of memory. In particular, it will allocate arrays with a value per document in the index. On large indexes, these can be very large. The good thing is that there is good caching going on, so you’ll typically not need to run this code too often. The bad thing is that it runs per segment. If you have a lot of small index batches, you’ll have a lot of values like that floating around; then the segments will get merged, and you’ll have to run through this again. This is also one of the primary reasons Lucene is limited to about 2.1 billion documents per index.

The good thing about this approach is that it is really flexible and gives us great performance when sorting.

So now that we know how Lucene does it, let’s look at other libraries.

Distributed compare-exchange operations with RavenDB

time to read 4 min | 754 words

RavenDB uses a consensus protocol to manage much of its distributed state. The consensus is used to ensure consistency in a distributed system and it is open for users as well. You can use this feature to enable some interesting scenarios.

The idea is that you can piggyback on RavenDB’s existing consensus engine to create robust and consistent distributed operations. RavenDB exposes these operations using a pretty simple interface: compare-exchange.

At the most basic level, you have a key/value interface on which you can perform distributed atomic operations, knowing that they are completely consistent. This is great in the abstract, but it’s a bit hard to grasp without a concrete example.

Consider the following scenario. We have a bunch of support engineers, ready and willing to take on any support call that comes in. At the same time, an engineer can only handle a certain number of support calls. To manage this, we allow engineers to register when they are available to take a new support call. How would we handle this in RavenDB, assuming that we want absolute consistency? An engineer may never be assigned too much work, and work may never be lost. Assume also that we need this to be robust in the face of network and node failures.

Here is how an engineer can register in the pool of available engineers.

The code above is very similar to how you would write multi-threaded code. You first get the value, then attempt an atomic operation to swap the old value with the new one. If we are successful, the operation is done. If not, we retry. Concurrent calls to RegisterEngineerAvailability will race each other; one of them will succeed and the others will have to retry.
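The original C# code isn’t shown here. A minimal Python sketch of the same get/compare/retry pattern, against a toy in-memory stand-in for the cluster-wide compare-exchange API (the key name and class are my own illustration, not RavenDB’s actual client API):

```python
import threading

class CompareExchangeStore:
    """Toy stand-in for a compare-exchange key/value store.
    get() returns (value, index); try_put() succeeds only if the
    index still matches, mimicking an atomic compare-and-swap."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}   # key -> (value, index)

    def get(self, key):
        with self._lock:
            return self._data.get(key, (None, 0))

    def try_put(self, key, value, expected_index):
        with self._lock:
            _, index = self._data.get(key, (None, 0))
            if index != expected_index:
                return False          # someone else won the race
            self._data[key] = (value, index + 1)
            return True

def register_engineer_availability(store, engineer):
    while True:
        value, index = store.get("emergencies/available-engineers")
        pool = list(value or [])
        pool.append(engineer)
        if store.try_put("emergencies/available-engineers", pool, index):
            return   # accepted; anyone racing us will loop and retry
```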

The actual data that we store in the compare exchange value in this case is an array. You can see an example of how that would look here:


Compare-exchange values can be simple values (numbers, strings), arrays or even objects. Any value that can be represented as JSON is valid there. However, the only operation that is allowed on a compare-exchange value is wholesale replacement.

The code above is only doing half of the job. We still need to be able to get an engineer to help us handle a support call. The code to complete this task is shown below:

The code for pulling an engineer from the pool is a bit more complex. Here we read the available engineers from the server. If there are none, we'll wait a bit and try again. If there are available engineers, we'll remove the first one and then try to update the value. This can happen for multiple clients at the same time, so we check whether our update was successful and only return the engineer if our change was accepted.

Note that in this case we use two different modes to update the value. If there are still more engineers in the available pool, we'll just remove our engineer and update the value. But if our engineer is the last one, we'll delete the value entirely. In either case, this is an atomic operation that will first check the index of the pre-existing value before performing the write.
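A self-contained Python sketch of that pull logic, including the delete-when-last behavior (again using a toy in-memory stand-in; the key name and API are illustrative):

```python
class CompareExchangeStore:
    # Toy in-memory stand-in for a compare-exchange key/value store.
    def __init__(self):
        self._data = {}  # key -> (value, index)
    def get(self, key):
        return self._data.get(key, (None, 0))
    def try_put(self, key, value, expected_index):
        if self._data.get(key, (None, 0))[1] != expected_index:
            return False
        self._data[key] = (value, expected_index + 1)
        return True
    def try_delete(self, key, expected_index):
        if self._data.get(key, (None, 0))[1] != expected_index:
            return False
        del self._data[key]
        return True

def pull_available_engineer(store):
    while True:
        pool, index = store.get("emergencies/available-engineers")
        if not pool:
            return None              # caller waits a bit and retries later
        engineer, rest = pool[0], pool[1:]
        if rest:                     # others remain: replace the value
            ok = store.try_put("emergencies/available-engineers", rest, index)
        else:                        # we took the last one: delete the value
            ok = store.try_delete("emergencies/available-engineers", index)
        if ok:
            # only act on what we read once our write was accepted
            return engineer

store = CompareExchangeStore()
store.try_put("emergencies/available-engineers", ["alice", "bob"], 0)
print(pull_available_engineer(store))  # alice
```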

It is important to note that when using compare-exchange values, you'll typically not act on read. In other words, in PullAvailableEngineer, even if we see an available engineer, we'll not use that knowledge until we have successfully written the new value. The whole idea with compare-exchange values is that they give you an atomic operation primitive in the cluster. So the typical usage is always to try to do something on write until the write is accepted, and only then use whatever value you read.

The acceptance of the write indicates the success of your operation and the ability to rely on whatever values you read. However, it is important to note that compare-exchange operations are atomic and independent. That means an operation that modifies a compare-exchange value and then does something else needs to take into account that these would run in separate transactions.

For example, if a client pulls an engineer from the available pool but doesn't hand over any work (maybe because the client crashed), the engineer will not magically return to the pool. In such cases, the idle engineer should periodically check that the pool still contains their username and add it back if it is missing.

Daisy chaining data flow in RavenDB

time to read 4 min | 685 words

I have talked before about RavenDB’s MapReduce indexes and their ability to output results to a collection, as well as RavenDB’s ETL processes and how we can use them to push some data to another database (a RavenDB database or a relational one).

Bringing these two features together can be surprisingly useful when you start talking about global distributed processing. A concrete example might make this easier to understand.

Imagine a shoe store (we’ll go with Gary’s Shoes) that needs to track sales across a large number of locations. Because sales must be processed regardless of the connection status, each store hosts a RavenDB server to record its sales. Here is the geographic distribution of the stores:


To properly manage this chain of stores, we need to be able to look at data across all stores. One way of doing this is to set up external replication from each store location to a central server. This way, all the data is aggregated into a single location. In most cases, this would be the natural thing to do. In fact, you would probably want two-way replication of most of the data so you could figure out if a given store has a specific shoe in stock by just looking at the local copy of its inventory. But for the purpose of this discussion, we’ll assume that there are enough shoe sales that we don’t actually want to have all the sales replicated.

We just want some aggregated data. But we want this data aggregated across all stores, not just at one individual store. Here’s how we can handle this: we’ll define an index that would aggregate the sales across the dimensions that we care about (model, date, demographic, etc.). This index can answer the kind of queries we want, but it is defined on the database for each store so it can only provide information about local sales, not what happens across all the stores. Let’s fix that. We’ll change the index to have an output collection. This will cause it to write all its output as documents to a dedicated collection.

Why does this matter? These documents will be written to solely by the index, but given that they are documents, they obey all the usual rules and can be acted upon like any other document. In particular, this means that we can apply an ETL process to them. Here is what this ETL script would look like.
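The script itself didn’t survive the export. RavenDB ETL scripts are written in JavaScript and use a loadTo&lt;Collection&gt; call to send a document to the target; a hedged sketch of what such a script might look like follows (the collection, field and store names are all assumptions, not taken from the original post):

```javascript
// Sketch only: runs once per document in the index's output collection.
loadToDailySales({
    Model:    this.Model,
    Date:     this.Date,
    Quantity: this.Quantity,
    Total:    this.Total,
    // static fields added so the central server can tell
    // which store each aggregated sale came from
    Store:    "stores/chicago",
    Region:   "north-america"
});
```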


The script sends the aggregated sales (the collection generated by the MapReduce index) to a central server. Note that we also added some static fields that will be helpful on the remote server, so as to be able to tell which store each aggregated sale came from. At the central server, you can work with these aggregated sales documents to see each store’s details, or you can aggregate them again to see the state across the entire chain.

The nice thing about this approach is the combination of features and their end result. At the local level, you have independent servers that can work seamlessly with an unreliable network. They also give store managers a good overview of their local state and what is going on inside their own stores.

At the same time, across the entire chain, we have ETL processes that will update the central server with details about sales statuses on an ongoing basis. If there is a network failure, there will be no interruption in service (except that the sales details for a particular store will obviously not be up to date). When the network issue is resolved, the central server will accept all the missing data and update its reports.

The entire process relies on features that already exist in RavenDB and are easily accessible. The end result is a distributed, highly reliable and fault tolerant MapReduce process that gives you an aggregated view of sales across the entire chain at very little cost.

Code that? It is cheaper to get a human

time to read 3 min | 597 words

Rafal had a great comment on my previous post:

Much easier with humans in the process - just tell them to communicate and they will figure out how to do it. Otherwise they wouldn't be in the shoe selling business. Might be shocking for the tech folk, but just imagine how many pairs of shoes they would have to sell to pay for a decent IT system with all the features you consider necessary. Of course at some point the cost of not paying for that system will get higher than that…

This relates to having a chain of shoe stores that needs to sync data and operations among the different stores.

Indeed, putting a human in the loop can in many cases be a great thing. A good example of that is order processing. If I can write just the happy path, I can be done very quickly. Anything not in the happy path? Let a human deal with that. That cuts down costs enormously, and it allows you to make intelligent decisions on the spot, with full knowledge of the specific case. It is also quite efficient, since most orders fall into the happy path. It also means that I can come back in a few months, figure out what the most common reasons for falling off the happy path are, and add them to the software, significantly reducing the amount of work I hand off to humans.

I wish that more systems were built like that.

It is also quite easy to understand why they aren’t built with this approach. Humans are expensive. Let’s assume that we pay minimum wage; in the States, that would translate to about 20,000 USD a year. Note that I’m talking about the actual cost of employment: this calculation includes the salary, taxes, insurance, facilities, etc. If I need coverage 24/7, I have to at least triple it (without accounting for vacation, sick leave, etc.).

At the same time, an x1e.16xlarge machine on AWS with 64 cores and 2 TB of memory will set me back about 40,000 USD a year. And it will likely be able to process transactions much faster than the two minimum wage employees that the same amount of money will get me.

Consider the shoe store and the misdirected check scenario: we need to ensure that the people actually receiving the check understand that it was meant for the wrong store and take some form of action. That means that we can’t just take Random Joe Teenager off the street. So another aspect to consider is the training cost. That usually means getting higher quality people and training them on your policies, all of which takes quite a bit of time and effort, especially if you want consistent behavior across the board.

Such a system, taken to the extreme, results in rigid policy without much room for independent action on the part of the people doing the work. I wish I could say that taking it to the extreme was rare, but all you have to do is visit the nearest government office, bank or post office to see common examples of people working within a very narrow set of parameters. The metric for that, by the way, is the number of times per hour that you hear: “There is nothing I can do, these are the rules.”

In such a system, it is much cheaper to have a rigid and inflexible system running on a computer, even with the cost of actually building the system itself.

Data ownership: The story of an invoice

time to read 3 min | 478 words

Let’s talk about Gary, and Gary’s Shoes. Gary runs a chain of shoe stores across the nation. As part of refreshing their infrastructure, Gary wants to update all the software across the entire chain. The idea is to have unified billing, inventory, sales and time tracking for the entire chain.

Gary doesn’t spend a lot of time on this (after all, he has to sell shoes); he just installed a sync service between all the stores and HQ to sync up all the data. Well, I call it a sync service. What it actually turns out to be is that the unified system is a set of Excel files in a shared Dropbox folder.

Feel free to go and wash your face, have a drink, take Xanax. I know this might be a shock to see something like this.

Surprisingly enough, this isn’t the topic of my post. Instead, I want to talk about data ownership here.

Imagine that one of Gary’s stores in Chicago sold a bunch of shoes, then issued an invoice to the customer. They dutifully recorded the order in the Orders.xlsx file with the status “Pending Payment”.

That customer, however, accidentally sent the check to the wrong store. No biggie, right? The clerk at the second store can just go ahead and update the order in the shared system, marking it as “Paid in full”.

As it turns out, this is a problem. And the easiest way to explain why is data ownership. The owner of this particular record is the original store. You might say that this doesn’t matter, after all, the change happened in the same system. But the problem is that this is almost always not the case.

In addition to the operational “system” that you can see on the right, there are other things. The store manager still has a Post-it note to call that customer and ask about the missing payment. The invoice that was generated needs to be closed, etc. Just updating the order in the system isn’t going to cause all of that to happen.

The proper way to handle that is to call the owner of the data (the original store) and let them know that the check arrived at the wrong store. At this point, the data owner can decide how to handle the new information, apply whatever workflows need to run, etc.

I intentionally used what looks like a toy example, because it is easy to get bogged down in the details. But in any distributed system, there are local processes that happen which can be quite important. If you go ahead and update their information behind their back, you are guaranteed to break something. And I haven’t even begun to talk about the chance for conflicts, of course.

Performance optimization starts at the business process level

time to read 3 min | 447 words

I had an interesting discussion today about optimization strategies. This is a fascinating topic, and quite rewarding as well. Mostly because it is so easy to see your progress. You have a number, and if it goes in the right direction, you feel awesome.

Part of the discussion was how the use of a certain technical solution was able to speed up a business process significantly. What really interested me was that I felt that there was a lot of performance still left on the table because of the limited nature of the discussion.

It is easier if we do this with a concrete example. Imagine that we have a business process such as underwriting a loan. You can see how that looks below:


This process is set up so that there is a series of checks that the loan must go through before approval. The lender wants to speed up the process as much as possible. In the case we discussed, the optimizations were mostly about the speed at which we can move the loan application from one stage to the next. The idea is that we keep all parts of the system as busy as possible and maximize throughput. The problem is that there is only so much that we can do with a serial process like this.

From the point of view of the people working on the system, it is obvious that you need to run the checks in this order. There is no point in doing anything else. If there is not enough collateral, why should we run the legal status check, for example?

Well, what if we changed things around?


In this mode, we run all the checks concurrently. If most of our applicants are valid, this means that we can significantly speed up the time for loan approval. Even if a significant number of people are going to be denied, the question now becomes whether it is worth the trouble (and expense) to run the additional checks.
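A Python sketch of the reordered process: instead of running the checks one after another, fire them all off at once and approve only if every check passes. The check functions and thresholds here are placeholders, not real underwriting rules.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder checks: each would normally be a slow external call.
def check_collateral(app):  return app["collateral"] >= app["amount"] * 0.2
def check_legal(app):       return not app["liens"]
def check_credit(app):      return app["credit_score"] >= 620

def underwrite(app):
    checks = (check_collateral, check_legal, check_credit)
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda c: c(app), checks)
    # total latency is roughly the slowest check, not the sum of all of them
    return all(results)

app = {"amount": 100_000, "collateral": 30_000, "liens": [], "credit_score": 700}
print(underwrite(app))  # True
```

The trade-off is exactly the one described above: a denied applicant now costs us all three checks instead of just the first failing one.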

At this point, it is a business decision, because we are mucking about with the business process itself. Don’t get too attached to this example; I chose it because it is simple and makes the difference between the business processes obvious.

The point is that not thinking about this at that level completely blocks you from what is a very powerful optimization. There is only so much you can do within the box, but what if you can get a different box…

RavenDB 4.1 Features: Counting my counters

time to read 3 min | 501 words

Documents are awesome; they allow you to model your data in a very natural way. At the same time, there are certain things that just don’t fit into the document model.

Consider the simple case of counting. This seems like it would be very obvious, right? As simple as 1+1. However, you need to also consider concurrency and distribution. Look at the image on the right. What you can see there is a document describing a software release. In addition to tracking the features that are going into the release, we also want to count various statistics about the release. In this example, you can see how many times a release was downloaded, how many times it was rated, etc.

I’ll admit that the stars rating is a bit cheesy, but it looks good and actually tests that we have good Unicode support. :-)

Except for a slightly nicer way to show numbers on the screen, what does this feature give you? It means that RavenDB now natively understands how to count things. This means that you can increment (or decrement) a value without modifying the whole document. It also means that RavenDB will be able to automatically handle concurrency on the counters, even when running in a distributed system. This makes the feature suitable for cases where you:

  • want to increment a value
  • don’t care about (and usually explicitly desire) concurrent updates
  • may need to handle a very large number of operations

The case of the download counter or the rating votes is a classic example. Two separate clients may increment either of these values at the same time that a third user is modifying the parent document. All of that is handled by RavenDB: the data is updated, distributed across the cluster, and the final counter values are tallied.

Counters cannot cause conflicts, and the only operations you are allowed to perform on them are incrementing and decrementing the counter value. These are cumulative operations, which means that we can easily handle concurrency at the local node or cluster level by merging the values.
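A sketch of why cumulative operations merge cleanly: this is the general CRDT-style counter technique (shown here increment-only, for simplicity); RavenDB's actual internal format may differ. Each node only ever updates its own slot, the counter's total is the sum of all slots, and merging two replicas takes the per-node maximum, so merges are conflict-free.

```python
def increment(state, node, delta=1):
    # a node only ever touches its own slot
    state[node] = state.get(node, 0) + delta

def total(state):
    return sum(state.values())

def merge(a, b):
    # per-node max: each slot only grows, so this is conflict-free
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

# Two nodes accept increments concurrently for the same counter...
node_a, node_b = {}, {}
increment(node_a, "A"); increment(node_a, "A")   # 2 downloads via node A
increment(node_b, "B")                           # 1 download via node B
# ...and the replicas converge on the same total without conflict:
print(total(merge(node_a, node_b)))  # 3
```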

Other operations (deleting a counter, deleting the parent document) are of course non-cumulative, but they are much rarer and don’t typically need any sort of cooperative concurrency.

Counters are not standalone values; they are strongly associated with their owning document. Much like the attachments feature, this means that you have a structured way to add additional data types to your documents. Use counters to, well… count. Use attachments to store binary data, etc. You are going to see a lot more of this in the future, since there are a few things in the pipeline that we are already planning to add.

You can use counters as a single operation (incrementing a value) or in a batch (incrementing multiple values, or even modifying counters and documents together). In all cases, the operation is transactional and will ensure full ACIDity.

Open sourcing code is a BAD default policy

time to read 11 min | 2011 words

I ran into a Medium post that asks: Why is this code open-sourced? Let’s flip the question. The premise of the post is interesting: the author argues that the default mode for code should be open source. I find myself in the strange position of being a strong open source adherent who very strongly disagrees with pretty much every point in this article. Please sit tight; this may take a while. This article really annoyed me.

Just to clear the field: I have been working on open source software for the past 15 years. The flagship product that we make is open source and available on GitHub, and we practice a very open development process. I was also very active in a number of high profile open source projects for many years and have had quite a few open source projects that I built and released on my own. I feel that I’m quite qualified to talk from experience on this subject.

The quick answer for why the default for a codebase shouldn’t be open source is that it costs. In fact, there are several very different costs around that.

The most obvious one is the reputation cost for the individual developer. If you push bad stuff out there (like this 100+ line method), that can have a real impact on people’s perception of you. There is a very different model for internal interaction inside the team versus stuff that is shown externally, without the relevant context. A lot of people don’t like this exposure to external scrutiny. That leads to things like: “clean up the code before we can open source it.” You can argue that this is something that should have been done in the first place, but that doesn’t change the fact that this is a real concern and adds more work to the process.

Speaking of work, just throwing code over the wall is easy. I’m going to assume that the purpose isn’t just to do that. The idea is to make something useful, and that means that aside from the code itself, there are also a lot of other aspects that need to be handled. For example, engaging the community, writing documentation, ensuring that the build process can run on a wide variety of machines. Even if the project is only deployed on Ubuntu 16.04, we still need to update the build script for that macOS guy. Oh, this is open source and they sent us a PR to fix that. Great, truly. But who is going to maintain that over time?

Open source is not an idyllic landscape that you dump your code in and someone else is going to come and garden it for you.

And now, let me see if I can answer the points from the article in detail:

  • Open-source code is more accessible - Maintainers can get code reviews … consumers from anywhere in the world … can benefit from something I was lucky enough to be paid for building.

First, drive-by code reviews are rare. As in, they happen extremely infrequently. I know that because I do them for interesting projects, and I explicitly invited people to do the same for my projects and had very little response. People who are actually using the software will go in and look at the code (or some parts of it), and that can be very helpful. But expecting that just because your code is open source you’ll get reviews and help is setting yourself up for failure.

There is also an interesting tidbit there about consumers benefiting from something that the maintainers were paid to build. That part is pretty important, because there is a side in this discussion that hasn’t been introduced. We have maintainers and consumers, but what about the people who end up paying the bills? Given that this is paid work, the code isn’t the property of the maintainer; it belongs to the people who actually paid for the work. So any discussion of the benefits of open sourcing the code should start from the benefits for these people.

Now, I’m perfectly willing to agree (in fact, I do agree, since my projects are in the open) that there are good and valid reasons to want to open source a project, and community feedback is certainly a part of that. But any such discussion should start with the interests of the people paying for the code and how it helps them. And part of that discussion should involve the real and non-trivial costs of actually open sourcing a project.

  • Open-source code keeps us healthy - Serotonin and Oxytocin are chemicals in the brain that make you feel happy and love. Open source gives you that.

I did a bad job summarizing this part, quite intentionally. Mostly because I couldn’t quite believe what I was reading. The basic premise seems to be that by putting your code out there you open yourself to the possibility of someone seeing your code and sending you a “Great Job” email and making your day.

I… guess that can happen. I certainly enjoy it when it happens, sure. Why would I say no to something like that?

Well, to start with, it happens, sure, but it isn’t a major factor in the decision making process. I’ll argue that if you think that compliments from random strangers are so valuable, just get in and out of Walmart in a loop. There are perfect strangers there that will greet you every single time. Why wouldn’t you want to do that?

More to the point, even assuming that you have a very popular project and lots of people write to tell you how awesome you are, this gets tiring fast. What is worse is throwing code over the wall and expecting the pat on the back. No one cares by default; actually getting them to care takes a whole lot of additional work.

And we haven’t even mentioned the other side of open source projects: the users who believe that just because your code is open source, they are entitled to all your time and effort (for free), expect you to fix any issues they find (immediately, of course), and are quite rude and obnoxious about it. There aren’t a lot of them, but literally any open source project with anything but the smallest of followings will have to handle them at some point. And often dealing with such a disappointed user means dealing with abuse. That can be exhausting and painful.

Above, I pointed out a piece of code in the open that is open to critique. This is a piece of code that I wrote, so I feel comfortable telling you that it isn’t so good. But imagine that I took your code and did that. It is very easy to get offended by this, even when there was no intent to offend.

  • Open-source code is more maintainable – Lots of tools are free for OSS projects

So? This is only ever valuable if you assume that tools are expensive (they aren’t). The article mentions tools such as Travis-CI, Snyk, Codecov and Dependencies.io that offer a free tier for open source projects. I went ahead and priced these services for a year on the default plan for each. The total yearly cost of all of them was around $8,000. That sounds like a lot of money, but only if you assume that you are an individual working for free. Assuming that you are actually getting paid, the cost of such tools and services is minuscule compared to other costs (such as developer salaries).

So admittedly, this is a very nice property of open source projects, but it isn’t as important as you might imagine. For a team of five people, even if the effort to open source the project is small, only taking a couple of weeks, it will take a few years to recoup that investment in time (and I’m ignoring any additional effort to run the open source portion of the project).

  • Open-source code is a good fit for a great engineering culture

Well, no. Not really. You can have a great engineering culture without open source, and you can have really crappy engineering with open source. They sometimes go in tandem, but they aren’t really related. Investing in the engineering culture is probably going to be much more rewarding for a company than just open sourcing projects. Of particular interest to me is this quote:

Engineers are winning because they can autonomously create great projects that will have the company’s name on it: good or bad…

No, engineers do not spontaneously create great projects. That comes from hard work, guidance and a lot of surrounding infrastructure. Working in open source doesn’t mean that you don’t need coordination, a high level vision and good attention to detail. This isn’t a magic sauce.

What is more, and this really hammers the point home: good or bad. Why would a company want to attach its name to something that can be good or bad? That seems like a very unnecessary gamble. So in order to avoid publicly embarrassing the company, there will be the need to do the work to make sure that the result is good. The alternative isn’t to have a bad result; the alternative is to not open source the code.

Now, you might argue that such a thing is not required if the codebase is good to begin with, and I’ll agree. But then again, you have things like this that you’ll need to deal with. Also, be sure that you have cleaned up both the code and the commit history.

  • Just why not

The author goes on to gush about the fact that there are practically no reasons not to go open source, and that projects such as frameworks, languages, operating systems and databases are all open source and very successful.

I think that this gets to the heart of the matter. There is the implicit belief that the important thing about an open source project is the code. That couldn’t be further from the truth. Oh, of course, the code is the foundation of the project, but foundations can be replaced (see: Firefox, OpenSSL → BoringSSL, React, etc.).

The most valuable thing about an open source project is the community. The contributors and users are the thing that make a project unique and valuable. In other words, to misquote Clinton, it’s the community, stupid. 

And a community doesn’t just spring up from nowhere; it takes effort, work and a whole lot of time to build. And only when you have a community of sufficient size will you start to see an actual return on investment for your efforts. Until that point, all of that is basically a sunk cost.

I’m an open source developer; pretty much all the code I have written in the past decade or so is under one open source license or another and is publicly available. And with all that experience behind me, I can tell you what really annoyed me most about this article. It isn’t an article promoting open source. It is an article that, I feel, promotes just throwing code over the wall and expecting flowers to grow. That isn’t the right way to do things. And it really bugged me that in all of this article there wasn’t a single word about the people who actually paid for this code to be developed.

Note that I’m not arguing for closed source solutions for things like IP, trade secrets, secret sauce and the like. These are valid concerns and need to be addressed, but that isn’t the issue. The issue is that open sourcing a project (vs. throwing the code onto GitHub) is something that should be done in a forthright manner, with a clear understanding of the costs, risks and ongoing investment involved. This isn’t a decision you make because you don’t want to pay for a private repository on GitHub.

Self contained deployments and embedded RavenDB

time to read 3 min | 496 words

In previous versions of RavenDB, we offered a way to run RavenDB inside your process. Basically, you would reference a NuGet package and be able to create a RavenDB instance that runs in your own process. That can simplify deployment concerns immensely, and we have a bunch of customers who rely on this feature to just take their database engine with their application.

In 4.0, we don’t provide this ability OOTB. It didn’t make the cut for the release, even though we consider this a very important utility feature. We are now adding this in for the next release, but in a somewhat different mode.

Like before, you’ll be able to do a NuGet reference and get a document store reference and just start working. In other words, there is no installation required and you can create a database and start using it immediately with zero hassle.

The difference is that you’ll not be running this inside your own process, instead, we’ll create a separate process to run RavenDB. This separate process is actually slaved to the parent process, so if the parent process exits, so will the RavenDB process (no hanging around and locking files).
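The post doesn’t show how the slaving is implemented, but a common technique for tying a child process’s lifetime to its parent is to hand the child a pipe and have it shut down when the pipe reaches EOF, which happens automatically when the parent exits, cleanly or not. Here is a minimal Python sketch of that idea (an illustration of the general technique, not RavenDB’s actual mechanism):

```python
import subprocess
import sys
import time

CHILD_CODE = r"""
import sys
# The child blocks reading its stdin. When the parent process dies,
# the write end of the pipe closes, read() returns EOF, and the
# child shuts itself down instead of lingering and locking files.
sys.stdin.buffer.read()
sys.exit(0)
"""

def spawn_slaved_child():
    # The child gets a pipe as stdin; as long as the parent keeps the
    # write end open, the child keeps running.
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE)

if __name__ == "__main__":
    child = spawn_slaved_child()
    time.sleep(0.2)
    assert child.poll() is None   # alive while the parent holds the pipe
    child.stdin.close()           # simulate the parent exiting
    child.wait(timeout=5)
    print("child exited:", child.returncode)
```

In the real feature the parent would be the host application and the child the RavenDB server; the same EOF signal lets the server release its files promptly when the host goes away.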

But why create a separate process? Well, the answer to that is quite simple: we don’t want to force any dependency on the client. This is actually a bit more complex, though. It isn’t so much that we don’t want to force a dependency as that we want the ability to change our own dependencies.

For example, as I’m writing this, the machine is currently testing whether .NET Core 2.1 pops up any issues with RavenDB, and we are pretty good at keeping up with .NET Core releases as they go. However, in order to do that, we need a wall between the client and the server code, which we want to freely modify and play with (including changing what frameworks we are running on and using). For the client code, we have a well defined process and the versions we support, but for the server, we explicitly do not define this; it is an implementation detail. One of the nice things about .NET Core is that it allows the deployment of self contained applications, meaning that we can carry the framework with us and not have to depend on whatever is installed on the machine. This makes servicing and deployment a lot easier.
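For reference, .NET Core drives self contained deployment from the publish settings. A sketch of what that looks like in a project file (an illustrative fragment, not RavenDB’s actual build configuration):

```xml
<!-- Illustrative csproj fragment: with a RuntimeIdentifier set and
     SelfContained enabled, `dotnet publish` bundles the runtime into
     the output instead of relying on a machine-wide install. -->
<PropertyGroup>
  <RuntimeIdentifier>linux-x64</RuntimeIdentifier>
  <SelfContained>true</SelfContained>
</PropertyGroup>
```

The same result can be had from the command line with `dotnet publish -c Release -r linux-x64 --self-contained true`.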

There is also the issue of other clients. We have clients for .NET, the JVM, Go, Ruby, Node.JS and Python, and we want to give users in all languages the same experience of just bundling RavenDB and running it with zero hassle. All of that leads us to spawning a separate process and creating a small protocol for the host application to control the slaved RavenDB instance. This will be part of the 4.1 release, which should be out in about three months.

It was your idea, change the diaper

time to read 3 min | 470 words

You learn a lot of things when talking to clients. Some of them are really fascinating, some of them are quite horrifying. But one of the most important things that I have learned to say to clients is: “This is out of scope.”

This can be an incredibly frustrating thing to say, both for me and the client, but it is sometimes really necessary. There are times when you see a problem, and you know how to resolve it, but it is simply too big an issue to take upon yourself.

Let me give a concrete example. A customer was facing a coordination problem with their system; they needed to deal with multiple systems and orchestrate actions among them. Let’s imagine that this is an online shop (because that is the default example) and you need to process an order and ship it to the user.

The problem at this point is that the ordering process needs to coordinate the payment service, the fulfillment service, the shipping service, deal with backorders, etc. Given that this is a B2B system, the customer wasn’t concerned with the speed of the system but was really focused on the correctness of the result.

Their desire was to have a single transaction encompass all such operations. They were quite willing to pay the price in performance for that, in order to achieve that goal. And they turned to us for help in this matter. They wanted the ability to persistently and transactionally store data inside RavenDB and only “commit” it at a given point.

We suggested a few options (draft documents, a flag in the document, etc), but we didn’t answer their core question. How could they actually get the transactional behavior across multiple operations that they wanted?
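The “flag in the document” option can be illustrated without any RavenDB specifics: documents are written as drafts while the work is in progress, stay invisible to ordinary reads, and are flipped to visible at the end. A hypothetical in-memory sketch (the `DraftStore` class and its batch naming are inventions for illustration, not a RavenDB API):

```python
class DraftStore:
    """Toy in-memory store illustrating the draft-flag pattern:
    writes spread across several operations are tagged as drafts and
    only become visible to normal reads once the batch is committed."""

    def __init__(self):
        self._docs = {}  # doc id -> document dict

    def put_draft(self, doc_id, doc, batch):
        # Store the document, marked as an invisible draft of `batch`.
        self._docs[doc_id] = {**doc, "Draft": True, "Batch": batch}

    def commit(self, batch):
        # Flip every draft in the batch to visible.
        for doc in self._docs.values():
            if doc.get("Batch") == batch:
                doc["Draft"] = False

    def load(self, doc_id):
        # Ordinary reads never see drafts.
        doc = self._docs.get(doc_id)
        return None if doc is None or doc["Draft"] else doc


store = DraftStore()
store.put_draft("orders/1", {"Total": 100}, batch="b1")
store.put_draft("shipments/1", {"Order": "orders/1"}, batch="b1")
assert store.load("orders/1") is None          # invisible before commit
store.commit("b1")
assert store.load("orders/1")["Total"] == 100  # visible after commit
```

Note that the flip itself is not atomic across documents, which is part of why this only approximates what the customer actually asked for.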

The reason we didn’t answer that question is that it is… out of scope. RavenDB doesn’t have this feature (for really good reasons) and that is clearly documented. There is no expectation for us to have this feature, and we don’t.  That is where we stop.

But what is the reason that we take this stance? We have a lot of experience in such systems and we can certainly help find a proper solution, why not do so?

Ignoring other reasons (such as this isn’t what we do), there is a primary problem with this approach. I think that the whole idea is badly broken, and any suggestion that I make will be used against us later. “This was your idea, it broke (never mind that you told us it would), now fix it.” It is a bit more awkward to have to say “sorry, out of scope” ahead of time, but much better than having to deal with the dirty diapers at the end.


  1. I WILL have order: How Noise sorts query results - about one day from now
  2. Reviewing the Bleve search library - 2 days from now
  3. I WILL have order: How Bleve sorts query results - 3 days from now
  4. I won’t have order: Looking at search libraries without ordering - 4 days from now
