Ayende @ Rahien


Feb 28 2019

Workflow design: The long haul

time to read 4 min | 760 words

I talked about some of the requirements for proper workflow design in my previous post. As a reminder, the top ones are:

- Cater for developers, not the business analysts. (More on this later).
- Source control isn’t optional, meaning:
  - Multiple branches
  - Can diff & review changes
  - Merging
  - Multiple people can work at the same time
- Encapsulate complexity

This may seem like a pretty poor list, because if you are a developer, you might be taking all of these for granted. Because of that, I wanted to show a small taste of what used to be Microsoft’s primary workflow engine.

A small hint… this kind of system is not going to be useful for anything relating to source control, change management, collaborative work, understanding what is going on, etc.

A better solution for this would be to use a tool that can work with source control, that developers are familiar with and can handle the required complexity.

That tool is called… code.

It checks all the required boxes, naturally. But it does have a distinct disadvantage. One of the primary reasons you want to use a workflow engine of some kind is to decouple the implementation of your business from the policies of the business. Coming back to the mortgage example, how you calculate late fee payments is fixed (in the contract itself, but usually also by law and many regulations), but figuring out whether late fees should be waived, on the other hand, is subject to the whims of the business.

That is a pretty simple example, but in most businesses, these kinds of workflows add up. You can easily end up with dozens to hundreds of different workflows without the business being too big or complex.

There is another issue, though. Code is pretty good when you need to handle straightforward tasks. A set of if statements (which is pretty much all most workflows are) is trivial to handle. But workflows have another property: they tend to be long. Not long on a computer scale (seconds), but long on a people scale (months and years).

The typical process of getting a loan may involve an initial submission, review by a doctor, asking for follow up documentation (rinse – repeat a few times), getting doctor appraisal and only then being able to generate a quote for the customer. Then we have a period of time in which the customer can accept, a qualifying period, etc. That can last for a good long while.

Trying to code long running processes like that requires a very unnatural approach to coding, especially since you are likely to need to handle software updates while the workflows are running.

In short, we are in a strange position: we want to use code, because it is clear, supports software development practices that are essential, and can scale up in complexity as needed. On the other hand, we don’t want to use our usual codebase for that, because we’ll have very different deployment strategies, the manner of working is very different, and there is a lot more involvement of the business in what is going on there.

The way to handle that is to create a proper boundary between parts of the system. We’ll have the workflow behavior, defined in scripts, that describes the policy of the system. These tend to be fairly high level concepts and are designed explicitly for the role of business policy behaviors. The infrastructure for that, on the other hand, is just a standard application using normal software practices, which is driven by the workflow scripts.

And by a script, I meant literally a script. As in, JavaScript.

I want to give you a sneak peek into how I envision this kind of system, but I’ll defer full discussion of what is involved to my next post.

The idea is that we use the script to define our policy, and then we use that to make decisions and invoke the next stage in the process. You might notice that we have the state variable, which is persisted between invocations. That allows us to use a programming model that is fairly common and obvious to developers. We can usually also show this, as is, to a business analyst and get them to understand what is going on easily enough. All the actual actions are abstracted. For example, life insurance setup is a completely different workflow that we invoke.
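To make the shape of such a script concrete, here is a minimal sketch in plain JavaScript. It is only an illustration under stated assumptions: the event names, the action names, the `decide` and `runWorkflow` functions and the way the state is persisted (a JSON round trip standing in for a stored document) are all hypothetical.

```javascript
// Hypothetical policy script: it only decides what should happen next.
// The host application performs the returned action and persists `state`.
function decide(state, event) {
  switch (event.type) {
    case "ApplicationSubmitted":
      state.applicant = event.applicant;
      state.pendingDocuments = [];
      return { action: "RequestDoctorReview" };
    case "DoctorRequestedDocuments":
      state.pendingDocuments = event.documents.slice();
      return { action: "AskCustomerForDocuments", documents: state.pendingDocuments };
    case "DocumentReceived":
      state.pendingDocuments = state.pendingDocuments.filter(d => d !== event.document);
      return state.pendingDocuments.length === 0
        ? { action: "RequestDoctorReview" }
        : { action: "WaitForDocuments" };
    case "DoctorApproved":
      state.rate = event.rate;
      return { action: "GenerateQuote", rate: state.rate };
    default:
      throw new Error("Unknown event type: " + event.type);
  }
}

// Hypothetical host loop: months may pass between two iterations, so the
// state is loaded before each invocation and saved right after it.
function runWorkflow(events) {
  let persisted = "{}"; // stands in for a document stored between invocations
  const actions = [];
  for (const event of events) {
    const state = JSON.parse(persisted);
    actions.push(decide(state, event).action);
    persisted = JSON.stringify(state);
  }
  return { actions, state: JSON.parse(persisted) };
}
```

The script itself never waits for anything. It reacts to one event, updates the state and names the next step, which is what lets the process span months while the code stays a plain set of if statements.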

In my next post, I’m going to drill down a bit into the details of this approach and what kind of features we need there.

Feb 27 2019

Workflow design: What you shouldn’t be looking for

time to read 5 min | 815 words

One of the most common themes I run into when talking to customers, users and sundry people in tech is the repeated desire to fire developers.

Actually, that is probably too loaded a statement. It actually comes in two parts:

Developers usually want to focus on the interesting bits, and the business logic portions aren’t that much fun.
The business analysts usually want to get things done and having to get developers to do that is considered inefficient.

If only there was a tool, or a pattern, or a framework, or something that would allow the business analysts to directly specify the behavior of the system… Why, we could cut the developers from the process entirely! And speaking as a developer, that would be a huge relief.

I think the original name for that was CASE tools, and that flopped. In fact, literally every single one of the attempts to replace developers with a tool has flopped. They got such a bad rap that people keep trying to implement them using different names. Some stuff can be done fairly easily, though. WYSIWYG for GUI is well established, and WordPress and Wix, to name the two examples that come to mind immediately, show that you can have a non techie build a proper website. In fact, you can even plug in some pretty sophisticated functionality without burdening the user with too much.

But all that takes you to a point. And past that point, the drop off is harsh. Let’s take another common tool that is used to reduce the dependency on developers, SharePoint.

SharePoint hourly rate: $48.3
C# hourly rate: $28.6

You pay close to double for actual developer time on SharePoint, mostly because it is so painful to work with it.

At a recent conference, I got into a conversation about business workflows and how to best implement them. You can look at the image on the right to get a good idea about what kind of process they were talking about.

To make things real, I want to take a “simple” example, of accepting a life insurance policy. Here is what the (extremely simplified) workflow looks like for issuing a life insurance policy:

This looks good, and it certainly should make sense to a business analyst. However, even after I reduced the process to its bare bones (and even those have been filed away), this is still pretty complex. The process of actually getting a policy is a lot more complex. Some questions don’t require doctor evaluation (for example, smoking) and some require supplemental documentation (oh, you were hospitalized? Gimme all these records). The doctor may recommend different rates, rejecting the application entirely, some exceptions in the policy, etc. All of which need to be in the workflow. Actuarial tables need to be consulted for each of those cases, etc, etc, etc.

But something like the diagram above isn’t going to be able to handle this level of complexity. You are going to get lost very quickly if you try to put so many boxes on the screen.

So you need encapsulation.

And you’ll probably want to have a way to develop these business workflows, which means that they aren’t static.

So you need source control.

And if you have a complex business process, you likely have different people working on it.

So you need to be able to review changes, and merge them.

Note that this is explicitly distinct from being able to store the data in source control. Being able to actually diff two versions of such a process in a meaningful fashion is anything but trivial. Usually you are left with diffing the raw XML / JSON that stores the structure. Good luck with that.

If the workflow is complex, you need to be able to understand what is going on under various conditions.

So you need a debugger.

In fact, pretty soon you’ll realize that you’ll need quite a lot of the things that developers do. Except that your tool of choice doesn’t do that, or if it does, it does it poorly.

Another issue is that even if you somehow managed to bypass all of those details, you are going to be facing the same drop that you see elsewhere with tools that attempt to get rid of developers. At some point, the complexity grows too large, and you’ll call your development team and hand off the system to them. At which point they will be stuck with a very clunky tool that attempts to be quite clever and easy to use. It is also horribly limiting for a developer. Mostly because all of the “complexity” involved is in the business process itself, not in the actual complexity of what is going on.

There are better ways of handling that, and the easiest among them is to just use code. That can be… surprisingly versatile.

Feb 26 2019

Bug of the day

time to read 1 min | 108 words

Tags:

bugs

This one was in an experimental feature. As part of the extra testing involved in making it a stable feature, we ran into a bug.

Here is the bug fix:

The code is meant to detect changes in a distributed environment and we checked the wrong location, meaning that we never actually did the check. 99.999% of the time, this had already happened, but it exposed us to some nasty race conditions.

This particular piece of code has been the subject of multiple code reviews, none of which noticed the issue.

Feb 25 2019

RavenDB Customers Portal

time to read 1 min | 153 words

Tags:

raven

I wanted to point out the RavenDB Customers Portal website, because it has a very important function that may not seem obvious.

As part of the process of setting up RavenDB, we provide our users with a domain name so they can run their clusters securely. This is pretty easy and has been used by thousands of our users.

However, advanced scenarios, such as adding a node to a cluster or changing a node IP, required you to re-run the setup and weren’t convenient. We have now made it even simpler: you can use the customers portal to edit your cluster DNS configuration.

Here is what this looks like:

This is available to customers who purchased a commercial license as well as users running on the community edition. As usual, we would love to get your feedback.

Feb 22 2019

Data modeling with indexes: Event sourcing – Part III – time sensitive data

time to read 6 min | 1018 words

I got a great comment on my previous post about using Map/Reduce indexes in RavenDB for event sourcing. The question was how to handle time sensitive events or ordered events in this manner. The simple answer is that you can’t: RavenDB intentionally doesn’t expose anything about the ordering of the documents to the index. In fact, given the distributed nature of RavenDB, even the notion of ordering documents by time becomes really hard.

But before we close the question as “cannot do that by design”, let’s see why we want to do something like that. Sometimes, this really is just the developer wanting to do things in the way they are used to and there is no need for actually enforcing the ordering of documents. But in other cases, you want to do this because there is a business meaning behind these events. In those cases, however, you need to handle several things that are a lot more complex than they appear. Because you may be informed of an event long after it actually happened, and you need to handle that.

Our example for this post is going to be mortgage payments. This is a good example of a system where time matters. If you don’t pay your payments on time, that matters. So let’s see how we can model this as an event based system, shall we?

A mortgage goes through several stages, but the only two that are of interest for us right now are:

Approval – when the terms of the loan are set (how much money, what is the collateral, the APR, etc).
Withdrawal – when money is actually withdrawn, which may happen in installments.

Depending on the terms of the mortgage, we need to compute how much money should be paid on a monthly basis. This depends on a lot of factors. For example, if the principal is tied to some base line, changes to the base line will change the amount of the principal. Other factors include whether only some of the amount was withdrawn, whether there are late fees, a balloon payment, etc. Because of that, on a monthly basis, we are going to run a computation for the expected amount due for the next month.

And, obviously, we have the actual payments that are being made.

Here is what the (highly simplified) structure looks like:

This includes all the details about the mortgage, how much was approved, the APR, etc.

The following is what the expected amount to be paid looks like:

And here we have the actual payment:

All pretty much bare bones, but sufficient to explain what is going on here.

With that in place, let’s see how we can actually make use of it, shall we?

Here are the expected payments:

Here are the mortgage payments:

The first thing we want to do is to aggregate the relevant operations on a monthly basis, since this is how mortgages usually work. I’m going to use a map reduce index to do so, and as usual in this series of posts, we’ll use JavaScript indexes to do the deed.

Unlike previous examples, now we have real business logic in the index. Most specifically, funds allocation for partial payments. If the amount of money paid is less than the expected amount, we first apply it to the interest, and only then to the principal.
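The allocation rule and the monthly grouping can be sketched in plain JavaScript. This is not RavenDB index syntax, just the logic on its own. The field names (`month`, `interest`, `principal`, `date`, `amount`) are assumptions made for the sketch, and amounts are whole numbers to keep the arithmetic exact.

```javascript
// Apply a payment to the interest first, and only then to the principal.
// Any surplus beyond the amount due is ignored in this sketch.
function allocatePayment(paid, interestDue, principalDue) {
  const interestPaid = Math.min(paid, interestDue);
  const principalPaid = Math.min(paid - interestPaid, principalDue);
  return {
    interestPaid,
    principalPaid,
    interestRemaining: interestDue - interestPaid,
    principalRemaining: principalDue - principalPaid,
  };
}

// Group expected amounts and actual payments by month ("YYYY-MM"),
// then allocate whatever was paid in that month.
function monthlyStatus(expected, payments) {
  const months = new Map();
  const entryFor = month => {
    if (!months.has(month)) {
      months.set(month, { month, interestDue: 0, principalDue: 0, paid: 0 });
    }
    return months.get(month);
  };
  for (const e of expected) {
    const entry = entryFor(e.month);
    entry.interestDue += e.interest;
    entry.principalDue += e.principal;
  }
  for (const p of payments) {
    entryFor(p.date.slice(0, 7)).paid += p.amount; // "2019-03-04" -> "2019-03"
  }
  return [...months.values()].map(m => ({
    month: m.month,
    paid: m.paid,
    ...allocatePayment(m.paid, m.interestDue, m.principalDue),
  }));
}
```

With an expected amount of 150 interest plus 600 principal for March and a payment of only 500, the sketch reports 150 applied to interest, 350 to principal and 250 of principal still outstanding.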

Here are the results of this index:

You can clearly see the mistakes that were made in the payments. In March, the amount due for the loan increased (took another installment from the mortgage) but the payments were made on the old amount.

We aren’t done yet, though. So far we have the status of the mortgage on a monthly basis, but we want to have a global view of the mortgage. In order to do that, we need to take a few steps. First, we need to define an Output Collection for the index, which will allow us to further process the results of this index.

In order to compute the current status of the mortgage, we aggregate both the mortgage status over time and the amount paid by the bank for the mortgage, so we have the following index:
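As a rough sketch of that second aggregation, again in plain JavaScript rather than index syntax: the input shapes, the `withdrawn` and `pastDue` fields and the `Current` status are assumptions, and only the `PastDue` marker comes from the output discussed here.

```javascript
// Fold the per-month statuses and the withdrawals into a single view
// of the mortgage.
function mortgageStatus(monthlyStatuses, withdrawals) {
  const withdrawn = withdrawals.reduce((sum, w) => sum + w.amount, 0);
  let interestRemaining = 0;
  let principalRemaining = 0;
  for (const m of monthlyStatuses) {
    interestRemaining += m.interestRemaining;
    principalRemaining += m.principalRemaining;
  }
  const pastDue = interestRemaining + principalRemaining;
  return {
    withdrawn,
    interestRemaining,
    principalRemaining,
    pastDue,
    // Anything still owed from earlier months marks the loan as past due.
    status: pastDue > 0 ? "PastDue" : "Current",
  };
}
```

Because the status is derived purely from the monthly entries, a later payment that closes the gap for a month flips the result back without any special handling.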

Which gives us the following output:

As you can see, we have a PastDue marker on the loan. At this point, we can make another payment on the mortgage, to close the missing amount, like so:

This will update the monthly mortgage status and then the overall status. Of course, in a real system (I mentioned this is highly simplified, right?) we’ll need to take into account payments made at one time but applied to different times (which we can handle by an AppliedTo property) and a lot of the actual core logic isn’t in indexes. Please don’t do mortgage logic in RavenDB indexes, that stuff deserves its own handling, in your own code. And most certainly don’t do that in JavaScript. The idea behind this post is to explore how we can handle non trivial event projection using RavenDB. The example was chosen because I assume most people will be familiar with it and it wasn’t immediately obvious how to go about actually solving it.

If you want to play with this, you can import the following file (Settings > Import Data) to get the documents and index definitions.

Feb 21 2019

The first database I ever built (20 years ago)

time to read 3 min | 468 words

Tags:

development

I was reminiscing about some old code that I wrote a long while ago, in the heyday of ASP and when the dotcom bubble was just starting to inflate. At the time, I was either still at high school or just graduated and I was fascinated by the ability to write web applications. I wrote quite a few of them, as I recall. Thankfully, none of them ever made it to this day and age. I remember one project in particular that I was quite proud of. I wrote a bunch of BBS / forum systems. One version used an Access file as the database. IIRC, that is literally how I learned SQL for the first time.

The other BBS system is what I’m here to talk about today. You couldn’t always get Access, and having it installed on the server was a PITA. Especially given that I was pretty much limited to hosts that offered free hosting only. So I decided to write a BBS system that had no dependencies whatsoever and could be deployed on any host that could handle ASP. Note that this is ASP classic, .NET is still 2 years away from alpha status at this time and Java is for applets.

I decided that I would write everything through file I/O, but that was quite complex. I needed something that would help me out. Then I realized that I could use ASP itself to help me. Instead of having to pull data at runtime from a file, parse it, process it and so on, I could lean on ASP itself for that.

Trigger warning: This code is newly written, but I still remember the shape of it quite well. This may cause you seizures. For the full beauty of this piece of code, you need to consider that this is a very small piece of a much larger codebase (all in a single file, of course) but it is very much a ~~reprehensive~~ representative example.

I’ll give you a moment to study the code. It deserves that much of your attention.

What you see here is a beautiful example of using code as data and data as code, self modifying code and some really impressive (even if I say so myself) capabilities of my past self to dig himself way into the hole.
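The general idea of storing data as executable code can be illustrated with a small JavaScript sketch. This is not the original ASP code. It is a hypothetical reconstruction of the technique only, and every name in it is an assumption.

```javascript
// The "database" is a growing piece of source code. Writing a record
// appends a statement; reading the records means executing that code.
function createForum() {
  let source = ""; // in the original setting this would be a script file on disk
  return {
    addPost(author, text) {
      // JSON.stringify yields a valid JavaScript literal and escapes quotes.
      source += "posts.push(" + JSON.stringify({ author, text }) + ");\n";
    },
    loadPosts() {
      const posts = [];
      // Running stored text as code is exactly what makes this approach fragile.
      new Function("posts", source)(posts);
      return posts;
    },
    getSource() {
      return source;
    },
  };
}
```

Even in this tiny form the problems are visible: any mistake in escaping corrupts every record written so far, concurrent writers will interleave statements, and whoever can write a post can effectively run code on the server.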

Bonus points if you can point out all the myriad issues that this code has. You can safely leave aside maintainability, I never had to maintain it, but over twenty years have passed, and I still remember the complexity involved in keeping all the states in my head.

And that was the first time that I actually wrote my own dedicated database.

Feb 20 2019

Technical marketing from the other side

time to read 2 min | 330 words

I spent the last couple of days at the O’Reilly Architecture Conference and the HIMSS (Healthcare Information and Management Systems Society) Conference. During that time, I had the chance to listen to quite a few technical marketing spiels.

Some of them were technically very impressive, but missed the target by a planet or two. I came up with a really nice analogy for how such presentations do a great disservice to their purpose.

Consider the following:

This non-steroidal drug, clinically tested and FDA approved, will cease the production of prostaglandins and has a significant antiplatelet effect. It’s available in tablet and syrup forms and is suitable for IVs. May cause diarrhea and/or vomiting.

This is factual (at least as much as I could make it), I assume that if you are a medical professional you might be able to work out possible uses for this drug. But the most important thing that is missing from this description? What does this do?

This is Ibuprofen and you take it to ease your headache (among many other uses). It can also help you avoid blood clots.

I intentionally chose this example, because it is a very obvious one (and I just came back hearing way too much medical stuff). You begin by telling me how this will ease the pain. In many ways, I consider technical marketing to be composed of the following steps:

Whether this product can actually ease the pain.
Whether this customer actually experiences the pain.

For example, if you are promising to have a faster than light bullet-train to Mars, that is going to cast some… doubt on your claims. On the other hand, it doesn’t matter to me if you can cut my commute time in half if I can get to work without leaving my house.

If the customer experiences the pain and believes that you can actually help there, you are most of the way there. All that is left is just negotiating, barrier removal, etc.

Feb 19 2019

RavenDB Go client is now available for preview

time to read 1 min | 89 words

Tags:

raven

I’m really happy to announce that we are very near to releasing an official Go client for RavenDB.

You can read the API docs or go over the examples. What we most need right now is people who aren’t familiar with the code to take it for a spin and see if they can break it.

I would really appreciate any feedback you have on the new client.

Feb 18 2019

Production Postmortem: This data corruption bug requires 3 simultaneous race conditions

time to read 10 min | 1813 words

This is a sordid tale of chance and mystery and the nasty tricks that Murphy can play on you.

A few customers reported an error similar to the following one:

Invalid checksum for page 1040, data file Raven.voron might be corrupted, expected hash to be 0 but was 16099259854332889469

One such case might be a disk corruption, but multiple customers reporting it is an indication of a much bigger problem. That was a trigger for a STOP SHIP reaction. We consider data safety a paramount goal of RavenDB (part of the reason why I’m doing this Production Postmortem series), and we put some of our most experienced people on it.

The problem was, we couldn’t find it. Having access to the corrupted databases showed that the problem occurred at random. We use Voron in many different capacities (indexing, document storage, configuration store, distributed log, etc) and these incidents happened across the board. That narrowed the problem to Voron specifically, and not bad usage of Voron. This reduced the problem space considerably, but not enough for us to be able to tell what was going on.

Given that we didn’t have a lead, we started by recognizing what the issue was and added additional guards against it. In fact, the error itself was a guard we added, validating that the data on disk is the same data that we have written to it. The error above indicates that there has been a corruption in the data because the expected checksum doesn’t match the actual checksum from the data. This gives us an early warning system for data errors and prevents us from proceeding on erroneous data. We added this primarily because we were worried about physical disk corruption of data, but it turns out that this is also a great early warning system for when we mess up.
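The kind of guard described here can be sketched as follows. This is an illustration only, not Voron’s code: the page store is a plain `Map`, the function names are hypothetical, and FNV-1a (32-bit) is used purely as a stand-in for whatever hash the real storage engine uses.

```javascript
// FNV-1a (32-bit), a simple stand-in hash for the sketch.
function fnv1a(bytes) {
  let hash = 0x811c9dc5;
  for (const b of bytes) {
    hash ^= b;
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Store the checksum next to the page when writing it.
function writePage(store, pageNumber, data) {
  const copy = Uint8Array.from(data);
  store.set(pageNumber, { data: copy, checksum: fnv1a(copy) });
}

// Recompute the checksum on read and refuse to hand out suspect data.
function readPage(store, pageNumber) {
  const page = store.get(pageNumber);
  if (!page) throw new Error("Page " + pageNumber + " does not exist");
  const actual = fnv1a(page.data);
  if (actual !== page.checksum) {
    throw new Error(
      "Invalid checksum for page " + pageNumber +
      ", expected hash to be " + page.checksum + " but was " + actual
    );
  }
  return page.data;
}
```

The check cannot say how the page got damaged, only that what is read is not what was written, which is why it works equally well for failing disks and for bugs in the code that writes the pages.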

The additional guards were primarily additional checks for the safety of the data in various locations on the pipeline. Given that we couldn’t reproduce the issue ourselves, and none of the customers affected were able to reproduce this, we had no idea how to go from there. Therefore, we had a team that kept on trying different steps to reproduce this issue and another team that added additional safety measures for the system to catch any such issue as early as possible.

The additional safety measures went into the codebase for testing, but we still didn’t have any luck in figuring out what was going on. We went from trying to reproduce this by running various scenarios to analyzing the code and trying to figure out what was going on. Everything pointed to it being completely impossible for this to happen, obviously.

We got a big break when the repro team managed to reproduce this error when running a set of heavy tests on 32-bit machines. That was really strange, because none of the incidents to date had happened on 32 bits.

It turns out that this was a really lucky break, because the problem wasn’t related to 32 bits at all. What was going on there is that under 32 bits, we run in a heavily constrained address space, which, under load, can cause us to fail to allocate memory. If this happens at certain locations, this is considered to be a catastrophic error and requires us to close the database and restart it to recover. So far, this is pretty standard and both the expected and desired reaction. However, it looked like sometimes, this caused an issue. This also tied in with some observations from customers about the state of the system when this happened (low memory warnings, etc).

The very first thing we did was to test the same scenario on the codebase with the new checks added. So far, the repro team worked on top of the version that failed at the customers’ sites, to prevent any other code change from masking the problem. With the new checks, we were able to confirm that they actually triggered and caught the situation early. That was a great confirmation, but we still didn’t know what was going on. Luckily, we were able to add more and more checks to the system and run the scenario. The idea was to trip over a guard rail as early as possible, to allow us to inspect what actually caused it.

Even with a reproducible scenario, that was quite hard. We didn’t have a reliable method of reproducing it; we had to run the same set of operations for a while and hope to hit this scenario. That took quite a bit of time and effort. Eventually, we figured out what the root cause of the issue was.

In order to explain that, I need to give you a refresher on how Voron is handling I/O and persistent data.

Voron uses an MVCC model, in which any change to the data is actually done on a scratch buffer. This allows us to have snapshot isolation at very little cost and gives us a drastically simplified model for working with Voron. Other important factors include the need to be transactional, which means that we have to make durable writes to disk. In order to avoid doing random writes, we use a Write Ahead Journal. For these reasons, I/O inside Voron is basically composed of the following operations:

Scratch (MEM) – copy on write data for pages that are going to be changed in the transaction. Usually purely in memory. This is how we maintain the Isolated and Atomic aspects of ACID.
Journal (WAL) – sequential, unbuffered writes that include all the modifications made by the transaction. This is how we maintain the Atomic and Durability aspects of ACID.
Flush (MMAP) – copy data from the scratch buffers to the data file, which allows us to reuse space in the scratch file.
Sync (FSYNC) – ensure that the data from a previous flush is stored on a durable medium, which allows us to delete old journal files.

In Voron 3.5, we had Journal writes (which happen on each transaction commit) at one side of the I/O behavior and flush & sync as the other side. In Voron 4.0, we actually split it even further, meaning that journal writes, data flush and file sync are all independent operations which can happen independently.

Transactions are written to the journal file one at a time, until it reaches a certain size (usually about 256MB), at which point we’ll create a new journal file. Flush will move data from the scratch buffers to the data file and sync will ensure that the data that was moved to the data file is durably stored on disk, at which point you can safely delete the old journals.

In order to trigger this bug, you needed to have the following sequence of events:

Have enough transactions happen quickly enough that the flush / sync operations are lagging by more than a single file behind the transaction rate.
Have a transaction start a new journal file while the flush operation was in progress.
Have, concurrently, the sync operation complete an operation that includes that last journal file. Sync can take a lot of time.
Have another flush operation go on while the sync is in progress, which will move the flush target to the new journal file.
Have the sync operation complete, which only synced some of the changes that came from that journal, but because the new flush (which we didn’t sync yet) already moved on from that journal, mistakenly believe that this journal file is completely done and delete it.
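The sequence above can be replayed in a deliberately simplified, single-threaded model. Everything in it is an assumption made for illustration: the transaction numbers, the two journals, the variable names, and in particular the reading of the bug as "the delete decision used the live flush position instead of the position captured when the sync started". The real Voron bookkeeping is considerably more involved.

```javascript
// Replays one interleaving of flush and sync. With useSnapshot = false the
// journal cleanup trusts the live flush position (the buggy behavior in this
// model); with useSnapshot = true it only trusts what the sync actually covered.
function simulateSync(useSnapshot) {
  const journals = new Map([
    [1, { firstTx: 1, lastTx: 100 }],   // full journal file
    [2, { firstTx: 101, lastTx: 200 }], // journal currently being written
  ]);
  let flushedTx = 0; // last transaction copied from scratch to the data file
  let durableTx = 0; // last transaction known to be fsynced in the data file

  flushedTx = 60;                 // flush #1 covers only part of journal 1
  const syncSnapshot = flushedTx; // sync starts: it covers what was flushed so far
  flushedTx = 120;                // flush #2 runs while the sync is in progress

  // Sync completes and decides which journals are no longer needed.
  durableTx = syncSnapshot;
  const cutoff = useSnapshot ? syncSnapshot : flushedTx;
  for (const [number, journal] of journals) {
    if (journal.lastTx <= cutoff) journals.delete(number);
  }

  // Crash now: recovery has to replay everything after durableTx from journals.
  let lostTransactions = 0;
  for (let tx = durableTx + 1; tx <= flushedTx; tx++) {
    const covered = [...journals.values()].some(j => j.firstTx <= tx && tx <= j.lastTx);
    if (!covered) lostTransactions++;
  }
  return { remainingJournals: [...journals.keys()], lostTransactions };
}

console.log(simulateSync(false)); // journal 1 is deleted too early, 40 transactions have no journal
console.log(simulateSync(true));  // both journals are kept, nothing is lost
```

In the model, transactions 61 to 100 were flushed but never synced, and the only journal that could replay them is gone. Nothing is visibly wrong until the database crashes before the next sync.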

All of these steps are just the setup for the actual problem, mind you.

At this point, we are set up to have this issue, but we have yet to actually experience it. What has happened is that the persistent (on-disk) state of the database is now suspect: if a crash happens, we will be missing the oldest journal, which still has transactions that haven’t been properly persisted to the data file.

Once you have set up the system properly, you still aren’t done in terms of reproducing this issue. We now have a race, because the next flush / sync cycle is going to fix the problem. So you need to have a restart of the database within a very short period of time.

For additional complexity, the series of steps above will cause a problem, but even if you crash in just the right location, there are still some mitigating circumstances. In many cases, you are modifying the same set of pages in multiple transactions, and if the transactions that were lost because of the early deletion of the journal file had pages that were also modified in later transactions, those later transactions will fill in the missing details and there will be no issue. That was one of the things that made it so hard to figure out what was going on. We needed a very specific set of timings between three separate threads (journal, flush, sync) to create the hole, then another race to restart the database at this point, before Voron fixes itself in the next cycle, all happening just at the stage where Voron moves between journal files (typically every 256MB of compressed transactions, so not very often at all) and with just the right mix of writes to different pages in transactions that span multiple journal files.
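The mitigating circumstance is easier to see with a tiny example. The sketch below assumes a very simplified recovery model (each transaction is just a mapping of page number to new content, replayed in order on top of the data file); the `recover` function and the page contents are hypothetical.

```python
def recover(data_file, journals):
    """Replay surviving journals, in order, on top of the durable data file."""
    pages = dict(data_file)
    for number in sorted(journals):
        for tx in journals[number]:
            pages.update(tx)  # each tx maps page number -> new page content
    return pages


durable = {1: "a0", 2: "b0"}    # what actually reached the disk before the crash
lost_journal = [{1: "a1"}]       # deleted too early: page 1 changed to "a1"

# Case 1: a later, surviving transaction rewrote the same page.
masked = recover(durable, {2: [{1: "a2"}]})
expected = recover(durable, {1: lost_journal, 2: [{1: "a2"}]})
print(masked == expected)        # True, the loss is invisible

# Case 2: later transactions touched a different page only.
broken = recover(durable, {2: [{2: "b2"}]})
expected = recover(durable, {1: lost_journal, 2: [{2: "b2"}]})
print(broken == expected)        # False, page 1 is stuck at "a0"
```

Only the second case leaves visible damage, which is why the right mix of writes to different pages is part of the reproduction recipe.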

These are some pretty crazy requirements for reproducing such an issue, but as the saying goes: One in a million is next Tuesday.

What made this bug even nastier was that we didn’t catch it earlier. We take the consistency guarantees of Voron pretty seriously and we most certainly have code to check if we are missing transactions during recovery. However, we had a bug in this case. Because obviously there couldn’t be a transaction previous to Tx #1, we aren’t checking for a missing transaction at that point. At least, that was the intention of the code. What was actually executing was a check for missing transactions on every transaction except for the first transaction on the first journal file during recovery. So instead of skipping the check on Tx #1 only, we skipped it on the first transaction of every recovery.

Of course, this is the exact state that we have caused in this bug.
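As a rough illustration of the difference between the intended check and the one that actually ran, consider the sketch below. It is not the real recovery code: the function names are invented, and it assumes that recovery knows the id of the last transaction that is durable in the data file (`last_durable_tx`) and sees the transaction ids read from the surviving journals in order.

```python
def find_gap_buggy(last_durable_tx, recovered_tx_ids):
    """Skips the continuity check for the first tx of *every* recovery."""
    prev = last_durable_tx
    for index, tx in enumerate(recovered_tx_ids):
        if index != 0 and tx != prev + 1:
            return (prev, tx)
        prev = tx
    return None


def find_gap_fixed(last_durable_tx, recovered_tx_ids):
    """Skips the continuity check only for Tx #1, which has no predecessor."""
    prev = last_durable_tx
    for tx in recovered_tx_ids:
        if tx != 1 and tx != prev + 1:
            return (prev, tx)
        prev = tx
    return None


# The journal holding tx 3-4 was deleted too early, so recovery starts at tx 5.
print(find_gap_buggy(2, [5, 6]))   # None, the hole goes unnoticed
print(find_gap_fixed(2, [5, 6]))   # (2, 5), the hole is reported
```

A hole at the very start of recovery is the one place the flawed check never looks, and it is where an early journal deletion puts it.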

Sigh.

We added all the relevant checks, tightened the guard rails a few more times to ensure that a repeat of this issue will be caught very early and provided a lot more information in case of an error.

Then we fixed the actual problems and subjected the database to what in humans would be called enhanced interrogation techniques. Hammers were involved, as well as an irate developer with a penchant for pulling the power cord at various stages just to see what would happen.

We have released the fix in RavenDB 4.1.4 stable release and we encourage all users to upgrade as soon as possible.

Feb 15 2019

When people leave

time to read 6 min | 1059 words

Tags:

hibernating-practices

I talk a lot about the hiring process that we go through, but there is also the other side of that: when people leave us. Hibernating Rhinos has been around for about a decade, and in that time it grew from a single-guy operation to a company that crossed the bridge from small to medium business a couple of years ago.

When I founded the company, I had a pretty good idea of what I wanted to have. Actually, I had a very clear idea of what I didn’t want to have. The things that I didn’t want to carry over to my own company. For example, on call for 24/7 or working hours that exceed the usual norms or being under constant pressure. By and large, looking back at our history and where we are today, I think that we did a pretty good job at upholding these values.

But that isn’t the topic of this post. I wanted to talk about people leaving the company. Given how long we have been in business, we actually have very little turnover. Oh, we had people come and go, and I had to fire people who weren’t pulling their weight. But those were almost always people who were at the company for a short while (typically under a year).

In the past six months, we had two people leave who had been with us for three and seven years (about three months apart from one another). That is a very different kind of separation. When I was told that they intended to leave, I was both sad and happy. I was sad because I hate to lose good people, and I was happy because they were going to very good places.

After getting over my surprise, I sat down and planned for their leaving. Israel has a one-month notice requirement, so we had the time to do things properly. I was careful to check (very gently) whether this was a reversible decision, and once I confirmed that they had made up their minds, I carried on with that.

My explicit goals for that time were:

Make sure that they are leaving on good terms and great spirits.
Ensure proper handoff of their current tasks.
Provide guidance about current and past tasks.
Map their areas of responsibility and make sure that they are covered after they are gone.

The last three, I believe, are pretty common goals when people are leaving, but the most important piece was the first one. What does this mean?

I wrote each of them a recommendation letter. Note that they both already had accepted positions elsewhere at that time, so it wasn’t something they needed. It is something that they might be able to make use of in the future, and it was something that I wanted to do, formally, as an appreciation for their work and skills.

As an aside, I have an open invitation to my team. I’ll provide both recommendation letters and serve as a reference in any job search they have, while they are working for us. I sometimes get CVs from candidates that explicitly note: “sensitive, current employer isn’t aware”. I don’t want to be the kind of place that you have to hide from.

We also threw each of them a going away party, with the entire company stopping everything and going somewhere to celebrate.

I did that for several reasons. First, each of them, in very different ways, contributed significantly to RavenDB. It was a joy to work with them, I don’t see any reason why it shouldn’t be a joy to see them go. I can certainly say that not saying goodbye properly would have created a bad taste for the entire thing, and that is something that I don’t want.

Second, and a bit more cold-minded, I want to leave the door open for them to come back. After so much time in the company, the amount of knowledge that they have in their heads is a shame to lose for good. But even if they never come back, that is still a net benefit, because…

Third, there is the saying about “if you love someone, let them go…”. I think that a really good way to make people want to leave is to make it hard to do so. By making the separation easy and cordial, the people who stay know that they don’t need to fear or worry about things if they want to see what else is available for them.

The last few statements came out a bit colder than I intended them to be, but I can’t really think of a good way to phrase the intention that wouldn’t sound like that. I don’t like that these people left, and I would much rather have them stay. But I started out from the assumption that they are going to leave, and the goal is to make the best out of that.

I was careful to not apply any pressure on them to stay regardless. In fact, in one case, I apologized up front to the person on the way out, saying: “I want you to know that I’m not pressuring you to stay not because I want you to go, but because I respect your decision to leave and don’t want to make it awkward”.

Fourth, and coming back to what I want to have as a value for the company, is the idea that I wouldn’t mind at all being a place that people retire from. In fact, I decidedly want that to be the case. And we do a lot of work to ensure that we are the kind of place that you can be at for long periods of time (investing in our people, working on cool stuff, ensuring that grunt work is shared and minimized, etc). However, I would also take great pride in being the place that serves as a launching pad for people’s careers.

In closing, people are going to leave. If it is because of something that you can control, that should be a warning sign and something that you should look at to see if you can do better. If it is out of your hands, you should accept it as given and make the best of it.

I was very sad to see them go, and I wish them all the best in their future endeavors.

Oren Eini

CEO of RavenDB
