Ayende @ Rahien

Oren Eini, aka Ayende Rahien, CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

You can reach me by:

oren@ravendb.net

+972 52-548-6969

Posts: 6,950 | Comments: 49,488

time to read 2 min | 252 words

In my previous post, I asked you to find the bug in the following code:

This code looks okay at a glance, but it turns out that it is a really nasty data corruption bug waiting to happen. Here is what the problematic usage looks like:

Do you see the error now?

If the operation times out, an exception is raised, but the underlying operation isn’t over. Because we are using a shared pool, the buffer we used may be handed over to someone else. At this point, the new owner does something with the buffer, but the pending I/O operation will still read data into it, meaning that it will probably contain garbage when it is actually used.

For this to actually happen, you need a timeout, reuse of the buffer, and the I/O operation completing at just the wrong time. That is a sequence of highly unlikely events that would assuredly happen within an hour of pushing something like this to production. For fun, this will reliably happen the moment you have some network issues. So imagine that you have a slow node, which then causes memory corruption, which very rarely ends up as a visible bug (instead of, maybe, an aborted request), with no indication of how it happened.

How do you fix this? Like this:

This uses a cancellation token, which causes the operation to be aborted at the stream level, meaning that we can safely reuse the values that we passed to the underlying stream.

time to read 2 min | 400 words

Hadi had an interesting tweet:

This sounds reasonable on the surface, but it runs into the vast difference between any amount of money and free. There have been quite a few studies on this. Good reading on the subject can be found here. Take note of the Amazon experience: free shipping increases sales, while shipping that costs $0.20 does not. From a practical standpoint, there is no real difference between the two prices, but it matters, quite a lot.

Leaving aside the psychology of free, there are also other issues. Let’s say that there is a tool that can save someone on my team 10 minutes a day, and it is priced at $1 a year. By any measure you care to name, that is more than worth it.

But the process of actually getting this purchased is likely to be more expensive than the actual cost of the tool. In my case, the process is “go talk to Oren”. In other environments, that may involve “talk to your boss, who will submit it to accounts payable, which will pay it in 60 days”.

And at that point, charging $1 just doesn’t matter. You could charge $50, and the “cost” for the person making the purchase would be pretty much the same. Note that this is for a corporate environment; the situation is (slightly) different for sales directly to consumers, but not significantly so. Free still trumps everything.

When talking about freemium models, the numbers you hear quoted are bad. For example, Evernote had a conversion rate below 2%, and that is a wildly successful example. Dropbox has a rate of about 4%, and the average seems to be 1%. And that is for businesses that are focused on optimizing the freemium funnel, which takes a lot of time and effort, mind you.

I don’t think that there is a practical option for turning this around, and I say that as someone who would love it if that were at all possible.

time to read 2 min | 245 words

I’ll be writing a lot more about our RavenDB C++ client, but today I was reviewing some code and I got a reply that made me go: “Ohhhhh! Nice”, and I just had to blog about it.


This is pretty much a direct translation of how you’d write this kind of query in C#, and its output is an RQL query that looks like this:


The problem is that I know how the C# version works. It uses Reflection to extract the field names from the type, so we can figure out what fields you are interested in. In C++, you don’t have Reflection, so how can this possibly work?

What Alexander did was really nice. The user already has to provide us with the serialization routine for this type (so we can turn the JSON into the types that will be returned). Given that, inside the select_fields() call, he constructs an empty object, serializes it, and then uses the field names in the resulting JSON to figure out which fields we want to project from the Users documents.

It makes perfect sense, it requires no additional work from the user, and it gives us a consistent API. It is also something that I would probably never have thought to do.

time to read 2 min | 274 words

After trying (and failing) to use rustls to handle client authentication, I tried to use the rust-openssl bindings. They crapped out on me with a really scary link error. I spent some time trying to figure out what was going on, but given that I wanted to write Rust code, not deal with link errors, I decided to see whether the final alternative in the Rust ecosystem, the native-tls package, would work.

And… that is a no-go as well. Which is sad, because the actual API was quite nice. The reason it isn’t going to work? The native-tls package just has no support for client certificate authentication when running as a server, so it is not usable for me.

That leaves me with strike three out of three:

  • rustls – native Rust API, easy to work with, but doesn’t allow accepting arbitrary client certificates, only ones from known issuers.
  • rust-openssl – I have built this on top of OpenSSL before, so I know it works. However, trying to build it on Windows resulted in link errors, so that was out.
  • native-tls – doesn’t have support for client certificates, so not usable.

I think that at this point, I have three paths available to me:

  • Give up and maybe try doing something else with Rust.
  • Fork rustls and add support for accepting arbitrary client certificates. I’m not happy with this because it requires changing not just rustls but probably also the webpki package, and I’m not sure that the changes I have in mind won’t hurt the security of the system.
  • Try to fix the OpenSSL link issue.

I think that I’ll go with the third option, but this is really annoying.

time to read 2 min | 328 words

In my previous post, I asked about the following code and what its output will be:

As it turns out, this code will output two different numbers:

  • On Debug – 134,284,904
  • On Release – 66,896

The behavior is consistent: each mode reliably produces its own number.

I was pretty sure that I knew what was going on, but I asked to make sure. You can read the GitHub issue if you want the spoiler.

I attached to the running program in WinDBG and issued the following command:

We care about the last line. In particular, we can see that all the memory is indeed in the byte array, as expected.

Next, let’s dump the actual instances that take so much space:

There is one large instance here that we care about. Let’s see what is holding on to this fellow, shall we?

It looks like we have a reference from a local variable. Let’s see if we can verify that, shall we? We will use the clrstack command and ask it to give us the parameters and local variables, like so:

The interesting line is 16, which shows:


In other words, here is the local variable, and it is set to null. What is going on? And why don’t we see the same behavior on release mode?

As mentioned in the issue, the problem is that the JIT introduces a temporary local variable here, which the GC is obviously aware of, but WinDBG is not. This causes the program to hold on to the value for longer than expected.

In general, this should only be a problem if you have a long-running loop. We do, in some cases, and in debug mode that actually caused our memory utilization to go through the roof, which led to this investigation.

In release mode, these temporary variables are rarer (but can still happen, it seems).

time to read 4 min | 612 words

The fallacies of distributed computing are a topic that is very near and dear to my heart. They are a set of assertions describing false assumptions that distributed applications invariably make.

The first two are:

  • The network is reliable.
  • Latency is zero.

Whenever I talk about distributed computing, the fallacies come up. And they trip people up, over and over and over again. Even people who should know better.

Which is why I read this post with horror, mostly because of the following quote:

As networks become more redundant, partitions become an increasingly rare event. And even if there is a partition, it is still possible for the majority partition to be available. Only the minority partition must become unavailable. Therefore, for the reduction in availability to be perceived, there must be both a network partition, and also clients that are able to communicate with the nodes in the minority partition (and not the majority partition).

Now, to be clear, Daniel literally has a PhD in CS and has published several papers on the topic. It is possible that he is speaking in very precise terms that don’t necessarily match the way I read this statement. But even so, I believe that this statement is absolutely and horribly wrong.

A network partition is rare, you say? This 2014 paper in ACM Queue shows that it is anything but. Oh, sure, in the grand scheme of things, a network partition is an extremely rare event in a properly maintained data center; let’s say that there is a 1 / 500,000 chance of that happening (rough numbers from the Google Chubby paper). That still gives you 61 outages(!) in a few weeks.

Go and read the ACM paper, it makes for fascinating reading, in the same way you can’t look away from a horror movie however much you want to.

And this is talking just about network partitions. The problem is that from the perspective of the individual nodes, that is not nearly the only reason why you might get a partition:

  • If you are running a server on a managed platform, you might hit a stop-the-world GC event. In some cases, this can take minutes.
  • In an unmanaged language, your malloc() may be doing maintenance tasks and causing an unexpected block in a bad location.
  • You may be swapping to disk.
  • The OS might have decided to randomly kill your process (Linux OOM killer).
  • Your workload has hit some critical point (see the Expires section) and causes the server to wait a long time before it can reply.
  • Your server is on a VM that was moved between physical machines.
  • A certificate expired on one machine, but not on others, meaning that it can contact others, but cannot be contacted directly (except that already existing connections still work).

All of these are before we consider the fact that we are dealing with imperfect software that may have bugs, and that humans are tinkering with the system (such as deploying a new version) and can mess things up, etc.

So no, I utterly reject the idea that partitions are rare events in any meaningful manner. Sure, they are rare, but a million to one event? We can do a million packets per second. That means that something incredibly rare can still happen multiple times a day. In practice, you need to be aware that your software will be running in a partition, and that you will need a way to handle that.

And go read the fallacies again; maybe print them and stick them on a wall somewhere nearby. If you are working with a distributed system, it is important to remember these fallacies, because they will trip you up.

time to read 1 min | 162 words

The bug from yesterday would only show when a particular query is being run concurrently, and not always then.

Here is the code that is responsible for the bug:

It is quite hard to see, because it is so subtle. The code here creates a cached lambda that is global for the process. The lambda takes the current engine and the object to transform, and returns the converted object.

So far, so good, right?

Except that in this case, the lambda is capturing the engine parameter that is passed to the function. The engine is single threaded and must not be used concurrently. The problem is that the code already handles this situation, and the current engine instance is passed to the lambda, where it is never used. The original engine instance is used concurrently instead, violating its invariants and causing errors down the line.

The fix was to simply use the current engine instance that was passed to us, but this was really hard to figure out.


RECENT SERIES

  1. RavenDB 5.0 (2):
    21 Jan 2020 - Exploring Time Series–Part II
  2. Webinar (2):
    15 Jan 2020 - RavenDB’s unique features
  3. Challenges (2):
    03 Jan 2020 - Spot the bug in the stream–answer
  4. Challenge (55):
    02 Jan 2020 - Spot the bug in the stream
  5. re (26):
    27 Dec 2019 - Writing a very fast cache service with millions of entries
