Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

You can reach me by:

oren@ravendb.net

+972 52-548-6969

Posts: 7,087 | Comments: 49,877

filter by tags archive
time to read 2 min | 371 words

I posed a potential problem for a job interview. Given the following function, generate another key that has the same shard:

In other words, give “users/71” and a prefix of “orders/654”, generate a key that would be placed on the same shard as “users/71”. The answer in this case can be: “orders/654-vaueaa”.

In order to answer the question, we need to understand what is going on here. The function above is a fancy way to extract 16 bits of information from the key using a cryptographically hash function. MD5 is no longer considered secured, but given the fact that I’m needing just 16 bits, that is not an issue. The code above is slightly more complex than needed, I could simplify it to this and have the same effect (but not the same result, mind):

The need to generate a matching shard id is another way to say that we need a hash collision. Given that the key space is 2^16, and that we can assume that any mutation to the key will result effectively random changes to the result, we can simply generate different keys and try to see if they match. Here is a simple way to do so:

We are effectively throwing a dice and seeing if this match. So it is a probability game to wait until we have a collision. The actual implementation isn’t that important, what is interesting is to talk about the implications here:

  • Are there better ways to go about doing something like this? Not really, given that MD5 isn’t that broken.
  • How much time will it take to generate a shard id match? The answer, usually around 64K tries. But why is interesting. The birthday attack issues don’t play here, because we don’t need to match to multiple items, just one. So we role the dice and see if we match on the value.
  • Can we speed this up? Using a different hash function would probably help, yes.
  • What other ways do we have to handle this? Different shard id generation would allow much better alternative.

The last question is where we get into more interesting details about system design, ergonomics of the choices we make and get to see how the candidate actually thinks.

time to read 1 min | 118 words

Here is something that came up that I liked as task to handle in a job interview. We have the following function, generating a shard id from a document key:

There is a need to create a new document that must reside on the same shard as another document. The task is to write a function that would generate that id.

For example:

  • Key to match: users/71
  • Document id to modify: orders/654

The result can be: orders/654-22261 or orders/654-vaueaa

In other words, both orders/654-vaueaa and users/71 will have the same shard.

I like this because it is very easy to explain and has simple coding requirements. But it allows to go deep into several different fields of knowledge to understand the candidate’s solution.

time to read 3 min | 523 words

A RavenDB user called us with a very strange issue. They are running on RavenDB 3.5 and have run out of disk space. That is expected, since they are storing a lot of data. Instead of simply increasing the disk size, they decided to take the time and simply increase the machine overall capacity. They moved from a 16 cores machine to a 48 cores machine with a much larger disk.

After the move, they found out something worrying. RavenDB now used a lot more CPU. If previous the average load was around 60% CPU utilization, now they were looking at 100% utilization on a much more powerful machine. That didn’t make sense to us, so we set out to figure out what was going on. A couple of mini dumps and we were able to figure out what was going on.

It got really strange because there was the following interesting observation:

  • Under minimal load / idle – no CPU at all
  • Under high load – CPU utilization in the 40%
  • Under medium load – CPU utilization at 100%

That was strange. When there isn’t enough load, we are at a 100%? What gives?

The culprit was simple: BlockingCollection.

“Huh”, I can hear you say. “How can that be?”

A BlockingCollection should not be the cause of high CPU, right? It is in the name, it is blocking. Here is what happened. That blocking collection is used to manage tasks, and by default we are spawning threads to handle that at twice the number of available cores. All of these threads are sitting in a loop, calling Take() on the blocking collection.

The blocking collection internally is implemented as using a SemaphoreSlim, which call Wait() and Release() on the values as needed. Here is the Release() method notifying waiters:

image

What you can see is that if we have more than a single waiter, we’ll update all of them. The system in question had 48 cores, so we had 96 threads waiting for work. When we add an item to the collection, all of them will wake and try to pull an item from the collection. Once of them will succeed, and then rest will not.

Here is the relevant code:

image

As you can imagine, that means that we have 96 threads waking up and spending a full cycle just spinning. That is the cause of our high CPU.

If we have a lot of work, then the threads are busy actually doing work, but if there is just enough work to wake the threads, but not enough to give them something to do, they’ll set forth to see how hot they can make the server room.

The fix was to reduce the number of threads waiting on this queue to a more reasonable number.

The actual problem was fixed in .NET Core, where the SemaphoreSlim will only wake as many threads as it has items to free, which will avoid the spin storm that this code generates.

time to read 2 min | 254 words

We run a lot of benchmarks internally and sometimes it feels like there is a roaming band of performance focused optimizers that go through the office and try to find under utilized machines. Some people mine bitcoin for fun, in our office, we benchmark RavenDB and try to see if we can either break a record or break RavenDB.

Recently a new machine was… repurposed to serve as a benchmarking server. You can call it a right of passage for most new machines here, I would say. The problem with that machine is that the client would error. Not only would it fail, but at the exact same interval. We tested that from multiple clients and from multiple machines and found that every 30 minutes on the dot, we’ll have an outage that lasted under one second.

Today I come to the office to news that the problem was found:

image

It seems that after 30 minutes of idle time (no user logged in), the machine would turn off the ethernet, regardless of if there are active connections going on. Shortly afterward it would be woken up, of course, but it would be down just enough time for us to notice it.

In fact, I’m really happy that we got an error. I would hate to try to figure out latency spikes because of something like this, and I still don’t know how the team found the root cause.

time to read 2 min | 252 words

In my previous post, I asked you to find the bug in the following code:

This code looks okay, at a glance, but it turns out that this is a really nasty data corruption bug waiting to happen. Here is what the problematic usage looks like:

Do you see the error now?

If the operation will time out, an exception will be raised, but the underlying operation isn’t over. We are using a shared pool, so the buffer we use may be handed over to someone else. At this point, we do something with the buffer, but the pending I/O operation will read data into this buffer, meaning that this is probably going to be garbage in it when we actually use it.

To actually happen, you need to have a timeout operation, reuse of the buffer and the I/O operation completing at just the wrong time. So a sequence of highly unlikely events that would assuredly happen within an hour of pushing something like that to production. For fun, this will reliably happen the moment you have some network issues. So imagine that you have a slow node, which then cause memory corruption, which end up being a visible bug (instead of maybe aborted request) very rarely, and with no indication on how this happened.

How do you fix this? Like this:

This will use a cancellation token, which will cause the operation to be aborted at the stream level, meaning that we can safely reuse values that we passed the underlying stream.

time to read 2 min | 400 words

Hadi’s had an interesting Tweet:

This sounds reasonable on the surface, but it runs into hard issues with the vast differences between any amount of money vs. free. There have been quite a few studies on this. A good reading on the subject can be found here. Take a note of the Amazon experience. Shipping that is free increases sales. Shipping that is 0.2$ does not increase sales. Note that from a practical standpoint, there is no real difference between the prices, but it matters, quite a lot.

Leaving aside the psychology of free, there are also other issues. Let’s say that there is a tool that can save someone on my team 10 minutes a day, it is priced at 1$ / year. By any measure you care to name, that is more than worth it.

But the process of actually getting this purchased is likely to be more expensive than the actual cost of the tool. In my case, the process is “go talk to Oren”. In other environments, that may involve “talk to your boss, that will submit it to accounts payable, which will pay it in 60 days”.

And at that point, charging 1$ just doesn’t matter. You could charge 50$, and the “cost” for the person making the purchase would be pretty much the same. Note that this is for corporate environment, the situation is (slightly) different for sales directly to consumers, but not significantly so. Free stills trumps everything.

When talking about freemium models, you hear quotes that are bad. For example, Evernote had a < 2% conversion rate, and that is a wildly successful example. Dropbox has a rate of about 4%, and the average seems to be 1%. And that is for businesses who are focused on optimizing the freemium funnel. That takes a lot of time and effort, mind you.

I don’t think that there is a practical option for turning this around, and I say that as someone who would love it if that were at all possible.

time to read 2 min | 245 words

I’ll be writing a lot more about our RavenDB C++ client, but today I was reviewing some code and I got a reply that made me go: “Ohhhhh! Nice”, and I just had to blog about it.

image

This is pretty much a direct transaction of how you’ll write this kind of query in C#, and the output of this is a RQL query that looks like this:

image

The problem is that I know how the C# version works. It uses Reflection to extract the field names from the type, so we can figure out what fields you are interested in. In C++, you don’t have Reflection, so how can this possibly work?

What Alexander did was really nice. Given that the user already have to provide us with the serialization routine for this type (so we can turn the JSON into the types that will be returned). Inside the select_fields() call, he constructed an empty object, serialize that and then use the field names in the resulting JSON to figure out what fields we want to project from the Users documents.

It make perfect sense, it require no additional work from the user and it gives us consistent API. It is also something that I would probably never think to do.

time to read 2 min | 274 words

After trying (and failing) to use rustls to handle client authentication, I tried to use rust-openssl bindings. It crapped out on me with a really scary link error. I spent some time trying to figure out what was going on, but given that it said that I wanted to write Rust code, not deal with link errors, I decided to see if the final alternative in the Rust eco system will work, native-tls package.

And… that is a no go as well. Which is sad, because the actual API was quite nice. The reason it isn’t going to work? The native-tls package just has no support for client certificate authentication when running as a server, so not usable for me.

That leaves me with strike three out of three:

  • rustls – native Rust API, easy to work with, but doesn’t allow to accept arbitrary client certificates, only ones from known issuers.
  • rust-openssl – I have build this on top of OpenSSL before, so I know it works. However, trying to build it on Windows resulted in link errors, so that was out.
  • native-tls – doesn’t have support for client certificates, so not usable.

I think that at this point, I have three paths available to me:

  • Give up and maybe try doing something else with Rust.
  • Fork rustls and add support for accepting arbitrary client certificates. I’m not happy with this because it requires changing not just rustls but also probably webpki package and I’m unsure if the changes I have in mind will not hurt the security of the system.
  • Try to fix the OpneSSL link issue.

I think that I’ll go with the third option, but this is really annoying.

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. Open Source & Money (2):
    19 Nov 2020 - Part II
  2. Webinar recording (10):
    28 Oct 2020 - Advanced Search Scenarios in RavenDB
  3. re (27):
    27 Oct 2020 - Investigating query performance issue in RavenDB
  4. Reminder (10):
    25 Oct 2020 - Online RavenDB In Action Workshop tomorrow via NDC
  5. Podcast (3):
    17 Aug 2020 - #SoLeadSaturday with Oren Eini
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats