The smallest bugs, the biggest problems – Part II


Originally posted at 11/22/2010

In a previous post, I talked about how I found the following (really nasty) bug in RavenDB’s managed storage (which is still considered unstable, btw):

When deleting documents from a database that contains more than 2 documents, and the documents are deleted in a certain order, RavenDB would go into 100% CPU. The server would still function, but it would always think that it had work to do, even if it didn’t have any.

Now, I want to talk about the actual bug.

[Image: the TryRemove code containing the bug]

What I did wrong here was to reuse the removed and value parameters in the second call to TryRemove. That call is internal and is only needed to properly balance the tree, but it overwrote both parameters, so the method ended up always returning the removed/value from the right side of the tree.
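Since the original code is only available as a screenshot, here is a minimal sketch of the same class of mistake, written in Python rather than the C# that RavenDB uses. Everything in it is an assumption made for illustration: the `Node` type, the plain (unbalanced) binary search tree, the `reuse_outputs` flag and the tuple return value that stands in for the `out` parameters. It is not the actual RavenDB code; it only shows how an internal removal call can clobber the result meant for the caller.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    key: int
    value: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node, key, value):
    """Insert into a plain binary search tree and return the subtree root."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        node.left = insert(node.left, key, value)
    elif key > node.key:
        node.right = insert(node.right, key, value)
    else:
        node.value = value
    return node


def min_node(node):
    """Leftmost node of a non-empty subtree."""
    while node.left is not None:
        node = node.left
    return node


def try_remove(node, key, reuse_outputs=False):
    """Return (new_subtree, removed, value); the last two mimic `out` parameters.

    reuse_outputs=True reproduces the mistake: the internal call that pulls
    the successor out of the right subtree overwrites removed/value.
    """
    if node is None:
        return None, False, None
    if key < node.key:
        node.left, removed, value = try_remove(node.left, key, reuse_outputs)
        return node, removed, value
    if key > node.key:
        node.right, removed, value = try_remove(node.right, key, reuse_outputs)
        return node, removed, value

    # Found the node to delete: this is what the caller asked about.
    removed, value = True, node.value
    if node.left is None:
        return node.right, removed, value
    if node.right is None:
        return node.left, removed, value

    # Two children: replace the node with its in-order successor, which
    # requires a second, purely internal removal from the right subtree.
    successor = min_node(node.right)
    if reuse_outputs:
        # Bug: the internal call's results replace the caller's results.
        new_right, removed, value = try_remove(node.right, successor.key, True)
    else:
        # Fix: discard the internal call's results.
        new_right, _, _ = try_remove(node.right, successor.key, False)
    return Node(successor.key, successor.value, node.left, new_right), removed, value


def build():
    """Three entries, so the root has two children."""
    root = None
    for key, value in [(2, "two"), (1, "one"), (3, "three")]:
        root = insert(root, key, value)
    return root


if __name__ == "__main__":
    print(try_remove(build(), 2, reuse_outputs=True)[2])   # value from the right side
    print(try_remove(build(), 2, reuse_outputs=False)[2])  # value of the deleted key
```

In this sketch the wrong value only shows up when the deleted node has two children, which needs at least three entries and a particular deletion order. That is consistent with the symptoms described above, though the real tree code may differ in its details.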

Compounding the problem, I only actually used the value returned from TryRemove in a single location, and even that usage was a mistake. Take a look:

[Image: the single location that uses the TryRemove value]

That meant that I actually looked for the problem in the secondary indexes for a while, before realizing that the actual problem was elsewhere.