Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 2 min | 285 words

In a recent PR, I run into this code, which is used in query generation to decide if we need to quote a particular alias. The code itself is pretty straightforward and easy to follow:

image

image

It also have two distinct issues. First, there is the allocation because of the ToUpper call and second, we are doing O(N) search on the alias array every single time.

I asked for a change, to use HashSet and to use the OrdinalIgnoreCase comparer.

Here is the change I got back:

image

image

This is exactly what I asked for, and it is very subtly wrong. We are now saving an allocation, which is great, but the problem is with the Contains method.

image

This looks okay, but this is not HashSet.Contains, instead, this is an extension method from Enumerable.Contains, which iterates over the set and compare each value.

The fix is also simple:

image

image

And now we don’t have O(N) anymore.

Although I’ll admit that for such small size, it probably doesn’t matter.

time to read 2 min | 247 words

The following set of issues all fall into code that is used within the scope of a single assembly, and that is important. I’m writing this blog post before I got the chance to talk to the dev in question, so I’m guessing about intent.

image

This change is likely motivated by the fact that callers are not expected to make a modification to the resulting dictionary.

That said, this is used between different components in the same assembly, and is never exposed outside. That means that we have a much higher trust between the components, and reading IReadOnlyDictionary means that we need to spend more cycles trying to figure out who you are trying to protect from.

Equally important, in this case, the Dictionary methods can be called without any virtual call overhead, while the IReadOnlyDictionary needs interface dispatch to work.

image

This is a case that is a bit more subtle. The existingData is a variable that is passed to a method. The problem is that in this case, no one is ever going to send null, and sending a null is actually an error.

In this case, if we did get a null, I would rather that the code would immediately crash with “what just happened?” rather than limp along with bad data.

time to read 1 min | 90 words

This is part of a PR related to making sure that disposing once works. It contains this code:

image

This loses critically important information. Namely, the stack trace of the original exception. That leaves aside the issue that an aggregate exception may contain multiple exceptions as well.

In general, and I know this is old hat, whenever you see “throw e;” or “throw e.InnerException;” of any kind, you should always treat it as a bug.

time to read 1 min | 110 words

I had to reject the following change in a recent PR. IN this context, the flags and conflicted.Flags are the same, and that wasn’t the problem. Can you spot the issue?

image

The problem is that the second version does an allocation. It does this silently, and you need to know about this issue to know that this happens. There is good discussion on this in this StackOverflow question.

It looks like this has been fixed in the JIT for CoreCLR and will be part of the 2.1 release when it is out.

time to read 3 min | 453 words

There is a reason why people talk about idiomatic code. Code that is idiomatic to the language matches what it expect and it generally faster / easier to work with for both developers and the compiler / runtime.

During a PR review, I run into this code:

image

The idiomatic manner for writing this code would have been any of:

  • “@id” == property
  • Constants.Documents.Metadata.Id == property
  • property.Equals(Constants.Documents.Metadata.Id)
  • Constants.Documents.Metadata.Id.Equals(property)

I can argue that the second option is the most idiomatic, and that the 3rd option can fail with Null Reference Exception if the property is null, but all of them are pretty clear.

Now, RavenDB has a lot on non idiomatic code, usually when we need to get more performance. For example:

image

This is code that is doing very much what is done above, but it does this on the raw byte buffer, and it knows that it is accessing UTF8 characters, so we can do some nice optimizations there to compare by just doing two instructions.

Indeed, when queried, the developer answered:

Most of the time its going to be false and comparing ints is cheaper than strings

There are several problems with this. First, this particular piece of code isn’t in a part of the code that is extremely performance sensitive. The string buffer work above is for processing requests from the network, a piece of code that can be called tens and hundreds of thousands of times per second. Performance there matters, a lot.  This code is meant to be called as part of streaming results to the user, so it is likely to handle very large volume of data. Performance there matters, for sure, but we need to consider how much it matters.

Second, let us peek into what will actually happen if we drop the property.Length check. The call will end up calling to the native string routines in the CLR, and the relevant portion is:

image

In other words, this check is already going to happen, we didn’t really save anything from making it.

Third, and the most subtle of them all. This check is using a check against a constant, whose value is “@id”. It also check that the property .Length is equal to 3. The whole point of using a constant is that we need to replace it in just one location. But in this case, we will likely change the constant value, not realize that there is a hardcoded length elsewhere in the code and fail miserably with hard to explain behavior.

time to read 1 min | 129 words

During reviewing a PR I run into what seemed like a strange thing. Take a look at this change:

image

This came with its own exception class, and left me pretty confused. Why would I want to have something like that?

Here we have some error handling code that doesn’t seem to add any additional value. Everything in the error here can be deducted from the details of the exception that will be thrown if we did nothing.

The fact we throw a specialized exception might be meaningful, but looking at the code, this isn’t actually used for anything.

Like all code, error handling needs to justify itself, and this one doesn’t pass the bar.

time to read 2 min | 228 words

During code review I run into these two sections, which raised a flag. Can you tell why?

image

image

The problem with this type of code is two fold. First, we add optional parameters, to reduce the number of breaking changes we have. The problem with that is that we already have parameters on the call, and eventually you’ll get to something like this:

image

Which is the queen of optional parameters method, and you can probably guess how it looks internally.

In the first case, we can add the new optional parameter to the… options variable that we are already sending this method. This way, we don’t have to worry about breaking changes, and we already have a way to setup options, determine defaults, etc.

In the second case, we are passing two bools to the method, and there isn’t a preexisting parameters object. Instead of creating one, we can use a Flags enum, whose bits we can set to determine what exactly the behavior of this method should be. That is generally much easier to maintain in the long run.

time to read 2 min | 202 words

I run into the following in a PR I recently reviewed. Take a look and try to figure out why I’m pointing this bits out:

image_thumb[2]

image_thumb[3]

Those are bad errors. In the sense that they are hiding information. Sure, in 90% of the cases the user just put it the backup location or the restore location, so they know what the application is referring to. But the other 10% is looking at the exception from the logs, having “I got an error” support call or all sort of weird stuff.

In particular, one of the most annoying things is when the user and the application disagree on a relative path. If the user believes that the relative path is based on one location and the application another, hilarity ensues. But not in a fun way.

A better way to prevent all of those would be to just include the actual locations (fully resolved, mind) in the error. It doesn’t prevent support calls, but it does make them a lot easier to solve.

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}