Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 2 min | 234 words

We had a need to generate unique ids for connections in our db. We used the following code:

public class NotificationsClientConnection : IDisposable
{
    private static int _counter = 0;

    private readonly WebSocket _webSocket;
    private readonly DocumentDatabase _documentDatabase;
    private readonly DateTime _startedAt;


    public NotificationsClientConnection(WebSocket webSocket, DocumentDatabase documentDatabase)
    {
        _webSocket = webSocket;
        _documentDatabase = documentDatabase;
        _startedAt = SystemTime.UtcNow;
    }

    public int Id => Interlocked.Increment(ref _counter);

    public TimeSpan Age => SystemTime.UtcNow - _startedAt;
}

Do you see the bug? It is a subtle one.

When you are ready, peek below.

.

.

.

.

.

.

.

.

.

.

.

.

.

The issue is in the following line:

    public int Id => Interlocked.Increment(ref _counter);

The intent was to create a field that would be initialized on startup. Unfortunately, the usage of the lambda symbol "=>" instead of assignment "=" turned that into a computed property, which will return a new value on every call.

time to read 3 min | 583 words

In my previous post I showed the following code, and asked what the bug was, and what the implications of that would be.

class Program
{
    private Timer nextcheck;
    public event EventHandler ServerSigFailed;
    static void Main(string[] args)
    {
        var program = new Program();
        if(program.ValidateServerSig() == false) 
            return;
        program.DoOtherStuff();
    }

    public bool ValidateServerSig()
    {
        nextcheck = new Timer(state => ValidateServerSig());

        var response = DoRequest("http://remote-srv/signature");
        if(response.Failed)
        {
            var copy = ServerSigFailed;
            if(copy!=null) copy(this, EventArgs.Empty);
            return false;
        }
        var result = Utils.CheckPublic KeySignatureMatches(response);
        if(result.Valid)
        if(response.Failed)
        {
            var copy = ServerSigFailed;
            if(copy!=null) copy(this, EventArgs.Empty);
            return false;
        }
        
        // setup next check
        nextcheck.Change(TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(15));
        return true;
    }
}

The answer is quite simple, look at the first line of the ValidateServerSig function. It setups a new timer that recursively call this function. So every 15 seconds, we are going to have a new timer to run this function.

And 15 seconds after that, we are going to have 4 timers, and 15 seconds after that…

But the interesting thing here that those are timers, so they get scheduled on the thread pool. So while the growth rate of the number of tasks is phenomenal, in practice, they don't get run all together.So after 20 minutes of this, it will only run about 800 times. Because the thread pool is growing slowly, we'll end up with a much slower growth rate, but eventually the thread pool queues are going to be filled with nothing but this task. And since we typically also process requests on the thread pool, eventually requests cannot be handled, because the threads are always busy running the validation code.

In the real codebase, the timing was 5 minutes, and this issue typically manifested itself only after several weeks of continuous running.

time to read 2 min | 395 words

The following code contains a bug, can you spot it? 

class Program
{
    private Timer nextcheck;
    public event EventHandler ServerSigFailed;
    static void Main(string[] args)
    {
        var program = new Program();
        if(program.ValidateServerSig() == false) 
            return;
        program.DoOtherStuff();
    }

    public bool ValidateServerSig()
    {
        nextcheck = new Timer(state => ValidateServerSig());

        var response = DoRequest("http://remote-srv/signature");
        if(response.Failed)
        {
            var copy = ServerSigFailed;
            if(copy!=null) copy(this, EventArgs.Empty);
            return false;
        }
        var result = Utils.CheckPublic KeySignatureMatches(response);
        if(result.Valid)
        if(response.Failed)
        {
            var copy = ServerSigFailed;
            if(copy!=null) copy(this, EventArgs.Empty);
            return false;
        }
        
        // setup next check
        nextcheck.Change(TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(15));
        return true;
    }
}

What are the implications of this bug?

time to read 1 min | 178 words

Originally posted at 4/12/2011

Can you spot the NRE hiding in the code?

if (parentObject!= null)
{
    lock (parentObject)
    {
        if (parentObject != null)
        {
            properties = properties.Clone();
            parentObject[parentKey] = this;
            parentObject = null;
        }
    }
}
time to read 3 min | 573 words

Originally posted at 11/22/2010

This method has a bug, a very subtle one. Can you figure it out?

public IBinarySearchTree<TKey, TValue> TryRemove(TKey key, out bool removed, out TValue value)
{
    IBinarySearchTree<TKey, TValue> result;
    int compare = comparer.Compare(key, theKey);
    if (compare == 0)
    {
        removed = true;
        value = theValue;
        // We have a match. If this is a leaf, just remove it 
        // by returning Empty.  If we have only one child,
        // replace the node with the child.
        if (Right.IsEmpty && Left.IsEmpty)
            result = new EmptyAVLTree<TKey, TValue>(comparer, deepCopyKey, deepCopyValue);
        else if (Right.IsEmpty && !Left.IsEmpty)
            result = Left;
        else if (!Right.IsEmpty && Left.IsEmpty)
            result = Right;
        else
        {
            // We have two children. Remove the next-highest node and replace
            // this node with it.
            IBinarySearchTree<TKey, TValue> successor = Right;
            while (!successor.Left.IsEmpty)
                successor = successor.Left;
            result = new AVLTree<TKey, TValue>(comparer, deepCopyKey, deepCopyValue, successor.Key, 
                successor.Value, Left, Right.TryRemove(successor.Key, out removed, out value));
        }
    }
    else if (compare < 0)
        result = new AVLTree<TKey, TValue>(comparer, deepCopyKey, deepCopyValue, 
            theKey, theValue, Left.TryRemove(key, out removed, out value), Right);
    else
        result = new AVLTree<TKey, TValue>(comparer, deepCopyKey, deepCopyValue,
            theKey, theValue, Left, Right.TryRemove(key, out removed, out value));
    return MakeBalanced(result);
}
time to read 3 min | 412 words

The follow code is part of RavenDB HiLo implementation:

        private long NextId()
        {
            long incrementedCurrentLow = Interlocked.Increment(ref currentLo);
            if (incrementedCurrentLow > capacity)
            {
                lock (generatorLock)
                {
                    if (Thread.VolatileRead(ref currentLo) > capacity)
                    {
                        currentHi = GetNextHi();
                        currentLo = 1;
                        incrementedCurrentLow = 1;
                    }
                }
            }
            return (currentHi - 1)*capacity + (incrementedCurrentLow);
        }

It contains a bug, can you see it? I took a long time to figure it out, I am ashamed to say.

BTW, you can safely assume that GetNextHi is correct.

time to read 1 min | 183 words

I was working with a client about a problem they had in integrating EF Prof to their application, when my caught the following code base (anonymized, obviously):

public static class ContextHelper
{
     private static Acme.Entities.EntitiesObjectContext _context;

     public static Acme.Entities.EntitiesObjectContext CurrentContext
     {
           get { return _context ?? (_context = new Acme.Entities.EntitiesObjectContext()); }
      }

}

That caused me to stop everything and focus the client’s attentions on the problem that this code can cause.

What were those problems?

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}