Ayende @ Rahien

Refunds available at head office

When a race condition is what you want…

I have an interesting situation that I am not sure how to resolve. We need to record the last request time for a RavenDB database. Now, this last request time is mostly used to show the user, and to decide when a database is idle, and can be shut down.

As such, it doesn’t have to be very accurate (a skew of even a few seconds is just fine). However, under load, there are many requests coming in (in the case presented to us, 4,000 concurrent requests), and they all need to update the last request time.

Obviously, in such a scenario, we don’t really care about the actual value. But the question is, how do we deal with that? In particular, I want to avoid a situation where we do a lot of writes to the same value in an unprotected manner, mostly because that is likely to cause contention between cores.

Any ideas?

It is actually fine for us to go slightly back in time (so thread A at time T+1 and thread B at time T+2 run concurrently, and the end result is T+1), which is why I said that a race is fine for us. But what I want to avoid is any form of locking / contention.

I wrote the following test code:

using System;
using System.Collections.Generic;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        var threads = new List<Thread>();

        var holder = new Holder();

        var mre = new ManualResetEvent(false);

        for (int i = 0; i < 2500; i++)
        {
            var thread = new Thread(delegate()
            {
                mre.WaitOne();
                for (long j = 0; j < 500*1000; j++)
                {
                    holder.Now = j;
                }
            });
            thread.Start();
            threads.Add(thread);
        }

        mre.Set();

        threads.ForEach(t => t.Join());


        Console.WriteLine(holder.Now);
    }
}

public class Holder
{
    public long Now;
}

And it looks like it is doing what I want it to. This creates a lot of contention on the same value, but it also gives me the right value. And again, the value of right here is very approximate. The problem is that while I know how to write thread safe code, I’m not sure whether this is a good way to go about doing this.

Note that this code (yes, even with 2,500 threads) runs quite fast, in under a second. Trying to use Interlocked.Exchange is drastically more expensive, and Interlocked.CompareExchange is even worse.

But it is just not sitting well with me.

Comments

07/16/2014 09:25 AM by Duarte Nunes

The usual solution is to have per-thread counters and then aggregate them on reads; in this case, the read would scan the sequence of Holders and report the highest. If you know the number of threads and it's static, you can have a simple array of Holders. Otherwise, use ThreadLocals that register a WeakReference to the thread-private Holder in a thread-safe, globally accessible data structure to satisfy reads. By the way, your loop is probably being optimized away; that's why it's faster than it should be, given the contention.
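Duarte's per-writer scheme can be sketched roughly as follows. This is Java rather than the post's C# (the concepts carry over directly), and `PerThreadClock`, `touch`, and the fixed writer count are illustrative assumptions, not anything from the post:

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Each writer owns one slot and never touches anyone else's; a read scans
// all slots and reports the maximum, i.e. the most recent timestamp seen.
public class PerThreadClock {
    private final AtomicLongArray slots;

    public PerThreadClock(int writers) {
        slots = new AtomicLongArray(writers);
    }

    // Called only by writer `id`; lazySet publishes the value without a full fence.
    public void touch(int id, long nowMillis) {
        slots.lazySet(id, nowMillis);
    }

    // Read side: the last request time is the maximum over all slots.
    public long lastRequest() {
        long max = 0;
        for (int i = 0; i < slots.length(); i++) {
            max = Math.max(max, slots.get(i));
        }
        return max;
    }
}
```

The read becomes O(number of writers), which is exactly the cost Ayende objects to below when the writer count is 2,500 rather than one per core.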

07/16/2014 09:30 AM by Ayende Rahien

Duarte, That is a good idea about the per-thread counters; the problem is that then I might have 2,500 items, and I'd have to sort / scan through them on read. The loop isn't optimized away; I added a += there, so it has to go through it, and it has roughly the same perf.

07/16/2014 10:31 AM by Peter

You could use Monitor.TryEnter and only update the value if you manage to get the lock, but my guess is that will be even slower than Interlocked.Exchange...

07/16/2014 10:47 AM by Quintonn

What about a publish/subscribe scenario, where each thread publishes, or broadcasts, a message, and a different thread subscribes to that event and updates the "now" long value? That way the threads that broadcast don't have to worry about locks, and the listening thread can manage access to the value.

07/16/2014 11:26 AM by Luaan

What if you didn't update the last access time on every iteration of the cycle? Perhaps you could keep a local counter of operations and for every Xth operation you'd set the (shared) last access variable. And of course, any time there's a wait long enough. Stopwatch might also be useful, though I'd expect it to be a measurable performance hit. As for synchronizing the access, I don't see a problem with it as is, as long as the write is atomic - worst case scenario, you'll write a value that's slightly lagging - which should really be much less than a second even in your case.

07/16/2014 11:32 AM by tobi

Well, this code is of course very fast, because you only synchronize 2,500 times, which is nothing. You are not really testing threading here, not even CPU internals.

You could have one datetime value on an isolated cache line per core (not per thread). There is a Windows API for querying the current core number.

That strategy avoids the cache line ping-pong that a shared mutable cache line causes.
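tobi's per-core idea can be sketched like this (Java for illustration; the padding stride, `PaddedSlots`, and `touch` are assumptions of this sketch — Java code would more typically use `@jdk.internal.vm.annotation.Contended` for the padding, and the Windows API tobi alludes to is the one that reports the current processor number):

```java
// One slot per core, padded out to its own 64-byte cache line, so writes from
// different cores never invalidate each other's lines (no false sharing).
public class PaddedSlots {
    private static final int STRIDE = 8; // 8 longs = 64 bytes, a typical cache line

    private final long[] slots;
    private final int count;

    public PaddedSlots(int cores) {
        this.count = cores;
        this.slots = new long[cores * STRIDE];
    }

    // A writer running on core `core` touches only that core's cache line.
    // Plain (racy) write: last-write-wins is acceptable for this use case.
    public void touch(int core, long nowMillis) {
        slots[core * STRIDE] = nowMillis;
    }

    // Read side scans one slot per core, not one per thread.
    public long lastRequest() {
        long max = 0;
        for (int i = 0; i < count; i++) {
            max = Math.max(max, slots[i * STRIDE]);
        }
        return max;
    }
}
```

Unlike the per-thread variant, the read cost here is bounded by the core count, which stays small even with thousands of request threads.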

07/16/2014 11:33 AM by Rafal

Maybe check the modification date of a database file?

07/16/2014 11:36 AM by Ayende Rahien

Rafal, That would require IO.

07/16/2014 11:47 AM by Fabian Wetzel

        for (int i = 0; i < 500; i++)
        {
            var thread = new Thread(delegate()
            {
                mre.WaitOne();
                long? lastReportedJ = null;
                for (long j = 0; j < 500*1000; j++)
                {
                    if (!lastReportedJ.HasValue || j - lastReportedJ.Value > 10000) //skew of "a second"
                    {
                        holder.Now = j;
                        lastReportedJ = j;
                    }
                }
            });
            thread.Priority = ThreadPriority.BelowNormal;
            thread.Start();
            threads.Add(thread);
        }

This is close to not setting holder.Now at all in terms of the time it takes, but it will be mostly accurate once j is replaced by DateTime.Now: whenever the timespan since the last reported value is large, it will set the last accessed time on the next call.

07/16/2014 11:57 AM by Fabian Wetzel

Okay so, in case you are me, you can simply improve the timing by starting WITHOUT debugging :-/

CTRL+F5

07/16/2014 12:09 PM by OmariO

Make it volatile and it will be fast enough (especially on x86 procs) and correct enough for your scenario.
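OmariO's volatile suggestion, translated into a runnable sketch (Java here; in Java a plain `long` write is explicitly allowed to tear, so `volatile` is what buys both atomicity and visibility — `VolatileHolder` and `hammer` are names invented for this example):

```java
public class VolatileHolder {
    // volatile guarantees the 64-bit write is atomic (never a torn mix of two
    // writes) and promptly visible to other threads; races stay last-write-wins.
    public volatile long now;

    // Hammer the field from several threads. Since every thread's final write
    // is iterations - 1, that is also the value left behind after all joins.
    public static long hammer(int threads, long iterations) {
        VolatileHolder h = new VolatileHolder();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (long j = 0; j < iterations; j++) h.now = j;
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return h.now;
    }
}
```

Note the cost trade-off discussed in the thread still applies: the volatile write prevents torn values but does nothing about the cache line bouncing between cores.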

07/16/2014 01:53 PM by alex

With 2500 concurrent requests (or 4000 as mentioned for the case presented to you), is this timestamp the only "shared data" area of contention between these requests?

If you are already tracking other request metrics (e.g. #requests/time unit), could this information be already present?

07/16/2014 03:59 PM by Dariel Marlow

Perhaps using Rx with a BehaviorSubject with a Buffer of whatever time you want?

http://www.introtorx.com/content/v1.0.10621.0/02_KeyTypes.html#BehaviorSubject

07/16/2014 04:58 PM by Mark Miller

Duarte's mention of per-thread counters reminded me of an article I read on partitioning: http://kejser.org/synchronisation-in-net-part-4-partitioned-data-structures/.

So I don't think he meant scanning a sequence of 2,500 holders, but one per CPU core. This is much more scalable and sounds to me like it would be just what you're looking for.

A nice plus is the article above actually shows benchmarks for a number of synchronization techniques. It's a 4 part article if you're interested in the details of each technique. But if you just want the solution - the link I provided (part 4) is all you really need.

07/16/2014 08:00 PM by Jon Norton

If I understand right, your concern is primarily about the fact that the cores won't be able to properly cache the value and not the total number of writes you're making. I would approach that problem by allowing each thread to write to its own value and having a timer that periodically updates the global value.

Something like this would work.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        var threads = new List<Thread>();

        var holder = new Holder();
        var localHolders = new List<Holder>();

        var mre = new ManualResetEvent(false);

        for (int i = 0; i < 2500; i++)
        {
            var localHolder = new Holder();
            localHolders.Add(localHolder);
            var thread = new Thread(delegate()
            {
                mre.WaitOne();
                for (long j = 0; j < 500*1000; j++)
                {
                    localHolder.Now = j;
                }
            });
            thread.Start();
            threads.Add(thread);
        }

        mre.Set();
        // Periodically roll the per-thread values up into the shared holder.
        var timer = new Timer(_ => { holder.Now = localHolders.Max(lh => lh.Now); },
            null, TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(5));

        threads.ForEach(t => t.Join());

        Console.WriteLine(holder.Now);
    }
}

public class Holder
{
    public long Now;
}

07/16/2014 08:15 PM by Johannes

Please help me out: why is your result always correct, although according to the C# spec a write to a long is not atomic? Couldn't it happen that thread 1 writes the first few bytes of j, then thread 2 writes j completely, and thread 1 writes the second part of j, so that at the end j contains the first part of thread 2's j and the second part of thread 1's j?

07/16/2014 09:47 PM by Ayende Rahien

Johannes, On 64 bits, setting a long is atomic.

07/17/2014 07:47 AM by Sasha Goldshtein

I don't get the part "but it is just not sitting well with me". You know that a write of a 64-bit value (on Windows 64-bit) is atomic. I think you also know that you don't need locking or volatile variables if you just want the last write to win.

What is it that you want, then? Do you want people to tell you that there is a slower way of doing the same thing? :-)

07/17/2014 10:37 AM by Ayende Rahien

Sasha, I want to make sure that there aren't any hidden traps for this. For example, on 32 bits, this would cause issues because of non-atomic writes. I want to make sure that it isn't possible that, on a multi core system, the "winning" value is a very old one.

07/17/2014 12:24 PM by Joseph N. Musser II

You could write a boxed long. The object reference would be the words size of the processor and guaranteed to be atomic on any platform.

07/17/2014 12:25 PM by Joseph Musser

*word size. Ability to edit has spoiled me.

07/17/2014 12:26 PM by Joseph N. Musser II

Afaik, if you need more than 32-bits, it's either indirection to the heap or Interlocked.Write. I wonder which outperforms?
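Joseph's boxed-value idea in sketch form (Java; the mechanism is the same as in .NET — writing an object reference is word-sized and atomic on every platform, including 32-bit, so publishing an immutable box sidesteps torn 64-bit writes; `BoxedClock` and its methods are invented for this example):

```java
// Publish the timestamp by swapping a reference to an immutable box.
// The reference write itself is atomic even where a raw long write is not.
public class BoxedClock {
    private volatile Long now = 0L; // volatile adds visibility on top of atomicity

    public void touch(long millis) {
        now = millis; // autoboxing allocates a new Long, then publishes the reference
    }

    public long lastRequest() {
        return now;
    }
}
```

The indirection trades the torn-write risk for one heap allocation per update, which is the performance question Joseph raises.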

07/17/2014 02:31 PM by Chris Marisic

From all that's been said, precision is not important.

I would seek to eliminate writes.

now = DateTime.UtcNow; LastWrite = LastWrite ?? now;

if (Math.Abs((LastWrite - now).Ticks) > delta) LastWrite = now;

Then you have some contention, but it's much briefer: basically only the exactly concurrent accesses across cores.

Of course, is Math.Abs more costly than something as low level as Interlocked? I have no idea.

07/17/2014 04:20 PM by Matt Nelson

Why not use something like a ConcurrentQueue (System.Collections.Concurrent) to buffer all the last accessed times, and have a separate thread periodically grab the last value from the queue, clear it, and write just that one value?
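Matt's buffer-and-drain idea as a sketch (Java's ConcurrentLinkedQueue standing in for .NET's ConcurrentQueue; `QueuedClock`, `touch`, and `drain` are names made up for this example):

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Writers enqueue timestamps without locking; a single background reader
// periodically drains the queue and keeps only the newest value, so the
// shared field is written once per drain instead of once per request.
public class QueuedClock {
    private final ConcurrentLinkedQueue<Long> pending = new ConcurrentLinkedQueue<>();
    private volatile long lastRequest;

    public void touch(long nowMillis) {
        pending.add(nowMillis);
    }

    // Intended to run from a timer / background thread.
    public void drain() {
        Long last = null;
        for (Long t; (t = pending.poll()) != null; ) {
            last = t; // FIFO order, so the last polled value is the newest
        }
        if (last != null) lastRequest = last;
    }

    public long lastRequest() { return lastRequest; }
}
```

Note the trade-off Chris points out in the next comment: every request now pays for a boxed allocation and a queue node, which is likely more expensive than the raw racy write it replaces.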

07/22/2014 07:20 PM by Chris Marisic

@Matt Nelson that sounds like an awesome idea, unfortunately from what I know about the Concurrent classes in .NET they are SLOWWWW. But whether that matters for this? I have no idea.

I can't really fathom micro-performance concerns like how Ayende determined Interlocked methods aren't acceptable.

07/26/2014 07:12 AM by Sergey

The main idea in my implementation is to use an int for storing the time diff, so it should work on x86 machines.

using System;
using System.Collections.Generic;
using System.Threading;

internal class Program
{
    private static void Main(string[] args)
    {
        var threads = new List<Thread>();

        var holder = new Holder();

        var mre = new ManualResetEvent(false);

        for (int i = 0; i < 2500; i++)
        {
            var thread = new Thread(delegate()
            {
                mre.WaitOne();
                for (long j = 0; j < 500*1000; j++)
                {
                    holder.Update();
                }
            });
            thread.Start();
            threads.Add(thread);
        }

        mre.Set();

        threads.ForEach(t => t.Join());


        Console.WriteLine(holder.Now);
    }
}

public class Holder
{
    //We don't have to initialize this value with the correct date of the app start;
    //we can use a specific date in the past, say 2010-01-01.
    public DateTime StartTime;

    public long StartTicks;

    public DateTime Now
    {
        get { return StartTime.Add(TimeSpan.FromTicks(_diffInSeconds * TimeSpan.TicksPerSecond)); }
    }

    private int _diffInSeconds;

    public Holder()
    {
        var now = DateTime.UtcNow;
        StartTicks = now.Ticks;
        StartTime = now;
    }

    public void Update()
    {
        long diff = DateTime.UtcNow.Ticks - StartTicks;
        //_diffInSeconds can store up to 68 years; we could also change it to uint
        _diffInSeconds = (int)(diff/TimeSpan.TicksPerSecond);
    }
}

07/31/2014 12:43 PM by Greg Young

You are optimizing a memory operation in the middle of an io bound operation. As you point out you may even have 4000 of these per second. Are you suggesting an Interlocked.Exchange is too slow to be called 4000 times/second? How many times/second can you call it?

08/01/2014 05:35 AM by Ayende Rahien

Greg, This isn't so much about this single effect slowing things down; it is the desire to avoid this plus other stuff. Without contention, you can call it 73 million times a second on my machine; with contention, 36 million times a second.

But I don't need that, and mostly I cared about just seeing that my race condition solution doesn't have dangerous side effects.