Concurrent max value
For a feature in RavenDB, I need to figure out the maximum number of outputs per document an index has. Now, indexing runs in multiple threads at the same time, so we ended up with the following code:
var actualIndexOutput = maxActualIndexOutput;
if (actualIndexOutput > numberOfAlreadyProducedOutputs)
{
    // okay, now let's verify that this is indeed the case, in a thread safe manner.
    // This way, most of the time we don't need volatile reads, and other sync operations
    // in the code ensure we don't have too stale a view on the data (besides, a stale view
    // has to mean a smaller number, which we then verify).
    actualIndexOutput = Thread.VolatileRead(ref maxActualIndexOutput);
    while (actualIndexOutput > numberOfAlreadyProducedOutputs)
    {
        // if it changed, we don't care, it is just another max, and another thread probably
        // set it for us, so we only retry if this is still smaller
        actualIndexOutput = Interlocked.CompareExchange(
            ref maxActualIndexOutput,
            numberOfAlreadyProducedOutputs,
            actualIndexOutput);
    }
}
The basic idea is that this code path is hit a lot, once per document indexed, per index. It also needs to be thread safe, so we first do an unsafe check, and only fall back to the thread safe operations when it looks like an update is needed.
The idea is that we'll quickly arrive at the actual max number of index outputs, and most of the time we don't have to pay the price of volatile reads or other thread synchronization.
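To make the pattern easier to follow outside of its RavenDB context, here is a minimal, self-contained sketch of a concurrent-max helper built on the same idea: a cheap, possibly stale read first, and a volatile read plus CompareExchange loop only when an update looks necessary. The helper name InterlockedMax and its parameter names are mine, and it is written for the plain "raise a value to a new maximum" case, so treat it as an illustration of the technique rather than the actual indexing code.

using System.Threading;

static class ConcurrentMax
{
    // Raise 'location' to at least 'candidate', racing safely with other threads.
    public static void InterlockedMax(ref int location, int candidate)
    {
        // Fast path: a plain read. For a value that only ever goes up, a stale
        // read can only be smaller than the real value, so if even the stale
        // value is already >= candidate we can return without any synchronization.
        if (location >= candidate)
            return;

        // Slow path: re-read with proper visibility, then CAS until either we
        // install the candidate or another thread has published a value that
        // is already large enough.
        var current = Volatile.Read(ref location);
        while (current < candidate)
        {
            var previous = Interlocked.CompareExchange(ref location, candidate, current);
            if (previous == current)
                return; // we won the race
            current = previous; // lost the race; retry against the fresh value
        }
    }
}

The fast path is only safe because the value is monotonic: if the stale value is already large enough, the real value must be too, so skipping the update is never wrong.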
Comments
What if the first assignment reads a cached value of maxActualIndexOutput and then your first condition fails?
Volatile loads and stores are essentially free on x86, except that they force an actual memory access instruction. But you want that anyway here. As written, the code does not guarantee that a memory access will happen in the first if, which seems to be a bug. Not sure.
Can you benchmark the perf impact of this optimization? I think it's negative.
Yes, we want to see perf numbers :)
Jahmai, If it reads a cached value, it has to be smaller (the value can only go up). In that case, it might pass the if statement, so we fall through to the actual thread safe check, which reads the real value.
I had to try it with 16 threads running 10,000,000 iterations. Time to run without the unsafe checks: 00:00:00.7018961
Time to run with the unsafe checks: 00:00:00.5750781
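For anyone who wants to reproduce numbers in this ballpark, here is a rough sketch of the kind of harness that could produce them: 16 threads, 10,000,000 iterations each, run once with only the thread safe path and once with the cheap check in front. The class, method, and variable names are mine, and the workload (mostly small values) is an assumption based on the "90% of the time it is always 1" remark below, so the absolute timings will differ from the ones quoted here.

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

static class MaxBenchmark
{
    static int max;

    // Always thread safe: a volatile read followed by a CAS loop.
    static void UpdateSafeOnly(int candidate)
    {
        var current = Volatile.Read(ref max);
        while (current < candidate)
        {
            var previous = Interlocked.CompareExchange(ref max, candidate, current);
            if (previous == current)
                return;
            current = previous;
        }
    }

    // Cheap check first: a plain read that may be stale (and therefore smaller),
    // which at worst just sends us into the safe path anyway.
    static void UpdateWithCheapCheck(int candidate)
    {
        if (max >= candidate)
            return;
        UpdateSafeOnly(candidate);
    }

    static TimeSpan Run(Action<int> update, int threads, int iterationsPerThread)
    {
        max = 0;
        var sw = Stopwatch.StartNew();
        var tasks = new Task[threads];
        for (var t = 0; t < threads; t++)
        {
            tasks[t] = Task.Run(() =>
            {
                for (var i = 0; i < iterationsPerThread; i++)
                    update(i % 100 == 0 ? 2 : 1); // mostly 1, occasionally something larger
            });
        }
        Task.WaitAll(tasks);
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        Console.WriteLine("safe path only:   " + Run(UpdateSafeOnly, 16, 10_000_000));
        Console.WriteLine("with cheap check: " + Run(UpdateWithCheapCheck, 16, 10_000_000));
    }
}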
Well, I guess there is some aspect to the problem that I do not understand. This should not work but it does :)
Actually, I was considering the scenario where the value has changed but the read doesn't get the new value, so the if statement fails because the stale value looks equal and doesn't satisfy the > condition. This is the likely scenario when not using a volatile read. All in all, I think that possibility makes it worth just doing the volatile read every time. 125ms over 10m iterations is a micro optimization that doesn't seem worth that possibility.
Jahmai, Consider that this is called per indexed document for each index. If you are indexing a lot of documents, this adds up. Not a lot, but good perf is composed of a lot of optimizations like that.
Perhaps I'm missing the significance of this number, but, if some occasional failures of the initial condition are acceptable, and you are convinced that the invalidation of the cpu caching for that variable happens often enough such that it doesn't matter, why bother with the volatile read at all?
Jahmai, If you read the cached value, you'll read a smaller value, so you'll match the condition, then fall into the thread safe check, which gives the correct value. The whole point is that most of the time, you'll quickly reach the actual max value, which doesn't change (90% of the time, it is always 1), so you don't need to go through the expensive thread safe path, you just use the cached value.
I understand the premise, but I am not convinced from this code snippet that it's a well reasoned application of the pattern. If you read a smaller value, that condition fails, so I'm not sure what you mean there. I need to read the whole code to understand it better.