Concurrent max value
For a feature in RavenDB, I need to figure out the maximum number of outputs per document an index has. Now, indexing runs in multiple threads at the same time, so we ended up with the following code:
var actualIndexOutput = maxActualIndexOutput;
if (actualIndexOutput > numberOfAlreadyProducedOutputs)
{
    // okay, now let's verify that this is indeed the case, in a thread safe manner.
    // This way, most of the time we don't need volatile reads, and other sync operations
    // in the code ensure we don't have too stale a view on the data (besides, a stale view
    // has to mean a smaller number, which we then verify).
    actualIndexOutput = Thread.VolatileRead(ref maxActualIndexOutput);
    while (actualIndexOutput > numberOfAlreadyProducedOutputs)
    {
        // if it changed, we don't care, it is just another max, and another thread probably
        // set it for us, so we only retry if this is still smaller
        actualIndexOutput = Interlocked.CompareExchange(
            ref maxActualIndexOutput,
            numberOfAlreadyProducedOutputs,
            actualIndexOutput);
    }
}
The basic idea is that this code path is hit a lot, once per document indexed, per index. It also needs to be thread safe, so we first do an unsafe check, and only fall back to the thread safe operations when it looks like an update is needed.
The idea is that we'll quickly arrive at the actual max number of index outputs, and most of the time we don't have to pay the price of volatile reads or other thread synchronization.
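To make the pattern easier to follow outside of its RavenDB context, here is a minimal, self-contained sketch of a concurrent-max helper built on the same idea: a cheap, possibly stale read first, and a volatile read plus CompareExchange loop only when an update looks necessary. The helper name InterlockedMax and its parameter names are mine, and it is written for the plain "raise a value to a new maximum" case, so treat it as an illustration of the technique rather than the actual indexing code.

using System.Threading;

static class ConcurrentMax
{
    // Raise 'location' to at least 'candidate', racing safely with other threads.
    public static void InterlockedMax(ref int location, int candidate)
    {
        // Fast path: a plain read. For a value that only ever goes up, a stale
        // read can only be smaller than the real value, so if even the stale
        // value is already >= candidate we can return without any synchronization.
        if (location >= candidate)
            return;

        // Slow path: re-read with proper visibility, then CAS until either we
        // install the candidate or another thread has published a value that
        // is already large enough.
        var current = Volatile.Read(ref location);
        while (current < candidate)
        {
            var previous = Interlocked.CompareExchange(ref location, candidate, current);
            if (previous == current)
                return; // we won the race
            current = previous; // lost the race; retry against the fresh value
        }
    }
}

The fast path is only safe because the value is monotonic: if the stale value is already large enough, the real value must be too, so skipping the update is never wrong.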
Comments
What if the first assignment reads a cached value of maxActualIndexOutput and then your first condition fails?
Volatile loads and stores are essentially free on x86, except that they force an actual memory access instruction. But you want that anyway here. As written, the code does not guarantee that a memory access will happen in the first if, which seems to be a bug. Not sure.
Can you benchmark the perf impact of this optimization? I think it's negative.
Yes, we want to see perf numbers :)
Jahmai, If it reads a cached value, it has to be smaller (the value can only go up). In that case, it might pass the if statement, so we fall through to the actual thread safe check, which reads the real value.
I had to try it with 16 threads running 10,000,000 iterations. Time to run without the unsafe checks: 00:00:00.7018961
Time to run with the unsafe checks: 00:00:00.5750781
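For anyone who wants to reproduce numbers in this ballpark, here is a rough sketch of the kind of harness that could produce them: 16 threads, 10,000,000 iterations each, run once with only the thread safe path and once with the cheap check in front. The class, method, and variable names are mine, and the workload (mostly small values) is an assumption based on the "90% of the time it is always 1" remark below, so the absolute timings will differ from the ones quoted here.

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

static class MaxBenchmark
{
    static int max;

    // Always thread safe: a volatile read followed by a CAS loop.
    static void UpdateSafeOnly(int candidate)
    {
        var current = Volatile.Read(ref max);
        while (current < candidate)
        {
            var previous = Interlocked.CompareExchange(ref max, candidate, current);
            if (previous == current)
                return;
            current = previous;
        }
    }

    // Cheap check first: a plain read that may be stale (and therefore smaller),
    // which at worst just sends us into the safe path anyway.
    static void UpdateWithCheapCheck(int candidate)
    {
        if (max >= candidate)
            return;
        UpdateSafeOnly(candidate);
    }

    static TimeSpan Run(Action<int> update, int threads, int iterationsPerThread)
    {
        max = 0;
        var sw = Stopwatch.StartNew();
        var tasks = new Task[threads];
        for (var t = 0; t < threads; t++)
        {
            tasks[t] = Task.Run(() =>
            {
                for (var i = 0; i < iterationsPerThread; i++)
                    update(i % 100 == 0 ? 2 : 1); // mostly 1, occasionally something larger
            });
        }
        Task.WaitAll(tasks);
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        Console.WriteLine("safe path only:   " + Run(UpdateSafeOnly, 16, 10_000_000));
        Console.WriteLine("with cheap check: " + Run(UpdateWithCheapCheck, 16, 10_000_000));
    }
}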
Well, I guess there is some aspect to the problem that I do not understand. This should not work but it does :)
Actually, I was considering the scenario where the value has changed but the read doesn't get the new value, so the if statement fails because the stale value looks equal and doesn't satisfy the > condition. This is the likely scenario when not using a volatile read. All in all, I think that possibility makes it worth just doing the volatile read every time. 125ms over 10m iterations is a micro optimization that doesn't seem worth that possibility.
Jahmai, Consider that this is called per indexed document for each index. If you are indexing a lot of documents, this adds up. Not a lot, but good perf is composed of a lot of optimizations like that.
Perhaps I'm missing the significance of this number, but, if some occasional failures of the initial condition are acceptable, and you are convinced that the invalidation of the cpu caching for that variable happens often enough such that it doesn't matter, why bother with the volatile read at all?
Jahmai, If you read the cached value, you'll read a smaller value, so you'll match the condition, then fall into the thread safe check, which gives the correct value. The whole point is that most of the time, you'll quickly reach the actual max value, which doesn't change (90% of the time, it is always 1), so you don't need to go through the expensive thread safe path, you just use the cached value.
I understand the premise, but I am not convinced from this code snippet that it's a well reasoned application of the pattern. If you read a smaller value, that condition fails, so I'm not sure what you mean there. I need to read the whole code to understand it better.