Ayende @ Rahien

It's a girl

Concurrent Max

Can you think of a better way to implement this code?

private volatile Guid lastEtag;
private readonly object lastEtagLocker = new object();
internal void UpdateLastWrittenEtag(Guid? etag)
{
    if (etag == null)
        return;

    var newEtag = etag.Value.ToByteArray();

    // not the most recent etag
    if (Buffers.Compare(lastEtag.ToByteArray(), newEtag) <= 0)
    {
        return;
    }

    lock (lastEtagLocker)
    {
        // not the most recent etag
        if (Buffers.Compare(lastEtag.ToByteArray(), newEtag) <= 0)
        {
            return;
        }

        lastEtag = etag.Value;
    }
}

We have multiple threads calling this function, and we need to ensure that lastEtag value is always the maximum value. This has the potential to be called often, so I want to make sure that I chose the best way to do this. Ideas?

Comments

Will
06/12/2011 09:35 AM by
Will

You could store lastEtag as a Byte[] rather than a Guid which would save some conversions.

Ayende Rahien
06/12/2011 09:40 AM by
Ayende Rahien

I could, yes. We are actually using the etag far more often as a Guid rather than a byte array elsewhere.

Ken Egozi
06/12/2011 09:59 AM by
Ken Egozi

Depending on the read/write ratio. Off of your last paragraph I might guess that it is quite balanced, maybe even rending toward more writes than reads, then you might move the burden to the read side. i.e. - writes will add to a small (non-locking concurrent) collection of "potentially last etag" values, and reads will get the latest, and clear the collection.

You might even take advantage of a non-locking concurrent sorted-list (is there anything like that available?) , simply adding to the list on write, and getting the last on read. I'm not sure it would be very efficient though.

tobi
06/12/2011 11:50 AM by
tobi

Change the type of lastEtag to Box (reftype). Then use interlocked operations to atomically swap out the value in a retry loop:

while(true) { Box lastEtagLocal = lastEtag; if(newer) { var newEtagBox = ...; if(Interlocked.CmpXchg(ref lastEtag, newEtagBox, lastEtagLocal) == lastEtagLocal) break; } }

pseudo-code.

PS: The preview has wrong code style. PSS: "An error occurred while___ posing___ your comment"

oskark
06/12/2011 11:51 AM by
oskark

I concur with the notion of optimizing for dominant read or writes . Otherwise the SpinWait class introduced in .Net 4 is helpful for a lock free update of a field:

nternal void UpdateLastWrittenEtag (Guid? etag)

{

if(etag == null) return;

var spinWait = new SpinWait();

while (true)

{

var snapshot1 = lastEtag;

var snapshot2 = Interlocked.CompareExchange (ref lastEtag, etag, snapshot1);

if (snapshot1 == snapshot2) return;

spinWait.SpinOnce();

}

}

tobi
06/12/2011 11:52 AM by
tobi

I meant Box[Guid]. Why are you stripping html tags? That is a bad practice. Encoding is simpler, more reliable and more user friendly.

I recommend you use this lib for BBCode in comments: http://bbcode.codeplex.com/ It is very robust, easy to use and fast.

Duarte Nunes
06/12/2011 12:00 PM by
Duarte Nunes

You can use a CAS scheme like the following; however, it generates quiet a lot of garbage if writes are frequent, which may make the lock approach more adequate.

internal class Holder<T> 
{  
    internal T Value;
}

private volatile Holder<Guid> lastEtag;

internal void UpdateLastWrittenEtag(Guid? etag)
{
    if (etag == null || etag.HasValue == false)
        return;

    var newEtag = etag.Value.ToByteArray();
    Holder<Guid> newHolder;
    SpinWait sw = new SpinWait();

    do 
    {
         var last = lastEtag;

         if (Buffers.Compare(last.Value.ToByteArray(), newEtag) <= 0)
         return;

         if (newHolder == null) 
         newHolder = new Holder<Guid> { Value = etag.Value };

         if (Interlocked.CompareExchange(ref lastEtag, newHolder, last) == last) 
         return;

         sw.SpinOnce();         
    } while (true);
}
tobi
06/12/2011 12:10 PM by
tobi

I believe I found a bug in Racoon Blogs so called "whitelisting" of html (it is really blacklisting because you are deleting disallowed stuff instead of adding allowed stuff): You can get around the img-regex by typing "[[img src="javascript:..."]] with [ replaced by left angle bracket. I did not test this.

God do I hate the practice of blacklisting. There is always something missing. It almost never works.

EvereQ
06/12/2011 12:20 PM by
EvereQ

The only thing I see that may significantly improve performance is related to possible use of the "unsafe" keyword and to do comparison inline (instead of calling Buffers.Compare and ToByteArray methods that might be too expensive overhead) using pointers arithmetic directly in your method. Also if you know some additional info about your GUIDs, like for example you know that they are always generated on same PC you may optimize algorithm more, when do compare first for most significant GUID part that changed over time in your specific environment (i.e. question here where and how GUIDs are generated?) etc.

Simon Bartlett
06/12/2011 12:46 PM by
Simon Bartlett

Ayende,

Why not use a jQuery markdown editor, then people might actually use markdown: http://markitup.jaysalvat.com/home/

Also, the current editor is a headache to use on some devices; I'm writing this comment on an iPhone and I think your preview script is making it extremely slow to type.

Simon Bartlett
06/12/2011 12:48 PM by
Simon Bartlett

Sorry about the double post, maybe I should refrain from using my phone to post comments!

Daniel Grunwald
06/12/2011 12:49 PM by
Daniel Grunwald

You will have to use a reference type field (either Holder or just byte[]) - the current code is not thread-safe!

"volatile" just ensures the threads see the latest value; it doesn't ensure the access is atomic. Thus, is the current lastEtag is 00000000-0000-0000-FFFF-FFFFFFFFFFFF, and one thread is updating that to 00000000-0000-0001-0000-000000000000, then another thread might read 00000000-0000-0001-FFFF-FFFFFFFFFFFF, a value much larger than the maximum. Now if that second thread was trying to set the value to 00000000-0000-0001-0000-000000000001, it won't do that due to the incorrect read, and your maximum will be incorrect.

Ayende Rahien
06/12/2011 01:04 PM by
Ayende Rahien

Tobi, You need to solve the "only replace if newer" problem, though.

tobi
06/12/2011 01:21 PM by
tobi

Ayende, I did. You can exit the loop if your guid is not newer than the existing. In that case you just return.

tobi
06/12/2011 01:23 PM by
tobi

Basically, the trick is that you can atomically swap any amount of data by stuffing it into a ref type. Then swap the reference instead all of the data.

zvolkov
06/12/2011 07:03 PM by
zvolkov

Why do you need to compare? Can you ever get a less recent eTag? Just always overwrite.

petar repac
06/12/2011 08:31 PM by
petar repac

Ayende, just checked and seems that Guid.ToByteArray is not thread safe so you could get some weird array of bytes

Kevin Pullin
06/12/2011 09:55 PM by
Kevin Pullin

@petar - Can you describe why Guid.ToByteArray is not thread safe?

Guid appears to be an immutable struct, so unless you're doing some whacky reflection to change the private values, I don't see how this can produce invalid results.

tobi
06/12/2011 10:01 PM by
tobi

Kevin, you can just overwrite a struct variable which will gradually overwrite the fields. The memory locations are not immutable.

petar repac
06/12/2011 10:10 PM by
petar repac

@Kevin look at http://msdn.microsoft.com/en-us/library/system.guid.aspx it says that instance members are not thread safe

Kevin Pullin
06/13/2011 03:42 AM by
Kevin Pullin

Thanks tobi & petar. I was able to repro the issue, confirming the comment from Daniel.

Ayende Rahien
06/13/2011 04:38 AM by
Ayende Rahien

Zvolkov, What happen if two threads tries to update at the same time?

Ayende Rahien
06/13/2011 04:40 AM by
Ayende Rahien

Petar, There is no way ToByteArray isn't safe. Put simply, Guid is an immutable data structure, since ToByeArray is working based on immutable data structure, it is safe to use in multiple threads.

petar repac
06/13/2011 05:02 AM by
petar repac

@Ayende, jup - you are right. I should not rely on MS documentation so much, I guess.

tobi
06/13/2011 11:01 AM by
tobi

Ayende, ToByteArray is not safe if invoked on a variable that is concurrently mutated. Struct variables can be overwritten which akes the memory locations of their readonly fields change.

Ayende Rahien
06/13/2011 11:05 AM by
Ayende Rahien

Tobi, Good point, but not a scenario that is going to happen the way I am doing this.

tobi
06/13/2011 12:09 PM by
tobi

Ayende, you are reading and writing directly from/to the field. I believe this is going to happen. The problem goes away when you implement the interlocked.cmpxchg loop that was proposed above because the ref type field will be accesses atomically.

Ayende Rahien
06/13/2011 12:14 PM by
Ayende Rahien

Tobi, The code as posted cannot even compile (you can't make volatile Guid). I am using a reference to this via a pointer, that is atomic

Alois Kraus
06/13/2011 08:48 PM by
Alois Kraus

How is your Buffers.Compare implemented? Perhaps you can calulate a value from it which fits into an integer which would make the comparison cheaper. Another option is to make lastEtag threadlocal and check against your threadlocal value. If the new value is bigger only then you need to take a lock and update the global value and your thread local value. This way your threads operate most of the time on their own data which is much more data cache friendly. If you try hard to avoid CPU cache misses (code and data) you can get speedups up to a factor 20 if your tight loop fits with its data entirely into the L1 (32 KB on modern CPUs) and L2 (4-8 MB) cache.

Yours, Alois Kraus

Mark Rendle
06/13/2011 11:44 PM by
Mark Rendle

private volatile BigInteger lastEtag;

internal void UpdateLastWrittenEtag(Guid? etagGuid) { if (etagGuid == null) return; var etag = new BigInteger(etagGuid.ToByteArray()); if (etag < lastEtag) return;

lock (_sync)
{
    lastEtag = BigInteger.Max(etag, lastEtag);
}

}

Scooletz
06/14/2011 08:33 AM by
Scooletz

How about storing additionally hash of a Guid. Getting Hash from newEtag should be faster than creating an array, and, if the Guid is well-hashable not creating so much collisions. If check would go to firstly check the hash, next guid bytes, next try replace if needed.

Comments have been closed on this topic.