Ayende @ Rahien


Trivial Lru Cache impl

It has been a while since I actually posted some code here, and I thought that this implementation was quite nice, in that it is simple and works for what it needs to do.

 

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;

public class LruCache<TKey, TValue>
{
    private readonly int _capacity;
    private readonly Stopwatch _stopwatch = Stopwatch.StartNew();

    private class Node
    {
        public TValue Value;
        public volatile Reference<long> Ticks;
    }

    private readonly ConcurrentDictionary<TKey, Node> _nodes = new ConcurrentDictionary<TKey, Node>();

    public LruCache(int capacity)
    {
        Debug.Assert(capacity > 10);
        _capacity = capacity;
    }

    public void Set(TKey key, TValue value)
    {
        var node = new Node
        {
            Value = value,
            Ticks = new Reference<long> { Value = _stopwatch.ElapsedTicks }
        };

        _nodes.AddOrUpdate(key, node, (_, __) => node);
        if (_nodes.Count > _capacity)
        {
            // over capacity: drop the least recently used 10% in one pass
            foreach (var source in _nodes.OrderBy(x => x.Value.Ticks.Value).Take(_nodes.Count / 10))
            {
                Node _;
                _nodes.TryRemove(source.Key, out _);
            }
        }
    }

    public bool TryGet(TKey key, out TValue value)
    {
        Node node;
        if (_nodes.TryGetValue(key, out node))
        {
            // publish a fresh Reference<long> rather than mutating the old one
            node.Ticks = new Reference<long> { Value = _stopwatch.ElapsedTicks };
            value = node.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }
}
```
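For illustration, a hypothetical usage sketch of the cache above (it compiles only alongside the `Reference<T>` helper Ayende shows in the comments, and the capacity must exceed 10 because of the `Debug.Assert` in the constructor):

```csharp
// Hypothetical usage of the LruCache<TKey, TValue> above.
var cache = new LruCache<string, int>(capacity: 100);
cache.Set("answer", 42);

int value;
if (cache.TryGet("answer", out value))
    Console.WriteLine(value); // 42, and the entry's tick value was refreshed

if (!cache.TryGet("missing", out value))
    Console.WriteLine(value); // 0, i.e. default(TValue), on a miss
```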

Comments

06/19/2013 09:53 AM by Patrick Huizinga

_nodes.AddOrUpdate(key, node, (_, __) => node); What's wrong with _nodes[key] = node?

.Take(_nodes.Count / 10) I think that should be .Take(_nodes.Count - _capacity)

Personally I would have made Reference<T> immutable, or used Interlocked to update the long.

06/19/2013 11:04 AM by cao

I don't think a long-running Stopwatch is good practice. You might run into issues related to processor clock speeds.

06/19/2013 01:24 PM by Igor Kalders

@cao: how would that be an issue in this case, where its value is only used relatively to the others? (genuinely interested)

06/19/2013 02:33 PM by João Bragança

@Patrick,

I think it's because once you hit capacity, you don't want to trim the cache on every insert after that. Reducing by 10% makes sense here.

06/19/2013 02:56 PM by Patrick Huizinga

@Igor Kalders,

The problem with Stopwatch is that its tick count is processor-core dependent. If you read the same stopwatch on different cores, you can get different results.

var first = stopwatch.TickCount; SwitchCore(); Assert(stopwatch.TickCount > first); could fail

I don't know whether the difference keeps building up, though, to the point that two cores get so far out of sync that this cache noticeably starts getting it wrong.

Probably the best thing is to replace readonly Stopwatch _stopwatch = Stopwatch.StartNew(); with int _version; and replace each _stopwatch.ElapsedTicks with Interlocked.Increment(ref _version). That way you can also make Node.Ticks a volatile int.
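A sketch of the variant Patrick describes (the class and field names here are mine, and it uses a long counter with Volatile reads instead of a volatile int, to avoid wrap-around; this is an illustration, not the posted code). A monotonically increasing counter gives an access ordering that does not depend on which core reads the clock:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

public class VersionedLruCache<TKey, TValue>
{
    private readonly int _capacity;
    private long _version; // monotonic access counter replacing the Stopwatch

    private class Node
    {
        public TValue Value;
        public long Version; // written via Interlocked/Volatile below
    }

    private readonly ConcurrentDictionary<TKey, Node> _nodes =
        new ConcurrentDictionary<TKey, Node>();

    public VersionedLruCache(int capacity)
    {
        _capacity = capacity;
    }

    public void Set(TKey key, TValue value)
    {
        var node = new Node { Value = value, Version = Interlocked.Increment(ref _version) };
        _nodes[key] = node;
        if (_nodes.Count > _capacity)
        {
            // as in the original: drop the least recently used 10%
            foreach (var oldest in _nodes
                .OrderBy(x => Volatile.Read(ref x.Value.Version))
                .Take(_nodes.Count / 10))
            {
                Node _;
                _nodes.TryRemove(oldest.Key, out _);
            }
        }
    }

    public bool TryGet(TKey key, out TValue value)
    {
        Node node;
        if (_nodes.TryGetValue(key, out node))
        {
            Volatile.Write(ref node.Version, Interlocked.Increment(ref _version));
            value = node.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }
}
```

The trade-off Ayende raises in his reply below still applies: Interlocked forces cross-core memory synchronization on every access, which the stopwatch version deliberately avoids.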

06/19/2013 02:58 PM by Patrick Huizinga

typo, it should've been Assert(stopwatch.TickCount >= first); could fail (missed the =)

06/19/2013 03:06 PM by Patrick Huizinga

links about the caveats of Stopwatch:

http://kristofverbiest.blogspot.co.uk/2008/10/beware-of-stopwatch.html
http://www.virtualdub.org/blog/pivot/entry.php?id=106

06/19/2013 03:12 PM by Doug

What is Reference? I haven't seen it before...

06/19/2013 03:13 PM by Doug

Edit: What is Reference<T>?

06/19/2013 03:22 PM by Karhgath

As cao said, I don't like the long-running stopwatch and don't think that would be good practice, as Patrick Huizinga explained.

@Patrick Huizinga The problem with simply incrementing a count is that the results depend on real-life access patterns in the worst case. If you have items used in bursts, they will stick in the cache for a long time even though they are not accessed again for a long time (think of a heavy, once-a-day scheduled job).

Usually for an LRU-like cache, the best average performance I get across best/worst-case scenarios (for a simple implementation) comes from pairing a timestamp with a count. The timestamp is immutable and the count is volatile, incremented on access with Interlocked. I then order by (Count * weightBasedOnElapsedTimeSinceTimeStamp). This allows long-cached items to eventually be released and recreated after a couple of minutes/hours/days/weeks if they are only used in bursts. It works great when you have scheduled tasks that use a cached resource extensively once a day, preventing them from hogging the cache all the time.
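The eviction key Karhgath describes can be sketched as a score function (the decay constant and names here are illustrative assumptions, not from the comment):

```csharp
using System;

// Score = access count weighted down by the entry's age.
// Entries with the LOWEST scores are evicted first, so a burst of hits
// keeps an entry alive while fresh but cannot protect it forever.
static double EvictionScore(int accessCount, TimeSpan age)
{
    double weight = 1.0 / (1.0 + age.TotalHours); // older entries weigh less
    return accessCount * weight;
}

// A burst of 1000 hits scores high while fresh...
Console.WriteLine(EvictionScore(1000, TimeSpan.Zero));          // 1000
// ...but a day later the same burst scores near a recent, lightly used entry:
Console.WriteLine(EvictionScore(1000, TimeSpan.FromHours(24))); // 40
Console.WriteLine(EvictionScore(50, TimeSpan.FromHours(1)));    // 25
```

Ordering the cache's entries ascending by such a score, and dropping the bottom 10%, slots directly into the eviction loop of the posted code.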

Sometimes I also add a DelayRecycle/DoNotRecycle flag if the recreation is very costly, but often recreating a costly resource once a day is not that bad.

I believe that would be similar to what Ayende did for the administrator cache in Raven, but my implementation is usually more akin to the above.

06/19/2013 07:05 PM by Ryan Heath

@Doug Probably because volatile is not allowed on a long, but is allowed on a reference type, hence the Reference<T>.

Recently I saw an implementation that used a linked list to store when an item was accessed. On access, the item was moved to the head. On write, the N items at the tail were removed. I liked that no count variable or sorting of all items was needed. Of course, that comes at the expense of the linked-list overhead.
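The linked-list variant Ryan describes might look like this (a sketch under my own naming; note it is not thread-safe, unlike the ConcurrentDictionary version in the post, so concurrent use would need a lock):

```csharp
using System;
using System.Collections.Generic;

public class LinkedLruCache<TKey, TValue>
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    public LinkedLruCache(int capacity)
    {
        _capacity = capacity;
    }

    public void Set(TKey key, TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (_map.TryGetValue(key, out node))
            _order.Remove(node); // replacing an existing entry
        _map[key] = _order.AddFirst(new KeyValuePair<TKey, TValue>(key, value));

        // evict from the tail: no counter, no sorting
        while (_map.Count > _capacity)
        {
            var last = _order.Last;
            _order.RemoveLast();
            _map.Remove(last.Value.Key);
        }
    }

    public bool TryGet(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (_map.TryGetValue(key, out node))
        {
            _order.Remove(node);  // move to head on access
            _order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }
}
```

Both the move-to-head and the tail eviction are O(1), which is the appeal over the sort-on-eviction approach in the post.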

// Ryan

06/19/2013 07:18 PM by Ayende Rahien

Patrick, I am never certain what the indexer setter will do, and this is more obvious. And the reason we do _nodes.Count / 10 is to remove the bottom 10%, so we don't constantly add and remove stuff from the cache.

06/19/2013 07:23 PM by Ayende Rahien

cao, We don't actually care what the values are; we don't even care if they are accurate. Sort-of accurate should be good enough.

06/19/2013 07:25 PM by Ayende Rahien

Patrick, The problem with your approach is that Interlocked requires all processors to sync their memory. In contrast, we explicitly do not care about that here.

06/19/2013 07:26 PM by Ayende Rahien

Doug, A class that encapsulates a value: public class Reference<T> { public T Value; }

It allows you to see either the previous or the current value, without fear of corrupted writes.
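A sketch of the pattern Ayende is describing (the Reference<T> class is from his comment; the surrounding LastAccess class and names are mine). On a 32-bit runtime a plain long field can be read half-updated (a "torn" read), but swapping a reference is always atomic, so a reader sees either the old object's value or the new one, never a mix of both halves:

```csharp
public class Reference<T>
{
    public T Value;
}

public class LastAccess
{
    // 'volatile' is not legal on a long field, but is legal on a reference field.
    private volatile Reference<long> _ticks = new Reference<long> { Value = 0 };

    public void Update(long now)
    {
        // Publish a fresh object rather than mutating the old one in place.
        _ticks = new Reference<long> { Value = now };
    }

    public long Read()
    {
        return _ticks.Value; // the whole value comes from a single object
    }
}
```

This is exactly how TryGet in the post refreshes Node.Ticks: it allocates a new Reference<long> instead of writing through the old one.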

06/20/2013 08:51 AM by Rafal

There's always a need for another LRU cache implementation, but this one has an unpleasant habit of sorting the whole collection on each insert once the capacity limit is reached.

06/20/2013 08:53 AM by Rafal

... and an atomic GetOrAdd operation would be very handy too

06/20/2013 09:12 AM by Ayende Rahien

Rafal, The default size I have for this is 2048, so that isn't really too bad. But note that it is NOT on each insert: it drops the bottom 10% on a capacity breach. With 2048, it would do this sort once every 200 ops or so.

06/20/2013 09:19 AM by Rafal

Right, I jumped to conclusions too fast.

Comments have been closed on this topic.