Ayende @ Rahien

Elegancy challenge: Cacheable Batches

Let us say that we have the following server code:

public JsonDocument[] GetDocument(string[] ids)
{
  var results = new List<JsonDocument>();
  foreach(var id in ids)
  {
    results.Add(GetDocument(id));
  }
  return results.ToArray();
}

This method is a classic example of batching requests to the server. For the purpose of our discussion, we have a client proxy that looks like this:

public class ClientProxy : IDisposable
{
  MemoryCache cache = new MemoryCache("client-proxy"); // MemoryCache has no parameterless constructor; the name is arbitrary
  
  public JsonDocument[] GetDocument(params string[] ids)
  {
    
    // make the request to the server, for example
    var request = WebRequest.Create("http://server/get?id=" + string.Join("&id=", ids));
    using(var stream = request.GetResponse().GetResponseStream())
    {
      return GetResults(stream);
    }
  }
  
  public void Dispose()
  {
    cache.Dispose();
  }
}

Now, as you can probably guess from the title and from the code above, the question relates to caching. We need to make the following code behave as described in the comments:

using(var proxy = new ClientProxy())
{
  proxy.GetDocument("ayende", "oren");     // nothing in the cache, make a request to the server
  proxy.GetDocument("rhino", "hippo");     // nothing in the cache, make a request to the server
  proxy.GetDocument("rhino", "alligator"); // only request 'alligator', 'rhino' is already in the cache
  proxy.GetDocument("rhino", "hippo");     // don't make any request, serve from the cache
  proxy.GetDocument("rhino", "oren");      // don't make any request, serve from the cache
}

The tricky part, of course, is to make this elegant. You can modify both server and client code.

I simplified the problem drastically, but one of the major benefits in the real scenario was reducing the amount of data we have to fetch over the network, even for partially cached queries.

Comments

08/24/2011 09:54 AM by Jesper Larsen-Ledet

public JsonDocument[] GetDocument(params string[] ids)
{
    // only ask server for uncached documents
    var uncachedIds = (from id in ids where !cache.Contains(id) select id).ToArray();

    // make the request to the server, for example
    var request = WebRequest.Create("http://server/get?id=" + string.Join("&id=", uncachedIds));
    using (var stream = request.GetResponse().GetResponseStream())
    {
        var results = GetResults(stream);
        // add fetched documents to cache
        foreach (var x in uncachedIds.Zip(results, (id, doc) => new { id, doc }))
        {
            cache[x.id] = x.doc;
        }
    }

    // return all documents from cache
    return (from id in ids select (JsonDocument)cache[id]).ToArray();
}

08/24/2011 10:01 AM by Mark

class ElegantCache<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _cache = new Dictionary<TKey, TValue>(); 
    private readonly Func<TValue, TKey> _keyResolver;
    private readonly Func<IEnumerable<TKey>, IEnumerable<TValue>> _uncachedResolver;

    public ElegantCache(Func<TValue, TKey> keyResolver, Func<IEnumerable<TKey>, IEnumerable<TValue>> uncachedResolver)
    {
        _keyResolver = keyResolver;
        _uncachedResolver = uncachedResolver;
    }

    public IEnumerable<TValue> Get(TKey[] keys)
    {
        var uncachedKeys = keys.Except(_cache.Keys).ToList();
        if (uncachedKeys.Count > 0)
        {
            foreach (var value in _uncachedResolver(uncachedKeys))
            {
                _cache.Add(_keyResolver(value), value);
            }
        }
        return keys.Join(_cache, k => k, kvp => kvp.Key, (k, kvp) => kvp.Value);
    }
}

08/24/2011 10:10 AM by Ayende Rahien

Jesper, this code suffers from a multi-threading bug. We first check if an item exists and only later extract it; by that time it might have been removed from the cache. The same goes for setting the item in the cache and then trying to get it from there.
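
Roughly, the race is this (a minimal sketch against a MemoryCache-style cache; FetchFromServer is a hypothetical stand-in for the real server call):

// Racy: the entry can be evicted between the Contains check and the read.
if (cache.Contains(id))
{
    var doc = (JsonDocument)cache[id]; // may already be null by the time we read it
}

// Safer: read the value exactly once, then fall back if it was missing.
var document = (JsonDocument)cache.Get(id) ?? FetchFromServer(id);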

08/24/2011 10:12 AM by Ayende Rahien

Mark, you don't have a cache, you have a memory leak, since nothing ever expires from it.

08/24/2011 10:12 AM by Mark

Revised: the keys.Join line was abominable.

class ElegantCache<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _cache = new Dictionary<TKey, TValue>(); 
    private readonly Func<TValue, TKey> _keyResolver;
    private readonly Func<IEnumerable<TKey>, IEnumerable<TValue>> _uncachedResolver;

    public ElegantCache(Func<TValue, TKey> keyResolver, Func<IEnumerable<TKey>, IEnumerable<TValue>> uncachedResolver)
    {
        _keyResolver = keyResolver;
        _uncachedResolver = uncachedResolver;
    }

    public IEnumerable<TValue> Get(TKey[] keys)
    {
        var uncachedKeys = keys.Except(_cache.Keys).ToList();
        if (uncachedKeys.Count > 0)
        {
            foreach (var value in _uncachedResolver(uncachedKeys))
            {
                _cache.Add(_keyResolver(value), value);
            }
        }
        return keys.Select(key => _cache[key]);
    }
}

08/24/2011 10:17 AM by Ayende Rahien

Mark, aside from everything else: keys.Except(_cache.Keys) is probably much more expensive than doing a dictionary lookup for each key, since you are forcing iteration of the entire cache, which is very expensive for a large number of items.

The same issue regarding the memory leak as before. Also, it is not safe for multi-threading.
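
In code, the difference is roughly this (a sketch against the Dictionary-backed version):

// Except() enumerates every key currently in the cache, so the cost grows
// with the size of the cache:
var uncachedKeys = keys.Except(_cache.Keys).ToList();

// A lookup per requested key costs one hash probe per key instead:
var uncachedKeys2 = keys.Where(k => !_cache.ContainsKey(k)).ToList();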

08/24/2011 10:25 AM by Mark

Non-leaking version:

class ElegantCache<TValue> : IDisposable
{
    private readonly MemoryCache _cache = new MemoryCache(Guid.NewGuid().ToString());
    private readonly CacheItemPolicy _cacheItemPolicy;
    private readonly Func<TValue, string> _keyResolver;
    private readonly Func<IEnumerable<string>, IEnumerable<TValue>> _uncachedResolver;

    public ElegantCache(CacheItemPolicy cacheItemPolicy, Func<TValue, string> keyResolver, Func<IEnumerable<string>, IEnumerable<TValue>> uncachedResolver)
    {
        _cacheItemPolicy = cacheItemPolicy;
        _keyResolver = keyResolver;
        _uncachedResolver = uncachedResolver;
    }

    public IEnumerable<TValue> Get(string[] keys)
    {
        var values = _cache.GetValues(keys) ?? new Dictionary<string, object>();
        if (values.Count < keys.Length)
        {
            foreach (var value in _uncachedResolver(keys.Except(values.Keys)))
            {
                values.Add(_keyResolver(value), value);
                _cache.Add(_keyResolver(value), value, _cacheItemPolicy);
            }
        }
        return keys.Select(key => values[key]).Cast<TValue>();
    }

    public void Dispose()
    {
        _cache.Dispose();
    }
}

08/24/2011 10:31 AM by Mark

Non-leaking, more performant, thread-safe...

class ElegantCache<TValue> : IDisposable
{
    private readonly MemoryCache _cache = new MemoryCache(Guid.NewGuid().ToString());
    private readonly CacheItemPolicy _cacheItemPolicy;
    private readonly Func<TValue, string> _keyResolver;
    private readonly Func<IEnumerable<string>, IEnumerable<TValue>> _uncachedResolver;

    public ElegantCache(CacheItemPolicy cacheItemPolicy, Func<TValue, string> keyResolver, Func<IEnumerable<string>, IEnumerable<TValue>> uncachedResolver)
    {
        _cacheItemPolicy = cacheItemPolicy;
        _keyResolver = keyResolver;
        _uncachedResolver = uncachedResolver;
    }

    public IEnumerable<TValue> Get(string[] keys)
    {
        var values = _cache.GetValues(keys) ?? new Dictionary<string, object>();
        if (values.Count < keys.Length)
        {
            foreach (var value in _uncachedResolver(keys.Where(k => !values.ContainsKey(k))))
            {
                var key = _keyResolver(value);
                values.Add(key, _cache.AddOrGetExisting(key, value, _cacheItemPolicy));
            }
        }
        return keys.Select(key => values[key]).Cast<TValue>();
    }

    public void Dispose()
    {
        _cache.Dispose();
    }
}

08/24/2011 10:39 AM by Ayende Rahien

Mark, a) GetValues will give you all the keys back (with the value set to null), so the code would fail. b) You are resolving those items one at a time, not all at once, which means you just broke batching.

08/24/2011 10:40 AM by Ayende Rahien

Mark, in the second version, the same problem: GetValues returns all keys, even ones not in the cache. You are still making N calls to the DB. Your call to AddOrGetExisting will get an existing (potentially older) value that has already been set in the cache. You should prefer to use the value from the server.
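
In other words, something along these lines (a sketch of the two behaviours):

// AddOrGetExisting returns whatever value is already in the cache (or null
// if the add succeeded), so this can hand back a stale entry:
values.Add(key, _cache.AddOrGetExisting(key, value, _cacheItemPolicy));

// Preferring the value just fetched from the server and overwriting the
// cache entry instead:
values[key] = value;
_cache.Set(key, value, _cacheItemPolicy);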

08/24/2011 10:41 AM by Mark

Ayende,

Your use case for the proxy showed it being used and disposed in a very short space of time, so memory leakage, thread-safety and performance around the overall size of the cache were not an obvious part of your requirements. You simplified the problem drastically; I simplified the solution likewise.

While we're in code review mode, your server method is inefficient in terms of collection management. You should create the List with the known size to prevent array copies as the list resizes its internals, or, even better, just create the array and then use a for loop to populate it:

public JsonDocument[] GetDocument(string[] ids)
{
  var results = new JsonDocument[ids.Length];
  for(int i = 0; i < ids.Length; i++)
  {
    results[i] = GetDocument(ids[i]);
  }
  return results;  
}

08/24/2011 10:44 AM by Ayende Rahien

Mark, I apologize; my (unvoiced) assumption was that the cache lifetime is longer than the lifetime of the proxy.

Good catch with regard to the server code.

08/24/2011 10:53 AM by Mark

While I'm learning about MemoryCache, which is great, more and more requirements are rearing their heads here.

class ElegantCache<TValue> : IDisposable
{
    private readonly MemoryCache _cache = new MemoryCache(Guid.NewGuid().ToString());
    private readonly CacheItemPolicy _cacheItemPolicy;
    private readonly Func<TValue, string> _keyResolver;
    private readonly Func<IEnumerable<string>, IEnumerable<TValue>> _uncachedResolver;

    public ElegantCache(CacheItemPolicy cacheItemPolicy, Func<TValue, string> keyResolver, Func<IEnumerable<string>, IEnumerable<TValue>> uncachedResolver)
    {
        _cacheItemPolicy = cacheItemPolicy;
        _keyResolver = keyResolver;
        _uncachedResolver = uncachedResolver;
    }

    public IEnumerable<TValue> Get(string[] keys)
    {
        var values = _cache.GetValues(keys) ?? new Dictionary<string, object>();
        if (values.Count < keys.Length)
        {
            foreach (var value in _uncachedResolver(keys.Where(k => values[k] == null)))
            {
                var key = _keyResolver(value);
                values[key] = value;
                _cache.Set(key, value, _cacheItemPolicy);
            }
        }
        return keys.Select(key => values[key]).Cast<TValue>();
    }

    public void Dispose()
    {
        _cache.Dispose();
    }
}

I'm not making multiple DB calls; the _uncachedResolver function takes a list of strings and returns a list of values. The function would look like:

ids =>
{
    var request = WebRequest.Create("http://server/get?id=" + string.Join("&id=", ids));
    using (var stream = request.GetResponse().GetResponseStream())
    {
        return GetResults(stream);
    }
}

08/24/2011 10:57 AM by Mark

Also, I'm probably doing something hideous with my creation of the MemoryCache class with a GUID as the name. I suspect a better way would be to use a named region in the Default cache.

08/24/2011 10:58 AM by Ayende Rahien

Mark, oh! Good point, I didn't see that; I am sorry. The last thing to fix is what to do when everything is cached: you don't need to call the server then.

08/24/2011 10:59 AM by Ayende Rahien

Mark, that is pretty bad, yes. Mostly because, unless managed properly, caches are actually memory leaks :-) No worries for this scenario, but you might want to check how much trouble that got us with the RavenDB caching.

08/24/2011 11:05 AM by Mark

Yes, I forgot to fix that check line to account for nulls in the dictionary that comes back from the cache. So that line should be:

if (values.Any(kvp => kvp.Value == null))

and done.

Interesting to note that if you post enough comments, the Captcha stops appearing. Nice UX.

08/24/2011 11:16 AM by Mark

I've posted that class along with how it would be used by ClientProxy as a gist: https://gist.github.com/1167813

08/24/2011 01:00 PM by Jesse

I rather like Mark's solution.

08/24/2011 01:32 PM by Matt Warren

Mark,

One small bug: the line

foreach (var value in _uncachedResolver(keys.Where(k => values[k] == null)))

should be

foreach (var value in _uncachedResolver(keys.Where(k => !values.ContainsKey(k))))

Otherwise it throws an exception when the key isn't in the cache, rather than fetching it.

08/24/2011 02:22 PM by Mark

Matt,

MemoryCache returns all the keys in the Dictionary, with nulls where there was no matching value.

08/24/2011 03:08 PM by Matt Warren

I think that only applies if you're using the GetValues(..) method, but not if you're using the Indexer or Item property (see http://msdn.microsoft.com/en-us/library/system.runtime.caching.memorycache.item.aspx).

Either way, it's a small detail and I like your solution.

08/25/2011 05:30 PM by Mark

Matt: I am using GetValues.

08/26/2011 10:16 AM by Olcay Şeker

@Ayende, can we see your solution to this challenge?

08/26/2011 10:18 AM by Ayende Rahien

Olcay, take a look at the RaccoonBlog code on GitHub :-)

08/28/2011 04:51 PM by Hendry Luk

@Mark, regarding the list/array optimization, I think LINQ would be the most efficient solution, e.g. ids.Select(GetDocument).ToArray();

@Ayende, regarding thread-safety, why does that matter? Why is it important that 2 concurrent requests with the same id MUST make exactly 1 request? I think the performance overhead to ensure this requirement (i.e. locking) would outweigh its gain.

08/28/2011 06:48 PM by Ayende Rahien

Hendry, you don't need locking to do that. And the issue isn't concurrent requests; the issue is a concurrent system. Two concurrent requests would most probably make two separate requests, and there is nothing wrong with that. But you have to account for concurrency in the design, because that is where it will run.
