Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969


Elegancy challenge: Cacheable Batches

time to read 3 min | 465 words

Let us say that we have the following server code:

public JsonDocument[] GetDocument(string[] ids)
{
  var results = new List<JsonDocument>();
  foreach(var id in ids)
  {
    results.Add(GetDocument(id));
  }
  return results.ToArray();
}

This method is a classic example of batching requests to the server. For the purpose of our discussion, we have a client proxy that looks like this:

public class ClientProxy : IDisposable
{
  MemoryCache cache = new MemoryCache();

  public JsonDocument[] GetDocument(params string[] ids)
  {
    // make the request to the server, for example
    var request = WebRequest.Create("http://server/get?id=" + string.Join("&id=", ids));
    using(var stream = request.GetResponse().GetResponseStream())
    {
      return GetResults(stream);
    }
  }

  public void Dispose()
  {
    cache.Dispose();
  }
}

Now, as you can probably guess from the title and from the code above, the question relates to caching. We need to make the following scenario pass:

using(var proxy = new ClientProxy())
{
  proxy.GetDocument("ayende", "oren");     // nothing in the cache, make a request to the server
  proxy.GetDocument("rhino", "hippo");     // nothing in the cache, make a request to the server
  proxy.GetDocument("rhino", "alligator"); // only request 'alligator', 'rhino' is in the cache
  proxy.GetDocument("rhino", "hippo");     // don't make any request, serve from the cache
  proxy.GetDocument("rhino", "oren");      // don't make any request, serve from the cache
}

The tricky part, of course, is to make this elegant. You can modify both server and client code.

I simplified the problem drastically, but one of the major benefits in the real scenario was reducing the amount of data we have to fetch over the network, even for partially cached queries.


Comments

Jesper Larsen-Ledet

public JsonDocument[] GetDocument(params string[] ids)
{
    // only ask server for uncached documents
    var uncachedIds = (from id in ids where !cache.Contains(id) select id).ToArray();

    // make the request to the server, for example
    var request = WebRequest.Create("http://server/get?id=" + string.Join("&id=", uncachedIds));
    using (var stream = request.GetResponse().GetResponseStream())
    {
        var results = GetResults(stream);
        // add fetched documents to cache
        foreach (var x in uncachedIds.Zip(results, (id, doc) => new { id, doc }))
        {
            cache[x.id] = x.doc;
        }
    }

    // return all documents from cache
    return (from id in ids select (JsonDocument)cache[id]).ToArray();
}

Mark

class ElegantCache<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _cache = new Dictionary<TKey, TValue>(); 
    private readonly Func<TValue, TKey> _keyResolver;
    private readonly Func<IEnumerable<TKey>, IEnumerable<TValue>> _uncachedResolver;

    public ElegantCache(Func<TValue, TKey> keyResolver, Func<IEnumerable<TKey>, IEnumerable<TValue>> uncachedResolver)
    {
        _keyResolver = keyResolver;
        _uncachedResolver = uncachedResolver;
    }

    public IEnumerable<TValue> Get(TKey[] keys)
    {
        var uncachedKeys = keys.Except(_cache.Keys).ToList();
        if (uncachedKeys.Count > 0)
        {
            foreach (var value in _uncachedResolver(uncachedKeys))
            {
                _cache.Add(_keyResolver(value), value);
            }
        }
        return keys.Join(_cache, k => k, kvp => kvp.Key, (k, kvp) => kvp.Value);
    }
}
Ayende Rahien

Jesper, This code suffers from a multi-threading bug. We first check if an item exists, and only extract it later; by that time, it might have been removed from the cache. The same goes for setting the item in the cache and then trying to get it from there.
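
For illustration, a minimal sketch of that race (a hypothetical helper, assuming a shared, expiring store such as System.Runtime.Caching.MemoryCache, as in the code above):

JsonDocument GetFromCacheUnsafe(ObjectCache cache, string id)
{
    if (cache.Contains(id))
    {
        // Between the Contains check and the indexer access below, another
        // thread or the cache's own expiration can remove the entry.
        return (JsonDocument)cache[id]; // may be null despite the check above
    }
    return null;
}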

Ayende Rahien

Mark, You don't have a cache, you have a memory leak, since nothing ever expires from this cache.

Mark

Revised: keys.Join line was abominable.

class ElegantCache<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _cache = new Dictionary<TKey, TValue>(); 
    private readonly Func<TValue, TKey> _keyResolver;
    private readonly Func<IEnumerable<TKey>, IEnumerable<TValue>> _uncachedResolver;

    public ElegantCache(Func<TValue, TKey> keyResolver, Func<IEnumerable<TKey>, IEnumerable<TValue>> uncachedResolver)
    {
        _keyResolver = keyResolver;
        _uncachedResolver = uncachedResolver;
    }

    public IEnumerable<TValue> Get(TKey[] keys)
    {
        var uncachedKeys = keys.Except(_cache.Keys).ToList();
        if (uncachedKeys.Count > 0)
        {
            foreach (var value in _uncachedResolver(uncachedKeys))
            {
                _cache.Add(_keyResolver(value), value);
            }
        }
        return keys.Select(key => _cache[key]);
    }
}
Ayende Rahien

Mark, Aside from everything else: keys.Except(_cache.Keys) is probably much more expensive than doing a dictionary lookup for each key, since you are forcing an iteration over the entire key collection, which is very expensive for a large number of items.

The same issue regarding the memory leak as before. Also, it is not safe for multi-threading.
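
For illustration, a sketch of the per-key lookup (hypothetical, reusing the _cache dictionary and type parameters from the class above); it only probes the requested keys instead of enumerating every cached key:

// Hypothetical sketch: one TryGetValue per requested key, instead of
// keys.Except(_cache.Keys), which enumerates the entire key collection.
var uncachedKeys = new List<TKey>();
foreach (var key in keys)
{
    TValue existing;
    if (!_cache.TryGetValue(key, out existing))
        uncachedKeys.Add(key);
}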

Mark

Non-leaking version:

class ElegantCache<TValue> : IDisposable
{
    private readonly MemoryCache _cache = new MemoryCache(Guid.NewGuid().ToString());
    private readonly CacheItemPolicy _cacheItemPolicy;
    private readonly Func<TValue, string> _keyResolver;
    private readonly Func<IEnumerable<string>, IEnumerable<TValue>> _uncachedResolver;

    public ElegantCache(CacheItemPolicy cacheItemPolicy, Func<TValue, string> keyResolver, Func<IEnumerable<string>, IEnumerable<TValue>> uncachedResolver)
    {
        _cacheItemPolicy = cacheItemPolicy;
        _keyResolver = keyResolver;
        _uncachedResolver = uncachedResolver;
    }

    public IEnumerable<TValue> Get(string[] keys)
    {
        var values = _cache.GetValues(keys) ?? new Dictionary<string, object>();
        if (values.Count < keys.Length)
        {
            foreach (var value in _uncachedResolver(keys.Except(values.Keys)))
            {
                values.Add(_keyResolver(value), value);
                _cache.Add(_keyResolver(value), value, _cacheItemPolicy);
            }
        }
        return keys.Select(key => values[key]).Cast<TValue>();
    }

    public void Dispose()
    {
        _cache.Dispose();
    }
}
Mark

Non-leaking, more performant, thread-safe...

class ElegantCache<TValue> : IDisposable
{
    private readonly MemoryCache _cache = new MemoryCache(Guid.NewGuid().ToString());
    private readonly CacheItemPolicy _cacheItemPolicy;
    private readonly Func<TValue, string> _keyResolver;
    private readonly Func<IEnumerable<string>, IEnumerable<TValue>> _uncachedResolver;

    public ElegantCache(CacheItemPolicy cacheItemPolicy, Func<TValue, string> keyResolver, Func<IEnumerable<string>, IEnumerable<TValue>> uncachedResolver)
    {
        _cacheItemPolicy = cacheItemPolicy;
        _keyResolver = keyResolver;
        _uncachedResolver = uncachedResolver;
    }

    public IEnumerable<TValue> Get(string[] keys)
    {
        var values = _cache.GetValues(keys) ?? new Dictionary<string, object>();
        if (values.Count < keys.Length)
        {
            foreach (var value in _uncachedResolver(keys.Where(k => !values.ContainsKey(k))))
            {
                var key = _keyResolver(value);
                values.Add(key, _cache.AddOrGetExisting(key, value, _cacheItemPolicy));
            }
        }
        return keys.Select(key => values[key]).Cast<TValue>();
    }

    public void Dispose()
    {
        _cache.Dispose();
    }
}
Ayende Rahien

Mark, a) The GetValues will give you all the keys back (with the value set to null), so the code would fail. b) You are resolving those items one at a time, not all at once, which means that you just broke batching.

Ayende Rahien

Mark, In the second version, same problem in that GetValues returns all keys, even those not in the cache. You are still making N calls to the DB. Your call to AddOrGetExisting will get an existing (potentially older) value that has already been set in the cache. You should prefer to use the value from the server.
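
For illustration, a sketch of the "prefer the server value" variant (hypothetical, reusing the fields from the class above): overwrite the cached entry with the freshly fetched document instead of keeping whatever AddOrGetExisting happens to find.

// Hypothetical sketch: keep the value that just came back from the server,
// replacing any (possibly stale) cached entry.
foreach (var value in _uncachedResolver(keys.Where(k => !values.ContainsKey(k))))
{
    var key = _keyResolver(value);
    values[key] = value;                      // always use the server's value
    _cache.Set(key, value, _cacheItemPolicy); // overwrite the cache entry
}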

Mark

Ayende,

Your use case for the proxy showed it being used and disposed in a very short space of time, so memory leakage, thread-safety, and performance around the overall size of the cache were not an obvious part of your requirements. You simplified the problem drastically; I simplified the solution likewise.

While we're in code review mode, your server method is inefficient in terms of collection management. You should create the List with the known size to prevent array copies as the list resizes its internals, or, even better, just create the array and then use a for loop to populate it:

public JsonDocument[] GetDocument(string[] ids)
{
var results = new JsonDocument[ids.Length];
  for(int i = 0; i < ids.Length; i++)
  {
    results[i] = GetDocument(ids[i]);
  }
  return results;  
}
Ayende Rahien

Mark, I apologize, my (unvoiced) assumption was that the cache lifetime is longer than the lifetime of the proxy

Good catch with regards to the server code

Mark

While I'm learning about MemoryCache, which is great, more and more requirements are rearing their heads here.

class ElegantCache<TValue> : IDisposable
{
    private readonly MemoryCache _cache = new MemoryCache(Guid.NewGuid().ToString());
    private readonly CacheItemPolicy _cacheItemPolicy;
    private readonly Func<TValue, string> _keyResolver;
    private readonly Func<IEnumerable<string>, IEnumerable<TValue>> _uncachedResolver;

    public ElegantCache(CacheItemPolicy cacheItemPolicy, Func<TValue, string> keyResolver, Func<IEnumerable<string>, IEnumerable<TValue>> uncachedResolver)
    {
        _cacheItemPolicy = cacheItemPolicy;
        _keyResolver = keyResolver;
        _uncachedResolver = uncachedResolver;
    }

    public IEnumerable<TValue> Get(string[] keys)
    {
        var values = _cache.GetValues(keys) ?? new Dictionary<string, object>();
        if (values.Count < keys.Length)
        {
            foreach (var value in _uncachedResolver(keys.Where(k => values[k] == null)))
            {
                var key = _keyResolver(value);
                values[key] = value;
                _cache.Set(key, value, _cacheItemPolicy);
            }
        }
        return keys.Select(key => values[key]).Cast<TValue>();
    }

    public void Dispose()
    {
        _cache.Dispose();
    }
}

I'm not making multiple DB calls; the _uncachedResolver function takes a list of strings and returns a list of values. The function would look like:

ids =>
{
    var request = WebRequest.Create("http://server/get?id=" + string.Join("&id=", ids));
    using (var stream = request.GetResponse().GetResponseStream())
    {
        return GetResults(stream);
    }
}
Mark

Also, I'm probably doing something hideous with my creation of the MemoryCache class with a GUID as the name. I suspect a better way would be to use a named region in the Default cache.

Ayende Rahien

Mark, Oh! Good point, I didn't see that, I am sorry. The last thing to fix is what to do when everything is cached; you don't need to call the server then.

Ayende Rahien

Mark, That is pretty bad, yes. Mostly because unless managed properly, caches are actually memory leaks :-) No worries for this scenario, but you might want to check how much trouble that got us with the RavenDB caching

Mark

Yes, I forgot to fix that check line to account for nulls in the dictionary that comes back from the cache. So that line should be

if (values.Any(kvp => kvp.Value == null))

and done.

Interesting to note that if you post enough comments, the Captcha stops appearing. Nice UX.

Mark

I've posted that class along with how it would be used by ClientProxy as a gist: https://gist.github.com/1167813

Jesse

I rather like Mark's solution.

Matt Warren

Mark,

One small bug: the line

foreach (var value in _uncachedResolver(keys.Where(k => values[k] == null)))

should be

foreach (var value in _uncachedResolver(keys.Where(k => !values.ContainsKey(k))))

Otherwise it throws an exception when the key isn't in the cache rather than fetching it.

Mark

Matt,

MemoryCache returns all the keys in the Dictionary, with nulls where there was no matching value.

Matt Warren

I think that only applies if you're using the GetValues(..) method, but not if you're using the Indexer or Item property (see http://msdn.microsoft.com/en-us/library/system.runtime.caching.memorycache.item.aspx).

Either way it's a small detail and I like your solution.

Mark

Matt: I am using GetValues.

Olcay Şeker

@Ayende, can we see your solution to this challenge?

Ayende Rahien

Olcay, Take a look at the RaccoonBlog code on github :-)

Hendry Luk

@Mark, regarding the list/array optimization, I think Linq would be the most efficient solution, e.g. ids.Select(GetDocument).ToArray();

@Ayende, regarding thread-safety, why does that matter? Why is it important that 2 concurrent requests with the same id MUST make exactly 1 request? I think the performance overhead of ensuring this requirement (i.e. locking) would outweigh its gain.

Ayende Rahien

Hendry, You don't need locking to do that. And the issue isn't one of concurrent requests, the issue is one of a concurrent system. Two concurrent requests would most probably make two separate requests, and there is nothing wrong with that. But you have to account for concurrency in the design, because that is the environment it will run in.
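
For illustration, a minimal sketch of caching without explicit locks (hypothetical, using ConcurrentDictionary as the store): two concurrent misses for the same id may each call the server, which is acceptable, but the shared cache stays consistent without any locking.

using System;
using System.Collections.Concurrent;

// Hypothetical sketch: thread safety without locks. Concurrent misses for the
// same id may each hit the server, but only one value wins and no caller ever
// observes a torn or missing entry.
class LockFreeDocumentCache
{
    private readonly ConcurrentDictionary<string, JsonDocument> cache =
        new ConcurrentDictionary<string, JsonDocument>();

    public JsonDocument Get(string id, Func<string, JsonDocument> fetchFromServer)
    {
        return cache.GetOrAdd(id, fetchFromServer);
    }
}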
