Elegancy challenge: Cacheable Batches
Let us say that we have the following server code:
public JsonDocument[] GetDocument(string[] ids)
{
    var results = new List<JsonDocument>();
    foreach (var id in ids)
    {
        results.Add(GetDocument(id));
    }
    return results.ToArray();
}
This method is a classic example of batching requests to the server. For the purpose of our discussion, we have a client proxy that looks like this:
public class ClientProxy : IDisposable
{
    MemoryCache cache = new MemoryCache("ClientProxy");

    public JsonDocument[] GetDocument(params string[] ids)
    {
        // make the request to the server, for example
        var request = WebRequest.Create("http://server/get?id=" + string.Join("&id=", ids));
        using (var stream = request.GetResponse().GetResponseStream())
        {
            return GetResults(stream);
        }
    }

    public void Dispose()
    {
        cache.Dispose();
    }
}
Now, as you can probably guess from the title and from the code above, the question relates to caching. We need to make the following pass:
using (var proxy = new ClientProxy())
{
    proxy.GetPopularity("ayende", "oren");    // nothing in the cache, make request to server
    proxy.GetPopularity("rhino", "hippo");    // nothing in the cache, make request to server
    proxy.GetPopularity("rhino", "aligator"); // only request 'aligator', 'rhino' is in the cache
    proxy.GetPopularity("rhino", "hippo");    // don't make any request, serve from cache
    proxy.GetPopularity("rhino", "oren");     // don't make any request, serve from cache
}
The tricky part, of course, is to make this elegant. You can modify both server and client code.
I simplified the problem drastically, but one of the major benefits in the real issue was reducing the size of data we have to fetch over the network even for partial cached queries.
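One way the challenge could be approached is sketched below. This is not the author's solution; the resolver delegate stands in for the real server round-trip, `ConcurrentDictionary` stands in for `MemoryCache`, and all names are assumptions. The idea is the one from the post: answer what you can from the cache, and send the server only the ids the cache could not answer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public class CachingBatchProxy
{
    // Stands in for the actual HTTP request; must return values
    // in the same order as the requested ids.
    readonly Func<string[], string[]> resolver;
    readonly ConcurrentDictionary<string, string> cache =
        new ConcurrentDictionary<string, string>();

    public CachingBatchProxy(Func<string[], string[]> resolver)
    {
        this.resolver = resolver;
    }

    public string[] GetPopularity(params string[] ids)
    {
        var results = new Dictionary<string, string>();
        var missing = new List<string>();

        // A single TryGetValue per id; no separate Contains/Get pair.
        foreach (var id in ids)
        {
            string value;
            if (cache.TryGetValue(id, out value))
                results[id] = value;
            else
                missing.Add(id);
        }

        // Only hit the server for the ids the cache could not answer;
        // if everything was cached, no request is made at all.
        if (missing.Count > 0)
        {
            var fetched = resolver(missing.ToArray());
            for (var i = 0; i < missing.Count; i++)
            {
                cache[missing[i]] = fetched[i];
                results[missing[i]] = fetched[i];
            }
        }

        return ids.Select(id => results[id]).ToArray();
    }
}
```

With this shape, the sequence from the post makes three server calls, and the third one asks only for 'aligator'.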
Comments
public JsonDocument[] GetDocument(params string[] ids)
{
    // only ask server for uncached documents
    var uncachedIds = (from id in ids
                       where !cache.Contains(id)
                       select id).ToArray();
}
Jesper, This code suffers from a multi-threading bug. We first check whether an item exists and only extract it later; by that time, it might have been removed from the cache. The same goes for setting the item in the cache and then trying to get it from there.
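The check-then-get race can be sketched as follows (using ConcurrentDictionary for illustration; the same reasoning applies to MemoryCache's Contains/Get pair):

```csharp
using System.Collections.Concurrent;

public static class CacheLookup
{
    // Racy: another thread can evict the key between ContainsKey and the
    // indexer, and the indexer will then throw KeyNotFoundException.
    public static string RacyGet(ConcurrentDictionary<string, string> cache, string id)
    {
        if (cache.ContainsKey(id))
            return cache[id]; // the entry may be gone by now
        return null;
    }

    // Safe: a single TryGetValue decides "present" and "value" atomically.
    public static string SafeGet(ConcurrentDictionary<string, string> cache, string id)
    {
        string value;
        cache.TryGetValue(id, out value);
        return value; // null means "not cached"
    }
}
```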
Mark, You don't have a cache; you have a memory leak, since nothing ever expires from it.
Revised: keys.Join line was abominable.
Mark, Aside from everything else: keys.Except(_cache.Keys) is probably much more expensive than doing a dictionary lookup for each key, since you are forcing an iteration of the entire thing, which is very expensive for a large number of items.
Same issue regarding the memory leak as before. Also, it is not safe for multi-threading.
Non-leaking version:
Non-leaking, more performant, thread-safe...
Mark, a) The GetValues will give you all the keys back (with the value set to null), so the code would fail. b) You are resolving those items one at a time, not all at once, which means that you just broke batching.
Mark, In the second version, there is the same problem: GetValues returns all the keys, even those not in the cache. You are still making N calls to the DB. And your call to AddOrGetExisting will return an existing (potentially older) value that has already been set in the cache; you should prefer the value from the server.
Ayende,
Your use case for the proxy showed it being used and disposed in a very short space of time, so memory leakage, thread-safety and performance around the overall size of the cache were not an obvious part of your requirements. You simplified the problem drastically, I simplified the solution likewise.
While we're in code review mode, your server method is inefficient in terms of collection management. You should create the List with the known size to prevent array copies as the list resizes its internals, or, even better, just create the array and then use a for loop to populate it:
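A sketch of that suggestion (here `string` stands in for JsonDocument, and the single-document GetDocument is a stub; both are assumptions for illustration):

```csharp
public static class DocumentServer
{
    // Stub standing in for the real single-document lookup (assumption).
    static string GetDocument(string id) { return "doc:" + id; }

    public static string[] GetDocuments(string[] ids)
    {
        // Sizing the array up front avoids the List<T> growth copies
        // and the final ToArray copy from the original version.
        var results = new string[ids.Length];
        for (var i = 0; i < ids.Length; i++)
            results[i] = GetDocument(ids[i]);
        return results;
    }
}
```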
Mark, I apologize, my (unvoiced) assumption was that the cache lifetime is longer than the lifetime of the proxy
Good catch with regards to the server code
While I'm learning about MemoryCache, which is great, more and more requirements are rearing their heads here.
I'm not making multiple DB calls, the _uncachedResolver function takes a list of strings and returns a list of values. The function would look like:
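The code from this comment did not survive, but presumably the resolver is a delegate wrapping the single batched server request. A hypothetical sketch (the name, signature, and stubbed body are all assumptions):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ResolverSketch
{
    // Takes only the uncached keys and answers them all in one batch,
    // standing in for a single HTTP round-trip to the server.
    public static readonly Func<IEnumerable<string>, IDictionary<string, string>> UncachedResolver =
        keys => keys.ToDictionary(k => k, k => "value-for-" + k);
}
```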
Also, I'm probably doing something hideous with my creation of the MemoryCache class with a GUID as the name. I suspect a better way would be to use a named region in the Default cache.
Mark, Oh! Good point, I didn't see that, I am sorry. The last thing to fix is what to do when everything is cached; you don't need to call the server then.
Mark, That is pretty bad, yes. Mostly because unless managed properly, caches are actually memory leaks :-) No worries for this scenario, but you might want to check how much trouble that got us with the RavenDB caching
Yes, I forgot to fix that check line to account for Nulls in the dictionary that comes back from the cache. So that line should be
and done.
Interesting to note that if you post enough comments, the Captcha stops appearing. Nice UX.
I've posted that class along with how it would be used by ClientProxy as a gist: https://gist.github.com/1167813
I rather like Mark's solution.
Mark,
1 small bug, the line foreach (var value in _uncachedResolver(keys.Where(k => values[k] == null)))
should be foreach (var value in _uncachedResolver(keys.Where(k => !values.ContainsKey(k))))
Otherwise it throws an exception when the key isn't in a cache rather than fetching it.
Matt,
MemoryCache returns all the keys in the Dictionary, with nulls where there was no matching value.
I think that only applies if you're using the GetValues(..) method, but not if you're using the Indexer or Item property (see http://msdn.microsoft.com/en-us/library/system.runtime.caching.memorycache.item.aspx).
Either way it's a small detail, and I like your solution.
Matt: I am using GetValues.
@Ayende, can we see your solution about this challenge?
Olcay, Take a look at the RaccoonBlog code on github :-)
@Mark, regarding the list/array optimization, I think LINQ would be the most efficient solution. E.g. ids.Select(GetDocument).ToArray();
@Ayende, regarding thread-safety, why does that matter? Why is it important that 2 concurrent requests with the same id MUST make exactly 1 request? I think the performance overhead to ensure this requirement (i.e. locking) would outweigh its gain.
Hendry, You don't need locking to do that. And the issue isn't one of concurrent requests; the issue is a concurrent _system_. 2 concurrent requests would most probably make 2 separate requests, and there is nothing wrong there. But you have to account for concurrency in the design, because that is the environment where it will run.