﻿<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Ayende @ Rahien</title><link>http://ayende.com</link><description>Ayende @ Rahien</description><copyright>Copyright (C) Ayende Rahien  2004 - 2021 (c) 2026</copyright><ttl>60</ttl><item><title>Alex commented on Challenge: Minimum number of round trips</title><description>A bit lengthy due to introduction of Post and PostRequest classes ...


        public class Post
        {
            public Post(string id, string content)
            {
                Id = id;
                Content = content;
            }
            public string Id { get; set; }
            public string Content { get; set; }
        }
        public class PostRequest
        {
            public PostRequest(string id)
            {
                Id = id;
            }
            public string Id { get; private set; }
            public Post CreatePost(string content)
            {
                return Result = new Post(Id, content);
            }
            public Post Result { get; private set; }
        }
        public static void ShardedGet()
        {
            string[] posts = {"posts/1234", "post/3214", "posts/1232", "posts/1238", "posts/1232"};
            var shardedRequest = posts
                .Distinct()
                .Select(p =&gt; new PostRequest(p))
                .SelectMany(r =&gt; GetAppropriateUrls(r.Id)
                    .Distinct()
                    .Select(u =&gt; new { request = r, Url = u })
                    )
                .GroupBy(m =&gt; m.Url)
                .SelectMany(g =&gt; SendRequest(g.Key, g.Select(s =&gt; s.request)));

            Console.WriteLine(string.Join("\n", shardedRequest.Select(p =&gt; p.Content)));
        }
        public static IEnumerable&lt;Post&gt; SendRequest(string url, IEnumerable&lt;PostRequest&gt; requests)
        {
            return requests
                .Where(r =&gt; r.Result == null)
                .Select(r =&gt; r.CreatePost("Bla Bla " + r.Id + " from url " + url))
                .ToArray();
        }
</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment25</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment25</guid><pubDate>Thu, 20 Oct 2011 01:30:57 GMT</pubDate></item><item><title>Valera Kolupaev commented on Challenge: Minimum number of round trips</title><description>@Mike
Not exactly, we know at least something about distribution of a documents in server.
At fists, for example we know that document A is located at Server 1, 2 or 3 with probability of 33%.
After attempt to fetch document from Server 1, it would be 50% for Server 2 and 3.
This information can be applied as entry point for MST algorithm.</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment24</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment24</guid><pubDate>Tue, 18 Oct 2011 17:15:04 GMT</pubDate></item><item><title>Roger commented on Challenge: Minimum number of round trips</title><description>Perhaps I'm being naive, but the usual meta-goal for reducing network round-trips is to optimize for the lowest overall elapsed time due to network latency effects.

That can often be achieved by issuing overllapping asynchronous parallel queries to multiple servers (not that hard to do with .Net parallel Task support these days).

Usually that gives you a total elapsed time slightly greater than the longest running query.

Most of the above-proposed methods introduce serial execution dependencies and are 'stateful'; hence not readily executed in parallel and thus will likely result in longer elapsed times than a naive 'stateless' parallel algorithm, except for some 'edge cases' where the 'stateful' algorithm will perform better.

A 'naive' general algorithmic approach would be:

Step 1 - coalesce the queries into 'batches' with 1 batch/ target server (querying *all servers in a group* where a target document can be present in any of a group of several servers).....I liked the LINQ approach to that problem.

Step 2 - If only 1 batch, issue the query *synchronously* (no point in adding extra overhead for 'parallel' execution!), return results when complete

Step 3 - otherwise, take the output batches from step 1 and execute the required queries as asynchronous  parallel tasks (1 per server), with a 'join point' when all queries are complete.  Note that execution threads need to enforce the rule that existence of a document 'trumps' any 'not-found' results for a document.

Step 4 - Post-process the aggregated query results looking for situations where the net outcome of queries for specific documents did not find the document and generate any related exceptions (since your example searches by primary Key, not finding a document on *any target server* is probably invalid (?)).

Step 5 - return aggregated results.

To avoid 'premature optimization', Implement the 'naive' algorithm first, then benchmark using a reasonably-sized set of 'random' queries  and perform *low-level* optimizations to fix any 'bottlenecks' identified by the benchmark....rinse and repeat using the same benchmark data.</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment23</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment23</guid><pubDate>Tue, 18 Oct 2011 06:10:46 GMT</pubDate></item><item><title>Mike commented on Challenge: Minimum number of round trips</title><description>@Valera - hmm... attempting to graph something that you have no knowledge about seems to be an exercise in futility

@Betty - great suggestion... this should do the job

// sorry for any formatting issues
// assume MakeRequest returns the "ids" returned from the request

var servers = ids.Distinct()
  .Select(x =&gt; new {
    Id = x, 
    Servers = GetAppropriateUrls(x),
  })
  .SelectMany(x =&gt; x.Servers, (x, c) =&gt; new {
    Id = x.Id, 
    Server = c, 
    ServerCount = x.Servers.Count(),
  })
  .OrderBy(x =&gt; x.ServerCount)
  .GroupBy(x =&gt; x.Server, (k, c) =&gt; new {
    Url = k,
    Concerns = c.Select(a =&gt; a.Id)
  });

var resolved = new List&lt;string&gt;();
foreach (var server in servers)
{
  var concerns = server.Concerns
    .Except(resolved).ToArray();

  if (concerns.Length == 0)
    continue;

  var response = MakeRequest(server.Url, concerns);
  resolved.AddRange(response);
}</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment22</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment22</guid><pubDate>Tue, 18 Oct 2011 05:26:01 GMT</pubDate></item><item><title>Valera Kolupaev commented on Challenge: Minimum number of round trips</title><description>@Betty
IMO, this is a good suggestion, however, it would't help in general case. I bet there is a counterexample too for your's algorithm.
To come up with least number of round-trips,  its required to apply some sort of graph search algorithms.
For example do a http://en.wikipedia.org/wiki/Minimum_spanning_tree , 
if we encode documents as nodes, and connect nodes, who are on the same servers.

The problem is, that MST seems like n^n (rough guess) complexity, so it might be overkill, to search optimal solution.</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment21</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment21</guid><pubDate>Mon, 17 Oct 2011 21:02:54 GMT</pubDate></item><item><title>Betty commented on Challenge: Minimum number of round trips</title><description>Simply sorting by largest possible result set wouldn't be optimal in some cases. You should pull out any documents with only one possible server and perform them first (along with requesting any other documents that may be on that server), then perform the remaining requests largest -&gt; smallest.

example that breaks the largest -&gt; smallest algorithm (servers are numbers, documents are letters)
1: a,b,c
2: a,e
3: b,f
4: c,g

4 requests vs 3 requests.</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment20</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment20</guid><pubDate>Mon, 17 Oct 2011 20:36:42 GMT</pubDate></item><item><title>Valera Kolupaev commented on Challenge: Minimum number of round trips</title><description>There is not enough information to determine optimal sequence of server calls.
So, my best bet is to use a dynamic programming technique to call server, which could return largest number of documents each time.
While this is not guaranteeing to find a best possible solution, it will minimize a number of round trips, compared to naive solution.

For better results, it might be good idea to add some heuristics, based on run-time usage data, to pick "better" servers. 

For more complex loading schemes, it's possible to find subset of servers to call, using, for example Minimum Spanning Tree algorithm.

First pass, F# solution
====================================================================
let mapToUrl (id : System.String) = 
    let lastChar = id.[(String.length id) - 1]
    match lastChar with
    | '4' -&gt; ["http://srv-4"; "http://srv-backup-4"]
    | '2' -&gt; ["http://srv-2"]
    | '8' -&gt; ["http://srv-backup-4"]

let docsList = ["posts/1234"; "post/3214"; "posts/1232"; "posts/1238"; "posts/1232"]
let fetch server docs =
    printfn "Fetching:\t%s - %A" server docs
    let serverContents = match server with
        | "http://srv-4" -&gt; ["posts/1234"]
        | "http://srv-backup-4" -&gt; ["post/3214"; "posts/1238"]
        | "http://srv-2" -&gt; ["posts/1232"; "posts/1232"]
    let r = serverContents |&gt; Set.ofSeq |&gt; Set.intersect (Set.ofSeq docs)
    printfn "Fetched:\t%A" r
    r

let rec bestCandidateRec (docsList : string list) processedServers = 
    let (srv, docParis) = 
        docsList 
        |&gt; List.ofSeq 
        |&gt; Seq.collect (fun d -&gt; mapToUrl d |&gt; Seq.map (fun x -&gt; (x, d))) 
        |&gt; Seq.filter (fun (srv, _) -&gt; not (Set.contains srv processedServers))
        |&gt; Seq.groupBy fst 
        |&gt; Seq.maxBy (fun(k, v) -&gt; Seq.length v)
    let docs = docParis |&gt; Seq.map snd
    let fetched = fetch srv docs
    let remaining = (Set.difference (Set.ofSeq docsList) (fetched))
    printfn "Remaining:\t%A" remaining
    if not (Set.isEmpty remaining) then
        bestCandidateRec (Set.toList remaining) (processedServers |&gt; Set.add srv)

bestCandidateRec docsList Set.empty 
====================================================================
Fetching:       http://srv-backup-4 - seq ["posts/1234"; "post/3214"; "posts/1238"]
Fetched:        set ["post/3214"; "posts/1238"]
Remaining:      set ["posts/1232"; "posts/1234"]
Fetching:       http://srv-2 - seq ["posts/1232"]
Fetched:        set ["posts/1232"]
Remaining:      set ["posts/1234"]
Fetching:       http://srv-4 - seq ["posts/1234"]
Fetched:        set ["posts/1234"]
Remaining:      set []
Press any key to continue . . .
====================================================================</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment19</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment19</guid><pubDate>Mon, 17 Oct 2011 14:51:03 GMT</pubDate></item><item><title>Valera Kolupaev commented on Challenge: Minimum number of round trips</title><description>With given knowledge it's impossible to get optimal request sequence, because we don't know where document is located (on main or backup server).
So, solution with Dynamic Programming might fit well.
The idea to load document with iteration, and pick server, who could return the largest number of documents on each turn:


F# Solution:
================================================================
let mapToUrl (id : System.String) = 
    let lastChar = id.[(String.length id) - 1]
    match lastChar with
    | '4' -&gt; ["http://srv-4"; "http://srv-backup-4"]
    | '2' -&gt; ["http://srv-2"]
    | '8' -&gt; ["http://srv-backup-4"]

let docsList = ["posts/1234"; "post/3214"; "posts/1232"; "posts/1238"; "posts/1232"]
let fetch server docs =
    printfn "Fetching:\t%s - %A" server docs
    let serverContents = match server with
        | "http://srv-4" -&gt; ["posts/1234"]
        | "http://srv-backup-4" -&gt; ["post/3214"; "posts/1238"]
        | "http://srv-2" -&gt; ["posts/1232"; "posts/1232"]
    let r = serverContents |&gt; Set.ofSeq |&gt; Set.intersect (Set.ofSeq docs)
    printfn "Fetched:\t%A" r
    r

let rec bestCandidateRec (docsList : string list) processedServers = 
    let (srv, docParis) = 
        docsList 
        |&gt; List.ofSeq 
        |&gt; Seq.collect (fun d -&gt; mapToUrl d |&gt; Seq.map (fun x -&gt; (x, d))) 
        |&gt; Seq.filter (fun (srv, _) -&gt; not (Set.contains srv processedServers))
        |&gt; Seq.groupBy fst 
        |&gt; Seq.maxBy (fun(k, v) -&gt; Seq.length v)
    let docs = docParis |&gt; Seq.map snd
    let fetched = fetch srv docs
    let remaining = (Set.difference (Set.ofSeq docsList) (fetched))
    printfn "Remaining:\t%A" remaining
    if not (Set.isEmpty remaining) then
        bestCandidateRec (Set.toList remaining) (processedServers |&gt; Set.add srv)
        
bestCandidateRec docsList Set.empty
================================================================
Result:
Fetching:       http://srv-backup-4 - seq ["posts/1234"; "post/3214"; "posts/1238"]
Fetched:        set ["post/3214"; "posts/1238"]
Remaining:      set ["posts/1232"; "posts/1234"]
Fetching:       http://srv-2 - seq ["posts/1232"]
Fetched:        set ["posts/1232"]
Remaining:      set ["posts/1234"]
Fetching:       http://srv-4 - seq ["posts/1234"]
Fetched:        set ["posts/1234"]
Remaining:      set []
</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment18</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment18</guid><pubDate>Mon, 17 Oct 2011 14:01:30 GMT</pubDate></item><item><title>Yan Cui commented on Challenge: Minimum number of round trips</title><description>Dan,
That's a nice idea, even without formally specifying that one server should have priority another, for this example at least you can save the hit to srv-4 by ordering the groups by the number of ids it contains:

idsDict
.Keys
// map each Id into Id-Url pairs, one for each url
.SelectMany(id =&gt; GetAppropriateUrls(id).Select(url =&gt; new { Id = id, Url = url }))
// group by the url to get a Url-Id[] mapping
.GroupBy(m =&gt; m.Url)
.OrderByDescending(m =&gt; m.Count())
// filter groups so only proceed if a group has ids it needs to retrieve
// note this works thanks to delay execution
.Where(gr =&gt; gr.Any(m =&gt; !idsDict[m.Id]))

Doing so prints out (plus minor modification to 'MakeRequest' method):
posts/1234,post/3214,posts/1238 retrieved from http://srv-backup-4
posts/1232 retrieved from http://srv-2</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment17</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment17</guid><pubDate>Mon, 17 Oct 2011 13:54:53 GMT</pubDate></item><item><title>Dan Turner commented on Challenge: Minimum number of round trips</title><description>Sorry, save a hit to http://srv-4</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment16</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment16</guid><pubDate>Mon, 17 Oct 2011 13:01:24 GMT</pubDate></item><item><title>Dan Turner commented on Challenge: Minimum number of round trips</title><description>A further optimisation to Yan's solution would be to consider the order in which urls are hit. For e.g. any Load containing an Id ending in "8" means http://srv-backup-4 MUST be hit. If we do this first we MIGHT also get those ending in "4" for free and save a hit to http://srv-backup-4.
</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment15</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment15</guid><pubDate>Mon, 17 Oct 2011 12:59:01 GMT</pubDate></item><item><title>Ayende Rahien commented on Challenge: Minimum number of round trips</title><description>Dave,
Oh, you can get multiple ids in a single request, yes</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment14</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment14</guid><pubDate>Mon, 17 Oct 2011 12:24:37 GMT</pubDate></item><item><title>Dave commented on Challenge: Minimum number of round trips</title><description>Well, why is there a limitation of one document per GET request in the first place? Loading means retrieving, so one can argue that there's no reason why a request couldn't return multiple documents? 

Just thinking outside the box..</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment13</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment13</guid><pubDate>Mon, 17 Oct 2011 11:54:25 GMT</pubDate></item><item><title>Yan Cui commented on Challenge: Minimum number of round trips</title><description>Ayende,
I introduced a dummy 'MakeRequest' method here that will tell you that you've got all the keys you ask for:

// Define other methods and classes here
// dummy request method that returns all the ids you can ask for
public static string[] MakeRequest(string url, string[] ids)
{
	Console.WriteLine("{0} retrieved", string.Join(",", ids));
	return ids;
}

And the rest of the solution goes like this:

var ids = new[] { "posts/1234", "post/3214", "posts/1232", "posts/1238", "posts/1232" };
	
// build up a dictionary of the distinct post ids to keep track of 'if' it has been retrieved
var idsDict = ids.Distinct().ToDictionary(id =&gt; id, id =&gt; false);

var idGroups = 
	idsDict
	.Keys
	// map each Id into Id-Url pairs, one for each url
	.SelectMany(id =&gt; GetAppropriateUrls(id).Select(url =&gt; new { Id = id, Url = url }))
	// group by the url to get a Url-Id[] mapping
	.GroupBy(m =&gt; m.Url)
	// filter groups so only proceed if a group has ids it needs to retrieve
	// note this works thanks to delay execution
	.Where(gr =&gt; gr.Any(m =&gt; !idsDict[m.Id]))
	// final transformation to make it easier for the loop
	.Select(gr =&gt; 
		new { 
			Url = gr.Key,
			Ids = gr.Select(m =&gt; m.Id).Where(id =&gt; !idsDict[id]).ToArray()
			});

foreach (var idGroup in idGroups)
{
	var retrieved = MakeRequest(idGroup.Url, idGroup.Ids);
	
	// mark the ids as 'retrieved'
	retrieved.ToList().ForEach(id =&gt; idsDict[id] = true);
}

Running this prints:

posts/1234,post/3214 retrieved
posts/1238 retrieved
posts/1232 retrieved
</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment12</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment12</guid><pubDate>Mon, 17 Oct 2011 11:47:11 GMT</pubDate></item><item><title>Ayende Rahien commented on Challenge: Minimum number of round trips</title><description>Yan,
Yes, and if there aren't any more ids to check, don't make the request</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment11</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment11</guid><pubDate>Mon, 17 Oct 2011 11:29:35 GMT</pubDate></item><item><title>Yan Cui commented on Challenge: Minimum number of round trips</title><description>Ayende,
Just to clarify, assuming that the requests are to be made synchronously, and the first request was to srv-4 and posts/1234 was found there, now you want to exclude posts/1234 from the request to srv-backup-4 because you already have it?</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment10</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment10</guid><pubDate>Mon, 17 Oct 2011 11:21:25 GMT</pubDate></item><item><title>Ayende Rahien commented on Challenge: Minimum number of round trips</title><description>Yan,
Nice, now handle the case of an item with multiple Url found on the first item and shouldn't be in the second request to the url</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment9</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment9</guid><pubDate>Mon, 17 Oct 2011 11:16:59 GMT</pubDate></item><item><title>Yan Cui commented on Challenge: Minimum number of round trips</title><description>How about something like this then?

var ids = new[] { "posts/1234", "post/3214", "posts/1232", "posts/1238", "posts/1232" };
	
var requests = 
    ids.SelectMany(id =&gt; GetAppropriateUrls(id).Select(url =&gt; new { Id = id, Url = url }))
	.GroupBy(m =&gt; m.Url);</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment8</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment8</guid><pubDate>Mon, 17 Oct 2011 11:00:45 GMT</pubDate></item><item><title>Ayende Rahien commented on Challenge: Minimum number of round trips</title><description>Stuart,
Yes</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment7</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment7</guid><pubDate>Mon, 17 Oct 2011 10:56:15 GMT</pubDate></item><item><title>Stuart Cam commented on Challenge: Minimum number of round trips</title><description>@Ayende

So a case '4' could be found in either "http://srv-4" _OR_ "http://srv-backup-4", as opposed to "http://srv-4" _AND_ "http://srv-backup-4"?</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment6</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment6</guid><pubDate>Mon, 17 Oct 2011 10:55:16 GMT</pubDate></item><item><title>Ayende Rahien commented on Challenge: Minimum number of round trips</title><description>Stuart,
posts/1234 may be in _either_ srv-4 or srv-backup-4, you need to check both</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment5</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment5</guid><pubDate>Mon, 17 Oct 2011 10:54:15 GMT</pubDate></item><item><title>Stuart Cam commented on Challenge: Minimum number of round trips</title><description>var distinctIds = ids.Distinct();
var allShardsAndServers = distinctIds.SelectMany(id =&gt; GetAppropriateUrls(id).Select(server =&gt; new { id, server }));
var rankedByPopularServer = allShardsAndServers.GroupBy(i =&gt; i.server).OrderByDescending(g =&gt; g.Count());
var idsForShards = distinctIds.Select(id =&gt; rankedByPopularServer.SelectMany(g =&gt; g).First(g =&gt; g.id == id));

Gives the ouput:

{ id = posts/1234, server = http://srv-backup-4 }
{ id = post/3214, server = http://srv-backup-4 }
{ id = posts/1232, server = http://srv-2 }
{ id = posts/1238, server = http://srv-backup-4 }</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment4</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment4</guid><pubDate>Mon, 17 Oct 2011 10:51:37 GMT</pubDate></item><item><title>Ayende Rahien commented on Challenge: Minimum number of round trips</title><description>Anton &amp; Stuart,
When you have multiple urls for a single id, that document reside in _one_ of those urls, not any of them.
</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment3</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment3</guid><pubDate>Mon, 17 Oct 2011 10:46:49 GMT</pubDate></item><item><title>Stuart Cam commented on Challenge: Minimum number of round trips</title><description>Anton, you can save a request to http://srv-4 by loading the posts from http://srv-backup-4.

You might want to consider finding posts that can be serviced from common servers too...

The final result would be:

http://srv-2 - posts/1232 posts/1232 
http://srv-backup-4 - posts/1238 posts/1234 post/3214 
 
Saves one round-trip</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment2</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment2</guid><pubDate>Mon, 17 Oct 2011 10:31:43 GMT</pubDate></item><item><title>Anton Gogolev commented on Challenge: Minimum number of round trips</title><description>Assuming that when we have two or more shards a single post can be freely loaded from any:

&lt;pre&gt;
    var ids = new[] {"posts/1234", "post/3214", "posts/1232", "posts/1238", "posts/1232"};
    var allEligibleShards = ids.SelectMany(id =&gt; GetAppropriateUrls(id)).Distinct().OrderBy(url =&gt; url);
    var idsForShard = ids.
        Select(id =&gt; new { id, shard = GetAppropriateUrls(id).OrderBy(url =&gt; url).Intersect(allEligibleShards).First() }).
        GroupBy(x =&gt; x.shard, x =&gt; x.id);
&lt;/pre&gt;

This code produces the following results:

&lt;pre&gt;
http://srv-4 - posts/1234 post/3214 
http://srv-2 - posts/1232 posts/1232 
http://srv-backup-4 - posts/1238 
&lt;/pre&gt;</description><link>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment1</link><guid>http://ayende.com/120833/challenge-minimum-number-of-round-trips#comment1</guid><pubDate>Mon, 17 Oct 2011 10:28:24 GMT</pubDate></item></channel></rss>