Code through the looking glass: All you need is a dictionary (and O(N))

The first question that we ask in the coding task goes something like this: "Given a CSV file containing users' data, which is large but can fully fit into memory, we want to be able to search by email address very quickly and get all the matching user ids. Optimize for fast queries over fast startup".

The intent is basically that you'll read the file and load that into a dictionary, then use that to perform queries.
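To make that concrete, here is a rough sketch of the kind of answer we have in mind. It is not a reference solution, and the details are my assumptions for illustration: the `EmailIndex` class name is made up, each line is assumed to be `id,email` with no header row and no quoted commas, and ids are assumed to be integers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical name; the task does not prescribe any particular shape.
public sealed class EmailIndex
{
    // Case-insensitive keys, so "Bob@Example.com" and "bob@example.com" match.
    private readonly Dictionary<string, List<int>> _idsByEmail =
        new Dictionary<string, List<int>>(StringComparer.OrdinalIgnoreCase);

    // Assumed layout: "id,email" per line, no header row, no quoted commas.
    // This is the O(N) startup cost that we are happy to pay once.
    public static EmailIndex Load(IEnumerable<string> lines)
    {
        var index = new EmailIndex();
        foreach (var line in lines)
        {
            var parts = line.Split(',');
            if (parts.Length < 2 || !int.TryParse(parts[0], out var id))
                continue; // skip blank or malformed rows

            var email = parts[1].Trim();
            if (!index._idsByEmail.TryGetValue(email, out var ids))
            {
                ids = new List<int>();
                index._idsByEmail.Add(email, ids);
            }
            ids.Add(id);
        }
        return index;
    }

    // A single hash lookup per query, independent of how many users were loaded.
    public IReadOnlyList<int> Search(string email)
    {
        return _idsByEmail.TryGetValue(email, out var ids)
            ? ids
            : (IReadOnlyList<int>)Array.Empty<int>();
    }
}
```

You would build it once at startup, for example with `EmailIndex.Load(System.IO.File.ReadLines("users.csv"))` (the file name is a placeholder), and then call `Search` for every query. Keying the dictionary by email is the whole point: the expensive pass over the file happens once, and each query is an O(1) lookup on average.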

This candidate has done just that, although things started being strange pretty much from the get go…

Dictionary<int, string> emails = new Dictionary<int, string>();

That seemed like a pretty strange way to set things up, but we have seen crazier. At this point I thought that they were probably storing the hash code of the email in the dictionary, and that the string value was the concatenated list of all the matching ids.

The truth was… far more interesting. Here is the code for querying:

public Dictionary<int, int> EmailSearcher(string email)
{
    Dictionary<int, int> answer = new Dictionary<int, int>();
    int count = 0;
    foreach (var entry in emails)
    {
        if (entry.Value.ToString().Equals(email, StringComparison.OrdinalIgnoreCase))
            answer.Add(count, entry.Key);
    }

    return answer;
}

This code actually has multiple levels of strangeness. The obvious one is that this is doing a linear search on the dictionary, but look at the return type as well…

The candidate was well aware that this code is not able to handle large amounts of data, so they sent us an alternative implementation. But I'll talk about that in my next post.

More posts in "Code through the looking glass" series:

  1. (18 Mar 2016) And a linear search to rule them
  2. (17 Mar 2016) Sorting large data sets
  3. (16 Mar 2016) I'm the God of all OOP
  4. (15 Mar 2016) All you need is a dictionary (and O(N) )
  5. (14 Mar 2016) Finding today's log file