Code through the looking glassAll you need is a dictionary (and O(N) )
The first question that ask in the coding task goes something like this: "Given a CSV file containing users' data, who is large, but can fully fit into memory, we want to be able to search by email address very quickly and get all the matching user ids. Optimize for fast queries over fast startup".
The intent is basically that you'll read the file and load that into a dictionary, then use that to perform queries.
This candidate has done just that, although things started being strange pretty much from the get go…
Dictionary<int, string> emails = new Dictionary<int, string>();
That seemed like a pretty strange way to set things up, but we have seen crazier. At this point I thought that they are probably storing the hash code of the email in the dictionary, and the string value is the concatenated list of all the matching ids.
The truth was… far more interesting. Here is the code for querying:
public Dictionary<int, int> EmailSearcher(string email) { Dictionary<int, int> answer = new Dictionary<int, int>(); int count = 0; foreach (var entry in emails) { if (entry.Value.ToString().Equals(email, StringComparison.OrdinalIgnoreCase)) { answer.Add(count, entry.Key); count++; } } return answer; }
This code actually have multiple levels of strangeness. The obvious one is that this is doing a linear search on the dictionary, but look at the return type as well…
The candidate was well aware that this code is not able to handle large amount of information, so the candidate sent us an alternative implementation. But I'll talk about that in my next post.
More posts in "Code through the looking glass" series:
- (18 Mar 2016) And a linear search to rule them
- (17 Mar 2016) Sorting large data sets
- (16 Mar 2016) I'm the God of all OOP
- (15 Mar 2016) All you need is a dictionary (and O(N) )
- (14 Mar 2016) Finding today's log file
Comments
Okay, I have to ask. Did someone ever try to use a sqlite as a solution? It's very hard to beat the speed of the like function of an sqllite database. I've tried and lost on several occasions..
Or is that considered cheating ;-)
p.s. Can you please restore linebreaks in the comment preview option?
Dave, We want you to do that without the use of external libraries. And it is quite easy to beat LIKE. It has to do a lot of calculations that you can avoid if you know what you need upfront
if all data fits in memory, and you don't have multi-servers / synchronizations problem, and you don't care about startup time, the straight forward solution is just to hold a dictionary (with ignore case comparer). The dictionary data structure is one of the basic foundations in computer science and learned at the first year of any computer science degree. where do you find this candidates? do you confirm that they have a degree from known university? i have to admit, this implementation is so bizarre
Uri, Indeed, the purpose of this test is to get an implementation just like that. We explicitly note that this can fit entirely in memory. And most implementations are pretty much boring "fill the dictionary", "get the values out of the dictionary".
In this case, the candidate is a graduate of a major university, yes. To be perfectly honest, this is the kind of question that I would expect people to be able to solve in high school.
Without external libraries? Parsing a csv file is not as trivial as using reader.ReadLine and string.Split(). I would use http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader to parse the csv file, then a Dictionary<string, List<int>>.
This is what you expect form the candidate: https://gist.github.com/jesuslpm/512ddc1fecaef4644709 ?
Jesus, We make allowances for that. The particular files we hand out can be parsed using a split.
Jesus, Pretty much, that kind of code is stupidly simple, but it does the work and is quite efficient.
The whole point of this particular question is to see whatever people are actually aware of basic data structures, and if you are capable of writing a bit of code without making utter mess of things. In particular, you'll be surprised at how often we see utterly ridiculous code for reading the file itself.
Almost the same as Jesus, but I prefer Linq above using a for/foreach (though I don't know which one preforms the best)
var transformed = ( from line in userData group line by new { line.Email } into groupedByMail select new { Email = groupedByMail.Key.Email UserIds = groupedByMail.Select(a => a.Id) // Maybe even .Distinct(), but since it's an Id, it suggests it's unique }) .ToDictionary(a => a.Email, a => a.UserIds);
XiniX00, That would work as well. It is a bit more expensive, but that would certainly be an acceptable solution
Comment preview