Interview questions are always tough to design. On the one hand, you need to create something that will not be trivial to do, and on the other hand, you have a pretty much hard time limit to a reasonable solution. For example, while implementing a linked list is something that I would expect anyone to be able to do in an interview, implementing a binary tree (including the balancing), is probably not going to be feasible.
Interview tasks (that candidate can do at home) are somewhat easier, because you don’t have the same time constraints, but at the same time, if you ask for something that takes a week to write, candidates will skip the question and the position entirely. Another issue here is that if you ask a candidate to send a binary tree as a interview task, they are going to google –> copy & paste –> send, and you learn absolutely nothing*.
* Oh, sometimes you learn quite a lot, if a candidate cannot do that, they are pretty much disqualified themselves, but we could do that more easily with Fizz Buzz, after all.
So I came up with the following question, we have the following file (the full data set is 276 MB), that contains the entry / exit log to a parking lot.
The first value is the entry time, the second is the exit time and the third is the car id.
Details about this file: This is UTF8 text file with space separated values using Windows line ending.
What we need to do is to find out how much time a car spent in the lot based on this file. I started out by writing the following code:
You can find the full file here. The only additional stuff is that we measure just how much this cost us.
And this code process the 276MB file in 30 seconds, using a peak working set of 850 MB and allocating a total of 7.6 GB of memory.
I’m pretty sure that we can do better. And that is the task we give to candidates.
This has the nice advantage that this is a pretty small and compact problem, but to improve upon it you actually need to understand what is going on under the covers.
I’ll discuss possible optimizations for this next week.
More posts in "Making code faster" series:
- (24 Nov 2016) Micro optimizations and parallel work
- (23 Nov 2016) Specialization make it faster still
- (22 Nov 2016) That pesky dictionary
- (21 Nov 2016) Streamlining the output
- (18 Nov 2016) Pulling out the profiler
- (17 Nov 2016) I like my performance unsafely
- (16 Nov 2016) Going down the I/O chute
- (15 Nov 2016) Starting from scratch
- (14 Nov 2016) The obvious costs
- (11 Nov 2016) The interview question