Making code faster: The interview question

Nov 11 2016

Interview questions are always tough to design. On the one hand, you need to create something that is not trivial to do, and on the other hand, you have a pretty hard time limit for reaching a reasonable solution. For example, while implementing a linked list is something that I would expect anyone to be able to do in an interview, implementing a binary tree (including the balancing) is probably not going to be feasible.

Interview tasks (that a candidate can do at home) are somewhat easier, because you don’t have the same time constraints, but at the same time, if you ask for something that takes a week to write, candidates will skip the question and the position entirely. Another issue here is that if you ask a candidate to send a binary tree as an interview task, they are going to google –> copy & paste –> send, and you learn absolutely nothing*.

* Oh, sometimes you learn quite a lot: if a candidate cannot do that, they have pretty much disqualified themselves, but we could find that out more easily with Fizz Buzz, after all.

So I came up with the following question: we have the following file (the full data set is 276 MB), which contains the entry / exit log of a parking lot.

The first value is the entry time, the second is the exit time and the third is the car id.

Details about this file: this is a UTF-8 text file with space-separated values, using Windows line endings.

What we need to do is to find out how much time a car spent in the lot based on this file. I started out by writing the following code:

You can find the full file here. The only addition there is that we measure just how much this costs us.

This code processes the 276 MB file in 30 seconds, using a peak working set of 850 MB and allocating a total of 7.6 GB of memory.
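
To make those numbers concrete, here is a minimal sketch of the kind of naive version the comments below describe: all lines loaded up front, a record type whose properties split and parse the line on every access, and a LINQ grouping. This is not the original listing. The type and member names are assumptions, and so is the sample line layout (two `yyyy-MM-ddTHH:mm:ss` timestamps and an 8-digit id, separated by spaces).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical record type (an assumption, not the post's code): every property
// access re-splits the line and re-parses one field, so each access allocates
// a new string[] plus its substrings.
public class NaiveRecord
{
    private readonly string _line;

    public NaiveRecord(string line)
    {
        _line = line;
    }

    public DateTime Start => DateTime.Parse(_line.Split(' ')[0], CultureInfo.InvariantCulture);
    public DateTime End => DateTime.Parse(_line.Split(' ')[1], CultureInfo.InvariantCulture);
    public long Id => long.Parse(_line.Split(' ')[2], CultureInfo.InvariantCulture);
    public TimeSpan Duration => End - Start;
}

public static class NaiveSummary
{
    // Groups all records by car id and sums the time spent in the lot, in seconds.
    public static Dictionary<long, double> TotalSeconds(string[] lines)
    {
        return lines
            .Select(line => new NaiveRecord(line))
            .GroupBy(r => r.Id)
            .ToDictionary(g => g.Key, g => g.Sum(r => r.Duration.TotalSeconds));
    }
}
```

Calling `NaiveSummary.TotalSeconds(File.ReadAllLines(path))` keeps every line in memory at once and creates several short-lived strings per property access, which is the sort of pattern that drives the total allocation figure far above the file size.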

I’m pretty sure that we can do better. And that is the task we give to candidates.

This has the nice advantage that this is a pretty small and compact problem, but to improve upon it you actually need to understand what is going on under the covers.

I’ll discuss possible optimizations for this next week.

Comments

11 Nov 2016
11:09 AM

Ryan Heath

Few points:

ReadAllLines has to go, you do not want to read the entire file in memory
lambda properties have to go, you should split the line only once in the c'tor
*instead of using linq, use a dictionary to hold the sum per id; if (and only if) the dictionary grows too large, you could use files to calculate the sum per id

// Ryan

11 Nov 2016
11:11 AM

Oren Eini

Ryan,
What do you mean by the last one about dictionary and files?

11 Nov 2016
11:15 AM

Rytis Damalakas

Am I missing something?

ReadLine (record);
carDurations[record.carId] += record.timeTo - record.timeFrom;
repeat until EOF;

saveToFile(carDurations);

Of course we can optimise data reading and avoid allocating a string for every line, but I don't get where the 7 GB come from :)
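
The loop above can be sketched in C# roughly as follows. It is a minimal illustration, assuming each line holds two `yyyy-MM-ddTHH:mm:ss` timestamps and a numeric id separated by single spaces; the class and method names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class StreamingSummary
{
    private const string TimestampFormat = "yyyy-MM-dd'T'HH:mm:ss";

    // Keeps one running total (in ticks) per car id; only one line is alive at a time.
    public static Dictionary<long, long> TotalTicks(IEnumerable<string> lines)
    {
        var totals = new Dictionary<long, long>();
        foreach (var line in lines)
        {
            var parts = line.Split(' '); // split once per line, not once per field access
            var start = DateTime.ParseExact(parts[0], TimestampFormat, CultureInfo.InvariantCulture);
            var end = DateTime.ParseExact(parts[1], TimestampFormat, CultureInfo.InvariantCulture);
            long id = long.Parse(parts[2], CultureInfo.InvariantCulture);

            long soFar;
            totals.TryGetValue(id, out soFar); // soFar stays 0 for an id not seen before
            totals[id] = soFar + (end - start).Ticks;
        }
        return totals;
    }
}
```

Passing `File.ReadLines(path)` (which enumerates lazily) rather than `File.ReadAllLines(path)` keeps the working set small. The per-line `Split` and parse calls still allocate, so the total allocated figure remains well above zero.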

11 Nov 2016
11:17 AM

Oren Eini

Rytis,
That isn't 7 GB memory used, that is 7 GB memory allocated

11 Nov 2016
11:35 AM

Jason Evans

Using lambdas for properties should have no performance impact, since the IL compiles to the same as if the property was declared without a lambda.

public long Id1 => long.Parse(_line.Split(' ')[2]);

public long Id2
{
    get
    {
        return long.Parse(_line.Split(' ')[2]);
    }
}

.property instance int64 Id1() 
{
  .get instance int64 ConsoleApplication1.Record::get_Id1() 
} // end of property Record::Id1

.property instance int64 Id2() 
{
  .get instance int64 ConsoleApplication1.Record::get_Id2() 
} // end of property Record::Id2

11 Nov 2016
11:38 AM

ren

Read a line at a time and keep a Dictionary<int, int> /<car id, time in ms>/ and update dictionary accordingly, fast and easy!

11 Nov 2016
12:04 PM

Sergey

You shouldn't read the whole file at once.
Read it line by line :)

11 Nov 2016
12:11 PM

Ryan Heath

Hmm, the markdown didn't work?

Obviously the way to go is to use a dictionary, but if there are a lot of unique ids, you could write intermediate sums to a few files, merge the files until you have one file.

But that in itself is probably a whole different domain problem altogether ...

// Ryan

11 Nov 2016
12:22 PM

Ryan Heath

@ Jason, the properties are accessed multiple times, each time you will split a string resulting in small strings.
Each time allocating memory that you are going to throw away ...

// Ryan

11 Nov 2016
12:23 PM

kem

replacing File.ReadAllLines with File.ReadLines
String.Split with substring
and using AsParallel and Dictionary<long,long> for results reduces
memory usage from 832MB to 0.39MB and
time from 45 seconds to 12 seconds

11 Nov 2016
12:24 PM

Tomasz

How much time is a candidate given to do it in an interview?

11 Nov 2016
12:26 PM

Uri

It really depends on what your limits are. If you want low memory usage, you would have to read the file twice. If you want performance, and assuming you have no memory limit for the dictionary, just keep the totals in a dictionary keyed by id and add up the sums.

11 Nov 2016
12:33 PM

Jesús López

Read the file in binary mode; records are fixed size, so you can read each line into the same byte array. Manually parse each record from the byte array, taking advantage of the ASCII table: digit = byte - 48 (digits in UTF-8 are encoded like in ASCII). Use a Dictionary<int, long> for aggregating:

foreach (var record in records) {
	long existingDuration;
	// TryGetValue leaves existingDuration at 0 when the id has not been seen yet
	dic.TryGetValue(record.id, out existingDuration);
	dic[record.id] = existingDuration + record.Duration;
}
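
The "digit = byte - 48" step could look something like this. It is only a sketch: the helper name is hypothetical, and it assumes the field contains nothing but ASCII digits (no sign, no whitespace, no validation).

```csharp
public static class AsciiDigits
{
    // Parses `count` ASCII digits starting at `offset` without creating any string.
    // Assumes every byte in the range is between '0' (48) and '9' (57).
    public static int ParseInt(byte[] buffer, int offset, int count)
    {
        int value = 0;
        for (int i = 0; i < count; i++)
        {
            value = value * 10 + (buffer[offset + i] - (byte)'0');
        }
        return value;
    }
}
```

With a 50-byte record laid out as 19 bytes of timestamp, a space, 19 bytes of timestamp, a space, 8 id digits and CR LF, the id starts at offset 40, so `AsciiDigits.ParseInt(record, 40, 8)` yields the car id.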

11 Nov 2016
12:51 PM

Alwin

First thing I would try and measure is change to File.ReadLines.
Second to change all Record props to get; private set; and set the props in the Record ctor.

11 Nov 2016
12:52 PM

Jesús López

Where can I find the 276MB file?

11 Nov 2016
12:53 PM

Jason Evans

@Ryan Ahh, right - I wasn't talking about the implementation of the property itself (or how they are used), simply that using a lambda to declare it results in the same IL as not using one, in response to this comment:

lambda properties have to go

Initially, when I read that comment, it seemed that lambda syntax was being singled out as an issue. I appreciate now that you meant the usage of the properties, potentially in multiple calls, could be a perf problem that needs looking into.

11 Nov 2016
13:56 PM

hangy

I'm not sure which order .NET uses to parse dates with DateTime.Parse, but as the date format seems to be fixed, it should be faster to use DateTime.ParseExact instead of having .NET guess the correct format.

DateTime.ParseExact("2015-01-01T14:01:02", "yyyy-MM-dd'T'HH:mm:ss", System.Globalization.CultureInfo.InvariantCulture, System.Globalization.DateTimeStyles.AssumeLocal)

Apart from the fact that the timestamps and car id should probably not be parsed every time the property is accessed, it is most likely even worse that _line.Split(' ') is run on every getter. :)

11 Nov 2016
15:03 PM

Van

Since the content size is fixed, use a stream reader and read only the text needed instead of relying on ReadAllLines or ReadLines. That will remove a lot of string allocations from read lines and splits.

11 Nov 2016
15:29 PM

Alex Davidson

Before even running a profiler, I'd fix the obvious algorithmic inefficiency and process line-by-line with running totals.
Next step might be to eliminate any repeated work, eg. Split calls, assuming any remained after the first step.
Then I'd apply the profiler.

My gut says the DateTime.Parse calls are likely to cost more CPU time than string allocations.
While we are doing a lot of allocations, I would by now expect I/O to be dominating in the profiler.

Regarding the use of a fixed-size buffer: the file is specified to be UTF-8, not necessarily ASCII. It's probably reasonable to assume that it contains no non-ASCII characters, but I'd still want clarification on that point before assuming I could handle it as such!

11 Nov 2016
15:59 PM

Oren Eini

Tomasz,
Typically, 10 - 60 minutes if done on site, and a few days if at home.
The quality of the solution has to match, though

11 Nov 2016
16:00 PM

Oren Eini

Jesús,
That is a possibility, and explicitly why the file is designed in such a manner.
However, you'll probably be surprised to learn that at this point, dictionary accesses are 50%+ of your time

11 Nov 2016
16:02 PM

Oren Eini

Jesus,
https://drive.google.com/file/d/0B-GYDT6Flp-saDFrdXQzQVRFZlE/view?usp=sharing

11 Nov 2016
16:07 PM

Arseny Kapoulkine

Is it possible to get the 276 MB file in question?

11 Nov 2016
16:08 PM

Oren Eini

Arseny,
See the comment,
https://drive.google.com/file/d/0B-GYDT6Flp-saDFrdXQzQVRFZlE/view?usp=sharing

11 Nov 2016
18:55 PM

Adam Weigert

About 45MB max memory usage and 20 seconds or so under the profiler...

static void Main(string[] args)
{
    var path = args[0];

    var summary = new Dictionary<string, TimeSpan>();

    const int TimestampSize = 19;
    const int IdSize = 8;
    const int NewLineSize = 2;
    const int EnterOffset = 0;
    const int ExitOffset = EnterOffset + TimestampSize + 1;
    const int IdOffset = ExitOffset + TimestampSize + 1;

    var buffer = new char[48];
    var str = new StringBuilder(TimestampSize, TimestampSize);

    using (var stream = File.OpenRead(path))
    using (var reader = new StreamReader(stream))
    {
        while (!reader.EndOfStream)
        {
            reader.ReadBlock(buffer, 0, buffer.Length);

            str.Clear();
            str.Append(buffer, EnterOffset, TimestampSize);

            var enter = DateTime.ParseExact(str.ToString(), "yyyy-MM-ddTHH:mm:ss", CultureInfo.InvariantCulture);

            str.Clear();
            str.Append(buffer, ExitOffset, TimestampSize);

            var exit = DateTime.ParseExact(str.ToString(), "yyyy-MM-ddTHH:mm:ss", CultureInfo.InvariantCulture);

            str.Clear();
            str.Append(buffer, IdOffset, IdSize);

            var id = str.ToString();
            var duration = exit - enter;

            if (summary.ContainsKey(id))
                summary[id] += duration;
            else
                summary.Add(id, duration);

            reader.ReadBlock(buffer, 0, NewLineSize);
        }
    }

    using (var stream = File.CreateText("summary.txt"))
        foreach (var stat in summary)
            stream.WriteLine($"{stat.Key:D10} {stat.Value:c}");
}

11 Nov 2016
23:01 PM

Michael

So I had some spare time tonight and since I've been optimizing code all week I thought I give it a try.

This is how the given code performed on my laptop (Dell M6700 i7-3940XM), I took the best of three runs:
Code from blog: "Took: 38.051 ms and allocated 9.661.701 kb with peak working set of 1.167.448 kb"
Code from the 7-Zip File: "Took: 33.425 ms and allocated 4.587.464 kb with peak working set of 1.260.084 kb"

So here is my code. I went a little crazy. It's neither very readable nor maintainable, but you did not ask for pretty, you asked for fast and memory efficient:

    public class MySolution6
    {
        public const int cRecordSize = 50;
        public const byte cZero = (byte) '0';

        static void Main( string[] args )
        {
#if !CORECLR
            AppDomain.MonitoringIsEnabled = true;
#endif
            byte[] readBuffer = new byte[ cRecordSize ];
            var sp = Stopwatch.StartNew();

            int len;
            Dictionary<int, Record> records = new Dictionary<int, Record>();
            using( FileStream f = new FileStream( args[ 0 ], FileMode.Open, FileAccess.Read, FileShare.Read, 65536, FileOptions.SequentialScan ) )
            {
                using( BinaryReader b = new BinaryReader( f, Encoding.ASCII ) )
                {
                    while( ( len = b.Read( readBuffer, 0, 50 ) ) > 0 )
                    {
                        int id;
                        long ticks;
                        ProcessInput( ref readBuffer, out ticks, out id );
                        Record r1;
                        if( records.TryGetValue( id, out r1 ) )
                        {
                            r1.Ticks += ticks;
                            records[ r1.Id ] = r1;
                        }
                        else
                        {
                            records.Add( id, new Record
                            {
                                Id = id,
                                Ticks = ticks
                            } );
                        }
                    }
                }
            }
            byte[] output = new byte[ 23 ];
            using( FileStream f = new FileStream( "summary_mySolution6.txt", FileMode.Create, FileAccess.Write, FileShare.None, 16384, FileOptions.SequentialScan ) )
            {
                using( BinaryWriter b = new BinaryWriter( f ) )
                {
                    foreach( var r in records.Values )
                    {
                        int i = 0;
                        WriteOutput( r, ref output, ref i );
                        b.Write( output, 0, i );
                    }
                }
            }

#if CORECLR
            Console.WriteLine( $"Took: {sp.ElapsedMilliseconds:#,#} ms and a peak working set of {Process.GetCurrentProcess().PeakWorkingSet64 / 1024:#,#} kb" );
#else
            Console.WriteLine( $"Took: {sp.ElapsedMilliseconds:#,#} ms and allocated {AppDomain.CurrentDomain.MonitoringTotalAllocatedMemorySize / 1024:#,#} kb with peak working set of {Process.GetCurrentProcess().PeakWorkingSet64 / 1024:#,#} kb" );
#endif

        }

        private static void ProcessInput( ref byte[] theInput, out long theTicks, out int theId )
        {
            var start = ParseDateTime( ref theInput, 0 );
            var end = ParseDateTime( ref theInput, 20 );
            theTicks = ( end - start ).Ticks;
            theId = ParseId( ref theInput, 40 );
        }

        private static DateTime ParseDateTime( ref byte[] theInput, int theIndex )
        {
            int year = ( theInput[ theIndex++ ] - cZero ) * 1000 +
                       ( theInput[ theIndex++ ] - cZero ) * 100 +
                       ( theInput[ theIndex++ ] - cZero ) * 10 +
                       ( theInput[ theIndex++ ] - cZero );
            theIndex++; // Skip -
            int month = ( theInput[ theIndex++ ] - cZero ) * 10 +
                       ( theInput[ theIndex++ ] - cZero );
            theIndex++; // Skip -
            int day = ( theInput[ theIndex++ ] - cZero ) * 10 +
                       ( theInput[ theIndex++ ] - cZero );
            theIndex++; // Skip T
            int hours = ( theInput[ theIndex++ ] - cZero ) * 10 +
                       ( theInput[ theIndex++ ] - cZero );
            theIndex++; // Skip :
            int minutes = ( theInput[ theIndex++ ] - cZero ) * 10 +
                       ( theInput[ theIndex++ ] - cZero );
            theIndex++; // Skip :
            int seconds = ( theInput[ theIndex++ ] - cZero ) * 10 +
                       ( theInput[ theIndex ] - cZero );
            return new DateTime( year, month, day, hours, minutes, seconds );
        }

        private static int ParseId( ref byte[] theInput, int theIndex )
        {
            int result = 0;
            for( int i = 0 ; i < 8 ; i++ )
            {
                result *= 10;
                result += ( theInput[ theIndex++ ] - cZero );
            }
            return result;
        }

        private static void WriteOutput( Record theRecord, ref byte[] theOutput, ref int theIndex )
        {
            WriteId( ref theRecord, ref theOutput, theIndex );
            var i = theIndex + 10;
            theOutput[ i++ ] = 32;
            var ch = TimeSpan.FromTicks( theRecord.Ticks ).ToString().ToCharArray();
            for( var j = 0 ; j < ch.Length ; j++ )
            {
                theOutput[ i + j ] = (byte) ch[ j ];
            }
            i += ch.Length;
            theOutput[ i++ ] = (byte) '\r';
            theOutput[ i++ ] = (byte) '\n';
            theIndex = i;
        }

        private static void WriteId( ref Record theRecord, ref byte[] theOutput, int theIndex )
        {
            var d = theRecord.Id;
            for( int i = 9 ; i > -1 ; i-- )
            {
                var r = d % 10;
                d = d / 10;
                theOutput[ theIndex + i ] = (byte) ( cZero + r );
            }
        }

        struct Record
        {
            public int Id { get; set; }
            public long Ticks { get; set; }
        }

    }

Here are the numbers: "Took: 1.300 ms and allocated 65.706 kb with peak working set of 33.080 kb"
So compared to the blog version it runs about 29.5 times faster, allocates 0.7% of the memory and has 3% of the peak working set.

Maybe I should give some explanations:

1. We know it's (probably) a machine-generated file; record length is fixed, 50 bytes per line including newline characters.
2. The file is UTF-8 (no BOM), but it's just using ASCII characters, so we can treat it as binary, one char = one byte.
3. Opening the file with FileOptions.SequentialScan and a larger buffer size might make things a little faster.
4. The date format is fixed, so it is easy to parse it ourselves. No need for DateTime.Parse. Same goes for Id.
5. Id is 8 digits, so we don't need a long. Int is sufficient.
6. There is no need for Record to be a class type, we can make it a struct.
7. We don't need to store Start and End timestamps since we are interested in the duration anyway. But since we need to do a little math, we store ticks.
8. Linq grouping isn't really fast, much better to store our records in a dictionary and do the grouping operation (i.e. the addition) ourselves.
9. Our output file is UTF-8, but we just use ASCII characters. So we can write ASCII characters as bytes.
10. We know each output line is max. 23 bytes, so allocate once and reuse.
11. Don't try this at home.

12 Nov 2016
05:14 AM

Arseny Kapoulkine

Nice question! Here's my solution: https://gist.github.com/zeux/999654a6ce2a32e219a2ed0f1f0fa935 (sorry, I don't work in .NET any more...)

It clocks at around 185 ms (=> 1.5 GB/s). It should be possible to make it faster but unfortunately VS 15 Preview 5 doesn't seem to have a functioning profiler and I've already spent an hour on this so I'll leave it as is :)

mktime() is surprisingly slow. Based on Michael's results .NET's DateTime is actually much better - if I remove the same-day optimization my code takes 2.8 sec.

Record validation is compiled out since it's in an assertion; compiling it in reduces the time to 320 ms. Validation code could be optimized as well.

12 Nov 2016
05:16 AM

Arseny Kapoulkine

P.S. mmap is actually pretty useless here - it saves about 20 ms based on my measurements. Not really worth it but I left it there because why not.

12 Nov 2016
08:16 AM

Oren Eini

Adam,
At a guess, you'll find that quite a lot of your time is going to string allocation and parsing.
But a lot of that is also going to the dictionary

12 Nov 2016
08:17 AM

Oren Eini

Michael,
You can probably reduce the time further by removing the new DateTime calls, they are expensive.
And you can write the date time without generating a string each time
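
One way to write a duration without building a string each time is to emit the digits straight into a reusable byte buffer. The sketch below is an assumption about how that could be done, not the code used in the post; the names are hypothetical, and it only handles non-negative durations with whole-second resolution, for which the default `TimeSpan` text is `[d.]hh:mm:ss`.

```csharp
public static class DurationWriter
{
    // Writes totalSeconds as ASCII "[d.]hh:mm:ss" into buffer at offset.
    // Returns the number of bytes written. Assumes totalSeconds >= 0.
    public static int Write(long totalSeconds, byte[] buffer, int offset)
    {
        int start = offset;
        long days = totalSeconds / 86400;
        int rest = (int)(totalSeconds % 86400);

        if (days > 0)
        {
            // Count the digits first so they can be filled in from the right.
            int digits = 1;
            for (long d = days; d >= 10; d /= 10) digits++;
            for (int i = digits - 1; i >= 0; i--)
            {
                buffer[offset + i] = (byte)('0' + (int)(days % 10));
                days /= 10;
            }
            offset += digits;
            buffer[offset++] = (byte)'.';
        }

        offset = WriteTwoDigits(rest / 3600, buffer, offset);
        buffer[offset++] = (byte)':';
        offset = WriteTwoDigits(rest / 60 % 60, buffer, offset);
        buffer[offset++] = (byte)':';
        offset = WriteTwoDigits(rest % 60, buffer, offset);
        return offset - start;
    }

    private static int WriteTwoDigits(int value, byte[] buffer, int offset)
    {
        buffer[offset] = (byte)('0' + value / 10);
        buffer[offset + 1] = (byte)('0' + value % 10);
        return offset + 2;
    }
}
```

The same buffer can be reused for every output line, so the per-record `TimeSpan.ToString()` allocation disappears.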

12 Nov 2016
08:21 AM

Oren Eini

Arseny,
Nice, I wasn't aware that you could do overloading based on the size of the array, but this makes the code a pleasure to read.
Note that in .NET, a lot of the performance goes to actually writing this out as string, though. So I'll probably need to optimize that as well

12 Nov 2016
09:15 AM

Oren Eini

Arseny,

However, note that your code has a major difference in behavior.
You are assuming that the number of ids is small. And in fact, the highest id value in the sample file is 203,220.
That means that your array is under 1 MB in size, and you save all the dictionary lookups.
However, if you consider the size of the id, it has 8 chars, so it can be a max of 99,999,999, in which case you'll be using 400 MB of RAM or so.
Doable, but quite expensive.

12 Nov 2016
09:15 AM

Paul

Looks interesting but I cannot find the time to try.
Will it make a lot of difference if you check if the date is identical on byte level (which it probably is a lot of the times in a parking garage), and then compare the time manually on byte level?
This will probably save a lot of DateTime allocation
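
A sketch of that idea, assuming the 50-byte record layout discussed above (first timestamp at offset 0, second at offset 20, both `yyyy-MM-ddTHH:mm:ss`); the names are hypothetical:

```csharp
public static class SameDay
{
    // If both timestamps share the same date bytes, computes the duration in
    // seconds from the time-of-day digits alone and returns true.
    // Otherwise returns false so the caller can fall back to full date handling.
    public static bool TryGetSeconds(byte[] record, out int seconds)
    {
        // Date parts occupy bytes 0..9 and 20..29.
        for (int i = 0; i < 10; i++)
        {
            if (record[i] != record[20 + i])
            {
                seconds = 0;
                return false;
            }
        }
        // Time parts (HH:mm:ss) start at bytes 11 and 31.
        seconds = SecondsOfDay(record, 31) - SecondsOfDay(record, 11);
        return true;
    }

    private static int SecondsOfDay(byte[] r, int o)
    {
        int hours = (r[o] - '0') * 10 + (r[o + 1] - '0');
        int minutes = (r[o + 3] - '0') * 10 + (r[o + 4] - '0');
        int secs = (r[o + 6] - '0') * 10 + (r[o + 7] - '0');
        return (hours * 60 + minutes) * 60 + secs;
    }
}
```

This treats the timestamps as plain wall-clock values within one day, so it ignores any daylight-saving shift, the same simplification the other hand-rolled parsers in this thread make.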

12 Nov 2016
09:17 AM

Oren Eini

Arseny,
Oh, and you output the time in seconds, rather than the TimeSpan format (that actually matters a lot for perf) :-)

12 Nov 2016
11:03 AM

Oren Eini

Paul,
That is an interesting idea, I was able to take that and optimize things even further, thanks

12 Nov 2016
11:34 AM

URi

This thread became very interesting! I'm sure there are even more optimizations, such as writing an external C++ DLL and including asm functions, so that all the parsing and dictionary costs can be even lower, with fewer CPU instructions.

In this case, much of the cost goes to the dictionary lookups, so maybe a smarter dictionary implementation can save some more

Very interesting!

12 Nov 2016
11:38 AM

Uri

Another assumption, we can get the file size and compute how many lines there are, so do the smart dictionary allocations before and with all the required size, this might give another boost

12 Nov 2016
15:17 PM

alex

Well the first question should be what the performance goals are, in terms of wall time, memory usage and cpu load. But given the question we could assume that we want the best possible performance in all these aspects balanced with priority to wall time first, memory usage second and cpu load last.

We have fixed size, fixed format records as input: two timestamps with seconds resolution and an 8 digit id with the alphabet '0'..'9'.

Requested output is an unordered sequential access collection with entries consisting of:

- the 8 digit id; could be represented by a uint (it requires a maximum of 7 nibbles, 27 bits to be exact, to represent 0-99999999).
- a duration with a seconds resolution; could be represented by a uint (2^32 seconds is about 136 years, assuming a longer duration is no longer realistic).

I would:

- use a memmap over the input file, parse byte records with some level of format validity checking.
- custom timestamp parsing to a seconds number, with the duration being the difference between them.
- explore a number of output building options:
  - if we don't mind allocating around 380 MB, we could also simply use an array indexed by the Id (this would not be memory or cache friendly but give us an interesting baseline measurement).
  - the quick easy solution would be to track them in a Dictionary<uint, uint>.
  - we could also note that the information content of an id is max 27 bits and use a (custom) cache conscious compressed trie.
- parallelize work if we find that a significant portion of it is CPU bound.

It is an interesting enough question. I may give this a go.

12 Nov 2016
16:06 PM

Arseny Kapoulkine

Oren,

Re max id size - yeah. Standard hash map is significantly slower and I wasn't up for writing a custom one (which would still be slower of course, but not as much). 400 MB is not too much, and if you preallocate 400 MB the speed doesn't tank dramatically (goes to ~250 ms or so). Obviously you will need 400 MB if the car ids are dense and occupy the entire range - in fact, my solution in this case will occupy far less memory than any other, because for any dictionary you'll pay a very hefty price - OTOH 4b for key, 4b for value, 8b for "next" pointer assuming chained hashing, 8b for the pointer in the bucket array, 8b for the allocation overhead (not on all systems) - so 4x to 6x what I have. Even with a dense hash table it'll still be 2x-3x what I have. So my solution is better for the worst case, which is the case you should be optimizing for :)

Re: timespan output format - fair point. I was pretty surprised that the output portion does not seem to take any meaningful time in my case - I'm used to printf being slow. As I said I don't have a working profiler which limits my ability to analyze this code :( anyway, I think this is fixable without changing the resulting time that much because printf basically has to do roughly the same operations when formatting the number that you'd need to do here, it just does it in decimal as opposed to base 60. I'll update the code to match the output format and we'll see what the impact is.

12 Nov 2016
16:13 PM

Arseny Kapoulkine

Ah, wait, I am not counting the time it takes to output :( with printf it's actually 230 ms. Damn it.

12 Nov 2016
16:45 PM

Arseny Kapoulkine

Updated gist (https://gist.github.com/zeux/999654a6ce2a32e219a2ed0f1f0fa935) with new output code and new validation code - 175 ms now with both output and validation enabled (validation doesn't check that your time format is precisely right, but it does check all separators and digits for sanity).

12 Nov 2016
17:52 PM

Uri

@Arseny, this is quite impressive! I think you optimized almost everything

13 Nov 2016
09:28 AM

Michael

Ok, picking up where I left off, I optimized my solution a bit further.
12. Lose the struct and go for a Dictionary<int,long>, pretty obvious
13. Lose the BinaryReader and BinaryWriter and read and write directly to the filestream
14. If StartDate and EndDate are the same, compute the Ticks directly using

int s1 = ( ( ( hours * 60 ) + minutes ) * 60 ) + seconds;
int s2 = ( ( ( hours2 * 60 ) + minutes2 ) * 60 ) + seconds2;
return ( s2 - s1 ) * TimeSpan.TicksPerSecond;

We are under a second now: Took: 896 ms and allocated 60.807 kb with peak working set of 29.616 kB. Not bad. But let's not stop yet.
15. When writing the output, compute the timespan string ourselves (ripped straight from TimeSpanFormatter)
16. Use DateTime just when the dates have different years (just about 900 cases).
Now the allocated memory is down to 21.914 kB.
It's time for the gloves to come off. Dictionary<,> has to go.
17. We are storing the values in a LinkedList<long[1024][1024]>. Array indices are computed with >> and &. Long[1024] are allocated on the go as needed. Output is written straight from the data (no Enumerator)
18. Some finishing touches: Unroll the loop when parsing the id, remove the ProcessInput function and move its contents to the inner read loop

And here are the numbers: Took: 442 ms and allocated 1.681 kb with peak working set of 13.528 kB (net462, .net core is a little faster)
I'm now 86x faster than the original solution, allocating just 0.02% of the memory, and have a peak working set that is just 1% of the original.
And still I'm using no unsafe code. I know I could lose the LinkedList<> for the given data set, but right now the solution is correct, fast and memory efficient for any well-formed input file, even if it is 10x the size, so removing it would be kind of cheating.
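
The idea of replacing the dictionary with lazily allocated array pages, indexed with `>>` and `&`, can be sketched as below. This is a simplified two-level variant written for illustration, not the code from the pastebin; the class name and the page size of 1024 are assumptions.

```csharp
public sealed class PagedTotals
{
    private const int PageBits = 10;
    private const int PageSize = 1 << PageBits; // 1024 slots per page
    private const int PageMask = PageSize - 1;

    // Top level holds one reference per page; pages are created on first use.
    private readonly long[][] _pages;

    public PagedTotals(int maxId)
    {
        _pages = new long[(maxId >> PageBits) + 1][];
    }

    public void Add(int id, long ticks)
    {
        int pageIndex = id >> PageBits;
        var page = _pages[pageIndex];
        if (page == null)
        {
            page = _pages[pageIndex] = new long[PageSize];
        }
        page[id & PageMask] += ticks;
    }

    // Returns 0 for ids that were never added; a stored total of 0 looks the same.
    public long Get(int id)
    {
        var page = _pages[id >> PageBits];
        return page == null ? 0 : page[id & PageMask];
    }
}
```

For ids up to 99,999,999 the top-level array has 97,657 entries, and each page that is actually touched costs 8 KB, so a file whose ids cluster in a small range stays small while sparse or large ids remain supported.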

Things tried that did not pay off:

Structure Magic
SIMD with System.Numerics.Vector
Multithreading

PS: Bringing C++ to a C# fight. So unfair! ;-)

13 Nov 2016
09:33 AM

Michael

Source code: http://pastebin.com/jHvYNDBc

13 Nov 2016
12:10 PM

Anders Strömberg

This was quite an interesting assignment. I did not reach the optimization levels of some of the others in this post, but I got down to 3 993 kb with peak working set of 13 320 kb. For anyone interested my code is at gist.

Some people seem to roll their own date calculations but that feels quite risky to me. There are so many things to consider such as leap years, daylight savings time (depending on country), leap seconds and even events such as Russia getting rid of daylight savings time.

14 Nov 2016
05:07 AM

Arseny Kapoulkine

Anders, I thought about DST and I believe there are two issues with considering it:

You don't know what country the data is from, so you can't even figure it out
If you did know what country the data is from, the date/time format is inherently ambiguous if the time happens to be the hour of the DST switch since you don't know if it's the first occurrence of that hour

Michael, nice! This actually matches my past experience pretty well - my rule of thumb from the days of working with .NET was that tuned .NET code is roughly 3x slower than tuned C++ code for compute-bound problems, assuming .NET code does not use unsafe. I was thinking of a two-level array structure, guess I'll go ahead and do that as well.

14 Nov 2016
05:29 AM

Arseny Kapoulkine

https://gist.github.com/zeux/90a49b85c8cfdf04ffa5489ec8916271 - 135 ms. Two-level array on its own makes no difference for the time, but allows a nicer structure for multi-threading; this is using 4 threads on a 2-core HT system (same time as 2 threads, really).

14 Nov 2016
07:42 AM

Jesús López

@Ayende. The dictionary can be replaced by an 800 MB array of longs; it would be faster, but it would require more memory:

// One slot per possible 8-digit id: 100,000,000 longs * 8 bytes = 800 MB
var durations = new long[100000000];
foreach (var record in records) {
    durations[record.id] += record.Duration;
}

14 Nov 2016
07:52 AM

Jesús López

I think an array of ints would be enough

14 Nov 2016
14:07 PM

Michael

@Anders: You are absolutely right regarding the date calculations. It's really problematic, which is why I didn't include it in my first version. However, with the given data set there is no difference between using DateTime and calculating it directly, so I was mainly interested in how fast I could make it. At least I adjusted for leap years and used DateTime for cases where the start is in the old year and the end is in the new year, to account for leap seconds. I would not do such an optimization in production code unless I knew it was safe. Let's just pretend it's UTC and they forgot or decided to leave out the Z in the ISO 8601 format. ;-)
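
A rough sketch of the "calculate directly, fall back to DateTime" idea, under simplifying assumptions: timestamps are exactly in the form yyyy-MM-ddTHH:mm:ss with no time zone, and the fast path is only taken when both timestamps share the same calendar date (a narrower rule than the leap-year handling described above). The names DurationSketch and Seconds are made up for the example.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch: hand-rolled arithmetic for same-day records, DateTime for the rest.
public static class DurationSketch
{
    private const string Format = "yyyy-MM-dd'T'HH:mm:ss";

    // Seconds since midnight, read from the HH:mm:ss part at positions 11..18.
    private static int SecondsOfDay(string t)
    {
        int h = (t[11] - '0') * 10 + (t[12] - '0');
        int m = (t[14] - '0') * 10 + (t[15] - '0');
        int s = (t[17] - '0') * 10 + (t[18] - '0');
        return h * 3600 + m * 60 + s;
    }

    public static long Seconds(string start, string end)
    {
        // Fast path: identical "yyyy-MM-dd" prefix, so only the time of day differs.
        if (string.CompareOrdinal(start, 0, end, 0, 10) == 0)
            return SecondsOfDay(end) - SecondsOfDay(start);

        // Slow path: let DateTime handle month lengths and leap years.
        DateTime s = DateTime.ParseExact(start, Format, CultureInfo.InvariantCulture);
        DateTime e = DateTime.ParseExact(end, Format, CultureInfo.InvariantCulture);
        return (long)(e - s).TotalSeconds;
    }
}
```

Neither path knows about DST or leap seconds; both treat the timestamps as zone-less wall-clock values, so the two paths agree with each other but share the caveats discussed in this thread.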

@Arseny: That's a really great solution! I might try your multi-threading approach if I find the time. I tried multi-threading before, but your approach is much better. I guess I was thinking too complicated.

There is one more thing I'd like to share. I tried using unsafe code to do a poor man's variant of SIMD. I pinned the memory for my read buffer and then obtained a ulong* pointer to do some calculations. First I convert the ASCII digits to numbers (-'0'), then I convert the Id to an Int32. Please note that the constants for the subtractions and the AND masks for the Int32 conversion are adjusted for the little-endianness of Intel CPUs. This operation makes the code run a little faster, but it allocates more memory (because of pinned memory?) and the peak working set is higher.

ulong* u = ulongPtr;
// Convert ASCII digits to numbers
*u++ -= 0x0030300030303030;// => 2015-01- = xxxx0xx0
*u++ -= 0x3030003030003030;// => 01T16:44 = xx0xx0xx
*u++ -= 0x3030303000303000;// => :31 2015 = 0xx0xxxx
*u++ -= 0x3000303000303000;// => -01-01T1 = 0xx0xx0x
*u++ -= 0x0030300030300030;// => 9:09:14. = x0xx0xx0
*u -= 0x3030303030303030;// => 00043064 = xxxxxxxx

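// Convert Id to Int32
const ulong lm = 0x0f000f000f000f00;
const ulong hm = 0x000f000f000f000f;
// Fold each digit pair into one byte: bytes 1, 3, 5, 7 now hold values 0..99
*u = ( ( *u & hm ) * 2560 ) + ( *u & lm );
byte* b = (byte*) u;
// Each byte is a two-digit group, so the weights are powers of 100
int id = *( b + 1 ) * 1000000 + *( b + 3 ) * 10000 + *( b + 5 ) * 100 + *( b + 7 );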
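
For anyone who wants to play with the digit-folding trick without pinning or pointers, here is a minimal safe sketch of the Id part only. It assumes a runtime that has System.Buffers.Binary.BinaryPrimitives and Span (for example .NET Core 2.1 or later), and it does no validation, so non-digit input silently produces garbage. The name SwarDigits is made up for the example.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical sketch: parse 8 ASCII digits with a handful of 64-bit operations.
public static class SwarDigits
{
    public static int ParseEightDigits(byte[] buffer, int offset)
    {
        // Little-endian load: the first character lands in the lowest byte.
        ulong v = BinaryPrimitives.ReadUInt64LittleEndian(buffer.AsSpan(offset, 8));

        v -= 0x3030303030303030UL;                     // '0'..'9' -> 0..9 in every byte

        const ulong evenBytes = 0x000f000f000f000fUL;  // digits 0, 2, 4, 6
        const ulong oddBytes  = 0x0f000f000f000f00UL;  // digits 1, 3, 5, 7

        // x2560 = x10 then shift left by one byte, so each odd byte becomes tens*10 + ones.
        v = ((v & evenBytes) * 2560UL) + (v & oddBytes);

        int p0 = (int)((v >> 8) & 0xff);               // digits 0-1 as 0..99
        int p1 = (int)((v >> 24) & 0xff);              // digits 2-3
        int p2 = (int)((v >> 40) & 0xff);              // digits 4-5
        int p3 = (int)((v >> 56) & 0xff);              // digits 6-7

        return p0 * 1000000 + p1 * 10000 + p2 * 100 + p3;
    }
}
```

Because the load is explicitly little-endian, the masks do not depend on the CPU the code runs on. Whether this beats a plain unrolled loop would have to be measured.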

14 Nov 2016
15:18 PM

Salman Ahmed

Oren,

How can I obtain a copy of that 267Mb data file?

14 Nov 2016
17:27 PM

alex

With some unsafe code (see the approach I outlined earlier) and a significant amount of validity checking, it looks like performance might not be too far from a C++ implementation (@Arseny) and not much faster than optimized C# code that does not require "unsafe" constructs (@Michael). Numbers for the code in this gist:

Running Alt.FlexArray ... Took: 260 ms and allocated 1,872 kb with peak working set of 295,324 kb
Running Reference ... Took: 40,252 ms and allocated 4,586,020 kb with peak working set of 1,267,760 kb

So the Reference version is 155 times slower, allocates 2450 times more memory and has a 4.29 times higher peak working set.

The "FlexArray" (basically a single-level trie, or "jagged array", using the first 12 bits for its first level) performs significantly better than a regular pre-allocated fixed-size array uint[100000000] or a Dictionary<uint,uint>.

I would expect that a proper cache conscious trie implementation would improve this even more, because (I think) we are seeing primarily processor cache effects making "FlexArray" faster.

I also expect performance improvements from @Michael's "bit hacking magic" above. It is similar to the approach I used in an earlier "Ayende Challenge" w.r.t. Etag parsing (see https://ayende.com/blog/169796/excerpts-from-the-ravendb-performance-team-report-etags-and-evil-code-part-i).

14 Nov 2016
19:51 PM

Michael

@alex: Great solution! I ran it on my computer and it was a little bit faster with about 225ms. But I took a page out of your book and improved my solution and ended up with 157ms. If you want to know how, check the next blog entry (The obvious costs).

15 Nov 2016
13:26 PM

DNF

On a consumer-grade SSD, just reading this file from disk will take about 0.5 sec (276 MB at 550 MB/sec), so you need a RAM disk to make it faster.
You should be able to parallelize it if you read in blocks of 64K (or so) and process each block in one thread (record size is 50 bytes). So you have one thread which reads and stores what it read into a kind of queue (disruptor pattern, non-blocking), and others which take 64K blocks from that queue and process them into their own per-thread data storage, again non-blocking. Once the threads are done, you combine the results from each thread into the final result. This assumes you can read as fast as you can process the data.
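
The "own storage per thread, combine at the end" part can be sketched like this. It is only an illustration of the merge pattern: it works on already parsed ids and durations held in memory rather than on 64K file blocks, it uses Parallel.For with thread-local state instead of a disruptor-style queue, and the name ChunkedAggregation is made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical sketch: each worker sums into a private dictionary; results are merged once per worker.
public static class ChunkedAggregation
{
    public static Dictionary<int, long> Sum(int[] ids, long[] durations, int chunkSize)
    {
        var total = new Dictionary<int, long>();
        int chunks = (ids.Length + chunkSize - 1) / chunkSize;

        Parallel.For(0, chunks,
            () => new Dictionary<int, long>(),          // private storage per worker
            (chunk, state, local) =>
            {
                int start = chunk * chunkSize;
                int end = Math.Min(start + chunkSize, ids.Length);
                for (int i = start; i < end; i++)
                {
                    local.TryGetValue(ids[i], out long sum);
                    local[ids[i]] = sum + durations[i];
                }
                return local;                           // no shared state touched here
            },
            local =>
            {
                lock (total)                            // the only synchronized step
                {
                    foreach (var pair in local)
                    {
                        total.TryGetValue(pair.Key, out long sum);
                        total[pair.Key] = sum + pair.Value;
                    }
                }
            });

        return total;
    }
}
```

The lock is taken once per worker rather than once per record, so contention stays low; whether the merge cost eats the gain depends on how many distinct ids there are.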

15 Nov 2016
13:36 PM

Oren Eini

DNF,
Because we are running this multiple times, the typical scenario is that the file is in the FS cache, so it isn't subject to the I/O limitation

15 Nov 2016
14:11 PM

DNF

@Ayende: I knew that. The point was that IO may impact tests/processing, and depending on how it impacts them you will need to change the parsing strategy. The other point is that it should be possible to have a parallel algorithm which is faster than the single-threaded one if you can avoid blocking/sharing data between threads, since others were getting the opposite results. With the cache or a RAM disk you can probably open the file for reading in each thread and seek to a position (increment the position by the read block size of 64K or more, seek, read, process, in each thread), and that position can be the only thing shared across threads. Obviously you don't need more threads than you have CPUs.

16 Nov 2016
23:23 PM

alex

The cost of performing a reasonable amount of validation of the input (input values that are in range of their respective domains and formats that match expectation), for the solution I posted above is around 50 ms, i.e. around 20% of total run time. I would still consider that well worth it.

21 Nov 2016
19:48 PM

Dan

Do you allow candidates to browse the internet when working on this question?

22 Nov 2016
13:02 PM

Oren Eini

Dan,
Yes, of course

06 Jan 2017
17:54 PM

Brad Wood

I took this log file as an exercise in my ongoing learning of F#. The language doesn't matter, but I was shocked by this huge perf gain just due to the overall algorithm. The problem was to return the time span for a given id/date. Here's how the exercise progressed:

Create tuples out of the lines (granted, wouldn't scale to larger file). Pick the correct line by checking for id first, then date. 25 seconds.
Same as first try except simply using a different method on the F# sequence type (the first method apparently didn't short-circuit after matching). 12 seconds.
Same as above method, but parallelize the file into chunks. Was quite surprised to see so little gain; would've thought that the thread overhead would be minimal compared to gains. Worked best at 6-15 chunks: 6 seconds.
Gather lines as simple string arrays splitting on space. Only return those line/arrays that contain the matching id. Reduce to matching date. This ran subsecond, nearly instantaneous. I was incredulous and had to debug to ensure that it really was searching the entire file.

02 Feb 2017
10:09 AM

Adam

FizzBuzz can get pretty interesting sometimes!

http://joelgrus.com/2016/05/23/fizz-buzz-in-tensorflow/

Oren Eini

CEO of RavenDB
