The cost of abstraction - Part III

time to read 3 min | 479 words

Here is another thing to note: computers are fast. I can't tell you how fast, because it would take too long. Thinking about micro-performance is a losing proposition. Sasha asked an important question:

Now assume you have a non-functional requirement saying that you must support 1,000,000 messages per second inserted into this queue. Would you still disregard the fact using an interface is a BAD decision?

My answer: yes. The reason? Let us take a look at the slowest method I could think of for an in-process queue:

public static void Main(string[] args)
{
	Queue<string> queue = new Queue<string>();
	ThreadPool.QueueUserWorkItem(delegate(object state)
	{
		Stopwatch startNew = Stopwatch.StartNew();
		for (int i = 0; i < 100000000; i++)
		{
			lock(queue)
			{
				bool queueEmpty = queue.Count == 0;
				queue.Enqueue("test");
				if(queueEmpty)
					Monitor.Pulse(queue);
			}
		}
		Console.WriteLine("Done publishing in: " + startNew.ElapsedMilliseconds);
	});
	ThreadPool.QueueUserWorkItem(delegate(object state)
	{
		Stopwatch startNew = Stopwatch.StartNew();
		for (int i = 0; i < 100000000; i++)
		{
			lock (queue)
			{
				while (queue.Count == 0)
					Monitor.Wait(queue);
				queue.Dequeue();
			}
		}
		Console.WriteLine("Done reading in: " + startNew.ElapsedMilliseconds);
	});
	Console.ReadLine();
}

On my machine, this outputs:

Done publishing in: 23044 (ms)
Done reading in: 26866 (ms)

Which means that it processed one hundred million items in roughly 27 seconds.

That puts us close to 4 million messages per second, using the most trivial, slowest-performing approach possible.

In fact, using a LockFreeQueue, we get significantly worse performance (three times as slow!). I am not quite sure why, but I don't care. Pure pumping of messages is quick & easy, and scaling to millions of messages a second is trivial.

Yes, dispatching the messages is costly, I'll admit. Changing the reader thread to use:

instance.Handle(queue.Dequeue());

drops the performance to a measly 2 million messages a second. Changing the message handler to use late-bound semantics, like this:

// handlerType is a System.Type resolved at runtime
IHandler instance = (IHandler)Activator.CreateInstance(handlerType);
instance.Handle(queue.Dequeue());

drops the performance down to 300,000 messages per second. So it is far more interesting to look at what you are doing with the message than at the actual dispatching.

And yes, here in the dispatching code I would probably do additional work. Using GetUninitializedObject() and a cached constructor, I was able to raise that to 350,000 messages per second.
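To make that concrete, here is one way the GetUninitializedObject() plus cached-constructor combination could look. This is only a sketch: the IHandler interface and the Handler class are assumptions (the post never shows them), and FormatterServices.GetUninitializedObject is the standard BCL way to allocate an object without running its constructor.

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

// Assumed handler types - the post does not show them.
public interface IHandler
{
	void Handle(string message);
}

public class Handler : IHandler
{
	public void Handle(string message) { /* real work goes here */ }
}

public static class HandlerFactory
{
	// Look the constructor up once, instead of once per message.
	private static readonly ConstructorInfo Ctor =
		typeof(Handler).GetConstructor(Type.EmptyTypes);

	public static IHandler Create()
	{
		// Allocate the object without running any constructor...
		var instance = (IHandler)FormatterServices.GetUninitializedObject(typeof(Handler));
		// ...then invoke the cached constructor on the existing instance.
		Ctor.Invoke(instance, null);
		return instance;
	}
}
```

The reader thread would then call HandlerFactory.Create().Handle(queue.Dequeue()) instead of going through Activator on every message.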

And yes, this is a flawed benchmark; you need to test it with multiple readers & writers to get a more realistic understanding of what would actually happen. But the initial results are very promising, I think.

And again: computers are fast, so don't go chasing microseconds.