Rule out the stupid stuff first: Select still ain’t broken
So, I sent a simple repro to the client and he reproduced the problem, and I was happy, because it ain’t my problem.
The good thing about sending a simple repro app to the client is that he can play with it easily, and that is how he discovered that it was my fault after all. In particular, it all comes down to the actual buffer size that we use.
If we try to send a large dataset using a 4Kb buffer, it takes much longer than it does with a 128Kb buffer. But only over a real network, not when running locally.
After looking at the matter for a while, I figured it out. When using the default RavenDB builtin server (based on the .NET HttpListener), we were actually flushing everything to the network card on every single write. And there appears to be a not-insubstantial cost to just doing that. I suspect that most of the cost is actually in moving from user land to http.sys to do the work, but the problem was fairly clear.
When you have an expensive resource like that, there is a standard solution for it: buffering. And luckily for us, the .NET framework comes with a BufferedStream. Sadly, it uses a single buffer, and we don't know ahead of time how much data we are going to write. Even more important, it creates its own buffers; about the only thing you can customize there is the buffer size.
So sure, we can just wrap our stream in a BufferedStream, set the buffer size to 128Kb, and be done with it, right? Not quite.
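The naive fix looks something like this. This is a Python sketch, not the actual RavenDB code: io.BufferedWriter stands in for .NET's BufferedStream, and io.BytesIO stands in for the network stream.

```python
import io

# Stand-in for the network stream; the real code writes to the
# HttpListener response stream. BytesIO is just for illustration.
network = io.BytesIO()

# The naive fix: wrap the response in a single big 128Kb buffer.
# io.BufferedWriter plays the role of .NET's BufferedStream here.
buffered = io.BufferedWriter(network, buffer_size=128 * 1024)

buffered.write(b"x" * 4096)       # small writes are absorbed by the buffer
assert network.getvalue() == b""  # nothing has hit the "network" yet

buffered.flush()                  # one big write instead of many small ones
assert network.getvalue() == b"x" * 4096
```

This gets rid of the many small flushes to http.sys, but at the price of a dedicated 128Kb allocation per request, which is exactly the problem discussed next.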
The reason this is problematic is that it would allocate a 128Kb buffer for every request, and buffers of that size go straight to the Large Object Heap. Never mind that just allocating that much memory has its own performance cost.
So, here is the problem, and the question is how we deal with it. There is a good solution for the problem of allocating a lot of memory and filling up the Large Object Heap, and that is to use the BufferManager that comes with the framework. I discussed this in this post. Next, we wrote a Buffer Pool Stream, which uses a buffer taken from a buffer manager. That resolved the problem of filling up the Large Object Heap, but it created another one: if every request used up a 128Kb buffer, we would be using a lot of memory that we probably don't need. Admittedly, even with a thousand concurrent requests, it would still amount to less than 130 MB, but that still bothers me.
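The pooling idea can be sketched like this. This is a toy Python analogue of what .NET's BufferManager does for us; the class and method names here are mine, not RavenDB's or the framework's.

```python
class BufferPool:
    """Hand out reusable buffers instead of allocating a fresh one per
    request, so big buffers never pile up on the (Large Object) heap.
    A toy sketch of the idea behind .NET's BufferManager."""

    def __init__(self):
        self._free = {}  # buffer size -> stack of buffers ready for reuse

    def take_buffer(self, size):
        stack = self._free.get(size)
        if stack:
            return stack.pop()  # reuse a previously returned buffer...
        return bytearray(size)  # ...and allocate only when we must

    def return_buffer(self, buf):
        self._free.setdefault(len(buf), []).append(buf)
```

The real BufferManager also caps the total amount of memory it will keep pooled; that bookkeeping is omitted here to keep the sketch short.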
Instead, we took a different path. We use multiple buffers of fixed sizes (4Kb, 8Kb, 16Kb, 32Kb, 64Kb, 128Kb) to buffer the response, and we switch between them on demand.
Here is how it works now:
* <= 8 Kb written: 4 Kb buffer
* <= 28 Kb written: 8 Kb buffer
* <= 60 Kb written: 16 Kb buffer
* <= 124 Kb written: 32 Kb buffer
* <= 252 Kb written: 64 Kb buffer
* > 252 Kb written: 128 Kb buffer
For the first 8 Kb, we use a 4 Kb buffer, then switch to an 8 Kb buffer until we get to 28 Kb, then a 16 Kb buffer all the way to 60 Kb, etc. In our tests, this strategy showed the best trade-off of time vs. memory on both large and small requests, and we make sure to use the big buffers only when we absolutely have to.
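In code, the size-selection logic comes out to something like this. Again a Python sketch under my own names (next_buffer_size is not the actual RavenDB method), with the 124 Kb boundary filled in from the doubling pattern the post describes.

```python
KB = 1024

# (threshold on total bytes written so far, buffer size to use next)
STEPS = [(8 * KB, 4 * KB), (28 * KB, 8 * KB), (60 * KB, 16 * KB),
         (124 * KB, 32 * KB), (252 * KB, 64 * KB)]

def next_buffer_size(total_written):
    """Small responses stay on small pooled buffers; only responses
    that keep growing graduate to the big 128 Kb buffers."""
    for threshold, size in STEPS:
        if total_written <= threshold:
            return size
    return 128 * KB
```

A response that fits in a few kilobytes never touches anything bigger than a 4 Kb buffer, while a multi-megabyte response quickly ramps up to 128 Kb chunks.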
The moral of the story: once you get your repro and see what is actually happening, dig a bit deeper. You might find some gold there.
In our case, we optimized network traffic significantly when you are running in service / debug mode. This is very relevant for a very common scenario: downloading the Silverlight XAP for the Management Studio, which should be quite a bit faster now.
Is there a reason most size limits are (2^n)-4 Kb, instead of (2^n) Kb, (2^n)-1 Kb or (2^n) Kb -1 ?