Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969


Posts: 6,125 | Comments: 45,492


Stupid smart code

time to read 3 min | 549 words

We had the following code:

public static void WriteDataToRequest(HttpWebRequest req, string data)
{
    var byteArray = Encoding.UTF8.GetBytes(data);

    req.ContentLength = byteArray.Length;

    using (var dataStream = req.GetRequestStream())
    {
        dataStream.Write(byteArray, 0, byteArray.Length);
    }
}

And that is a problem, because it allocates the memory twice, once for the string, once for the buffer. I changed that to this:

public static void WriteDataToRequest(HttpWebRequest req, string data)
{
    var byteCount = Encoding.UTF8.GetByteCount(data);
    req.ContentLength = byteCount;
    using (var dataStream = req.GetRequestStream())
    {
        if (byteCount <= 0x1000) // small size, just let the system allocate it
        {
            var bytes = Encoding.UTF8.GetBytes(data);
            dataStream.Write(bytes, 0, bytes.Length);
            return;
        }

        var buffer = new byte[0x1000];
        var maxCharsThatCanFitInBuffer = buffer.Length / Encoding.UTF8.GetMaxByteCount(1);
        var charBuffer = new char[maxCharsThatCanFitInBuffer];
        int start = 0;
        var encoder = Encoding.UTF8.GetEncoder();
        while (start < data.Length)
        {
            var charCount = Math.Min(charBuffer.Length, data.Length - start);

            data.CopyTo(start, charBuffer, 0, charCount);
            var bytes = encoder.GetBytes(charBuffer, 0, charCount, buffer, 0, false);
            dataStream.Write(buffer, 0, bytes);
            start += charCount;
        }
    }
}

And I was quite proud of myself.

Then I realized that I was stupid. Why?



Lots of scanning/copying of data. Worse than zeroing and filling an array once.


btw, take the Flush out. Why are people always doing this? Why would the stream delete data on dispose?

Patrick Huizinga

There's also an overload of GetBytes that accepts a string instead of a char buffer. http://msdn.microsoft.com/en-us/library/c20ss51h.aspx

so you can remove the charBuffer, and the while loop can be changed to:

var charCount = Math.Min(maxCharsThatCanFitInBuffer, data.Length - start);
var bytes = encoder.GetBytes(data, start, charCount, buffer, 0); // could not find any GetBytes overload that takes a bool
dataStream.Write(buffer, 0, bytes);
start += charCount;
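For illustration, Patrick's variant can be sketched as a runnable program; the `WriteChunked` harness and the `MemoryStream` standing in for the request stream are hypothetical, not from the post:

```csharp
using System;
using System.IO;
using System.Text;

class ChunkedWriteDemo
{
    // A sketch of Patrick's loop: no charBuffer, the string overload of
    // Encoding.GetBytes reads characters straight from the string.
    public static void WriteChunked(Stream dataStream, string data)
    {
        var buffer = new byte[0x1000];
        var maxCharsThatCanFitInBuffer = buffer.Length / Encoding.UTF8.GetMaxByteCount(1);
        int start = 0;
        while (start < data.Length)
        {
            var charCount = Math.Min(maxCharsThatCanFitInBuffer, data.Length - start);
            var bytes = Encoding.UTF8.GetBytes(data, start, charCount, buffer, 0);
            dataStream.Write(buffer, 0, bytes);
            start += charCount;
        }
    }

    static void Main()
    {
        var data = new string('x', 10_000); // ASCII-only sample input
        using (var ms = new MemoryStream())
        {
            WriteChunked(ms, data);
            Console.WriteLine(ms.Length == Encoding.UTF8.GetByteCount(data)); // True
        }
    }
}
```

One caveat: the string overload belongs to Encoding, not to the stateful Encoder, so no state is carried between calls; a chunk boundary falling inside a surrogate pair would mangle that character, which is the risk Ryan Heath raises further down.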

However, I fear this is not the stupid part of the code. For that I'll wait till tomorrow.


Couldn't you just have used a StreamWriter? You can specify encoding and buffer size in its constructor and then write the string directly...

Patrick Huizinga

tobi: where exactly is the scanning happening? I can only spot operations that are / should be O(1)

And are you sure copying 100 times to an array of size 4K is slower than copying once to an array of size 400K?

Fredrik Mörk

The charBuffer seems unnecessary, given that a string can also act as a char[], so instead of shuffling those chars, you can have the encoder.GetBytes call pull the characters directly from data.

Richard Dingwall

What is the expected size of 'string data'?

Ryan Heath

You run the risk of splitting the string's surrogate pairs?

// Ryan
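For illustration, the risk Ryan describes can be reproduced with a small sketch (the helper names are hypothetical). Splitting "a😀b" between the two halves of the surrogate pair mangles it under stateless per-chunk encoding, while a stateful Encoder, as used in the post, handles it:

```csharp
using System;
using System.Text;

class SurrogateDemo
{
    // Encode in two chunks split at 'splitAt', using a fresh Encoding call per
    // chunk (stateless) -- a lone surrogate at the boundary becomes U+FFFD.
    public static int StatelessTwoChunks(string data, int splitAt)
    {
        var buffer = new byte[32];
        int n = Encoding.UTF8.GetBytes(data, 0, splitAt, buffer, 0);
        n += Encoding.UTF8.GetBytes(data, splitAt, data.Length - splitAt, buffer, n);
        return n;
    }

    // Same split, but the stateful Encoder carries the pending high surrogate
    // over to the second call, so the pair is encoded correctly.
    public static int StatefulTwoChunks(string data, int splitAt)
    {
        var encoder = Encoding.UTF8.GetEncoder();
        var chars = data.ToCharArray();
        var buffer = new byte[32];
        int n = encoder.GetBytes(chars, 0, splitAt, buffer, 0, false);
        n += encoder.GetBytes(chars, splitAt, chars.Length - splitAt, buffer, n, true);
        return n;
    }

    static void Main()
    {
        // U+1F600 is a surrogate pair: two chars, four UTF-8 bytes.
        string data = "a\U0001F600b";
        Console.WriteLine(StatelessTwoChunks(data, 2)); // 8: two 3-byte U+FFFD replacements
        Console.WriteLine(StatefulTwoChunks(data, 2));  // 6: the correct encoding
    }
}
```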

Thomas Levesque

@wouter, this doesn't allow you to know the length in bytes before you start writing the data, and headers must be sent before you start writing the body

Dave Hng

What is in 'string data'? Isn't that already a buffer of some sorts?


Patrick, GetByteCount is O(N) because UTF8 is variable length. Need to decode the entire string for this. I like the smaller chunks, however, for cache efficiency reasons.
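A minimal sketch of tobi's point: GetByteCount must walk the whole string because UTF-8 is variable length, but unlike GetBytes it hands back only a number, not an allocated array:

```csharp
using System;
using System.Text;

class ByteCountDemo
{
    static void Main()
    {
        // Mixed 1-, 2- and 4-byte characters, so the count is not data.Length.
        string data = "héllo \U0001F600";
        // GetByteCount scans the string once (O(N)) and returns an int.
        int count = Encoding.UTF8.GetByteCount(data);
        // GetBytes scans it again and allocates a count-sized array.
        byte[] bytes = Encoding.UTF8.GetBytes(data);
        Console.WriteLine(count);                 // 11
        Console.WriteLine(count == bytes.Length); // True
    }
}
```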


Because you started with code that was crystal clear and anyone should be able to read & understand and replaced it with code that you need to read, reread, think about, and read again.


my guess

public virtual int GetByteCount(string s)
{
    if (s == null)
    {
        throw new ArgumentNullException("s");
    }
    char[] chars = s.ToCharArray();
    return this.GetByteCount(chars, 0, chars.Length);
}


GetRequestStream() returns a synchronous stream.

That means until the stream is closed or flushed, data is not sent, it's stored in memory.


Writing data to a request? Huh?


I'm with wouter - my first thought was: "he's reimplemented StreamWriter.Write()".

@Thomas - ContentLength is being determined by Encoding.UTF8.GetByteCount(), not by examining the size of the encoded array (although I'm trusting Microsoft to have not implemented GetByteCount like that).


Sam, I believe there is no O(1) way to calculate GetByteCount.


@Ant - for instance, the body of a POST or PUT request.


"bytes" var is declared inside the loop so N many will be allocated. Of course these will eventually be GCed but the point of the exercise was to sacrifice simplicity for efficiency.


@tobi - my understanding is that the optimisation was intended to avoid allocating memory for the (potentially large) request body twice, not to minimise encoding time.


@JD - 'bytes' is actually an int. The joys of var and ambiguous variable names!

Patrick Huizinga

@tobi, missed that one, although I only looked inside the loop

@JD, bytes is an int, not much GC-ing there :)


You can use the lambda expression and execute it in one line:

var reader = new StringReader(data);
reader.CopyTo(req.GetRequestStream());

Rajeesh C V

If the method is not going to write a big chunk of data, then you have increased the complexity instead of keeping it simple.


Jarrett is the only one who's got it so far. Do you really need the extra code complexity and lack of maintainability for that performance gain? Do you have evidence that the extra memory allocation was really hurting the system?


I'm guessing this code is stupid because you probably don't have any profiling metrics that show it actually needs optimization in the first place. So, rather than fixing true bottlenecks or implementing new features, you have needlessly invested a lot of time and introduced much more complexity.

Frank Quednau

Confused by the first version in the first place. The simplest abstraction to write a string to a stream is a streamwriter, is it not?

using (var sw = new StreamWriter(r.GetRequestStream(), Encoding.UTF8)) sw.Write(data);


Jarrett indeed is the first one who mentions the first thing that popped into my mind when reading this: "why change something which intent initially was perfectly obvious into something that you need to read multiple times and is a lot harder to understand for a micro-optimization that may not even necessarily be an issue?"

Doesn't seem like it's worth the extra complexity it introduces at a cost of read- and maintainability.

Bart Verkoeijen

What's the gain in performance compared to the extra complexity of the code?


@Frank Quednau - the first version is like it is due to the need to set ContentLength before writing to the stream


I believe the problem is two-fold:

1) GetByteCount(string s) converts the string into char[s.Length] - so nothing has been gained there

2) worse still, GetByteCount has to encode the characters to produce its result, then the characters are encoded a 2nd time to write to the stream

João Angelo

As others pointed out, I'm on the side that the "I'm stupid" epiphany was due to realizing that you had just reimplemented logic that is already implemented by StreamWriter.

Michael McDaniel

How about flushing inside the loop if you want the data streamed? Let's call it lots of "courtesy flushes"


I'm not sure I understand. In the first solution you say memory is allocated twice: once for the string (the parameter?) and once for the buffer.

In the second solution memory is allocated for the string and then for the buffers, multiple times: var buffer..., var charBuffer..., and var bytes in the while loop.

So what you gain is that when data is very big you don't end up allocating a lot of memory on the LOH. Instead you make a lot of small allocations.

I don't know what you mean by being stupid. I don't like the code either before or after the modification if data is very big. If it is not, then why bother changing the original code?


Tell us what you see Ayende


Use Stream.Copy

Steve Py

Jarret + 0x1000!

Seeing as these are strings being passed back on a request, I cannot see them being huge (many MBs). I'd expect that if anything was a memory drain, it would be the server serving many requests, in which case the complex code would solve nothing. Temporarily allocating a few extra KB for byte arrays to maintain readable code is certainly worth it.



copies the value and doesn't reference the original char?


My guess is that since you instantiate a new "bytes" var in every loop it actually ends up holding everything in memory twice anyways(at least until the GC kicks in).


hmm... don't know; this seems a little like "select ain't broken" but... clearly i'm not the smartest guy in the room but i would think that you would want to flush the internal state of the encoder

of course i have been bitten by this type of post in the past to where it could be that there was no check for the "data" parameter being null or empty


It is almost always wrong to flush. The only reason I can come up with is to harden data on disk immediately for durability semantics. Flush = fsync. It is always slower and does not change the data written at all.

Ayende Rahien

Tobi, No, it isn't. There are a lot of reasons to flush. Reducing memory usage, for example, ensuring lower latency in network scenarios, for another.


Ayende, but data consistency is not one. Memory usage should be controlled by configuring the buffer size. You cannot release memory by flushing. The OS will flush to disk but still have the data cached. Network latency: true. Rarely needed and reduces throughput.

Ayende Rahien

Tobi, That is why I moved to a stream writer there in the first place. And the flushing merely ensures that the data goes out. Without it, you are reliant on Dispose calling Flush, and sometimes there are good reasons not to call Dispose. For example, calling Dispose on the writer would also dispose the underlying stream, which you may not want. That is one example of why you would want to flush explicitly.


req.GetRequestStream() returns an instance of internal ConnectStream class, whose Flush() method for some reason does exactly nothing...


var byteCount = Encoding.UTF8.GetByteCount(data); GetByteCount creates one more array of characters?

Ayende Rahien

Nitin, GetByteCount does NOT create an array


Internally before counting the number of characters?

Ayende Rahien

Nitin, No, it doesn't do that
