Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969

, @ Q c

Posts: 6,128 | Comments: 45,549

filter by tags archive

Without strings, it is a dark, cold place…

time to read 2 min | 262 words

So I set out to do some non trivial stuff with file parsing. The file format is CSV, and I am explicitly trying to do it with as few string allocations as possible.

In effect, I am basically relying on a char array that I manually manage. But as it turns out, this is not so easy. To start with, 65279 should be taken out and shot. That is the BOM marker (U+FEFF), and it is has a very nasty habit of showing up when you are mixing StreamWriter and reading from a byte stream, even when I made sure to use the UTF8 encoding anyway.

It is possible, as I said, but it is anything but nice. I set out to do non trivial stuff using this approach, but I wonder how useful this actually is. From experience, this can kill a system performance. This has been more than just my experience: http://joeduffyblog.com/2012/10/30/beware-the-string

Of course, the moment that you start dealing with your own string type, it is all back in the good bad days of C++ and BSTR vs cstr vs std::string vs. MyString vs OmgStr. For example, how do you look at the value during debug…

I am pretty sure that in general, that isn’t something that you’ll want to do. In my spike, quite a lot of the issues that came up were directly associated with this. On the other hand, this did let me do things like string pooling, efficient parsing with no allocations, etc.

But I’ll talk about that specific project in my next post.



This is an area where the DebuggerDisplay attribute is very helpful :) String indexing is really nontrivial with all Unicode stuff decorating the characters, so is it not better to use a single string as backing instead of a manual managed char array, or does this have the same issues? For your previous example, that would allocate just a single string per line.

Comment preview

Comments have been closed on this topic.


  1. The worker pattern - 3 days from now

There are posts all the way to May 30, 2016


  1. The design of RavenDB 4.0 (14):
    26 May 2016 - The client side
  2. RavenDB 3.5 whirl wind tour (14):
    25 May 2016 - Got anything to declare, ya smuggler?
  3. Tasks for the new comer (2):
    15 Apr 2016 - Quartz.NET with RavenDB
  4. Code through the looking glass (5):
    18 Mar 2016 - And a linear search to rule them
  5. Find the bug (8):
    29 Feb 2016 - When you can't rely on your own identity
View all series


Main feed Feed Stats
Comments feed   Comments Feed Stats