Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969


Without strings, it is a dark, cold place…


So I set out to do some non-trivial stuff with file parsing. The file format is CSV, and I am explicitly trying to do it with as few string allocations as possible.

In effect, I am basically relying on a char array that I manually manage. But as it turns out, this is not so easy. To start with, 65279 should be taken out and shot. That is the byte order mark (BOM, U+FEFF), and it has a very nasty habit of showing up when you mix writing with a StreamWriter and reading from a byte stream, even when I made sure to use the UTF8 encoding on both sides.
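A possible explanation, offered as a guess rather than something the post states: in .NET, Encoding.UTF8 is configured to emit a BOM, so a StreamWriter given that encoding writes the three preamble bytes at the start of the stream. StreamReader strips them on the way back in, but decoding the raw bytes yourself does not, and the first char comes out as 65279. The sketch below reproduces that and skips the BOM inside a caller-owned char buffer. BomDemo, Write and Decode are hypothetical names made up for this illustration.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not code from the post.
public static class BomDemo
{
    public const char Bom = '\uFEFF'; // 65279

    // Writes text through a StreamWriter and returns the raw bytes produced.
    public static byte[] Write(string text, Encoding encoding)
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new StreamWriter(ms, encoding))
            {
                writer.Write(text);
            }
            return ms.ToArray(); // ToArray still works after the stream is closed
        }
    }

    // Decodes UTF-8 bytes into a caller-owned buffer (assumed large enough).
    // Returns the number of usable chars; 'start' is 1 if a BOM was skipped, else 0.
    public static int Decode(byte[] bytes, char[] buffer, out int start)
    {
        int count = Encoding.UTF8.GetChars(bytes, 0, bytes.Length, buffer, 0);
        start = (count > 0 && buffer[0] == Bom) ? 1 : 0;
        return count - start;
    }
}
```

If you control the writing side, passing new UTF8Encoding(false) to the StreamWriter avoids emitting the BOM in the first place.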

It is possible, as I said, but it is anything but nice. I set out to do non-trivial stuff using this approach, but I wonder how useful it actually is. From experience, string allocations can kill a system's performance. And it isn't just my experience: http://joeduffyblog.com/2012/10/30/beware-the-string

Of course, the moment that you start dealing with your own string type, it is all back in the good bad days of C++ and BSTR vs. cstr vs. std::string vs. MyString vs. OmgStr. For example, how do you look at the value while debugging…

I am pretty sure that in general, that isn’t something that you’ll want to do. In my spike, quite a lot of the issues that came up were directly associated with this. On the other hand, this did let me do things like string pooling, efficient parsing with no allocations, etc.
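To make the trade-off concrete, here is a minimal sketch of what such a home-grown string type might look like. It is not the code from my spike: CharSegment, CsvSketch and StringPool are hypothetical names, the splitter assumes plain comma-separated fields with no quoting, and the DebuggerDisplay attribute is just one way to get a readable value in the debugger.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical type: a view over part of a shared char[]; nothing is copied.
[DebuggerDisplay("{ToString(),nq}")]
public struct CharSegment : IEquatable<CharSegment>
{
    public readonly char[] Buffer;
    public readonly int Offset;
    public readonly int Length;

    public CharSegment(char[] buffer, int offset, int length)
    {
        Buffer = buffer; Offset = offset; Length = length;
    }

    // Compares contents char by char, without allocating.
    public bool Equals(CharSegment other)
    {
        if (Length != other.Length) return false;
        for (int i = 0; i < Length; i++)
            if (Buffer[Offset + i] != other.Buffer[other.Offset + i]) return false;
        return true;
    }

    public override bool Equals(object obj) => obj is CharSegment other && Equals(other);

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            for (int i = 0; i < Length; i++) hash = hash * 31 + Buffer[Offset + i];
            return hash;
        }
    }

    // This one allocates; meant for the debugger and for pool misses only.
    public override string ToString() => new string(Buffer, Offset, Length);
}

public static class CsvSketch
{
    // Splits buffer[start .. start+length) on commas into 'fields' and returns
    // the field count. Assumption: no quoted fields or escaped commas.
    public static int Split(char[] buffer, int start, int length, CharSegment[] fields)
    {
        int count = 0, fieldStart = start, end = start + length;
        for (int i = start; i <= end; i++)
        {
            if (i == end || buffer[i] == ',')
            {
                if (count == fields.Length) throw new InvalidOperationException("Too many fields");
                fields[count++] = new CharSegment(buffer, fieldStart, i - fieldStart);
                fieldStart = i + 1;
            }
        }
        return count;
    }
}

// Returns the same string instance for equal segments, allocating only on a miss.
public sealed class StringPool
{
    private readonly Dictionary<CharSegment, string> _pool = new Dictionary<CharSegment, string>();

    public string Get(CharSegment segment)
    {
        string value;
        if (_pool.TryGetValue(segment, out value)) return value;
        value = segment.ToString();
        // The key gets its own copy of the chars, because the caller's buffer will be reused.
        _pool.Add(new CharSegment(value.ToCharArray(), 0, value.Length), value);
        return value;
    }
}
```

The fields array and the char buffer can be reused for every line, so the only allocations in steady state are the pool misses.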

But I’ll talk about that specific project in my next post.


Comments

kpvleeuwen

This is an area where the DebuggerDisplay attribute is very helpful :) String indexing is really nontrivial with all the Unicode stuff decorating the characters, so isn't it better to use a single string as backing instead of a manually managed char array, or does that have the same issues? For your previous example, that would allocate just a single string per line.
