Ayende @ Rahien

Reviewing LevelDB: Part IV: On std::string, buffers and memory management in C++

This is a bit of a side track. One of the things that is quite clear to me when I am reading the leveldb code is that I was never really any good at C++. I was a C/C++ developer. And that is a pretty derogatory term. C & C++ share a lot of the same syntax and underlying assumptions, but the moment you want to start writing non-trivial stuff, they are quite different. And no, I am not talking about OO or templates.

I am talking about things that came out of that. In particular, throughout the leveldb codebase, they very rarely, if at all, allocate memory directly. Pretty much the whole codebase relies on std::string to handle buffer allocation and management. This makes sense, since RAII is still the watchword for good C++ code. Being able to utilize std::string for memory management also means that the memory will be properly released without having to deal with it explicitly.
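To make the RAII point concrete, here is a minimal sketch. It is not taken from leveldb; BuildRecord is a hypothetical helper invented for illustration. The std::string owns its heap allocation, so the memory is released on every exit path, including the one that throws.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of leveldb: builds "key=value" in a
// std::string that owns its own heap storage.
std::string BuildRecord(const std::string& key, const std::string& value) {
  std::string buffer;  // the buffer; no explicit new/delete anywhere
  buffer.reserve(key.size() + value.size() + 1);
  buffer.append(key);
  buffer.push_back('=');
  if (value.empty()) {
    // Early exit: buffer's destructor still runs during stack unwinding
    // and frees the allocation.
    throw std::invalid_argument("empty value");
  }
  buffer.append(value);
  return buffer;  // moved or elided on return, never leaked
}
```

The equivalent code with a raw char* would need a matching free on both the normal and the error path.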

More interestingly, the leveldb codebase is also using std::string as a general buffer. I wonder why it is std::string vs. std::vector<char>, which would be more reasonable, but I guess that this is because most of the time, users will want to pass strings as keys, and likely this is easier to manage, given the type of operations available on std::string (such as append).
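As a rough illustration of std::string doing duty as a byte buffer, the sketch below appends a length-prefixed blob using nothing but append. The helper names (AppendFixed32, AppendLengthPrefixed) and the little-endian layout are assumptions made for this example, not a description of leveldb's actual encoding.

```cpp
#include <cstdint>
#include <string>

// Illustrative helper (hypothetical name): append a 32-bit value to the
// buffer as four little-endian bytes.
void AppendFixed32(std::string* dst, std::uint32_t value) {
  char bytes[4];
  bytes[0] = static_cast<char>(value & 0xff);
  bytes[1] = static_cast<char>((value >> 8) & 0xff);
  bytes[2] = static_cast<char>((value >> 16) & 0xff);
  bytes[3] = static_cast<char>((value >> 24) & 0xff);
  dst->append(bytes, sizeof(bytes));
}

// Illustrative helper (hypothetical name): append the size of data, then
// its raw bytes. std::string is happy to hold embedded '\0' bytes.
void AppendLengthPrefixed(std::string* dst, const std::string& data) {
  AppendFixed32(dst, static_cast<std::uint32_t>(data.size()));
  dst->append(data.data(), data.size());
}
```

The same thing with std::vector<char> would go through insert(end(), first, last), which works but is noisier at every call site.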

It is actually quite fun to go over the codebase and discover these sorts of things. Especially if I can figure them out on my own :-)

This is quite interesting because from my point of view, buffers are a whole different set of problems. We don’t have to worry about the memory just going away in .NET (although we do have to worry about someone changing the buffer behind our backs), but we have to worry a lot about buffer size. This is because at some point (85,000 bytes), buffers graduate to the large object heap, and stay there. Which means, in turn, that every time you want to deal with buffers you have to take that into account, usually with a buffer pool.
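The buffer pool idea is not tied to .NET, so here is a minimal sketch of the pattern in C++, to stay in one language with the rest of the code under discussion. BufferPool and its Rent/Return methods are hypothetical names invented for this example; it is single-threaded and only meant to show the shape: hand out a fixed-size buffer, take it back, reuse it instead of allocating again.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, single-threaded pool of fixed-size buffers.
class BufferPool {
 public:
  explicit BufferPool(std::size_t buffer_size) : buffer_size_(buffer_size) {}

  // Hand out a buffer, reusing a previously returned one when available.
  std::string Rent() {
    if (free_.empty()) {
      return std::string(buffer_size_, '\0');  // nothing pooled: allocate
    }
    std::string buffer = std::move(free_.back());
    free_.pop_back();
    return buffer;
  }

  // Take a buffer back and reset it to a zero-filled buffer of the pool's
  // size, so stale data never leaks to the next caller.
  void Return(std::string buffer) {
    buffer.assign(buffer_size_, '\0');
    free_.push_back(std::move(buffer));
  }

  std::size_t FreeCount() const { return free_.size(); }

 private:
  std::size_t buffer_size_;
  std::vector<std::string> free_;
};
```

A caller would Rent() a buffer, fill it, and Return(std::move(buffer)) when done. In .NET the same shape applies to byte arrays, where the point is to stop allocating fresh large arrays over and over.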

Another aspect that is interesting with regard to memory usage is the explicit handling of copying. There are various places in the code where the copy constructor was made private, to prevent copying. Or a comment is left about making a type copyable intentionally. I get the reason why, because it is a common failing point in C++, but I forgot (although I am pretty sure that I used to know) the actual semantics of when/how you want to do that in all cases.
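For reference, this is roughly what the private copy constructor trick looks like, next to the C++11 spelling of the same intent. Both class names are hypothetical and exist only for this sketch. The usual candidates are types that own something where a member-wise copy would be wrong, such as a file handle or a lock.

```cpp
#include <string>
#include <type_traits>

// Pre-C++11 idiom: declare the copy operations private and never define
// them, so copying fails to compile (or to link, from inside the class).
class LegacyHandle {
 public:
  explicit LegacyHandle(const std::string& name) : name_(name) {}
  const std::string& name() const { return name_; }

 private:
  // No copying allowed
  LegacyHandle(const LegacyHandle&);
  void operator=(const LegacyHandle&);

  std::string name_;
};

// C++11 and later: say the same thing directly with "= delete".
class Handle {
 public:
  explicit Handle(const std::string& name) : name_(name) {}
  Handle(const Handle&) = delete;
  Handle& operator=(const Handle&) = delete;
  const std::string& name() const { return name_; }

 private:
  std::string name_;
};

// Compile-time checks that neither type can be copied.
static_assert(!std::is_copy_constructible<LegacyHandle>::value,
              "LegacyHandle must not be copy-constructible");
static_assert(!std::is_copy_constructible<Handle>::value,
              "Handle must not be copy-constructible");
static_assert(!std::is_copy_assignable<Handle>::value,
              "Handle must not be copy-assignable");
```

With either class, writing Handle b = a; is rejected by the compiler, so an accidental copy of the owned resource cannot slip through.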

Comments

03/26/2013 11:38 AM by Simon Skov Boisen

I would say that the group of objects that you make non-copyable in C++ is sort of in the same camp as those you use finalizers on in .NET, e.g. network connections or file handles.

03/26/2013 03:25 PM by Artem

What's the difference between a CLR buffer going to the LOH versus a C++ buffer going to the C heap? Is the main problem the same in both cases?

03/26/2013 05:15 PM by Ayende Rahien

Artem, in C++, you have far more granular control over where that memory is. That greatly alleviates the issues. Sure, in both cases, you may get heap fragmentation, but if that is a problem, you can destroy the heap and create a new one. In .NET, that isn't possible.
