Reviewing LevelDB: Part IV: On std::string, buffers and memory management in C++

Mar 26 2013

Reviewing LevelDBPart IV

time to read 3 min | 449 words

his is a bit of a side track. One of the things that is quite clear to me when I am reading the leveldb code is that I was never really any good at C++. I was a C/C++ developer. And that is a pretty derogatory term. C & C++ share a lot of the same syntax and underlying assumption, but the moment you want to start writing non trivial stuff, they are quite different. And no, I am not talking about OO or templates.

I am talking about things that came out of that. In particular, throughout the leveldb codebase, they are very rarely, if at all, allocate memory directly. Pretty much the whole codebase rely on std::string to handle buffer allocations and management. This make sense, since RAII is still the watch ward for good C++ code. Being able to utilize std::string for memory management also means that the memory will be properly released without having to deal with it explicitly.

More interestingly, the leveldb codebase is also using std::string as a general buffer. I wonder why it is std::string vs. std::vector<char>, which would bet more reasonable, but I guess that this is because most of the time, users will want to pass strings as keys, and likely this is easier to manage, given the type of operations available on std::string (such as append).

It is actually quite fun to go over the codebase and discover those sort of things. Especially if I can figure them out on my own Smile .

This is quite interesting because from my point of view, buffers are a whole different set of problems. We don’t have to worry about the memory just going away in .NET (although we do have to worry about someone changing the buffer behind our backs), but we have to worry a lot about buffer size. This is because at some point (80Kb), buffers graduate to the large object heap, and stay there. Which means, in turn, that every time that you want to deal with buffers you have to take that into account, usually with a buffer pool.

Another aspect that is interesting with regards to memory usage is the explicit handling of copying. There are various places in the code where the copy constructor was made private, to avoid this. Or a comment is left about making a type copy-able intentionally. I get the reason why, because it is a common failing point in C++, but I forgot (although I am pretty sure that I used to know) the actual semantics of when/ how you want to do that in all cases.

0 comments

Tags:

Oren Eini

Oren Eini

CEO of RavenDB