A persistence problem, irony, your name is…


The major goal I had in mind for the profiler was online development usage. That is, do something, check the profiler, clear it, do something else, etc. One of the things that I am finding out is that people use it a lot more as a dumping ground. They push a lot of information into it, and then want to sift through it and look at how the application behaves, not just at a single scenario.

Surprisingly, it works quite well, especially with the recently implemented performance profiling sessions that we just went through. One scenario, however, remains stubbornly outside what the profiler can currently do. When people talk to me about it, they call it load test profiling, or integration test profiling. This is when you pour literally gigabytes of information into the profiler. And it works, provided that you have enough memory.

If you don’t have enough memory, however, you get to say hello to OutOfMemoryException.

When I dove into this problem, I was sure that I would simply find something stupid that I was doing wrong, and that as soon as I fixed it, everything would be all right. I actually did find a few places where I could optimize memory usage (reducing lambda usage in favor of cached delegates to named methods, for example), but that only shaved off a few percentage points. Trying out string interning actually resulted in huge memory savings, but I feel that this is just a stopgap measure. I have to persist the data to disk, rather than keep it all in memory.
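For reference, the string interning idea is simply to keep one canonical copy of each distinct string, since things like stack frames repeat constantly across sessions. A minimal sketch of the concept (in Java here purely for illustration; the profiler itself is .NET, and the class name is mine):

```java
import java.util.HashMap;
import java.util.Map;

// A minimal intern table: repeated string values (e.g. identical stack
// frames across many sessions) are stored once and shared by reference.
class InternTable {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the canonical instance for this string value.
    String intern(String value) {
        String existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    int uniqueCount() { return pool.size(); }
}
```

Interning two equal strings yields the same object, so a million identical frames cost one string plus a million references instead of a million strings.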

That led me to a very interesting problem. What I need is basically a key/value store. Interestingly enough, I have already written one. The problem is that while it would work great right now, I have future plans that make depending on Esent an… unwise choice. Basically, I would like to be able to run on Mono and/or Silverlight, and that rules out using a Windows only / full trust native DLL. As they say, a bummer. That requirement rules out using the various embedded databases as well.

I considered ignoring this requirement and handling it when the time comes, but I decided that since this is going to majorly affect how I am going to use the store, I can’t really afford to delay that decision. With that in mind, I set out to figure out what I needed:

  • A fast way to store / retrieve session information along with its associated data (stack traces, alerts, statistics, etc.).
  • Ability to store, at a minimum, tens of thousands of items of variable size.
  • A single file (or at least, very few files) – cannot afford to have one item per file (it usually kills the FS).
  • Support updates without re-writing the entire file.
  • Usable from Mono & Silverlight, or easily portable to them.
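These requirements roughly amount to a small key/value interface. A hypothetical sketch of its shape (the names are my illustration, not an actual API), with a trivial in-memory implementation standing in for the real single-file-backed one:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shape of the store the requirements call for: session data
// keyed by id, variable-size values, and updates without a full rewrite.
interface SessionStore {
    void put(long sessionId, byte[] data); // store or update a session blob
    byte[] get(long sessionId);            // retrieve it, or null if absent
}

// Trivial in-memory stand-in; the real thing would back this with one file.
class InMemorySessionStore implements SessionStore {
    private final Map<Long, byte[]> items = new HashMap<>();
    public void put(long sessionId, byte[] data) { items.put(sessionId, data); }
    public byte[] get(long sessionId) { return items.get(sessionId); }
}
```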

With that in mind, I decided to take a look at what is already out there.

  • C#-Sqlite looked like it might be the ticket. It is a C# port of the Sqlite database. Unfortunately, I took a look at the code base, and the code gave me the willies. I don’t feel that I can trust it, and at any rate, it would require me to write a lot of data access code, which is something that I am trying to avoid :-). (And no, you can’t use NHibernate with that version; you would have to port the ADO.Net driver as well, and then you wouldn’t be able to use it in Silverlight anyway.)
  • Caching Application Block – because it looked like it already had a persistence solution. That solution is based on several files per item, which is not acceptable. I have already tried that route in the past; it is a good way to kill your file system.
  • SilverDB – this is an interesting code base, and a good solution for the problem it is meant to solve (saving relatively small amounts of information to disk). However, I need to save large amounts of information, and I need to handle a lot of updates. SilverDB rewrites the entire file whenever it saves, and that has too high a perf cost for my needs.
  • TheCache – I took only a brief look here, but it looks like it is too heavily focused on being a cache to be useful for my purposes.

In fact, given my requirements, it might be interesting to see what I don’t need.

  • It doesn’t have to be reliable.
  • It doesn’t have to be thread safe.
  • Saving is just a way to free memory.

Given that, I decided to go with the following method:

  • A custom serialization format, allowing me to save space & time using file & memory based string interning.
  • No persistent file index – it can be kept directly in memory.
  • A persisted string interning file.
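Putting those pieces together, here is a toy sketch of the approach (again in Java for illustration only; the names are mine, and this is the idea, not the actual implementation): values are appended to a single data file, the key-to-offset index lives only in memory, and strings would pass through the persisted intern table before being written.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Append-only, single-file layout: updates append a new version and
// repoint the in-memory index, so the file is never rewritten in place.
class AppendOnlyStore {
    // Stand-in for the single data file on disk.
    private final ByteArrayOutputStream file = new ByteArrayOutputStream();
    // key -> [offset, length]; kept only in memory, never persisted.
    private final Map<Long, int[]> index = new HashMap<>();

    void put(long key, byte[] value) {
        int offset = file.size();
        file.write(value, 0, value.length);                 // append; stale versions are simply abandoned
        index.put(key, new int[] { offset, value.length }); // an update just repoints the index entry
    }

    byte[] get(long key) {
        int[] entry = index.get(key);
        if (entry == null) return null;
        return Arrays.copyOfRange(file.toByteArray(), entry[0], entry[0] + entry[1]);
    }
}
```

The trade-off matches the non-requirements above: a crash loses the in-memory index, which is fine when saving is just a way to free memory.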

As you can see, this is a very tailored solution, not something that would be generally useful, but I have great hopes for this.