Ayende @ Rahien


A persistence problem, irony, your name is…

The major goal I had in mind for the profiler was online usage during development. That is, do something, check the profiler, clear it, do something else, and so on. What I am finding out is that a lot of people use it as a dumping ground instead. They push a lot of information into it and then want to sift through it to see how the application behaves as a whole, not just in a single scenario.

Surprisingly, it works quite well, especially with the recently implemented performance profiling sessions that we just ran through. One scenario, however, remains stubbornly outside what the profiler can currently do. People call it load test profiling or integration test profiling: you pour literally gigabytes of information into the profiler. And it works, provided you have enough memory, that is.

If you don’t have enough memory, however, you get to say hello to OutOfMemoryException.

When I dove into this problem, I was sure I would simply find something stupid I was doing wrong, and that as soon as I figured it out, everything would be all right. I did find a few places where I could optimize memory usage (replacing lambdas with cached delegates to named methods, for example), but that only shaved off a few percentage points. Trying out string interning actually resulted in huge memory savings, but I feel that it is just a stopgap measure. I have to persist the data to disk rather than keep it in memory.
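To illustrate why interning pays off when the same SQL text or stack frame is captured thousands of times, here is a minimal sketch in Python (the profiler itself is a .NET application, where the rough equivalent is `String.Intern`). The workload and the helper names `parse` and `distinct_objects` are hypothetical; the point is only that equal strings decoded at runtime are separate objects until they are interned.

```python
import sys


def parse(raw_records, use_intern):
    """Decode raw byte records into strings, optionally interning them (hypothetical helper)."""
    out = []
    for raw in raw_records:
        text = raw.decode("utf-8")  # a new str object for every record
        out.append(sys.intern(text) if use_intern else text)
    return out


def distinct_objects(strings):
    """Count how many separate string objects the list keeps alive."""
    return len({id(s) for s in strings})


# Hypothetical workload: the same SQL statement captured 10,000 times.
records = [b"SELECT * FROM Orders WHERE CustomerId = @p0"] * 10_000

plain = parse(records, use_intern=False)
interned = parse(records, use_intern=True)
print(distinct_objects(plain))     # 10000 separate copies of the same text
print(distinct_objects(interned))  # 1 shared copy
```

Interning only deduplicates what is already in memory, which is why it helps a lot but cannot by itself handle gigabytes of distinct data.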

That led me to a very interesting problem. What I need is basically a key/value store. Interestingly enough, I already wrote one. The problem is that while it would work great right now, I have future plans that make depending on Esent an… unwise choice. Basically, I would like to be able to run on Mono and/or Silverlight, and that rules out a Windows-only / full-trust native DLL. As they say, a bummer. That requirement also rules out the various embedded databases.

I considered ignoring this requirement and handling it when the time comes, but since this decision will significantly affect how I use the store, I can’t really afford to delay it. With that in mind, I set out to figure out what I needed:

  • A fast way to store / retrieve session information along with its associated data (stack traces, alerts, statistics, etc.).
  • Ability to store, at a minimum, tens of thousands of items of variable size.
  • A single file (or at least, very few files) – cannot afford to have one item per file (it usually kills the FS).
  • Support updates without re-writing the entire file.
  • Usable from Mono & Silverlight, or easily portable to them.
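As a rough illustration of how the single-file and update requirements can be met together, here is a minimal Python sketch (not the profiler's actual code). The class name `AppendOnlyStore` and its methods are hypothetical. It assumes updates may simply append a new version and re-point an in-memory index, leaving the old bytes as dead space that some later compaction step (not shown) would reclaim.

```python
import os
import tempfile


class AppendOnlyStore:
    """Hypothetical single-file key/value store.

    Values are appended to one data file; the index (key -> offset, length)
    lives only in memory.
    """

    def __init__(self, path):
        # "w+b" creates/truncates the file: the data only has to live as long
        # as the process, since saving is just a way to free memory.
        self._file = open(path, "w+b")
        self._index = {}

    def put(self, key, value: bytes):
        # Updates never rewrite existing data: the new version is appended and
        # the index is pointed at it. Old bytes become dead space.
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        self._file.write(value)
        self._file.flush()
        self._index[key] = (offset, len(value))

    def get(self, key) -> bytes:
        offset, length = self._index[key]
        self._file.seek(offset)
        return self._file.read(length)

    def close(self):
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "sessions.dat")
store = AppendOnlyStore(path)
store.put("session/1", b"stack trace ...")
store.put("session/2", b"alerts ...")
store.put("session/1", b"updated stack trace")  # update = append + re-point
print(store.get("session/1"))  # b'updated stack trace'
print(store.get("session/2"))  # b'alerts ...'
store.close()
print(os.path.getsize(path))  # 44: all three writes, nothing rewritten
```

The trade-off is disk space for write speed: an update costs one append instead of rewriting the file, at the price of leaving stale versions behind.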

With that in mind, I decided to take a look at what is already out there.

  • C#-Sqlite looked like it might be the ticket. It is a C# port of the Sqlite database. Unfortunately, when I looked at the code base, it very much reads like a port, and the code gave me the willies. I don’t feel that I can trust it, and in any case it would require me to write a lot of data access code, which is something I am trying to avoid :-). (And no, you can’t use NHibernate with that version; you would have to port the ADO.Net driver as well, and then you still wouldn’t be able to use it in Silverlight.)
  • Caching Application Block – because it looked like it already had a persistence solution. That solution is based on several files per item, which is not acceptable. I have tried that route in the past; it is a good way to kill your file system.
  • SilverDB – this is an interesting code base, and a good solution for the problem it is meant to solve (saving relatively small amounts of information to disk). However, I need to save large amounts of information, and I need to handle a lot of updates. SilverDB rewrites the entire file whenever it saves, which has too high a perf cost for my needs.
  • TheCache – I took only a brief look here, but it looks like it is too heavily focused on being a cache to be useful for my purposes.

In fact, given my requirements, it might be interesting to see what I don’t need.

  • Reliability.
  • Thread safety.
  • Durability: saving is just a way to free memory.

Given that, I decided to go with the following method:

  • Custom serialization format, allowing me to save space & time using file & memory based string interning.
  • No persistent index file; the index can be kept entirely in memory.
  • Persisted string interning file.
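One possible reading of the custom serialization format plus persisted string interning file is sketched below in Python. This is an assumption about the general approach, not the profiler's code: `StringTable`, `serialize_statement` and `deserialize_statement` are hypothetical names, and the record layout (a 4-byte count followed by 4-byte string ids) is invented for illustration. Each distinct string is written once to its own file; records refer to strings by id.

```python
import os
import struct
import tempfile


class StringTable:
    """Hypothetical persisted string-interning table.

    Each distinct string is written to the string file exactly once as a
    length-prefixed UTF-8 record; other records refer to it by a 4-byte id.
    """

    def __init__(self, path):
        self._file = open(path, "w+b")
        self._ids = {}      # string -> id
        self._strings = []  # id -> string (memory-side cache)

    def intern(self, text: str) -> int:
        existing = self._ids.get(text)
        if existing is not None:
            return existing
        data = text.encode("utf-8")
        self._file.write(struct.pack("<I", len(data)) + data)
        new_id = len(self._strings)
        self._ids[text] = new_id
        self._strings.append(text)
        return new_id

    def lookup(self, string_id: int) -> str:
        return self._strings[string_id]

    def close(self):
        self._file.close()


def serialize_statement(table, sql, stack_frames):
    # A count followed by one 4-byte string id per field: a repeated SQL text
    # or stack frame costs 4 bytes instead of its full length.
    ids = [table.intern(sql)] + [table.intern(f) for f in stack_frames]
    return struct.pack(f"<I{len(ids)}I", len(ids), *ids)


def deserialize_statement(table, blob):
    (count,) = struct.unpack_from("<I", blob, 0)
    ids = struct.unpack_from(f"<{count}I", blob, 4)
    return table.lookup(ids[0]), [table.lookup(i) for i in ids[1:]]


table = StringTable(os.path.join(tempfile.mkdtemp(), "strings.dat"))
frames = ["OrderController.Index", "OrderRepository.GetAll"]
first = serialize_statement(table, "SELECT * FROM Orders", frames)
second = serialize_statement(table, "SELECT * FROM Orders", frames)
print(len(first), len(second))  # 16 16
print(deserialize_statement(table, second))
table.close()
```

Because this sketch also caches every string in memory, it saves space in the serialized records but not in string storage itself; a fuller version might keep only file offsets in memory and read strings back on demand.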

As you can see, this is a very tailored solution, not something that would be generally useful, but I have great hopes for this.

Comments

Dave
12/28/2009 11:23 AM

Have you looked at FirebirdSQL yet? It's designed as an embedded database. FirebirdSql also provides a service/daemon for client/server access.

The DotnetFirebird project (a subproject of FirebirdSql, much like the MySQL Connector project) comes with both a Windows and a Mono 1.1.x build. It also supports the Compact Framework, so you should be able to use it with Silverlight or on a Windows Mobile phone.

http://www.firebirdsql.org/
www.firebirdsql.org/dotnetfirebird/index.html

Ayende Rahien
12/28/2009 11:31 AM

Dave,

I tried it a while ago, yes. It had numerous problems that made me give it up.

Dave
12/28/2009 01:18 PM

Thinking outside the box now, but why not build the ability to persist all information to a database with a session tag?

Such a database could be handled by NHibernate and it would give some interesting features.

Storing the information in a specified database would give the whole team analysis capabilities. Another 'feature' would be collecting the profiling data during (automated) unit and regression tests and dynamically assigning a session tag (version + date comes to mind).

Later, a team member could 'open' that session and compare it with an earlier run.

We're talking about an ORM profiling tool, so specifying a database server (MS SQL, Oracle, MySQL, etc.) shouldn't be a problem for most users. Even when profiling an application with an embedded database, it's still possible to store the profiling data in a remote database. Visual Studio comes with a SQL Server Express installation by default.

I'm sure that if I give it some more thought, I can come up with more usages.

Scott White
12/28/2009 04:15 PM

What if you redesign it as an NT service, so that you can decouple the UI from the core functionality? Then you could put a website, Silverlight, or whatever in front of the database that the service is actually using.

Ayende Rahien
12/28/2009 05:16 PM

Scott,

That would make me lose something extremely valuable, the xcopy experience, and it would SERIOUSLY complicate my life.

John
12/28/2009 06:34 PM

protobuf?

Patrik Hägne
12/28/2009 08:49 PM

I was about to ask the same question as Jenser, have you looked at Db4o? It would be interesting to see if it would cut it. I've never tried it myself but I'd like to get around to it some day.

Felix
12/29/2009 10:21 AM

Isn't the regular SQLite fast enough? BTW, it would be nice to have a pure managed SQLite version...

Judah Himango
12/29/2009 11:43 PM

I'm with Jenser and Patrik, you ought to give DB4O a shot.

Alex Yakunin
12/30/2009 10:13 PM

You'll probably end up with the kind of solution we're working on (a full-featured integrated database):

  • Key-value store = no range queries. But in your case there is at least one obvious range query kind: time range queries.

  • Sessions can be very long in the case of other ORMs, so it's possible that storing the session data as the value part won't be enough.

Just FYI: our original problem looked very similar. I thought it would be nice to have a simple key-value pair storage for our local databases.

Alex Yakunin
12/30/2009 10:17 PM

Btw, if simple custom storage is really enough, it's definitely better to use that option, if only for flexibility.

E.g. if range queries are actually needed but can be emulated with sequential key processing (e.g. if the key is a minute number or something like that), a key-value store seems a good option.
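A minimal Python sketch of the idea of emulating a time-range query with sequential keys: a plain dict stands in for the key/value store, and the helper names `minute_key`, `record` and `range_query` are hypothetical. Each key is a minute number, so a range query becomes a loop over consecutive keys.

```python
from datetime import datetime, timezone


def minute_key(ts: datetime) -> int:
    # Minutes since the Unix epoch: consecutive minutes get consecutive keys.
    return int(ts.timestamp()) // 60


def record(store, ts, event):
    # A plain dict stands in for the key/value store (assumption).
    store.setdefault(minute_key(ts), []).append(event)


def range_query(store, start, end):
    """Emulate a time-range query by probing every minute key in [start, end]."""
    results = []
    for key in range(minute_key(start), minute_key(end) + 1):
        results.extend(store.get(key, []))
    return results


store = {}
t0 = datetime(2009, 12, 30, 22, 0, tzinfo=timezone.utc)
record(store, t0, "session opened")
record(store, t0.replace(minute=5), "query executed")
record(store, t0.replace(minute=30), "session closed")
print(range_query(store, t0, t0.replace(minute=10)))  # ['session opened', 'query executed']
```

The cost grows with the width of the range (one lookup per minute), not with the number of stored entries, so this works best when ranges are short relative to the key granularity.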

Ayende Rahien
12/31/2009 06:45 AM

Alex,

I have no range queries, so a K/V store is perfect. There is no size limitation on the value.
