Ayende @ Rahien

Refunds available at head office

An optimization story

I left work today very happy. There was a piece in the UI that was taking too long when run with a real world data set. How slow? Let us call it 40 seconds to start with. This is a pretty common operation in the UI, so that was a good place to optimize.

I wasn't there for that part, but optimizing the algorithms used reduced the time from 40 seconds to 5 - 10 seconds, an impressive improvement by all accounts, but still one in which the users had to wait an appreciable amount of time for a common UI operation. Today we decided to tackle this issue and see if we could optimize it further.

The root action is loading some data and executing a bit of business logic on top of that data. I checked the queries being generated, and while they weren't ideal, they weren't really bad (just not the way I would do things). At that point, we decided to isolate the issue in a test page, which would allow us to test just this function in isolation. Then, we implemented this from scratch, as a plain data loading process.

The performance for that was simply amazing: 150 - 300 ms per operation, vs. 5 - 10 seconds in the optimized scenario. Obviously, however, we were comparing apples to oranges here. The real process also did a fair amount of business logic (and related data loading), which was the reason it was slow. I looked at the requirement again, then at the queries, and despaired.

I hoped that I would be able to use a clever indexing scheme and get the 1000% perf benefit using some form of SQL. But the requirement simply cannot be expressed in SQL. And trying to duplicate the existing logic would only put us in the same position as before.

What to do... what to do...

The solution was quite simple: take the database out of the loop. For a performance critical piece of the application, we really can't afford to rely on an external service (and the DB is considered one in this scenario). I spent some time loading the data at application startup, as well as doing up front work on the data set to make it easier to work with.

This turned that operation into an O(1) operation, where the constant work consists of a small set of in-memory hash table lookups. And the performance? The performance story goes like this:
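The shape of that optimization can be sketched roughly like this (a hypothetical illustration, not the actual application code — the class and key names are made up): load the rows once at startup, precompute a hash index over them, and answer each request with dictionary lookups instead of database round trips.

```python
# Hypothetical sketch of the approach described above: pay the data-loading
# cost once at startup, then serve every request from an in-memory index.

class PrecomputedLookup:
    def __init__(self, rows):
        # rows: iterable of (key, value) pairs, e.g. loaded from the DB once
        # at application startup.
        self._by_key = {}
        for key, value in rows:
            self._by_key.setdefault(key, []).append(value)

    def find(self, key):
        # Average-case O(1) hash lookup; no database round trip per request.
        return self._by_key.get(key, [])

# Build once at startup...
lookup = PrecomputedLookup([("a", 1), ("a", 2), ("b", 3)])
# ...then every request is a cheap in-memory lookup.
print(lookup.find("a"))  # [1, 2]
```

The up-front work (grouping, indexing, any precalculation) is exactly the part that moves out of the per-request path, which is why the per-operation cost collapses to milliseconds.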

I go into the manager's office and ask him how fast he wants this piece of functionality to run. He hesitates for a moment and then says: "A second?"
I shake my head, "I can't do that, can you try again?"
"Two seconds?" he asks.
"I am sorry", I reply, "I can do five."
Then I leave the office, throwing over my shoulder, "oh, but it is in milliseconds".
Sometimes I have a rotten sense of humor, but the stunned silence that followed that declaration was very pleasing.

I am lucky in that the data set is small enough to fit in memory. But I am not going to rely on that; we need to implement soft paging of the data anyway (to make the application startup time acceptable), so the application will be able to handle things easily enough even when the data set we are talking about grows beyond the limits of memory (which I don't expect to happen in the next couple of years).
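One plausible reading of "soft paging" is keeping only a bounded number of pages of the data set in memory, loading a page on first access and evicting the least recently used one. This is a speculative sketch of that idea, not the author's design; `load_page` stands in for the real database fetch:

```python
# Hypothetical "soft paging" sketch: bounded in-memory page cache with
# LRU eviction. Pages are loaded lazily, so startup stays fast and memory
# use stays capped even if the full data set outgrows RAM.

from collections import OrderedDict

class SoftPagedStore:
    def __init__(self, load_page, max_pages=2):
        self._load_page = load_page   # stand-in for a DB query per page
        self._max_pages = max_pages
        self._pages = OrderedDict()   # page_id -> page data, in LRU order

    def get(self, page_id):
        if page_id in self._pages:
            self._pages.move_to_end(page_id)       # mark as recently used
        else:
            self._pages[page_id] = self._load_page(page_id)
            if len(self._pages) > self._max_pages:
                self._pages.popitem(last=False)    # evict least recently used
        return self._pages[page_id]

loads = []
store = SoftPagedStore(lambda pid: loads.append(pid) or f"page-{pid}",
                       max_pages=2)
store.get(1); store.get(2); store.get(1)  # page 1 served from memory
store.get(3)                              # evicts page 2, the LRU page
```

The trade-off is the one the post describes: a cold page costs one load, but the common path stays an in-memory lookup.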

Overall, it was a very impressive optimization, even if I say so myself.

Comments

Andrey Shchekin
10/15/2008 07:11 AM

Could you show a requirement that can not be expressed in SQL? This seems like an interesting challenge (I can think of several really cumbersome things, but only one inexpressible in a good way).

Ayende Rahien
10/15/2008 07:18 AM

Imagine a tree structure that you need to query; now imagine that it is a sparse tree, and you need to do fixups on the tree.

Virju
10/15/2008 07:35 AM

I just saw your day 1 blog post and I am looking at this now... you've improved very much in your writing skills, especially in vocabulary and narration. Keep it up, dude. Hard work never fails. Great post.

Albert Araalie
10/15/2008 12:02 PM

Do you think it is advisable to try and implement a disk-output cache, where you read in data from the database, write it to a temporary file, and then have subsequent calls access the data on the disk? I'm assuming the dataset returned is pretty "large" for the asp.net cache or memory.

Just asking out...loud.....

Jon Skeet
10/15/2008 06:42 PM

I had a similar-sounding situation. I had to make many lookups (which couldn't be predicted ahead of time) into a 100,000,000 row table. Fortunately, the nature of this table meant I was able to store it in 100MB of RAM. The performance difference was in the tens of thousands of percent, I think :)

Stephen
10/15/2008 06:49 PM

Albert, I have actually been working recently on implementing a good disk caching pattern for asp.net — not so much for the database, but for caching the resulting html. The caching itself is not so hard; it's more about designing how dependencies can be wired up and how they can survive process restarts. It led me down the road of generating what is basically an entity tag, or a hash of the current state of the dependent services, and reinforcing and invalidating this when accessing the cached outputs. That's where I stopped, however; I ended up becoming very busy and didn't find the time to architect a solution.

Caching is a really interesting area in architecting solutions; as in this blog post's case, the results can be really rewarding when you make something a ton faster by intelligently handling data.

Scott White
10/15/2008 07:28 PM

Spring lets you use [Cache] attributes; I would probably have used something similar here at my service layer. I'm sure it could be implemented similarly with Castle.

Ayende Rahien
10/15/2008 09:02 PM

Scott,

Caching blindly is possible, but it isn't really appropriate for many scenarios.

For example, in this case, just the cost of making a single call to calculate the data for the first time is higher than the cost of loading the data and precalculating everything.

Ayende Rahien
10/15/2008 09:04 PM

Albert,

It really depends on the data size and what you want to do with it.

Memory is cheap, and you can put a lot into memory without feeling any sort of pain.

Implementing disk-based caching puts you in a pretty bad position in terms of the cost of managing it.

Albert Araalie
10/15/2008 09:35 PM

Memory is pretty easy to come by, and the caching (into memory) process is straightforward. I guess disk-based caching might be useful in cases where your web application needs to survive restarts or you need to share data across domains.

Rafal
10/16/2008 06:27 AM

Hello, calling this dialogue a 'sense of humour' is an abuse, but it's good you're aware of it ;)

I found your blog yesterday and really admire the job you are doing with Boo and DSLs. This is probably the missing piece I've been looking for, and I'll be checking it out. Currently I'm implementing my open source workflow engine for .Net - the 'NGinn' project at http://nginn.googlecode.com - and I need a script environment with a good 'call interception' mechanism - that is, with IQuackFu. I started with Petro Protsyk's 'Script.Net' language, which is quite useful and elastic; however, Boo seems to have it all and more.

I've got a question about the 'Brail' engine - can it be used outside the monorail framework? I would need such functionality for templating in various scenarios. How efficient is Brail?

Best regards and thanks for interesting read at your blog

Rafal

Ayende Rahien
10/16/2008 06:32 AM

Yes, it can be used outside of MonoRail

Mohamed Meligy
10/17/2008 09:30 AM

Nice one, Oren,

I think MemCache, Velocity and such can handle the memory caching for you in a nice way, and can help with lookups if you store the data with labels and such.

Steve Smith
10/20/2008 08:50 PM

Caching is your friend. :)

Comments have been closed on this topic.