An optimization story

time to read 3 min | 597 words

I left work today very happy. There was a piece in the UI that was taking too long when run under with a real world data set. What is slow? Let us call it 40 seconds to start with. This is a pretty common operation in the UI, so that was a good place to optimize.

I wasn't there for that part, but optimizing the algorithms used reduced the time from 40 seconds to 5 - 10 seconds, and impressive amount by all accounts, but still one in which the users had to wait an appreciable amount for a common UI operation. Today we decided to tackle this issue, and see if we can optimize this further.

The root action is loading some data and executing a bit of business logic on top of that data. I checked the queries being generated, and while they weren't ideal, they weren't really bad (just not the way I would do things). At that point, we decided to isolate the issue in a test page, which would allow us to test just this function in isolation. Then, we implemented this from scratch, as plain data loading process.

The performance for that was simply amazing. 300 - 150 ms per operation, vs. 5 - 10 seconds in the optimized scenario. Obviously, however, we were comparing apples to oranges here. The real process also did a fair amount of business logic (and related data loading), which was the reason that it was slow. I looked at the requirement again, then at the queries, and despaired.

I hoped that I would be able to use a clever indexing scheme and get the 1000% perf benefit using some form of SQL. But the requirement simply cannot be expressed in SQL. And trying to duplicate the existing logic would only put us in the same position as before.

What to do... what to do...

The solution was quite simple, take the database out of the loop. For a performance critical piece of the application, we really can't afford to rely on external service (and the DB is considered one, in this scenario). I spent some time loading the data at application startup, as well as doing up front work on the data set to make it easier to work with.

This turned that operation into an O(1) operation, where O consists of a small set of in memory hash table lookups. And the performance? The performance story goes like this:

I go into the manager office, and ask him how fast he wants this piece of functionality to run. He hesitate for a moment and then says: "A second?".
I shake my head, "I can't do that, can you try again?"
"Two seconds?" He asked.
"I am sorry", I replied, "I can do five"
The I left the office and thrown over my shoulder, "oh, but it is in milliseconds".
Sometimes I have a rotten sense of humor, but the stunned silence that followed that declaration was a very pleasing.

I am lucky in that the data set is small enough to fit in memory. But I am not going to rely on that, we need to implement soft paging of the data anyway (to make the application startup time acceptable), so it will be able to handle that easily enough even when the data set that we are talking about will grow beyond the limits of memory (which I don't expect to happen in the next couple of years).

Overall, it was a very impressive optimization, even if I say so myself.

Tweet Share Share 14 comments

Tags:

Performance

Comments

15 Oct 2008
07:11 AM

Andrey Shchekin

Could you show a requirement that can not be expressed in SQL? This seems like an interesting challenge (I can think of several really cumbersome things, but only one inexpressible in a good way ).

15 Oct 2008
07:18 AM

Ayende Rahien

Imagine a tree structure that you need to query, now imagine that it is a sparse tree, and you need to do fixups for the tree

15 Oct 2008
07:35 AM

Virju

I just saw your day 1 blog poster and i am looking at this now ... you're very much imporved in your writing skills .. especially in vocub. and narration .. keep it up dude .. hard work never fails .. great post

15 Oct 2008
12:02 PM

Albert Araalie

Do you think it is advisable to try and implement a disk-output cache, where you read in data from the database and write it to a temporary files and then have subsequent calls access the data on the disk? I'm assuming the dataset returned is is pretty "large" for the asp.net cache or memory.

Just asking out...loud.....

15 Oct 2008
18:42 PM

Jon Skeet

I had a similar-sounding situation. I had to make many lookups (which couldn't be predicted ahead of time) into a 100,000,000 row table. Fortunately, the nature of this table meant I was able to store it in 100MB of RAM. The performance difference was in the tens of thousands of percent, I think :)

15 Oct 2008
18:49 PM

Stephen

Albert, I actually have been working recently on implementing a good disk caching pattern for asp.net, not so much the database - but caching resulting html. The caching is not so hard, but its more about designing how dependencies can be wired up, and how they can survive over process restarts.. it lead me down the road of being able to generate what is basically an entity tag or hash, or the current state of the dependent services.. and reinforcing and invalidating this when accessing the cached outputs.. thats where I stopped however, I ended up becoming very busy and didn't find the time to architect a solution.

Caching is a really interest area in architecting solutions, as in this blog case, the results can be really rewarding when you make something a ton faster by intelligently handling data.

15 Oct 2008
19:28 PM

Scott White

Spring let's you use [Cache] attributes I would have used something similar here at my Service layer probably. I'm sure it could be implemented similarly with Castle.

15 Oct 2008
21:02 PM

Ayende Rahien

Scott,

Caching blindly is possible, but it isn't really appropriate for many scenarios.

For example, in this case, just the cost of making a single call to calculate the data for the first time is higher than the cost of loading the data and precalculating everything.

15 Oct 2008
21:04 PM

Ayende Rahien

Albert,

It really depends on the data size and what you want to do with it.

Memory is cheap, and you can put a lot into memory without feeling any sort of pain.

Implemeting disk based caching put you in a pretty bad position in terms of the cost of managing that.

15 Oct 2008
21:35 PM

Albert Araalie

Memory is pretty easy to come by and the caching (into memory) process is straight forward. I guess disk based caching might be useful in cases where your web application needs to survive restarts or you need to share data across domains.

16 Oct 2008
06:27 AM

Rafal

Hello, calling this dialogue a 'sense of humour' is an abuse, but it's good you're aware of it ;)

I have found your blog yesterday and really admire the job you are doing with Boo and DSL-s. This is probably the missing piece I've been looking for and I'll be checking that. Currently I'm implementing my open source workflow engine for .Net - the 'NGinn' project at
http://nginn.googlecode.com, and I need a script environment with good 'call interception' mechanism - that is, with IQuackFu. I have started with Petro Protsyk's 'Script.Net' language, which is quite useful and elastic, however the Boo seems to have it all and more.

I've got a question about the 'Brail' engine - can it be used outside the monorail framework? I would need such functionality for templating in various scenarios. How efficient is Brail?

Best regards and thanks for interesting read at your blog

Rafal

16 Oct 2008
06:32 AM

Ayende Rahien

Yes, it can be used outside of MonoRail

17 Oct 2008
09:30 AM

Mohamed Meligy

Nice one, Oren,

I think MemCache, Velocity and such can help handle the memory caching for you in a nice way and can help in lookups if you store the data with labels and such.

20 Oct 2008
20:50 PM

Steve Smith

Caching is your friend. :)

Comment preview

Comments have been closed on this topic.

Markdown turns plain text formatting into fancy HTML formatting.

Phrase Emphasis

*italic*   **bold**
_italic_   __bold__

Links

Inline:

An [example](http://url.com/ "Title")

Reference-style labels (titles are optional):

An [example][id]. Then, anywhere
else in the doc, define the link:
  [id]: http://example.com/  "Title"

Images

Inline (titles are optional):

![alt text](/path/img.jpg "Title")

Reference-style:

![alt text][id]
[id]: /url/to/img.jpg "Title"

Headers

Setext-style:

Header 1
========
Header 2
--------

atx-style (closing #'s are optional):

# Header 1 #
## Header 2 ##
###### Header 6

Lists

Ordered, without paragraphs:

1.  Foo
2.  Bar

Unordered, with paragraphs:

*   A list item.
    With multiple paragraphs.
*   Bar

You can nest them:

*   Abacus
    * answer
*   Bubbles
    1.  bunk
    2.  bupkis
        * BELITTLER
    3. burper
*   Cunning

Blockquotes

> Email-style angle brackets
> are used for blockquotes.
> > And, they can be nested.
> #### Headers in blockquotes
> 
> * You can quote a list.
> * Etc.

Horizontal Rules

Three or more dashes or asterisks:

---
* * *
- - - -

Manual Line Breaks

End a line with two or more spaces:

Roses are red,   
Violets are blue.

Fenced Code Blocks

Code blocks delimited by 3 or more backticks or tildas:

```
This is a preformatted
code block
```

Header IDs

Set the id of headings with {#<id>} at end of heading line:

## My Heading {#myheading}

Tables

Fruit    |Color
---------|----------
Apples   |Red
Pears	 |Green
Bananas  |Yellow

Definition Lists

Term 1
: Definition 1
Term 2
: Definition 2

Footnotes

Body text with a footnote [^1]
[^1]: Footnote text here

Abbreviations

MDD <- will have title
*[MDD]: MarkdownDeep

Oren Eini

Oren Eini

CEO of RavenDB