Performance
Performance optimizations, managed code and leaky abstractions
I ran into this post from Jeff Atwood, talking about the performance difference between managed and unmanaged code: There were a lot of optimizations for this along the way, but the C++ version has soundly beaten the C# version. As expected, right? Well, yes, but with extenuating circumstances. So am I ashamed by my crushing defeat? Hardly. The managed code achieved a very good result for hardly any effort. To defeat the managed version, Raymond had to: Write his own file/io...
Why all the performance posts? The shocking truth!
I was quite amazed by the number of conspiracy theories that were brought up by this post. Some of them in the comments, some of them in private communications. The reason, the real & only one, that I had so many posts lately about performance is quite simple. I did a lot of that recently, and one aspect of perf testing that I didn’t talk about is that most perf test runs take a long time, which means that I had a lot of free time. Free time for me usually translates into posting time :-)
Patterns for reducing memory usage
Memory problems happen when your application uses more memory than you would like. It isn’t necessarily paging or causing OutOfMemory, but it is using enough memory to generate complaints. The most common cases for memory issues are: memory leaks, garbage spewers, in-memory nuts, and framework bugs. Let me take each of them in turn. Memory leaks in a managed language are almost always related to dangling references, such as in a cache with no expiration or events where you never unsubscribe. Those are usually...
Micro optimization decision process
There are some parts of our codebase that are simply going to have to be called a large number of times. Those are the ones that we want to optimize, but at the same time, unless they are ridiculously inefficient, there isn’t that much room for improvement. Let us look at this for a second: The numbers are pretty hard to read in this manner, so I generally translate it to the following table: Method name ...
Memory obesity and the curse of the string
I believe that I have mentioned that my major problem with the memory usage in the profiler is with strings. The profiler is doing a lot with strings: queries, stack traces, log messages, etc. are all creating quite a lot of strings that the profiler needs to inspect, analyze and finally turn into the final output. Internally, the process looks like this: In my previous post, I talked about the two major changes that I made so far to reduce memory usage, you can see them below. I introduced string interning in the parsing stage and serialized...
When mini benchmarks are important
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil" - Donald Knuth I have expressed my dislike for micro benchmarks in the past, and in general, I still have this attitude, but sometimes, you really care. A small note, while a lot of namespaces you are going to see are Google.ProtocolBuffers, this represents my private fork of this library that was customized to fit UberProf’s needs. Some of those things aren’t generally applicable (like string interning at the...
Fighting the profiler memory obesity
When I started looking into persisting profiler objects to disk, I had several factors that I had to take into account: Speed in serializing / deserializing. Ability to intervene in the serialization process at a deep level. Size (also affects speed). The first two are pretty obvious, but the third requires some explanation. The issue is, quite simply, that I can apply some strategies to significantly improve both the speed & size of serialization by making sure that the serialization pipeline knows exactly what is going...
The operation was successful, but the patient is still dead… deferring the obvious doesn’t work
So, I have a problem with the profiler. At the root of things, the profiler is managing a bunch of strings (SQL statements, stack traces, alerts, etc). When you start pouring large amounts of information into the profiler, the number of strings that it is going to keep in memory is going to increase, until you get to say hello to OutOfMemoryException. During my attempt to resolve this issue, I figured out that string interning was likely to be the most efficient way to resolve my problem. After all, most of the strings that I have to display are...
UberProf performance improvements, nothing helps if you are stupid
The following change took a while to figure out, but it was a huge performance benefit (think, 5 orders of magnitude). The code started as:
private readonly Regex startOfParametersSection =
    new Regex(@"(;\s*)[@:?]p0 =", RegexOptions.Compiled);
UberProf performance improvements, beware of linq query evaluation
This is a diff from the performance improvement effort of UberProf. The simple addition of .ToList() has significantly improved the performance of this function: Why? Before adding the ToList(), each time we try to run our aggregation functions on the statements enumerable, we would force re-evaluation of the filtering (which can be quite expensive). By adding ToList() I am now making the filtering run only once. There is another pretty obvious performance optimization that can be done here, can you see it? And why did I choose not to implement it?
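The re-evaluation described above can be reproduced in isolation. The sketch below is not the UberProf code: the source data and the filter are hypothetical stand-ins (`ExpensiveFilter` simply counts its own invocations), chosen only to make the effect of LINQ's deferred execution visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredEvaluationDemo
{
    // Hypothetical stand-in for an expensive filter; counts how often it runs.
    public static int FilterCalls;

    private static bool ExpensiveFilter(int value)
    {
        FilterCalls++;
        return value % 2 == 0;
    }

    // Runs three aggregations over the filtered sequence and returns
    // how many times the filter predicate was invoked in total.
    public static int CountFilterCalls(bool materialize)
    {
        FilterCalls = 0;
        IEnumerable<int> source = Enumerable.Range(0, 1000);
        IEnumerable<int> filtered = source.Where(ExpensiveFilter);

        if (materialize)
            filtered = filtered.ToList(); // the filter runs once, here

        // Each aggregation enumerates `filtered` from the start.
        filtered.Count();
        filtered.Sum();
        filtered.Max();

        return FilterCalls;
    }

    public static void Main()
    {
        Console.WriteLine("Deferred:     " + CountFilterCalls(false));
        Console.WriteLine("Materialized: " + CountFilterCalls(true));
    }
}
```

With 1,000 source items and three aggregations, the deferred version invokes the predicate 3,000 times, while the materialized version invokes it 1,000 times. The trade-off is that ToList() allocates a list holding every filtered item.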
UberProf performance improvements, or: when O(N^3 + N) is not fast enough
While working on improving the performance of the profiler, I got really annoyed. The UI times for updating sessions with large numbers of statements were simply unacceptable. We already virtualized the list UI, so it couldn’t be that. I started tracking it down, and it finally came down to the following piece of code:
protected override void Update(IEnumerable<IStatementSnapshot> snapshots)
{
    foreach (var statementSnapshot in snapshots)
    {
        var found = (from model in unfilteredStatements
...
UberProf performance improvements
I mentioned before that I ran into some performance problems, and I thought it would be interesting to see how I solved them. The underlying theme was finding O(N) operations and eradicating them. In almost all the cases, the code is more complex, but significantly faster. Old code New code public static class...
UberProf performance challenges
One of the things that the profiler is supposed to do is to handle large amounts of data, so I was pretty bummed to hear that users are running into issues when they try to use the profiler in high load scenarios, such as load tests. Luckily, one of the things that the profiler can do is output to a file, which lets me simulate what is going on at the customer site very easily. The first performance problem is that the profiler isn’t processing things fast enough: The second is that there are...
Transactional queuing system perf test
After running into severe performance issues when using MSMQ and transactions, I decided to run a more thorough test case. Writing 10,000 messages to MSMQ Transactional Queue, separate transactions:
private static void AddData(MessageQueue q1, byte[] bytes)
{
    Console.WriteLine("{0:#,#}", bytes.Length);
    var sp = Stopwatch.StartNew();
    for (int i = 0; i < 10000; i++)
    {
        using (var msmqTx = new MessageQueueTransaction())
        {
...
Asynchronous order processing
One of the more common challenges that I run into when discussing the notion of async as the main communication mechanism is that there seems to be an entrenched belief that things should be synchronous. It appears to make things simpler, from a conceptual level, while making them significantly more difficult to actually implement in a production worthy way. Arguably the most common issue that I hear about is with downloadable materials, and it can be summed up as some variation of: What do you mean we aren’t going to just start downloading stuff...
Analyzing a performance problem – Is a prisoner dangerous?
Recently I ran into a performance problem in an application that I was reviewing, and I thought that it would make a great post. Since I can’t use the actual application model, I decided that I am tired of using the same old online shop model and turned to the one domain in which I am a domain expert. Prisons. Let us imagine that this is part of the Prison 10’s* Dashboard. It looks pretty simple, right? Let us talk about this as SQL, ignoring all layers in the middle. We can express this as:...
Out of process session state vs. explicit state management
The question came up in the alt.net mailing list, and I started replying there, before deciding that it would make a great post. I actually want to talk specifically about the notion that came up of avoiding explicit state management in favor of out of process session state (on top of a database or memcached). The problem is that those two are not interchangeable. Using a session makes it easy to preserve the illusion of statefulness on the web, but it is an illusion. In general, you have to give me pretty good reasons before I will go with an...
An allegory of an optimization story
I got a call today from a team mate about a piece of software that we wrote. We initially planned it to work with data sizes of hundreds to thousands of records, and considering that we wrote that one in a day, I considered it a great success. We didn’t pay one bit of attention to optimization, but the perf was great for what we needed. However, a new requirement came up which requires us to handle hundred thousand records, and our software was working… well under the new constraints. As a matter of fact, it would take about...
How dotTrace make an application REALLY fast
I have to admit, I was sure that using Thread.Sleep(negativeInt) was an April Fools’ joke, but it looks like some parts of the framework are already taking advantage of this: I must say, this makes performance profiling somewhat of a challenge, since you don’t generally expect to have to improve performance by making more calls.
It depends on what you are optimizing for…
Today I was writing this code:
public class FakeRandomValueGenerator : IRandomValueGenerator
{
    private readonly int valueToReturn;

    public FakeRandomValueGenerator(int valueToReturn)
    {
        this.valueToReturn = valueToReturn;
    }

    public int Next(int min, int max)
    {
        return valueToReturn;
    }
}
This caused some concern to my pair, who asked me why I was hand rolling a stub instead of using a mocking...
Schema-less databases
This post about how Friend Feed is using schema-less storage for most of their work is fascinating. In the ALT.Net Seattle there was a session about that, which generated a lot of interest. My next post will have more details about the actual implementation details of doing something like that in a manner easily accessible in .Net, but just reading the post is very interesting. Another item that I found that was an interesting read, although it is far harder to read is: http://highscalability.com/how-i-learned-stop-worrying-and-love-using-lot-disk-space-scale
Reducing the cost of getting a stack trace
I am trying to find ways to reduce the cost of the stack trace used in NH Prof. The access to the stack trace is extremely valuable, but there is a significant cost of using it, so we need a better way of handling this. I decided to run a couple of experiments on this. All experiments were run 5,000 times, on a stack trace of 7 levels.
new StackTrace(true) - ~600ms
new StackTrace(false) - ~150ms
So right there, we have a huge cost saving, but let us continue...
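A measurement along these lines can be sketched as follows. This is an assumption about the shape of such an experiment, not the code used in the post: the class and method names are hypothetical, the 7-level call depth is not reproduced, and absolute timings will differ by machine and runtime.

```csharp
using System;
using System.Diagnostics;

public static class StackTraceCostDemo
{
    // Captures `iterations` stack traces and returns the elapsed milliseconds.
    // needFileInfo: true asks the runtime to resolve file names and line numbers.
    public static long Measure(bool needFileInfo, int iterations)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            var trace = new StackTrace(needFileInfo);
            GC.KeepAlive(trace); // keep the capture from being optimized away
        }
        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int iterations = 5000;
        Measure(false, 10); // warm up before timing

        Console.WriteLine("new StackTrace(true):  " + Measure(true, iterations) + "ms");
        Console.WriteLine("new StackTrace(false): " + Measure(false, iterations) + "ms");
    }
}
```

The only difference between the two runs is the fNeedFileInfo argument, so any gap between them is the cost of resolving source file and line information.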
NH Prof: Performance implications for the profiled application
I got a couple of complaints about this, and I think that this is a really important issue. Using NH Prof slows down the profiled application. That is to be expected, since we require the application to do more work, but it is not as good as we can hope it would be. Here is the test scenario, running against the Northwind database (thanks Davy for supplying it):
class Program
{
    static void Main(string[] args)
    {
        var sp = Stopwatch.StartNew();
        HibernatingRhinos.NHibernate.Profiler.Appender.NHibernateProfiler.Initialize();
        ISessionFactory sessionFactory = CreateSessionFactory();
        using (var session = sessionFactory.OpenSession())
        {
            var products = session.CreateCriteria(typeof(Product))
                .SetCacheable(true).List<Product>();
        }
        using (var session = sessionFactory.OpenSession())
        {
            var products = session.CreateCriteria(typeof(Product))
                .SetCacheable(true).List<Product>();
        }
        using (var session = sessionFactory.OpenSession())
        {
            var products =...
A WCF Perf Mystery
Can anyone tell me why this is taking a tad over 11 seconds?
class Program
{
    static void Main(string[] args)
    {
        try
...
NH Prof: Getting to zero friction
Here is a new (passing) integration test in the NHibernate Profiler:
Just to give you an idea, here is the implementation of this rule, which has to be one of the more complex ones that we have at the moment, because the data for it comes from several sources, so we need to actually execute the logic in an event, instead of directly.
public class TooManyRowsReturnedPerQuery : IStatementProcessor
{
    private ProfilerConfiguration configuration;

    public TooManyRowsReturnedPerQuery(ProfilerConfiguration configuration)
    {
        this.configuration = configuration;
    }
    ...
NH Prof: A moment in time
I just had a major moment in using NH Prof.
I ran into a problem with NHibernate and was able to use the NH Profiler in order to figure out what the problem was.
Wow!
NHProf: What is the role of the DBA?
I am now discussing what sort of reports we want to give the DBA from the NHibernate Profiler. My first thought was just to give the DBA a list of the statements that the application has executed and the number of times they were repeated. That should allow him to get enough information to use his own tools to optimize the application physical data structure.
What do you think? Is this a good scenario?
What other scenarios can you see for NH Prof in the hands of the DBA?
NHProf: The stack is not as simple as you wish it to be
One of the nicest features that NH Prof has to offer is this, allowing you to go from the query issued to the database directly to the line of code that caused this query to be generated.
A few days ago I posted that you can either build something in one day, or in three months, but nothing much in the middle. The proof of concept that convinced me that I can build NH Prof was written during a single evening, along with two pints of Guinness. The overall concept that I have now is drastically different, but it is...
NH Prof: Configuration Story
I think that I mentioned that NHibernate Profiler is working mostly by doing some smarts on top of the log output from NHibernate. That is not exactly the case, but that is close enough. The problem with working through the logs is that there are roughly 30 lines of XML that you need to deal with in order to manage this properly.
The first time I sent this to anyone else, he ran into problems with the configuration because of very subtle issues. For a while now, I had a ticket saying that I need to document what the failure...
Setting expectations straight
I am currently working on getting a beta version of NH Prof out, but I ran into a problem. There are several features that I intend to put into the release version that I didn't have the time to actually put in. Those are usually features that are good to have, but not necessarily important for the actual function of the tool. One of them is saving & loading the captured data. Currently, I am working on more important things than dealing with this, so I didn't implement this. However, I do want to make it clear that it...
NH Prof: A testing story
Remember that I mentioned the difference between working and production quality?
One of the things that separates the two is that in production quality software, you don't need to know which buttons not to push. Here is a simple example. For a while now, if you tried to bring up two instances of NH Prof, the second one would crash. That wasn't something that you really want to show the users. Today I got back to doing NH Prof stuff, getting it ready for public beta, and I decided that the first thing to do was to tackle this easy feature.
Doing...
NH Prof Deep Dive: The Integration Test Architecture
I am getting a lot of requests to explore the actual innards of the NH Prof. I find it surprising, because I didn't think that people would actually be interested in that aspect of the tool.
But since interest was expressed, I'll do my best to satisfy the curiosity. The first topic to discuss is the integration test architecture. One of the things that the profiler is doing is to capture data from a remote process, and I wanted my integration tests to be able to test that scenario, which exposes me to things like synchronization issues, cross process communication and (not...
An NH Prof Bug Story: Why integration is tricky
Today I found out that NH Prof doesn't work with ASP.Net MVC applications. If you had asked me, I would have sworn any oath you care to name that it would. And even after seeing the problem with my own eyes, it took me a while to track it down.
To make a long story short. Somewhere deep in the bowels of NH Prof, I made the assumption that a method is always contained in a type. I am pretty familiar with the way that the CLR works, and it seemed like a pretty reasonable assumption to make. In fact,...
NH Prof: A guided tour
NH Prof has reached the level in which I can actually talk about the features that it has in more than abstract terms. There is still a big feature area that I want to cover (which should be a nice surprise), but the basics are there, and today I had ample proof that it is maturing just nicely. I was able to deal with quite a few of the remaining tasks by applying check listing. Basically, to do X, I had to do A,B & C. Trivially simple, and quite satisfying.
Test coverage went back up to over 90% on...
NH Prof: Teaser
If you want to learn more, come to my Advanced NHibernate talk tomorrow.
This time, this is literally a snapshot of the application as it is running, and it is showing most of the surface level functionality that exists at the moment in the application.
Oh, and all the kudos for the look and feel go to Christopher and Rob, who make it look so easy.
NH Prof: How to detect SELECT N + 1
One of the things that the NHibernate Profiler is going to do is to inspect your NHibernate usage and suggest improvements to it.
Since I consider this to be a pretty important capability, I wanted to streamline the process as much as possible.
Here is how I detect this now:
It is not perfect, but it is pretty close.
A messaging problem: Order in the bus
In NH Prof, I have structured the application around the idea of message passing. I am not yet in the Erlang world (which requires a framework that would keep hold of the state), but I am nearer than before.
The back end of the profiler is listening to the event streams generated by NHibernate. We get things like:
Session opened on thread #1
SQL Executed on thread #1
SELECT * FROM Customers
...
NHProf: Logging interception
One of the goals that I set for myself with the NHibernate Profiler is to be able to run on unmodified NHibernate 2.0. The way that I do that is by intercepting and parsing the log stream from NHibernate.
NHibernate logging is extremely rich and detailed, so anything I wanted to do so far was possible. I am pretty sure that there would come a time when a feature would require more invasive approaches, running profiler code in the client application to gather more information, but for now this is enough.
I did run into several problems with logging interception. Ideally,...
When select IS broken (or just slow)
Usually, "select isn't broken" is a good motto to follow. Occasionally, there are cases where it is. In particular, it may not be that it is broken, it may very well be that the way it works doesn't match the things that we need it to do.
I spoke about an optimization story that happened recently, in which we managed to reduce the average time from 5 - 10 seconds to 5 - 15 milliseconds.
What we needed was to walk a tree structure, which was stored in a database, and do various interesting tree based operations on it. The most...
An optimization story
I left work today very happy. There was a piece in the UI that was taking too long when run with a real world data set. What is slow? Let us call it 40 seconds to start with. This is a pretty common operation in the UI, so that was a good place to optimize.
I wasn't there for that part, but optimizing the algorithms used reduced the time from 40 seconds to 5 - 10 seconds, an impressive amount by all accounts, but still one in which the users had to wait an appreciable amount for a common UI...
String processing is costly, but stupidity is more costly still
Damn, this is annoying: And then I find out why: Stupid of me, really stupid of me.
Stupid Micro Benchmarking: Proxy Performance
Let us take a look at this class:
public class Trivial
{
    public void EmptyStandard()
    {
    }

    public virtual void EmptyVirtual()
    {
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public void EmptyNoInline()
    {
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public virtual void EmptyVirtualNoInline()
    {
    }
}
Now let us see what is the effect of using Dynamic Proxy on performance, here is the test rig:
...
Patterns for using Distributed Hash Tables: Conclusion
Well, it looks like I have finally completed all I wanted to say about DHTs. I can now go back to talking about multi tenancy :-) The previous ones are: Distributed in memory cache / storage Patterns for using distributed hash tables: Groups Patterns for using distributed hash tables: Locking Patterns...
Patterns for Distributed Hash Tables: Range Queries
Next to last, in this series on using Distributed Hash Tables (DHT)! The previous ones are: Distributed in memory cache / storage Patterns for using distributed hash tables: Groups Patterns for using distributed hash tables: Locking Patterns for Distributed Hash Tables: Locality ...
Patterns for Distributed Hash Tables: Lookup by property
Will this series on using Distributed Hash Tables (DHT) ever end? The previous ones are: Distributed in memory cache / storage Patterns for using distributed hash tables: Groups Patterns for using distributed hash tables: Locking Patterns for Distributed Hash Tables: Locality ...
Patterns for Distributed Hash Tables: Cheap Cross Item Transactions
And yet another post in my series on using Distributed Hash Tables (DHT). The previous ones are: Distributed in memory cache / storage Patterns for using distributed hash tables: Groups Patterns for using distributed hash tables: Locking Patterns for Distributed Hash Tables: Locality ...
Patterns for Distributed Hash Tables: Item Groups
And yet another post in my series on using Distributed Hash Tables (DHT). The previous ones are: Distributed in memory cache / storage Patterns for using distributed hash tables: Groups Patterns for using distributed hash tables: Locking Patterns for Distributed Hash Tables: Locality Right now I want...
Patterns for Distributed Hash Tables: Locality
Here is another post in my series on using Distributed Hash Tables (DHT). The previous ones are: Distributed in memory cache / storage Patterns for using distributed hash tables: Groups Patterns for using distributed hash tables: Locking Now I want to talk about locality, and why it is important. First, the idea of locality is very simple. Put related items together, so...
Not all bytes weight exactly 8 bits
Or, pay attention to how you write to the disk. Here is a simple example:
static void Main(string[] args)
{
    var count = 10000000;
    Stopwatch stopwatch = Stopwatch.StartNew();
    using (var stream = CreateWriter())
    using (var bw = new BinaryWriter(stream))
    {
        for (var i = 0; i < count; i++)
        {
            bw.Write(i);
        }
        bw.Flush();
    }
    stopwatch.Stop();
    Console.WriteLine("Binary Writer: " + stopwatch.ElapsedMilliseconds);

    stopwatch = Stopwatch.StartNew();
    using (var stream = CreateWriter())
    {
        for (var i = 0; i < count; i++)
        {
            var bytes = BitConverter.GetBytes(i);
            stream.Write(bytes, 0, 4);
        }
        stream.Flush();
    }
    stopwatch.Stop();
    Console.WriteLine("BitConverter: " + stopwatch.ElapsedMilliseconds);

    stopwatch = Stopwatch.StartNew();
    using (var stream = CreateWriter())
    using (var ms = new MemoryStream())
    {
        for (var i = 0; i < count; i++)
        {
            var bytes = BitConverter.GetBytes(i);
            ms.Write(bytes, 0, 4);
        }
        var array = ms.ToArray();
        stream.Write(array, 0,...
Some observations on saving to file
This post from Dare Obasanjo has led me to some very interesting reading. Broadly, the question is what we should optimize when thinking about disk access. Traditional thinking would say that we want to write as little as possible, because the disk is slow. But it turns out that this is not quite accurate in the way I used to think about it. The problem is not that the disk is slow, but that seek time is slow. Let us consider the following problem: Given a file with N integers in it, with values from...
Patterns for using distributed hash tables: Locking
Here is another post in my series on using Distributed Hash Tables (DHT). The previous ones are: Distributed in memory cache / storage Patterns for using distributed hash tables: Groups In this post, I would like to handle locking. As I mentioned in the previous post in the series, updating a single item safely can be done using optimistic concurrency techniques. Updating more than a single item is... harder. Let...
Patterns for using distributed hash tables: Groups
Yesterday's post called them distributed in memory cache / storage, but I was reminded that the proper term for what I am talking about is distributed hash tables (DHT). I presented the problem of dealing with DHT in this post, mainly, the fact that we have only key based access and no way to compose several actions into a single transaction. I'll let you go read that post for all the gory details, and continue on with some useful patterns for dealing with this issue. As a reminder, here is the API that we have: ...
Distributed in memory cache / storage
Let us start by defining the difference between cache and storage. A cache may decide to evict items whenever it feels like, a storage will only do so at well defined points. As a simple example, this is legal for a cache:
PUT "foo", "item data"
result = GET "foo"
assert result is null
The cache contract means that it "might" preserve values, or it might not, it is the cache's choice. A storage contract makes the above code illegal. It may never happen. Most cache solutions have a way to specify priorities, including the "do not...
NMemcached: A WCF experiment
While doing a code review of NMemcached it started to bother me just how much of the application was infrastructure and argument parsing code. It shouldn't be this way. So I decided to port the whole thing to WCF and see how it works. Porting it wasn't hard, and significantly reduced the amount of code in the application. After updating all the tests and verifying that I fixed all the things I broke, I built an appropriate memcache client and started perf testing again. As a reminder, native Memcached server managed to field 10,000 reads...
Writing unreliable software
This surprised me, to say the least. I ran into a bug during stress testing for SvnBridge: after a while, it would simply get stuck. I am not the best at figuring out exactly what got an application stuck, but I generally manage to put in some effort before I call in the big guns. This time, I had managed to get a working theory, and prove that the symptoms that I am seeing are consistent with my theory. I decided to dig into it a bit, and came up with interesting results. The jury...
The cost of abstraction - Part III
Here is another thing to note, computers are fast. I can't tell you how fast because it would take too long. Thinking about micro performance is a losing proposition. Sasha has asked an important question: Now assume you have a non-functional requirement saying that you must support 1,000,000 messages per second inserted into this queue. Would you still disregard the fact using an interface is a BAD decision? My answer, yes. The reason for that? Let us take a look at the slowest method I could think of to do in process queue: ...
The cost of abstraction - Part II
Sasha pointed out that I should also test what happens when you are using multiple implementations of the interface, vs. direct calls. This is important because of JIT optimizations with regards to interface calls that always resolve to the same instance. Here is the code:
class Program
{
    public static void Main(string[] args)
    {
        //warm up
        PerformOp(new List<string>(101));
        PerformOp(new NullList());
        List<long> times = new List<long>();
        Stopwatch startNew = new Stopwatch();
        for (int i = 0; i < 50; i++)
        {
            startNew.Start();
            PerformOp(new List<string>(101));
            PerformOp(new NullList());
            times.Add(startNew.ElapsedMilliseconds);
            startNew.Reset();
        }
        Console.WriteLine(times.Average());
    }

    private static void PerformOp(List<string> strings)
    {
        for (int i = 0; i < 100000000; i++)
        {
            strings.Add("item");
            if (strings.Count > 100)
                strings.Clear();
        }
    }

    private static void PerformOp(NullList strings)
    {
        for (int i = 0; i < 100000000;...
The cost of abstraction
Sasha commented on my perf post, and he mentioned the following: But what if you're designing a messaging infrastructure for intra-application communication? It is no longer fine to hide it behind a mockable IQueue interface because that interface call is more expensive than the enqueue operation! Now, I know that an interface call is more expensive than a non virtual call. The machine needs to do more work. But exactly how much more work is there to do? I decided to find out.
class Program
{
    public static void Main(string[] args)
    {
        //warm up
        PerformOp(new...
Full speed ahead, and damn the benchmarks
A while ago I posted about upfront optimizations. Sasha Goldshtein commented (after an interesting email conversation that we had): What truly boggles me is how the argument for correctness in software is not applied to performance in software. I don't understand how someone can write unit tests for their yet-unwritten code (as TDD teaches us) and disregard its performance implications, at the same time. No one in their right mind could possibly say to you, "Let's define and test for correctness later, first I'd like to write some code without thinking about functional...
Performance - The affect of reducing remote calls
I just got a 3000% performance improvement. I got it by turning this:
public int GetLatestVersion()
{
    return SourceControlService.GetLatestChangeset(serverUrl, credentials);
}
To this:
public int GetLatestVersion()
{
    const string latestVersion = "Repository.Latest.Version";
    if (PerRequest.Items[latestVersion] != null)
        return (int) PerRequest.Items[latestVersion];
    int changeset = SourceControlService.GetLatestChangeset(serverUrl, credentials);
    PerRequest.Items[latestVersion] = changeset;
    return changeset;
}
PerRequest.Items maps to HttpContext.Current.Items (it is a bit more complicated than that, we have non IIS hosted version to consider, but that is the same thing).
If you are wondering what is the most critical thing that you can do to get good performance, look at remote calls in the application.
SvnBridge - Optimization Results
A few days ago I posted about profiling results of SvnBridge. I spent the last few days implementing a really aggressive caching mechanism, reducing remote calls, and in general working on the performance of the application. Here are the results: I think that I am in good shape if the XML parsing starts to be a high level item in the profiling. Oh, and just a teaser, check this out. This is checking out the SvnBridge code from Israel (and the servers are in the US). No, it...
Profiling with dotTrace
I have a tiny feature and a bug fix that I want to implement before I focus solely on improving SvnBridge performance. This is a really quick analysis of a single scenario. Start dotTrace and set the application to profile, then start it. There are a lot of options, but the defaults were always good for me. In the application, prepare it for the scenario that you are going to perform. (In SvnBridge's case, this means just setting up the server to talk to): Perform some actions...
Upfront Optimizations
Developers tend to be micro optimizers by default, in most cases. In general it is accepted that this is a Bad Thing. This is a very common quote: In my last project, I wasn't willing to allow discussion on the performance of the application until we got to the final QA stages. (We found exactly two bottlenecks in the application, by the way, and it cost us 1 hour and 1 day to fix them). You could say that I really believe that premature optimization is a problem. However, the one thing that I will...
Creating objects - round 2
It was pointed out that I had a skew in my test, I was also calling i.ToString() in a tight loop, which probably kills the numbers. Here is the exact same benchmark, but using a constant string value, instead of calling i.ToString() all the time. Using new - 00.0177508 seconds (down from 00.3648117 seconds) Using Activator.CreateInstance - 06.3033382 seconds (down from 06.8242636 seconds) Using GetUninitializedObject - 02.8209057 seconds (down from 03.2422335 seconds) Using specialized dynamic method - 00.0417958 seconds (down from 00.4314517 seconds) ...
Creating objects - Perf implications
Here are a few examples of how we can create objects, and the perf implications of each way. In all those tests, I have used the following class as my benchmark. public class Created
{
public int Num;
public string Name;
public Created(int num, string name)
...
Distributed cache for the CLR
Sriram has just published a post about Cacheman, a pet project of his that gives us Memcached-like functionality based on the CLR. I started a project like that a while ago, but eventually I decided that it would be much easier to just use Memcached. One of the tricks that Memcached uses is explicit memory management of the layout of the cache. I assume that the GC will take care of much of that for us. Sriram has managed to get to 16,000 requests / second with his current code, which is certainly impressive. I...
Future Query Of implemented
It took very little time to actually make this work. I knew there was a reason I liked my stack, it is flexible and easy to work with. You can check the implementation here, it is about 100 lines of code. And the test for it:
FutureQueryOf<Parent> futureQueryOfParents = new FutureQueryOf<Parent>(DetachedCriteria.For<Parent>());
FutureQueryOf<Child> futureQueryOfChildren = new FutureQueryOf<Child>(DetachedCriteria.For<Child>());
Assert.AreEqual(0, futureQueryOfParents.Results.Count);
// This also kills the database, because we use an
// in-memory one, so we ensure that the code is not
// executing a second query
CurrentContext.DisposeUnitOfWork();
Assert.AreEqual(0, futureQueryOfChildren.Results.Count);
Future<TNHibernateQuery>
A while ago I added query batching support to NHibernate, so you can execute multiple queries against the database in a single roundtrip. That was well and good, except that you need to know, in advance, what you want to batch. This is often the case, but not nearly often enough. Fairly often, I want disparate actions to be batched together. It just occurred to me that this is entirely possible to do. In my Rhino Igloo project, I have a lot of places where I have code very similar to this (except it can go for quite a while):...
Performance, Joins and why you should always have a profiler
I ran a heavy duty import process yesterday, and we ran into a severe performance issue with Rhino ETL joins. Five joins with about 250,000 records on the initial left and a few tens of thousands on the right took about 2 hours to complete. That was unacceptable, and I decided that I had to fix this issue. I had a fairly good idea about what the issue was. Rhino ETL supports only nested loops joins at the moment, which means that the join is performed as (pseudo code):
for leftRow in left:
    for rightRow in right:
        if MatchJoinCondition(leftRow, rightRow):
            yield MergeRows(leftRow, rightRow)
Obviously the...
The cost of inappropriate linqing
I read this post with interest. Apparently Linq to SQL is doing something odd, because I can't quite believe the results that this guy is getting. From the post: So I dug into the call graph a bit and found out the code causing by far the most damage was the creation of the LINQ query object for every call! The actual round trip to the database paled in comparison. I can't imagine what they are doing there to cause these performance characteristics. From my own experiments with Linq, it should definitely not require this amount of effort.
A lesson in performance
A while ago I posted a performance analysis of Sending arrays to SQL Server: Xml vs. Comma Separated Values. The context for that was sending a large number of parameters to the server to be processed in an IN expression. We have hit the 2,100 parameters limit of SQL Server a few times, and that became critical. The reason that we had so many items to send to the IN expression was that we do both caching and calculation in the code, and then we need to get the data from that. As it turned out, I got a call today...
Slow application startup when using log4net AdoNetAppender
Just saw my application startup time shoot through the roof. It went from the merely annoying to the absolutely ridiculous. At first I cast blame around, but after a while I tried the first method of investigating, which was to attach to the process and hit pause, then inspect what it is currently doing. The stack trace pointed to the ConfigureAndWatch method on XmlConfigurator, which was stuck on SqlConnection.Open, which led me to figure out that my connection string was pointing at the wrong place, which led to a slow startup...
Regex vs. string.IndexOf
I sent a piece of code to Justin, which dealt with doing some simple text parsing. His comment was: text.Substring(lastIndex, currentIndex - lastIndex); Dude, Regex, dude! This code reminds me of when I wrote an XML parser in ASP3 The reason that I used IndexOf there was performance, this piece of code is in the critical path, and I don't think that Regex would give me much there. But Justin said that compiled Regex is more efficient than IndexOf, so I decided to check it. Here is my quick perf test:
static void Main(string[] args)
{
string testStr = "select foo,...
High performance domain models
Udi has an interesting presentation that I recommend that you go through. He is going to present it at Tech Ed (Thu Nov 8 13:30 - 14:45 Room 117). Most of the ideas are familiar to me because I have spoken to him about them before, but they represent new concepts to most people. I would preface his suggestions with the usual warning about designing for performance. Udi's points are about big systems, so consider if they are appropriate to your scenario first. A pal of mine once told me that he designs systems for an order of magnitude increase in...
Real World NHibernate: Reducing startup times for large amount of entities
The scenario that Christiaan Baes needs to solve is reducing the startup time of a Win Forms application. The main issue here is that the initial load of the application should be fast, but in this case, we are feeding NHibernate about a hundred entities, so it takes a few seconds to run them. I asked Christiaan to send me profiler results of the code, and it looked all right on his end, so it was time to look at NHibernate and see what she had to say about that. The test scenario was startup time for a thousand entities. I think that...
Trusting the benchmark
I was scanning this article when I noticed this bit. A benchmark showing superior performance to another dynamic proxy implementation. I should mention in advance that I am impressed. I have built one from scratch, and am an active member of DP2 (and DP1, when it was relevant). That is not something that you approach easily. It is hard, damn hard. And the edge cases will kill you there. But there are two problems that I have with this benchmark. They are mostly universal for benchmarks, actually. First, it tests an unrealistic scenario, you never use a proxy generation framework with...
The CRM Horror
It is rare that I get to the point where I am just flat out speechless from seeing something. Today I went beyond that, I was flat out speechless and aghast. The image that you see here is a small part of the FilteredAccount view in Microsoft CRM. Yes, you got that right, a small part. I got to this from experimenting with the DB model, trying to figure out how things work. I was strongly advised not to make any use of any sort of view that started with Filtered. "It would make you cry", they said,...
The Performance Penalty of using ASP.Net
Bret is talking about tracking an issue with Watir that appeared after the application was migrated from ASP.Net to Ruby on Rails (IE issue, apparently). I think the reason we haven’t seen this problem before is because our .Net apps have been a lot slower that Rails. Slow enough to keep this IE bug from showing up. That is certainly something that I have seen before. The main problem is not the runtime performance, it is the initial performance. If I make a change to a page, I have to wait ~30 seconds for it to load. Contrast...
Optimizing NHibernate
Aaron (Eleutian) is talking about some issues that he has with optimizing with NHibernate. So in short, I feel NHibernate (and any ORM for that matter) needs the following features to really be optimization friendly: Lazy field initialization Querying for partial objects: select u(Username, Email) from User u Read-only queries that do not get flushed. Join qualifiers (on in T-SQL) Let me try to take this in order. Lazy Field Initialization: On the surface, it looks very good, because you can do something like: Customer customer = session.Load<Customer>(15);Console.Write(customer.Name); And the OR/M would generate...
Validating Users in Active Directory Gotcha
A while ago I asked about doing Active Directory Authentication. After getting some advice, I settled on using the Active Directory Membership provider. It worked, but after a while we started to get really bad feedback from the users about the time that it took to log in. To give you an idea, I timed it and I have an average of 14 seconds spent just making the "Membership.ValidateUser()" call. For a while I insisted that it can't be my code, after all, I was explicitly using the Microsoft recommended way of doing it. I am not an Active Directory /...
Sending arrays to SQL Server: Xml vs. Comma Separated Values
I spoke before about using the XML capabilities of SQL Server in order to easily pass a list of values to SQL Server. I thought that this was a pretty good way to go, until I started to look at the performance numbers. Let us take a look at this simple query: DECLARE @ids xml SET @ids = '<ids> <id>ALFKI</id>... <id>SPLIR</id></ids>' SELECT * FROM Customers WHERE CustomerID IN (SELECT ParamValues.ID.value('.','NVARCHAR(20)') FROM @ids.nodes('/ids/id') as ParamValues(ID)) This simple query has a fairly involved execution plan: This looks to me like way too much stuff for such...
The truth about string concatenation performance...
Here is a riddle, what is faster? string str = "Id: " + i; string str = string.Format("Id: {0}", i); string str = new StringBuilder().Append("Id: ").Append(i).ToString(); If you guessed StringBuilder or string.Format, you are mistaken. Over 10 million iterations, the simple "Id: " + i finished in 4.7 seconds, StringBuilder in 5.7 seconds and string.Format in 7.6 seconds. The reason for that is that the compiler can optimize the + operator to a call to string.Concat, and it does it quite often when you have several parameters. The optimizations of StringBuilder only show up if you have several concatenations,...
Answering Mats' Challenge
Mats Helander has a challenge for OR/M developers, and Mats should know, since he is behind NPersist.
Go to the post for details, but basically it is loading the entire Customer->Orders->OrderLines graph in 3 statements or less.
Because I am a sucker for challenges, I implemented it with ActiveRecord. In Mats' terms, the code is so simple it hurts, and yes, I cheated :-)
internal static IList<Customer> LoadCustomersOrdersAndOrderLines()
{
    Customer[] customers = Customer.FindAll();
    Order[] orders = Order.FindAll();
    OrderLine[] orderLines = OrderLine.FindAll();
    foreach (Customer customer in customers)
    {
        customer.Orders = new List<Order>(); // avoid lazy load when adding
    }
    foreach (Order order in orders)
    {
        order.OrderLines = new List<OrderLine>(); // avoid lazy load...
Dreaming in Code: Multi Linq
I was asked how we will approach the same Multi Query approach with Linq integration, here are some thoughts about it.
var posts = from post in data.Posts
            where post.User.Name == "Ayende"
            orderby post.PublishedDate desc;
var postsCount = posts.Count();
posts.Skip(10).Take(15);
new LinqQueryBatch()
.Add(posts)
.Add(postsCount)
.Execute();//perform the query
foreach(Post p in posts)//no query
{
Console.WriteLine(p.ToString());
}
//no query
Console.WriteLine("Overall posts by Ayende: {0}", postsCount.Single() );
The LinqQueryBatch in this case doesn't need to pass delegates to process the results, it can modify the Linq Query directly, so trying to find the result will find the one that was already loaded...
Query Building In The Domain / Service Layers
Here is an interesting topic. My ideal data access pattern means that there is a single query per controller per request (a request may involve several controllers, though). That is for reading data, obviously; for writing, I will batch the calls if needed. I am making heavy use of the Multi Criteria/Query in order to make this happen. I have run into a snag with this approach, however. The problem is that some of the services also do data access*. So I may call to the authorization service to supply me with the current user and its customer, and the service...
Exceptions Usability
I just made a small change to the EnsureMaxNumberOfQueriesPerRequestModule: when it detects that the number of queries performed goes beyond the specified value, it now also includes the queries that it detected in the exception message. Very minor change, but the effect is that I can just scroll the page and say: "Oh, I have a SELECT N+1 here", directly off the exception page. On a side note, I am getting better at optimizing NHibernate-based applications, and I strongly suggest anyone using NHibernate to look at Multi Query in 1.2 (and Multi Criteria on the trunk) for those kind...
Efficiently loading deep object graphs
Here is an interesting approach to get deep object graphs effectively. This will ensure that you will get all the relevant collections without having to lazy load them and without a huge cartesian product. Especially useful if you want to load a collection of items with the associated deep object graph.
public Policy GetPolicyEagerly(int policyId)
{
IList list = ActiveRecordUnitOfWorkFactory.CurrentSession.CreateMultiQuery()
.Add(@"from Policy policy left join fetch policy.PolicyLeadAssociations where policy.Id = :policyId")
.Add(@"from...
Shocking Rob
I am posting this mainly because I want to see how far I can shock Rob Conery The exception is raised by the EnsureMaxNumberOfQueriesPerRequestModule, and it is currently set on the development level, for QA/Staging, I would probably reduce it further, although I have some pages where I Oh, and to Rob, that was a classic error of doing query per node (instead of doing a single query) (added an eager load instead of a query and was done). I am doing some performance tuning right now, and all in all, it is very boring. Find a...
Correct, then performant
Rob Conery just put up an interesting post (more on that later) where he talks about my example of optimizing a page from 32 queries to 4. His conclusion to that was: I think that's a great sample of a page that was created by a developer who clearly doesn't know their way around a tool No, it isn't. That developer was me, and that was an explicit decision made when we started working. Up to some limit ( I think it is 50 queries per page ), you can do whatever you want for development. I don't want to...
Optimization Story
Today I had to deal with optimizing a page. Under certain circumstances, it would take ~30 seconds to run. On my machine, of course, it would load nearly instantly. The first thing that I did was put a DB profiler and watch the traffic going to the DB server. In my experience, that should always be the first place to look. No, not because I am using an OR/M, but because that is usually the most common place for remote calls, and that usually has an order of magnitude higher cost than anything else that I can do in the...
Developing for Scalable Applications
Evan Hoff has an eye-opening post about the results of putting simple business logic in the database. Go read it, now. Now, after you have read that, let me continue with Rob's reply to my previous post: It's in the extreme, when you throw more and more code at a complex issue rather than use a simpler approach (like a view or SP) that the ORM model breaks. I'm sure you've run into this - you must have. I disagree in part, since it all depends on what you define as extreme. Using OR/M opens new levels of extremeness with...
Performance and Explicit Domain Models
Udi talks about limitations for DDD as a result of performance constraints. He says: Ayende and I had an email conversation that started with me asking what would happen if I added an Order to a Customer’s “Orders” collection, when that collection was lazy loaded. My question was whether the addition of an element would result in NHibernate hitting the database to fill that collection. His answer was a simple “yes”. In the case where a customer can have many (millions) of Orders, that’s just not a feasible solution. He then goes on to describe several solutions for the...
Adaptive Fetching Strategies in ORMs
Aaron has a post about some wild ideas about OR/M fetching, specifically, auto-learning fetching strategies. The idea has merit, and the required technicalities are already inside NHibernate. I can imagine a proxy that talks to the fetching strategy to inform it about accessed properties, which would give it the information needed for the next time the same query is made. But, how do you correlate queries? I have "Repository<Salary>.FindAll(Where.Salary.Employee == CurrentEmployee);" in several places, in the "list employee salaries page" and in the "calculate tax" service. Each requires a different fetching strategy. Worse, I have this hiding in a method call...
ORM and when query plans go bad
D. Mark Lindell has a few questions about scaling ORM: How can dynamic SQL ORMs deal with the fact that your database server (a.k.a SQL Server) can decide at any point that it is going to use an alternate query plan. A simple index HINT on the join syntax can fix this problem but how is my ORM going to handle this? ...
Convenient & Easy & Slow vs Convenient & Hard & Fast
Here is an interesting quote from the guys that build Twitter: Once you hit a certain threshold of traffic, either you need to strip out all the costly neat stuff that Rails does for you... Disclaimer: I am completely ignorant about Ruby and Rails, so maybe I should just shut up. One thing that I have noticed is that all that "neat" stuff can be very costly in its default implementation, but it doesn't have to be. Let us take a look at dasBlog's macros for instance, shall we?...
Profiling surprises
You can bet that I was a bit surprised when I saw this in the list of heavy methods in dotTrace: Then I took a look at the number of calls... Did I mention that I don't like temporal data? BTW, dotTrace rocks!
String Performance
NHibernate just got a patch that replaces this line: partString.ToLower(CultureInfo.InvariantCulture).IndexOf(text); With this: partString.IndexOf(text, StringComparison.InvariantCultureIgnoreCase); This is done to increase performance in a slow part of the code. Looking at this, can you spot the reason for the performance increase between the two? ToLower() will create a new string (which requires memory allocation), use it for a single call and discard it (which means the GC now has to collect it). The second approach doesn't create temporary objects and does not require memory allocation for a potentially large object.
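To make the difference concrete, here is a minimal sketch that puts the two calls side by side. The class and method names are made up for illustration, and the sample strings are assumptions, not taken from NHibernate.

```csharp
using System;
using System.Globalization;

public static class IndexOfSketch
{
    // Original approach: allocates a lowered copy of partString on every call.
    public static int IndexOfViaToLower(string partString, string text)
    {
        return partString.ToLower(CultureInfo.InvariantCulture).IndexOf(text);
    }

    // Patched approach: case-insensitive search with no temporary string.
    public static int IndexOfIgnoreCase(string partString, string text)
    {
        return partString.IndexOf(text, StringComparison.InvariantCultureIgnoreCase);
    }

    public static void Main()
    {
        const string partString = "SELECT Foo FROM Bar";
        // The ToLower variant only finds a match if text is already lower case.
        const string text = "from";
        Console.WriteLine(IndexOfViaToLower(partString, text));
        Console.WriteLine(IndexOfIgnoreCase(partString, text));
    }
}
```

For this input both calls find the match at the same index, so the patch changes the allocation behavior without changing the result. Note that the two are not strictly equivalent in general: the second version also matches when text itself contains upper case characters.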
Perf Problem Survey
Rico Mariani has a survey about the cause of performance problems. Most of the perf problems I have run into are usually something like this:
Agency agency = Repository<Agency>.Load(15321);
Employee[] employees = agency.GetAllEmployees( Where.Employee.IsActive == true );
foreach(Employee emp in employees)
{
    if(emp.ShouldGetBonusForPeriod(DateUtil.GetPreviousMonthRange(DateTime.Today)))
    {
        Console.WriteLine("Employee {0} should get a bonus.", emp.Name);
    }
}
Where Employee.ShouldGetBonusForPeriod() is calling the database.
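A rough sketch of why this hurts, counting simulated round trips instead of hitting a real database. Everything here is hypothetical: the in-memory Sales table, the 1000 threshold and the EmployeesWithBonus method are stand-ins for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectNPlusOneSketch
{
    // Counts simulated database round trips.
    public static int QueryCount;

    // Assumption: in-memory stand-in for a table of employee id -> sales for the period.
    private static readonly Dictionary<int, decimal> Sales = new Dictionary<int, decimal>
    {
        { 1, 500m }, { 2, 1500m }, { 3, 2500m }
    };

    // One simulated round trip per employee, like ShouldGetBonusForPeriod in the loop above.
    public static bool ShouldGetBonus(int employeeId)
    {
        QueryCount++;
        return Sales[employeeId] > 1000m;
    }

    // One simulated round trip that answers the question for the whole set.
    public static HashSet<int> EmployeesWithBonus(IEnumerable<int> employeeIds)
    {
        QueryCount++;
        return new HashSet<int>(employeeIds.Where(id => Sales[id] > 1000m));
    }

    public static void Main()
    {
        int[] ids = { 1, 2, 3 };

        QueryCount = 0;
        foreach (int id in ids)
            ShouldGetBonus(id);
        Console.WriteLine("Per-employee calls: {0} queries", QueryCount);

        QueryCount = 0;
        HashSet<int> bonus = EmployeesWithBonus(ids);
        Console.WriteLine("Batched call: {0} query, {1} employees get a bonus", QueryCount, bonus.Count);
    }
}
```

With three employees the loop costs three round trips and the batched version costs one; with a few thousand employees the gap grows linearly.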
Blog temporarily down: dasBlog Limitations
Well, it looks like I have run into some limits with dasBlog. I have over 2,000 posts, and close to 1,200 files in my content directory. It gets to the point that just opening the directory has a noticeable delay. Trying to run dasBlog currently takes roughly 100% of the CPU, and takes forever. This is on my machine, by the way, so it is single user only. Trying that on the server apparently killed it. Profiling it shows that the fault is probably...
DB Queries per page hit?
I am listening to this, and Jim Starkey mentioned something that made me perk up my ears. (It is at 14:00) There are many more queries per human interaction, when you are running a dynamic web application - for anything that is not a trivial web application - you are looking at in excess of two dozens queries per page. This seems to me like a large number of queries per page, but that is a gut feeling from someone that still wonders about the performance hit when he writes i+=1...
OR/M and Performance
I spent most of the day going over an NHibernate application that was slow. The cause was most probably a large number of queries to the database. I took my own advice and enabled performance logging for the application and let it run for a few days. When I read the logs today it looked like there were a number of pages where the number of queries was outside of any reasonable proportion. When I started reading the code for these pages, I realized that the problem wasn't with mis-use of NHibernate, or rather, it...
Performance: Make The Developers Feel The Pain
Do you care about performance? Do you want to avoid those last three weeks of intensive performance work that require you to break apart a beautiful domain model and cut wide swathes of optimization horrors into your code? The short answer to this is that developers must feel the pain, and they must feel it in very short intervals. [Please, do not use this sentence out of context]. How to do this? Set up the system so it would fail if a given performance objective is passed. A simple example can be to add a...
From Slashdot, On Performance
I knew there was a reason I read Slashdot (besides waiting for a compile cycle in VS to finish, or the cursor to move). A comment about the speed of Perl vs. C++ and Assembler. Remember kids, if your process is IO-bound, you want the fastest possible code ever to make sleeping on those system calls as efficient as possible!
The cost of eager load...
These are the profiler results from trying to understand why I got very slow
responses from the server...
Notice what is costing so much time here. The first issue was that
the system was generating huge amounts of queries, and the quick & dirty
solution was to throw caching on top of this, which made a difference of three to
five orders of magnitude. The image above is after applying the caching, by the way, and there is no database activity in this call graph.
But performance continued to suffer, and I wasn't quite sure why. A quick look
at the profiler and...
NHibernate performance concerns
Darrel investigated NHibernate and came back with a Post Traumatic Stress Disorder. The issue he had was with NHibernate's Automatic Dirty Checking. This feature is implemented by keeping the state of the object when it was loaded from the database inside the session, and comparing the initial data to the current state of the object when flushing. The problem, in Darrel's words is: From my perspective there are two major drawbacks to this approach. First, considering the data you are manipulating...
Working with deep object graphs and NHibernate
Note, this image was generated using Active Writer and represents a domain model similar to a project I am currently working on. I spared you the proliferation of DateTime in the model and merely put it in three or four places just to show what it was in general. (continued below) Now, this model is not showing irrelevant additional entities (and entity attributes) but it is complete enough that you would understand the general theme of things. The requirement is finding all potential employees in the...
The little query that could... drive me crazy
I have a piece of code that has to calculate some pretty hefty stuff over a large amount of data. Unfortunately, that large amount of data took a large amount of time to load. By large amount I mean, I walked away and had time for a coffee, chit chat, about three phone calls and a relaxing bout of head banging, and it still continued to pry into the database, and likely would continue to do so until the end of time or thereabouts. This calculation has two main characteristics: ...
Batching support in NHibernate
As of about 90 minutes ago, NHibernate has batching support. :-D All the tests are green, but there may be things that broke in exciting ways, so I encourage you to try it out and see if you can break it. This functionality exists only for SQL Server, and only on .Net 2.0 (for complaints, go directly to the ADO.Net team). You can enable this functionality by adding this to your hibernate configuration. < add...
Ruby and the critical performance path
Joel wrote about Ruby's performance, and DHH replied with a post showing how he outsourced the performance-intensive functions. To note, my only experience in Ruby is writing very few Watir tests. So I can't really say anything about Ruby's performance first hand. I agree with DHH that this is a good thing, but I wonder about how to handle this in situations where the performance critical part is something that is core to the business logic. I'm not talking about general stuff like image resizing, encryption or bayesian filtering (which I think you are crazy if you...
There Be Dragons: Rhino.Commons.SqlCommandSet
After last night's post about the performance benefits of SqlCommandSet, I decided to give the ADO.Net team some headache, and release the results in a reusable form. The relevant code can be found here, as part of Rhino Commons. Besides exposing the batching functionality, it is a very elegant (if I say so myself) way of exposing functionality that the original author decided to mark private / internal. I really liked the declaration of this as well: [ ...
Opening Up Query Batching
I have ranted before about the annoying trend from Microsoft, to weld the hood shut in most of the interesting places. One particularly painful piece is the command batching implementation in .Net 2.0 for SQL Server. This is extremely annoying mainly because the implementation benefits are going to those who are going to be using DataSets (ahem, not me), but are not available to anyone outside of Microsoft. (See topic: OR/M, NHibernate, etc). Today, I have decided to actually check what the performance differences are all about. In order to do this,...
Measuring NHibernate's Queries Per Page
One of the biggest problems with abstractions is that they may allow you to do stupid things without them being obvious. In OR/M-land, that usually means SELECT N+1 issues. The problem is that you often develop a certain functionality first, and only then realize that while you tested, all was fine and dandy on the five items that you had, but on the real system, you have 5,000, and the DBA is on its way to ER... Anyway, I am currently working with Web Applications, and I wanted to get a good indication about what pages are troublesome. Being who I am, I...
Performance Logging
I just added a small http module to Rhino Commons. It is a very simple module that times how long it takes to process a page. It only times the server-side processing, of course, but it is a great way to tell you where you need to pay attention. It is using log4net to log the data, so you can redirect the output to a database, and from there, you can get all the data you want. Configuring the module is very simple. Create the following table:
CREATE TABLE [dbo].[PagePerformance](
    [Id] [int] PRIMARY KEY IDENTITY(1,1) NOT NULL,
    [Date] [datetime] NOT NULL,
    [Message] [nvarchar](max) NOT NULL,
    [PageURL] [nvarchar](max) NOT NULL,
    ...