Transactional queuing system perf test
After running into severe performance issues when using MSMQ and transactions, I decided to run a more thorough test case.
Writing 10,000 messages to MSMQ Transactional Queue, separate transactions:
private static void AddData(MessageQueue q1, byte[] bytes)
{
    Console.WriteLine("{0:#,#}", bytes.Length);
    var sp = Stopwatch.StartNew();
    for (int i = 0; i < 10000; i++)
    {
        using (var msmqTx = new MessageQueueTransaction())
        {
            msmqTx.Begin();
            q1.Send(new Message
            {
                BodyStream = new MemoryStream(bytes),
                Label = i.ToString()
            }, msmqTx);
            msmqTx.Commit();
        }
        if (i % 10 == 0)
            Console.WriteLine(i);
    }
    q1.Dispose();
    Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
}
This took 6.08 seconds, or about 1,600 messages per second.
Writing 10,000 messages to MSMQ Transactional Queue, single transaction:
private static void AddData(MessageQueue q1, byte[] bytes)
{
    Console.WriteLine("{0:#,#}", bytes.Length);
    var sp = Stopwatch.StartNew();
    using (var msmqTx = new MessageQueueTransaction())
    {
        msmqTx.Begin();
        for (int i = 0; i < 10000; i++)
        {
            q1.Send(new Message
            {
                BodyStream = new MemoryStream(bytes),
                Label = i.ToString()
            }, msmqTx);
            if (i % 10 == 0)
                Console.WriteLine(i);
        }
        msmqTx.Commit();
    }
    q1.Dispose();
    Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
}
This took 0.825 seconds for 10,000 messages, so it should be able to process about 12,000 messages per second. But this is not a realistic scenario.
Now, to my test scenario, which is touching two queues for 10,000 messages…
Wait! I accidentally ran the wrong code, and that led me down the following path:
private static void JustTx()
{
    var sp = Stopwatch.StartNew();
    int msgs = 0;
    for (int i = 0; i < 1000; i++)
    {
        using (var msmqTx = new MessageQueueTransaction())
        {
            msmqTx.Begin();
            msgs = msgs + 1;
            if (msgs % 10 == 0)
                Console.WriteLine(msgs);
            msmqTx.Commit();
        }
    }
    Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
}
This code is just beginning & committing MSMQ transactions; no actual operation is done on the queue, and there shouldn’t be any work for the system. We open & close 1,000 transactions. That takes 17 seconds.
Remember, this is just opening and closing a local MSMQ transaction, with no work done, and that gives me about 60 transactions per second. I think that I found my culprit. I have other independent verification of this, and I find it extremely sad.
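The rates quoted so far are simply the operation count divided by the elapsed seconds. A small helper makes the comparison between the three MSMQ runs explicit. This is only an illustrative sketch: the `Throughput` class and `PerSecond` method are hypothetical names, not part of the original test code, and the inputs are the timings reported above.

```csharp
using System;

// Hypothetical helper, not part of the original test code.
public static class Throughput
{
    // Operations per second for a run that took elapsedSeconds.
    public static double PerSecond(int operations, double elapsedSeconds)
    {
        if (elapsedSeconds <= 0)
            throw new ArgumentOutOfRangeException(nameof(elapsedSeconds));
        return operations / elapsedSeconds;
    }

    public static void Main()
    {
        // Timings taken from the three MSMQ runs described in the post.
        Console.WriteLine("Separate transactions: {0:F0}/sec", PerSecond(10000, 6.08));
        Console.WriteLine("Single transaction:    {0:F0}/sec", PerSecond(10000, 0.825));
        Console.WriteLine("Empty transactions:    {0:F0}/sec", PerSecond(1000, 17.0));
    }
}
```

The last figure is the interesting one: a transaction that does nothing is more than an order of magnitude slower than one that sends a message.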
I am still waiting to hear from MSMQ experts about what is going on.
In the meantime, I tested this with my own queuing project, Rhino Queues. The first thing that I tried was the simplest, unrealistic scenario of sending 10,000 messages in a single transaction:
private static void AddData(IQueueManager manager, byte[] bytes)
{
    var sp = Stopwatch.StartNew();
    using (var tx = new TransactionScope())
    {
        for (int i = 0; i < 10000; i++)
        {
            manager.Send(new Uri("rhino.queues://localhost:2222/test1"), new MessagePayload
            {
                Data = bytes
            });
            if (i % 10 == 0)
                Console.WriteLine(i);
        }
        tx.Complete();
    }
    Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
}
This code takes 12.2 seconds to run, giving about 800 messages per second. Not bad, but not really good either. In this scenario MSMQ finished everything in less than a second.
Let us see a more realistic scenario, of sending 10,000 messages in 10,000 separate transactions:
private static void AddData(IQueueManager manager, byte[] bytes)
{
    var sp = Stopwatch.StartNew();
    for (int i = 0; i < 10000; i++)
    {
        using (var tx = new TransactionScope())
        {
            manager.Send(new Uri("rhino.queues://localhost:2222/test1"), new MessagePayload
            {
                Data = bytes
            });
            if (i % 10 == 0)
                Console.WriteLine(i);
            tx.Complete();
        }
    }
    Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
}
This completes in 3.7 minutes, for about 45 messages per second. Slightly worse than what MSMQ can do, which isn’t good.
However, the good part about using Rhino Queues is that I know what is going on there and I intentionally left some points of optimization out (get it working, get it working right, get it working fast). After exploring some of those optimizations, the same code base ran in 2.7 minutes, so we saved 60 seconds of runtime, bringing us to about 60 messages per second.
Rhino Queues is now comparable to MSMQ performance in this scenario. I find this spooky, to tell you the truth. Profiling Rhino Queues tells me that the biggest chunk of the time (over 40%!) is spent not in Rhino Queues, but inside System.Transactions.Transaction.Dispose().
I wonder how I can reduce that load.
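One way to separate the cost of System.Transactions itself from the cost of whatever is enlisted in the transaction is to time scopes that do no work at all, much like the empty MSMQ loop earlier. The following is a minimal sketch under that assumption; `EmptyScopeTimer` is a hypothetical name, and nothing here touches Rhino Queues. With no enlistments the transaction stays lightweight, so this gives a baseline rather than a reproduction of the profile above.

```csharp
using System;
using System.Diagnostics;
using System.Transactions;

// Hypothetical baseline measurement, not part of the original test code.
public static class EmptyScopeTimer
{
    // Creates, completes and disposes `iterations` scopes with nothing enlisted.
    public static TimeSpan Measure(int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            using (var tx = new TransactionScope())
            {
                tx.Complete();
            }
        }
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        const int iterations = 10000;
        TimeSpan elapsed = Measure(iterations);
        Console.WriteLine("{0} empty scopes in {1:F0} ms", iterations, elapsed.TotalMilliseconds);
    }
}
```

Comparing this number against the same loop with a queue operation inside shows how much of the per-transaction time comes from the enlisted resource and its commit path.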
The next thing I tried was implementing ISinglePhaseNotification. This means that if there are no other durable enlistments for the DTC, Rhino Queues will be able to take advantage of the lightweight transactions support in System.Transactions.
That change had a dramatic effect. I can’t really grasp the perf difference!
The code (same code!) now executes in 16.7 seconds! That means 600 messages per second.
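For readers who have not written a resource manager, the shape of such an enlistment looks roughly like the sketch below. This is not the Rhino Queues implementation: `SketchResourceManager`, `SinglePhaseDemo` and the GUID are hypothetical, and the "durable work" is reduced to setting a flag. Only the System.Transactions types and calls (`ISinglePhaseNotification`, `Transaction.EnlistDurable`, `SinglePhaseEnlistment.Committed`) are real.

```csharp
using System;
using System.Transactions;

// Hypothetical resource manager; a real one would flush its pending work durably.
public sealed class SketchResourceManager : ISinglePhaseNotification
{
    public bool CommittedInSinglePhase { get; private set; }

    // Called instead of Prepare/Commit when single-phase commit is possible.
    public void SinglePhaseCommit(SinglePhaseEnlistment singlePhaseEnlistment)
    {
        CommittedInSinglePhase = true;
        singlePhaseEnlistment.Committed();
    }

    // The methods below are only reached if the transaction uses two-phase commit.
    public void Prepare(PreparingEnlistment preparingEnlistment) { preparingEnlistment.Prepared(); }
    public void Commit(Enlistment enlistment) { enlistment.Done(); }
    public void Rollback(Enlistment enlistment) { enlistment.Done(); }
    public void InDoubt(Enlistment enlistment) { enlistment.Done(); }
}

public static class SinglePhaseDemo
{
    // Hypothetical identifier; a real resource manager keeps a stable GUID for recovery.
    private static readonly Guid ResourceManagerId =
        new Guid("0f8fad5b-d9cb-469f-a165-70867728950e");

    // Enlists one durable, single-phase-capable resource and commits the scope.
    public static bool Run()
    {
        var rm = new SketchResourceManager();
        using (var tx = new TransactionScope())
        {
            Transaction.Current.EnlistDurable(ResourceManagerId, rm, EnlistmentOptions.None);
            tx.Complete();
        }
        return rm.CommittedInSinglePhase;
    }

    public static void Main()
    {
        Console.WriteLine("Single-phase commit used: {0}", Run());
    }
}
```

As long as this is the only durable enlistment, the transaction does not need to be escalated to the DTC, which is where the savings come from.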
Of course, things aren’t as good when you compare the CopyData routine:
private static void CopyData(IQueueManager manager)
{
    Stopwatch sp = Stopwatch.StartNew();
    for (int i = 0; i < 10000; i++)
    {
        using (var tx = new TransactionScope())
        {
            var message = manager.Receive("test1");
            manager.Send(new Uri("rhino.queues://localhost/test2"), new MessagePayload
            {
                Data = message.Data,
                Headers = message.Headers
            });
            if (i % 10 == 0)
                Console.WriteLine(i);
            tx.Complete();
        }
    }
    Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
}
This guy takes 5.7 minutes to complete, at a rate of 30 messages per second. That is pretty lame. Similar code with MSMQ did 60 messages per second on my machine, and I considered that utterly unacceptable.
Firing up dotTrace showed me that I was being STUPID (as in, not just stupid, or Stupid with a capital S, but full-fledged stupid in all caps) and didn’t clean already-read messages out of the queue. After I fixed that, the same code ran in 1.5 minutes, or about 110 messages per second.
Nice.
And I still consider being almost twice as fast as MSMQ to be very spooky.
Comments
Did you try this with any AMQP implementation, for example RabbitMQ (http://www.rabbitmq.com/)? We switched from MSMQ to RabbitMQ and couldn't be happier.
This makes RSB load balancing feature quite useless, as the load balancer is unable to distribute messages faster than message handlers process them. Right?
Rafal,
Yes & no.
It is a problem, but not an unsolvable one.
I do hope so. I find it extremely strange that an empty MSMQ transaction takes longer to execute than a transaction actually doing something. Maybe you should include a call to 'dummyQueue.Send("whatever")' in each transaction?
We don't use transactions when sending our messages. I know, crazy. We delay them until after database transaction commit, and then send them.
Sure, once in a blue moon a process crashes or the messaging system goes down and we lose messages. This happens so rarely for us, that it has yet to cause a problem, but it's obviously still a risk.
We then have cleaner processes that either rebroadcast messages and/or simply clean up any bad data. We usually need these processes regardless of our messaging subsystem anyway.
So far, this has worked out pretty well for us and doesn't suffer from the performance problems of using a distributed transaction.
@ Ayende,
I'd hinted at this option in your earlier post. Sorry, normally I'd try this myself, but my work environment here doesn't have MSMQ set up.
Have you tried this variation? (1 re-used transaction instance, individual begin/commits)
private static void JustTx()
{
}
From what I've read in the doco it should work and sounds like it'd solve your Dispose issue.
What am I overlooking here? When I do some benchmarks on sending messages with MSMQ using distributed transactions, I do over 500 messages/second. This is the code.
class Program
{
}
I just tried that, Steve, and from what I am seeing it does not make much difference whether the transaction is instantiated and disposed outside the loop or inside the loop.
Ah, pity. :)
Not really related to forwarding perf, but in the "10,000 messages separate transactions" scenario you seem to compare 1600 msg/sec in MSMQ and 45(60) msg/sec in RQ and write that RQ's "slightly" worse.
Am I missing something or is there a typo?
Ok, I found something that might help explain the behaviour:
blogs.sun.com/.../java_caps_tip_reading_msmq
The issue is reading the message within the transaction. For example if you write your method to transfer 1000 messages from one transactional queue to the other, and utilize a Peek within the transaction and a Receive after the commit, your performance shoots up to respectable speeds...
You can see the trouble with the receive from 1 queue and write to another. If you "Receive" with a transaction type of "none", effectively performing it outside of the transaction the performance shoots up.
Based on that article, your main option is a Peek inside the transaction, which negates the transactional effect while preserving the behaviour; the message is then received outside the transaction once it has committed. The downside is of course that this is only suitable for one listener on the receive queue. Of course, if your listener is effectively working as a dispatcher moving messages between queues, this should be able to be modelled as a single listener role. The only other real alternative to minimize the impact of reading in a transaction is processing groups of messages within a single transaction.
After using MSMQ a few times and then discovering SQL Server Service Broker, I cannot fathom ever having to use MSMQ again. SSSB is much easier to set up and monitor. And since it is exposed simply as extensions to T-SQL, it's accessible. And if your queue processing code ends up working against the local database, you won't need a distributed transaction.
@Michal
No, this post is probably a bit confusing without referring back to his previous gripe about MSMQ transaction performance. Writing within a transaction is plenty fast. His last MSMQ example where he hasn't read or written was still bloody slow for some reason. I believe MSMQ is just behaving a bit shirty because no operation has been taken on a Queue when the transaction has been committed. His ultimate goal is determining why reading and writing within the same transaction is so slow.
P.s. The above example I posted had a 2nd method that populated the first private queue with 1000 messages as per Ayende's first example. I'm at home now for the weekend so I have the time and resources to play around with queues again. :)
Hmm, did some further testing, resulting in the following observations:
Sending one message per distributed transaction is fast.
Receiving one message per distributed transaction is fast.
Receiving one message and sending it to another queue per distributed transaction is very slow (over 10 times slower than the other scenarios).
All scenarios are actually using MSDTC, as can be seen from the increase in the number of distributed transactions that succeeded.
That receiving and sending is a creepy one. :(
I can recommend trying out RabbitMQ as well and having a look at erlang ;).