Ayende @ Rahien

Transactional queuing system perf test

After running into severe performance issues when using MSMQ and transactions, I decided to run a more thorough test case.

Writing 10,000 messages to MSMQ Transactional Queue, separate transactions:

private static void AddData(MessageQueue q1, byte[] bytes)
{
    Console.WriteLine("{0:#,#}", bytes.Length);
    var sp = Stopwatch.StartNew();
    for (int i = 0; i < 10000; i++)
    {
        using (var msmqTx = new MessageQueueTransaction())
        {
            msmqTx.Begin();

            q1.Send(new Message
            {
                BodyStream = new MemoryStream(bytes),
                Label = i.ToString()
            }, msmqTx);

            msmqTx.Commit();
        }

        if (i % 10 == 0) Console.WriteLine(i);
    }
    q1.Dispose();

    Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
}

This took 6.08 seconds, or about 1,600 messages per second.
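As a sanity check on the arithmetic, the rate is simply the message count divided by the elapsed time. Here is a minimal sketch; the `Throughput` class and its `PerSecond` helper are names made up for illustration and are not part of the test program.

```csharp
using System;

public static class Throughput
{
    // Hypothetical helper: items completed per second of wall-clock time.
    public static double PerSecond(int count, TimeSpan elapsed)
    {
        if (elapsed <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException("elapsed", "Elapsed time must be positive.");
        return count / elapsed.TotalSeconds;
    }
}

public static class ThroughputDemo
{
    public static void Main()
    {
        // 10,000 messages in 6.08 seconds, the figure measured above.
        double rate = Throughput.PerSecond(10000, TimeSpan.FromSeconds(6.08));
        Console.WriteLine("{0:#,#} messages per second", rate);
    }
}
```

10,000 divided by 6.08 works out to roughly 1,645, which is where the "about 1,600" comes from.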

Writing 10,000 messages to MSMQ Transactional Queue, single transaction:

private static void AddData(MessageQueue q1, byte[] bytes)
{
    Console.WriteLine("{0:#,#}", bytes.Length);
    var sp = Stopwatch.StartNew();
    using (var msmqTx = new MessageQueueTransaction())
    {
        msmqTx.Begin();
        for (int i = 0; i < 10000; i++)
        {
            q1.Send(new Message
            {
                BodyStream = new MemoryStream(bytes),
                Label = i.ToString()
            }, msmqTx);

            if (i % 10 == 0) Console.WriteLine(i);
        }
        msmqTx.Commit();
    }
    q1.Dispose();

    Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
}

This took 0.825 seconds for 10,000 messages, so it should be able to process about 12,000 messages per second. But this is not a realistic scenario.

Now, to my test scenario, which is touching two queues for 10,000 messages…

Wait! I accidentally ran the wrong code, and that led me down the following path:

private static void JustTx()
{
    var sp = Stopwatch.StartNew();
    int msgs = 0;
    for (int i = 0; i < 1000; i++)
    {
        using (var msmqTx = new MessageQueueTransaction())
        {
            msmqTx.Begin();
            msgs = msgs + 1;
            if (msgs % 10 == 0)
                Console.WriteLine(msgs);
            
            msmqTx.Commit();
        }
    }
    Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
}

This code just begins and commits MSMQ transactions; no actual operation is done on the queue, so there shouldn't be any work for the system. We open and close 1,000 transactions. That takes 17 seconds.

Remember, this is just opening and closing a local MSMQ transaction, with no work done, and that gives me about 60 transactions per second. I think that I found my culprit. I have other independent verification of this, and I find it extremely sad.

I am still waiting to hear from MSMQ experts about what is going on.

In the meantime, I tested this with my own queuing project, Rhino Queues. The first thing that I tried was the simplest, unrealistic scenario of sending 10,000 messages in a single transaction:

private static void AddData(IQueueManager manager, byte[] bytes)
{
    var sp = Stopwatch.StartNew();
    using (var tx = new TransactionScope())
    {
        for (int i = 0; i < 10000; i++)
        {
            manager.Send(new Uri("rhino.queues://localhost:2222/test1"), new MessagePayload
                {
                    Data = bytes
                });


            if (i % 10 == 0)
                Console.WriteLine(i);
        }
        tx.Complete();
    }

    Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
}

This code takes 12.2 seconds to run, giving about 800 messages per second. Not bad, but not really good either. In this scenario MSMQ finished everything in less than a second.

Let us see a more realistic scenario, of sending 10,000 messages in 10,000 separate transactions:

private static void AddData(IQueueManager manager, byte[] bytes)
{
    var sp = Stopwatch.StartNew();

    for (int i = 0; i < 10000; i++)
    {
        using (var tx = new TransactionScope())
        {
            manager.Send(new Uri("rhino.queues://localhost:2222/test1"), new MessagePayload
            {
                Data = bytes
            });


            if (i % 10 == 0)
                Console.WriteLine(i);
            tx.Complete();
        }
    }

    Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
}

This completes in 3.7 minutes, for about 45 messages per second. Slightly worse than what MSMQ can do, which isn’t good.

However, the good part about using Rhino Queues is that I know what is going on there and I intentionally left some points of optimization out (get it working, get it working right, get it working fast). After exploring some of those optimizations, the same code base ran in 2.7 minutes, so we saved 60 seconds on the runtime, bringing us to 60 messages per second.

Rhino Queues is now comparable to MSMQ performance in this scenario. I find this spooky, to tell you the truth. Profiling Rhino Queues tells me that a large share of the time (over 40%!) is spent not in Rhino Queues, but inside System.Transactions.Transaction.Dispose().

I wonder how I can reduce that load.

The next thing I tried was implementing ISinglePhaseNotification. This means that if there are no other durable enlistments for the DTC, Rhino Queues will be able to take advantage of the lightweight transactions support in System.Transactions.
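To make the mechanism concrete, here is a minimal sketch of a resource manager that implements ISinglePhaseNotification and enlists durably in the ambient transaction. It is not the Rhino Queues implementation: the `SketchResourceManager` class, its GUID, and the `UsedSinglePhase` flag are all assumptions made up for illustration, and the commit handlers do no real work.

```csharp
using System;
using System.Transactions;

// Hypothetical resource manager; not the Rhino Queues implementation.
public sealed class SketchResourceManager : ISinglePhaseNotification
{
    // A durable resource manager needs a stable identifier; this one is arbitrary.
    private static readonly Guid ResourceManagerId =
        new Guid("5f0c6e0a-3b0e-4f57-9a55-6f3a1d2c7b11");

    // Records whether the transaction manager took the single-phase path.
    public bool UsedSinglePhase { get; private set; }

    public void Enlist()
    {
        // Requires an ambient transaction, for example inside a TransactionScope.
        Transaction.Current.EnlistDurable(ResourceManagerId, this, EnlistmentOptions.None);
    }

    // Offered instead of Prepare/Commit when this is the only durable enlistment,
    // so the two-phase protocol can be skipped.
    public void SinglePhaseCommit(SinglePhaseEnlistment enlistment)
    {
        UsedSinglePhase = true;
        // A real queue would make its pending writes durable here.
        enlistment.Committed();
    }

    // The two-phase path, used when other durable resource managers take part.
    public void Prepare(PreparingEnlistment enlistment) { enlistment.Prepared(); }
    public void Commit(Enlistment enlistment) { enlistment.Done(); }
    public void Rollback(Enlistment enlistment) { enlistment.Done(); }
    public void InDoubt(Enlistment enlistment) { enlistment.Done(); }
}

public static class SinglePhaseDemo
{
    public static void Main()
    {
        var rm = new SketchResourceManager();
        using (var tx = new TransactionScope())
        {
            rm.Enlist();
            tx.Complete();
        }
        Console.WriteLine("Single-phase commit used: {0}", rm.UsedSinglePhase);
    }
}
```

The flag only shows which path the transaction manager chose. Once a second durable resource joins the same transaction, the two-phase handlers are the ones that get called.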

That change had a dramatic effect. I can’t really grasp the perf difference!

The code (same code!) now executes in 16.7 seconds! That means 600 messages per second.

Of course, things aren’t as good when you compare the CopyData routine:

private static void CopyData(IQueueManager manager)
{

    Stopwatch sp = Stopwatch.StartNew();
    for (int i = 0; i < 10000; i++)
    {
        using (var tx = new TransactionScope())
        {
            var message = manager.Receive("test1");
            manager.Send(new Uri("rhino.queues://localhost/test2"), new MessagePayload
            {
                Data = message.Data,
                Headers = message.Headers
            });

            if (i % 10 == 0)
                Console.WriteLine(i);
            tx.Complete();
        }
    }

    Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
}

This guy takes 5.7 minutes to complete, at a rate of 30 messages per second. That is pretty lame. Similar code with MSMQ did 60 messages per second on my machine, and I considered that utterly unacceptable.
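The MSMQ code for this scenario is not shown in this post. As a rough idea of what a receive-and-forward loop looks like against System.Messaging, here is a minimal sketch. The queue paths, the five second receive timeout, and the `MsmqCopySketch` class are placeholders, and this is an assumption about the shape of such code, not the code that produced the 60 messages per second figure.

```csharp
using System;
using System.Diagnostics;
using System.Messaging;
using System.Transactions;

public static class MsmqCopySketch
{
    // Both queues are assumed to exist already and to be transactional.
    public static void CopyData(string sourcePath, string destinationPath, int count)
    {
        var sp = Stopwatch.StartNew();
        using (var source = new MessageQueue(sourcePath))
        using (var destination = new MessageQueue(destinationPath))
        {
            for (int i = 0; i < count; i++)
            {
                using (var tx = new TransactionScope())
                {
                    // Automatic enlists the operation in the ambient transaction.
                    using (Message message = source.Receive(
                        TimeSpan.FromSeconds(5), MessageQueueTransactionType.Automatic))
                    {
                        destination.Send(message, MessageQueueTransactionType.Automatic);
                    }
                    // If Receive times out it throws, and the scope rolls back.
                    tx.Complete();
                }
            }
        }
        Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
    }

    public static void Main()
    {
        // Hypothetical queue paths.
        CopyData(@".\private$\test1", @".\private$\test2", 10000);
    }
}
```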

Firing up dotTrace showed me that I was being STUPID (as in, not just stupid, or Stupid with a capital S, but full-fledged stupid in all caps) and didn't clean messages that had already been read out of the queue. After I fixed that, the same code ran in 1.5 minutes, or about 110 messages per second.

Nice.

And I still consider being almost twice as fast as MSMQ to be very spooky.

Comments

10/29/2009 10:32 AM by annon

Did you try this with any AMQP implementation, for example RabbitMQ (http://www.rabbitmq.com/)? We switched from MSMQ to RabbitMQ and couldn't be happier.

10/29/2009 11:14 AM by Rafal

This makes RSB load balancing feature quite useless, as the load balancer is unable to distribute messages faster than message handlers process them. Right?

10/29/2009 11:26 AM by Ayende Rahien

Rafal,

Yes & no.

It is a problem, but not an unsolvable one.

10/29/2009 11:36 AM by Rafal

I do hope so. I find it extremely strange that an empty MSMQ transaction takes longer to execute than a transaction actually doing something. Maybe you should include a call to 'dummyQueue.Send("whatever")' in each transaction?

10/29/2009 12:57 PM by Bryan

We don't use transactions when sending our messages. I know, crazy. We delay them until after database transaction commit, and then send them.

Sure, once in a blue moon a process crashes or the messaging system goes down and we lose messages. This happens so rarely for us, that it has yet to cause a problem, but it's obviously still a risk.

We then have cleaner processes that either rebroadcast messages and/or simply clean up any bad data. We usually need these processes regardless of our messaging subsystem anyway.

So far, this has worked out pretty well for us and doesn't suffer from the performance problems of using a distributed transaction.

10/29/2009 09:41 PM by Steve Py

@ Ayende,

I'd hinted at this option in your earlier post; Sorry, normally I'd try this myself but my work environment here doesn't have MSMQ set up;

Have you tried this variation? (1 re-used transaction instance, individual begin/commits)

private static void JustTx()
{
    var sp = Stopwatch.StartNew();
    int msgs = 0;
    using (var msmqTx = new MessageQueueTransaction())
    {
        for (int i = 0; i < 1000; i++)
        {
            msmqTx.Begin();
            msgs = msgs + 1;
            if (msgs % 10 == 0)
                Console.WriteLine(msgs);

            msmqTx.Commit();
        }
    }
    Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
}

From what I've read in the doco it should work and sounds like it'd solve your Dispose issue.

10/29/2009 11:38 PM by Frank

What am I overlooking here? When I do some benchmarks on sending messages with MSMQ using distributed transactions, I do over 500 messages/second. This is the code.

class Program
{
    static void Main(string[] args)
    {
        const int messageSize = 1024;
        var messageData = new byte[messageSize];
        for (int i = 0; i < messageSize; ++i)
        {
            messageData[i] = (byte)(i & 0xFF);
        }

        var sp = Stopwatch.StartNew();
        for (int i = 0; i < 10000; i++)
        {
            using (var tx = new TransactionScope())
            {
                using (var queue = new MessageQueue(@".\private$\Benchmark", QueueAccessMode.Send))
                {
                    var message = new Message();
                    message.BodyStream = new MemoryStream(messageData);

                    queue.Send(message, MessageQueueTransactionType.Automatic);
                }

                if (i % 100 == 0)
                    Console.WriteLine(i);

                tx.Complete();
            }
        }
        sp.Stop();

        Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
    }
}

10/30/2009 12:55 AM by Lothan

I just tried that Steve and from what I am seeing it does not make much difference whether the transaction is instantiated and disposed outside the loop or inside the loop.

10/30/2009 02:03 AM by Steve Py

Ah, pity. :)

10/30/2009 07:52 AM by Michal

Not really related to forwarding perf, but in the "10,000 messages separate transactions" scenario you seem to compare 1600 msg/sec in MSMQ and 45(60) msg/sec in RQ and write that RQ's "slightly" worse.

Am I missing something or is there a typo?

10/30/2009 10:29 AM by Steve Py

Ok, I found something that might help explain the behaviour:

blogs.sun.com/.../javacapstipreadingmsmq

The issue is reading the message within the transaction. For example if you write your method to transfer 1000 messages from one transactional queue to the other, and utilize a Peek within the transaction and a Receive after the commit, your performance shoots up to respectable speeds...

private static void TransferTx()
{
    var sp = Stopwatch.StartNew();
    int msgs = 0;
    using ( var msmqFrom = new MessageQueue( @".\Private$\test" ) )
    {
        using ( var msmqTo = new MessageQueue( @".\Private$\test2" ) )
        {
            using ( var msmqTx = new MessageQueueTransaction() )
            {
                for ( int i = 0; i < 1000; i++ )
                {
                    msmqTx.Begin();
                    try
                    {
                        var message = msmqFrom.Peek( TimeSpan.Zero );
                        msmqTo.Send( new System.Messaging.Message( "Test" + i ), msmqTx );
                        msgs = msgs + 1;
                        if ( msgs % 100 == 0 )
                            Console.WriteLine( msgs );
                        msmqTx.Commit();
                        msmqFrom.Receive();
                    }
                    catch
                    {
                        msmqTx.Abort();
                    }
                }
            }
            msmqTo.Purge();
        }
    }
    Console.WriteLine( "Transfer Queues = {0:#,#}", sp.ElapsedMilliseconds/1000f );
}

You can see the trouble with the receive from 1 queue and write to another. If you "Receive" with a transaction type of "none", effectively performing it outside of the transaction the performance shoots up.

Based on that article your choices are a Peek which negates the transactional effect while preserving the behaviour. The message is received outside once the transaction is committed. The downside is of course this is only suitable for 1 listener on the receive queue. Of course if your listener is effectively working as a dispatcher moving messages between queues, this should be able to be modelled into a single listener role. The only other real alternative to minimize the impact of reading in a transaction is processing groups of messages within a single transaction.
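The last option, processing groups of messages within a single transaction, could look roughly like the sketch below. It is only an illustration of the batching idea, under the assumption that both queues are transactional and that rolling back a whole batch on failure is acceptable; the `BatchTransferSketch` class, the `TransferBatch` helper, and the batch size of 100 are made up for the example.

```csharp
using System;
using System.Messaging;

public static class BatchTransferSketch
{
    // Moves up to batchSize messages in one transaction; returns how many were moved.
    public static int TransferBatch(MessageQueue source, MessageQueue destination, int batchSize)
    {
        int moved = 0;
        using (var tx = new MessageQueueTransaction())
        {
            tx.Begin();
            try
            {
                while (moved < batchSize)
                {
                    Message message;
                    try
                    {
                        message = source.Receive(TimeSpan.Zero, tx);
                    }
                    catch (MessageQueueException e)
                    {
                        if (e.MessageQueueErrorCode == MessageQueueErrorCode.IOTimeout)
                            break; // source queue is empty, commit what we have so far
                        throw;
                    }
                    using (message)
                    {
                        destination.Send(message, tx);
                    }
                    moved++;
                }
                tx.Commit();
            }
            catch
            {
                // Roll back the whole batch; the messages return to the source queue.
                if (tx.Status == MessageQueueTransactionStatus.Pending)
                    tx.Abort();
                throw;
            }
        }
        return moved;
    }

    public static void Main()
    {
        using (var source = new MessageQueue(@".\Private$\test"))
        using (var destination = new MessageQueue(@".\Private$\test2"))
        {
            // Keep moving batches until the source queue is drained.
            while (TransferBatch(source, destination, 100) > 0) { }
        }
    }
}
```

The trade-off is that the transaction cost is paid once per batch instead of once per message, but a failure partway through puts the whole batch back on the source queue.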

10/30/2009 10:34 AM by MichaelGG

After using MSMQ a few times and then discovering SQL Server Service Broker, I cannot fathom ever having to use MSMQ again. SSSB is much easier to set up and monitor. And since it is exposed just as extensions to T-SQL, it's accessible. And if your queue processing code ends up working against the local database, you won't need a distributed transaction.

10/30/2009 10:39 AM by Steve Py

@Michal

No, this post is probably a bit confusing without referring back to his previous gripe about MSMQ transaction performance. Writing within a transaction is plenty fast. His last MSMQ example where he hasn't read or written was still bloody slow for some reason. I believe MSMQ is just behaving a bit shirty because no operation has been taken on a Queue when the transaction has been committed. His ultimate goal is determining why reading and writing within the same transaction is so slow.

P.s. The above example I posted had a 2nd method that populated the first private queue with 1000 messages as per Ayende's first example. I'm at home now for the weekend so I have the time and resources to play around with queues again. :)

10/30/2009 06:06 PM by Frank

Hmm, did some further testing, resulting in the following observations:

  • Sending one message per distributed transaction is fast.

  • Receiving one message per distributed transaction is fast.

  • Receiving one message and sending it to another queue per distributed transaction is very slow (over 10 times slower than the other scenarios).

  • All scenarios are actually using MSDTC, as can be seen from the increase in the number of distributed transactions that succeeded.

That receiving and sending is a creepy one. :(

11/06/2009 01:01 AM by h

I can recommend trying out RabbitMQ as well and having a look at erlang ;).
