Thursday, June 04, 2009

WCF works in mysterious ways

Here is the result of about two hours of trying to figure out what WCF is doing:

    using System;
    using System.Linq;
    using System.ServiceModel;
    using System.ServiceModel.Channels;

    class Program
    {
        private static readonly Binding binding = new NetTcpBinding
        {
            OpenTimeout = TimeSpan.FromMilliseconds(500),
            CloseTimeout = TimeSpan.FromMilliseconds(250),
            ReaderQuotas =
                {
                    MaxArrayLength = Int32.MaxValue,
                    MaxBytesPerRead = Int32.MaxValue,
                    MaxNameTableCharCount = Int32.MaxValue,
                    MaxDepth = Int32.MaxValue,
                    MaxStringContentLength = Int32.MaxValue,
                },
            MaxReceivedMessageSize = Int32.MaxValue,
        };
        static void Main()
        {
            try
            {
                var uri = new Uri("net.tcp://" + Environment.MachineName + ":2200/master");
                var serviceHost = new ServiceHost(new DistributedHashTableMaster(new NodeEndpoint
                {
                    Async = uri.ToString(),
                    Sync = uri.ToString()
                }));
                serviceHost.AddServiceEndpoint(typeof(IDistributedHashTableMaster),
                                               binding,
                                               uri);

                serviceHost.Open();

                var channel =
                    new ChannelFactory<IDistributedHashTableMaster>(binding, new EndpointAddress(uri))
                        .CreateChannel();
                channel.Join();
            }
            catch (Exception e)
            {
                Console.WriteLine(e);
            }

        }
    }

    [ServiceBehavior(
        InstanceContextMode = InstanceContextMode.Single,
        ConcurrencyMode = ConcurrencyMode.Single,
        MaxItemsInObjectGraph = Int32.MaxValue
        )]
    public class DistributedHashTableMaster : IDistributedHashTableMaster
    {
        private readonly Segment[] segments;

        public DistributedHashTableMaster(NodeEndpoint endpoint)
        {
            segments = Enumerable.Range(0, 8192).Select(i =>
                                                        new Segment
                                                        {
                                                            AssignedEndpoint = endpoint,
                                                            Index = i
                                                        }).ToArray();
        }

        public Segment[] Join()
        {
            return segments;
        }
    }

    [ServiceContract]
    public interface IDistributedHashTableMaster
    {
        [OperationContract]
        Segment[] Join();
    }

    public class NodeEndpoint
    {
        public string Sync { get; set; }
        public string Async { get; set; }
    }

    public class Segment
    {
        public Guid Version { get; set; }

        public int Index { get; set; }
        public NodeEndpoint AssignedEndpoint { get; set; }
        public NodeEndpoint InProcessOfMovingToEndpoint { get; set; }

        public int WcfHatesMeAndMakeMeSad { get; set; }
    }

The problem? On my machine, executing this results in:

Maximum number of items that can be serialized or deserialized in an object graph is '65536'. Change the object graph or increase the MaxItemsInObjectGraph quota.

The freaky part? Do you see the WcfHatesMeAndMakeMeSad property? If I comment that one out, the problem goes away. Since MaxItemsInObjectGraph is set to int.MaxValue, I don’t know what else to do, and frankly, I am getting mighty tired of WCF doing stuff in unpredictable ways.

Protocol Buffers & TcpClient, here I come.

posted @ Thursday, June 04, 2009 8:18 PM

NHibernate – Beware of inadvisably applied caching strategies

A common approach to performance problems in most applications is to throw caching at the problem until it goes away. NHibernate supports a very sophisticated caching mechanism, but it is disabled by default. Not only that, but there are multiple levels of opt-ins that you have to state explicitly before you can benefit from caching.
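To make those opt-ins a bit more concrete, here is a rough sketch of the session factory level in hibernate.cfg.xml. The provider shown (HashtableCacheProvider) is only an assumption for illustration; it is intended for testing, and a real application would plug in a proper cache provider.

```xml
<hibernate-configuration xmlns="urn:nhibernate-configuration-2.2">
  <session-factory>
    <!-- Assumption: a test-only provider, swap in a real one for production -->
    <property name="cache.provider_class">NHibernate.Cache.HashtableCacheProvider, NHibernate</property>
    <!-- Opt in to the second level cache -->
    <property name="cache.use_second_level_cache">true</property>
    <!-- Opt in to the query cache, which is a separate switch -->
    <property name="cache.use_query_cache">true</property>
  </session-factory>
</hibernate-configuration>
```

Even with all of this turned on, nothing is cached until each class and collection mapping gets its own cache element, and each query is explicitly marked as cacheable.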

Why is that?

The answer is quite simple: caching is an incredibly sensitive topic, involving such things as data freshness, target size, repetitive requests, etc. Each and every time I have seen caching used as a hammer, it ended in tears, with a lot of micromanagement of the cache and quite a bit of frustration.

Let me give you an example using the simple Blog->>Posts model. What happens if I want to display the blog and its posts? The code could look like this:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    var blog = session.Get<Blog>(2);
    foreach (var post in blog.Posts)
    {
        Console.WriteLine(post.Title);
    }
    tx.Commit();
}

And the mappings are:

<class name="Blog" table="Blogs">
    <cache usage="read-write"/>
    <id name="Id">
        <generator class="identity"/>
    </id>
    <property name="Title"/>
    <property name="Subtitle" />
    <property name="AllowsComments" />
    <property name="CreatedAt" />
    <bag name="Posts" table="Posts" inverse="true">
        <cache usage="read-write"/>
        <key column="BlogId"/>
        <one-to-many class="Post"/>
    </bag>
</class>

<class name="Post" table="Posts">
    <id name="Id">
        <generator class="identity"/>
    </id>
    <property name="Title" />
    <many-to-one name="Blog" column="BlogId"/>
</class>

Do you see the horrible issue here? You probably don’t, but you will in a moment. Let us see what happens on the first run of this code:

[image: first run]

That is about as good as you can make it. But what about the second time?

[image: second run]

Ouch!

What just happened?!

Well, we loaded the blog from the cache, and then we loaded the Blog’s Posts collection from the cache. So far, it is working really nicely for us. However, the next thing we see is a huge SELECT N+1, and we end up with a lot more queries in the cached scenario than in the non-cached one.

The problem is that when we cache a collection, we aren’t caching the data in that collection. We are only caching the ids. That means that NHibernate gets the collection of ids and then tries to resolve them one by one. Remember that I said that the mapping above has a horrible problem? While the Posts collection is cached, the Posts themselves are not, requiring NHibernate to go to the database for each and every one of them.

Have I said ouch already? Be careful what you cache, and make sure that you aren't doing caching in a way that will actively harm you.
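A minimal sketch of one way out, assuming that read-write caching really is appropriate for posts in your application (that is a judgment call, for all the reasons given earlier): give the Post class its own cache element, so that the ids stored in the cached collection can be resolved from the second level cache rather than from the database.

```xml
<class name="Post" table="Posts">
    <!-- Assumption: read-write is the right strategy for posts in this app -->
    <cache usage="read-write"/>
    <id name="Id">
        <generator class="identity"/>
    </id>
    <property name="Title" />
    <many-to-one name="Blog" column="BlogId"/>
</class>
```

The alternative is to remove the cache element from the Posts collection, which at least gets you back to the query count of the first run.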

The same applies to the query cache as well, since it also stores only the ids of the entities it returns: if you have a cached query that loads entities, you want to make sure that the entities are also cached.

posted @ Thursday, June 04, 2009 2:20 AM