Ayende @ Rahien

Refunds available at head office

Challenge: The problem of locking down tasks…

The following code has a very subtle bug:

public class AsyncQueue
{
    private readonly Queue<int> items = new Queue<int>();
    private volatile LinkedList<TaskCompletionSource<object>> waiters = new LinkedList<TaskCompletionSource<object>>();

    public void Enqueue(int i)
    {
        lock (items)
        {
            items.Enqueue(i);
            while (waiters.First != null)
            {
                waiters.First.Value.TrySetResult(null);
                waiters.RemoveFirst();
            }
        }
    }

    public async Task<IEnumerable<int>> DrainAsync()
    {
        while (true)
        {
            TaskCompletionSource<object> taskCompletionSource;
            lock (items)
            {
                if (items.Count > 0)
                {
                    return YieldAllItems();
                }
                taskCompletionSource = new TaskCompletionSource<object>();
                waiters.AddLast(taskCompletionSource);
            }
            await taskCompletionSource.Task;
        }
    }

    private IEnumerable<int> YieldAllItems()
    {
        while (items.Count > 0)
        {
            yield return items.Dequeue();
        }
    }
}

I’ll even give you a hint: try to run the following client code:

for (int i = 0; i < 1000 * 1000; i++)
{
    q.Enqueue(i);
    if (i % 100 == 0)
    {
        Task.Factory.StartNew(async () =>
            {
                foreach (var result in await q.DrainAsync())
                {
                    Console.WriteLine(result);
                }
            });
    }
}
Can you figure out what the problem is?
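One more hint worth keeping in mind while you look: C# iterator methods (`yield return`) execute lazily; nothing in the method body runs until the returned `IEnumerable` is actually enumerated. A minimal, self-contained demonstration of that language rule (not the answer itself):

```csharp
using System;
using System.Collections.Generic;

class LazyIteratorDemo
{
    public static bool BodyRan;

    public static IEnumerable<int> Numbers()
    {
        BodyRan = true;          // runs only when someone enumerates
        yield return 1;
    }

    static void Main()
    {
        var seq = Numbers();                       // BodyRan is still false here
        Console.WriteLine("got the IEnumerable, body ran: " + BodyRan);
        foreach (var n in seq) { }                 // now the iterator body executes
        Console.WriteLine("after enumeration, body ran: " + BodyRan);
    }
}
```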

Published at

Originally posted at

Comments (7)

Unprofessional code

This code bugs me.

image

It isn’t wrong, and it is going to produce the right result. But it ain’t pro code, in the sense that it lacks an important *ity.

Just to cut down on the guessing, I marked the exact part that bugs me. Can you see the issue?


Explain this code: Answers

The reason this code is useful?

image

Because it allows you to write this sort of code:

class Program
{
    private static Collection<int> nums;

    static void Main(string[] args)
    {
        nums.Add(1);
        Console.WriteLine(nums.Count);
    }
}

I was aiming for that, and still this code strikes me as wrong.


Challenge: Minimum number of round trips

We are working on creating a better experience for RavenDB & Sharding, and that led us to the following piece of code:

shardedSession.Load<Post>("posts/1234", "post/3214", "posts/1232", "posts/1238", "posts/1232");

And the following Shard function:

public static string[] GetAppropriateUrls(string id)
{
    switch (id.Last())
    {
        case '4':
            return new[] { "http://srv-4", "http://srv-backup-4" };

        case '2':
            return new[] { "http://srv-2" };

        case '8':
            return new[] { "http://srv-backup-4" };

        default:
            throw new InvalidOperationException();
    }
}

Write a function that makes the minimum number of queries to load all of the posts from all of the servers.
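As a starting point (a sketch only, not the minimal-round-trip answer the challenge asks for), one can bucket the distinct ids by their candidate server sets, using the shard function above; ids whose candidate sets overlap are exactly the ones where a smarter assignment can merge requests:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ShardGrouping
{
    // Copied from the shard function above.
    public static string[] GetAppropriateUrls(string id)
    {
        switch (id.Last())
        {
            case '4': return new[] { "http://srv-4", "http://srv-backup-4" };
            case '2': return new[] { "http://srv-2" };
            case '8': return new[] { "http://srv-backup-4" };
            default: throw new InvalidOperationException();
        }
    }

    // Bucket distinct ids by their candidate server set. Minimizing round
    // trips then means choosing, for ids with several candidates, the
    // assignment that merges the most buckets - which is the challenge proper.
    public static Dictionary<string, string[]> GroupByCandidates(string[] ids)
    {
        return ids
            .Distinct()
            .GroupBy(id => string.Join(",", GetAppropriateUrls(id)))
            .ToDictionary(g => g.Key, g => g.ToArray());
    }
}
```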


The tax calculation challenge

People seem to be more interested in answering the question than in the code that solved it. Actually, people seemed more interested in outdoing one another in creating answers to it. What I found most interesting is that a large percentage of the answers (both on the blog post and in the interviews) got a lot of it wrong.

So here is the question in full. The following table is the current tax rates in Israel:

  Income                Tax Rate
  Up to 5,070           10%
  5,071 up to 8,660     14%
  8,661 up to 14,070    23%
  14,071 up to 21,240   30%
  21,241 up to 40,230   33%
  Higher than 40,230    45%

Here are some example answers:

  • 5,000 –> 500
  • 5,800 –> 609.2
  • 9,000 –> 1087.8
  • 15,000 –> 2532.9
  • 50,000 –> 15,068.1

This problem is a bit tricky because the tax rate doesn’t apply to the whole sum, only to the part that is within the current rate.
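To make that bracket rule concrete, here is a straightforward implementation of the table above (a sketch; the bracket data is copied from the table, and the class and method names are mine). Each slice of the salary is taxed at its own bracket's rate:

```csharp
using System;

class TaxCalculator
{
    // Bracket upper bounds and rates, from the table above.
    static readonly (decimal UpTo, decimal Rate)[] Brackets =
    {
        (5070m, 0.10m),
        (8660m, 0.14m),
        (14070m, 0.23m),
        (21240m, 0.30m),
        (40230m, 0.33m),
        (decimal.MaxValue, 0.45m),
    };

    // Tax only the part of the salary that falls within each bracket.
    public static decimal Calculate(decimal salary)
    {
        decimal tax = 0m;
        decimal previousCeiling = 0m;
        foreach (var (upTo, rate) in Brackets)
        {
            if (salary <= previousCeiling)
                break;
            var taxableInBracket = Math.Min(salary, upTo) - previousCeiling;
            tax += taxableInBracket * rate;
            previousCeiling = upTo;
        }
        return tax;
    }

    static void Main()
    {
        Console.WriteLine(Calculate(5000m));   // 500.0
        Console.WriteLine(Calculate(5800m));   // 609.2
        Console.WriteLine(Calculate(9000m));   // 1087.8
    }
}
```

Running it against the example answers above reproduces them exactly, which is a good sanity check for any solution to this challenge.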


Elegancy challenge: Cacheable Batches

Let us say that we have the following server code:

public JsonDocument[] GetDocument(string[] ids)
{
  var results = new List<JsonDocument>();
  foreach(var id in ids)
  {
    results.Add(GetDocument(id));
  }
  return results.ToArray();
}

This method is a classic example of batching requests to the server. For the purpose of our discussion, we have a client proxy that looks like this:

public class ClientProxy : IDisposable
{
  MemoryCache cache = new MemoryCache();
  
  public JsonDocument[] GetDocument(params string[] ids)
  {
    
    // make the request to the server, for example
    var request = WebRequest.Create("http://server/get?id=" + string.Join("&id=", ids));
    using(var stream = request.GetResponse().GetResponseStream())
    {
      return  GetResults(stream);
    }
  }
  
  public void Dispose()
  {
    cache.Dispose();
  }
}

Now, as you can probably guess from the title and from the code above, the question relates to caching. We need to make the following pass:

using(var proxy = new ClientProxy())
{
  proxy.GetPopularity("ayende", "oren");    // nothing in the cache, make request to server
  proxy.GetPopularity("rhino", "hippo");    // nothing in the cache, make request to server
  proxy.GetPopularity("rhino", "aligator"); // only request aligator, 'rhino' is in the cache
  proxy.GetPopularity("rhino", "hippo");    // don't make any request, serve from cache
  proxy.GetPopularity("rhino", "oren");     // don't make any request, serve from cache
}

The tricky part, of course, is to make this elegant. You can modify both server and client code.

I simplified the problem drastically, but one of the major benefits in the real scenario was reducing the amount of data we have to fetch over the network, even for partially cached queries.
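For reference, the obvious client-side split looks something like the sketch below (using a plain dictionary instead of `MemoryCache` for brevity, and a hypothetical `fetchFromServer` delegate standing in for the real HTTP request). It handles the partially-cached case, but whether it counts as elegant is exactly the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CachingProxy
{
    private readonly Dictionary<string, string> cache = new Dictionary<string, string>();
    private readonly Func<string[], Dictionary<string, string>> fetchFromServer;

    // fetchFromServer is a hypothetical hook for the real batched HTTP request.
    public CachingProxy(Func<string[], Dictionary<string, string>> fetchFromServer)
    {
        this.fetchFromServer = fetchFromServer;
    }

    public string[] GetPopularity(params string[] ids)
    {
        // Only ask the server for the ids we don't already have cached.
        var missing = ids.Where(id => !cache.ContainsKey(id)).Distinct().ToArray();
        if (missing.Length > 0)
        {
            foreach (var pair in fetchFromServer(missing))
                cache[pair.Key] = pair.Value;
        }
        // Serve the full answer, mixing cached and freshly fetched values.
        return ids.Select(id => cache[id]).ToArray();
    }
}
```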


Challenge: Recent Comments with Future Posts

We were asked to implement a comments RSS feed for this blog, and many people asked about the recent comments widget. That turned out to be quite a bit more complicated than it appeared at first glance, and I thought it would make a great challenge.

On the face of it, it looks like a drop dead simple feature, right? Show me the last 5 comments in the blog.

The problem with that is that it ignores something that is very important to me, the notion of future posts. One of the major advantages of RaccoonBlog is that I am able to post stuff that would go on the queue (for example, this post would be scheduled for about a month from the date of writing it), but still share that post with other people. Moreover, those other people can also comment on the post. To make things interesting, it is quite common for me to re-schedule posts, moving them from one date to another.

Given that complication, let us try to define how we want the Recent Comments feature to behave with regards to future posts. The logic is fairly simple:

  • Until the post is public, do not show the post comments.
  • If a post had comments on it while it was a future post, when it becomes public, the comments that were already posted there, should also be included.

The last requirement is a bit tricky. Allow me to explain. It would be easier to understand with an example, which luckily I have:

image

As you can see, this is a post that was created on the 1st of July, but was published on the 12th. Mike and I commented on the post shortly after it was written (while it was still hidden from the general public). Then, after it was published, grega_g and Jonty commented on it.

Now, let us assume that we query at 12 Jul, 08:55 AM: we will not get any comments from this post. But at 12 Jul, 09:01 AM, we should get both comments from it. To make things more interesting, those should come before comments that were posted (chronologically) after them. Confusing, isn’t it? Again, let us go with a visual aid to explain things.

In other words, let us say that we also have this as well:

image

Here is what we should see in the Recent Comments:

12 Jul, 2011 – 08:55 AM:
  1. 07/07/2011 07:42 PM – jdn
  2. 07/08/2011 07:34 PM – Matt Warren

12 Jul, 2011 – 09:05 AM:
  1. 07/03/2011 05:26 PM – Ayende Rahien
  2. 07/03/2011 05:07 PM – Mike Minutillo
  3. 07/07/2011 07:42 PM – jdn
  4. 07/08/2011 07:34 PM – Matt Warren

Note that the 1st and 2nd entries at 9:05 should chronologically sort after the 3rd and 4th, but are sorted before them because of the post publish date, which we also take into account.

Given all of that, and regardless of the actual technology that you use, how would you implement this feature?
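One way to express the rules above, regardless of storage technology (a hedged sketch with hypothetical types, not necessarily how RaccoonBlog does it): treat a comment's effective date as the later of its own creation date and its post's publish date, filter out future posts entirely, and sort by that effective date:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened view of a comment plus its post's publish date.
public class CommentInfo
{
    public DateTime CreatedAt;
    public DateTime PostPublishAt;
    public string Author;
}

public static class RecentComments
{
    public static List<CommentInfo> Query(IEnumerable<CommentInfo> comments, DateTime now, int count)
    {
        return comments
            .Where(c => c.PostPublishAt <= now)   // future posts stay hidden
            .OrderByDescending(c =>               // effective date: the later of
                c.CreatedAt > c.PostPublishAt     // the comment date and the post
                    ? c.CreatedAt                 // publish date, so comments made
                    : c.PostPublishAt)            // while hidden surface on publish
            .Take(count)
            .ToList();
    }
}
```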

Answer: Modifying execution approaches

In RavenDB, we had this piece of code:

        internal T[] LoadInternal<T>(string[] ids, string[] includes)
        {
            if(ids.Length == 0)
                return new T[0];

            IncrementRequestCount();
            Debug.WriteLine(string.Format("Bulk loading ids [{0}] from {1}", string.Join(", ", ids), StoreIdentifier));
            MultiLoadResult multiLoadResult;
            JsonDocument[] includeResults;
            JsonDocument[] results;
#if !SILVERLIGHT
            var sp = Stopwatch.StartNew();
#else
            var startTime = DateTime.Now;
#endif
            bool firstRequest = true;
            do
            {
                IDisposable disposable = null;
                if (firstRequest == false) // if this is a repeated request, we mustn't use the cached result, but have to re-query the server
                    disposable = DatabaseCommands.DisableAllCaching();
                using (disposable)
                    multiLoadResult = DatabaseCommands.Get(ids, includes);

                firstRequest = false;
                includeResults = SerializationHelper.RavenJObjectsToJsonDocuments(multiLoadResult.Includes).ToArray();
                results = SerializationHelper.RavenJObjectsToJsonDocuments(multiLoadResult.Results).ToArray();
            } while (
                AllowNonAuthoritiveInformation == false &&
                results.Any(x => x.NonAuthoritiveInformation ?? false) &&
#if !SILVERLIGHT
                sp.Elapsed < NonAuthoritiveInformationTimeout
#else 
                (DateTime.Now - startTime) < NonAuthoritiveInformationTimeout
#endif
                );

            foreach (var include in includeResults)
            {
                TrackEntity<object>(include);
            }

            return results
                .Select(TrackEntity<T>)
                .ToArray();
        }

And we needed to take this same piece of code and execute it in:

  • Async fashion
  • As part of a batch of queries (sending multiple requests to RavenDB in a single HTTP call).

Everything else is the same, but in each case the line that makes the actual call to the server is completely different.

I chose to address this with a Method Object refactoring. I created a new class, moved all the local variables to fields, and moved each part of the method into its own method. I also explicitly gave up control of execution, deferring that to whoever is calling us. We ended up with this:

    public class MultiLoadOperation
    {
        private static readonly Logger log = LogManager.GetCurrentClassLogger();

        private readonly InMemoryDocumentSessionOperations sessionOperations;
        private readonly Func<IDisposable> disableAllCaching;
        private string[] ids;
        private string[] includes;
        bool firstRequest = true;
        IDisposable disposable = null;
        JsonDocument[] results;
        JsonDocument[] includeResults;
                
#if !SILVERLIGHT
        private Stopwatch sp;
#else
        private    DateTime startTime;
#endif

        public MultiLoadOperation(InMemoryDocumentSessionOperations sessionOperations, 
            Func<IDisposable> disableAllCaching,
            string[] ids, string[] includes)
        {
            this.sessionOperations = sessionOperations;
            this.disableAllCaching = disableAllCaching;
            this.ids = ids;
            this.includes = includes;
        
            sessionOperations.IncrementRequestCount();
            log.Debug("Bulk loading ids [{0}] from {1}", string.Join(", ", ids), sessionOperations.StoreIdentifier);

#if !SILVERLIGHT
            sp = Stopwatch.StartNew();
#else
            startTime = DateTime.Now;
#endif
        }

        public IDisposable EnterMultiLoadContext()
        {
            if (firstRequest == false) // if this is a repeated request, we mustn't use the cached result, but have to re-query the server
                disposable = disableAllCaching();
            return disposable;
        }

        public bool SetResult(MultiLoadResult multiLoadResult)
        {
            firstRequest = false;
            includeResults = SerializationHelper.RavenJObjectsToJsonDocuments(multiLoadResult.Includes).ToArray();
            results = SerializationHelper.RavenJObjectsToJsonDocuments(multiLoadResult.Results).ToArray();

            return    sessionOperations.AllowNonAuthoritiveInformation == false &&
                    results.Any(x => x.NonAuthoritiveInformation ?? false) &&
#if !SILVERLIGHT
                    sp.Elapsed < sessionOperations.NonAuthoritiveInformationTimeout
#else 
                    (DateTime.Now - startTime) < sessionOperations.NonAuthoritiveInformationTimeout
#endif
                ;
        }

        public T[] Complete<T>()
        {
            foreach (var include in includeResults)
            {
                sessionOperations.TrackEntity<object>(include);
            }

            return results
                .Select(sessionOperations.TrackEntity<T>)
                .ToArray();
        }
    }

Note that this class doesn’t contain two very important things:

  • The actual call to the database, we gave up control on that.
  • The execution order for the methods, we don’t control that either.

That was ugly, and I decided that since I had to write another implementation anyway, I might as well do the right thing and have a shared implementation. The key was to extract everything away except for the call that gets the actual value. So I did just that, and we got a new class that does all of the functionality above, except control where and how the actual call to the server is made.

Now, for the sync version, we have this code:

internal T[] LoadInternal<T>(string[] ids, string[] includes)
{
    if(ids.Length == 0)
        return new T[0];

    var multiLoadOperation = new MultiLoadOperation(this, DatabaseCommands.DisableAllCaching, ids, includes);
    MultiLoadResult multiLoadResult;
    do
    {
        using(multiLoadOperation.EnterMultiLoadContext())
        {
            multiLoadResult = DatabaseCommands.Get(ids, includes);
        }
    } while (multiLoadOperation.SetResult(multiLoadResult));

    return multiLoadOperation.Complete<T>();
}

This isn’t the most trivial of methods, I’ll admit, but it is ever so much better than the alternative, especially since now the async version looks like:

/// <summary>
/// Begins the async multi load operation
/// </summary>
public Task<T[]> LoadAsyncInternal<T>(string[] ids, string[] includes)
{
    var multiLoadOperation = new MultiLoadOperation(this, AsyncDatabaseCommands.DisableAllCaching, ids, includes);
    return LoadAsyncInternal<T>(ids, includes, multiLoadOperation);
}

private Task<T[]> LoadAsyncInternal<T>(string[] ids, string[] includes, MultiLoadOperation multiLoadOperation)
{
    using (multiLoadOperation.EnterMultiLoadContext())
    {
        return AsyncDatabaseCommands.MultiGetAsync(ids, includes)
            .ContinueWith(t =>
            {
                if (multiLoadOperation.SetResult(t.Result) == false)
                    return Task.Factory.StartNew(() => multiLoadOperation.Complete<T>());
                return LoadAsyncInternal<T>(ids, includes, multiLoadOperation);
            })
            .Unwrap();
    }
}

Again, it isn’t trivial, but at least the core stuff, the actual logic that isn’t related to how we execute the code is shared.

Challenge: Modifying execution approaches

In RavenDB, we had this piece of code:

        internal T[] LoadInternal<T>(string[] ids, string[] includes)
        {
            if(ids.Length == 0)
                return new T[0];

            IncrementRequestCount();
            Debug.WriteLine(string.Format("Bulk loading ids [{0}] from {1}", string.Join(", ", ids), StoreIdentifier));
            MultiLoadResult multiLoadResult;
            JsonDocument[] includeResults;
            JsonDocument[] results;
#if !SILVERLIGHT
            var sp = Stopwatch.StartNew();
#else
            var startTime = DateTime.Now;
#endif
            bool firstRequest = true;
            do
            {
                IDisposable disposable = null;
                if (firstRequest == false) // if this is a repeated request, we mustn't use the cached result, but have to re-query the server
                    disposable = DatabaseCommands.DisableAllCaching();
                using (disposable)
                    multiLoadResult = DatabaseCommands.Get(ids, includes);

                firstRequest = false;
                includeResults = SerializationHelper.RavenJObjectsToJsonDocuments(multiLoadResult.Includes).ToArray();
                results = SerializationHelper.RavenJObjectsToJsonDocuments(multiLoadResult.Results).ToArray();
            } while (
                AllowNonAuthoritiveInformation == false &&
                results.Any(x => x.NonAuthoritiveInformation ?? false) &&
#if !SILVERLIGHT
                sp.Elapsed < NonAuthoritiveInformationTimeout
#else 
                (DateTime.Now - startTime) < NonAuthoritiveInformationTimeout
#endif
                );

            foreach (var include in includeResults)
            {
                TrackEntity<object>(include);
            }

            return results
                .Select(TrackEntity<T>)
                .ToArray();
        }

And we needed to take this same piece of code and execute it in:

  • Async fashion
  • As part of a batch of queries (sending multiple requests to RavenDB in a single HTTP call).

Everything else is the same, but in each case the line that makes the actual call to the server is completely different.

When we had only one additional option, I chose the direct approach and implemented it using:

public Task<T[]> LoadAsync<T>(string[] ids)
{
    IncrementRequestCount();
    return AsyncDatabaseCommands.MultiGetAsync(ids)
        .ContinueWith(task => task.Result.Select(TrackEntity<T>).ToArray());
}

You might notice a few differences between those approaches. The implementations behave the same most of the time, but the behavior for edge cases is wrong. The reason for that, by the way, is that initially the Load and LoadAsync implementations were functionally the same, but the Load behavior kept getting more sophisticated, and I kept forgetting to also update the LoadAsync behavior.

When I started building support for batches, this really stumped me. The last thing that I wanted was either to try to maintain complex logic in three different locations, or to have different behaviors depending on whether you were using a direct call, a batch, or an async call. Just trying to document that gave me a headache.

How would you approach solving this problem?

Caching, the funny way

One of the most frustrating things in working with RavenDB is that the client API has to stay compatible with the 3.5 framework. That means that for a lot of things we either have to use conditional compilation, or we have to forgo using the new stuff in 4.0.

Case in point, we have the following issue:

image

The code in question currently looks like this:

image

This is the sort of code that simply begs to be used with ConcurrentDictionary. Unfortunately, we can’t use it here, because of the 3.5 limitation. Instead, I went with the usual, non-thread-safe dictionary approach. I wanted to avoid locking, so I ended up with:

image

Pretty neat, even if I say so myself. The fun part is that without any locking, this is completely thread safe. The field itself is initialized to an empty dictionary in the constructor, of course, but that is the only thing that happens outside this method. For that matter, I didn’t even bother to make the field volatile. The only thing this relies on is that pointer writes are atomic.

How come this works, and what assumptions am I making that make this possible?
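Since the actual code is in the screenshots above, here is a reconstruction of the pattern being described (a sketch built from the description, with hypothetical names, not the real RavenDB code): the dictionary is never mutated in place; a writer copies it, adds the entry, and swaps the reference:

```csharp
using System;
using System.Collections.Generic;

public class CopyOnWriteCache
{
    // Never mutated in place: readers always observe a fully-built snapshot.
    private Dictionary<string, int> cache = new Dictionary<string, int>();

    public int GetOrAdd(string key, Func<string, int> compute)
    {
        int value;
        // Lock-free read: safe because we only ever swap in complete dictionaries.
        if (cache.TryGetValue(key, out value))
            return value;

        value = compute(key);

        // Copy-on-write: reference assignment is atomic, so a concurrent reader
        // sees either the old snapshot or the new one, never a torn update.
        // A racing writer may lose its entry, which is fine for a cache:
        // the value is simply recomputed on the next miss.
        var copy = new Dictionary<string, int>(cache);
        copy[key] = value;
        cache = copy;
        return value;
    }
}
```

The design trade-off is that writers pay for a full copy and can occasionally lose a race, in exchange for completely uncontended reads, which is exactly the right trade for a read-heavy, recomputable cache.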

Rhino Mocks Challenge: Implement This Feature

Okay, let us see if this approach works...

Here is a description of a feature that I would like to have in Rhino Mocks (modeled after a new feature in Type Mock). I don't consider this a complicated feature, and I would like to get more involvement from the community in building Rhino Mocks (see the list of all the people that helped get Rhino Mocks 3.5 out the door).

The feature is fluent mocks. The idea is that this code should work:

var mockService = MockRepository.GenerateMock<IMyService>();
Expect.Call( mockService.Identity.Name ).Return("foo");

Assert.AreEqual("foo", mockService.Identity.Name);

Where Identity is an interface.

The best place to capture such semantics is in the RecordMockState.

Have fun, and send me the patch :-)

Challenge: Don't stop with the first DSL abstraction

I was having a discussion today about the way business rules are implemented. And large part of the discussion was focused on trying to get a specific behavior in a specific circumstance. As usual, I am going to use a totally different example, which might not be as brutal in its focus as the real one.

We have a set of business rules that relate to what is going to happen to a customer in certain situations. For example, we might have the following:

upon bounced_check or refused_credit:
	if customer.TotalPurchases > 10000: # preferred
		ask_authorization_for_more_credit
	else:
		call_the_cops

upon new_order:
	if customer.TotalPurchases > 10000: # preferred
		apply_discount 5.percent

upon order_shipped:
	send_marketing_stuff unless customer.RequestedNoSpam

What is this code crying out for? Here is a hint: it is not the introduction of IsPreferred, although that would be welcome.

I am interested in hearing what you have to say on this matter.

And as a total non sequitur, cockroaches at Starbucks, yuck.

System.Reflection.Emit fun: Find the differences

This is annoying. I am trying to make something like this work using SRE:

public void Foo(out T blah)
{
    blah = (T)Arguments[0];
}

I created the SRE code to generate the appropriate values, but it is producing invalid code.

Works, but not verified IL:

    L_0089: stloc.2
    L_008a: ldarg.1
    L_008b: ldloc.2
    L_008c: ldc.i4 0
    L_0091: ldelem.ref
    L_0092: unbox.any !!T
    L_0097: stobj !!T
    L_009c: ret

Works, valid (csc.exe output, of course):

    L_000d: stloc.1
    L_000e: ldarg.1
    L_000f: ldloc.1
    L_0010: ldc.i4.0
    L_0011: ldelem.ref
    L_0012: unbox.any !!T
    L_0017: stobj !!T
    L_001c: ret

And yes, the stloc.2 and stloc.1 are expected.

Challenge: What does this code do?

Without compiling it, can you tell me whether this piece of code will compile? And if so, what does it do?

var dummyVariable1 = 1;
var dummyVariable2 = 3;
var a = dummyVariable1
+-+-+-+-+ + + + + + +-+-+-+-+-+
dummyVariable2;

Oh, and I want to hear reasons, too.

Challenge: Find the bug fix

Usually I tend to pose bugs as the challenges, and not the bug fixes, but this is an interesting one. Take a look at the following code:

var handles = new List<WaitHandle>();
using (
	var stream = new FileStream(path, FileMode.CreateNew, FileAccess.Write, FileShare.None, 0x1000,
								 FileOptions.Asynchronous))
{
	for (int i = 0; i < 64; i++)
	{
		var handle = new ManualResetEvent(false);
		var bytes = Encoding.UTF8.GetBytes( i + Environment.NewLine);
		stream.BeginWrite(bytes, 0, bytes.Length, delegate(IAsyncResult ar)
		{
			stream.EndWrite(ar);
			handle.Set();
		}, stream);
		handles.Add(handle);
	}
	WaitHandle.WaitAll(handles.ToArray());
	stream.Flush();

}

Now, tell me why I am creating a ManualResetEvent manually, instead of using the one that BeginWrite IAsyncResult will return?

[Unstable code] Why timeouts don't mean squat...

Because they aren't helpful for the pathological cases. Let us take this simple example:

[ServiceContract]
public interface IFoo
{
	[OperationContract]
	string GetMessage();
}

var stopwatch = Stopwatch.StartNew();
var channel = ChannelFactory<IFoo>.CreateChannel(
	new BasicHttpBinding
	{
		SendTimeout = TimeSpan.FromSeconds(1), 
		ReceiveTimeout = TimeSpan.FromSeconds(1),
		OpenTimeout = TimeSpan.FromSeconds(1),
                CloseTimeout = TimeSpan.FromSeconds(1)
	},
	new EndpointAddress("http://localhost:6547/bar"));

var message = channel.GetMessage();

stopwatch.Stop();
Console.WriteLine("Got message in {0}ms", stopwatch.ElapsedMilliseconds);

On the face of it, it looks like we are safe from the point of view of timeouts, right? We set all the timeout settings that are there. At most, we will spend a second waiting for the message, and get a time out exception if we fail there.

Here is a simple way to make this code hang for a minute (more after the code):

namespace ConsoleApplication1
{
	using System;
	using System.Linq;
	using System.Diagnostics;
	using System.IO;
	using System.Net;
	using System.ServiceModel;
	using System.Threading;

	class Program
	{
		static void Main(string[] args)
		{
			var host = new ServiceHost(typeof(FooImpl), 
				new Uri("http://localhost/foo"));
			host.AddServiceEndpoint(typeof(IFoo), 
				new BasicHttpBinding(), 
				new Uri("http://localhost/foo"));
			host.Open();

			new SlowFirewall();

			var stopwatch = Stopwatch.StartNew();
			var channel = ChannelFactory<IFoo>.CreateChannel(
				new BasicHttpBinding
				{
					SendTimeout = TimeSpan.FromSeconds(1), 
					ReceiveTimeout = TimeSpan.FromSeconds(1),
					OpenTimeout = TimeSpan.FromSeconds(1),
                 			CloseTimeout = TimeSpan.FromSeconds(1)
				},
				new EndpointAddress("http://localhost:6547/bar"));
			
			var message = channel.GetMessage();
			
			stopwatch.Stop();
			Console.WriteLine("Got message in {0}ms", stopwatch.ElapsedMilliseconds);


			host.Close();
		}
	}

	[ServiceContract]
	public interface IFoo
	{
		[OperationContract]
		string GetMessage();
	}

	public class FooImpl : IFoo
	{
		public string GetMessage()
		{
			return new string('*', 5000);
		}
	}

	public class SlowFirewall
	{
		private readonly HttpListener listener;

		public SlowFirewall()
		{
			listener = new HttpListener();
			listener.Prefixes.Add("http://localhost:6547/bar/");
			listener.Start();
			listener.BeginGetContext(OnGetContext, null);
		}

		private void OnGetContext(IAsyncResult ar)
		{
			var context = listener.EndGetContext(ar);
			var request = WebRequest.Create("http://localhost/foo");
			request.Method = context.Request.HttpMethod;
			request.ContentType = context.Request.ContentType;
			var specialHeaders = new[] { "Connection", "Content-Length", "Host", "Content-Type", "Expect" };
			foreach (string header in context.Request.Headers)
			{
				if (specialHeaders.Contains(header))
					continue;
				request.Headers[header] = context.Request.Headers[header];
			}
			var buffer = new byte[context.Request.ContentLength64];
			ReadAll(buffer, context.Request.InputStream);
			using (var stream = request.GetRequestStream())
			{
				stream.Write(buffer, 0, buffer.Length);
			}
			using (var response = request.GetResponse())
			using (var responseStream = response.GetResponseStream())
			{
				buffer = new byte[response.ContentLength];
				ReadAll(buffer, responseStream);
				foreach (string header in response.Headers)
				{
					if (specialHeaders.Contains(header))
						continue;
					context.Response.Headers[header] = response.Headers[header];
				}
				context.Response.ContentType = response.ContentType;
				int i = 0;
				foreach (var b in buffer)
				{
					context.Response.OutputStream.WriteByte(b);
					context.Response.OutputStream.Flush();
					Thread.Sleep(10);
					Console.WriteLine(i++);
				}
				context.Response.Close();
			}
		}

		private void ReadAll(byte[] buffer, Stream stream)
		{
			int current = 0;
			while (current < buffer.Length)
			{
				int read = stream.Read(buffer, current, buffer.Length - current);
				current += read;
			}
		}
	}
}

This problem means that even supposedly safe code, which has taken care of specifying timeouts properly, is not safe from blocking because of network issues, exactly the thing we specified the timeouts to avoid. I should note that this sample code still operates at a very high level. There are a lot of things that you can do at all levels of the network stack to play havoc with your code.

As an aside, what book am I re-reading?

[Unstable code] So you think you are safe...

There is some interesting discussion on my previous post about unstable code.

I thought that it would be good to give a concrete example of the issue. Given the following interface & client code, is there a way to make this code block for a long time?

[ServiceContract]
public interface IFoo
{
	[OperationContract]
	string GetMessage();
}

var stopwatch = Stopwatch.StartNew();
var channel = ChannelFactory<IFoo>.CreateChannel(
	new BasicHttpBinding
	{
		SendTimeout = TimeSpan.FromSeconds(1), 
		ReceiveTimeout = TimeSpan.FromSeconds(1),
		OpenTimeout = TimeSpan.FromSeconds(1),
                CloseTimeout = TimeSpan.FromSeconds(1)
	},
	new EndpointAddress("http://localhost:6547/bar"));

var message = channel.GetMessage();

stopwatch.Stop();
Console.WriteLine("Got message in {0}ms", stopwatch.ElapsedMilliseconds);

You are free to play around with the server implementation as well as the network topology.

Have fun....

Challenge: What is wrong with this code

Let us assume that we have the following piece of code. It has a big problem in it. The kind of problem that you get called at 2 AM to solve.

Can you find it? (more below)

public static void Main()
{
	while(true)
	{
		srv.ProcessMessages();
		Thread.Sleep(5000);
	}
}

public void ProcessMessages()
{
	try
	{
	   var msgs = GetMessages();
	   byte[] data = Serialize(msgs);
	   var req =  WebRequest.Create("http://some.remote.server");
	   req.Method = "PUT";
	   using(var stream = req.GetRequestStream())
	   {
		   stream.Write(data,0,data.Length);
	   }
	   var resp = req.GetResponse();
	   resp.Close();// we only care that no exception was thrown
	   
	   MarkMessagesAsHandled(msgs); // assume this can't throw 
	}
	catch(Exception)
	{
		// bummer, but never mind,
		// we will get it the next time that ProcessMessages 
		// is called
	}
}

public Message[] GetMessages()
{
    var msgs = new List<Message>();
    using(var reader = ExecuteReader("SELECT * FROM Messages WHERE Handled = 0;"))
    {
        while(reader.Read())
        {
            msgs.Add( HydrateMessage(reader) );
        }
    }
    return msgs.ToArray();
}

This code is conceptual, just to make the point. It is not real code. Things that you don't have to worry about:

  • Multi threading
  • Transactions
  • Failed database

The problem is both a bit subtle and horrifying. And just to make things interesting, for most scenarios, it will work just fine.

Challenge: why did the tests fail?

For a few days, some (~4) of the SvnBridge integration tests would fail. Not always the same ones (but usually the same group), and only if I ran them all as a group, never if I ran each test individually, or if I ran the entire test class (which rules out most of the test dependencies that cause this). This was incredibly annoying, and several attempts to track down the issue had been less than successful.

Today I got annoyed enough to say that I am not leaving until I solve this. Considering that a full test run of all SvnBridge's tests is... lengthy, that took a while, but I finally tracked down what was going on. The fault was with this method:

protected string Svn(string command)
{
	StringBuilder output = new StringBuilder();
	string err = null;
	ExecuteInternal(command, delegate(Process svn)
	{
		ThreadPool.QueueUserWorkItem(delegate
		{
			err = svn.StandardError.ReadToEnd();
		});
		ThreadPool.QueueUserWorkItem(delegate
		{
			string line;
			while ((line = svn.StandardOutput.ReadLine()) != null)
			{
				Console.WriteLine(line);
				output.AppendLine(line);
			}
		});
	});
	if (string.IsNullOrEmpty(err) == false)
	{
		throw new InvalidOperationException("Failed to execute command: " + err);
	}
	return output.ToString();
}

This will execute svn.exe and gather its output. Only sometimes it would not do so.

I fixed it by changing the implementation to:

protected static string Svn(string command)
{
	var output = new StringBuilder();
	var err = new StringBuilder();
	var readFromStdError = new Thread(prc =>
	{
		string line;
		while ((line = ((Process)prc).StandardError.ReadLine()) != null)
		{
			Console.WriteLine(line);
			err.AppendLine(line);
		}
	});
	var readFromStdOut = new Thread(prc =>
	{
		string line;
		while ((line = ((Process) prc).StandardOutput.ReadLine()) != null)
		{
			Console.WriteLine(line);
			output.AppendLine(line);
		}
	});
	ExecuteInternal(command, svn =>
	{
		readFromStdError.Start(svn);
		readFromStdOut.Start(svn);
	});

	readFromStdError.Join();
	readFromStdOut.Join();

	if (err.Length!=0)
	{
		throw new InvalidOperationException("Failed to execute command: " + err);
	}
	return output.ToString();
}

And that fixed the problem. What was the problem?

Challenge: calling generics without the generic type

Assume that I have the following interface:

public interface IMessageHandler<T> where T : AbstractMessage
{
	void Handle(T msg);
}

How would you write this method so dispatching a message doesn't require reflection every time:

public void Dispatch(AbstractMessage msg)
{
	// not valid C#, of course; it just shows the intent
	IMessageHandler<msg.GetType()> handler = new MyHandler<msg.GetType()>();
	handler.Handle(msg);
}

Note that you can use reflection the first time you encounter a message of a particular type, but not in any subsequent calls.
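One possible direction (a sketch, not the only answer): pay the reflection cost once per message type by caching a strongly typed dispatch delegate. Everything here except the `IMessageHandler<T>` interface itself is invented for illustration, including the `PingMessage` sample type and the `createHandler` factory standing in for whatever container or registry resolves handlers.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;

public abstract class AbstractMessage { }

public interface IMessageHandler<T> where T : AbstractMessage
{
    void Handle(T msg);
}

// Sample message and handler, invented for illustration.
public class PingMessage : AbstractMessage { }

public class PingHandler : IMessageHandler<PingMessage>
{
    public static int Handled;
    public void Handle(PingMessage msg) { Handled++; }
}

public class Dispatcher
{
    // One strongly typed dispatch delegate per message type; reflection
    // runs only on the first message of each type.
    private readonly ConcurrentDictionary<Type, Action<AbstractMessage>> cache =
        new ConcurrentDictionary<Type, Action<AbstractMessage>>();

    private readonly Func<Type, object> createHandler;

    public Dispatcher(Func<Type, object> createHandler)
    {
        this.createHandler = createHandler; // e.g. a container lookup
    }

    public void Dispatch(AbstractMessage msg)
    {
        var dispatch = cache.GetOrAdd(msg.GetType(), type =>
        {
            // The only reflection: close DispatchTyped<T> over the runtime
            // message type and capture it as a reusable delegate.
            MethodInfo method = typeof(Dispatcher)
                .GetMethod("DispatchTyped", BindingFlags.Instance | BindingFlags.NonPublic)
                .MakeGenericMethod(type);
            return (Action<AbstractMessage>)Delegate.CreateDelegate(
                typeof(Action<AbstractMessage>), this, method);
        });
        dispatch(msg);
    }

    private void DispatchTyped<T>(AbstractMessage msg) where T : AbstractMessage
    {
        var handler = (IMessageHandler<T>)createHandler(typeof(T));
        handler.Handle((T)msg);
    }
}
```

After the first `PingMessage`, every subsequent dispatch is just a dictionary lookup plus a delegate call.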

Challenge: The directory tree

Since people seem to really enjoy posts like this, here is another one. This time it is an interesting issue that I dealt with today.

Given a set of versioned files, you need to cache them locally. Note that the IPersistentCache semantics mean that once you put a value in it, it is always going to remain there.

Here is the skeleton:

public interface IPersistentCache
{
	void Set(string key, params string[] items);
	string[] Get(string key);
}

public enum Recursion
{
	None,
	OneLevel,
	Full
}

public class VersionedFile
{
	public int Version;
	public string Name;
}

public class FileSystemCache : IFileSystemCache 
{
	IPersistentCache cache;
	
	public FileSystemCache(IPersistentCache cache)
	{
		this.cache = cache;
	}
	
	public void Add(VersionedFile[] files)
	{
		// to do
	}

	public string[] ListFilesAndFolders(string root, int version, Recursion recursion)
	{
		// to do
	}
	
}

How would you implement this? Note that your only allowed external dependency is the IPersistentCache interface.

The usage is something like this:

// given 
var fsc = new FileSystemCache(cache);
fsc.Add(new []
{
      new VersionedFile{Version = 1, Name = "/"},
      new VersionedFile{Version = 1, Name = "/foo"},
      new VersionedFile{Version = 1, Name = "/foo/bar"},
      new VersionedFile{Version = 1, Name = "/foo/bar/text.txt"},
});
fsc.Add(new []
{
      new VersionedFile{Version = 2, Name = "/"},
      new VersionedFile{Version = 2, Name = "/foo"},
      new VersionedFile{Version = 2, Name = "/foo/bar"},
      new VersionedFile{Version = 2, Name = "/foo/bar/text.txt"},
      new VersionedFile{Version = 2, Name = "/test.txt"},
});

// then 
fsc.ListFilesAndFolders("/", 1, Recursion.None) == { "/" }

fsc.ListFilesAndFolders("/", 1, Recursion.OneLevel) == { "/", "/foo" }

fsc.ListFilesAndFolders("/", 2, Recursion.OneLevel) == { "/", "/foo", "/test.txt" }

fsc.ListFilesAndFolders("/", 1, Recursion.Full) == { "/", "/foo", "/foo/bar", "/foo/bar/text.txt" }

You can assume that all paths are '/' separated and always start with '/'.
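One possible shape for a solution (a sketch under stated assumptions, not the definitive answer): store, for each (version, path) pair, the list of that path's direct children. Because every key embeds its version, the append-only cache semantics are never violated. The key scheme, the assumption that Get returns null on a miss, and the assumption that each Add call carries a single version with parents included are all mine, not the challenge's.

```csharp
using System;
using System.Collections.Generic;

public interface IPersistentCache
{
    void Set(string key, params string[] items);
    string[] Get(string key);
}

public enum Recursion { None, OneLevel, Full }

public class VersionedFile
{
    public int Version;
    public string Name;
}

public class FileSystemCache
{
    private readonly IPersistentCache cache;

    public FileSystemCache(IPersistentCache cache)
    {
        this.cache = cache;
    }

    // Record the direct children of every path, keyed by (version, path).
    public void Add(VersionedFile[] files)
    {
        var children = new Dictionary<string, List<string>>();
        foreach (var file in files)
            children[file.Name] = new List<string>();
        foreach (var file in files)
        {
            var parent = GetParent(file.Name);
            if (parent != null && children.ContainsKey(parent))
                children[parent].Add(file.Name);
        }
        foreach (var pair in children)
            cache.Set(Key(files[0].Version, pair.Key), pair.Value.ToArray());
    }

    public string[] ListFilesAndFolders(string root, int version, Recursion recursion)
    {
        var results = new List<string> { root };
        if (recursion != Recursion.None)
            Collect(root, version, recursion == Recursion.Full, results);
        return results.ToArray();
    }

    private void Collect(string path, int version, bool recurse, List<string> results)
    {
        var children = cache.Get(Key(version, path));
        if (children == null)
            return; // unknown path; assumes Get returns null on a miss
        foreach (var child in children)
        {
            results.Add(child);
            if (recurse)
                Collect(child, version, true, results);
        }
    }

    private static string Key(int version, string path)
    {
        return version + ":" + path;
    }

    private static string GetParent(string path)
    {
        if (path == "/")
            return null;
        var idx = path.LastIndexOf('/');
        return idx == 0 ? "/" : path.Substring(0, idx);
    }
}
```

With the sample data above, listing "/" at version 2 finds "/test.txt" under key "2:/" without ever touching the version 1 entries.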

A challenge: Getting a list of products

A few days ago I posted about two phase tests, and I have been thinking about this lately. I decided that I have a good example for this, which also demonstrates some important design decisions.

The task is listing the first 10 products that we can sell to a customer. The UI is a console application, and the database design and data access method are whatever you want.

That is pretty easy, right?

Expected input is:

/dev/product_listing/bin> list_products

Expected output is:

Milk           $1.0
Bread          $1.3
Sausage        $2.5
Horror Movie   $5.0

Next requirement is that given the following input:

/dev/product_listing/bin> list_products -pg13

The output is:

Milk           $1.0
Bread          $1.3
Sausage        $2.5

Next requirement is that given the following input:

/dev/product_listing/bin> list_products -vegetarian

Expected output is:

Milk           $1.0
Bread          $1.3
Horror Movie   $5.0

Additional requirements of this type will follow, and they can be combined. That is, we can also have:

/dev/product_listing/bin> list_products -pg13 -vegetarian

Expected output is:

Milk           $1.0
Bread          $1.3

How would you solve this?
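One way to keep requirements of this kind composable (a sketch of a filter pipeline, not the intended answer): each command line switch maps to one filter object, and filters chain, so a new requirement is a new class rather than a new if statement. The `Product` flags and filter names below are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public string Name;
    public decimal Price;
    // Hypothetical attributes driving the sample filters.
    public bool Pg13;
    public bool Vegetarian;
}

public interface IProductFilter
{
    IEnumerable<Product> Apply(IEnumerable<Product> products);
}

public class Pg13Filter : IProductFilter
{
    public IEnumerable<Product> Apply(IEnumerable<Product> products)
    {
        return products.Where(p => p.Pg13);
    }
}

public class VegetarianFilter : IProductFilter
{
    public IEnumerable<Product> Apply(IEnumerable<Product> products)
    {
        return products.Where(p => p.Vegetarian);
    }
}

public static class ProductListing
{
    // Chain every requested filter, then take the first 10 survivors.
    public static IEnumerable<Product> List(
        IEnumerable<Product> products, params IProductFilter[] filters)
    {
        foreach (var filter in filters)
            products = filter.Apply(products);
        return products.Take(10);
    }
}
```

Parsing "-pg13 -vegetarian" then becomes a lookup from switch name to filter instance, and combining switches is just passing more filters.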

Deterministic Disposable

Here is a challenge, get this to work:

///<summary>
/// Executes the given handler when the instance is disposed
/// of using Dispose(instance).
/// Note: doesn't cause a memory leak
///</summary>
public void OnDisposable(object instance, Action<object> action);

///<summary>
/// Executes the previously registered Action for this
/// instance
///</summary>
public void Dispose(object instance);

The key part here is to get this to work without causing a memory leak. Furthermore, assume that you need to handle this scenario as well without causing a leak:

object instance = new object();
OnDisposable(instance, delegate(object obj)
{
    Console.WriteLine("Disposing of {0}", obj);
});
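One way to meet the no-leak requirement on .NET 4 and later (a sketch, and only one of several possible answers) is ConditionalWeakTable: it associates the action with the instance without rooting the instance, so an unregistered object can still be collected, and even an action that captures its own instance does not pin it. The `Disposer` class name is mine.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class Disposer
{
    // ConditionalWeakTable keeps the Action alive only as long as the
    // instance itself is alive; it never prevents the instance from
    // being garbage collected, so registering a handler cannot leak.
    private static readonly ConditionalWeakTable<object, Action<object>> handlers =
        new ConditionalWeakTable<object, Action<object>>();

    public static void OnDisposable(object instance, Action<object> action)
    {
        handlers.Add(instance, action);
    }

    public static void Dispose(object instance)
    {
        Action<object> action;
        if (handlers.TryGetValue(instance, out action))
        {
            // Remove first so a handler is invoked at most once.
            handlers.Remove(instance);
            action(instance);
        }
    }
}
```

If the instance is never explicitly disposed, the table entry simply disappears with it at the next collection, which is what makes this approach leak-free.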