Ayende @ Rahien

Refunds available at head office

Extreme Patterns Video

I was asked a few times about recording a course, so I think that a few people would be happy to know that Glenn Block has posted a discussion that we had about a month ago.

You can find it here

I don't like distributed source control

Let us see how much of a furor that will cause :-)

The reasons that I don't like distributed source control systems have nothing to do with technical reasons, and everything to do with how they are used and promoted.

The main reason that I don't like them is that they encourage isolated work. I don't want anyone in my project to just go off and work on their own for a few weeks, then come back with a huge changeset that now needs to be merged. The way DVCS are promoted, this is a core scenario: "no one has to see your mistakes".

I completely disagree. I want to see everyone's mistakes, and I want them to see mine. There is little to learn from successes, and much to learn from failures. More than that, I want to be able to peek into other people's work, so I can give my input on it, even if it is just "wow, nice" or "yuck, sucks!"

The main advantage of DVCS is speed; browsing the repository when you have the entire thing on your HD is lightning fast, compared to doing the same with a remote repository. I tend to use SVK now, but I use it strictly as a local cache, and nothing more.

And that is why I don't like DVCS, I don't want people to work in isolation.

DevTeach Toronto

In about two weeks, I'll be speaking at DevTeach again. The last two times were enough to recharge me for months afterward, and I am looking forward to it.

I am giving several talks there.

Advanced IoC - The use of "advanced" means that I get to assume that I don't need to deal with intro stuff, so I am going to try to cram everything from generic specialization to aspect orientation to hierarchical containers and infrastructure ignorant applications. Success metric: people coming out of the talk saying "my head hurts".

OR/M += 2 - OR/M is no longer on the edge, it is a mainstream technology. That said, there isn't much knowledge out there about how to take advantage of the non obvious advantages of using an OR/M. I am going to cover multi tenancy, adaptive and partial domain models, approaches for scaling, application design and architecture with OR/M, and a bunch more.

DSL - This is a favorite topic of mine. It is actually an introductory talk, covering the range from "why do we even need this?" to "how do we build them?"

Rapid (maintainable) web development with MonoRail - I gave that talk a few times already, but I think it is time to give it a bit of a facelift, and try to do something a bit more impressive. So this is not going to be just "yet another intro to MonoRail"; it is going to dig deeper into more interesting scenarios.

I am also going to take part in a DotNetRocks panel about the future of software development. The other participants in the panel are Scott (the Blogless) Bellware and Ted Neward.

Those are my talks. There are a lot of other good ones, and of course, the real reason that DevTeach is so much fun is the interactions with the people.

Thoughts about building your own source control

Let me start by stating that you really don't want to do that. This is not something that you want to do, period.

Now that we are over that, I had the chance lately to go fairly deeply into SCM systems and how they are implemented, from two fairly different perspectives. This is a randomly collected set of observations about SCM systems. As usual, the order is arbitrary, and no attempt was made to make any coherent idea out of this.

  • It is all about the client. The client in an SCM system has significant responsibilities. It is in charge of reporting the client state, managing all the errors that the user can cause, and shouldering a lot of the burden.
  • It is all about the protocol. Anyone who designs a SCM system should be given a lousy DSL line with disconnects every 15 minutes. Oh, and they should also have to work on a plane a lot.
  • On the wire, it is all so simple. It is really surprising to see how the SCM complexity is really just a lot of tiny, easy to handle, details.
  • The devil is in the details, though.
  • Complexities on the server side:
    • Space management - do you save the diff or the whole file?
    • If just the diff, how do you construct an arbitrary version?
    • Keeping history around for branches and copies
    • Cheap copies

  • Complexities on the client side:
    • Do you have one version on the client, for the entire working copy?
    • Or do you have multiple versions, one per file?
    • Handling inconsistencies between server version, working copy version and original version.

  • What do you optimize for? Bandwidth? Roundtrips?
    • I know of one SCM product that is a lousy optimizer for both

  • Distributed SCM can be handled on top of centralized SCM.
  • It is not hard at all, except for all the details.
  • Don't write your own SCM.
  • Trust matters, and you really don't want to be in the situation where you don't trust your SCM.
  • Remember that SCM is temporal; you can go backward in time, and even sideways, to a branch.
  • There are only three types of operations in SCM (a minimal sketch follows the list):
    • Generate a change set between two paths at two versions
    • Apply a diff to a path, generating a new version
    • Reporting (logs, mostly, and outputting various formats of a changeset)
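
To make that concrete, here is a minimal sketch of the shape those three operations might take as a server interface. This is my illustration, with invented names and types, not code from any real SCM:

public class ChangeSet
{
	// the added, removed and modified entries, plus the diffs themselves
}

public class LogEntry
{
	public int Version;
	public string Author;
	public string Message;
}

public interface ISourceControlServer
{
	// generate a change set between two paths at two versions
	ChangeSet Diff(string fromPath, int fromVersion, string toPath, int toVersion);

	// apply a diff to a path, generating a new version on the server
	int Commit(string path, ChangeSet changes);

	// reporting: logs, mostly, and outputting various formats of a changeset
	IEnumerable<LogEntry> Log(string path, int fromVersion, int toVersion);
}
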
Overall, it is a very simple endeavor. It gets complex when you start talking beyond the wire protocol. As a simple example, how much does it cost you to branch? How much does it cost you to find out if there have been any changes to the working copy?

The other major issue is: how do you ensure that it is reliable?

Now, let me repeat myself: do not write your own source control!

Customer experience to treasure

I just finished talking with Apple Store's support. Due to a misunderstanding on my part, I ended up buying two iWorks licenses, instead of one. I called the store, got a human in less than a minute, explained the issue, waited a bit while she fixed things up so I get a refund for the extra license, done.

I wish it was all like this.

Critical Mass

When a software project is started, there is a lot of new functionality to be written. Most of the time you are dealing with new code. After a while, if it is done right, you have a solid foundation that you can use to keep building the application.

Whenever I recognize this moment, it fills me with a deep sense of satisfaction; this is how it should be. Rhino Mocks reached that point some years ago, and I noticed that SvnBridge has crossed that point recently as well.

As a telltale for this, I use the amount of effort it takes to build a new feature into the project. Take a look at what was required to add svn sync support to SvnBridge, or the AAA support for Rhino Mocks.

Source control practices to be abolished: The Large Checkin

I have just had to make a modification to SvnBridge because someone decided to check in a change set that included 1,695 changes.

Even ignoring the "your checkin broke my source control" mentality that I have now, this is bad from other perspectives.

How do you review such a change? How do you even know what is going on there? In changes of this size, you aren't making a single change, you are making a lot of changes. And now you can't mix & match them either.

Prefer small checkins, they are easier to deal with.

On API Design

Krzysztof points out that Rhino Mocks has 26(!) methods to create various mocks.

I was very surprised to hear that, so I took a look, and he is right. The problem is how you think about those. You only get to 26 methods if you count each overload as a separate method; without counting the overloads, we have 10 methods. Then we have to consider the options. In Rhino Mocks, you tend to have the following options: create a mock, create a mock with remoting, create a mock from multiple types.

That reduces the number of options down to about 4 different types of mocks that Rhino Mocks can generate. And you know what? Almost all the time, you don't need to use them. You only have to use DynamicMock, and that is it.

Krzysztof's suggestion would consolidate all the options into a far smaller surface area, but it would have several major issues:

  • Break backward compatibility - I am willing to do that, but not without due consideration.
  • Break the documentation - See previous point.
  • Force the user to deal with all the options explicitly - "In your face!" API design, from my perspective (see the sketch below). That would lead people to create utility methods over it, which gets us right back to all the method overloads.
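
To illustrate that trade-off, compare a hypothetical consolidated factory method, where every option is explicit at the call site, with the overload-per-scenario style. The consolidated method and its option types below are invented for the sake of the example; DynamicMock is the only real Rhino Mocks call here:

// hypothetical consolidated API: one method, all the options in your face
var repository = mocks.NewMock<IUserRepository>(
	MockBehavior.Dynamic,  // strict / dynamic / partial
	RemotingOption.None,   // should the mock work through remoting proxies?
	new Type[0]);          // additional types the mock should implement

// the overload style: the common case stays short
var repository2 = mocks.DynamicMock<IUserRepository>();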

Funding Open Source Projects

That was the topic under discussion in the first of the ALT.Net talks today. There weren't that many people at the talk, but it was very focused and useful.

In general, there aren't that many ways of funding OSS projects. Note that I am talking here from the perspective of the developers who do the actual work, and how they get compensated for their time and effort. This excludes reasons such as scratching an itch, or working on it as a hobby.

  • The OSS work is useful for the day-to-day work of the developer. This is by far the most common model in the .Net world. Most OSS developers tend to do so because supporting the project supports whatever they are actually trying to do.
  • Reputation building - being part of OSS project usually means that you get a good reputation as a result. This can be useful in getting a job, as an example.
  • Contracting / training / support. This seems to be a very common model in the Java world. There are only a few projects in .Net that work using this approach.
  • Donations - nice in theory, doesn't work in practice.
  • Grants - someone who needs a feature pays for it being included. I had several leads in the past, but nothing that ever got to actual money exchanging hands.
  • Work for hire - Some entity hires a developer to work on an OSS project. SvnBridge is a good example of that. The difference between this and the previous entry is that this is not necessarily something that the dev was initially involved in.

The session was focused on two subjects: how we can increase awareness, and how we can fund OSS projects. I think that the ALT.Net community is doing a lot to raise awareness of best practices, approaches and techniques, and the tools that can be used to support those. We recently had several articles in mainstream media from members of the ALT.Net community, as a simple example. We can do more, like reaching out to the user groups, talking more about it, doing entry level tutorials, etc. But that is more related to adoption of OSS tools, not to funding them, which was the major topic for the session.

When we are talking about funding OSS projects, we have to make a distinction in what we are talking about. We can fund the OSS project itself, and we can fund OSS developers. I feel that a large part of the session was spent making this distinction.

The major difference is in what you are trying to achieve. I use OSS projects to help me do my work, I don't need them as a source of income. They are a tool. Getting paid to work on them is fun and would be lovely, but that is not actually something that I spend a lot of time thinking about. A lot of the suggestions that we had at the talk involved OSS developers making considerable investments in time, money & risk for the goal of furthering OSS development.

Sorry, it doesn't work that way. At least not for me. I am getting paid to write code. Incidentally, at this point in time I am actually getting paid to write OSS code, which I consider as a really nice perk, but not something that is in any way required.

When we are talking about funding OSS projects, I am actually thinking of the other side. Funding the project itself, however, is something that I would like to focus on. The best way I know of actually getting things done is to pay for it to be done. Working for free works if and only if the task is fun. If it isn't, it isn't going to happen. If you want something from an OSS project, put your money where your demands are.

You want more documentation for doing X, pay for it. You want feature Y, likewise. And by paying for it, I mean anything from offering money to submitting a patch to adding to the documentation.

It is a very simple concept. And the best one I can think of.

Summing up the ALT.Net conf

It was a blast. I had a lot of fun, some incredibly interesting conversations, and got to meet a lot of the members of the community that I had only known by email alias before.

We had some really good discussions today, and I got to clarify some thoughts that have been lurking in the back of my mind for a while now. After the ALT.Net conf officially ended, we started hanging around, swapping stories. Somehow it got to 8 PM, I am not sure how. Roy took a bunch of us that remained to dinner at a nice place. I haven't had a drop of alcohol today (unlike the entire last week), but I am feeling more drunk than on any other day.

I would like to take this opportunity to thank TypeMock for sponsoring the dinner. It was wonderful, although I have trouble walking and / or thinking.

Doing the MVP Summit and ALT.Net, back to back, is a really tiring experience.

That said, I got a lot of stuff to think about, and it will probably generate quite a few blog posts.

Thanks to all those who attended, for creating such a rich discussion.

Special thanks to the sponsors, Version One, ThoughtWorks, Microsoft P&P, DigiPen, DevTeach, InfoQ and CodeBetter!

Rhino Mocks Futures

I have been thinking about this for a while now, and I am getting ready to release Rhino Mocks 3.5. The major feature is taking advantage of C# 3.0 language features. This allows some really interesting experimentation.

var mockedSmsSender = mocks.ToBeNamedMocked<ISmsSender>();
var mockedRepository = mocks.ToBeNamedMocked<IUserRepository>();
mockedRepository.Stub( x=> x.GetUserByName("ayende") ).Return( new User{Name="ayende", Pass="1234"});

new LoginController(mockedSmsSender, mockedRepository ).ForgotYourPassword("ayende");

mockedSmsSender.Verify( x => x.Send("ayende, your pass is 1234") );

A few things to note about this code.

  • No explicit record / replay model
  • Arrange Act Assert model
  • You can set up return values by using Stub()
  • You can set up expectations using Expect(), with the same syntax (see the example below this list)
  • More complex verification is also possible.
  • I don't know what to call this new mode
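
For instance, an expectation in that style might look like this; my extrapolation from the Stub() call above, not finalized syntax:

mockedRepository.Expect( x => x.GetUserByName("ayende") ).Return( new User{Name="ayende", Pass="1234"});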

As an aside, I am deprecating CreateMock in favor of StrictMock. Using strict mocks by default was a bad design decision on my part.

Thoughts?

The ALT.Net Conf

I think that yesterday was absolutely wonderful. It did feel like I spent about 5 hours straight talking, however. The DSL talk was interesting, and gave me much to think about (it might actually kick start the book again!). Great fun.

I think that my liver died at some point last night. MVP Summit + ALT.Net are not friendly to it. No time to recover.

What do you want to learn?

I am in the States right now, and I am thinking about giving a course on how to build a DSL, or on Castle or NHibernate, depending on what people want.

Right now it is likely to be in Austin or thereabouts, probably at the beginning of May. No promises, and it is strictly dependent on whether there are enough participants.

Would you like that?

An apology

Note: If you don't know what I am talking about, please ignore this post.

A few days ago I was in a meeting with a Microsoft team, showing a few of the MVPs a new product. I was there to give feedback to the team, and I am afraid that I did no such thing. In fact, I blew up. I acted in a completely unprofessional way, and I certainly did nothing productive. I was about as unproductive as possible, and offensive to boot.

To the guys in the team, I am sorry. No excuses, I am supposed to be better than that. I certainly ought to be able to differentiate between the product and the team that develops it. I should not have tried to take my frustrations out on you.

I apologize again, and deeply regret this outburst.

Source control is not a feature you can postpone to vNext

I was taking part in a session in the MVP Summit today, and I came out of it absolutely shocked and bitterly disappointed with the product that was under discussion. I am not sure if I can talk about that or not, so we will skip the name and the purpose. I have several issues with the product itself and its vision, but that is beside the point that I am trying to make now.

What really bothered me is the utter disregard of a critical requirement by Microsoft, who is supposed to know what it is doing with software development. That requirement is source control.

  • Source control is not a feature
  • Source control is a mandatory requirement

The main issue is that the product uses XML files as its serialization format. Those files are not meant for human consumption, but should be used only through a tool. The major problem here is that no one took source control into consideration when designing those XML files, so they are unmergeable.

Let me give you a simple scenario:

  • Developer A makes a change using the tool, let us say that he is modifying an attribute on an object.
  • Developer B makes a change using the tool, let us say that he is modifying a different attribute on a different object.

The result?

Whoever tries to commit last will get an error, because the file was already updated by the other developer. Usually in such situations you simply merge the two versions together, and don't worry about it.

The problem is that this XML file is implemented in such a way that each time you save it, a whole bunch of stuff gets moved around, and all sorts of unrelated things change. In short, even a very minor change causes a significant change in the underlying XML.

You can see this in products that are shipping today, like SSIS, WF, DSL Toolkit, etc.
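
None of this is inherent to XML, by the way. If the tool normalized the document before saving it, for example by sorting elements on a stable key, a one-attribute change would produce a one-line diff. A rough sketch of the idea, with invented element names since I obviously can't show the real format (uses System.Xml.Linq and System.Linq):

// normalize before saving, so unrelated elements stay put and a small
// change produces a small diff; assumes <objects> contains only <object> elements
XDocument doc = XDocument.Load("model.designer.xml");
foreach (XElement container in doc.Descendants("objects"))
{
	XElement[] ordered = container.Elements("object")
		.OrderBy(e => (string) e.Attribute("id"))
		.ToArray();
	container.ReplaceNodes(ordered);
}
doc.Save("model.designer.xml");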

The problem is that when you try to merge, you have too many unrelated changes, which completely defeats the purpose of merging.

This, in turn, means that you lose the ability to work in a team environment. This product is supposed to be aimed at big companies, but it can't support a team of more than one! To make things worse, when I brought up this issue, the answer was something along the lines of: "Yes, we know about this issue, but you can avoid it using exclusive checkouts."

At that point, I am not really sure what to say. Merges happen not just when two developers modify the same file; merges also happen when you have branches. As a simple scenario, I have a development branch and a production branch. Fixing a bug in the production branch requires touching this XML file. But if I made any change to it on the development branch, you can't merge that. What happens if I use a feature branch? Or version branches?

Not considering the implications of something as basic as source control is amateurish in the extreme. Repeating the same mistake, over and over again, across multiple products, despite customer feedback on how awful this is and how much it hurts the developers who are going to use it, shows contempt for those developers, and is a sign of an even more serious issue: the team isn't dogfooding the product. Not in any real capacity. Otherwise, they would have already noticed the issue, much sooner in the lifetime of the product, with enough time to actually fix it.

As it was, I was told that there is nothing to be done for the v1 release, which puts the fix (at best) two years or so away. For something that is a complete deal breaker for any serious development.

I have run into merge issues with SSIS that caused us to drop days of work and recreate everything from scratch, costing us something in the order of two weeks. I know of people who had the same issue with WF, and from my experiments, the DSL Toolkit has the exact same issue. The SSIS issues were initially reported in 2005, but are not going to be fixed for 2008 (or so I heard from public sources), which puts the nearest fix for something as basic as getting source control right at 2-3 years out.

The same goes for the product that I am talking about here. I am not really getting it; does Microsoft think that source control isn't an important issue? They keep repeating this critically serious mistake!

For me, this is unprofessional behavior of the first degree.

Deal breaker, buh-bye.

Paying for Hibernating Rhinos

I have been producing the Hibernating Rhinos screen casts for over a year now, and so far, I have offered them free of charge.

The estimated cost of producing a single screen cast is $3,500 - $10,000, considering the amount of planning, recording & editing that goes into each one.

I am thinking about making the new episodes available for a fee, something in the $10 - $25 range.

I would like your opinions on this matter.

~ Ayende

Challenge: calling generics without the generic type

Assume that I have the following interface:

public interface IMessageHandler<T> where T : AbstractMessage
{
	void Handle(T msg);
}

How would you write this method so dispatching a message doesn't require reflection every time:

public void Dispatch(AbstractMessage msg)
{
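	// pseudocode: C# doesn't let you use a runtime Type, such as msg.GetType(), as a generic argument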
	IMessageHandler<msg.GetType()> handler = new MyHandler<msg.GetType()>();
	handler.Handle(msg);
}

Note that you can use reflection the first time you encounter a message of a particular type, but not in any subsequent calls.
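
For the record, here is the direction I would probably explore; this is one possible shape, not necessarily the best answer. It reflects over the message type the first time it shows up, compiles a delegate, and reuses it afterward. MyHandler<T> is the handler from the snippet above; the sketch uses System.Linq.Expressions and System.Reflection, and the dictionary is not thread safe, which a real implementation would have to address:

public class Dispatcher
{
	// one compiled invoker per message type, built on first use
	// (note: this sketch also reuses a single handler instance per type)
	private static readonly Dictionary<Type, Action<AbstractMessage>> invokers =
		new Dictionary<Type, Action<AbstractMessage>>();

	public void Dispatch(AbstractMessage msg)
	{
		Action<AbstractMessage> invoker;
		if (invokers.TryGetValue(msg.GetType(), out invoker) == false)
		{
			invoker = CreateInvoker(msg.GetType());
			invokers[msg.GetType()] = invoker;
		}
		invoker(msg); // no reflection on this path
	}

	private static Action<AbstractMessage> CreateInvoker(Type msgType)
	{
		// reflection happens only here, the first time we see a message type
		Type handlerType = typeof(MyHandler<>).MakeGenericType(msgType);
		object handler = Activator.CreateInstance(handlerType);
		MethodInfo handleMethod = handlerType.GetMethod("Handle");
		ParameterExpression msg = Expression.Parameter(typeof(AbstractMessage), "msg");
		// builds: msg => handler.Handle((TMsg)msg)
		return Expression.Lambda<Action<AbstractMessage>>(
			Expression.Call(Expression.Constant(handler), handleMethod, Expression.Convert(msg, msgType)),
			msg).Compile();
	}
}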

The cost of abstraction - Part III

Here is another thing to note: computers are fast. I can't tell you how fast, because it would take too long. Thinking about micro performance is a losing proposition. Sasha has asked an important question:

Now assume you have a non-functional requirement saying that you must support 1,000,000 messages per second inserted into this queue. Would you still disregard the fact using an interface is a BAD decision?

My answer: yes. The reason for that? Let us take a look at the slowest method I could think of to do an in process queue:

public static void Main(string[] args)
{
	Queue<string> queue = new Queue<string>();
	// writer: enqueue under a lock, pulse the reader when the queue was empty
	ThreadPool.QueueUserWorkItem(delegate(object state)
	{
		Stopwatch startNew = Stopwatch.StartNew();
		for (int i = 0; i < 100000000; i++)
		{
			lock (queue)
			{
				bool queueEmpty = queue.Count == 0;
				queue.Enqueue("test");
				if (queueEmpty)
					Monitor.Pulse(queue);
			}
		}
		Console.WriteLine("Done publishing in: " + startNew.ElapsedMilliseconds);
	});
	// reader: wait until the queue has items, then drain them one at a time
	ThreadPool.QueueUserWorkItem(delegate(object state)
	{
		Stopwatch startNew = Stopwatch.StartNew();
		for (int i = 0; i < 100000000; i++)
		{
			lock (queue)
			{
				while (queue.Count == 0)
					Monitor.Wait(queue);
				queue.Dequeue();
			}
		}
		Console.WriteLine("Done reading in: " + startNew.ElapsedMilliseconds);
	});
	Console.ReadLine();
}

On my machine, this outputs:

Done publishing in: 23044 (ms)
Done reading in: 26866 (ms)

Which means that it processed one hundred million items in just 26 seconds or so.

This also puts us at close to 4 million messages per second, using the most trivial and slow performing approach possible.

In fact, using a LockFreeQueue, we get significantly worse performance (3 times as slow!). I am not quite sure why, but I don't care. Pure pumping of messages is quick & easy, and scaling to millions of messages a second is trivial.

Yes, dispatching the messages is costly, I'll admit. Changing the reader thread to use:

instance.Handle(queue.Dequeue()); // instance is an IHandler created once, up front

has dropped the performance to a measly 2 million messages a second. Changing the message handler to use late bound semantics, like this:

IHandler instance = Activator.CreateInstance<Handler>();
instance.Handle(queue.Dequeue());

would drop the performance down to 300,000 messages per second. So what you are doing with the message is far more interesting than the actual dispatching.

And yes, here, in the dispatching code, I would probably do additional work. Using GetUninitializedObject() and a cached constructor, I was able to raise that to 350,000 messages per second.
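
I didn't show that code, but the shape of the idea is roughly this; my reconstruction, not the actual benchmark code (GetUninitializedObject lives on System.Runtime.Serialization.FormatterServices):

// the reflection lookups are done once, outside the loop
static readonly Type handlerType = typeof(Handler);
static readonly ConstructorInfo handlerCtor = handlerType.GetConstructor(Type.EmptyTypes);

// per message: allocate without the Activator overhead, then run the cached constructor
IHandler instance = (IHandler) FormatterServices.GetUninitializedObject(handlerType);
handlerCtor.Invoke(instance, null);
instance.Handle(queue.Dequeue());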

And yes, this is a flawed benchmark; you need to test this with multiple readers & writers to get a more realistic understanding of what would actually happen. But the initial results are very promising, I think.

And again, computers are fast, don't look for microseconds.

The cost of abstraction - Part II

Sasha pointed out that I should also test what happens when you are using multiple implementations of the interface, vs. direct calls. This is important because of JIT optimizations with regards to interface calls that always resolve to the same implementation.

Here is the code:

class Program
{
	public static void Main(string[] args)
	{
		//warm up
		PerformOp(new List<string>(101));
		PerformOp(new NullList());
		List<long> times = new List<long>();

		Stopwatch startNew = new Stopwatch();
		for (int i = 0; i < 50; i++)
		{
			startNew.Start();
			PerformOp(new List<string>(101));
			PerformOp(new NullList());
			times.Add(startNew.ElapsedMilliseconds);
			startNew.Reset();
		}
		Console.WriteLine(times.Average());
	
	}

	private static void PerformOp(List<string> strings)
	{
		for (int i = 0; i < 100000000; i++)
		{
			strings.Add("item");
			if(strings.Count>100)
				strings.Clear();
		}
	}

	private static void PerformOp(NullList strings)
	{
		for (int i = 0; i < 100000000; i++)
		{
			strings.Add("item");
			if (strings.Count > 100)
				strings.Clear();
		}
	}
}
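
The snippet doesn't include NullList; any second, distinct implementation of IList<string> will do, since the point is to give the JIT a polymorphic call site. A minimal stand-in, invented here just so the code compiles, might be:

class NullList : IList<string>
{
	// discards everything; only Add, Count and Clear are used by the test
	private int count;

	public void Add(string item) { count++; }
	public void Clear() { count = 0; }
	public int Count { get { return count; } }
	public bool IsReadOnly { get { return false; } }
	public bool Contains(string item) { return false; }
	public void CopyTo(string[] array, int arrayIndex) { }
	public bool Remove(string item) { return false; }
	public int IndexOf(string item) { return -1; }
	public void Insert(int index, string item) { count++; }
	public void RemoveAt(int index) { count--; }
	public string this[int index]
	{
		get { throw new NotSupportedException(); }
		set { throw new NotSupportedException(); }
	}
	public IEnumerator<string> GetEnumerator() { yield break; }
	System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() { return GetEnumerator(); }
}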

And the results are:

  • IList<T> - 5789 milliseconds
  • List<string>, NullList - 3941 milliseconds

Note that this is significantly higher than the previous results, because now we run the loop two hundred million times.

Individual method calls here cost:

  • IList<T> - 0.0000289489 milliseconds
  • List<string>, NullList - 0.0000197096 milliseconds

We now have a major difference between the two calls. When we had a single implementation of the interface, the difference between the two was 0.00000406 milliseconds.

With multiple implementations, the difference between the two methods is 0.0000092393 milliseconds. That is much higher than before, and still far less than a microsecond.

The cost of abstraction

Sasha commented on my perf post, and he mentioned the following:

But what if you're designing a messaging infrastructure for intra-application communication?  It is no longer fine to hide it behind a mockable IQueue interface because that interface call is more expensive than the enqueue operation!

Now, I know that an interface call is more expensive than a non virtual call. The machine needs to do more work. But exactly how much more work is there to do? I decided to find out.

class Program
{
	public static void Main(string[] args)
	{
		//warm up
		PerformOp(new List<string>(101));
		List<long> times = new List<long>();

		Stopwatch startNew = new Stopwatch();
		for (int i = 0; i < 50; i++)
		{
			startNew.Start();
			PerformOp(new List<string>(101));
			times.Add(startNew.ElapsedMilliseconds);
			startNew.Reset();
		}
		Console.WriteLine(times.Average());
	
	}

	private static void PerformOp(IList<string> strings)
	{
		for (int i = 0; i < 100000000; i++)
		{
			strings.Add("item");
			if(strings.Count>100)
				strings.Clear();
		}
	}
}

I ran this program first with PerformOp(List<string>) and then with PerformOp(IList<string>). In the first case, we are doing a non virtual method call, and in the second, we are going through the interface.
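
For the direct run, the only change is the parameter type, so the Add/Count/Clear calls bind to List<string> without an interface dispatch:

private static void PerformOp(List<string> strings)
{
	for (int i = 0; i < 100000000; i++)
	{
		strings.Add("item");
		if (strings.Count > 100)
			strings.Clear();
	}
}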

Running this has produced the following results:

  • IList<T> - 2786 milliseconds
  • List<T> - 2380 milliseconds

Well, that is significant, right?

Not really. Remember that we run a tight loop a hundred million times.

Let us see individual call times:

  • IList<T> - 0.0000278600 milliseconds
  • List<T>  - 0.0000238000 milliseconds

The difference between the two is 0.00000406 milliseconds: the 406 milliseconds of total difference, spread over one hundred million calls. In other words, it is an insignificant part of a microsecond.

The difference between a non virtual call and an interface method call is significantly less than a microsecond.

I don't think that I am going to worry about that, thank you very much.

Full speed ahead, and damn the benchmarks

A while ago I posted about upfront optimizations. Sasha Goldshtein commented (after an interesting email conversation that we had):

What truly boggles me is how the argument for correctness in software is not applied to performance in software.  I don't understand how someone can write unit tests for their yet-unwritten code (as TDD teaches us) and disregard its performance implications, at the same time.  No one in their right mind could possibly say to you, "Let's define and test for correctness later, first I'd like to write some code without thinking about functional requirements."  But on the other hand, how can you say to someone, "Let's define and test for performance later, first I'd like to write some code without thinking about non-functional requirements?"

It fell into my blog backlog (which is a tame black hole) and just now came up.

The reason for this attitude, which I subscribe to, is very simple: a lot of developers will do a lot of really bad things in the name of performance. I have seen, more than once, whole teams dedicate an inordinate amount of time to performance, contorting the design until it resembles a snake after a fit of the hiccups, and leaving an unholy mess in their wake.

Add to that the fact that most of the time performance bottlenecks will not be where you think they are, and you get into a really big problem with most developers (myself included, by the way). As a result, the push to "don't think about performance" has very real reasoning behind it. As I mentioned, you should think about performance when you are designing the system, but only in order to ensure that you aren't trying to do stupid things (like running a server off a single-user embedded database, my latest blunder). Most of the time, and assuming this is not a brand new field, you already know that calling the DB in a tight loop is going to be a problem, so you don't even get into that situation.

Another issue is that I can refactor for performance, but (by definition), I can't refactor to be correct.