Ayende @ Rahien

It's a girl

KaizenConf Workshops

Yesterday I gave two workshops, Advanced NHibernate and Building DSL with Boo. I finished the day absolutely crushed, but I think they went very well.

Both were recorded, although I am not sure when they will be online.

Ryan Kelley has a blow-by-blow description of the NHibernate talk, and you can get the code for it here:

https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/SampleApplications/ORM+=2/

I'll post the code for the DSL talk shortly afterward.

I got some really positive feedback about the NHibernate Profiler, and I am very interested in demoing that and getting additional feedback when the real conference starts.

Microsoft kills Linq to SQL

In a typical management-speak post, the ADO.NET team has killed the Linq to SQL project. I think this is a mistake on Microsoft's part. Regardless of how this affects NHibernate (more users for us, yay!), I think that this is a big mistake.

Doing something like this is spitting in the face of everyone who invested time and money in the Linq to SQL framework, only to be left hanging in the wind with dead-end software and a costly porting process if they ever want to see new features. Linq to SQL is a decent base level OR/M, and I have had several people tell me that they are willing to accept its current deficiencies, knowing that they will be fixed in the next version. Now there isn't going to be a next version, and that is really bad for Microsoft's reputation.

From my point of view, this is going to be an example that I will use whenever someone tries to sell me half-baked solutions from Microsoft (just to note, I don't consider Linq to SQL half-baked) and tells me to wait for vNext with all the features in the world.

No matter how I turn this decision around, I can't find any way in which it makes sense from Microsoft's perspective.

NH Prof: Teaser

If you want to learn more, come to my Advanced NHibernate talk tomorrow.

image

This time, this is literally a snapshot of the application as it is running, and it is showing most of the surface level functionality that exists at the moment in the application.

Oh, and all the kudos for the look and feel goes to Christopher and Rob, who make it look so easy.

Why I hate traveling

Just arrived at the motel after just over 25 hours on the road. Yuck!

Some random thoughts along the way:

  • The US remains at the top of the list of countries with the most obnoxious entry procedure.
  • If the hotel says its address is on main street, I expect to find the bloody entrance on main street, not hidden in some side street.
  • Driving an SUV is fun from the comfort side of things, but a PITA from the point of view of driving, parking, or managing it.
  • I like the ability to lose my way to Walmart, get a GPS, and get to the motel.
  • Drug addicts make for really good guides.
  • One way streets in the US are marked in a really confusing manner to me, and I keep driving against the direction of traffic.
  • I am so tired, and on top of jet lag, I start working tomorrow morning.

Visual Studio 2010

I got the chance to get an early CTP of Visual Studio 2010. 

image

This is the post I am using to record my first impressions. There is no order to it; these are just impressions jotted down as I see them.

We seem to have a new start page:

image

Following the new MS & OSS approach, one of the samples is Dinner Now using Lucene, which is the first project that I picked to test.

TFS is still broken:

image

I really don't like to see this kind of issue in a source control system. It means that it cannot be trusted.

image

Looks like we have something new here. On first impression, it looks like we have UML integrated into VS.

image

 

I took a look at the generated XML, which is the backing store for the diagrams, and it looks like it should work with source control much better than the usual modeling stuff in visual studio.

Another feature that is very welcome for anyone doing presentations is the use of CTRL+Scroll Wheel for zooming.

image

We are also promised performance improvements for large files, which is nice. Part of the walkthroughs talk about integrating functionality using MEF, which is good.

Looking at the walkthrough for creating syntax highlighting, tagging and intellisense, it looks like a lot of ceremony still, but it seems significantly easier than before.

WPF - It looks like VS is moving to WPF, although this CTP is still midway.

C# has dynamic variables!

dynamic doc = HtmlPage.Document.AsDynamic();

dynamic win = HtmlPage.Window.AsDynamic();

This was talked about at the MVP Summit. A dynamic object is an object that implements IDynamicObject:

image

Note that we accept an expression parameter (using Linq expressions) and we return a meta object, shown below.

image

This looks like C# + DLR integration, which is cool. I am looking forward to seeing what we can do with it.
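For reference, reconstructed from the description and screenshots above (so treat the exact names as an approximation; this surface changed in later previews, where it became IDynamicMetaObjectProvider), the interface boils down to something like this:

// Approximate shape reconstructed from the text above, not copied from the CTP:
// the call site hands you a LINQ expression and you return a meta object that
// tells the DLR how to bind the operation. MetaObject comes from the DLR
// assemblies that ship with the CTP.
public interface IDynamicObject
{
	MetaObject GetMetaObject(System.Linq.Expressions.Expression parameter);
}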

VS also gets some R#-like features:

image

There is also a quick search, apparently, but I am not really impressed. Again, show me something that I don't have.

There is a CLR 4.0, so we somehow skipped CLR 3.0. I am glad to know that we are getting a new runtime version, instead of just slowly patching the 2.0 one.

Threading

System.Threading.Tasks is new, and looks very interesting. It also seems to have integration with Visual Studio. It is interesting because we seem to have a lot more control over it than we traditionally had with the ThreadPool.

Parallel extensions are also in as part of the framework, not that this would be a big surprise to anyone.
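To give a sense of what that looks like, here is a minimal sketch using the task and parallel APIs as they later shipped in .NET 4.0 (the CTP bits may differ in detail):

using System;
using System.Threading.Tasks;

class TaskSketch
{
	static void Main()
	{
		// A task gives us a handle we can wait on, compose, and query,
		// which is a lot more control than queueing work to the ThreadPool.
		Task<long> sum = Task.Factory.StartNew(() =>
		{
			long total = 0;
			for (int i = 0; i < 1000000; i++) total += i;
			return total;
		});

		// The parallel extensions handle partitioning and scheduling for us.
		Parallel.For(0, 10, i => Console.WriteLine("working on " + i));

		Console.WriteLine(sum.Result); // blocks until the task completes
	}
}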

In the CTP that I have, there is nothing about Oslo, models or DSL, which I found disappointing. I guess I'll have to wait a bit more to figure out what is going on.

That was a quick review, and I must admit that I haven't dug deep, but the most important IDE feature, from my perspective, is the CTRL+Scroll Wheel zooming. The diagram support is nice, but I am not sure that I like it in my IDE. The threading enhancements are going to be cool, and I am looking forward to seeing what kind of dynamic meta programming we will be able to do.

Every DSL ends up being Smalltalk

I have had this thought in my head for a while now. I built an IDE for a DSL, and somewhere toward the end of the first revision I understood something very interesting. My IDE wasn't actually using the textual representation of the language. The scripts that the user was editing were actually live instances, and they were fully capable of validating and saving themselves.

The IDE worked with those instances, used them to do its operations, and allowed editing them on the fly. It was quite challenging to do, I must say, and I kept thinking about the image model of Smalltalk, where everything is a live instance.

This brings to mind Greenspun's tenth rule, which states: Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

Yet another good interview test

Yes, this is another challenge that I ran into which I consider well suited for an interview.

  • It is short
  • It doesn't require specific knowledge
  • There are a lot of ways of solving it
  • I can give the developer access to Google and the test is still valid

The test itself is very simple:

  • Detect if another instance of the application is running on the network which is registered to the same user
  • It doesn't have to be hack proof, and it doesn't have to be 100% complete. The purpose is to stop casual copying, not serious hackers.
    • A great example of the feature in action is R# detecting that two users are using the same license at the same time.

Oh, and for real world scenarios, use a licensing framework instead of rolling your own.
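One way to approach it (a sketch only, assuming UDP broadcast is allowed on the LAN; the port number and message format are arbitrary choices): every instance periodically broadcasts the licensed user name and listens for broadcasts carrying the same name from other instances.

using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

// Sketch of the idea: not hack proof, just enough to catch casual copying.
public class DuplicateInstanceDetector
{
	private const int Port = 17761;                      // arbitrary port
	private readonly Guid instanceId = Guid.NewGuid();   // so we can ignore our own broadcasts

	public void Broadcast(string licensedUser)
	{
		using (var client = new UdpClient { EnableBroadcast = true })
		{
			byte[] payload = Encoding.UTF8.GetBytes(instanceId + "|" + licensedUser);
			client.Send(payload, payload.Length, new IPEndPoint(IPAddress.Broadcast, Port));
		}
	}

	public void Listen(string licensedUser, Action onDuplicate)
	{
		var listener = new UdpClient();
		listener.Client.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
		listener.Client.Bind(new IPEndPoint(IPAddress.Any, Port));
		listener.BeginReceive(ar =>
		{
			IPEndPoint sender = null;
			string message = Encoding.UTF8.GetString(listener.EndReceive(ar, ref sender));
			string[] parts = message.Split('|');
			// Another instance (different id) registered to the same user => duplicate.
			if (parts[0] != instanceId.ToString() && parts[1] == licensedUser)
				onDuplicate();
		}, null);
	}
}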

NH Prof: How to detect SELECT N + 1

One of the things that the NHibernate Profiler is going to do is to inspect your NHibernate usage and suggest improvements to it.

Since I consider this to be a pretty important capability, I wanted to streamline the process as much as possible.

Here is how I detect this now:

image

It is not perfect, but it is pretty close.
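The screenshot doesn't survive in text form, so here is a rough sketch of the general idea, as I would reconstruct it (not the actual NH Prof code): a session becomes suspicious when it issues many statements that are identical once you ignore their parameter values.

using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of a SELECT N+1 heuristic.
public class SelectNPlusOneDetector
{
	private const int Threshold = 10; // arbitrary cutoff for the sketch

	// statementsInSession: the SQL text of each statement, with parameter
	// values already stripped, in the order the session executed them.
	public bool IsSuspicious(IEnumerable<string> statementsInSession)
	{
		return statementsInSession
			.GroupBy(sql => sql)
			.Any(group => group.Count() >= Threshold);
	}
}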

A messaging problem: Order in the bus

In NH Prof, I have structured the application around the idea of message passing. I am not yet in the Erlang world (which would require a framework that keeps hold of the state), but I am nearer than before.

The back end of the profiler is listening to the event streams generated by NHibernate. We get things like:

  • Session opened on thread #1
  • SQL Executed on thread #1
    • SELECT * FROM Customers
  • Session closed on thread #1

I am taking that and turning it into something that is more easily understandable.

Now, as you can imagine, order is pretty important here.

Currently I am dealing with this by assuming order in the stream, and ensuring that the bus will dispatch messages in order (and that recursive message dispatch is handled in place). This works, but I don't think I like it much, especially in light of the previous problem that I outlined.

Another option would be to avoid the order assumption, and use the timestamp in order to reorder the stream. That might be a challenge to solve, though.
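A rough sketch of what that second option could look like, assuming each event carries a monotonically increasing sequence number (or a timestamp precise enough to act as one): buffer anything that arrives early, and only dispatch once the gap is filled.

using System;
using System.Collections.Generic;

// Sketch only: re-order incoming messages before handing them to the bus.
public class ReorderingDispatcher<TMessage>
{
	private readonly Action<TMessage> dispatch;
	private readonly SortedDictionary<long, TMessage> pending = new SortedDictionary<long, TMessage>();
	private long nextExpected = 1;

	public ReorderingDispatcher(Action<TMessage> dispatch)
	{
		this.dispatch = dispatch;
	}

	public void Accept(long sequence, TMessage message)
	{
		pending[sequence] = message;

		// Flush every message that is now contiguous with what was already dispatched.
		TMessage ready;
		while (pending.TryGetValue(nextExpected, out ready))
		{
			pending.Remove(nextExpected);
			dispatch(ready);
			nextExpected++;
		}
	}
}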

Thoughts?

NHProf: Logging interception

One of the goals that I set for myself with the NHibernate Profiler is to be able to run on unmodified NHibernate 2.0. The way that I do that is by intercepting and parsing the log stream from NHibernate.

NHibernate's logging is extremely rich and detailed, so everything I have wanted to do so far has been possible. I am pretty sure there will come a time when a feature will require a more invasive approach, running profiler code in the client application to gather more information, but for now this is enough.

I did run into several problems with logging interception. Ideally, I want this to happen on the fly, as we go, so I really want to get the real time logging stream. The problem is how to do so. I started with the UdpAppender, but that doesn't work on Vista in the released version. The RemotingAppender is what I am using now, but it has one critical issue: it is an async appender, so messages can (and do) appear out of order.

Message order is pretty important to the profiler. It can deal with out-of-order messages, but that would lead to surprising results. So that one is out as well.

The only other appender that comes out of the box with log4net and can be used remotely is the TelnetAppender, which is next on the list to explore. It does mean that the profiler has to connect to the application, rather than the other way around, which can be a problem.

I built an appender that fit my needs, and I am using it now to test how the profiler works, but before starting to deal with the telnet appender, I thought it would be a good time to ask.

How important is "running on unmodified NHibernate"?

I am not talking about having a profiler build of NHibernate, I am talking about doing things like using the profiler appender, or registering an HttpModule.
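For context, such a profiler appender is not much code. A minimal sketch (the event-based forwarding here is my own illustration, not the actual NH Prof appender):

using System;
using log4net.Appender;
using log4net.Core;

// Minimal custom log4net appender: the profiler subscribes to MessageLogged
// and receives each rendered event synchronously, so ordering is preserved.
public class ProfilerAppender : AppenderSkeleton
{
	public static event Action<string, string> MessageLogged; // (logger name, rendered message)

	protected override void Append(LoggingEvent loggingEvent)
	{
		var handler = MessageLogged;
		if (handler != null)
			handler(loggingEvent.LoggerName, RenderLoggingEvent(loggingEvent));
	}
}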

When select IS broken (or just slow)

Usually, "select" isn't broken is a good motto to follow. Occasionally, there are cases where this is case. In particular, it may not be that it is broken, it may very well be that the way it works doesn't match the things that we need it to do.

I spoke about an optimization story that happened recently, in which we managed to reduce the average time from 5 - 10 seconds to 5 - 15 milliseconds.

What we needed was to walk a tree structure, which was stored in a database, and do various interesting tree based operations on it. The most natural way of working with trees is with recursion, and SQL is just not the right way of dealing with it.

Deciding to load the entire table into memory, build a real tree structure, and perform all the operations on that tree structure paid off tremendously. What is important to remember is that we didn't have to do anything radical to the data model or the way the application worked. We only had to modify the implementation of the component that exposed that tree to the application.
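A sketch of what that component change amounts to (type and property names here are made up for illustration): load the whole table once, index the nodes by id, and wire up the parent/children references so traversals become plain in-memory pointer walks.

using System.Collections.Generic;

// Illustrative node shape and tree building; not the actual production code.
public class TreeNode
{
	public int Id;
	public int? ParentId;
	public string Name;
	public TreeNode Parent;
	public readonly List<TreeNode> Children = new List<TreeNode>();
}

public static class TreeBuilder
{
	public static IDictionary<int, TreeNode> Build(IEnumerable<TreeNode> allRows)
	{
		var byId = new Dictionary<int, TreeNode>();
		foreach (var node in allRows)
			byId[node.Id] = node;

		// Second pass: turn the ParentId column into real object references.
		foreach (var node in byId.Values)
		{
			if (node.ParentId == null)
				continue;
			node.Parent = byId[node.ParentId.Value];
			node.Parent.Children.Add(node);
		}
		return byId;
	}
}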

One of the things we had to deal with was the case where the amount of data would exceed available memory. At least, we thought we had to deal with it.

But our tree was very simple; it consisted of a few properties and that is it. Let us do the math, shall we?

  • Name - 50 chars, unicode - 100 bytes
  • 4 decimal fields - 16 bytes each = 64 bytes
  • 3 boolean fields - 3 bytes
  • Parent pointer - 4 bytes
  • Set of children - average of 10 per node - 40 bytes + ~50 bytes bookkeeping

This is very rough, of course, but it will do. It puts the memory cost of a node at roughly 256 bytes. We will use that number, because it is easy to work with.

Now, with 256 bytes per node, how many can we reasonably use?

Well, 100 MB will hold 409,600 nodes or so, which is a pretty good number, I say. A table of that size is considered big by most people. A GB of memory will give us 4,194,304 items in the tree, and keep traversal speed near instantaneous. At that point, I would start thinking about the size of the node, because 256 bytes is a big size. A more realistic size would be 64 bytes or so (drop the name, pack the decimals, use a linked list for children), which would give me 16,777,216 nodes for the same memory requirement.

All of those numbers are greater than the current and expected size of the data set, so there isn't a reason to care much beyond that.

The important thing here is to understand that the usual truth about "let the tool do the optimization" doesn't really hold true when you have specific scenarios. For solving very specific, very narrow circumstances, you can generally come up with a much better approach than the generic one.

Of course, this approach does not generalize, and it lacks other benefits that using the common platform might have offered (we need to handle our own transactions, for example).

Keep that in mind.

Making the complex trivial: Rich Domain Querying

It is an extremely common issue, and I have talked about it quite a few times in the past. I have learned a lot since then, however, and I want to show how you can create rich, complex querying support with very little effort.

We will start with the following model:

image

And see how we can query it. We start by defining search filters, classes that look more or less like our domain. Here is a simple example:

public abstract class AbstractSearchFilter
{
	protected IList<Action<DetachedCriteria>> actions = new List<Action<DetachedCriteria>>();
	
	public void Apply(DetachedCriteria dc)
	{
		foreach(var action in actions)
		{
			action(dc);
		}
	}
}


public class PostSearchFilter : AbstractSearchFilter
{
	private string title;
	
	public string Title
	{
		get { return title; }
		set
		{
			title = value;
			actions.Add(dc => 
			{
				if(title.Empty())
					return;
				
				dc.Add(Restrictions.Like("Title", title, MatchMode.Start));
			});
		}
	}
}

public class UserSearchFilter : AbstractSearchFilter
{
	private string username;
	private PostSearchFilter post;
	
	public string Username
	{
		get { return username; }
		set
		{
			username = value;
			actions.Add(dc =>
			{
				if(username.Empty())
					return;
			
				dc.Add(Restrictions.Like("Username", username, MatchMode.Start));
			});
		}
	}
	
	public PostSearchFilter Post
	{
		get { return post; }
		set
		{
			post = value;
			actions.Add(dc=>
			{
				if(post==null)
					return;
				
				var postDC = dc.Path("Posts"); // Path is an extension method for GetCriteriaByPath(name) ?? CreateCriteria(path)
				post.Apply(postDC);
			});
		}
	}
}

Now that we have the code in front of us, let us talk about it. The main idea here is that we move the responsibility of deciding what to query into the hands of the client. It can make decisions just by setting our properties. Not only that, but we support rich domain queries using this approach. Notice what we are doing in UserSearchFilter.Post's setter: we create a sub-criteria and pass it to the post search filter, which applies itself to it. Using this method, we completely abstract away the need to deal with our current position in the tree. We can query on posts directly, through users, through comments, etc. We don't care; we just run in the provided context and apply our conditions to it.

Let us take the example of wanting to find all the users who post about NHibernate. I can express this as:

usersRepository.FindAll(
  new UserSearchFilter
  {
    Post = new PostSearchFilter
        {
            Title = "NHibernate"
        }
  }
);

But that is only useful for static scenarios, and in those cases, it is easier to just write the query using the facilities NHibernate already gives us. Where does it shine?

There is a really good reason that I chose this design for the query mechanism. JSON.

I can ask the JSON serializer to deserialize a JSON string into this object graph. Along the way, it will do all the property setting (and query building) that I need. On the client side, I just need to build the JSON string (an easy task, I think you would agree) and send it to the server. On the server side, I just need to build the filter classes (another very easy task). Done; I have a very rich, very complex, very complete solution.

Just to give you an idea, assuming that I had fully fleshed out the filters above, here is how I search for users named 'ayende', who posted about 'nhibernate' with the tag 'amazing' and have a comment saying 'help':

{ // root is user, in this case
	Name: 'ayende',
	Post:
	{
		Title: 'NHibernate',
		Tag:
		{
			Name: ['amazing']
		},
		Comment:
		{
			Comment: 'Help'
		}
	}
}
Deserializing that into our filter object graph gives us immediate results that we can pass to the repository to query with, with exactly zero hard work.
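A sketch of that server-side step, assuming Json.NET as the serializer and the User entity from the model above (any serializer that populates property setters as it deserializes will do):

using Newtonsoft.Json;
using NHibernate.Criterion;

// Sketch: turn the incoming JSON into the filter graph, let the filters
// build the criteria, then hand the criteria to the repository / session.
public static class UserSearch
{
	public static DetachedCriteria BuildCriteria(string json)
	{
		var filter = JsonConvert.DeserializeObject<UserSearchFilter>(json);
		var criteria = DetachedCriteria.For<User>(); // User is the root entity from the model above
		filter.Apply(criteria);
		return criteria;
	}
}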
 

A bug story

I ran into a bug today with the way NHibernate deals with order clauses. In particular, it can only happen if you are:

  • Using parameters in the order clause
  • Using SQL Server 2005
  • Using a limit clause

If you meet all three conditions, you will run into a whole host of problems (in particular, NH-1527 and NH-1528). They are all fixed now, and I am writing this post as the build runs. The underlying issue is that SQL Server 2005's syntax for paging is badly broken.

Let us take this statement:

SELECT   THIS_.ID         AS ID0_0_,
         THIS_.AREA       AS AREA0_0_,
         THIS_.PARENT     AS PARENT0_0_,
         THIS_.PARENTAREA AS PARENTAREA0_0_,
         THIS_.TYPE       AS TYPE0_0_,
         THIS_.NAME       AS NAME0_0_
FROM     TREENODE THIS_
WHERE    THIS_.NAME LIKE ?
         AND THIS_.ID > ?
ORDER BY (SELECT THIS_0_.TYPE AS Y0_
          FROM   TREENODE THIS_0_
          WHERE  THIS_0_.TYPE = ?) ASC

And let us say that we want to get a paged view of the data. How can we do it? Here is the code:

SELECT   TOP 1000 ID0_0_,
                  AREA0_0_,
                  PARENT0_0_,
                  PARENTAREA0_0_,
                  TYPE0_0_,
                  NAME0_0_
FROM     (SELECT ROW_NUMBER()
                   OVER(ORDER BY __HIBERNATE_SORT_EXPR_0__) AS ROW,
                 QUERY.ID0_0_,
                 QUERY.AREA0_0_,
                 QUERY.PARENT0_0_,
                 QUERY.PARENTAREA0_0_,
                 QUERY.TYPE0_0_,
                 QUERY.NAME0_0_,
                 QUERY.__HIBERNATE_SORT_EXPR_0__
          FROM   (SELECT THIS_.ID         AS ID0_0_,
                         THIS_.AREA       AS AREA0_0_,
                         THIS_.PARENT     AS PARENT0_0_,
                         THIS_.PARENTAREA AS PARENTAREA0_0_,
                         THIS_.TYPE       AS TYPE0_0_,
                         THIS_.NAME       AS NAME0_0_,
                         (SELECT THIS_0_.TYPE AS Y0_
                          FROM   TREENODE THIS_0_
                          WHERE  THIS_0_.TYPE = ?) AS __HIBERNATE_SORT_EXPR_0__
                  FROM   TREENODE THIS_
                  WHERE  THIS_.NAME LIKE ?
                         AND THIS_.ID > ?) QUERY) PAGE
WHERE    PAGE.ROW > 10
ORDER BY __HIBERNATE_SORT_EXPR_0__

Yes, in this case we could use TOP 1000 as well, but that doesn't work if we want paged data that doesn't start at the beginning of the data set.

Now, here is an important fact: the question marks that you see? Those are positional parameters. Do you see the bug now?

SQL Server 2005 (and 2008) paging support is broken. I find it hard to believe that a feature that is just a tad less important than SELECT is so broken. Every other database gets it right, for crying out loud.

Anyway, by now you have noticed that when we processed the statement to add the limit clause, we rewrote the structure of the statement and changed the order of the parameters. Tracking that problem down was a pain; just to give an idea, here is a bit of the change that I had to make:

/// <summary>
/// We need to know what the position of the parameter was in a query
/// before we rearranged the query.
/// This is used only by dialects that rearrange the query; unfortunately,
/// the MS SQL 2005 dialect has to reshuffle the query (and ruin positional parameter
/// support) because the SQL 2005 and 2008 dialects have completely broken
/// support for paging, which is just a tad less important than SELECT.
/// See NH-1528
/// </summary>
public int? OriginalPositionInQuery;

I fixed the issue, but it is an annoying problem that keeps occurring. Paging in SQL Server 2005/8 is broken!

Oh, and just to clarify: the ability to use complex expressions in the order by clause using the projection API is fairly new in NHibernate; it is incredibly powerful, and it really scares me.

An optimization story

I left work today very happy. There was a piece of the UI that was taking too long when run with a real world data set. How slow? Let us call it 40 seconds to start with. This is a pretty common operation in the UI, so it was a good place to optimize.

I wasn't there for that part, but optimizing the algorithms used reduced the time from 40 seconds to 5 - 10 seconds, an impressive improvement by all accounts, but still one in which the users had to wait an appreciable amount of time for a common UI operation. Today we decided to tackle this issue and see if we could optimize it further.

The root action is loading some data and executing a bit of business logic on top of that data. I checked the queries being generated, and while they weren't ideal, they weren't really bad (just not the way I would do things). At that point, we decided to isolate the issue in a test page, which would allow us to test just this function in isolation. Then we implemented it from scratch, as a plain data loading process.

The performance of that was simply amazing: 150 - 300 ms per operation, vs. 5 - 10 seconds in the optimized scenario. Obviously, however, we were comparing apples to oranges here. The real process also did a fair amount of business logic (and related data loading), which was the reason it was slow. I looked at the requirement again, then at the queries, and despaired.

I hoped that I would be able to use a clever indexing scheme and get the 1000% perf benefit using some form of SQL. But the requirement simply cannot be expressed in SQL. And trying to duplicate the existing logic would only put us in the same position as before.

What to do... what to do...

The solution was quite simple, take the database out of the loop. For a performance critical piece of the application, we really can't afford to rely on external service (and the DB is considered one, in this scenario). I spent some time loading the data at application startup, as well as doing up front work on the data set to make it easier to work with.

This turned that operation into an O(1) operation, where the constant cost consists of a small set of in-memory hash table lookups. And the performance? The performance story goes like this:

I went into the manager's office and asked him how fast he wanted this piece of functionality to run. He hesitated for a moment and then said: "A second?"
I shook my head, "I can't do that, can you try again?"
"Two seconds?" he asked.
"I am sorry," I replied, "I can do five."
Then I left the office and threw over my shoulder, "Oh, but it is in milliseconds."
Sometimes I have a rotten sense of humor, but the stunned silence that followed that declaration was very pleasing.

I am lucky in that the data set is small enough to fit in memory. But I am not going to rely on that; we need to implement soft paging of the data anyway (to make the application startup time acceptable), so it will be able to cope easily enough even if the data set grows beyond the limits of memory (which I don't expect to happen in the next couple of years).

Overall, it was a very impressive optimization, even if I say so myself.

It might work, but is it good enough?

Note, I am explicitly not asking if this is optimal. I am asking if it is good enough.

There is a tendency to assume "it works" and let sleeping dragons be. This is usually correct; my own definition of legacy code is "code that makes money". As such, any modifications to it should be justified in terms of ROI.

The term that I often use for that is technical debt, by no means my own invention, but a very useful concept. It allows me to explain, in terms that make sense to the client, what the implications are of letting a working, but not good enough, implementation stay in place. Or why I need to take a week or two with a couple of developers to refactor parts of the application.

We like to think about refactoring as changing the internal structure of the code without changing observable behavior. Business people tend to think about it differently: a time in which the development team is going to do stuff that doesn't give them any value. Being able to translate the difficulty into terms that the business understands is important, and framing such discussions in terms of the technical debt they will get us into is critical.

Setting expectations about the behavior of the team is just as important as setting expectations about the behavior of the application.

NHProf: Another milestone

It is by no means the final UI; I just threw it together in about half an hour, to show how things are working.

image

I am also using it now to track what is going on in an application that I am working on.

NHProf: Alive! It is alive!

I just finished writing the final test for the basic functionality that I want for NHibernate Profiler:

[Test]
public void SelectBlogById()
{
	ExecuteScenarioInDifferentProcess<SelectBlogByIdUsingCriteria>();
	StatementModel selectBlogById = observer.Model.Sessions.First()
		.Statements.First();
	const string expected = @"SELECT this_.Id as Id3_0_,
this_.Title as Title3_0_,
this_.Subtitle as Subtitle3_0_,
this_.AllowsComments as AllowsCo4_3_0_,
this_.CreatedAt as CreatedAt3_0_
FROM Blogs this_
WHERE this_.Id = @p0

";
	Assert.AreEqual(expected, selectBlogById.Text);
}

I actually had to invest some thought in the architecture for testing this. This little test has a whole set of ideas behind it, which I'll talk about at a later date. Suffice to say that this test creates a new process and starts listening to the interesting things that are going on there (populating the observer model with data).

Another interesting tidbit is that the output is formatted for readability. By default, NHibernate's SQL output looks something like this:

SELECT this_.Id as Id3_0_, this_.Title as Title3_0_, this_.Subtitle as Subtitle3_0_, this_.AllowsComments as AllowsCo4_3_0_, this_.CreatedAt as CreatedAt3_0_ FROM Blogs this_ WHERE this_.Id = @p0

This is pretty hard to read the moment that you have any sort of complex conditions.
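As a trivial illustration of the kind of reformatting involved (nothing like the real formatter, just newlines before the major keywords):

using System.Text.RegularExpressions;

// Toy formatter: put each major clause on its own line.
public static class SqlPrettyPrinter
{
	public static string Format(string sql)
	{
		return Regex.Replace(
			sql,
			@"\b(FROM|WHERE|ORDER BY|GROUP BY|HAVING|INNER JOIN|LEFT JOIN)\b",
			"\n$1",
			RegexOptions.IgnoreCase);
	}
}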

API Design

There are several important concerns that need to be taken into account when designing an API. Clarity is an important one, of course, but the responsibilities of the users and implementers of the API should also be given a lot of consideration. Let us take a look at a couple of designs for a simple notification observer. We need to observe a set of actions (with context). I don't want to force mutable state on the users, so I started with this approach (using out parameters instead of return values in order to name the parameter):

public interface INotificationObserver
{
    void OnNewSession(out object sessionTag);
    void OnNewStatement(object sessionTag, StatementInformation statementInformation, out object statementTag);
    void OnNewAction(object statementTag, ActionInformation actionInformation);
}

I don't really like this; too many magic objects here, and too much work for the client. We can do it in a slightly different way, however:

public delegate void OnNewAction(ActionInformation actionInformation);

public delegate void OnNewStatement(StatementInformation statementInformation, out OnNewAction onNewAction);

public interface INotificationObserver
{
    void OnNewSession(out OnNewStatement onNewStatement);
}
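
To show why I like the second shape better, here is an illustrative implementer (the body is made up; the point is that per-session and per-statement state live in closures, so no opaque tag objects get passed around):

using System;

// Illustrative only: counts statements per session without any "tag" objects.
public class CountingObserver : INotificationObserver
{
	public void OnNewSession(out OnNewStatement onNewStatement)
	{
		int statementsInSession = 0; // per-session state captured by the closure

		onNewStatement = (StatementInformation statementInformation, out OnNewAction onNewAction) =>
		{
			statementsInSession++;

			onNewAction = actionInformation =>
			{
				// The statement count is available here through the closure as well.
				Console.WriteLine("action in statement #{0} of this session", statementsInSession);
			};
		};
	}
}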

Sins of Omissions

Joel Spolsky's latest column, which talks about Sins of Commissions, has a lot of good information in it. In particular:

There's a great book on the subject by Harvard Business School professor Robert Austin -- Measuring and Managing Performance in Organizations. The book's central thesis is fairly simple: When you try to measure people's performance, you have to take into account how they are going to react. Inevitably, people will figure out how to get the number you want at the expense of what you are not measuring, including things you can't measure, such as morale and customer goodwill.

Where Joel got it wrong is with the ending:

But we soon realized that commissions weren't the only management tool at our disposal. We simply established as a rule the idea that gaming the incentive plan was wrong and unacceptable. Employees generally follow the rules you give them -- and if they don't, you can discipline them or, in extreme cases, dismiss them. The problem with most incentive systems is not that they are too complicated -- it's that they don't explicitly forbid the kind of shenanigans that will inevitably make them unsuccessful.

And here the train not only goes off the tracks but also starts chasing cats.

It doesn't work like that. Oh, I don't doubt that it works this way in Joel's case. The problem is that Joel's point of view is that of a small company, one where he is able to maintain a high level of control over what is going on. Let me tell you a different story. When I was in the army, I was part of the military police corps. I spent most of my time in prison, but I was involved in the usual gossip about what was going on in the corps. One part of the corps was maintaining discipline, and the soldiers serving there were rewarded (not explicitly, because that was strictly forbidden) for giving tickets. That was implicit, for doing a good job.

The problem is that there have been many cases in which soldiers have been known to... generate tickets. That is by far not the common case, I have to point out, but it has happened. Now, just to give you a clear idea of what was going on: getting caught doing this was a jail time offense. People still did it.

And sometimes they got away with it for long periods of time, simply because the army was so big that it took time for this type of thing to trickle up.

In any organization of significant size, you are going to have this sort of problem. I have seen salespeople push a project that they knew wouldn't be profitable, just to pocket their commission. When they were called on the carpet for that, they called it Strategic Loss Leader Projects, and continued doing so. And that was in a place that should have been able to keep track of what was going on. In bigger organizations, the same thing happened, but no one actually caught on.

I believe that the term for that is local optimization, to the detriment of the entire organization.

Recursive Mocking

This now works :-)

image

The challenge is still open; I intentionally stopped before completing the feature, and there is a failing test in the RecusriveMocks fixture that you can start from.

And just to give you an idea about what I am talking about, please run this and examine the results:

svn diff https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk -r 1682:1683

A really cool web view of them is here.

Request for comments: Changing the way dynamic mocks behave in Rhino Mocks

I have just committed a change to the way Rhino Mocks handles expectations for dynamic mocks and stubs.  Previously, the meaning of this statement was "expect Foo() to be called once and return 1 when it does":

Expect.Call( bar.Foo ).Return(1);

Now, the meaning of this is: "expect Foo() to be called one or more times, and return 1 when it does". This means that this will work:

Assert.AreEqual(1, bar.Foo);
Assert.AreEqual(1, bar.Foo);
Assert.AreEqual(1, bar.Foo);

Whereas previously, using dynamic mocks, it would fail on the second assert, because the expectation that was set up had been consumed. I think that this is a more natural way to behave, but it is a subtle breaking change.
You can get the old behavior by specifying .Repeat.Once().
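In other words, to keep the previous semantics for a specific expectation:

// Explicitly restore the old "consumed after one call" behavior:
Expect.Call(bar.Foo).Return(1).Repeat.Once();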

Thoughts?

Database Schemas

I was asked to comment on the use of DB schemas, so here it is. The first thing that we need to do is decide what a schema is.

A schema is an organization unit inside the database. You can think about it as a folder structure with an allowed depth of 1 (yes, just like MS-DOS 1.0). Like folders in a real file system, you can associate security attributes with the schema, and you can put items in it. There is the notion of the current schema, and that is about it.

Well, so this is what it is. But what are we going to use it for?

People are putting schemas to a lot of usages, from application segregation to versioning. In general, I think that each application should have its own database, and that versioning shouldn't be a concern, because when you upgrade the application, you upgrade the database, and no one else has access to your database.

What we are left with is security and organization. In many applications, the model naturally falls out into fairly well defined sections. A good example is the user's data (Users, Preferences, Tracking, etc.). It is nice to be able to treat those as a cohesive unit for security purposes (imagine wanting to limit table access to the Accounting schema). It is nice, but it is not really something that I would tend to do, mostly because, again, it is only the application that is accessing the database.

Defense in depth might cause me to have some sort of permission scheme for the database users, but that tends to be rare, and only happens when you have fairly different operation modes.

What I would use schemas for is simply organization. Take a look at Rhino Security as a good example: by default, it will put its tables into their own schema, to avoid cluttering the default schema with them.

In short, I use schemas mostly for namespacing, and like namespaces elsewhere, they can be used for other things, but I find them most useful for simply adding order.

NHibernate & Static Proxies

I decided to take a look at what it would take to implement static proxies (via PostSharp) in NHibernate. The following is my implementation log.

  • 09:30 PM - Started to work on post sharp interceptors for NHibernate
  • 09:35 PM - Needs to learn how I can implement additional interfaces with PostSharp.
  • 10:00 PM - Implemented ICollection<T> wrapping for entities
  • 10:35 PM - Proxy Factory Factory can now control proxy validation
  • 11:15 PM - Modified NHibernate to accept static proxies
  • 11:28 PM - Saving Works
  • 11:35 PM - Deleting Works
  • 11:50 PM - Rethought the whole approach and implemented this using method interception instead of field interception
  • 11:58 PM - Access ID without loading from DB implemented
  • 12:01 AM - Checking IsInitialized works
  • 12:13 AM - After midnight and I am debugging interceptions issues.
  • 12:15 AM - It is considered bad to kill the constructor, I feel.
  • 12:16 AM - No one needs a constructor anyway
  • 12:30 AM - Realized that I can't spell my own name
  • 12:34 AM - Resorting to Console.Write debugging
  • 12:40 AM - Wrote my own lazy initializer
  • 12:42 AM - Realized that we can't handle lazy loading without forwarding to a second instance, need to see how we can capture references to the this parameter using PostSharp.
  • 12:45 AM - I think I realized what went wrong
  • 12:55 AM - Lazy loading for non virtual property references works!
  • 12:57 AM - Constructors are back
  • 12:59 AM - Lazy loading for calling non virtual methods works!

The first thing that I have to say is: wow, PostSharp rocks! And I mean that as someone who has been doing AOP for a long while, and has implemented some not insignificant parts of Castle.DynamicProxy. Leaving aside the amount of power that it gives you, PostSharp's simplicity is simply amazing. Wow!

The second is that while things are working, it is not even an alpha release. What we have right now is, literally, one evening's hacking.

What we have now is:

  • Removed the requirement for virtual methods
  • Removed the requirement for sets to be instances of Iesi.Collections.ISet<T>; now you can use ICollection<T> and HashSet<T>.
  • Probably broken a lot of things

Consider this a proof of concept. As you can see, it takes time to implement those things, and currently I am doing it at the expense of time better spent sleeping. I started this because I wanted to relax after a 12 hour coding day.

If you are interested in this, please contribute by testing the code and seeing what breaks. There are a bunch of TODOs there that I would appreciate a second pair of eyes looking over.

You can get the code here: https://nhibernate.svn.sourceforge.net/svnroot/nhibernate/branches/static-proxies

Note that you need to set the project's post-build action to point to where you have PostSharp installed.

Oh, and I left a joke there, see if you can find it.