Saturday, October 11, 2008
API Design
There are several important concerns that need to be taken into account when designing an API. Clarity is an important concern, of course, but the responsibilities of the users and implementers of the API should also be given a lot of consideration. Let us take a look at a couple of designs for a simple notification observer. We need to observe a set of actions (with context). I don't want to force mutable state on the users, so I started with this approach (using out parameters instead of return values in order to name the parameter):
public interface INotificationObserver
{
void OnNewSession(out object sessionTag);
void OnNewStatement(object sessionTag, StatementInformation statementInformation, out object statementTag);
void OnNewAction(object statementTag, ActionInformation actionInformation);
}
I don't really like this: too many magic objects here, and too much work for the client. We can do it in a slightly different way, however:
public delegate void OnNewAction(ActionInformation actionInformation);
public delegate void OnNewStatement(StatementInformation statementInformation, out OnNewAction onNewAction);
public interface INotificationObserver
{
void OnNewSession(out OnNewStatement onNewStatement);
}
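To make the second design concrete, here is a minimal sketch of what an implementer might write. The StatementInformation and ActionInformation types are not shown in this post, so the versions below (with their Sql and Description fields) are placeholders I made up for illustration, and RecordingObserver is a hypothetical name. The point is only that the context (which session, which statement) travels in the closures instead of in opaque tag objects:

```csharp
using System;
using System.Collections.Generic;

// Placeholder types, assumed for this sketch only.
public class StatementInformation { public string Sql; }
public class ActionInformation { public string Description; }

public delegate void OnNewAction(ActionInformation actionInformation);
public delegate void OnNewStatement(StatementInformation statementInformation, out OnNewAction onNewAction);

public interface INotificationObserver
{
    void OnNewSession(out OnNewStatement onNewStatement);
}

// Hypothetical observer that records what it is told about.
public class RecordingObserver : INotificationObserver
{
    private int sessionCount;
    public readonly List<string> Log = new List<string>();

    public void OnNewSession(out OnNewStatement onNewStatement)
    {
        int sessionId = ++sessionCount; // captured by the delegates below

        onNewStatement = delegate(StatementInformation statement, out OnNewAction onNewAction)
        {
            string sql = statement.Sql; // captured by the action callback
            onNewAction = action =>
                Log.Add("session " + sessionId + ": " + sql + " -> " + action.Description);
        };
    }
}
```

The caller gets back a delegate for each level and simply invokes it; there is nothing to hold on to except the delegates themselves.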
Sins of Omissions
Joel Spolsky's latest column, Sins of Commissions, has a lot of good information in it. In particular:
There's a great book on the subject by Harvard Business School professor Robert Austin -- Measuring and Managing Performance in Organizations. The book's central thesis is fairly simple: When you try to measure people's performance, you have to take into account how they are going to react. Inevitably, people will figure out how to get the number you want at the expense of what you are not measuring, including things you can't measure, such as morale and customer goodwill.
Where Joel got it wrong is with the ending:
But we soon realized that commissions weren't the only management tool at our disposal. We simply established as a rule the idea that gaming the incentive plan was wrong and unacceptable. Employees generally follow the rules you give them -- and if they don't, you can discipline them or, in extreme cases, dismiss them. The problem with most incentive systems is not that they are too complicated -- it's that they don't explicitly forbid the kind of shenanigans that will inevitably make them unsuccessful.
And here the train not only goes off the tracks but also starts chasing cats.
It doesn't work like that. Oh, I don't doubt that it works this way in Joel's case. The problem is that Joel's point of view is that of a small company, one where he is able to maintain a high level of control over what is going on. Let me tell you a different story. When I was in the army, I was part of the military police corps. I spent most of my time in prison, but I was involved in the usual gossip about what was going on in the corps. One part of the corps was responsible for maintaining discipline, and the soldiers serving there were rewarded (not explicitly, because that was strictly forbidden) for giving tickets. The reward was implicit, for doing a good job.
The problem is that there have been many cases in which soldiers have been known to... generate tickets. That is by far not the common case, I have to point out, but it has happened. Now, just to give you a clear idea about what is going on, getting caught doing this was a jail time offense. People still did it.
And sometimes they got away with it for long periods of time, simply because the army was so big that it took time for this type of thing to trickle up.
In any organization of significant size, you are going to have this sort of problem. I have seen salespeople push a project that they knew wouldn't be profitable, just to pocket their commission. When they were called on the carpet for that, they called those Strategic Loss Leader Projects, and continued doing so. And that was in a place that should have been able to keep track of what was going on. In bigger organizations, the same thing happened, but no one actually caught on to it.
I believe that the term for that is local optimization, to the detriment of the entire organization.
Friday, October 10, 2008
Recursive Mocking
This now works :-)

The challenge is still open. I intentionally stopped before completing the feature, and there is a failing test in the RecusriveMocks fixture that you can start from.
And just to give you an idea about what I am talking about, please run this and examine the results:
svn diff https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk -r 1682:1683
A really cool web view of them is here.
Request for comments: Changing the way dynamic mocks behave in Rhino Mocks
I have just committed a change to the way Rhino Mocks handles expectations for dynamic mocks and stubs. Previously, the meaning of this statement was "expect Foo() to be called once and return 1 when it does":
Expect.Call( bar.Foo() ).Return(1);
Now, the meaning of this is: "expect Foo() to be called one or more times, and return 1 when it does". This means that this will work:
Assert.AreEqual(1, bar.Foo());
Assert.AreEqual(1, bar.Foo());
Assert.AreEqual(1, bar.Foo());
Whereas previously, using dynamic mocks, it would fail on the second assert, because the expectation that was set up had already been consumed. I think that this is a more natural way to behave, but it is a subtle breaking change.
You can get the old behavior by specifying .Repeat.Once().
Thoughts?
Database Schemas
I was asked to comment on the use of DB schemas, so here it is. The first thing that we need to do is decide what a schema is.
A schema is an organizational unit inside the database. You can think about it as a folder structure with an allowed depth of 1. (Yes, just like MS-DOS 1.0). Like folders in the real file system, you can associate security attributes with the schema, and you can put items in the schema. There is the notion of the current schema, and that is about it.
Well, so this is what it is. But what are we going to use it for?
People are putting schemas to a lot of uses, from application segregation to versioning. In general, I think that each application should have its own database, and that versioning shouldn't be a concern, because when you upgrade the application, you upgrade the database, and no one else has access to your database.
What we are left with is security and organization. In many applications, the model layout naturally falls into fairly well-defined sections. A good example is the user's data (Users, Preferences, Tracking, etc). It is nice to be able to treat those as a cohesive unit for security purposes (imagine wanting to limit table access to the Accounting schema). It is nice, but it is not really something that I would tend to do, mostly because, again, it is only the application that is accessing the database.
Defense in depth might cause me to have some sort of permission scheme for the database users, but that tends to be rare, and only happens when you have relatively different operation modes.
What I would use schemas for is simply organization. Take a look at Rhino Security as a good example: by default, it will put its tables into their own schema, to avoid cluttering the default schema with them.
In short, I use schemas mostly for namespacing, and like namespaces elsewhere, they can be used for other things, but I find them most useful for simply adding order.
NHibernate & Static Proxies
I decided to take a look at what it would take to implement static proxies (via PostSharp) in NHibernate. The following is my implementation log.
- 09:30 PM - Started to work on PostSharp interceptors for NHibernate
- 09:35 PM - Need to learn how I can implement additional interfaces with PostSharp.
- 10:00 PM - Implemented ICollection<T> wrapping for entities
- 10:35 PM - Proxy Factory Factory can now control proxy validation
- 11:15 PM - Modified NHibernate to accept static proxies
- 11:28 PM - Saving Works
- 11:35 PM - Deleting Works
- 11:50 PM - Rethought the whole approach and implemented this using method interception instead of field interception
- 11:58 PM - Access ID without loading from DB implemented
- 12:01 AM - Checking IsInitialized works
- 12:13 AM - After midnight and I am debugging interceptions issues.
- 12:15 AM - It is considered bad to kill the constructor, I feel.
- 12:16 AM - No one needs a constructor anyway
- 12:30 AM - Realized that I can't spell my own name
- 12:34 AM - Resorting to Console.Write debugging
- 12:40 AM - Wrote my own lazy initializer
- 12:42 AM - Realized that we can't handle lazy loading without forwarding to a second instance, need to see how we can capture references to the this parameter using PostSharp.
- 12:45 AM - I think I realized what went wrong
- 12:55 AM - Lazy loading for non virtual property references works!
- 12:57 AM - Constructors are back
- 12:59 AM - Lazy loading for calling non virtual methods works!
The first thing that I have to say is: wow, PostSharp rocks! And I mean that as someone who has been doing AOP for a long while, and has implemented some not insignificant parts of Castle.DynamicProxy. Leaving aside the amount of power that it gives you, PostSharp's simplicity is simply amazing, wow!
The second is that while things are working, it is not even an alpha release. What we have right now is, literally, one evening's hacking.
What we have now is:
- Removed the requirement for virtual methods
- Removed the requirement for set to be an instance of Iesi.Collections.ISet<T>, now you can use ICollection<T> and HashSet<T>.
- Probably broken a lot of things
Consider this a proof of concept. As you can see, it takes time to implement those things, and currently I am doing it at the expense of time better spent sleeping. I started this because I wanted to relax after a 12-hour coding day.
If you are interested in this, please contribute by testing the code and seeing what breaks it. There are a bunch of TODOs there that I would appreciate a second pair of eyes looking over.
You can get the code here: https://nhibernate.svn.sourceforge.net/svnroot/nhibernate/branches/static-proxies
Note that you need to reset the project post build action to where you have PostSharp installed.
Oh, and I left a joke there, see if you can find it.
Thursday, October 09, 2008
First Steps with Post Sharp
PostSharp is an AOP framework that works using byte code weaving. That is, it re-writes your IL to add behaviors to it. From my point of view, it is like having the cake (interception, byte code weaving) and eating it (I haven't even looked at the PostSharp source code, just used the binary release).
My initial spike with it went very well. Here it is:
[Serializable]
public class Logger : OnFieldAccessAspect
{
public override void OnGetValue(FieldAccessEventArgs eventArgs)
{
Console.WriteLine(eventArgs.InstanceTag);
Console.WriteLine("get value");
base.OnGetValue(eventArgs);
}
public override InstanceTagRequest GetInstanceTagRequest()
{
return new InstanceTagRequest("logger", new Guid("4f8a4963-82bf-4d32-8775-42cc3cd119bd"), false);
}
public override void OnSetValue(FieldAccessEventArgs eventArgs)
{
int i = (int?)eventArgs.InstanceTag ?? 0;
eventArgs.InstanceTag = i + 1;
Console.WriteLine("set value");
base.OnSetValue(eventArgs);
}
}
This is an aspect that runs on each field access. It is not really useful, but it helps to show how things work. A couple of things that I think are insanely useful:
- Aspects are instantiated at compile time, given a chance to set themselves up, then serialized to a resource in the assembly. At runtime, they are deserialized and ready to run. The possibilities this gives you are amazing.
- InstanceTag is a way to keep additional data per aspect.
Now, let us assume that I want to add the aspect to this code:
[Logger]
public class Customer
{
public string Name { get; set; }
}
Note, there is no field. (Well, there is, it is generated by the compiler). Now we compile and run the PostSharp post compile step. With that, we can now investigate what is going on.
As you can see, we are deserializing the attribute and storing it in a field that we can now access. Let us check the Customer implementation now:
We have the logger field, which is used for something, but we also have the ~get~<Name>k__BackingField and ~set~<Name>k__BackingField. <Name>k__BackingField (and I would love to hear the story behind that) is the compiler generated field that was created for us. The ~get~... and ~set~... methods are generated by PostSharp. Before we look at them, we will look at the implementation of Name.
Where it used to call the field directly, now it is doing this via a method call. And now we can look at those method calls.
There is a lot going on here. We create a new field access event arg, call the aspect method, and return the value. Note that the state (instance tag) is stored in the object as well, for each field access.
It looks very well done.
Rhino Mocks 3.5 Gems - Explicit Property Setting Expectations
This post is derived from the Rhino Mocks 3.5 documentation.
Setting expectations for property set was always very simple, and slightly confusing with Rhino Mocks. Here is how you do it:
view.Username = "the user name";
The problem is that it is hard to see that there is an expectation created here. So, with the generous help of Sebastian Jancke, we have a new syntax:
Expect.Call(view.Username).SetPropertyWithArgument("the user name");
This is much more explicit and easier to understand. We can also set an expectation on the property set, without expecting a certain value, using this syntax:
Expect.Call(view.Username).SetPropertyAndIgnoreArgument();
How to build an application
I am currently working (well, sort of, more playing around) on the NHibernate Profiler. I thought that this would be a good time to describe how I approach most development tasks.
I don't actually have the time/strength to start serious development effort, but I have the time to do a lot of spikes, and to think about architecture. If I decide to spend the time making this happen, I'll post more about how it works.
I started with a spike about extracting the actual data from NHibernate. My target goal is being able to profile NHibernate 2.0 without having to modify NHibernate to support this. I actually ran into some issues with that; more specifically, I ran into technical issues with my chosen approach. Instead of spending time on troubleshooting that, I chose another path. That may not be the way it will end, but I am not interested in the technicalities at the moment, I am interested in the feasibility of the project. Suffice to say that I was successful (and learned a bit about named pipes :-) ).
Next, SQL formatting is a pain, but it is something that is mandatory for a profiler, at least if you want to give the users a fighting chance of understanding what is going on. I found a solution for that as well, which I am extremely pleased with.
Next is not the actual application architecture, but the user interface. That helps nail down and cement what the application is going to do. I am also a great believer in building from the top down, and the architecture that I have in mind makes it very simple to split back end processing from the front end.
This is an example of what I have in mind. This is a concept UI. It is not meant to be the final thing, it is here merely to give me an idea about what I am going to build. This UI was generated with the assistance of PowerPoint and Paint; the disclaimer is what I usually put whenever I create a UI.

What you can't see, and the purpose of the UI building exercise, is the mental model that it helped me create (the main concepts in the application are sessions, statements and warnings/suggestions). There is a streaming rule set that processes those, and a whole set of functionality that just sits there in my head.
The next step, at this point, is getting from a UI sketch to a working UI draft. What do I mean by that? It means building the UI (which will intentionally look ugly) and hooking it to a dummy back end. The reasoning here is quite simple. I don't care for the look and feel at the moment, that is utterly uninteresting to me at this point, because I have seen what real UI developers can do, and I am not in their league so I am not going to even try.
What I do care about at this stage is the actual operation of the UI. I can hand it over to the professionals to handle the beautification process after that part works. Of course, I feel bad about not knowing more about this, so I may spend more time there than I should, but I am going to try.
After the presentation layer is done, we can start focusing on the actual back end. That is going to be composed of an event processing pipeline with a set of rules that I can use to provide warnings and suggestions about what to do. I am still debating the interaction between the back end and the front end. Most likely this is going to be a simple ThreadSafeQueue<T> and that's it.
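For what it is worth, here is a rough sketch of what such a queue could look like. Only the name ThreadSafeQueue<T> comes from the paragraph above; the members and the lock-based implementation are my assumptions, not the profiler's actual code. The back end would call Enqueue from its processing thread, and the UI would drain the queue with TryDequeue (or block on Dequeue from a worker thread):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// A minimal blocking queue guarded by a single lock (illustrative sketch).
public class ThreadSafeQueue<T>
{
    private readonly Queue<T> items = new Queue<T>();
    private readonly object sync = new object();

    public void Enqueue(T item)
    {
        lock (sync)
        {
            items.Enqueue(item);
            Monitor.Pulse(sync); // wake one consumer waiting in Dequeue
        }
    }

    // Blocks the calling thread until an item is available.
    public T Dequeue()
    {
        lock (sync)
        {
            while (items.Count == 0)
                Monitor.Wait(sync); // releases the lock while waiting
            return items.Dequeue();
        }
    }

    // Non-blocking variant, useful for polling from a UI timer.
    public bool TryDequeue(out T item)
    {
        lock (sync)
        {
            if (items.Count == 0)
            {
                item = default(T);
                return false;
            }
            item = items.Dequeue();
            return true;
        }
    }
}
```

A single lock is the simplest thing that could work here; if profiling ever showed contention, that would be the time to look at something smarter.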
Oh, and here is the total sum of code written so far:
And if you think it is interesting that I am not showing any code, you are right: what you see is what there is :-)
Safe for multi threading...
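The easiest way of getting there is to have no mutable state. And here is a simple test to ensure that. Seeing how the CouchDB code works and how Erlang handles things is quite educational in this regard.
using System;
using System.Collections.Generic;
using System.ComponentModel; // InvalidAsynchronousStateException
using System.Reflection;
using NUnit.Framework;

[TestFixture]
public class EnsureTypesSafeForMultiThreadingTestFixture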
{
[Test]
public void TypeIsSafeForMultiThreading()
{
var visitedTypes = new List<Type>
{ // immutable types, partial list
typeof(int),
typeof(long),
typeof(string),
typeof(DateTime)
};
foreach (var type in GetRootTypesToCheck())
{
CheckType(type, visitedTypes);
}
}
private static void CheckType(Type type, ICollection<Type> types)
{
if(types.Contains(type))
return;
types.Add(type);
var fields = type.GetFields(BindingFlags.Instance|BindingFlags.NonPublic|BindingFlags.Public);
foreach (var info in fields)
{
var isReadOnlyField = (info.Attributes & FieldAttributes.InitOnly)==FieldAttributes.InitOnly;
if(isReadOnlyField==false)
throw new InvalidAsynchronousStateException("Dude, " + type + "." + info.Name +
" is not marked as read only. You are NOT safe for multi threading, enjoy the deadlock, bye!");
CheckType(info.FieldType, types);
}
}
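private static IEnumerable<Type> GetRootTypesToCheck()
{
// return the types that you are interested in verifying, for example
// (MyMessage is a hypothetical type name):
// yield return typeof(MyMessage);
yield break;
}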
}
Wednesday, October 08, 2008
Rhino Mocks Challenge: Implement This Feature
Okay, let us see if this approach works...
Here is a description of a feature that I would like to have in Rhino Mocks (modeled after a new feature in Type Mock). I don't consider this a complicated feature, and I would like to get more involvement from the community in building Rhino Mocks (see the list of all the people that helped get Rhino Mocks 3.5 out the door).
The feature is fluent mocks. The idea is that this code should work:
var mockService = MockRepository.GenerateMock<IMyService>();
Expect.Call( mockService.Identity.Name ).Return("foo");
Assert.AreEqual("foo", mockService.Identity.Name);
Where Identity is an interface.
The best place to capture such semantics is in the RecordMockState.
Have fun, and send me the patch :-)
Monday, October 06, 2008
Reading Erlang: CouchDB Streams
A question to my dear readers, do you find this series valuable? Do you consider it interesting? It is a significant departure from my usual set of topics.
In my recent post I wondered about the concept of summary streams, and how they relate to the way CouchDB works. I couldn't figure out what they were doing. As it turns out, there is good documentation for them in this post. There were two things that misled me. The first was the notion of summary. I think that this is a misnomer, because this is the only place in CouchDB where that term is used. It is not a summary, it is the actual document. As a matter of fact, I think that it is a term that was carried over from Notes.
The second misleading clue was stream. I am used to thinking about streams in the classic sense, as a way to access a stream of bytes. The CouchDB notion of stream is quite different. It is a way to optimize disk access, it seems. I wondered about that, because the nature of CouchDB's append only file seems certain to cause a lot of issues with regards to internal fragmentation in the file (which would require a lot of seeks, which are slow).
Let me see if I can deconstruct what is going on in couch_stream, first, we have the structure declaration:

This seems to be pretty reasonable. The write_stream is defining a reserved space in the file; note the next_alloc field. It looks like we are allocating memory (or disk space). Should be interesting. The stream structure just holds the process and the file descriptor, and isn't really interesting.
The initialization of a stream is interesting in itself:

Here we just create a new write stream and copy the initialization values to the state. There seems to be a 1 to 1 mapping between a stream and an erlang process. Let us examine how we write the data. First, we have the stop condition of running out of data to write:

You can learn a lot about a function in Erlang just from its declaration. In this case, <<>> means a binary with no items in it; hence, we have run out of things to write.
And now we come to the function clause that deals with the issue of running out of room to write it.

The first part of the function is using variable binding to extract the values out of the stream. It goes against the grain, I know, to see CurrentPos being "set" when it is on the right side of the assignment, but that is how it works. (Well, actually it isn't being set, it is being matched, or bound, but that is another issue).
Next we find what is the next size that we have to allocate, and ask the file to expand. I don't think that I have seen this before, let us take a look:

We first get the end of the file, and then we write a single byte at the end of the file plus the expansion value. In other words, we increase the length of the file by however big Num is. For .NET, this is the equivalent of the SetLength call, and it is important for creating contiguous files, with as little fragmentation as possible.
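For comparison, here is roughly how the same expansion could look in .NET. This is a sketch of the idea, not a translation of the CouchDB code; FileExpansion and Expand are names I made up, while Stream.Length and Stream.SetLength are the real framework members:

```csharp
using System;
using System.IO;

public static class FileExpansion
{
    // Grows the stream by 'num' bytes and returns the position
    // at which the newly reserved space starts (the old end of file).
    public static long Expand(Stream file, long num)
    {
        long end = file.Length;
        file.SetLength(end + num);
        return end;
    }
}
```

Calling FileExpansion.Expand(file, 4096) on a FileStream reserves another 4 KB at the end of the file in one go, instead of growing it a few bytes at a time with every write.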
Going back to write_data, we have this line:

The syntax isn't really nice, in my opinion, but what we have here is basically: write to the file Fd, at position CurrentPos, the value of NewPos (packed to FILE_POINTER_BITS) and then the value of NewSize (packed to STREAM_OFFSET_BITS).
It is important to note where this is written. Go and take a look at the case statement in which this expression is. First, we setup enough size for the data we want to write and for the next allocation. When we come to the end of the current chunk, we create a new one (by expanding the file) and then write the address of the new chunk into the end of the chunk, following that by a move to the new chunk.
The last part of write_data is very simple:

We start by figuring out how many bytes we have to write, and then split the binary data by that. We write what we can to the file, and then recursively call ourselves, which will either exit (nothing to write) or create a new chunk of the file and continue writing to it.
Elegant, short and to the point. It takes more time to describe how it works than to write the code. The code for reading is just as sweet:

The stop condition is when we have no more data to read. The second clause handles the case where we have run out of data to read in the current chunk, and we need to read the location of the next one. Again, Erlang's pattern matching is useful here, because it allows us to easily unpack the values from the file into an in-memory representation.
The last clause is where things actually happen. We select the number of bytes to read. Offset is a really misleading term here: it is not the offset from the beginning of the chunk (as most of us would think), it is the number of bytes remaining in the current chunk.
We read the data to memory, update the Sp (stream position?) and then call the function that we were passed, to find out if we should read more or stop.
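To check my understanding of the chunk chaining, here is the same idea sketched in C#. This is emphatically not the CouchDB implementation: the class name, the fixed chunk size and the pointer layout (an 8 byte position followed by a 4 byte size) are all assumptions made for the sake of the example. Each chunk is followed by a slot that holds the address and size of the next chunk, and both the writer and the reader follow that slot whenever the current chunk is exhausted:

```csharp
using System;
using System.IO;

public class ChunkedStreamSketch
{
    // Assumed pointer layout: next chunk position (long) + next chunk size (int).
    private const int PointerSize = sizeof(long) + sizeof(int);

    private readonly Stream file;
    private readonly int chunkSize;
    private long currentPos;     // where the next write goes
    private int bytesRemaining;  // room left in the current chunk

    public long Start { get; private set; }

    public ChunkedStreamSketch(Stream file, int chunkSize)
    {
        this.file = file;
        this.chunkSize = chunkSize;
        // Reserve a pointer slot; it will point at the first chunk.
        currentPos = file.Length;
        file.SetLength(currentPos + PointerSize);
        Start = currentPos;
    }

    public void Write(byte[] data)
    {
        int offset = 0;
        while (offset < data.Length)
        {
            if (bytesRemaining == 0)
            {
                // Expand the file with a new chunk plus its trailing pointer slot,
                // then record the new chunk's address in the current pointer slot.
                long newPos = file.Length;
                file.SetLength(newPos + chunkSize + PointerSize);
                file.Position = currentPos;
                var writer = new BinaryWriter(file);
                writer.Write(newPos);
                writer.Write(chunkSize);
                writer.Flush();
                currentPos = newPos;
                bytesRemaining = chunkSize;
            }
            int count = Math.Min(bytesRemaining, data.Length - offset);
            file.Position = currentPos;
            file.Write(data, offset, count);
            currentPos += count;
            bytesRemaining -= count;
            offset += count;
        }
    }

    public static byte[] Read(Stream file, long start, int length)
    {
        var result = new byte[length];
        var reader = new BinaryReader(file);
        long pos = start;
        int remaining = 0, read = 0;
        while (read < length)
        {
            if (remaining == 0)
            {
                // End of the chunk: follow the pointer to the next one.
                file.Position = pos;
                pos = reader.ReadInt64();
                remaining = reader.ReadInt32();
            }
            int count = Math.Min(remaining, length - read);
            file.Position = pos;
            // Fine for MemoryStream; other streams may return fewer bytes
            // per call, so real code would loop until count bytes arrive.
            file.Read(result, read, count);
            pos += count;
            remaining -= count;
            read += count;
        }
        return result;
    }
}
```

Writing through a ChunkedStreamSketch over a MemoryStream with a tiny chunk size and then calling Read with the Start position gives back the original bytes, even though they are scattered over several chunks.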
Now, how are those streams used?
From reading the code, it looks like there are two streams used in CouchDB. The first is the document stream (called the summary_stream). And the second (actually, the seconds) is a stream for all the binary attachments for a document (a stream per a set of document attachments).
And with this, we conclude the reading of CouchDB's persistence architecture. Next topic: views, and how they are used.
Sunday, October 05, 2008
Windsor - IModelInterceptorsSelector
In my previous post I introduced the basis of context as an architectural pattern. Now I want to talk about how we can implement that using Windsor and a new extensibility point: IModelInterceptorsSelector.
The interface is defined as:
/// <summary>
/// Select the appropriate interceptors based on the application specific
/// business logic
/// </summary>
public interface IModelInterceptorsSelector
{
    /// <summary>
    /// Select the appropriate interceptor references.
    /// The interceptor references aren't necessarily registered in the model.Interceptors
    /// </summary>
    /// <param name="model">The model to select the interceptors for</param>
    /// <returns>The interceptors for this model (in the current context) or a null reference</returns>
    /// <remarks>
    /// If the selector is not interested in modifying the interceptors for this model, it
    /// should return a null reference and the next selector in line would be executed (or the default
    /// model.Interceptors).
    /// If the selector returns a non null value, this is the value that is used, and the model.Interceptors are ignored. If this
    /// is not the desirable behavior, you need to merge your interceptors with the ones in model.Interceptors yourself.
    /// </remarks>
    InterceptorReference[] SelectInterceptors(ComponentModel model);

    /// <summary>
    /// Determine whether the specified model has interceptors.
    /// The selector should only return true from this method if it has determined that this is
    /// a model that it would likely add interceptors to.
    /// </summary>
    /// <param name="model">The model</param>
    /// <returns>Whether this selector is likely to add interceptors to the specified model</returns>
    bool HasInterceptors(ComponentModel model);
}
And registering it in the container is simply:
container.Kernel.ProxyFactory.AddInterceptorSelector(selector);
Interceptors are the basis of AOP, but traditionally, you didn't get a lot of choices in how you compose your interceptors at runtime. Using IModelInterceptorsSelector makes it extremely easy to modify the selection of interceptors based on relevant business logic.
Let us take the following example. We have a warehouse service that we want to add caching to. However, we can't use the cache if the request comes from the fulfillment service. First, we define the caching interceptor, then we define the logic that controls adding or removing it.
public class WarehouseCachingInterceptorSelector : IModelInterceptorsSelector
{
public InterceptorReference[] SelectInterceptors(ComponentModel model)
{
if(model.Service!=typeof(IWarehouse))
return null;
if(Origin.IsFromFulfillment)
return null;
return new InterceptorReference[]{new InterceptorReference(typeof(WarehouseCachingInterceptor)), };
}
public bool HasInterceptors(ComponentModel model)
{
return model.Service == typeof (IWarehouse);
}
}
And now we get caching for everything except for fulfillment. And we get this in a clean and very easy to understand way. :-D
Windsor - IHandlerSelector
In my previous post I introduced the basis of context as an architectural pattern. Now I want to talk about how we can implement that using Windsor and a new extensibility point: IHandlerSelector.
The interface is defined as:
/// <summary>
/// Implementors of this interface can extend the way the container performs
/// component resolution based on some application specific business logic.
/// </summary>
/// <remarks>
/// This is the sibling interface to <seealso cref="ISubDependencyResolver"/>.
/// This is dealing strictly with root components, while the <seealso cref="ISubDependencyResolver"/> is dealing with
/// dependent components.
/// </remarks>
public interface IHandlerSelector
{
/// <summary>
/// Whether the selector has an opinion about resolving a component with the
/// specified service and key.
/// </summary>
/// <param name="key">The service key - can be null</param>
/// <param name="service">The service interface that we want to resolve</param>
bool HasOpinionAbout(string key, Type service);
/// <summary>
/// Select the appropriate handler from the list of defined handlers.
/// The returned handler should be a member from the <paramref name="handlers"/> array.
/// </summary>
/// <param name="key">The service key - can be null</param>
/// <param name="service">The service interface that we want to resolve</param>
/// <param name="handlers">The defined handlers</param>
/// <returns>The selected handler, or null</returns>
IHandler SelectHandler(string key, Type service, IHandler[] handlers);
}
And registering it in the container is simply:
container.Kernel.AddHandlerSelector(selector);
A handler selector is asked if it wants to express an opinion on a particular component resolution, based on key (optional) and type. Assuming we say yes, we are called to select the appropriate handler from all the registered handlers that can satisfy that request.
Let us say that we want to recover from the database being down by serving an implementation that reads from only the cache, we can implement it thusly:
public class DataAccessHandlerSelector : IHandlerSelector
{
bool databaseIsDown = false;
public DataAccessHandlerSelector()
{
DatabaseMonitor.OnChangedState +=
state => databaseIsDown = state == DatabaseState.Down;
}
public bool HasOpinionAbout(string key, Type service)
{
return databaseIsDown && service == typeof(IRepository);
}
public IHandler SelectHandler(string key, Type service, IHandler[] handlers)
{
return handlers.Where(x=>x.ComponentModel.Implementation == typeof(CacheOnlyRepository)).First();
}
}
Now we automatically replace, based on our own logic and the current context, the type of component that the container should resolve.
I am giving the example of detecting infrastructure change, but as important, and as interesting, is the ability to easily use this in order to select services in a multi tenant environment. We can use this approach to perform service overrides all over the place in a way that is natural, easy and extremely powerful.
Have fun...
Components, Implementations and Contextual Decisions
I am a big believer in using context in order to drive a system. What do I mean by that?
Note, I am going to talk about the problem in general, and its solution implementation using Windsor. The example is fictitious and is here to represent the problem in a way that allows me to talk about it in isolation; it doesn't necessarily represent good design.
It seems like just about all the applications that I had to deal with recently had to have the notion of system variability. Now, let us make it clear. System variability is a fancy name for the if statement. The problem with the if statement is that when you have a lot of them, it gets pretty tricky to understand what is going on with the system. That is why a common refactoring is replace conditional with polymorphism.
What I am usually talking about is "when we are in this condition, we should do X, otherwise, we should do Y". Let us take the simple idea of a warehouse service. If we are making a call from the web site, it is okay to return data that may not be accurate to the second. If we are calling from the fulfillment service, we need accurate, up-to-date results. A simple way of handling this is:
public bool ItemIsPhysicallyOnTheShelve(Guid id)
{
    if (Origin == Originators.Website) // can use caching
    {
        var result = Cache.Get<bool?>("item-on-shelve-" + id);
        if (result.HasValue)
            return result.Value;
    }
    // actual work and putting in cache
}
A more interesting example might be different business rules for order authorization, based on whether we have a strategic customer or not. In both cases, we have some context for the operation that modifies the way that we deal with this operation.
public bool IsValid(Order order, ValidationSummary summary)
{
    IRule[] rules = CurrentCustomer.IsStrategic ?
        strategicCustomerRules : normalCustomerRules;
    foreach (IRule rule in rules)
    {
        rule.Validate(order, summary);
    }
    // the order is valid only if no rule reported an error
    return summary.HasErrors == false;
}
One way of dealing with that is what you see in the code samples: get the state from somewhere and make decisions based on that. Another, more advanced option is to create:
- IWarehouseService
- DefaultWarehouseService
- CachingWarehouseServiceDecorator
And because decorators are really annoying, we will use AOP to deal with it by creating a caching interceptor.
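Before handing this over to an interceptor, it may help to see the plain decorator version of the three types listed above. This is only a minimal sketch: the in-memory stock set, the Lookups counter and the unbounded dictionary cache are assumptions made for illustration, not part of any real warehouse code.

```csharp
using System;
using System.Collections.Generic;

public interface IWarehouseService
{
    bool ItemIsPhysicallyOnTheShelve(Guid id);
}

// Stand-in for the accurate, up-to-date lookup (hypothetical in-memory data).
public class DefaultWarehouseService : IWarehouseService
{
    private readonly HashSet<Guid> itemsOnShelf;

    // Counts real lookups so the effect of caching can be observed.
    public int Lookups { get; private set; }

    public DefaultWarehouseService(IEnumerable<Guid> itemsOnShelf)
    {
        this.itemsOnShelf = new HashSet<Guid>(itemsOnShelf);
    }

    public bool ItemIsPhysicallyOnTheShelve(Guid id)
    {
        Lookups++;
        return itemsOnShelf.Contains(id);
    }
}

// Wraps any IWarehouseService and remembers answers it has already seen.
public class CachingWarehouseServiceDecorator : IWarehouseService
{
    private readonly IWarehouseService inner;
    private readonly Dictionary<Guid, bool> cache = new Dictionary<Guid, bool>();

    public CachingWarehouseServiceDecorator(IWarehouseService inner)
    {
        this.inner = inner;
    }

    public bool ItemIsPhysicallyOnTheShelve(Guid id)
    {
        bool result;
        if (cache.TryGetValue(id, out result))
            return result; // possibly stale, which is acceptable for the web site

        result = inner.ItemIsPhysicallyOnTheShelve(id);
        cache[id] = result;
        return result;
    }
}
```

The web site would be handed the decorated service, while the fulfillment service would get DefaultWarehouseService directly. Every method on the interface needs the same wrapping code, which is the annoyance that pushes toward an interceptor.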
Now the issue is mere configuration, I can deal with that by flipping bits in the container configuration. The second example can be solved by creating two components with different rule sets and using that. The problem is that this removes the coding issues, but it creates more subtle problems that are much harder to deal with.
If I rely on the container configuration alone, I suddenly have logic there. Important business logic. That is not a good idea, I think. Especially since this means that at some point my code has to make an explicit decision about what component to use, and that breaks the infrastructure independence rule.
What this boils down to is that now I have to manage a lot of the complexity in the application using the container configuration and tie the working of the system into it. That works if the number of variables that I have to juggle is small, but if I have a lot of axes (the plural of axis) that are orthogonal to one another, it gets complex very fast.
My solution for that problem is to define a service and its context as a cohesive unit. That is, the concept of a service contains its interface, all of its implementations and the business logic required to select which implementation (and configuration) to choose for a given context.
In the warehouse example above, what we will have is:
- IWarehouseService
- DefaultWarehouseService
- WarehouseCachingInterceptor
- WarehouseInterceptorsSelector
Now all of those are part of the same service. The last one is where we isolate the actual decision about what type of implementation we should get. In this case, we use Windsor's IModelInterceptorsSelector to add additional, context bound, interceptors to the service.
But that is just from the interceptors side, what about the selection of the appropriate rules? We can handle that using ISubDependencyResolver, where we can decide how we want to filter the rules that go into IWarehouseService based on the context. For that matter, we might have completely different warehouse implementations, VirtualWarehouseService and PhysicalWarehouseService, and we need to select between them based on some business criteria. We handle that using IHandlerSelector, which makes the decision about which component to create.
Again, IHandlerSelector, IModelInterceptorsSelector and ISubDependencyResolver are all Windsor extensibility mechanisms (my next two posts will cover them in detail) that allow us to make the container aware of the context that we have in the application.
The purpose of the explicit notion of context is to allow us to deal with the variability in the application in an explicit manner. And that, in turn, means that we get much better separation of concerns.
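To make the idea of keeping the selection logic next to the service concrete, here is a rough, container-free sketch. IImplementationSelector and TinyResolver are hypothetical stand-ins loosely modeled on the HasOpinionAbout / SelectHandler shape, not Windsor's actual types, and the useVirtualWarehouse flag is an invented placeholder for whatever business criteria apply.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IWarehouseService { string Describe(); }

public class VirtualWarehouseService : IWarehouseService
{
    public string Describe() { return "virtual"; }
}

public class PhysicalWarehouseService : IWarehouseService
{
    public string Describe() { return "physical"; }
}

// Hypothetical contract: a selector may override which implementation is used.
public interface IImplementationSelector
{
    bool HasOpinionAbout(Type service);
    Type SelectImplementation(Type service, Type[] candidates);
}

// Lives with the warehouse service and owns the contextual decision.
public class WarehouseSelector : IImplementationSelector
{
    private readonly Func<bool> useVirtualWarehouse; // assumed context probe

    public WarehouseSelector(Func<bool> useVirtualWarehouse)
    {
        this.useVirtualWarehouse = useVirtualWarehouse;
    }

    public bool HasOpinionAbout(Type service)
    {
        return service == typeof(IWarehouseService);
    }

    public Type SelectImplementation(Type service, Type[] candidates)
    {
        Type wanted = useVirtualWarehouse()
            ? typeof(VirtualWarehouseService)
            : typeof(PhysicalWarehouseService);
        return candidates.First(x => x == wanted);
    }
}

// Toy resolver, just enough to show where a selector plugs in.
public class TinyResolver
{
    private readonly Dictionary<Type, List<Type>> registrations = new Dictionary<Type, List<Type>>();
    private readonly List<IImplementationSelector> selectors = new List<IImplementationSelector>();

    public void AddSelector(IImplementationSelector selector)
    {
        selectors.Add(selector);
    }

    public void Register<TService, TImplementation>() where TImplementation : TService, new()
    {
        List<Type> list;
        if (!registrations.TryGetValue(typeof(TService), out list))
            registrations[typeof(TService)] = list = new List<Type>();
        list.Add(typeof(TImplementation));
    }

    public TService Resolve<TService>()
    {
        Type[] candidates = registrations[typeof(TService)].ToArray();
        IImplementationSelector selector =
            selectors.FirstOrDefault(s => s.HasOpinionAbout(typeof(TService)));
        // Without an opinionated selector, fall back to the first registration.
        Type chosen = selector != null
            ? selector.SelectImplementation(typeof(TService), candidates)
            : candidates[0];
        return (TService)Activator.CreateInstance(chosen);
    }
}
```

The calling code only ever asks for IWarehouseService. The decision about which implementation it gets is made per resolve, in one place, by a class that ships with the service itself.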
Reading MEF code
Okay, here is the deal. There is a feature in MEF that I find interesting, the ability to dynamically recompose the imports that an instance has. Well, that is not accurate. That doesn't really interest me. What does interest me is some of the implementation details. Let me explain a bit better.
As I understand the feature, MEF can load the imports from an assembly, and if I drop another file into the appropriate location, it will be able to update my imports collection. Now, what I am interested in is to know whether MEF allows me to update the file itself and pick up the change on the fly. The reason that I am interested in that is to know how this is done without locking the file (loading an assembly usually locks the file, unless you use shadow copy assemblies, which means that you have to use a separate AppDomain).
As you can imagine, this is a very specific need, and I want to go in, figure out if this is possible, and go away.
I started by checking out the MEF code:
svn co https://mef.svn.codeplex.com/svn mef
I just love the SVN integration that CodePlex has.
Now, the only way that MEF can implement this feature is by watching the file system, and that can be done using a FileSystemWatcher. Looking for that, I can see that it appears that DirectoryPartCatalog is using it, which isn't really surprising.
But, going there and reading the code gives us this:
Note what isn't there. There is no registration to Changed. This is likely not something that MEF supports.
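For comparison, this is roughly what watching a directory for replaced files could look like with FileSystemWatcher. It is a sketch of the general technique, not MEF's code: PluginDirectoryWatcher, the "*.dll" filter and the callback shape are assumptions made for illustration.

```csharp
using System;
using System.IO;

public static class PluginDirectoryWatcher
{
    // Returns the watcher so the caller controls its lifetime (dispose it to stop watching).
    public static FileSystemWatcher Watch(string directory, Action<string, WatcherChangeTypes> onChange)
    {
        var watcher = new FileSystemWatcher(directory, "*.dll");
        FileSystemEventHandler handler = (sender, e) => onChange(e.FullPath, e.ChangeType);

        watcher.Created += handler;
        watcher.Deleted += handler;
        // Subscribing to Changed is what would be needed to notice a file updated in place.
        watcher.Changed += handler;

        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```

Being notified is only half of the problem. Reloading an assembly that has been overwritten still runs into the file locking issue mentioned earlier.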
Okay, one more try. Let us see how it actually loads an assembly. We start from Export<T> and GetExportedObject(), which calls GetExportedObjectCore(), which shells out to a delegate. Along the way I looked at CompositionException, just to make sure that it doesn't have the same problem as TypeLoadException and the hidden information. It doesn't.
I tried to follow the reference chain, but I quickly got lost. I then tried to figure out how MEF does delayed assembly loading, to see if it is doing anything special there, but I am currently hung at ComposablePartDefinition.Create, which seems promising, but it accepts a delegate and no one is calling it.
So this looks like it for now.
Rhino Mocks 3.5 RTM
Today I decided that the 3.5 RC had been out long enough to collect bug reports, so I fixed all the remaining bugs, updated the Rhino Mocks 3.5 Documentation, and put the binaries out on the site.
For this release, I actually have 4 binary packages. There is one set for .NET 3.5 and one for .NET 2.0, and each comes in two flavors: with the Castle assemblies merged (the default) and with the Castle assemblies included separately. The reason for having those two options is that people who want to extend Rhino Mocks directly can do so more easily. In general, I suggest using the merged version.
So, what do we actually have here (feature differences from 3.4)?
Features:
- Arrange Act Assert (AAA) syntax for mocking
- Including support for .NET 2.0
- Added a way to access the mocked method at runtime, using WhenCalled (similar to Do(), but without the pain of having to specify a special delegate).
- CreateMock() is deprecated and marked with the [Obsolete] attribute. Use StrictMock() instead.
- Support for mocking interfaces in C++ that mix native and managed types. (Note, may require that you install kb957541 to get around a bug introduced to the framework in SP1).
- New event raising syntax:
eventHolder.Raise(stub => stub.Blah += null, this, EventArgs.Empty);
- Better support for multi threaded replays.
- Note that access to the mock object is now serialized.
- Support AssertWasCalled on partial mocks.
Patches:
- From Sebastian Jancke, adding support for SetPropertyAndIgnoreArguments() and SetPropertyWithArguments( o );
- From Yann Trevin, adding support for List.Element("MyKey", ...), so we are not limited to just integers.
- From David Tchepak, adding support for ctor arguments when creating a mock using static method.
- From Stefan Steinegger - much better support for creating inline constraints.
Improvements:
- Better handling of exception in raising events from mock objects
- Better error message when trying to set expectation on properties of a stub.
- Better error handling for AAA syntax abuse
- Will give better errors if you call Verify on a mock that is in record mode.
- Allowing to return to record mode without losing expectations.
- BackToRecord extension method.
- AAA syntax now works with Ordering
- Better error message if trying to use SetupResult on stubbed mock properties.
- Better error message when trying to mock null instance.
Bug fixes:
- Fixing an issue with mock objects that expose methods with output parameter of type System.IntPtr.
- Fixed an issue with merging, would cause issues if you are also using Castle Dynamic Proxy.
- Fixed various typos
- Fixed issue with mocking internal classes and interfaces.
- OutRef params were not copied when creating new expectation from an existing one.
- Fixing an issue with leaking expectationReplaced in mocks.
Saturday, October 04, 2008
#
The NHibernate Profiler
This is speculative at the moment, just to be clear.
I am thinking about creating a profiler for NHibernate. This came out of the common need to actually get a good view of what is going on with NHibernate.
This is intended to be a commercial project.
I have a feature set in mind, but I would rather hear from you whether you think that it is a tool that you would use (and buy), and what kind of features you expect such a tool to have.
And, to forestall the nitpickers, I am well aware of SQL Profiler.
Erlang Reading: CouchDB - Digging Down to Disk
Jan has corrected a misconception that I had here, where I assumed that PUT was create and POST was update. Apparently this is not quite correct. A PUT request will update/create a new document with a user supplied name, while a POST request will create a new document with a server generated name.
That narrows down my search for "create new document" code path again. Here is the method in couch_httpd_db that handles a POST request on a document.

It parses the form, gets the revision of the document, and then it calls open_doc_revs, which we haven't explored so far. I feel confident that I understand what this is doing, however:

I'll not show open_doc_revs_int, because it is too big, but it contains some references that I haven't encountered so far. Specifically, couch_key_tree, which I don't think that I have seen before, and get_full_doc_infos, which looks like this:

Oh, now we are on more familiar territory. Lookup performs a search on a binary tree, and we have gone over it in depth before. However, we can now see that we pass to it a fulldocinfo_by_id_tree... I don't think we have explored how CouchDB makes use of the btree, so let us head to the db structure and see what else we have there.

Here is what I know about this structure at the moment. db_header contains the data that is actually written to the file, while db is the in memory structure. Not sure what write_version is all about; update_seq looks like the internal sequence number for CouchDB. I don't think that I have ever seen the concept of stream before, but it is there, and it looks like most stream implementations that I am already familiar with.
get_state on that returns a {Position, BytesRemaining}. Since I don't understand how this is used at the moment, I don't understand why this is important. fulldocsinfo_by_id_btree_state contains the root of the btree, and the same holds for docinfo_by_seq_btree_state. I assume that local_docs_btree_state is also the same.
purge_seq is a counter for the last purge (physical delete), and it is used in conjunction with the views in order to ensure that there are no purged documents in the view. purged_docs keeps track of the documents that were last purged.
db, the runtime structure, is considerably more interesting. It took me a while to figure out where it is created, but I finally found the init_db function in couch_db_updater:

The first part is to open the summary_state stream, still not sure what this is about, though. We then define a comparison function for use when opening the fulldocsinfo btree, and we pass several additional functions when we create the btree. Reduce is usually used when we compact the tree, and to get information about the btree (such as count of documents). The join and split also prepare the data to be written to disk in a consistent fashion.
I don't like the names join/split. The internal names, assemble and extract, seem to be much more accurate.
Finally, we get to the actual creation of the db structure itself. An interesting thing to note is that only the update_pid is filled. Since CouchDB implements the many readers / single writer model, I assume that this is how we ensure the single writer, because only a single process ever handles writes to the DB. I am assuming for now that main_pid is for readers, and compactor_pid is for compacting (obviously). I'll track them next.
Well, that took some doing, the main_pid is set in the couch_db_updater:init (the same function that calls the init_db function):

I had a hard time tracking this because I couldn't figure out who was calling couch_db_updater:init (and what the MainPid is). Usually, in gen_server processes, there is a start_link function that you call, but in this case, there wasn't. Some digging revealed that the couch_db process is the one that actually calls start_link on the couch_db_updater. I wonder if this is a common pattern in Erlang, because it sure did catch me by surprise.
So main_pid is the couch_db process, and update_pid is the couch_db_updater. It actually goes a lot further than that. couch_db_updater doesn't actually have a public interface. It looks like couch_db is the one that exposes the interface, and sends synchronous messages to the couch_db_updater process:

That is the way in which CouchDB ensures that we have only a single writer, I presume. Compaction is handled in couch_db_updater as well, the implementation is interesting:

We create a new process and return before we finish compacting. It is interesting because it is the first example of explicit concurrency that I saw. Let us dig a bit deeper into start_copy_compact_int, which is pretty straightforward:

This just creates a new file, initializes a database and calls copy_compact_docs, finishing by notifying the updater that it finished compacting. copy_compact_docs is interesting in itself:

We start by defining the EnumBySeqFun, which batches documents to write to disk until should_flush returns true. should_flush is funny, it checks memory thresholds and flushes to disk whenever there is about 10 MB of data ready to be flushed.
The usage of foldl here is a good reason to go back and examine it in more detail:
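The batching idea itself is easy to sketch outside of Erlang. The version below is an analogy in C#, not a translation of the CouchDB code: BatchingCopier is a made-up name, and it tracks the size of the pending documents directly instead of inspecting process memory the way should_flush is described as doing.

```csharp
using System;
using System.Collections.Generic;

public class BatchingCopier
{
    private readonly long flushThresholdBytes;
    private readonly Action<IList<byte[]>> writeBatch;
    private readonly List<byte[]> pending = new List<byte[]>();
    private long pendingBytes;

    public BatchingCopier(long flushThresholdBytes, Action<IList<byte[]>> writeBatch)
    {
        this.flushThresholdBytes = flushThresholdBytes;
        this.writeBatch = writeBatch;
    }

    // Called once per document while enumerating the source in sequence order.
    public void Add(byte[] document)
    {
        pending.Add(document);
        pendingBytes += document.Length;
        if (ShouldFlush())
            Flush();
    }

    // Assumption: we count bytes we hold rather than asking the runtime about memory.
    private bool ShouldFlush()
    {
        return pendingBytes >= flushThresholdBytes;
    }

    // Must also be called once at the end to write the final partial batch.
    public void Flush()
    {
        if (pending.Count == 0)
            return;
        writeBatch(pending.ToArray());
        pending.Clear();
        pendingBytes = 0;
    }
}
```

With a threshold of 10 * 1024 * 1024 this would write in chunks of roughly 10 MB, trading a bounded amount of memory for fewer, larger writes.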

And fold is defined as:

So we have this stream_node, whatever that is, let us take an additional look.

We are already familiar with get_node(), but adjust_dir and stream_kp_node and stream_kv_node are new. The first is easy:

Let us explore stream_kv_node, which is likely to be easier than stream_kp_node.

This is slightly complex. We start by defining the drop off point (smaller / greater from the current key). The important bit is in the last function, the call to our function and the recursion back into the function. As a reminder, here is the function that we pass:

I hope that this makes it clear how this works. We simply stream all the items to the function, and it copies them. Note that an important subtlety is that we have several btrees over the same data (we will touch on that in a bit) and in this case we are only copying the items that are greater than the update_seq in the current DB. Still not sure what this update_seq is all about. We will go all the way back to copy_compact_docs_int's last line, where it notifies the updater that it finished the current compaction. This message arrives at this method:

This looks complex, but it really isn't. It is simply opening the new database, and checking if the current update_seq is equal to the latest in the new database. If it isn't, we restart the process. If it is, then we copy the local documents (not sure why they get special treatment, and it looks like it is expected to have very few of them), move the files and move on with our lives.
Now, it looks like update_seq is updated whenever someone makes a change to the database, but let us verify this, shall we?
Looking at couch_db_updater:update_docs_int, we can see that the update_seq is updated for each new document that is modified (see merge_rev_trees).
Okay, that was interesting. But I still don't know how we save a document to disk. Let us go back to couch_db:update_docs, and see what is going on here. This is pretty complex, so we will break it apart into discrete pieces.
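The catch-up logic can be illustrated with a small single-threaded simulation. This is a sketch under heavy assumptions: SequencedStore and Compactor are invented for the example, concurrent writers are simulated with a callback between passes, and local documents and file swapping are left out entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy database: every write gets the next sequence number.
public class SequencedStore
{
    public long UpdateSeq { get; private set; }
    public SortedDictionary<long, string> BySeq { get; private set; }

    public SequencedStore()
    {
        BySeq = new SortedDictionary<long, string>();
    }

    public void Write(string document)
    {
        UpdateSeq++;
        BySeq[UpdateSeq] = document;
    }

    // Used by the compactor to copy an entry while keeping its sequence number.
    public void CopyIn(long seq, string document)
    {
        BySeq[seq] = document;
        UpdateSeq = seq;
    }
}

public static class Compactor
{
    // betweenPasses stands in for writes that arrive while a pass is running.
    public static SequencedStore Compact(SequencedStore source, Action betweenPasses)
    {
        var target = new SequencedStore();
        // Keep going until the new file has caught up with the live database.
        while (target.UpdateSeq != source.UpdateSeq)
        {
            long copiedUpTo = target.UpdateSeq;
            // Only copy entries newer than what the target already holds.
            foreach (var entry in source.BySeq.Where(e => e.Key > copiedUpTo).ToList())
                target.CopyIn(entry.Key, entry.Value);
            betweenPasses();
        }
        return target;
    }
}
```

Each pass only has to copy what changed since the previous one, so under a moderate write load the passes should shrink quickly until the sequence numbers match.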

We start by generating a new revision for the documents. Note that in the case of local revisions, we just increment the revision. If it is a standard document, we generate a random revision number. I think that this is done in order to support farm wide revisions, so if I update document #1 in server A and in server B, I get two different revisions, which would be detected as conflicts.
We extract all the new revisions, group all the documents by id and extract all the ids. So far, so good. Let us see the rest of the function:

We get the current existing documents, and then we merge the existing with the new bucket. That requires some explanation.
The first function clause uses list comprehensions in order to ensure that if we couldn't find a document in the DB, we didn't specify a previous version. The second is more complex and requires us to understand key trees. CouchDB defines them as:

Based on that and the dict:from_list arguments, we can deduce what the structure of a revision is. It should be noted that it looks like there is an expectation that the number of revisions wouldn't be too big. This makes sense, CouchDB does keep previous versions, but it makes no guarantees about their life spans. Also, we should note that we have both revs and rev_tree, I am not quite sure why.
The last line in the zipwith function is prepare_doc_for_new_edit, which is called for each document, let us take a look at it.

This is an example of how condensed functional code can get. Let us see what it does:
- Get the new revisions and all the previous revisions
- If the previous revisions have a value, it performs a lookup based on the revision in the revision tree, to get the revision information. (Which answers the question of why we need both revs and rev_tree. revs is an ordered list of revision keys. rev_tree is apparently the metadata).
- Now, the LeafRevsDict contains only the most recent one, so failing to find it in the dictionary means that we are trying to update a non current version, so we error.
- I am not sure what a stub is, but the rest of it seems pretty simple.
The last part is interesting, however. We check that we aren't trying to create a new document with the same id as an existing one (although we allow it if that document is deleted).
After the call to zipwith(), we have a call to doc_flush_binaries, which we have totally ignored so far. It flushes all the documents to disk. It is not really interesting; it deals with writing the attachments to disk. I am not quite sure what is going on (it is complex), mostly because I don't follow which file it is writing to. Or, to be rather more exact, I am not following how the attachment can be written to a different file, which is what the code seems to imply.
Anyway, the code there is pretty clear about what is going on, we append to the current file (except if we write to a different file, again, not following that) and update the in memory structure of the document to point to the range of data that each attachment contains.
Let us look at the last part of the couch_db:update_docs:

And it looks like we have hit the place where it is actually happening. We call the couch_db_updater in order to do the actual update, and we may be asked to retry (this will require more study), in which case we do retry.
Let us look at update_docs at the couch_db_updater.

Sigh, I can feel it in my waters, update_docs_int is going to be complex... once more, we will try to divide and conquer.

We start by extracting variables from the Db, then we split the list based on whether each document is local or not. NonRep documents (local) are not replicated, have only a single version, and there are expected to be very few of them. I still don't know what the use case for those is, however. It looks like DocsList is a list of lists of docs, probably the buckets that we had to deal with earlier.
We finish the snippet by extracting the ids from the documents. What comes next?

We lookup the existing documents, and then we merge them into a file info (note that we create a new full_doc_info with just the id if this is a new file). We then create the new doc infos by calling merge_rev_trees. We will ignore the implementation for now.
Finally, we gather all the sequences that we would like to remove. Moving on, we see that we start by updating the local docs and then flushing the trees. Both of these look promising for my quest to understand how the data is stored on the disk:

update_local_docs just modifies the in memory btree, so I am not going to show it. flush_trees should be more interesting, based on just the comment.

This is interesting. What we see here is that we write the attachment references and the document body to the summary stream. That is interesting, because I am still not sure what the summary stream is. Note that here we have the issue of writing to a file during compaction, so we ask the higher level code to retry.
We still have to see how the indexes are maintained, but that just updates the in memory structure, the real bit happens next:

Here we see our good old friend the btree, and we add/remove the new index to the file. We have seen how this works here. We finish by updating the Db structure with the new values and committing the data.
I now feel that I have a much better understanding of how CouchDB file handling works. Except that I still don't know what this summary stream is. That is a topic for later, though.