Ayende @ Rahien


More xUnit tweaks, dynamic test skipping

For a long time, xUnit’s developers have resisted adding support for skipping a test dynamically. You could create your own Fact class and handle that yourself, but that was quite a lot of work for something very specific.

In RavenDB, we had the need to dynamically decide whether a test can run based on the actual test situation, so I decided to add this to our xUnit fork. This turned out to be really simple to do. Just three lines of code:

https://github.com/ayende/xunit/commit/82accb4c850a3938187ac334fb73d6e81dc921e3#diff-3c2b9f2cb8392f32456d0bf81151b59fR57
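To give a sense of what this enables, here is a sketch of what a dynamically skipped test could look like. This is illustrative only: the SkipException name and the environment check are my assumptions; the actual three-line change is in the commit above.

[Fact]
public void Replication_to_remote_target()
{
    // Skip (rather than fail) when the environment can't support the test.
    // SkipException is a stand-in name for the fork's actual mechanism.
    if (Environment.GetEnvironmentVariable("REPLICATION_TARGET") == null)
        throw new SkipException("No replication target configured on this machine");

    // ... the actual test body runs only when the environment allows it
}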

Tweaking xUnit

One of the interesting challenges that we have with RavenDB is the number and duration of our tests.

In particular, we currently have over three thousand tests, and they take hours to run. We do a lot of stuff there along the lines of “let us insert a million docs, write a map/reduce index, query on that, then do a mass update, see what happens”, etc. We also do a lot of stuff that really can’t be emulated easily. If I’m testing replication to a non-existent target, I need to check the actual behavior, etc. Oh, and we’re probably doing silly stuff in there, too.

In order to shorten our feedback cycle, I made some modifications to xUnit. It now records the duration of each test; the results look like this:

[Image: test runner output showing per-test durations]

You can see that Troy is taking too long. In fact, there is a bug that those tests currently expose that results in a timeout exception; that is why they take so long.

But this is just to demonstrate the issue. The real power here is that we also use this information when deciding how to run the tests. We simply sort them by how long they took to run. If we don’t have a record for a test, we give it a run time of -1.
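As a rough sketch of that ordering logic (the recorded-durations dictionary and the TestCase shape are assumptions of mine, not the actual code in the fork):

// Sort cheapest-first; tests with no recorded duration get -1,
// so brand new tests run before everything else.
IEnumerable<TestCase> OrderByPreviousDuration(
    IEnumerable<TestCase> tests,
    Dictionary<string, double> recordedDurations)
{
    return tests.OrderBy(t =>
    {
        double duration;
        return recordedDurations.TryGetValue(t.DisplayName, out duration)
            ? duration
            : -1;
    });
}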

This has a bunch of interesting implications:

  • The faster tests are going to run first. That means that we’ll have earlier feedback if we broke something.
  • The new tests (which haven’t had a chance to run yet) will run first; those are where the problems are more likely to be anyway.
  • We only record this for passing tests, which means that failed tests are going to run first as well.

In addition to that, this also gives us better feedback on what our slow tests are. So we can actually give them some attention and see whether they really need to be slow or can be fixed.

Hopefully, we can find a lot of the tests that are long, and just split them off into a separate test project, to be run at a later time.

The important thing is, now we have the information to handle this.

What does the test say?

Because RavenDB is a database, a lot of the tests we have to run are pretty long. For example, we need to touch the disk a lot, and we have a lot of networked tests.

That means that running this test suite can take a while. But the default information we get is pretty lousy: just the test count, and that is it. When a test hangs, and they do when we have bugs, it makes it very hard to figure out where the culprit is.

So we forked xUnit and added a tiny feature to the console runner:

[Image: console runner output showing the test that is currently executing]

Defensive coding is your friend

We just had a failing test:

[Image: the failing test output]

As you can see, we assumed that Fiddler is running when it isn’t. Here is the bug:

[Image: the offending code, hard-coding the use of the Fiddler proxy]

Now, this is great when I am testing things out and want to check what is going over the wire using Fiddler, but I always have to remember to revert this change; otherwise we will have a failing test and a failing build.

That isn’t very friction free, so I added the following:

[Image: the defensive check that was added]

Now the code is smart enough not to fail the test if we didn’t do things right.
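A guard along these lines can be as simple as checking whether anything is listening on Fiddler’s default port. This is a sketch under my own assumptions; the method name and the exact check are illustrative, not the code from the screenshot:

private static bool FiddlerIsRunning()
{
    try
    {
        // Fiddler listens on 127.0.0.1:8888 by default; if nothing is
        // there, fall back to not proxying instead of failing the test.
        using (var client = new TcpClient())
        {
            client.Connect("127.0.0.1", 8888);
            return true;
        }
    }
    catch (SocketException)
    {
        return false;
    }
}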

On failing tests

I made a change (deep in the guts of RavenDB), and then I ran the tests, and I got this:

[Image: test run results showing a large number of failing tests]

I love* it when this happens, because it means that there is one root cause that I need to fix, really obvious and in the main code path.

I hate it when there is just one failing test, because it means that this is an edge condition or something freaky like that.

* obviously I would love it more if there were no failing tests.

Testing Rhino Service Bus applications

One of the really nice things about Rhino Service Bus applications is that we have created a structured way to handle inputs and outputs. You have messages coming in and out, as well as the endpoint local state to deal with. You don’t have to worry about how to deal with external integration points, because those are already going over messages.

And when you have basic input/output figured out, you are pretty much done.

For example, let us see the code that handles extending trial licenses in our ordering system:

public class ExtendTrialLicenseConsumer : ConsumerOf<ExtendTrialLicense>
{
    public IDocumentSession Session { get; set; }
    public IServiceBus Bus { get; set; }

    public void Consume(ExtendTrialLicense message)
    {
        var productId = message.ProductId ?? "products/" + message.Profile;
        var trial = Session.Query<Trial>()
            .Where(x => x.Email == message.Email && x.ProductId == productId)
            .FirstOrDefault();

        if (trial == null)
            return;
        
        trial.EndsAt = DateTime.Today.AddDays(message.Days);
        Bus.Send(new NewTrial
        {
            ProductId = productId,
            Email = trial.Email,
            Company = trial.Company,
            FullName = trial.Name,
            TrackingId = trial.TrackingId
        });
    }
}

How do we test something like this? As it turns out, quite easily:

public class TrailTesting : ConsumersTests
{
    protected override void PrepareData(IDocumentSession session)
    {
        session.Store(new Trial
        {
            Email = "you@there.gov",
            EndsAt = DateTime.Today,
            ProductId = "products/nhprof"
        });
    }

    [Fact]
    public void Will_update_trial_date()
    {
        Consume<ExtendTrialLicenseConsumer, ExtendTrialLicense>(new ExtendTrialLicense
        {
            ProductId = "products/nhprof",
            Days = 30,
            Email = "you@there.gov",
        });

        using (var session = documentStore.OpenSession())
        {
            var trial = session.Load<Trial>(1);
            Assert.Equal(DateTime.Today.AddDays(30), trial.EndsAt);
        }
    }

    // more tests here
}

All the magic happens in the base class, though:

public abstract class ConsumersTests : IDisposable
{
    protected IDocumentStore documentStore;
    private IServiceBus Bus = new FakeBus();

    protected ConsumersTests()
    {
        documentStore = new EmbeddableDocumentStore
        {
            RunInMemory = true,
            Conventions =
                {
                    DefaultQueryingConsistency = ConsistencyOptions.QueryYourWrites
                }
        }.Initialize();

        IndexCreation.CreateIndexes(typeof(Products_Stats).Assembly, documentStore);

        Products.Create(documentStore);
        using (var session = documentStore.OpenSession())
        {
            PrepareData(session);
            session.SaveChanges();
        }
    }

    protected T ConsumeSentMessage<T>()
    {
        var fakeBus = ((FakeBus)Bus);
        object o = fakeBus.Messages.Where(x => x.GetType() == typeof(T)).First();

        fakeBus.Messages.Remove(o);
        return (T) o;
    }

    protected void Consume<TConsumer, TMsg>(TMsg msg)
        where TConsumer : ConsumerOf<TMsg>, new()
    {
        var foo = new TConsumer();

        using (var documentSession = documentStore.OpenSession())
        {
            Set(foo, documentSession);
            Set(foo, Bus);
            Set(foo, documentStore);

            foo.Consume(msg);

            documentSession.SaveChanges();
        }
    }

    private void Set<T,TValue>(T foo, TValue value)
    {
        PropertyInfo firstOrDefault = typeof(T).GetProperties().FirstOrDefault(x=>x.PropertyType==typeof(TValue));
        if (firstOrDefault == null) return;
        firstOrDefault.SetValue(foo, value, null);
    }

    protected virtual void PrepareData(IDocumentSession session)
    {
    }

    public void Dispose()
    {
        documentStore.Dispose();
    }
}

And here are the relevant details for the FakeBus implementation:

public class FakeBus : IServiceBus
{
    public List<object>  Messages = new List<object>();

    public void Send(params object[] messages)
    {
        Messages.AddRange(messages);
    }
}

Now, admittedly, this is a fairly raw approach and we can probably do better. This is basically hand crafted auto mocking for consumers, and I don’t like the Consume<TConsumer,TMsg>() syntax very much. But it works, it is simple and it doesn’t really get in the way.

I won’t say it is the way to go about it, but it is certainly easier than many other efforts that I have seen. You just need to handle the inputs & outputs and have a way to look at the local state, and you are pretty much done.

Structuring your Unit Tests, why?

I am a strong believer in automated unit tests. And I read this post by Phil Haack with part amusement and part wonder.

RavenDB currently has close to 1,400 tests in it. We routinely ask for failing tests from users and fix bugs by writing tests to verify fixes.

But structuring them in terms of source code? That seems to be very strange.

You can take a look at the source code layout of some of our tests here: https://github.com/ayende/ravendb/tree/master/Raven.Tests/Bugs

It is a dumping ground, basically, for tests. For the most part, I view tests as very important in telling me “does this !@#! work or not?” and that is about it. Spending a lot of time organizing them seems of little value from my perspective.

If I need to find a particular test, I have R# code browsing to help me, and if I need to find who is testing a piece of code, I can use Find References to get it.

In the end, it boils down to the fact that I don’t consider tests to be, by themselves, a value to the product. Their only value is their binary ability to tell me whether the product is okay or not. Spending a lot of extra time on the tests distracts from creating real value: shippable software.

What I do deeply care about with regards to structuring the tests is the actual structure of each test. It is important to make sure that all the tests look very much the same, because I should be able to look at any of them and figure out what is going on rapidly.

I am not going to use the RavenDB example, because that is system software and usually different from most business apps (although we use a similar approach there). Instead, here are a few tests from our new ordering system:

[Fact]
public void Will_send_email_after_trial_extension()
{
    Consume<ExtendTrialLicenseConsumer, ExtendTrialLicense>(new ExtendTrialLicense
    {
        ProductId = "products/nhprof",
        Days = 30,
        Email = "you@there.gov",
    });

    var email = ConsumeSentMessage<NewTrial>();

    Assert.Equal("you@there.gov", email.Email);
}

[Fact]
public void Trial_request_from_same_customer_sends_email()
{
    Consume<NewTrialConsumer, NewTrial>(new NewTrial
    {
        ProductId = "products/nhprof",
        Email = "who@is.there",
        Company = "h",
        FullName = "a",
        TrackingId = Guid.NewGuid()
    });
    Trial firstTrial;
    using (var session = documentStore.OpenSession())
    {
        firstTrial = session.Load<Trial>(1);
    }
    Assert.NotNull(ConsumeSentMessage<SendEmail>());
    
    Consume<NewTrialConsumer, NewTrial>(new NewTrial
    {
        TrackingId = firstTrial.TrackingId,
        Email = firstTrial.Email,
        Profile = firstTrial.ProductId.Substring("products/".Length)
    });

    var email = ConsumeSentMessage<SendEmail>();
    Assert.Equal("Hibernating Rhinos - Trials Agent", email.ReplyToDisplay);
}

As you can probably see, we have a structured way to send input to the system, and we can verify the output and the side effects (creating the trial, for example).

This leads to a system that can be easily tested, but doesn’t force us to spend too much time in the ceremony of tests.

Async tests in Silverlight

One of the things that we do is build a lot of stuff in Silverlight; usually, those things are either libraries or UI. Testing Silverlight was always a problem, but at least there is an OOTB solution for that.

Unfortunately, the moment you start talking about async tests (for example, you want to run a web server to check things), you need to use EnqueueCallback, EnqueueConditional and other constructs that make the test nearly impossible to read.

Luckily for us, Christopher Bennage stopped here for a while and created a solution.

It allows you to take the following sync test:

[Fact]
public void CanUpload()
{
    var ms = new MemoryStream();
    var streamWriter = new StreamWriter(ms);
    var expected = new string('a',1024);
    streamWriter.Write(expected);
    streamWriter.Flush();
    ms.Position = 0;

    var client = NewClient(); 
    client.UploadAsync("abc.txt", ms).Wait();

    var downloadString = webClient.DownloadString("/files/abc.txt");
    Assert.Equal(expected, downloadString);
}

And translate it to:

[Asynchronous]
public IEnumerable<Task> CanUpload()
{
    var ms = new MemoryStream();
    var streamWriter = new StreamWriter(ms);
    var expected = new string('a', 1024);
    streamWriter.Write(expected);
    streamWriter.Flush();
    ms.Position = 0;

    yield return client.UploadAsync("abc.txt", ms);

    var async = webClient.DownloadStringTaskAsync("/files/abc.txt");
    yield return async;

    Assert.AreEqual(expected, async.Result);
}

It makes things so much easier. To set this up, just reference the project and add the following in the App.xaml.cs file:

private void Application_Startup(object sender, StartupEventArgs e)
{
    UnitTestSystem.RegisterUnitTestProvider(new RavenCustomProvider());
    RootVisual = UnitTestSystem.CreateTestPage();
}

And you get tests that are now easy to write and run in Silverlight.

Cut the abstractions by putting test hooks

I have been hearing about testable code for a long time now. It looks somewhat like this, although I had to cut down on the number of interfaces along the way.

We go through a lot of contortions to be able to do something fairly simple: avoid hitting a data store in our tests.

Actually, that is inaccurate: we are putting a lot of effort into being able to do that without changing production code. There are even a lot of explanations of how testable code is decoupled, easy to change, etc.

In my experience, one common problem is that we put too much abstraction into our code. Sometimes it actually serves a purpose, but in many cases, it is just something that we do to enable testing. And we still pay the hit in design complexity anyway.

We can throw all of that away, and keep only what we need to run production code, but that would mean a harder time with the tests. We can resolve the issue very easily by making the infrastructure aware of testing, as in this example:

[Image: infrastructure code with an explicit test hook]
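The shape of such a hook is roughly the following. This is a sketch under my own assumptions; all names here are illustrative, not the actual code from the example above:

public class MessageDispatcher
{
    // Test hook: when set, messages are handed to the test synchronously
    // instead of going through the real async dispatch path.
    public static Action<object> DispatchingOverride;

    public void Dispatch(object message)
    {
        if (DispatchingOverride != null)
        {
            DispatchingOverride(message);
            return;
        }

        // ... the real production dispatch path goes here
    }
}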

But now your production code was changed by tests?!

Yes, it was. So? I never really got the problem people had with this; in this day and age, putting in the hooks to make testing easier just makes sense. Yes, you can go with the “let us add abstractions until we can do it” approach, but it is much cheaper and faster to go with this one.

Moreover, notice that this is part of the infrastructure code, which I don’t consider production code (you don’t touch it very often, although of course it has to be production ready), so I don’t have any issue with this.

Nitpicker corner: Let us skip the whole TypeMock discussion, shall we?

TDD fanatic corner: I don’t really care about testable code, I care about tested code. If I have a regression suite, that is just what I need.

Hey you, developer! Learn a bit of business speak!

In a comment to one of my posts, Dave Mertens said:

I often wish I had the luxury the throw away the current code and start from scratch. Why. Because my manager ask me how long it takes to fix just this bug. 80% of the bugs (luckily we don't have that many bugs) are fixed within 2 hours including writing tests. Than he asks me how long it takes to rewrite the damn code.

It is interesting to note that the questions you ask select the answers you want.

When you give someone a choice between 2 hours of fixing code vs. 3 days of fixing code, the decision is always going to be in favor of 2 hours.

At one point in my life I was asked if I stopped beating up people. This is a twist on the old “have you stopped beating your wife yet? yes/no” question. I answered in Mu. Then I spent another ten minutes explaining what I meant.

When someone asks me those types of questions, I correct the question, before I answer. Because while you may ask questions to lead the answer in a certain way, there is nothing that says that I must go that way.

The right question is: how often are we hitting bugs in this same code path? Then ask how long it would take to fix the whole damn thing.

Oh, and make sure that you can actually fix it. There is nothing a business likes less than spending time & money on solving a problem and ending up not solving it, leaving aside what it does to your credibility and the trust relationship with the business.

re: Are you smart enough to do without TDD

Daniel has posted a reply to my post, titling it: Are you smart enough to do without TDD. I more or less expected to get responses like that, which was why I was hesitant to post it. Contrary to popular opinion, I don’t really enjoy being controversial.

There are two main points that I object to in his post:

You see, Ayende appears to say that if you're smart enough, you'll just know what code to write, just like that. Ergo, if you don't know, maybe you're not that smart and hence you would need this technique for losers called Test Driven Design/Development.

That is not what I said, please don’t put words in my mouth. What I said was: “The idea behind TDD is to use the tests to drive the design. Well, in this case, I don’t have any design to drive.” Combine this with my concepts & features architecture, where the main tenet is: “A feature creation may not involve any design activity”, and it should be clear why TDD simply doesn’t work for my scenario.

And his attack on Rhino Mocks:

Moq vs Rhino Mocks: he [Ayende, it seems] read the (useless IMO) literature on mocks vs stubs vs fakes, had apparently a clear idea of what to do, and came up with Rhino's awkward, user unfriendly and hard to learn API with a myriad of concepts and options, and a record-replay-driven API (ok, I'm sure it was not his original idea, but certainly it's his impl.) which two years ago seemed to him to stand at the core of mocking. Nowadays not only he learned what I've been saying all along, that "dynamic, strict, partial and stub... No one cares", but also is planning to remove the record / playback API too.

This is just full of misinformation. Let me see how:

  • Rhino Mocks is 5 years old.
  • Rhino Mocks came out for .NET 1.0.
  • Rhino Mocks actually predate most of the mocks vs. stubs debate.

I keep Rhino Mocks updated as new concepts and syntax options come along. Yes, AAA is easier, but AAA relies on syntax options that we only got in C# 3.0. Rhino Mocks didn’t start from there, it started a lot earlier, and it is a testament to its flexibility that I was able to adapt it to every change along the way.

Oh, and Rhino Mocks was developed with TDD, fully. Still is, for that matter. So I find it annoying that someone attacks it on these grounds without really understanding how it worked.

Scenario Driven Tests

I originally titled this blog post: Separate the scenario under test from the asserts. I intentionally use the terminology scenario under test, instead of calling it class or method under test.

One of the main problems with unit testing is that we are torn between competing forces. One is the usual drive for abstraction and the eradication of duplication; the second is clarity of the test itself. Karl Seguin does a good job covering that conflict.

I deal with the issue by the simple expedient of forbidding anything but asserts in the test method. And no, I don’t mean something like BDD, where the code under test is set up in the constructor or the context initialization method.

I tend to divide my test code into four distinct parts:

  • Scenario under test
  • Scenario executor
  • Test model, representing the state of the application
  • The test code itself, asserting the result of a specific scenario on the test model

The problem is that a single scenario in the application may very well have multiple things that we want to test. Let us take the example of authenticating a user: there are several things that happen during the process of authentication, such as the actual authentication, updating the last login date, resetting bad login attempts, updating usage statistics, etc.

I am going to write the code to test all of those scenarios first, and then discuss the roles of each item in the list. I think it will be clearer to discuss it when you have the code in front of you.

We will start with the scenarios:

public class LoginSuccessfully : IScenario
{
    public void Execute(ScenarioContext context)
    {
        context.Login("my-user", "swordfish is a bad password");
    }
}

public class TryLoginWithBadPasswordTwice : IScenario
{
    public void Execute(ScenarioContext context)
    {
        context.Login("my-user", "bad pass");
        context.Login("my-user", "bad pass");
    }
}

public class TryLoginWithBadPasswordTwiceThenTryWithRealPassword : IScenario
{
    public void Execute(ScenarioContext context)
    {
        context.Login("my-user", "bad pass");
        context.Login("my-user", "bad pass");
        context.Login("my-user", "swordfish is a bad password");
    }
}

And a few tests that would show the common usage:

public class AuthenticationTests : ScenarioTests
{
    [Fact]
    public void WillUpdateLoginDateOnSuccessfulLogin()
    {
        ExecuteScenario<LoginSuccessfully>();

        Assert.Equal(CurrentTime, model.CurrentUser.LastLogin);
    }

    [Fact]
    public void WillNotUpdateLoginDateOnFailedLogin()
    {
        ExecuteScenario<TryLoginWithBadPasswordTwice>();

        Assert.NotEqual(CurrentTime, model.CurrentUser.LastLogin);
    }

    [Fact]
    public void WillUpdateBadLoginCountOnFailedLogin()
    {
        ExecuteScenario<TryLoginWithBadPasswordTwice>();

        Assert.Equal(2, model.CurrentUser.BadLoginCount);
    }

    [Fact]
    public void CanSuccessfullyLoginAfterTwoFailedAttempts()
    {
        ExecuteScenario<TryLoginWithBadPasswordTwiceThenTryWithRealPassword>();

        Assert.True(model.CurrentUser.IsAuthenticated);
    }
}

As you can see, each of the tests is pretty short and to the point, and there is a clear distinction between the scenario being exercised and what is being asserted.

Each scenario represents some action in the system whose behavior we want to verify. Those are usually written with the help of a scenario context (or something of the like) which gives the scenario access to the application services required to perform its work. An alternative to the scenario context is to use a container in the tests and supply the application service implementations from there.

The executor (the ExecuteScenario<TScenario>() method) is responsible for setting up the environment for the scenario, executing the scenario, and cleaning up afterward. It is also responsible for any updates necessary to bring the test model up to date.
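A sketch of what that executor could look like; the ScenarioContext members and the model refresh are assumptions about the shape of the code, not an actual implementation:

protected void ExecuteScenario<TScenario>()
    where TScenario : IScenario, new()
{
    using (var context = new ScenarioContext())
    {
        // set up the environment, run the scenario, then refresh the
        // test model so the asserts have something to look at
        new TScenario().Execute(context);
        model = context.GetTestModel();
    }
}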

The test model represents the state of the application after the scenario was executed. It is meant for the tests to assert against. In many cases, you can use the actual model from the application, but there are cases where you would want to augment that with test specific items, to allow easier testing.

And the tests, well, the tests simply execute a scenario and assert on the result.

By abstracting the execution of a scenario into the executor (which rarely changes) and providing an easy way of building scenarios, you can get very rapid feedback into the test cycle while maintaining testing at a high level.

Also, relating to my previous post, note that what we are testing here isn’t a single class. We are testing the system behavior in a given scenario. Note also that we usually want to assert on various aspects of a single scenario (such as in the WillNotUpdateLoginDateOnFailedLogin and WillUpdateBadLoginCountOnFailedLogin tests).

Even tests have got to justify themselves

Let us get a few things out of the way:

  • I am not using TDD.
  • I am not using BDD.
  • I am not using Test After.
  • I am not ignoring testing.

I considered not posting this, because of the likely response, but it is something that I think is worth at least discussing. The event that made me decide to post it is the following bug:

public bool IsValid
{
    get { return string.IsNullOrEmpty(Url); }
}

As you can probably guess, I have an inverted conditional here. The real logic is that the filter is valid if the Url is not empty, not the other way around.
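The fix, per that description, is just flipping the condition:

public bool IsValid
{
    get { return !string.IsNullOrEmpty(Url); }
}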

When I found the bug, I briefly considered writing a test for it, but it struck me as a bad decision. This is code that I don’t see any value in testing. It is too stupid to test, because I won’t get any ROI from the test. And yes, I am saying that even after seeing that the first time I wrote the code, it had a bug.

The idea behind TDD is to use the tests to drive the design. Well, in this case, I don’t have any design to drive. In recent years, I have moved away from the tenets of TDD toward a more system-oriented approach to testing.

I don’t care about testing a specific class, I want to test the entire system as a whole. I may switch some parts of the infrastructure (for example, change the DB to an in-memory one) for perf’s sake, but I usually try to test an entire component at a time.

My components may be as small as a single class or as big as the entire NH Prof, sans the actual UI pieces. I have posted in the past showing how I implement features for NH Prof, including the full source code for the relevant sections. Please visit the link; it will probably make more sense to you afterward. It is usually faster, easier and more convenient to write a system test than to try to figure out how to write a unit test for the code.

Now, let us look at why people are writing tests:

  • Higher quality code
  • Safe from regressions
  • Drive design

Well, as I said, I really like tests, but my method of designing software is no longer tied to a particular class. I have the design of the class handed to me by a higher authority (the concept), so that is out. Regressions are handled quite nicely by the tests that I do write.

What about the parts when I am doing design, when I am working on a new concept?

Well, there are two problems here:

  • I usually try several things before I settle on a final design. During this bit of churn, it is going to take longer to do things with tests.
  • After I have a design finalized, it is still easier to write a system level test than to write unit tests for the particular implementation.

As a matter of fact, in many cases I don’t really care about the implementation details of a feature; I just want to know that the feature works. As a good example, let us take a look at this test:

public class CanGetDurationOfQueries : IntegrationTestBase
{
    [Fact]
    public void QueriesSpecifyTheirDuration()
    {
        ExecuteScenarioInDifferentAppDomain<SelectBlogByIdUsingCriteria>();
        var first = model.RecentStatements
            .ExcludeTransactions()
            .First();
        Assert.NotNull(first.DurationViewModel.Inner.Value);
    }
}

NH Prof went through three different ways of measuring the duration of a query. The test didn’t need to change. I have a lot of tests that work in the same manner, specifying the final intent rather than each individual step.

There are some parts for which I would use Test First, usually parts that I have a high degree of uncertainty about. The “show rows from query” feature in NH Prof was developed using Test First, because I had absolutely no idea how to approach it.

But most of the time, I have a pretty good idea where I am and where I am going, and writing unit tests for every miniscule change is (for lack of a better phrase) hurting my style.

Just about any feature in NH Prof is covered in tests, and we are confident enough in our test coverage to release on every single commit.

But I think that even a test has got to justify its existence, and in many cases, I see people writing tests that have no real meaning. They duplicate the logic in a single class or method. But that isn’t what I usually care about. I don’t care about what a method or a class does.

I care about what the overall behavior is, and I shaped my tests to allow me to assert just that. I’ll admit that NH Prof is somewhat of a special case, since you have a more or less central location from which you can navigate to everything else. In most systems, you don’t have something like that.

But the same principle remains: if you set up your test environment so that you are testing the system, it is going to be much easier to test the system. It isn’t a circular argument. Let us take the simple example of an online shop, and wanting to test the “email on order confirmed” feature.

One way of doing this would be to write a test saying that when the OrderConfirmed message arrives, a SendEmail message is sent, and another to verify that the SendEmail message actually sends an email.

I would rather write something like this, however:

[Fact]
public void WillSendEmailOnOrderConfirmation()
{
    // setup the system using an in memory bus
    // load all endpoints and activate them
    // execute the given scenario
    ExecuteScenario<BuyProductUsingValidCreditCard>();

    var confirmation = model.EmailSender.EmailsToSend
        .FirstOrDefault(x => x.Subject.Contains("Order Confirmation"));
    Assert.NotNull(confirmation);
}

I don’t care about implementation, I just care about what I want to assert.

But I think that I am getting sidetracked onto another subject, so I’ll stop here and post about separating asserts from their scenarios at another time.

Assert.True is the tool of last resort

I was just reviewing someone’s code, and as I was browsing through the tests, I ran into the following anti-pattern; just about all the asserts were specified as:

Assert.True(pet.Owner == "Baz");

The problem with that is that this is a perfect case of “let’s create a test that will tell us nothing when it is failing”.

It seems like a minor thing, to use the more explicit version:

Assert.Equal("Baz", pet.Owner);

But it isn’t: when this assert fails, we are going to get a lovely error message telling us exactly what went wrong. From a maintainability standpoint, being able to see why a test failed without debugging it is not a plus, it is essential.
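Roughly, the difference in failure output looks like this (the exact formatting varies by xUnit version, but the shape holds):

Assert.True(pet.Owner == "Baz"):
    Assert.True() Failure
    Expected: True
    Actual:   False

Assert.Equal("Baz", pet.Owner):
    Assert.Equal() Failure
    Expected: Baz
    Actual:   Bar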

Assert.True is a tool of last resort, don’t use it if you have any choice at all.

Fixing test failures using binary search

With Rhino Queues, I had a very serious problem. While all the tests would pass if they were run individually, running them together would cause them to fail. That is a pretty common case of some test affecting the global state of the system (in fact, the problem is one test not releasing resources, and all the other tests failing because they tried to access a locked resource).

The problem is finding out which one. By the way, note that I am making a big assumption here, that there is only one test that does so; but for the purposes of this discussion it doesn’t matter, the technique works either way.

At the heart of it, we have the following assumption:

If you have tests that cause failure, and you don’t know which, start by deleting tests.

In practice, it means:

  1. Run all unit tests
  2. If the unit tests failed:
    • Delete a unit test file with all its tests
    • Go to first step
  3. If the unit tests passed:
    • Revert the deletion of one unit test file, in reverse order of deletion
    • Go to step 1

When you have only a single file, you have pinpointed the problem.
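In the abstract, the loop looks like this (a sketch only; the post describes doing this by hand, and suitePassesWith stands in for an actual build-and-test run):

// Drop one test file at a time until the suite passes; the last file
// dropped is the prime suspect.
string FindSuspect(List<string> testFiles, Func<IEnumerable<string>, bool> suitePassesWith)
{
    var remaining = new List<string>(testFiles);
    foreach (var file in testFiles)
    {
        remaining.Remove(file);
        if (suitePassesWith(remaining))
            return file;
    }
    return null; // more than one offender, or the failure lies elsewhere
}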

In my case, the unit tests started to consistently pass when I deleted all of these (I took a shortcut and deleted entire directories):

[Image: the set of deleted test files and directories]

So I re-introduced UsingSubQueues, and… the tests passed.

So I re-introduced Storage/CanUseQueue and… the tests passed.

This time, I re-introduced Errors, because I had a feeling that QueueIsAsync might cause that, and… the tests FAILED.

We have found the problem. Looking at the Errors file, it was clear that it wasn’t disposing resources cleanly, and that was a very simple fix.

The problem wasn’t the actual fix, the problem was isolating it.

With the fix in place… the tests passed. So I re-introduced all the other tests and ran the full build again.

Everything worked, and I was on my way :-)

Challenge: multi-threaded test failure

On one of my machines, I can consistently get the following test failure, because of an exception thrown here:

[Image: the code throwing the exception that fails the test]

As you can see, we get an ObjectDisposedException here, which causes the test to fail. There is only one problem with that:

[Image: the Dispose method that disposes the queue manager]

We only dispose of the queue manager in the Dispose method, and that can only be called after the test itself has completed.

On the face of it, it looks like I am doing things correctly, but there is a bug here. The CLR’s threading semantics are not broken, even though at first glance it appears so.

As a side note, people keep asking me how I blog so much. This post is a perfect example. When I started writing this blog post, I was truly stumped. By the time I finished, I had already noticed where the bug was and fixed it.

Can you tell?

Don’t mock my integration tests

Dror from TypeMock has managed to capture the essence of my post about unit tests vs. integration tests quite well:

Unit tests runs faster but integration tests are easier to write.

Unfortunately, he draws an incorrect conclusion from that:

There is however another solution instead of declaring unit testing as hard to write - use an isolation framework which make writing unit test much easier.

And the answer to that is… no, you can’t do that. Doing something like that puts you back in the onerous position of unit tests, where you actually have to understand exactly what is going on and have to deal with that. With an integration test, you can assert directly on the end result, which is completely different from what I would have to do if I wanted to mock a part of the system. A typical integration test looks something like:

public class SelectNPlusOne : IScenario
{
    public void Execute(ISessionFactory factory)
    {
        using (var session = factory.OpenSession())
        using (var tx = session.BeginTransaction())
        {
            var blog = session.CreateCriteria(typeof(Blog))
                .SetMaxResults(1)
                .UniqueResult<Blog>();//1
            foreach (var post in blog.Posts)//2
            {
                Console.WriteLine(post.Comments.Count);// SELECT N
            }
            tx.Commit();
        }
    }
}


[Fact]
public void AllCallsToLoadCommentsAreIdentical()
{
    ExecuteScenarioInDifferentAppDomain<SelectNPlusOne>();
    var array = model.Sessions[0]
        .SessionStatements
        .ExcludeTransactions()
        .Skip(2)
        .ToArray();
    Assert.Equal(10, array.Length);
    foreach (StatementModel statementModel in array)
    {
        Assert.Equal(statementModel.RawSql, array[0].RawSql);
    }
}

This shows how we can assert on the actual end model of the system, without really trying to deal with what is actually going on. You cannot introduce any mocking into the mix without significantly hurting the clarity of the code.

Unit tests vs. Integration tests

I can sum it up pretty easily:

Unit tests:
65 passed, 0 failed, 5 skipped, took 7.49 seconds.

Integration tests:
45 passed, 0 failed, 0 skipped, took 284.43 seconds.

Except, there is one thing that you aren’t seeing here. Writing an integration test is about as straightforward as they come. Execute the user scenario, assert the result. There is no real knowledge required to write the test, and they are very easy to read and understand.

Unit tests, however, often require you to understand how the software works in much finer detail, and they often make no sense to someone who isn’t familiar with the software.

NHibernate Unit Testing

When using NHibernate, we generally want to test only three things: that properties are persisted, that cascades work as expected, and that queries return the correct result. In order to do any of those, we generally have to talk to a real database; trying to fake any of this at that level is futile and going to be very complicated.

We can either use a standard RDBMS or an in-memory database such as SQLite in order to get very speedy tests.

I have a pretty big implementation of a base class for unit testing NHibernate in Rhino Commons, but it has so many features that I sometimes forget how to use it. Most of those features, by the way, are now null & void because we have NH Prof, and can easily see what is going on without resorting to the SQL Profiler.

At any rate, here is a very simple implementation of that base class, which gives us the ability to execute NHibernate tests in memory.

public class InMemoryDatabaseTest : IDisposable
{
	private static Configuration Configuration;
	private static ISessionFactory SessionFactory;
	protected ISession session;

	public InMemoryDatabaseTest(Assembly assemblyContainingMapping)
	{
		if (Configuration == null)
		{
			Configuration = new Configuration()
				.SetProperty(Environment.ReleaseConnections,"on_close")
				.SetProperty(Environment.Dialect, typeof (SQLiteDialect).AssemblyQualifiedName)
				.SetProperty(Environment.ConnectionDriver, typeof(SQLite20Driver).AssemblyQualifiedName)
				.SetProperty(Environment.ConnectionString, "data source=:memory:")
				.SetProperty(Environment.ProxyFactoryFactoryClass, typeof (ProxyFactoryFactory).AssemblyQualifiedName)
				.AddAssembly(assemblyContainingMapping);

			SessionFactory = Configuration.BuildSessionFactory();
		}

		session = SessionFactory.OpenSession();

		new SchemaExport(Configuration).Execute(true, true, false, true, session.Connection, Console.Out);
	}

	public void Dispose()
	{
		session.Dispose();
	}
}

This just sets up the in-memory database and the mappings, and creates a session which we can now use. (Note that ReleaseConnections is set to “on_close” because a SQLite :memory: database only lives as long as the connection that created it.) Here is how we use this base class:

public class BlogTestFixture : InMemoryDatabaseTest
{
	public BlogTestFixture() : base(typeof(Blog).Assembly)
	{
	}

	[Fact]
	public void CanSaveAndLoadBlog()
	{
		object id;

		using (var tx = session.BeginTransaction())
		{
			id = session.Save(new Blog
			{
				AllowsComments = true,
				CreatedAt = new DateTime(2000,1,1),
				Subtitle = "Hello",
				Title = "World",
			});

			tx.Commit();
		}

		session.Clear();


		using (var tx = session.BeginTransaction())
		{
			var blog = session.Get<Blog>(id);

			Assert.Equal(new DateTime(2000, 1, 1), blog.CreatedAt);
			Assert.Equal("Hello", blog.Subtitle);
			Assert.Equal("World", blog.Title);
			Assert.True(blog.AllowsComments);

			tx.Commit();
		}
	}
}

Pretty simple, ah?

Relying on hash code implementation is BAD – part II

To be truthful, I never thought that I would write a follow-up to this post 4 years later, but I ran into this today.

The following is a part of an integration test for NH Prof:

Assert.AreEqual(47, alerts[new StatementAlert(new NHProfDispatcher())
{
	Title = "SELECT N+1"
}]);

I am reviewing all our tests now, and I nearly choked on that one. I mean, who was stupid enough to write code like this?  I mean, yes, I can understand what it is doing, sort of, but only because I have a dawning sense of horror when looking at it.

I immediately decided that the miscreant who wrote that piece of code should be publicly humiliated and chewed on by a large dog.

SVN Blame is a wonderful thing, isn’t it?

[Image: SVN blame output for the offending lines]

Hm… there is a problem here.

Actually, there are a couple of problems here. One is that we have a pretty clear indication of a historic artifact. Just look at the number of versions shown in just this small blame window. That is good enough reason to start doing a full-fledged ancestry inspection. The test started life as:

[TestFixture]
public class AggregatedAlerts:IntegrationTestBase
{
	[Test]
	public void Can_get_aggregated_alerts_from_model()
	{
		ExecuteScenarioInDifferentAppDomain<Scenarios.ExecutingTooManyQueries>();

		var alerts = observer.Model.Sessions[1].AggregatedAlerts;
		Assert.AreEqual(47, alerts["SELECT N+1"]);
		Assert.AreEqual(21, alerts["Too many database calls per session"]);
	}
}

Which I think is reasonable enough. Unfortunately, it looks like somewhere along the way, someone took the big hammer approach to it. The code now looks like this:

Assert.AreEqual(47, alerts.First(x => x.Key.Title == "SELECT N+1").Value);

Now this is readable.

Oh, for the nitpickers: using hash code evaluation as the basis of any sort of logic is wrong. That is the point of this post. It is a non-obvious side effect that will byte* you in the ass.

* intentional misspelling
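For completeness, the only reason the original dictionary lookup could work at all is that StatementAlert presumably overrides Equals and GetHashCode, along these (illustrative) lines:

public override bool Equals(object obj)
{
    var other = obj as StatementAlert;
    return other != null && Title == other.Title;
}

public override int GetHashCode()
{
    // Any change to how this hash is computed silently breaks every
    // lookup that depends on it, which is exactly the trap above.
    return Title == null ? 0 : Title.GetHashCode();
}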

The difference between Infrastructure & Application

Recently I find myself writing more and more infrastructure level code. There are several reasons for that, mostly because the architectural approaches that I advocate don’t have good enough infrastructure in the environments that I usually work with.

Writing infrastructure is both fun & annoying. It is fun because you usually don’t have business rules to deal with; it is annoying because it takes time to get it to do something that will give the business some real value.

That said, there are some significant differences between writing application level code and infrastructure level code. For that matter, I usually think about this as:

[Image: the application, framework, and infrastructure layers]

Infrastructure code is usually at the base; it provides basic services such as communication, storage, thread management, etc. It should also provide strong guarantees regarding what it is doing; it should be simple, understandable, and provide the hooks to understand what happens when things go wrong.

Framework code sits on top of the infrastructure and provides easy to use semantics on top of it. Frameworks usually take away some of the options that the infrastructure gives you in order to present a more focused solution for a particular scenario.

App code is even more specific than that, making use of the underlying framework to deal with much of the complexities that we have to deal with.

Writing application code is easy; it is single purpose code. Writing framework and infrastructure code is harder; they have much wider applicability.

So far, I don’t believe that I said anything new.

What is important to understand is that practices that work for application level code do not necessarily work for infrastructure code. A good example would be this nasty bit of work. It doesn’t read very well, it has some really long methods, and… it handles a lot of important infrastructure concerns that you have to deal with. For example, it is completely async, has good error handling and reporting, and it has absolutely no knowledge about what exactly it is doing. That is left for higher level pieces of the code. Trying to apply application code level practices to that will not really work: different constraints and different requirements.

By the same token, testing such code follows a different pattern than testing application level code. Tests are often more complex, requiring more behavior in the test to reproduce real world scenarios. And the tests can rarely be isolated bits; they usually have to include significant pieces of the infrastructure. And what they test can be complex enough as well.

Different constraints and different requirements.

Test refactoring

I just posted about a horribly complicated test, I thought I might as well share the results of its refactoring:

[TestFixture]
public class IndexedEmbeddedAndCollections : SearchTestCase
{
	private Author a;
	private Author a2;
	private Author a3;
	private Author a4;
	private Order o;
	private Order o2;
	private Product p1;
	private Product p2;
	private ISession s;
	private ITransaction tx;

	protected override IList Mappings
	{
		get
		{
			return new string[]
			{
				"Embedded.Tower.hbm.xml",
				"Embedded.Address.hbm.xml",
				"Embedded.Product.hbm.xml",
				"Embedded.Order.hbm.xml",
				"Embedded.Author.hbm.xml",
				"Embedded.Country.hbm.xml"
			};
		}
	}

	protected override void OnSetUp()
	{
		base.OnSetUp();

		a = new Author();
		a.Name = "Voltaire";
		a2 = new Author();
		a2.Name = "Victor Hugo";
		a3 = new Author();
		a3.Name = "Moliere";
		a4 = new Author();
		a4.Name = "Proust";

		o = new Order();
		o.OrderNumber = "ACVBNM";

		o2 = new Order();
		o2.OrderNumber = "ZERTYD";

		p1 = new Product();
		p1.Name = "Candide";
		p1.Authors.Add(a);
		p1.Authors.Add(a2); //be creative

		p2 = new Product();
		p2.Name = "Le malade imaginaire";
		p2.Authors.Add(a3);
		p2.Orders.Add("Emmanuel", o);
		p2.Orders.Add("Gavin", o2);


		s = OpenSession();
		tx = s.BeginTransaction();
		s.Persist(a);
		s.Persist(a2);
		s.Persist(a3);
		s.Persist(a4);
		s.Persist(o);
		s.Persist(o2);
		s.Persist(p1);
		s.Persist(p2);
		tx.Commit();

		tx = s.BeginTransaction();

		s.Clear();
	}

	protected override void OnTearDown()
	{
		// Tidy up
		s.Delete("from System.Object");

		tx.Commit();

		s.Close();

		base.OnTearDown();
	}

	[Test]
	public void CanLookupEntityByValueOfEmbeddedSetValues()
	{
		IFullTextSession session = Search.CreateFullTextSession(s);

		QueryParser parser = new MultiFieldQueryParser(new string[] { "name", "authors.name" }, new StandardAnalyzer());

		Lucene.Net.Search.Query query = parser.Parse("Hugo");
		IList result = session.CreateFullTextQuery(query).List();
		Assert.AreEqual(1, result.Count, "collection of embedded (set) ignored");
	}

	[Test]
	public void CanLookupEntityByValueOfEmbeddedDictionaryValue()
	{
		IFullTextSession session = Search.CreateFullTextSession(s);
		
		//PhraseQuery
		TermQuery  query = new TermQuery(new Term("orders.orderNumber", "ZERTYD"));
		IList result = session.CreateFullTextQuery(query).List();
		Assert.AreEqual(1, result.Count, "collection of untokenized ignored");
		query = new TermQuery(new Term("orders.orderNumber", "ACVBNM"));
		result = session.CreateFullTextQuery(query).List();
		Assert.AreEqual(1, result.Count, "collection of untokenized ignored");
	}

	[Test]
	[Ignore]
	public void CanLookupEntityByUpdatedValueInSet()
	{
		Product p = s.Get<Product>(p1.Id);
		p.Authors.Add(s.Get<Author>(a4.Id));
		tx.Commit();

		QueryParser parser = new MultiFieldQueryParser(new string[] { "name", "authors.name" }, new StandardAnalyzer());
		IFullTextSession session = Search.CreateFullTextSession(s);
		Query query = parser.Parse("Proust");
		IList result = session.CreateFullTextQuery(query).List();
		//HSEARCH-56
		Assert.AreEqual(1, result.Count, "update of collection of embedded ignored");

	}
}

It is almost the same as before, the changes are mainly structural, but it is so much easier to read, understand and debug.

A horrible, HORRIBLE, horrible test

I think, without question, that this is one of the most horrible, nasty tests that I have seen. It has been ported as-is from the Java code, and you can literally feel the nastiness in reading it:

[Test]
public void IndexedEmbeddedAndCollections()
{
	Author a = new Author();
	a.Name = "Voltaire";
	Author a2 = new Author();
	a2.Name = "Victor Hugo";
	Author a3 = new Author();
	a3.Name = "Moliere";
	Author a4 = new Author();
	a4.Name = "Proust";

	Order o = new Order();
	o.OrderNumber = "ACVBNM";

	Order o2 = new Order();
	o2.OrderNumber = "ZERTYD";

	Product p1 = new Product();
	p1.Name = "Candide";
	p1.Authors.Add(a);
	p1.Authors.Add(a2); //be creative

	Product p2 = new Product();
	p2.Name = "Le malade imaginaire";
	p2.Authors.Add(a3);
	p2.Orders.Add("Emmanuel", o);
	p2.Orders.Add("Gavin", o2);


	ISession s = OpenSession();
	ITransaction tx = s.BeginTransaction();
	s.Persist(a);
	s.Persist(a2);
	s.Persist(a3);
	s.Persist(a4);
	s.Persist(o);
	s.Persist(o2);
	s.Persist(p1);
	s.Persist(p2);
	tx.Commit();

	s.Clear();

	IFullTextSession session = Search.CreateFullTextSession( s );
	tx = session.BeginTransaction();

	QueryParser parser = new MultiFieldQueryParser( new string[] { "name", "authors.name" }, new StandardAnalyzer() );

	Lucene.Net.Search.Query query = parser.Parse( "Hugo" );
	IList result = session.CreateFullTextQuery( query ).List();
	Assert.AreEqual( 1, result.Count, "collection of embedded ignored" );

	//update the collection
	Product p = (Product) result[0];
	p.Authors.Add( a4 );

	//PhraseQuery
	query = new TermQuery( new Term( "orders.orderNumber", "ZERTYD" ) );
	result = session.CreateFullTextQuery( query).List();
	Assert.AreEqual( 1, result.Count, "collection of untokenized ignored" );
	query = new TermQuery( new Term( "orders.orderNumber", "ACVBNM" ) );
	result = session.CreateFullTextQuery( query).List();
	Assert.AreEqual( 1, result.Count, "collection of untokenized ignored" );

	tx.Commit();

	s.Clear();

	tx = s.BeginTransaction();
	session = Search.CreateFullTextSession( s );
	query = parser.Parse( "Proust" );
	result = session.CreateFullTextQuery( query ).List();
	//HSEARCH-56
	Assert.AreEqual( 1, result.Count, "update of collection of embedded ignored" );

	// Tidy up
	s.Delete(a);
	s.Delete(a2);
	s.Delete(a3);
	s.Delete(a4);
	s.Delete(o);
	s.Delete(o2);
	s.Delete(p1);
	s.Delete(p2);
	tx.Commit();

	s.Close();
}

Points to anyone who wants to start a list of how many good test metrics this test violates.

Oh, and right now, this test fails.

Mocking NHibernate

My recent post caused quite a few comments, many of them about two major topics. The first is the mockability of my approach and the second is regarding the separation of layers.

This post is about the first concern. I don’t usually mock my database when using NHibernate; I use an in-memory database and leave it at that. There are usually two common behaviors for loading data from the database. The first is when you need to load by primary key, usually using either Get or Load; the second is using a query.

If we are using Get or Load, this is extremely easy to mock, so I won’t touch that any further.
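For illustration, stubbing a Get is a one-liner with Rhino Mocks’ AAA syntax (the entity and id here are made up):

var session = MockRepository.GenerateStub<ISession>();
session.Stub(x => x.Get<Blog>(1)).Return(new Blog { Title = "World" });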

If we are using a query, I categorically don’t want to mock that. Querying is a business concern, and should be treated as such. Setting up an in-memory database to be used with NHibernate is easy; you’ll have to wait until the 28th for the post about that to be published, but it is basically ten lines of code that you stick in a base class.

As such, I have no real motivation to try to abstract that away, I gain a lot, and lose nothing.

Opening seams for testing

While testing Rhino Service Bus, I ran into several pretty annoying issues. The most consistent one is that the actual work done by the bus happens on another thread, so we have to have synchronization mechanisms built into the bus just so we can get consistent tests.

In some tests, this is not really needed, because I can utilize the existing synchronization primitives in the platform. Here is a good example of that:

   1: [Fact]
   2: public void when_start_load_balancer_that_has_secondary_will_start_sending_heartbeats_to_secondary()
   3: {
   4:     using (var loadBalancer = container.Resolve<MsmqLoadBalancer>())
   5:     {
   6:         loadBalancer.Start();
   7:  
   8:         Message peek = testQueue2.Peek();
   9:         object[] msgs = container.Resolve<IMessageSerializer>().Deserialize(peek.BodyStream);
  10:  
  11:         Assert.IsType<HeartBeat>(msgs[0]);
  12:         var beat = (HeartBeat)msgs[0];
  13:         Assert.Equal(loadBalancer.Endpoint.Uri, beat.From);
  14:     }
  15: }

Here, the synchronization happens in line 8: Peek() will wait until a message arrives in the queue, so we don’t need to manage that ourselves.

This is not always possible, however, and this actually breaks down for more complex cases. For example, let us inspect this test:

   1: [Fact]
   2: public void Can_ReRoute_messages()
   3: {
   4:     using (var bus = container.Resolve<IStartableServiceBus>())
   5:     {
   6:         bus.Start();
   7:         var endpointRouter = container.Resolve<IEndpointRouter>();
   8:         var original = new Uri("msmq://foo/original");
   9:  
  10:         var routedEndpoint = endpointRouter.GetRoutedEndpoint(original);
  11:         Assert.Equal(original, routedEndpoint.Uri);
  12:  
  13:         var wait = new ManualResetEvent(false);
  14:         bus.ReroutedEndpoint += x => wait.Set();
  15:  
  16:         var newEndPoint = new Uri("msmq://new/endpoint");
  17:         bus.Send(bus.Endpoint,
  18:                  new Reroute
  19:                  {
  20:                      OriginalEndPoint = original,
  21:                      NewEndPoint = newEndPoint
  22:                  });
  23:  
  24:         wait.WaitOne();
  25:         routedEndpoint = endpointRouter.GetRoutedEndpoint(original);
  26:         Assert.Equal(newEndPoint, routedEndpoint.Uri);
  27:     }
  28: }

Notice that we are doing explicit synchronization in the test, in lines 14 and 24. ReroutedEndpoint is an event that we added for the express purpose of allowing us to write this test.

I remember, several years ago, the big debates on whether it is okay to change your code to make it more testable. I haven’t heard this issue raised in a while; I guess the argument was decided.

As a side note, in order to get rerouting to work, we had to change the way that Rhino Service Bus viewed endpoints. That was a very invasive change, and we did it in less than two hours, simply by making the change and fixing the tests where they broke.