Ayende @ Rahien

It's a girl

Sometimes I have code blinders on

This is a piece of code that I am using in RDB, at some point, it threw a null reference exception:

image

I am ashamed to admit that I started doing some really deep debugging to understand the bug (this happens only under very strange circumstances).

When I figured out what it was, I was deeply ashamed; this one is easy.

Actual scenario testing with Raven

Yesterday I posted about doing scenario testing with Raven, and I showed the concept of what I am doing. This time, I want to show what I am actually talking about, and how it is implemented. Here are the current scenarios for Raven.

image

Each scenario looks something like this (showing PutAndGetDocument here):

image

And the second request:

image

The scenarios are being picked up using:

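using System.Collections.Generic;
using System.IO;
using Xunit.Extensions; // assuming xUnit 1.x, where [Theory] and [PropertyData] live

public class AllScenariosWithoutExplicitScenario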
{
    [Theory]
    [PropertyData("ScenariosWithoutExplicitScenario")]
    public void Execute(string file)
    {
        new Scenario(Path.Combine(ScenariosPath, file+".saz")).Execute();
    }

    public static string ScenariosPath
    {
        get
        {
            return Directory.Exists(@"..\..\bin") // running in VS
                       ? @"..\..\Scenarios" : @"..\Raven.Scenarios\Scenarios";
        }
    }

    public static IEnumerable<object[]> ScenariosWithoutExplicitScenario
    {
        get
        {
            foreach (var file in Directory.GetFiles(ScenariosPath,"*.saz"))
            {
                if (typeof(Scenario).Assembly.GetType("Raven.Scenarios." +
                          Path.GetFileNameWithoutExtension(file) +"Scenario") != null)
                    continue;
                yield return new object[] {Path.GetFileNameWithoutExtension(file)};
            }
        }
    }
}

There are two reasons why I skip scenarios that have an explicit class here: adding a class for a specific scenario allows me to run that scenario in the debugger, and also allows me to selectively skip certain scenarios if I need to.

Scenario.Execute is fairly involved: it parses the Fiddler saz file, builds the appropriate requests, and compares each actual response to the expected one. It is also smart enough to handle changing values like ETags and pass them along.

The end result is that I can very easily add new scenarios as I add new features that require tests.

Is select (System.Uri) broken?

I can’t really figure out what is going on!

Take a look:

image

The value:

http://localhost:58080/indexes/categoriesByName?query=CategoryName%3ABeverages&start=0&pageSize=25

And the problem is that I can’t figure out why calling this once would fail, but calling it the second time would work. That is leaving aside the fact that this looks like a pretty good url to me.

Any ideas? This is perfectly reproducible on one project, but I can’t reproduce this on another project.

Updates:

  • This is System.Uri
  • The issue is that it fails the first time, and works the second!
  • The exception is:
  • System.ArgumentNullException: Value cannot be null.
    Parameter name: str
       at System.Security.Permissions.FileIOPermission.HasIllegalCharacters(String[] str)
       at System.Security.Permissions.FileIOPermission.AddPathList(FileIOPermissionAccess access, AccessControlActions control, String[] pathListOrig, Boolean checkForDuplicates, Boolean needFullPath, Boolean copyPathList)
       at System.Security.Permissions.FileIOPermission..ctor(FileIOPermissionAccess access, String path)
       at System.Uri.ParseConfigFile(String file, IdnScopeFromConfig& idnStateConfig, IriParsingFromConfig& iriParsingConfig)
       at System.Uri.GetConfig(UriIdnScope& idnScope, Boolean& iriParsing)
       at System.Uri.InitializeUriConfig()
       at System.Uri.InitializeUri(ParsingError err, UriKind uriKind, UriFormatException& e)
       at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
       at System.Uri..ctor(String uriString)
       at Raven.Scenarios.Scenario.GetUri_WorkaroundForStrangeBug(String uriString) in C:\Work\ravendb\Raven.Scenarios\Scenario.cs:line 155

  • This is a console application.

Okay, I can reproduce this now. Here is how I got there:

public class Strange : MarshalByRefObject
{
    public void WTF()
    {
        Console.WriteLine(AppDomain.CurrentDomain.SetupInformation.ConfigurationFile);
        new Uri("http://localhost:58080/indexes/categoriesByName?query=CategoryName%3ABeverages&start=0&pageSize=25");
    }
}

public class Program
{
    private static void Main()
    {
        var instanceAndUnwrap = (Strange) AppDomain.CreateDomain("test", null, new AppDomainSetup
        {
            ConfigurationFile = ""
        }).CreateInstanceAndUnwrap("ConsoleApplication5", "ConsoleApplication5.Strange");
        instanceAndUnwrap.WTF();
    }
}

That took some time to figure out.

The reason that I got this issue is that I am running this code as part of a unit test, and xUnit seems to be running my code under exactly these conditions: in a separate AppDomain whose ConfigurationFile is an empty string.
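Since the failure only shows up on the first call, one way to live with it is to retry once. The stack trace above mentions a method named GetUri_WorkaroundForStrangeBug, but its body is not shown in this post, so what follows is only a guess at what such a workaround might look like. The class name UriWorkaround is made up for the example.

```csharp
using System;

// Hypothetical helper, not the actual code from Raven.Scenarios.
public static class UriWorkaround
{
    // In an AppDomain with an empty ConfigurationFile, the first Uri
    // construction was observed to throw ArgumentNullException while Uri
    // reads its configuration, and the second attempt was observed to work.
    public static Uri GetUri(string uriString)
    {
        try
        {
            return new Uri(uriString);
        }
        catch (ArgumentNullException)
        {
            // Retry once; if this also fails, let the exception propagate.
            return new Uri(uriString);
        }
    }
}
```

Calling UriWorkaround.GetUri(url) instead of new Uri(url) keeps the retry in a single place, which makes it easy to delete once the underlying cause is removed.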

Scenario based testing in Rhino DivanDB

Here is a unit test testing Rhino DivanDB:

image

Here is a test that tests the same thing, using scenario based approach:

image

What are those strange files? Well, let us take a peek at the first pair:

0_PutDoc.request:

PUT /docs HTTP/1.1
Content-Length: 283

{
    "_id": "ayende",
    "email": "ayende@ayende.com",
    "projects": [
        "rhino mocks",
        "nhibernate",
        "rhino service bus",
        "rhino divan db",
        "rhino persistent hash table",
        "rhino distributed hash table",
        "rhino etl",
        "rhino security",
        "rampaging rhinos"
    ]
}

0_PutDoc.response:

HTTP/1.1 201 Created
Connection: close
Content-Length: 15
Content-Type: application/json; charset=utf-8
Date: Sat, 27 Feb 2010 08:12:08 GMT
Server: Kayak

{"id":"ayende"}

Those are just plain text files, corresponding to the request and the expected response.

RDB turns those into tests by issuing each request in turn and asserting on the actual output. This is slightly more complicated than it seems, because some requests contain things like dates or generated guids. The scenario runner is aware of those and resolves them automatically. Another issue is dealing with potentially stale results, especially because we are issuing requests on the same data immediately. Again, this is something that the scenario runner handles internally, and we don’t have to worry about it.
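To give a feel for the dates and guids problem, here is a minimal sketch of one way a runner could compare responses: replace the volatile parts with placeholders on both sides before comparing. This is not the actual scenario runner. The ResponseNormalizer class and its method names are invented for the example, and it only knows about the Date header and guids.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the real scenario runner.
public static class ResponseNormalizer
{
    // Matches an HTTP Date header line, e.g. "Date: Sat, 27 Feb 2010 08:12:08 GMT".
    private static readonly Regex DateHeader = new Regex(
        @"^Date:[^\r\n]*", RegexOptions.Multiline | RegexOptions.IgnoreCase);

    // Matches guids in the usual 8-4-4-4-12 hexadecimal form.
    private static readonly Regex GuidPattern = new Regex(
        @"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    // Replaces values that change on every run with fixed placeholders.
    public static string Normalize(string response)
    {
        var withoutDates = DateHeader.Replace(response, "Date: <date>");
        return GuidPattern.Replace(withoutDates, "<guid>");
    }

    // Two responses match if they are identical once volatile values are masked.
    public static bool Matches(string expected, string actual)
    {
        return Normalize(expected) == Normalize(actual);
    }
}
```

With this, a recorded response from February still matches a live response produced today, as long as everything except the date and any guids is the same.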

There are some things here that may not be immediately apparent. We are doing pure state-based testing; in fact, this is black box testing. The scenarios define the external API of the system, which is a nice addition.

We don’t care about the actual implementation. Look at the unit test: we need to set up a db instance, start the background threads, etc. If I modify the DocumentDatabase constructor, or the initialization process, I need to touch each test that uses it. I can try to encapsulate that, but in many cases, you really can’t do that upfront. SpinBackgroundWorkers, for example, is something that is required in only some of the unit tests, and it is a late addition. So I would have to go and add it to each of the tests that require it.

Because the scenarios don’t have any intrinsic knowledge about the server, any required change is something that you would have to do in a single location, nothing more.

Users can send me failure scenarios. I am using this extensively with NH Prof (InitializeOfflineLogging), and it is amazing. When a user runs into a problem, I can tell them, please send me a Fiddler trace of the issue, and I can turn that into a repeatable test in a matter of moments.

I actually thought about using Fiddler’s saz files as the format for my scenarios, but I would have to actually understand them first. :-) It doesn’t look hard, but flat files seemed easier still.

Actually, I went ahead and made the modification, because now I have even less friction: just record a Fiddler session, drop it in a folder, and I have a test. It turned out that the Fiddler format is very easy to work with.

Challenge: Robust enumeration over external code

Here is an interesting little problem:

public class Program
{
    private static void Main()
    {
        foreach (int i in RobustEnumerating(Enumerable.Range(0, 10), FaultyFunc))
        {
            Console.WriteLine(i);
        }
    }

    public static IEnumerable<T> RobustEnumerating<T>(
        IEnumerable<T> input,Func<IEnumerable<T>, IEnumerable<T>> func)
    {
        // how to do this?
        return func(input);

    }

    public static IEnumerable<int> FaultyFunc(IEnumerable<int> source)
    {
        foreach (int i in source)
        {
            yield return i/(i%2);
        }
    }
}

This code should not throw, but print:

1
3
5
7
9

Can you make this happen? You can only change the RobustEnumerating method, nothing else in the code.
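One possible approach, and not necessarily the intended answer, is to feed the function one item at a time and swallow whatever it throws for that item. This only works under the assumption that func treats each item independently, as FaultyFunc does. A function that needs to see the whole sequence (sorting, aggregating) would behave differently. The class name Robust is just a wrapper for the sketch, and FaultyFunc is copied unchanged so the block stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Robust
{
    public static IEnumerable<T> RobustEnumerating<T>(
        IEnumerable<T> input, Func<IEnumerable<T>, IEnumerable<T>> func)
    {
        foreach (var item in input)
        {
            List<T> results;
            try
            {
                // Run func over a one-element sequence and force evaluation
                // here, so anything it throws is raised inside this try block.
                results = func(new[] { item }).ToList();
            }
            catch (Exception)
            {
                continue; // skip the item that made func throw
            }

            // yield return is not allowed inside a try block that has a catch
            // clause, so the buffered results are yielded outside of it.
            foreach (var result in results)
                yield return result;
        }
    }

    // Same as in the challenge: throws DivideByZeroException for even numbers.
    public static IEnumerable<int> FaultyFunc(IEnumerable<int> source)
    {
        foreach (int i in source)
        {
            yield return i/(i%2);
        }
    }
}
```

Enumerating Robust.RobustEnumerating(Enumerable.Range(0, 10), Robust.FaultyFunc) yields 1, 3, 5, 7, 9: every even number divides by zero and is skipped, and every odd number is divided by 1.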

Git is teh SUCK

Today, I had two separate incidents in which my git repository was corrupted! To the point that nothing worked: not git fsck, not git reflog, not git just-work-or-i-WILL-shoot-you.

The first time, there was no harm done; I just cloned my repository again and moved on. The second time that it happened, it was after I had ~10 commits locally that weren’t pushed. I had my working copy intact, but I didn’t want to lose the history. I asked around, and got a couple of suggestions to move to mercurial instead, because git has no engineering behind it.

Based on that feedback, I …

Oh, wait, it isn’t this sort of a post.

What I actually did was set up Process Monitor and watch what git.exe was actually doing. I noticed that it was searching for a .git/objects directory, and couldn’t find it anywhere in the path. Indeed, looking there myself, it was clear that there was no objects directory under the .git dir. And checking in other repositories showed that they had it. So now I knew why, but I still had no idea who the #*@# decided to randomly @#$%( my repository, totally derailing my productivity.

That is where having multiple personalities comes in handy: he did it, the one that isn’t writing this blog post. At some point during the day, there was a need to zip the repository and send it somewhere. Since the working copy is full of crap, that idiot issued the following:

ls -R obj | rm -F

ls -R bin | rm -F

(Not the exact commands, the idiot used the UI to do a search & delete).

You can guess the following from there. At this point, having come to this astounding discovery, I heroically went to the recycle bin, found the objects directory there, and rescued it! All is well, except that there is still a thrashing for uncommon stupidity owed.

And remember, it wasn’t me, it was the other one who did that!

And yes, the spelling mistake in the title is intentional.

Where do git repositories go when they die?

My RDB repository started giving me this error:

fatal: Not a git repository (or any of the parent directories): .git

I don’t think that I did anything to it, but it is still dead.

image

Any ideas how to recover this?

Update: Found why Git doesn't like my repository, it doesn't have .git\objects, but I have no idea where it could have gone to… or why.

Rhino DivanDB – A full coding sample – Embedded

Rhino Divan DB is going to come in at least two forms: embedded and remote. The following is a full example of starting DivanDB, defining a view, adding some documents and then querying the database.

Note that here we want to ensure that we get the most up to date result, so we refuse to accept a potentially stale query.

image

This outputs the right result, by the way :-)

Rhino Divan DB – Design

One of the things that I wanted to do with RDB is to create an explicit actor model inside the codebase. I have been using a similar structure inside NH Prof, and it has been quite successful. The design goals for RDB are:

Assumptions for the database construction

Get / Put / Delete semantics for Json documents.

All those operations can accept batches of documents to work on. Those operations are fully ACID, which means that if you got a successful response for a document Put, you can rely on the document always being there.

Those operations should be considered cheap.

Reboot / crash resistant

The DB can crash / restart, but no loss of functionality may occur; as soon as it restarts, everything goes on as usual. There can be no in-memory data structures / work that cannot be recovered from persistent structures.

Views for searching

The DB uses views, defined using linq expressions, to support search capabilities. Those views are indexed in the background (so no holding up request processing for views). When you get a result from a query, you always know if the result is stale or not.

Adding a view to an existing database is a cheap operation, regardless of the database size. During view construction, the view can be queried (but its results will be considered stale). Reboot during view construction will not impact the construction process.

Indexing a document twice is a stable operation, which means that a view can always choose to re-index things if it so chooses.

Overall design

image

RDB stores two major pieces of information in transactional storage.

Documents, obviously, which are stored in a format that allows sending the document content to the user quickly, and tasks.

Tasks are how RDB maintains state over crashes / reboots, and they also form the base of the async work of the database. Any work that is going to take some time for the database to perform is written to transactional storage as a task. Those tasks are things like: “View ‘peopleByName’ should index documents 1 – 42”.

There are background threads working off this task queue, performing the work and removing each task when it is completed.
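The task idea can be sketched roughly like this. It is an illustration only, not RDB's code: IndexTask, InMemoryTaskStore and TaskWorker are made-up names, the store here lives in memory where the real one would be transactional and persistent, and the loop assumes a single worker.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical task: "view X should index documents From..To".
public class IndexTask
{
    public string View;
    public int From;
    public int To;
}

// In-memory stand-in for the transactional task storage.
public class InMemoryTaskStore
{
    private readonly LinkedList<IndexTask> tasks = new LinkedList<IndexTask>();
    private readonly object locker = new object();

    public void Add(IndexTask task) { lock (locker) tasks.AddLast(task); }

    // Returns the oldest task without removing it, or null if there is none.
    public IndexTask Peek()
    {
        lock (locker) return tasks.Count == 0 ? null : tasks.First.Value;
    }

    public void Remove(IndexTask task) { lock (locker) tasks.Remove(task); }

    public int Count { get { lock (locker) return tasks.Count; } }
}

public static class TaskWorker
{
    // Processes tasks until the store is empty. A task is removed only after
    // its work succeeds, so a failure in between leaves it queued for a retry.
    // Running it again is safe as long as indexing twice is a stable operation.
    public static void Drain(InMemoryTaskStore store, Action<IndexTask> index)
    {
        IndexTask task;
        while ((task = store.Peek()) != null)
        {
            index(task);
            store.Remove(task);
        }
    }
}
```

The ordering is the point of the sketch: do the work first, remove the task second. If the process dies between the two steps, the task is simply picked up again after the restart.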

The results of each view are written to a Lucene index (one per view).

So far I have the entire structure done. I need to do some polishing, and I have a different OSS strategy to go with, but things are looking good.