Ayende @ Rahien

Refunds available at head office

API Design: Sharding Status for failure scenarios–explicit failure management doesn’t work

Continuing the discussion on how to handle failures in a sharded cluster, we are back to the question of how to handle the scenario of one node in a cluster going down. The question is: what should the system behavior be in such a scenario?

In my previous post, I discussed one alternative option:

ShardingStatus status;
var recentPosts = session.Query<Post>()
          .ShardingStatus( out status )
          .OrderByDescending(x=>x.PublishedAt)
          .Take(20)
          .ToList();

I said that I really don’t like this option, but deferred the discussion of exactly why.

Basically, the entire problem boils down to a very simple fact: manual memory management doesn’t work.

Huh? What is the relation between handling failures in a cluster and manual memory management? Oren, did you get your wires crossed again and continue a different blog post altogether?

Well, no. It is the same basic principle. Requiring users to add a specific handler for this results in several scenarios, none of them ideal.

First, what happens if we don’t specify this? We are back to “ignore & swallow the error” or “throw and kill the entire system”.

Let us assume that we go with the first option: the developer has a way to get the error if they want it, but if they don’t care, we will just ignore this error. The problem with this approach is that it is entirely certain that developers will not add this; at least, not before the first time a node fails in production and the system simply ignores the failure and shows the wrong results.

The other option, throwing an exception if the user didn’t ask for the sharding status and we have a failing node, is arguably worse. We now have a ticking time bomb. If a node goes down, the entire system will go down. The reason that I say that this is worse than the previous option is that the natural inclination of most developers is to simply stick the ShardingStatus() there and “fix” the problem. Of course, this is basically the same as the first option, but this time, the API actually leads the user down the wrong path.

Second, this is forcing a local solution on a global problem. We are trying to force the user to handle errors at a place where the only thing that they care about is the actual business problem.

Third, this alternative doesn’t handle scenarios where we are doing other things, like loading by id. How would you get the ShardingStatus from this call?

session.Load<Post>("tri/posts/1");

Anything that you come up with is likely to introduce additional complexity and make things much harder to work with.
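Just to illustrate, here is a hypothetical sketch of threading the status through Load (the overload and the status shape are invented for the example, this is not a real API):

ShardingStatus status;
var post = session.Load<Post>("tri/posts/1", out status);
if (status.FailedShards.Count > 0)
{
    // and now what? The page still has to render something.
}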

As I said, I intensely dislike this option. A much better alternative exists, and I’ll discuss this in the next post…


API Design: Sharding Status for failure scenarios–explicit failure management

Continuing the discussion on how to handle failures in a sharded cluster, we are back to the question of how to handle the scenario of one node in a cluster going down. The question is: what should the system behavior be in such a scenario?

In the previous post, we discussed the option of simply ignoring the failure, and the option of simply failing entirely. Both options are unpalatable, because we either transparently hide some data from the user (the data that resides on the failing node) or we take the entire system down when a single node is down.

Another option that was suggested in the mailing list is to actually expose this to the user, like so:

ShardingStatus status;
var recentPosts = session.Query<Post>()
          .ShardingStatus( out status )
          .OrderByDescending(x=>x.PublishedAt)
          .Take(20)
          .ToList();

This will give us the status information about potentially failing shards.
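The mailing list suggestion didn’t pin down the exact shape of ShardingStatus; presumably it would be something along these lines (a hypothetical sketch):

public class ShardingStatus
{
    // shards that responded to the query
    public IList<string> AnsweredShards { get; set; }

    // shards that were down or timed out, so their data is missing
    public IList<string> FailedShards { get; set; }
}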

I intensely dislike this option, and I’ll discuss the reasons why in the next post. In the meantime, I would like to hear your opinion about this API choice.


API Design: Sharding Status for failure scenarios–ignore and move on

In my previous post, I discussed how to handle partial failure in a sharded cluster scenario. This is particularly hard because this is the case where one node out of a 3 node (or more) cluster is failing, and it is entirely likely that we can still give the user at least partial service.

The most obvious, and probably easiest, option is to simply catch and log the error for the failing server and not notify the calling application code about it. The nice thing about this option is that if you have a failing server, your entire system doesn’t go down, and you can handle requests at least partially.
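To make the option concrete, here is a minimal sketch of what “catch, log and move on” could look like inside a hypothetical sharded query executor (all names here are illustrative, this is not the RavenDB sharding API):

var results = new List<Post>();
foreach (var shard in shards)
{
    try
    {
        results.AddRange(shard.Query<Post>()
            .OrderByDescending(x => x.PublishedAt)
            .Take(20)
            .ToList());
    }
    catch (Exception e)
    {
        // log and swallow; the caller never learns this shard's data is missing
        log.Warn("Shard " + shard.Name + " failed, returning partial results", e);
    }
}
return results.OrderByDescending(x => x.PublishedAt).Take(20).ToList();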

The sad part about this option is that there really is a good chance that you won’t notice that some part of the system is down, and that you are actually returning only partial results. That can lead to some really nasty scenarios, such as the case where we “lose” an order, or a payment, and we don’t show this to the user.

That can lead to some really frustrating scenarios where a customer is saying “but I see the payment right here” and the help desk says “I am sorry, but I don’t see it, therefore you don’t get any service, have a good day, bye”.

Still… that is probably the best case scenario considering the alternative being the entire system being unavailable if any single node is down.

Or is it… ?


API Design: Sharding Status for failure scenarios

An interesting question came up recently. How do we want to handle sharding failures?

For example, let us say that I have a 3 node RavenDB cluster, serving posts for a blog (just to give a random example). The way the sharding has been set up, we are doing Round Robin sharding based on posts (so each post goes to a different machine, and anything related to a post goes to the same node as the post). Here is how it can be set up:

image
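(The setup image did not survive; with the RavenDB 1.x sharding client, such a setup looked roughly like the following sketch — treat the exact wiring as approximate:)

var shards = new Dictionary<string, IDocumentStore>
{
    {"one", new DocumentStore {Url = "http://server-one:8080"}},
    {"two", new DocumentStore {Url = "http://server-two:8080"}},
    {"three", new DocumentStore {Url = "http://server-three:8080"}},
};

var documentStore = new ShardedDocumentStore(new ShardStrategy(shards)).Initialize();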

Now, we want to display the main page, and we would like to show the most recent posts. We can do this using the following code:

image
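(The code image is missing as well; judging by the snippets quoted in the follow-up posts, it was essentially:)

var recentPosts = session.Query<Post>()
    .OrderByDescending(x => x.PublishedAt)
    .Take(20)
    .ToList();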

The question is, what would happen if the second server is offline?

I’ll give several alternatives in the next few posts.


And this is Ironic

I was trying to press the Yes button, and this happened:

image

To be fair, I am trying to get it to read my mail, and it is a bit… largish, I guess:

image


I think that I’ll ignore this test failure

image

This really sucks, because of this:

image

This is part of the RavenDB release process, where we are actually hammering RavenDB and seeing if it breaks. And it usually takes about 24 hours to do a complete run.


The RavenDB Release Process

We got several questions about this in the mailing list, so I thought that this would be a good time to discuss this in the blog.

One of the best parts about RavenDB is that we are able to deliver quickly and continuously. That means that we can deliver changes to the users very rapidly, often resulting in response times of less than an hour from “I have an issue” to “it is already fixed and you can download it”.

That is awesome on a lot of levels, but it lacks something very important: stability. In other words, by pushing things so rapidly, we are living on the bleeding edge. Which is great, except that you tend to bleed.

That is why we split the RavenDB release process into Unstable and Stable. Unstable builds are released on a “several times a day” basis, and only require that we will pass our internal test suite. This test suite is hefty, over 1,500 tests so far, but it is something that can be run in about 15 minutes or so on the developer machine to make sure that our changes didn’t break anything.

The release process for Stable version is much more involved. First, of course, we run the standard suite of tests. Then we have a separate set of tests, which are stress testing RavenDB by trying to see if there are any concurrency issues.

Next, we take the current bits and push them to our own internal production systems. For example, at the time of this writing, this blog (and all of our other assets) are currently running on RavenDB build 726, and have been for a few days. This allows us to test several things: that there are no breaking changes, that this build can survive running in production over an extended period of time, and that the overall performance remains excellent.

Finally, we ask users to take those builds for a spin, and they are usually far more rough on RavenDB than we are.

After all of that, we move to a set of performance tests, comparing the system behavior on a wide range of operations compared to the old version.

And then… we can do a stable release push. Phew!


RavenHQ goes out of beta

After several months in public beta, I am proud to announce that RavenHQ, the RavenDB as a Service offering on the cloud, has dropped the beta label and is now ready for full production use.

That means that we now accept signups from the general public (you no longer need an AppHarbor account) and you can use it directly. It also means that you can safely start using RavenHQ for production purposes.

RavenHQ is a fully managed cloud of RavenDB servers with scalable plans; you’ll never have to worry about installation, updates, availability, performance, security or backups again.

We offer both standard and high availability plans, and we are the perfect fit for RavenDB users who want to safely outsource all the operational support of their databases into the capable hands of the RavenHQ team.


If you put tripwires in my paths, you won’t get my money

Recently I updated several of my VS plugins, and immediately after that I noticed a change in the way VS behaves.

image

This is actually quite nice, and useful. (I was a bit confused for a time about when it shows up and when it doesn’t, but that is beside the point.)

The problem is that in many cases, like the first two examples, it is actually quite easy to click on those notifications by accident. And if you do that, you get:

image

And that is annoying. Sure, I get that you want to encourage people to buy your products, I even agree. But this sort of a feature is something that is very easy to invoke by mistake, and it completely throws you out of your current context.

I have VSCommands installed for the simple reason that I really like the Reload All feature. But it isn’t worth it if I have to be careful where I put my mouse in VS.

This single feature is the reason that I uninstalled it.


A bad test

image

This is a bad test, because what it does is ensure that something does not work. I just finished implementing the session.Advanced.Defer support, and this test got my attention by failing the build.
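(The test image is missing from the archive; presumably it asserted that the feature throws, something like this hypothetical reconstruction:)

[Fact]
public void Defer_is_not_supported()
{
    using (var session = store.OpenSession())
    {
        // a test that pins down the *absence* of a feature
        Assert.Throws<NotSupportedException>(
            () => session.Advanced.Defer(new DeleteCommandData { Key = "users/1" }));
    }
}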

Bad test, you should be telling me when I broke something, not when I added new functionality.


When using the Task Parallel Library, Wait() is a BAD warning sign

Take a look at the following code:

public static Task ParseAsync(IPartialDataAccess source, IPartialDataAccess seed, Stream output, IEnumerable<RdcNeed> needList)
{
    return Task.Factory.StartNew(() =>
    {
        foreach (var item in needList)
        {
            switch (item.BlockType)
            {
                case RdcNeedType.Source:
                    source.CopyToAsync(output, Convert.ToInt64(item.FileOffset), Convert.ToInt64(item.BlockLength)).Wait();
                    break;
                case RdcNeedType.Seed:
                    seed.CopyToAsync(output, Convert.ToInt64(item.FileOffset), Convert.ToInt64(item.BlockLength)).Wait();
                    break;
                default:
                    throw new NotSupportedException();
            }
        }
    });
}

Do you see the problem in here?

It is a result of a code review comment about improper use of async in a project. That comment resulted in a lot of Tasks showing up in the method return types, but not in any measurable improvement in the codebase’s actual use of asynchronicity.

The problem is that when you need to work with such things in C# 4.0, you have to do some annoying things to get the code to work properly. In particular, this method was modified to be:

public static Task ParseAsync(IPartialDataAccess source, IPartialDataAccess seed, Stream output, IList<RdcNeed> needList, int position = 0)
{
  if(position>= needList.Count)
  {
        return new CompletedTask();
  }
  var item = needList[position];
  Task task;
            
  switch (item.BlockType)
  {
        case RdcNeedType.Source:
            task = source.CopyToAsync(output, Convert.ToInt64(item.FileOffset), Convert.ToInt64(item.BlockLength));
            break;
        case RdcNeedType.Seed:
            task = seed.CopyToAsync(output, Convert.ToInt64(item.FileOffset), Convert.ToInt64(item.BlockLength));
            break;
        default:
            throw new NotSupportedException();
  }

  return task.ContinueWith(resultTask =>
    {
        if (resultTask.Status == TaskStatus.Faulted)
            resultTask.Wait(); // throws
        return ParseAsync(source, seed, output, needList, position + 1);
    }).Unwrap();
}

This code is more complex, but it is actually making proper use of the TPL. We have changed the loop into a recursive function, so we can take advantage of ContinueWith to move to the next iteration of the loop.

And yes, I can’t wait to get to C# 5.0 and have proper await support.
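For comparison, here is roughly what the method collapses back into once async/await is available (a sketch, assuming CopyToAsync keeps the same signature):

public static async Task ParseAsync(IPartialDataAccess source, IPartialDataAccess seed, Stream output, IEnumerable<RdcNeed> needList)
{
    foreach (var item in needList)
    {
        switch (item.BlockType)
        {
            case RdcNeedType.Source:
                await source.CopyToAsync(output, Convert.ToInt64(item.FileOffset), Convert.ToInt64(item.BlockLength));
                break;
            case RdcNeedType.Seed:
                await seed.CopyToAsync(output, Convert.ToInt64(item.FileOffset), Convert.ToInt64(item.BlockLength));
                break;
            default:
                throw new NotSupportedException();
        }
    }
}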

Reviewing RavenDB app: ReleaseCandidateTracker

ReleaseCandidateTracker is a new RavenDB based application by Szymon Pobiega. I reviewed version 5f7e42e0fb1dea70e53bace63f3e18d95d2a62dd. At this point, I don’t know anything about this application, including what exactly Release Tracking means.

I downloaded the code and started VS; there is one project in the solution, which is already a Good Thing. I decided to randomize my review approach and go check the Models directory first.

Here is how it looks:

image

This is interesting for several reasons. First, it looks like it is meant to keep a record of all deployments to multiple environments, and that you can look up the history of each deployment both on the environment side and on the release candidate side.

Note that we use rich models, which have collections in them. In fact, take a look at this method:

image

Which calls to this method:

image

You know what the really fun part about this is?

It ain’t a relational model. There is no cost to actually making all of these calls!

Next, we move to the Infrastructure folder, where we have a couple of action results and the RavenDB management stuff. Here is how RCT uses RavenDB:

public static class Database
{
    private static IDocumentStore storeInstance;

    public static IDocumentStore Instance
    {
        get
        {
            if (storeInstance == null)
            {
                throw new InvalidOperationException("Document store has not been initialized.");
            }
            return storeInstance;
        }
    }

    public static void Initialize()
    {
        var embeddableDocumentStore = new EmbeddableDocumentStore {DataDirectory = @"~\App_Data\Database"};
        embeddableDocumentStore.Initialize();
        storeInstance = embeddableDocumentStore;
    }
}

It is using an embedded database, which makes it very easy to use the app. Just hit F5 and go. In fact, if we do, we see a fully functional website, which is quite awesome.

Let us move to seeing how we are managing the sessions:

public class BaseController : Controller
{
    public IDocumentSession DocumentSession { get; private set; }
    public CandidateService CandidateService { get; private set; }
    public ScriptService ScriptService { get; private set; }

    protected override void OnActionExecuting(ActionExecutingContext filterContext)
    {
        if (filterContext.IsChildAction)
        {
            return;
        }
        DocumentSession = Database.Instance.OpenSession();
        CandidateService = new CandidateService(DocumentSession);
        ScriptService = new ScriptService(DocumentSession);
        base.OnActionExecuting(filterContext);
    }
    
    protected override void OnActionExecuted(ActionExecutedContext filterContext)
    {
        if (filterContext.IsChildAction)
        {
            return;
        }
        if(DocumentSession != null)
        {
            if (filterContext.Exception == null)
            {
                DocumentSession.SaveChanges();
            }
            DocumentSession.Dispose();
        }
        base.OnActionExecuted(filterContext);
    }
}

This is all handled inside the base controller, and it is very similar to how I am doing that in my own apps.

However, ScriptService and CandidateService seem strange, so let us explore them a bit.

public class ScriptService
{
    private readonly IDocumentSession documentSession;

    public ScriptService(IDocumentSession documentSession)
    {
        this.documentSession = documentSession;
    }

    public void AttachScript(string versionNumber, Stream fileContents)
    {
        var metadata = new RavenJObject();
        documentSession.Advanced.DatabaseCommands.PutAttachment(versionNumber, null, fileContents, metadata);
    }

    public Stream GetScript(string versionNumber)
    {
        var attachment = documentSession.Advanced.DatabaseCommands.GetAttachment(versionNumber);
        return attachment != null 
            ? attachment.Data() 
            : null;
    }
}

So this is using RavenDB attachments to store stuff; I am not quite sure what yet, so let us track it down.

This is being used like this:

[HttpGet]
public ActionResult GetScript(string versionNumber)
{
    var candidate = CandidateService.FindOneByVersionNumber(versionNumber);
    var attachment = ScriptService.GetScript(versionNumber);
    if (attachment != null)
    {
        var result = new FileStreamResult(attachment, "text/plain");
        var version = candidate.VersionNumber;
        var product = candidate.ProductName;
        result.FileDownloadName = string.Format("deploy-{0}-{1}.ps1", product, version);
        return result;
    }
    return new HttpNotFoundResult("Deployment script missing.");
}

So I am assuming that the scripts are deployment scripts for different versions, and that they get uploaded on every new release candidate.

But look at the CandidateService: it looks like a traditional service wrapping RavenDB, and I have spoken against this pattern multiple times.

In particular, I dislike this bit of code:

public ReleaseCandidate FindOneByVersionNumber(string versionNumber)
{
    var result = documentSession.Query<ReleaseCandidate>()
        .Where(x => x.VersionNumber == versionNumber)
        .FirstOrDefault();
    if(result == null)
    {
        throw new ReleaseCandidateNotFoundException(versionNumber);
    }
    return result;
}

public void Store(ReleaseCandidate candidate)
{
    var existing = documentSession.Query<ReleaseCandidate>()
        .Where(x => x.VersionNumber == candidate.VersionNumber)
        .Any();
    if (existing)
    {
        throw new ReleaseCandidateAlreadyExistsException(candidate.VersionNumber);
    }
    documentSession.Store(candidate);
}

From looking at the code, it looks like the version number of the release candidate is the primary way to look it up. More than that, in the entire codebase, there is never a case where we load a document by id.

When I see a VersionNumber, I think about things like “1.0.812.0”, but I think that in this case the version number is likely to include the product name as well, “RavenDB-1.0.812.0”, otherwise you couldn’t have two products with the same version.

That said, the code above is wrong, because it doesn’t take into account the BASE nature of RavenDB’s indexes. Instead, the version number should actually be the ReleaseCandidate id. This way, because RavenDB’s document store is fully ACID, we don’t have to worry about index update times, and we can load things very efficiently.
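A sketch of what that looks like (assuming we fold the version number, and probably the product name, into the document id):

// store: the id is deterministic, so no query is needed to detect duplicates
documentSession.Store(candidate, "ReleaseCandidates/" + candidate.VersionNumber);

// lookup: loading by id is fully ACID, no index staleness involved
var existing = documentSession.Load<ReleaseCandidate>("ReleaseCandidates/" + versionNumber);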

Pretty much all of the rest of the code in the CandidateService is only used in a single location, and I don’t really see a value in it being there.

For example, let us look at this one:

[HttpPost]
public ActionResult MarkAsDeployed(string versionNumber, string environment, bool success)
{
    CandidateService.MarkAsDeployed(versionNumber, environment, success);
    return new EmptyResult();
}

image

As you can see, it is merely loading the appropriate release candidate, and calling the MarkAsDeployed method on it.

Instead of doing this needless forwarding, and assuming that we have the VersionNumber as the id, I would write:

[HttpPost]
public ActionResult MarkAsDeployed(string versionNumber, string environment, bool success)
{
    var candidate = DocumentSession.Load<ReleaseCandidate>(versionNumber);
    if (candidate == null)
        throw new ReleaseCandidateNotFoundException(versionNumber);
    var env = DocumentSession.Load<DeploymentEnvironment>(environment);
    if (env == null)
        throw new InvalidOperationException(string.Format("Environment {0} not found", environment));

    candidate.MarkAsDeployed(success, env);
    return new EmptyResult();
}

Finally, a word about error handling, which is handled via:

protected override void OnException(ExceptionContext filterContext)
{
    filterContext.Result = new ErrorResult(filterContext.Exception.Message);
    filterContext.ExceptionHandled = true;
}

public class ErrorResult : ActionResult
{
    private readonly string message;

    public ErrorResult(string message)
    {
        this.message = message;
    }

    public override void ExecuteResult(ControllerContext context)
    {
        context.HttpContext.Response.Write(message);
        context.HttpContext.Response.StatusCode = 500;
    }
}

The crazy part is that OnException is overridden only on some of the controllers, rather than in the base controller. Even worse, this sort of code leads to the loss of error details.

For example, let us say that I get a NullReferenceException. This code will dutifully tell me all about it, but will not tell me where it happened.

This sort of thing makes debugging extremely hard.
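At a minimum, the override belongs in the base controller, and it should preserve the full exception, not just the message; a sketch:

protected override void OnException(ExceptionContext filterContext)
{
    // ToString() includes the exception type and the stack trace, not just the message
    filterContext.Result = new ErrorResult(filterContext.Exception.ToString());
    filterContext.ExceptionHandled = true;
}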


Non overlapping time periods–because I like the pain of 2 AM wakeup calls

This post is partly in response to this post, discussing the Azure problem with the leap year. But it is actually a bit more general than that.

In my code, here is how I define “one year from now”:

DateTime.Today.AddYears(1).AddDays(3);
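A quick illustration of why this sidesteps the leap year problem entirely:

var start = new DateTime(2012, 2, 29);
var expiry = start.AddYears(1).AddDays(3); // AddYears clamps to Feb 28, so this is 2013-03-03
Console.WriteLine(expiry);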

As it turns out, this tends to have a lot of implications for your business, most of them actually pretty good ones.

For a start, you will never get hit with a leap year bug, but more importantly, you are never going to have to deal with an immediate cutoff. This is important because it gives you time. Mostly, it gives you time to screw up, but having the time to do so without having egg all over your face is a really nice thing.

For example, all of our subscriptions are using a similar method of calculation, and this is why we can take the ordering system down for a few hours or even a day or two and no one will actually notice. We have a big grace period in which we can work things out.

Sure, a user gets 3 “extra” days for free out of this, but frankly, I don’t give a damn. It is more important that I get the buffer, and most users like it much better when you don’t slam the doors in their faces on the first chance.

One of the things that is important is style, and giving a grace period for those sorts of things is crucial.


Rotten Scheduling: Don’t roll your own

“We need to run a specific task every 72 hours, I thought about this approach…”

public class TimedTask
{
  public static Timer Timer;
  
  public static void Init()
  {
    Timer = new Timer(ExecuteEvery72Hours, null, TimeSpan.FromHours(72), TimeSpan.FromHours(72));
  }
  
  public static void ExecuteEvery72Hours(object state)
  {
    // do something important
  }
}

This is a bloody rotten idea, let us see why…

  • What happens if your application is recycled every 29 hours?
  • What happens if your application is supposed to be always on, but happened to be offline when the 72 hour mark was reached?
  • What happens if your task actually takes more than 72 hours to run?
  • What happens if the task fails?
  • How do you report errors, warnings, etc?

Scheduling is a hard problem. There are a lot of things that you actually need to consider, and the code above is considering none of them. I would be very surprised if something like that ever ran in production. It most certainly can’t be made to run reliably.

Things that run every X amount of time, where X is a long time (hours / days), tend to be pretty important. In some of the systems that we wrote, that includes doing things like updating VAT and interest rates, pulling data from external sources, generating the weekly report, etc.

You do not want this to be messed up.

If you need to do anything like that, make use of the built-in scheduling features of the OS you are running on (Windows Task Scheduler is amazingly full featured, and cron isn’t bad if you are running on Linux). If you still insist on doing this in code, at the very least do something like this:

public class TimedTask
{
  public static Timer Timer;
  
  public static void Init()
  {
    Timer = new Timer(state =>
    {
      if( (DateTime.UtcNow - GetLastExecutedTime()) > TimeSpan.FromHours(72))
        ExecuteEvery72Hours();
    }, null, TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1));
  }
  
  public static void ExecuteEvery72Hours()
  {
    // do something important
  }
}

This still has a lot of problems, but at least it is solving some crucial problems for you (note that GetLastExecutedTime has to be a persisted value).

Of course, if you need something like this in your code, you had better use something like Quartz instead; a minimal sketch follows. Don’t roll your own. Sure, it is ten lines of code to do so, but as I said, this is the very basics, and it gets complex really fast.
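This sketch uses the Quartz.NET builder-style API (the 2.x line; adjust to the version you use), and the job class name is made up for illustration:

public class UpdateVatRatesJob : IJob
{
    public void Execute(IJobExecutionContext context)
    {
        // do something important
    }
}

// at application startup
IScheduler scheduler = new StdSchedulerFactory().GetScheduler();
scheduler.Start();

scheduler.ScheduleJob(
    JobBuilder.Create<UpdateVatRatesJob>().Build(),
    TriggerBuilder.Create()
        .StartNow()
        .WithSimpleSchedule(x => x.WithIntervalInHours(72).RepeatForever())
        .Build());

Note that to survive restarts (the persisted “last executed” problem above), you still have to configure a persistent job store instead of the default in-memory one.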


Rotten Scheduling

“We need to run a specific task every 72 hours, I thought about this approach…”

public class TimedTask
{
  public static Timer Timer;
  
  public static void Init()
  {
    Timer = new Timer(ExecuteEvery72Hours, null, TimeSpan.FromHours(72), TimeSpan.FromHours(72));
  }
  
  public static void ExecuteEvery72Hours(object state)
  {
    // do something important
  }
}

Let us assume that this code is being called properly. Why is this a bloody rotten idea?


Reviewing Xenta and wishing I hadn’t

Xenta Framework is the extensible enterprise n-tier application framework with multilayered architecture.

I was asked by the coordinator for the project to review it.

This isn’t going to take long. I looked at the code, and I got this:

image

I am sure you remember the last time I ran into something like this, except that the number of projects in that solution was a quarter of what we have here, and I already had to hold my nose.

Looking a bit deeper:

image

Here is a hint, you are allowed to have more than one class per project.

Having that many projects is a nightmare; just managing them, finding things, actually following how things work. Not to mention that the level of abstraction required to support that is giving me a headache.

This is still without looking at the code, mind.

Now, let us look at the actual code. I like to start with the controllers. The following code is from ForumController:

[HttpPost, ValidateInput(false)]
public ActionResult Update(int forumID, FormCollection form)
{
    ForumModel m = new ForumModel()
    {
        ForumID = forumID
    };

    if(!m.Load())
    {
        return HttpNotFound();
    }

    if(TryUpdateModel<ForumModel>(m, "Model", form))
    {
        try
        {
            m.Update();
        }
        catch(Exception ex)
        {
            ModelState.AddModelError("API", ex);
        }
    }

    if(!ModelState.IsValid)
    {
        TempData.OperationStatus("Failed");
        TempData.PersistObject("Model", m);
        TempData.PersistModelState(ModelState);
    }
    else
    {
        TempData.OperationStatus("Success");
    }

    return RedirectToAction("Edit", new
    {
        ForumID = m.ForumID
    });
}

Seriously, I haven’t seen this style of architecture in a while. Let us dig deeper:

public bool Update()
{
    using(ForumApiClient api = new ForumApiClient())
    {
        var dto = api.UpdateForum(ForumID, ParentForumID, DisplayOrder, Flags);

        return Load(dto);
    }
}

public bool Load()
{
    using(ForumApiClient api = new ForumApiClient())
    {
        var dto = api.GetForum(ForumID);

        return Load(dto);
    }
}

In case you are wondering, those are on the ForumModel class.

This piece of code is from the ForumPostModel class, it may look familiar:

public bool Load()
{
    using(ForumApiClient api = new ForumApiClient())
    {
        var dto = api.GetPost(PostID);

        return Load(dto);
    }
}

This ForumApiClient is actually a WCF Proxy class, which leads us to this interface:

image

I won’t comment on this any further, but will go directly to ForumApiService, where we actually update a forum using the following code:

public ForumDto UpdateForum(int forumID,
    int parentForumID,
    int displayOrder,
    ForumFlags flags)
{
    ForumService forumService = Infrastructure.Component<ForumService>();

    if(forumService == null)
    {
        throw new FaultException(new FaultReason("forum service"), new FaultCode(ErrorCode.Infrastructure.ToString()));
    }

    try
    {
        ForumEntity entity = forumService.UpdateForum(forumID, parentForumID, displayOrder, flags, DateTime.UtcNow);

        return Map(entity);
    }
    catch(XentaException ex)
    {
         throw new FaultException(new FaultReason(ex.Message), new FaultCode(ex.Code.ToString()));
    }
}

Infrastructure.Component represents a home grown service locator. Note that we have manual exception handling for absolutely no reason whatsoever (you could do this in a behavior once, and this is repeated for each and every one of the methods…).

I apologize in advance, but here is the full UpdateForum method:

public ForumEntity UpdateForum(int forumID,
    int parentForumID,
    int displayOrder,
    ForumFlags flags, 
    DateTime updatedOn)
{
    ForumEntity oldForum = GetForum(forumID);
            
    if(oldForum == null)
    {
        throw new XentaException(ErrorCode.NotFound, "forum");
    }

    ForumEntity parentForum = null;

    if(parentForumID != 0)
    {
        parentForum = GetForum(parentForumID);

        if(parentForum == null)
        {
            throw new XentaException(ErrorCode.NotFound, "forum");
        }

        ForumEntity tmp = parentForum;

    HierarchyCheck:

        if(tmp.ForumID == forumID)
        {
            throw new XentaException(ErrorCode.InvalidArgument, "parentForumID");
        }
        if(tmp.ParentForumID != 0)
        {
            tmp = tmp.Parent;

            goto HierarchyCheck;
        }
    }

    ForumEntity newForum = null;
    bool res = Provider.UpdateForum(forumID,
        parentForumID,
        oldForum.LastTopicID,
        oldForum.TopicCount,
        oldForum.PostCount,
        displayOrder,
        flags,
        oldForum.CreatedOn,
        updatedOn);

    if(res)
    {
        if(Cache != null)
        {
            Cache.Remove("ForumService_GetForum_{0}", oldForum.ForumID);
        }

        newForum = GetForum(forumID);

        if(EventBroker != null)
        {
            EventBroker.Publish<PostEntityUpdate<ForumEntity>>(this, new PostEntityUpdate<ForumEntity>(newForum, oldForum));
        }
    }
    return newForum;
}

Yes, there is a goto in there; for the life of me I can’t figure out why.
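For what it is worth, the same hierarchy check is just a while loop (a direct rewrite of the code above):

var tmp = parentForum;
while (true)
{
    // walking up the parent chain, refusing a cycle back to ourselves
    if (tmp.ForumID == forumID)
        throw new XentaException(ErrorCode.InvalidArgument, "parentForumID");
    if (tmp.ParentForumID == 0)
        break;
    tmp = tmp.Parent;
}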

Note that there is also this Provider in here, which is an IForumProvider, which is implemented by… ForumProvider, which looks like this:

public bool UpdateForum(int forumID,
    int parentForumID,
    int lastTopicID,
    int topicCount,
    int postCount,
    int displayOrder,
    ForumFlags flags,
    DateTime createdOn,
    DateTime updatedOn)
{
    bool res = false;

    using(DbCommand cmd = DataSource.GetStoredProcCommand("fwk_Forums_Update"))
    {
        SqlServerHelper.SetInt32(DataSource, cmd, "ForumID", forumID);
        SqlServerHelper.SetInt32(DataSource, cmd, "ParentForumID", parentForumID);
        SqlServerHelper.SetInt32(DataSource, cmd, "LastTopicID", lastTopicID);
        SqlServerHelper.SetInt32(DataSource, cmd, "TopicCount", topicCount);
        SqlServerHelper.SetInt32(DataSource, cmd, "PostCount", postCount);
        SqlServerHelper.SetInt32(DataSource, cmd, "DisplayOrder", displayOrder);
        SqlServerHelper.SetInt32(DataSource, cmd, "Flags", (int)flags);
        SqlServerHelper.SetDateTime(DataSource, cmd, "CreatedOn", createdOn);
        SqlServerHelper.SetDateTime(DataSource, cmd, "UpdatedOn", updatedOn);

        res = DataSource.ExecuteNonQuery(cmd) > 0;
    }
    return res;
}

And… this is about it guys.

I checked the source control, and the history goes back only to the beginning of February 2012. I assume the project is older than that, because the codebase is pretty large.

But to summarize, what we actually have is a highly abstracted project, with a lot of abstraction; a lot of really bad code even if you ignore the abstractions; a seemingly modern codebase that still uses direct ADO.NET for pretty much everything; and a WCF service put in the middle of the application just for the fun of it.

Tons of code dedicated to error handling, caching, etc. All of which can be handled as cross cutting concerns.

This is the kind of application that I would expect to see circa 2002, not a decade later.

And please note, I actually got the review request exactly 34 minutes ago. I didn’t review the entire application (nor do I intend to). I merely took a reprehensive* vertical slice of the app and followed up on that.

*Yes, this is intentional.


The best multi threading debugging tool is Microsoft Excel

In any system that gets to a certain size, especially one that is multi threaded, there is a certain class of bugs that are really a bitch to figure out.

They usually boil down to some sort of race condition. I have been doing that recently, trying to fix a bunch of race conditions in RavenDB. The problem with race conditions is that they are incredibly hard to reproduce, and even when you can reproduce them, you can’t really debug them.

I came up with the following test harness to try to narrow things down. Basically, create a test case that sometimes fails, and run it a thousand times.

Odds are that you’ll get the failure, but that isn’t the important bit. The important bit is that you set up logging properly. In my case, I set things up so the logging outputs to CSV and each test run gets a different file.
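The harness itself is nothing fancy; a minimal sketch (SetLogFileForRun, RaceConditionTests and Sometimes_fails are hypothetical stand-ins for your own logging wiring and flaky test):

for (int i = 0; i < 1000; i++)
{
    SetLogFileForRun(i); // hypothetical: point the CSV log target at a fresh file per run

    try
    {
        using (var test = new RaceConditionTests())
        {
            test.Sometimes_fails();
        }
    }
    catch (Exception e)
    {
        Console.WriteLine("Run #{0} failed: {1}", i, e);
        break; // keep that run's CSV around for the Excel treatment
    }
}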

This gives me output that looks like this:

23:54:34.5952,Raven.Database.Server.HttpServer,Debug,Request #  10: GET     -    12 ms - <default>  - 200 - /indexes/dynamic/Companies?query=&start=0&pageSize=1&aggregation=None,,11
23:54:34.5952,Raven.Client.Document.SessionOperations.QueryOperation,Debug,"Stale query results on non stale query '' on index 'dynamic/Companies' in 'http://reduction:8079/', query will be retried, index etag is: bcf0fed1-d975-43b4-bfb7-65221ef06b99",,1
23:54:34.5952,Raven.Database.Indexing.WorkContext,Debug,"No work was found, workerWorkCounter: 10, for: TasksExecuter, will wait for additional work",,6
23:54:34.5952,Raven.Database.Indexing.WorkContext,Debug,Incremented work counter to 11 because: WORK BY IndexingExecuter,,12
23:54:34.5952,Raven.Database.Indexing.WorkContext,Debug,"No work was found, workerWorkCounter: 11, for: TasksExecuter, will wait for additional work",,6
23:54:34.5952,Raven.Database.Indexing.WorkContext,Debug,"No work was found, workerWorkCounter: 11, for: ReducingExecuter, will wait for additional work",,21
23:54:34.5952,Raven.Database.Indexing.WorkContext,Debug,"No work was found, workerWorkCounter: 11, for: IndexingExecuter, will wait for additional work",,12
23:54:34.6952,Raven.Client.Document.SessionOperations.QueryOperation,Debug,Executing query '' on index 'dynamic/Companies' in 'http://reduction:8079/',,1
23:54:34.8172,Raven.Database.Indexing.Index.Querying,Debug,Issuing query on index Temp/Companies for all documents,,11
23:54:34.8172,Raven.Storage.Esent.StorageActions.DocumentStorageActions,Debug,Document with key 'companies/1' was found,,11

Not really helpful in figuring out what is going on, right? Except for one tiny thing: we load them into Excel, and we use conditional formatting to get things to look like this:

image

The reason this is so helpful? You can actually see the threads interleaving. This usually helps me get to roughly the right place, and then I can add additional logging so I can figure out what is actually going on.

I was able to detect and fix several race conditions using this approach.

And yes, I know that this is basically printf() debugging. But at least it is printf() debugging with pretty colors.


Answering customer questions

This made me laugh:

image

And yes, I am aware that laughing at my own action is… strange.


Testing Rhino Service Bus applications

One of the really nice things about Rhino Service Bus applications is that we have created a structured way to handle inputs and outputs. You have messages coming in and out, as well as the endpoint local state to deal with. You don’t have to worry about how to deal with external integration points, because those are already going over messages.

And when you have basic input/output figured out, you are pretty much done.

For example, let us see the code that handles extending trial licenses in our ordering system:

public class ExtendTrialLicenseConsumer : ConsumerOf<ExtendTrialLicense>
{
    public IDocumentSession Session { get; set; }
    public IServiceBus Bus { get; set; }

    public void Consume(ExtendTrialLicense message)
    {
        var productId = message.ProductId ?? "products/" + message.Profile;
        var trial = Session.Query<Trial>()
            .Where(x => x.Email == message.Email && x.ProductId == productId)
            .FirstOrDefault();

        if (trial == null)
            return;
        
        trial.EndsAt = DateTime.Today.AddDays(message.Days);
        Bus.Send(new NewTrial
        {
            ProductId = productId,
            Email = trial.Email,
            Company = trial.Company,
            FullName = trial.Name,
            TrackingId = trial.TrackingId
        });
    }
}

How do we test something like this? As it turns out, quite easily:

public class TrialTesting : ConsumersTests
{
    protected override void PrepareData(IDocumentSession session)
    {
        session.Store(new Trial
        {
            Email = "you@there.gov",
            EndsAt = DateTime.Today,
            ProductId = "products/nhprof"
        });
    }

    [Fact]
    public void Will_update_trial_date()
    {
        Consume<ExtendTrialLicenseConsumer, ExtendTrialLicense>(new ExtendTrialLicense
        {
            ProductId = "products/nhprof",
            Days = 30,
            Email = "you@there.gov",
        });

        using (var session = documentStore.OpenSession())
        {
            var trial = session.Load<Trial>(1);
            Assert.Equal(DateTime.Today.AddDays(30), trial.EndsAt);
        }
    }

    // more tests here
}

All the magic happens in the base class, though:

public abstract class ConsumersTests : IDisposable
{
    protected IDocumentStore documentStore;
    private IServiceBus Bus = new FakeBus();

    protected ConsumersTests()
    {
        documentStore = new EmbeddableDocumentStore
        {
            RunInMemory = true,
            Conventions =
                {
                    DefaultQueryingConsistency = ConsistencyOptions.QueryYourWrites
                }
        }.Initialize();

        IndexCreation.CreateIndexes(typeof(Products_Stats).Assembly, documentStore);

        Products.Create(documentStore);
        using (var session = documentStore.OpenSession())
        {
            PrepareData(session);
            session.SaveChanges();
        }
    }

    protected T ConsumeSentMessage<T>()
    {
        var fakeBus = ((FakeBus)Bus);
        object o = fakeBus.Messages.Where(x => x.GetType() == typeof(T)).First();

        fakeBus.Messages.Remove(o);
        return (T) o;
    }

    protected void Consume<TConsumer, TMsg>(TMsg msg)
        where TConsumer : ConsumerOf<TMsg>, new()
    {
        var foo = new TConsumer();

        using (var documentSession = documentStore.OpenSession())
        {
            Set(foo, documentSession);
            Set(foo, Bus);
            Set(foo, documentStore);

            foo.Consume(msg);

            documentSession.SaveChanges();
        }
    }

    private void Set<T,TValue>(T foo, TValue value)
    {
        PropertyInfo firstOrDefault = typeof(T).GetProperties().FirstOrDefault(x=>x.PropertyType==typeof(TValue));
        if (firstOrDefault == null) return;
        firstOrDefault.SetValue(foo, value, null);
    }

    protected virtual void PrepareData(IDocumentSession session)
    {
    }

    public void Dispose()
    {
        documentStore.Dispose();
    }
}

And here are the relevant details for the FakeBus implementation:

public class FakeBus : IServiceBus
{
    public List<object>  Messages = new List<object>();

    public void Send(params object[] messages)
    {
        Messages.AddRange(messages);
    }
}

Now, admittedly, this is a fairly raw approach and we can probably do better. This is basically hand crafted auto mocking for consumers, and I don’t like the Consume<TConsumer,TMsg>() syntax very much. But it works, it is simple, and it doesn’t really get in the way.

I won’t say it is the way to go about it, but it is certainly easier than many other efforts that I have seen. You just need to handle the inputs & outputs and have a way to look at the local state, and you are pretty much done.

Strange production errors

The following code caused a really strange error in production:

new MailAddress("test@gmail.​com");

The specified string is not in the form required for an e-mail address.

Huh?!

Obviously it is!

After immediately leaping to the conclusion that .NET is crap and that I should start writing my own virtual machine, I decided to dig a little deeper:
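The digging amounts to dumping each character’s code point, along these lines:

const string email = "test@gmail.​com"; // the address exactly as it arrived
foreach (char c in email)
{
    Console.WriteLine("{0} {1}", c, (int)c);
}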

Character Code
t 116
e 101
s 115
t 116
@ 64
g 103
m 109
a 97
i 105
l 108
. 46
? 8203
c 99
o 111
m 109

8203 stands for U+200B or zero width space.

I guess that someone with a software testing background decided to get medieval on one of our systems.


I hate this code

I really hate this code; it is so stupid it makes my head hurt, and yet it has so many important factors in it. In particular, there is a lot of logic in here.

image

You might not see it as such, but a lot of this is actually quite important: default values, config parsing, decisions.

This is important. And it is all handled in a few methods that go on forever and hide important details in the tedium of parameter unpacking.

This approach works if you have 5 parameters, not when you have 50.


Thou shall not delete

Ouch!

public ActionResult DeleteComment(int id)
{
  var userComment = RavenSession.Load<UserComment>(id);

  if (userComment == null)
    return new HttpStatusCodeResult(204);

  var user = RavenSession.GetUser(User.Identity.Name);
  if(user == null || (user.Role != UserRole.Moderator && user.Role != UserRole.Admin))
    return new HttpStatusCodeResult(403, "You must be logged in as moderator or admin to be able to delete comments");

  RavenSession.Delete(user);

  return new HttpStatusCodeResult(204);
}

And sometimes Things Just Works

I am in the process of writing an article about RavenDB, and I just wrote the following code to demonstrate RavenDB’s schema-less nature:

using (var session = documentStore.OpenSession())
{
    session.Store(new Customer
    {
        Name = "Joe Smith",
        Attributes =
            {
                {"IsAnnoyingCustomer", true},
                {"SatisfactionLevel", 8.7},
                {"LicensePlate", "B7D-12JA"}
            }
    });

    session.SaveChanges();
}

using (var session = documentStore.OpenSession())
{
    var customers = session.Query<Customer>()
        .Where(x => x.Attributes["IsAnnoyingCustomer"].Equals(true))
        .ToList();

    Console.WriteLine(customers.Count);

    session.SaveChanges();
}

This worked, flawlessly.
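(For reference, the Customer class in this demo is presumably nothing more than a POCO with a pre-initialized dictionary; a sketch:)

public class Customer
{
    public string Name { get; set; }
    public Dictionary<string, object> Attributes { get; private set; }

    public Customer()
    {
        // pre-initialized so the collection initializer above just works
        Attributes = new Dictionary<string, object>();
    }
}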

The amount of work that we have put into RavenDB to make such things work is really scary when you sit down to think about it.

But it works, it does what I expect it to do and it doesn’t get in my way, woohoo!
