Ayende @ Rahien

It's a girl

What should I work on next?

As we near completion of EF Prof, I am starting to think about what the next project* should be. I have some ideas, but I would like to see what you think.

Note that I won’t necessarily follow this, but it would be a good indication of where to go next.

A small explanation about the choices:

  • Entity Framework Caching Layer is an idea about adding 2nd level caching (similar to the way NHibernate does it) to EF, giving you transparent caching integrated into Entity Framework.
  • Production Profiling means that you’ll be able to connect to a production application with the profiler and see what statements it is executing, without paying any performance price when you are not connected.
  • LLBLGen Prof – as the name says, adding a new profiler that would support LLBLGen.
  • DAL Prof – this one is interesting: instead of supporting a specific product, it is meant for users who have a custom DAL but still want to get the profiler benefits.

And anything else that you might want to suggest.

* Please note that all options discussed here are commercial ones.

NH Prof new feature: Filter static files

One of the most annoying things about doing the TekPub episodes is finding the right session, because NH Prof detects every session that is opened, including the ones opened while serving static files. This means that it usually looks like this:

[Screenshot: NH Prof showing a long list of sessions]

Which makes it pretty hard to find the session I am actually after. I got annoyed enough with that to add a dedicated filter that removes all of those in one shot (you could already do it, but it meant creating several filters):

[Screenshots: setting up the static files filter]

Which results in a much better experience:

[Screenshot: the session list with the static file sessions filtered out]

If you want me to buy something, I want a queue, damn it!

And no, I don’t mean a queue in the technical sense; I don’t care how you build your stuff. What I care about is the customer experience.

Here is a simple problem: I go to your site and see three things that I would like to buy, but I am only going to buy one of them right now. In most cases, I have no real way of putting aside the things that I am interested in for later.

A common example, and a really annoying one: Amazon Kindle.

I usually buy books only after spending some time looking for them. But I generally don’t buy more than one Kindle book at a time, simply because I often get distracted by something else and never get around to reading the second or third book. So something that I would have loved to see is a way to put a book aside for later. You know, something like a wish list? Except that those aren’t available for Kindle for some strange reason (probably someone figured out that immediate gratification makes them unnecessary).

I am left with tracking this sort of stuff externally, and that just adds more work. If you want me to buy something, make it easier for me to do so. And if I want to come back later, you really want to make that easy.

Time transitions should be explicit

Let us talk about time for a second, okay? We deal with it in just about every application we write, but we treat it quite dismissively. Let me give an example first. We need to build a notification system based on timed notifications that should be displayed on a web page.

Thinking about it, I came up with the following design:

[Diagram: the Notifications table, with Id, PublishAt, Title and Content columns]

And this query:

SELECT TOP 3 Id, PublishAt, Title, Content FROM Notifications
WHERE PublishAt <= GETDATE()
ORDER BY PublishAt DESC

That seems to satisfy the requirements: it is simple and it works. Done.

Not quite. This design suffers from a pretty important problem: the time transitions are implicit. But why is that important?

Because the state transition from waiting-to-be-published to published is a meaningful transition in our domain. As a simple example, I can’t post a notification to Twitter when it is published, simply because I have absolutely no idea when that is going to happen. In many real applications, silent state transitions lead to a lot of hacks, most likely something like adding a WasPublished flag that we check so we can do some action when we get a notification that wasn’t published yet.

A much better plan is to model time as an explicit state transition. Instead of checking PublishAt, we check an IsPublished flag, and a background process compares PublishAt to the current date and explicitly sets the IsPublished flag. That is also the place where we put any logic related to the state transition. It also means that we aren’t depending on a side effect (someone viewing the page triggering the publication process) to make something important happen in our application.
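A minimal sketch of that design, assuming an in-memory list stands in for the Notifications table. Notification, NotificationPublisher, PublishDue and the onPublished callback are hypothetical names, not existing code, and the scheduling of the background process (timer, service, etc.) is left out. The point is that the transition happens in exactly one place, and the page only reads the flag.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row in the Notifications table, plus the new flag.
public class Notification
{
    public int Id { get; set; }
    public DateTime PublishAt { get; set; }
    public string Title { get; set; } = "";
    public string Content { get; set; } = "";
    public bool IsPublished { get; set; }
}

public static class NotificationPublisher
{
    // Meant to be run periodically by a background process (scheduling not shown).
    // Flips IsPublished for every notification whose time has come and invokes
    // onPublished exactly once per transition, e.g. to post it to Twitter.
    public static List<Notification> PublishDue(
        IEnumerable<Notification> notifications, DateTime now, Action<Notification> onPublished)
    {
        var due = notifications.Where(n => !n.IsPublished && n.PublishAt <= now).ToList();
        foreach (var n in due)
        {
            n.IsPublished = true;   // the explicit state transition
            onPublished(n);         // the single place for transition logic
        }
        return due;
    }

    // The page no longer compares against the clock; it only reads the flag.
    public static List<Notification> Latest(IEnumerable<Notification> notifications, int count = 3) =>
        notifications.Where(n => n.IsPublished)
                     .OrderByDescending(n => n.PublishAt)
                     .Take(count)
                     .ToList();
}
```

Running PublishDue twice with the same clock value fires onPublished only once per notification, which is exactly the hook the implicit design lacks.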

You might have noticed a theme here, I like making things explicit, it means that it is easier to handle them.

Building a recommendation engine in NHibernate

Well, it isn’t really a recommendation engine, it is a sample of one, and I strongly recommend not using it, but I am getting ahead of myself.

In the 6th episode of TekPub’s NHibernate webcast, Rob and I worked on creating statistical queries with NHibernate. To be totally honest, the reason we did that was to show off NHibernate’s querying capabilities, not so you would be able to make use of this in your applications. A recommendation engine is not something that you should run out of your OLTP store, so please take that under advisement.

The reason for this post is to explain in detail how the final result works. Here is the NHibernate code:

var orderIDsContainingCurrentSku = DetachedCriteria.For<OrderItem>()
    .Add<OrderItem>(x => x.Product.SKU == sku)
    .SetProjection(Projections.Property("Order.id"));

var skusOfProductsAppearingInOrdersContainingCurrentSku = DetachedCriteria.For<OrderItem>()
    .SetProjection(Projections.GroupProperty("Product.id"))
    .AddOrder(NHibernate.Criterion.Order.Desc(Projections.Count("Order.id")))
    .Add<OrderItem>(x=>x.Product.SKU!=sku)
    .Add(Subqueries.PropertyIn("Order.id", orderIDsContainingCurrentSku))
    .SetMaxResults(15);


var recommended = _session.CreateCriteria<Product>()
    .SetFetchMode<Product>(x => x.Descriptors, FetchMode.Join)
    .Add(Subqueries.PropertyIn("id", skusOfProductsAppearingInOrdersContainingCurrentSku))
    .SetResultTransformer(Transformers.DistinctRootEntity)
    .List<Product>();

And here is the resulting SQL:

SELECT this_.SKU                 as SKU1_1_,
       this_.ProductName         as ProductN2_1_1_,
       this_.BasePrice           as BasePrice1_1_,
       this_.WeightInPounds      as WeightIn4_1_1_,
       this_.DateAvailable       as DateAvai5_1_1_,
       this_.EstimatedDelivery   as Estimate6_1_1_,
       this_.AllowBackOrder      as AllowBac7_1_1_,
       this_.IsTaxable           as IsTaxable1_1_,
       this_.DefaultImageFile    as DefaultI9_1_1_,
       this_.AmountOnHand        as AmountO10_1_1_,
       this_.AllowPreOrder       as AllowPr11_1_1_,
       this_.DeliveryMethodID    as Deliver12_1_1_,
       this_.InventoryStatusID   as Invento13_1_1_,
       descriptor2_.SKU          as SKU3_,
       descriptor2_.DescriptorID as Descript1_3_,
       descriptor2_.DescriptorID as Descript1_4_0_,
       descriptor2_.Title        as Title4_0_,
       descriptor2_.Body         as Body4_0_
FROM   Products this_
       left outer join ProductDescriptors descriptor2_
         on this_.SKU = descriptor2_.SKU
WHERE  this_.SKU in (SELECT   top 15 this_0_.SKU as y0_
                     FROM     OrderItems this_0_
                     WHERE    not (this_0_.SKU = 'Binoculars2' /* @p0 */)
                              and this_0_.OrderID in (SELECT this_0_0_.OrderID as y0_
                                                      FROM   OrderItems this_0_0_
                                                      WHERE  this_0_0_.SKU = 'Binoculars2' /* @p1 */)
                     GROUP BY this_0_.SKU
                     ORDER BY count(this_0_.OrderID) desc)
 

The problem is that both the NHibernate code and the SQL are pretty complicated, and mapping between the two might be hard if you are not familiar with the Criteria API. So let us take this in stages. First, let us understand the logic of the SQL itself. Most of the complexity is in the where clause, so let us look at it in depth:

WHERE  this_.SKU in (SELECT   top 15 this_0_.SKU as y0_
                     FROM     OrderItems this_0_
                     WHERE    not (this_0_.SKU = 'Binoculars2' /* @p0 */)
                              and this_0_.OrderID in (SELECT this_0_0_.OrderID as y0_
                                                      FROM   OrderItems this_0_0_
                                                      WHERE  this_0_0_.SKU = 'Binoculars2' /* @p1 */)
                     GROUP BY this_0_.SKU
                     ORDER BY count(this_0_.OrderID) desc)

What exactly is going on in here?

Let us look at the innermost select. It selects the OrderID of every order that has a Binoculars2 item in it. Those order ids are then passed to the parent select, which matches on OrderID and returns the SKU of every item in those orders, unless that item is also Binoculars2.

What we are actually saying here is: give me all the items from orders that contain Binoculars, except for the Binoculars themselves. The logic is simple: you are very likely to buy something that someone else bought in the same order as the stuff that you are buying (complementary products, another book in the same series, etc). Next, we have the order by, which we use to find the stuff that you are most likely to buy. By ordering the items by the number of orders they appear in, we find the most popular items (hence, the stuff that you are likely to buy as well).
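To make that logic concrete, here is a rough in-memory equivalent of the SQL using LINQ to Objects. This is not what NHibernate executes; it is a sketch that assumes a hypothetical OrderItemRow record holding just the OrderID and SKU columns. Since (SKU, OrderID) is the composite key of OrderItems, counting the rows in each SKU group is the same as counting the orders that SKU appears in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a row of the OrderItems table.
public record OrderItemRow(int OrderId, string Sku);

public static class RecommendationSketch
{
    public static List<string> Recommend(IEnumerable<OrderItemRow> orderItems, string sku, int max = 15)
    {
        var rows = orderItems.ToList();

        // Innermost select: ids of the orders that contain the current SKU.
        var orderIds = new HashSet<int>(rows.Where(r => r.Sku == sku).Select(r => r.OrderId));

        // Parent select: every other SKU in those orders, grouped by SKU and ordered by
        // how many rows (that is, orders) it appears in, most popular first, top 15.
        return rows.Where(r => r.Sku != sku && orderIds.Contains(r.OrderId))
                   .GroupBy(r => r.Sku)
                   .OrderByDescending(g => g.Count())
                   .Take(max)
                   .Select(g => g.Key)
                   .ToList();
    }
}
```

For example, if two orders contain Binoculars2 together with a Tripod and only one of them also has a Lens, Recommend(rows, "Binoculars2") returns Tripod first, then Lens. Ties have no meaningful order in either version.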

I think that this is fairly clear, and now that we have the logic of the statement, let us try to understand how that NHibernate code produced it.

The answer is actually very simple: NHibernate’s Criteria API is all about composability. We simply composed the query from all the tiny pieces. Let us look at each individual piece in detail:

var orderIDsContainingCurrentSku = DetachedCriteria.For<OrderItem>()
    .Add<OrderItem>(x => x.Product.SKU == sku)
    .SetProjection(Projections.Property("Order.id"));

This query uses DetachedCriteria, which is a way to build a query (or a subquery, as we will soon see) without having a direct reference to the session. This is mostly useful in cases like this, where you want to compose several queries into a single one.

In this case, it is pretty obvious what is going on. We ask NHibernate to select from OrderItems (by using the OrderItem entity) where the product SKU is equal to the appropriate SKU (Binoculars2, in this case). We don’t want to get the entity back; instead, we want only a single field, the order id (this is what SetProjection is for). Note that the OrderItem mapping is quite interesting:

<class name="OrderItem" table="OrderItems" >
  <composite-id>
      <key-many-to-one name="Product" column="SKU"/>
      <key-many-to-one name="Order" column="OrderID"/>
  </composite-id>
</class>

It is a class with a composite id, where each part of the PK is also a FK to a different table. With NHibernate, we map this using <key-many-to-one/> in a composite-id element.
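One practical consequence of mapping the entity itself with a composite-id: the NHibernate reference documentation asks such a class to override Equals and GetHashCode (and to be serializable), so that two instances with the same key compare equal. A minimal sketch of what OrderItem might look like, assuming Product exposes its key as SKU and Order exposes its key as OrderID (the Order property name is a guess, since its mapping isn’t shown):

```csharp
using System;

// Assumed minimal shapes of the related entities; only their keys matter here.
public class Product { public virtual string SKU { get; set; } = ""; }
public class Order { public virtual int OrderID { get; set; } }  // property name assumed

// An entity mapped with <composite-id> doubles as its own identifier, so equality
// must be based on the key parts (Product.SKU, Order.OrderID), not on references.
[Serializable]
public class OrderItem
{
    public virtual Product Product { get; set; } = new Product();
    public virtual Order Order { get; set; } = new Order();

    public override bool Equals(object obj) =>
        obj is OrderItem other &&
        Product?.SKU == other.Product?.SKU &&
        Order?.OrderID == other.Order?.OrderID;

    public override int GetHashCode() => HashCode.Combine(Product?.SKU, Order?.OrderID);
}
```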

When we want to query on that, we can either use the usual many-to-one approach or, if we want to refer to a particular column, use the “id” (all lower case) keyword. In other words, “Order.id” refers to the OrderItems.OrderID column, while “Product.id” or “Product.SKU” refers to the OrderItems.SKU column. I think that you can figure out what is going on now; this query generates the following SQL:

SELECT this_0_0_.OrderID as y0_
FROM   OrderItems this_0_0_
WHERE  this_0_0_.SKU = 'Binoculars2' /* @p1 */

And I think that you can see the direct correlation between the NHibernate query and the generated SQL.

Next in line, and seemingly much more complicated, we have this:

var skusOfProductsAppearingInOrdersContainingCurrentSku = DetachedCriteria.For<OrderItem>()
    .SetProjection(Projections.GroupProperty("Product.id"))
    .AddOrder(NHibernate.Criterion.Order.Desc(Projections.Count("Order.id")))
    .Add<OrderItem>(x=>x.Product.SKU!=sku)
    .Add(Subqueries.PropertyIn("Order.id", orderIDsContainingCurrentSku))
    .SetMaxResults(15);

But it isn’t really complicated; let us look at it in detail. The first line is already familiar to us, asking to select from OrderItems. Next, we use SetProjection again, to select just the “Product.id”, which maps to OrderItems.SKU. Note that we are using something slightly different this time: before we used Projections.Property, now we use Projections.GroupProperty. What is the difference between the two?

Projections.Property instructs NHibernate to put the matching column in the select clause, while Projections.GroupProperty instructs NHibernate to put the matching column in both the select clause and the group by clause. This is required because on the next line we are using an aggregate function in the order by clause, and aggregate functions must be used in conjunction with the appropriate group by clause. That line also specifies that we are ordering descending on the count of “Order.id” (which maps to OrderItems.OrderID).

The following line is something we are already familiar with: we are adding a where clause to filter out the current SKU. And now we get to the interesting part: we use a subquery to match the “Order.id” to the order ids containing the current SKU. Last, but not least, we limit the number of returned rows to 15. The resulting SQL is:

SELECT   top 15 this_0_.SKU as y0_
FROM     OrderItems this_0_
WHERE    not (this_0_.SKU = 'Binoculars2' /* @p0 */)
         and this_0_.OrderID in (/* orderIDsContainingCurrentSku */)
GROUP BY this_0_.SKU
ORDER BY count(this_0_.OrderID) desc

Again, once we have gone over this in detail, I think that you will agree that there is a pretty simple mapping between the query and the resulting SQL.

Now, let us look at the actual query code, which makes use of the previous two subqueries:

var recommended = _session.CreateCriteria<Product>()
    .SetFetchMode<Product>(x => x.Descriptors, FetchMode.Join)
    .Add(Subqueries.PropertyIn("id", skusOfProductsAppearingInOrdersContainingCurrentSku))
    .SetResultTransformer(Transformers.DistinctRootEntity)
    .List<Product>();

We create a standard criteria query and ask it to eager load the Descriptors. Then we add a subquery, matching the product “id” (in all lower case, “id” is an NHibernate keyword that refers to the current entity’s PK column) to the SKUs of products that appear in orders containing the current SKU. Because we used a join to eager load the descriptors, the result set contains one row per descriptor, so the same product can show up several times. That is why we call SetResultTransformer(Transformers.DistinctRootEntity), so we get only distinct root entities.
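Why the duplicates appear, and what the transformer does about them, can be sketched in memory. The types and FetchJoinSketch below are hypothetical stand-ins, not NHibernate code: the simulated join produces one row per descriptor, each pointing at the same Product instance, and keeping each root once by reference gives roughly the behavior of DistinctRootEntity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory types, not NHibernate's.
public class Descriptor { public string Title { get; set; } = ""; }

public class Product
{
    public string SKU { get; set; } = "";
    public List<Descriptor> Descriptors { get; } = new List<Descriptor>();
}

public static class FetchJoinSketch
{
    // Simulates the left outer join: one row per (product, descriptor) pair, and one
    // row for a product without descriptors. Every row points at the same root object.
    public static List<Product> JoinedRows(IEnumerable<Product> products) =>
        products.SelectMany(p => p.Descriptors.Count == 0
                                     ? new[] { p }
                                     : p.Descriptors.Select(_ => p))
                .ToList();

    // Roughly what a distinct root entity transformer does: keep each root once,
    // compared by reference, in order of first appearance.
    public static List<Product> DistinctRoots(IEnumerable<Product> rows)
    {
        var seen = new HashSet<Product>();   // Product doesn't override Equals: reference equality
        var result = new List<Product>();
        foreach (var p in rows)
            if (seen.Add(p))
                result.Add(p);
        return result;
    }
}
```

A product with two descriptors yields two joined rows but only one entry after DistinctRoots.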

All in all, when you break it into pieces like that, I don’t think that it is an overly complex process. The query we were trying to get to is by no means a simple one, but NHibernate didn’t add any complexity of its own when we built it.

ReSharper is amazing, take N

I am trying out R# 5.0, and I accidentally stumbled upon an awesome feature.

Let us say that you Ctrl+Click the int.ToString() method:

[Screenshot: Ctrl+Click on int.ToString()]

In previous versions of R#, that would take you to the Object Browser, which is useful enough, but R# 5.0 does something far more awesome.

I didn’t realize what was going on at first. First, it popped up this thingie:

[Screenshot: the ReSharper prompt]

And then it asked me to accept a license:

[Screenshot: the license agreement]

But then it actually showed me the real code behind int.ToString(). This is not a Reflector trick; this is the real source code that made it into the framework:

[Screenshot: the framework source code for int.ToString()]

Now, this is basically just integrating Symbol Server & Source Server, both of which have been available for a while now, but making it this easy is huge!

NHibernate is on the cover of MSDN Magazine

A while ago I ran a poll about what posts you would like me to do, and the most requested topic was handling NHibernate in a desktop application. I started writing a blog post about it, but when it hit twenty pages, I thought better of it and decided to turn it into an article instead. MSDN Magazine just published it.

You can read “Building a Desktop To-Do Application with NHibernate” in the latest issue of MSDN Magazine.

And now that the article is out, I can start posting about other topics in that code base that are pretty interesting as well.

Reading Frenzy

I don’t usually read non-fiction books. I read some tech books, but that is work more than anything else. I do read a lot, though, and I thought that I might post what I like, in the hope of getting recommendations for more stuff.

The following list is mixed between authors & characters, depending on what I find more memorable. I only included authors / books that I read in the last 6 months or so.

  • Robert Jordan
  • Kris Longknife
  • Miles Vorkosigan
  • Jim C. Hines
  • David Weber
    • Honorverse
    • Safehold
    • Dahak
  • John Moore
  • Rachel Caine
    • Weather Warden
    • Outcast
  • Southern Vampire – Charlaine Harris
  • The Lost Fleet – Jack Campbell
  • Terry Pratchett – the entire Discworld series
  • Ilona Andrews
    • On the Edge
    • Kate Daniels
  • Vatta’s War

Those aren’t all of them, but that should be enough for now; they are the ones that come to mind as good reads.

ÜberProf and Continuous Integration

One of the features that keeps popping up for ÜberProf is that people want to use it in CI scenarios, usually to be able to programmatically check that they don’t have things like SELECT N+1 popping up.

With build 562 of ÜberProf, this is now possible. How does it work? We now have a command line interface for ÜberProf, with the following options:

/CmdLineMode[+|-]         (short form /C)
/File:<string>            (short form /F)
/ReportFormat:{Xml|Html}  (short form /R)
/Port:<int>               (short form /P)
/Shutdown[+|-]            (short form /S)

Starting up with no options will result in the usual UI showing up, but you have a new mode available for you. Let us say that you want to output the results of your integration tests into a format that you can easily work with programmatically. Here is how it can be done:

nhprof.exe /CmdLineMode /File:Output.xml /ReportFormat:Xml   # starts listening to applications
xunit.console.exe Northwind.IntegrationTests.dll
nhprof.exe /Shutdown                                         # stops listening to applications and outputs the report

The example is using NH Prof, but the same holds for all the other variants.

The way it works is very simple and should be pretty easy to integrate into your CI process. The XML output can give you programmatic access to the report, while the HTML version is human readable.

One thing that you might want to be aware of: the report file is written asynchronously, so the shutdown command may return before the file is fully written. If you need to process the file as part of your build process, you need to wait until the first profiler instance has exited. Using PowerShell, this is done like this:

nhprof.exe /CmdLineMode /File:Output.xml /ReportFormat:Xml
xunit.console.exe Northwind.IntegrationTests.dll
nhprof.exe /Shutdown
Get-Process nhprof | ForEach-Object { $_.WaitForExit() }   # wait until the report export is completed

Please note that from a licensing perspective, the CI mode is the same as the normal GUI mode. On one hand, it means that you don’t need to do anything if you have the profiler already. On the other, if you want to run it on a CI machine, you would need an additional license for that.

Handling production errors in a messaging environment

So, today I got the first L2S Prof order. As you can imagine, I was pretty excited about that. However, it turned out that I had actually missed something when I built the backend for handling L2S Prof ordering. The details about what actually went wrong aren’t important (and are embarrassing).

But I logged into the server and checked out what was going on:

[Screenshot: the server’s queues, showing the failed message and its error information]

One of the major design criteria that I had with Rhino Service Bus is that it should be dead easy to handle production. As you can see in the screenshot, I have two messages of interest here: the first one is the actual message, and the second is the error information, which includes the full stack trace. Using that, it was a piece of cake to isolate the problem, have the head-slapping moment, and deploy a new version. Once that was done, all I had to do was move the message back to the processing queue, and I was done.
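The flow described above (a failed message ends up next to an error record carrying the stack trace, and is moved back once a fix is deployed) can be sketched with in-memory collections. This is a hypothetical model of the pattern only; QueueHost, ProcessAll and MoveBackToProcessing are made-up names, not Rhino Service Bus’s API, and a real bus does this over durable queues.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of the pattern; not Rhino Service Bus's API.
public record Message(string Id, string Body);
public record ErrorInfo(string MessageId, string StackTrace);

public class QueueHost
{
    public Queue<Message> Processing { get; } = new Queue<Message>();
    public List<Message> Errors { get; } = new List<Message>();
    public List<ErrorInfo> ErrorDetails { get; } = new List<ErrorInfo>();

    // Handles the messages queued at the time of the call. A failure parks the
    // original message together with an error record holding the full stack trace.
    public void ProcessAll(Action<Message> handler)
    {
        int pending = Processing.Count;
        for (int i = 0; i < pending; i++)
        {
            var msg = Processing.Dequeue();
            try { handler(msg); }
            catch (Exception e)
            {
                Errors.Add(msg);
                ErrorDetails.Add(new ErrorInfo(msg.Id, e.ToString()));
            }
        }
    }

    // After deploying a fix, move the failed message back for reprocessing.
    public void MoveBackToProcessing(string messageId)
    {
        int idx = Errors.FindIndex(m => m.Id == messageId);
        if (idx < 0) return;
        Processing.Enqueue(Errors[idx]);
        Errors.RemoveAt(idx);
        ErrorDetails.RemoveAll(d => d.MessageId == messageId);
    }
}
```

Nothing is lost when the handler throws; the message simply waits, with its diagnostics next to it, until someone moves it back.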

Just to give you an idea, here is what it looks like on my timeline:

[Screenshot: my timeline]

I guarantee you that the customer in question didn’t have any idea that there was something wrong with his order processing.

I am loving it.

What should I talk about at QCon London?

I might be speaking at QCon London, but I am not sure what about.

The requirements are:

This track intends to showcase some of the practitioners, tools and technologies to provide an awareness of something other than the Microsoft mantra for software development on .NET

Each talk should show at least one thing that is new or unusual for the masses on .NET to know or use and compare it to the status quo. It should provide some in depth examples or code around that comparison. In the cases where the speaker is the author of an OSS product also give a broader rationale and explanation of the tool and when it is best used.

My main issue is that there are too many topics to talk about, and I thought that I might put it on the blog and see what people are interested in.