Building a recommendation engine in NHibernate

Well, it isn’t really a recommendation engine, it is a sample of one, and I strongly recommend not using it, but I am getting ahead of myself.

In the 6th episode of the TekPub’s NHibernate webcast, me & Rob worked on creating statistical queries with NHibernate. To be totally honest, the reason that we did that is to show off NHibernate’s querying capabilities, not so you would be able to make use of this in your applications. A recommendation engine is not something that you should run out of your OLTP store, so please take that under advisement.

The reason for this post is to explain in details how the final result works. Here is the NHibernate code:

var orderIDsContainingCurrentSku = DetachedCriteria.For<OrderItem>()
            .Add<OrderItem>(x=>x.Product.SKU==sku)
            .SetProjection(Projections.Property("Order.id"));

var skusOfProductsAppearingInOrdersContainingCurrentSku = DetachedCriteria.For<OrderItem>()
    .SetProjection(Projections.GroupProperty("Product.id"))
    .AddOrder(NHibernate.Criterion.Order.Desc(Projections.Count("Order.id")))
    .Add<OrderItem>(x=>x.Product.SKU!=sku)
    .Add(Subqueries.PropertyIn("Order.id", orderIDsContainingCurrentSku))
    .SetMaxResults(15);


var recommended = _session.CreateCriteria<Product>()
    .SetFetchMode<Product>(x => x.Descriptors, FetchMode.Join)
    .Add(Subqueries.PropertyIn("id", skusOfProductsAppearingInOrdersContainingCurrentSku))
    .SetResultTransformer(Transformers.DistinctRootEntity)
    .List<Product>();

And here is the resulting SQL:

SELECT this_.SKU                 as SKU1_1_,
       this_.ProductName         as ProductN2_1_1_,
       this_.BasePrice           as BasePrice1_1_,
       this_.WeightInPounds      as WeightIn4_1_1_,
       this_.DateAvailable       as DateAvai5_1_1_,
       this_.EstimatedDelivery   as Estimate6_1_1_,
       this_.AllowBackOrder      as AllowBac7_1_1_,
       this_.IsTaxable           as IsTaxable1_1_,
       this_.DefaultImageFile    as DefaultI9_1_1_,
       this_.AmountOnHand        as AmountO10_1_1_,
       this_.AllowPreOrder       as AllowPr11_1_1_,
       this_.DeliveryMethodID    as Deliver12_1_1_,
       this_.InventoryStatusID   as Invento13_1_1_,
       descriptor2_.SKU          as SKU3_,
       descriptor2_.DescriptorID as Descript1_3_,
       descriptor2_.DescriptorID as Descript1_4_0_,
       descriptor2_.Title        as Title4_0_,
       descriptor2_.Body         as Body4_0_
FROM   Products this_
       left outer join ProductDescriptors descriptor2_
         on this_.SKU = descriptor2_.SKU
WHERE  this_.SKU in (SELECT   top 15 this_0_.SKU as y0_
                     FROM     OrderItems this_0_
                     WHERE    not (this_0_.SKU = 'Binoculars2' /* @p0 */)
                              and this_0_.OrderID in (SELECT this_0_0_.OrderID as y0_
                                                      FROM   OrderItems this_0_0_
                                                      WHERE  this_0_0_.SKU = 'Binoculars2' /* @p1 */)
                     GROUP BY this_0_.SKU
                     ORDER BY count(this_0_.OrderID) desc)
 

The problem is that both the NHibernate code and the SQL are pretty complicated, and mapping between the two might be pretty hard if you are not familiar with that. So let us take this in stages. First, let us understand the logic in the SQL itself. Most of the complexity happens in the where clause, so let us look at this in depth:

WHERE  this_.SKU in (SELECT   top 15 this_0_.SKU as y0_
 FROM     OrderItems this_0_
 WHERE    not (this_0_.SKU = 'Binoculars2' /* @p0 */)
          and this_0_.OrderID in (SELECT this_0_0_.OrderID as y0_
                  FROM   OrderItems this_0_0_
                  WHERE  this_0_0_.SKU = 'Binoculars2' /* @p1 */)
 GROUP BY this_0_.SKU
 ORDER BY count(this_0_.OrderID) desc)

What exactly is going on in here?

Let us look at the nested most select, this select the OrderId from an order that have a Binoculars item in it. That is then passed to the parent select, matching it to that OrderID and returning the SKU if it is not also a Binoculars.

What we are actually saying here is: Give me all the items from orders that contains Binoculars, except for Binoculars. The logic is simple, you are very likely to buy something that someone else also bought in an order together with stuff that you bought (complimentary products, another book in the same series, etc). Next, we have the order by, we use that to find the stuff that you are most likely to buy. By ordering the items based on the number of orders they appear in, we try to find the most popular items (hence, the stuff that you are likely to buy as well).

I think that this is fairly clear, and now that we have the logic of the statement, let us try to understand how that NHibernate code produced it.

The answer is actually very simple, NHibernate’s Criteria API is all about composability. We simply composed the query from all the tiny pieces. Let us look at each individual piece in detail:

var orderIDsContainingCurrentSku = DetachedCriteria.For<OrderItem>()
            .Add<OrderItem>(x=>x.Product.SKU==sku)
            .SetProjection(Projections.Property("Order.id"));

This query is using DetachedCriteria, this is a way to generate a query (or a sub query, as we will soon see), without having a direct reference to the session. This is mostly useful in cases like this, where you want to compose several queries into a single one.

image In this case, it is pretty obvious what is going on, we ask NHibernate to select from OrderItems (by using the OrderItem entity), where the product sku is equal to the appropriate SKU (Binoculars2, in this case), and we don’t want to get the entity back, instead, we want only a single field (this is what SetProjection is for), the order id. Note that OrderItem mapping is quite interesting:

<class name="OrderItem" table="OrderItems" >
  <composite-id>
      <key-many-to-one name="Product" column="SKU"/>
      <key-many-to-one name="Order" column="OrderID"/>
  </composite-id>
</class>

It is a class with a composite id, where each part of the PK is also a FK to a different table. With NHibernate, we map this using <key-many-to-one/> in a composite-id element.

When we want to query on that, we can either use the usual many-to-one approach, or, if we want to refer to a particular column, we use the “id” (all lower case) keyword. In other words, “Order.id” refers to the OrderItems.OrderID column, while “Product.id” or “Product.SKU” refers to the “OrderItems.SKU” column. I think that you can figure out what is going on now, this query generate the following SQL:

SELECT this_0_0_.OrderID as y0_
FROM   OrderItems this_0_0_
WHERE  this_0_0_.SKU = 'Binoculars2' /* @p1 */

And I think that can see the direct correlation between the NHibernate query and the generated SQL.

Next in line, and seemingly much more complicated, we have this:

var skusOfProductsAppearingInOrdersContainingCurrentSku = DetachedCriteria.For<OrderItem>()
    .SetProjection(Projections.GroupProperty("Product.id"))
    .AddOrder(NHibernate.Criterion.Order.Desc(Projections.Count("Order.id")))
    .Add<OrderItem>(x=>x.Product.SKU!=sku)
    .Add(Subqueries.PropertyIn("Order.id", orderIDsContainingCurrentSku))
    .SetMaxResults(15);

But it isn’t really complidated, let us look at this in details. The first line is already familiar for us, asking to select from OrderItems. Next, we use SetProjection again, to select just the “Product.id”, which is the OrderItems.SKU. Note that we are using something slightly different this time, where before we used Projections.Property, now we use Projections.GroupProperty. What is the difference between the two?

Projection.Property instructs NHibernate to put the matching column in the select clause, while Projection.GroupProperty instructs NHibernate to put the matching column in the select clause and in the group by clause. This is required because on the next line, we are using an aggregate function in the order by clause, aggregate functions must be used in conjunction with the appropriate group by clause. That line also specify that we are using a descending order on the count of the “Order.id” (which matches OrderItems.OrderID).

The following line is something we are already familiar with, we are adding a where clause to filter the current SKU. And now we get to the interesting part, we use a subquery to match the “Order.id” to the order ids containing the current SKU. Last, but not least, we limit the number of returned rows to 15. The resulting SQL is:

SELECT   top 15 this_0_.SKU as y0_
FROM     OrderItems this_0_
WHERE    not (this_0_.SKU = 'Binoculars2' /* @p0 */)
      and this_0_.OrderID in (/* orderIDsContainingCurrentSku */)
GROUP BY this_0_.SKU
ORDER BY count(this_0_.OrderID) desc

I think that again, once we have gone over this in details, you will agree that there is a pretty simple mapping between the query and the resulting SQL.

Now, let us look at the actual query code, which make use of the previous two subqueries:

var recommended = _session.CreateCriteria<Product>()
    .SetFetchMode<Product>(x => x.Descriptors, FetchMode.Join)
    .Add(Subqueries.PropertyIn("id", skusOfProductsAppearingInOrdersContainingCurrentSku))
    .SetResultTransformer(Transformers.DistinctRootEntity)
    .List<Product>();

We create a standard criteria query, ask it to eager load the Descriptors, then we perform a subquery, matching the product “id” (when specified using all lower case it is an NHibernate keyword referencing the current entity’s PK column) to the skus of products that appear in orders contains the current sku. Because we used a join to eager load the descriptors, we need to specify SetResultTransofer so we will get only distinct root entities.

All in all, when you break it up to pieces like that, I don’t think that it is an overly complex process. The query we were trying to get to is by no means a simple one, but we didn’t have any additional complexity when trying to create it using NHibernate.

Print | posted on Monday, December 07, 2009 2:20 PM

Feedback


Gravatar

# re: Building a recommendation engine in NHibernate 12/7/2009 4:10 PM Bob

Does this really qualify as a "recommendation engine"? I say that because this seems like it's a common feature of a web app like the one in the screen cast. I would say in this case the NH query is actually more confusing (depending on experience of course) than the raw SQL.

So, If you don't recommend something like this is used in production, what would you recommendation someone using NH do in this case? I mean, this is a common need for an e-commerce app.


Gravatar

# re: Building a recommendation engine in NHibernate 12/7/2009 4:12 PM Ayende Rahien

Bob,
As I said, what we did was show how we can use NHibernate to create moderately complex queries. That was the goal, not talking about the actual SQL.


Gravatar

# re: Building a recommendation engine in NHibernate 12/7/2009 4:13 PM Ayende Rahien

Bob,
As for how I would build something like that?
Probably by utilizing an OLAP DB and querying from that.


Gravatar

# re: Building a recommendation engine in NHibernate 12/7/2009 4:16 PM Bob

Ayende,

OK, I guess my point is that if the scope of the app is small, an OLAP DB may be overkill. So, something like the NH query solution you posted may indeed be necessary in a real production environment.


Gravatar

# re: Building a recommendation engine in NHibernate 12/7/2009 4:22 PM Ayende Rahien

Bob,
And if you like it, you can use it. But we wanted to be crystal clear that this isn't the general recommendation.


Gravatar

# re: Building a recommendation engine in NHibernate 12/7/2009 4:52 PM jbland

last year in a burst of energy i ported

http://taste.sourceforge.net/

to .net (the pre-Mahout version). There is sufficient extensibility to interface with external data stores, but i never got around to nHibernate integration. i plan to do so in a few months for a current project.

http://mymediaproject.codeplex.com/ seems like a full blown solution that i _would_ use if the license was right (free only for non-commercial use, but my project will be MIT ish)


Gravatar

# re: Building a recommendation engine in NHibernate 12/7/2009 5:09 PM Rafal

Well, I think you just recommended not using NHibernate for statistical queries. SQL is good enough for this... and the Criteria API is just ugly and difficult to understand


Gravatar

# re: Building a recommendation engine in NHibernate 12/7/2009 6:22 PM Dmitry

In my opinion OLAP would be way overkill for this. Why not just use HQL or a named SQL query? Criteria is ugly for non-trivial queries and hopefully a good LINQ provider for NHibernate will be ready soon.


Gravatar

# re: Building a recommendation engine in NHibernate 12/7/2009 7:46 PM Demis Bellot

I tend to agree with the others. In this case SQL would provide much more clarity of the intent of the query than the obfuscated use of the Criteria API. Most likely find I would have to use a profiler to look at the SQL output to reverse-engineer what's going on in this situation.

When it comes to complex groupings and sub queries I would really need a good reason not to use LINQ or custom sql.


Gravatar

# re: Building a recommendation engine in NHibernate 12/7/2009 7:48 PM Ajai Shankar

Hi Ayende

I was working on something similar yesterday and encountered a bizarre behavior with projections which might be a bug in NH...

Could you shed some insight into the following behavior I noticed:

"GroupProperty across associations doesn't seem to work if no alias is specified in the detached criteria"

The domain is of Customer having many Orders and we already have the detached criterias nicely wrapped in "finders"

I wanted to use the existing Order finder DetachedCriteria.For()) and project Sum(order.Total) by order.Customer.Name:

Projections.GroupProperty("Customer.Name") does NOT work and throws NHibernate.QueryException "could not resolve property: Customer.Name of: Domain.Order"

But if I add an alias in the criteria:

query.CreateAlias("Customer", "c") and group using alias Projections.GroupProperty("c.Name") it WORKS!

Also strangely a projection on the Customer.Id property works fine without any alias...

Ajai







Gravatar

# re: Building a recommendation engine in NHibernate 12/7/2009 7:52 PM Rob Conery

Gents I think you're missing the point. RE the Criteria API we did it that way because I asked him too - if we went the HQL route I would have just did as you mention and used SQL or an SP.

The main reason it's "ugly" is because we tried it in the first place. And OLAP system doesn't need to be backed by a full OLAP service - you can "star up" a set of flattened tables, index them, and run a query that way which we be a lot prettier and probably a super simple Criteria call. I'll pick up this point in the next screencast.

OLAP is never overkill if you find yourself needing to introspect data for patterns etc. This is your business and directly at the heart of what it is you're doing. I would agree that the systems out there (like SQL Integration) are pathetic when it comes to this stuff - but again you don't need a full system like that to build out OLAP.


Gravatar

# re: Building a recommendation engine in NHibernate 12/7/2009 9:08 PM Ayende Rahien

Dmitry,
You should not run these types of queries against your OLTP data store.


Gravatar

# re: Building a recommendation engine in NHibernate 12/7/2009 9:09 PM Ayende Rahien

Ajai,
Please use the NHibernate mailing list
http://groups.google.com/group/nhusers

Alternatively, you may choose our commercial support option:
http://nhprof.com/commercialsupport


Gravatar

# re: Building a recommendation engine in NHibernate 12/7/2009 10:40 PM Oren

Would you run full-featured analysis services for this feature? Rob is talking about creating flat database tables. How would they be populated, with triggers?

I do not see SQL struggling to much with the query because it only returns up to 15 records. That is assuming the tables are indexed properly.


Gravatar

# re: Building a recommendation engine in NHibernate 12/7/2009 10:47 PM Ayende Rahien

Oren,
That depends on too many conditions to answer.
Depending on the traffic on the site, the size of the data, etc.
Probably I'll go with a separate DB to hold renormalized schema with some process to sync the OLTP and reporting DB.


Gravatar

# re: Building a recommendation engine in NHibernate 12/7/2009 11:36 PM zvolkov

What is this generic overload of Add that takes an expression? Is this something new in the trunk, I don't think I have it in NH 2.1.2...


Gravatar

# re: Building a recommendation engine in NHibernate 12/7/2009 11:43 PM Ayende Rahien

zvolkov,
That is an extension method from NH Lambda Extensions


Gravatar

# re: Building a recommendation engine in NHibernate 12/11/2009 9:21 AM Michael Teper

Perhaps its the syntax that is foreign to me, but I kept stumbling on it. I am not saying I think its a good idea, but I can't wait to be able to do this sort of thing with LINQ. :-)

Comments have been closed on this topic.