Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969

, @ Q c

Posts: 6,124 | Comments: 45,475

filter by tags archive

What is wrong here? Solution

time to read 1 min | 199 words

In my previous post, I showed the database schema and the UI and asked what was wrong with that. Before we move on, here is what I showed.



If you looked carefully, you might have noticed that there are duplicate PO# and Tracking# in the UI. More than that, we somehow double charged the customer for shipping.

What is going on?

It is actually fairly obvious, when you think about it. Look at the schema, there isn’t actually any association between the Tracking # and the PO #. In most orders, we have only 1 PO #, so it was easy to just add this information by just pulling it from the DB and adding a few columns. But when we got an order that has multiple POs… that is when all hell breaks lose.

This is a classic Cartesian Product problem.

The solution?  Actually model the UI to avoid suggesting that there is a relationship between the tracking and purchase orders, like this:




Spelling of OrderId is wrong in OrderLines table ;-)

Wayne M

That solution works in any case except one where management is adamant "We MUST be able to see purchase orders and tracking in the same grouping". I've run into a few places like that in the "real world" where someone absolutely MUST have things displayed a certain way whether it's convenient or not, and won't take no for an answer.


@Wayne - I have run into that too, but normally I say "how do you want it displayed when they have multiple POs and multiple tracking numbers? With what we have now, it would show double for shipping charges"

Making them make the decision makes them think about it, and when they flounder, you say "what if we did this?"

Note that nowhere in there am I telling them that they are wrong. I are not directly saying it's a bad design, and I am always using the term "we", including them.

You'd be surprised at much more effective that can be over saying "No."


@Philip - many customers are not interested at all on the limitations of the current technical solution - they just want to hear how long it will take to implement the requirement and how much it will cost. :)


So, any conclusions? Cartesian product considered harmful?

Frans Bouma

The whole model is so bad, that there's no real solution other than redoing it. For example, the 'purchase order' entity, what does it mean? It only adds a new attribute 'ApprovedAt' for the OrderId it's related to, but... does that mean an order can be approved multiple times? And what does each time mean?

Looking at the UI it doesn't make any sense either.

Oh, and there's no such thing as a 'cartesian product problem': a cartesian product is a set operation, that's all. This problem isn't about a cartesian product, as it's not a problem related to a set operation but related to a crappy modeled database.

Tip: first create an abstract entity model and then project it to a relational model (and code). You'll see your tables make much more sense.

@Rafal if you don't know what a cartesian product is, you should read up on it. Every join operation is based on it. http://en.wikipedia.org/wiki/Cartesian_product. In theory at least. (so take the cross-join, which is the cartasian product) and filter out the rows not matching the ON predicate). In practice, a hash-based join or other shortcuts are easier as cross-join based joins are often slow because it produces a lot of rows which can be discarded by the predicate (so you can better use the predicate to do the join to filter out rows which don't need to be considered in the first place.)

Steven Fox

@Frans Bouma, I'd be interested in seeing how you'd redo this model.

The ability to add more than one PO per order is required, as approval may come from different departments within our customers. So PO should also have approved_value.

In our case we have lineitems ->packs->tracking rather than orders->tracking. We can have lineitems from multiple orders in a single pack, for small items, but we can also have multiple packs for a single lineitem.



Frans Bouma

@Steven: First of all, the model doesn't enforce relational integrity properly. Look at Tracking. I can add multiple rows for the same Order, but what does that mean? Only if there's more information in the entity which tells me what the costs are for, it's not useful: as the information in the entity will not tell me the difference between row X and row Y (except the cost/sentat value differences). This is important as the model doesn't stop me from adding multiple times the same row (different ID, same values for OrderId, cost, sentat).

Same is true for purchaseorders: they're not related to anything, I can add 10 times the same values for Orderid and ApprovedAt.

OrderLines: the model allows multiple orderline items for the same product. This doesn't make sense. This can be fixed by making the PK (OrderId, ProductId).

If the actual, full, model is doing all that, OK. If not: it needs correction. Though looking at the fact that all tables have the same pk 'Id', I fear someone who thought to be clever made this model and giving every entity the pk 'Id', even if it didn't make any sense (like in OrderLine).

If there's an entity 'Department', PurchaseOrders should get as pk 'OrderId, DepartmentId', so the model already enforces the fact that you can't add multiple approvals of order X by the same department D.

If the model is this way because the o/r mapper is too lame to deal with compound PKs when it needs to, use another one which does.


Frans - relax, it was irony. Next time I'll try to make it more explicit.

Frans Bouma

@rafal hehe :) I didn't catch that, but then again... I sometimes have to switch off my Sheldon mode ;)


Actual question: Am I missing something? These two posts leave me with a "Well, duh..." feeling. An obvious error and the obvious fix is to....remove it.

Was it to show the pitfalls of sloppy relational querying?

Comment preview

Comments have been closed on this topic.


  1. RavenDB 3.5 whirl wind tour: You want all the data, you can’t handle all the data - one day from now
  2. The design of RavenDB 4.0: Making Lucene reliable - about one day from now
  3. RavenDB 3.5 whirl wind tour: I’ll find who is taking my I/O bandwidth and they SHALL pay - 3 days from now
  4. The design of RavenDB 4.0: Physically segregating collections - 4 days from now
  5. RavenDB 3.5 Whirlwind tour: I need to be free to explore my data - 5 days from now

And 14 more posts are pending...

There are posts all the way to May 30, 2016


  1. RavenDB 3.5 whirl wind tour (14):
    29 Apr 2016 - A large cluster goes into a bar and order N^2 drinks
  2. The design of RavenDB 4.0 (13):
    28 Apr 2016 - The implications of the blittable format
  3. Tasks for the new comer (2):
    15 Apr 2016 - Quartz.NET with RavenDB
  4. Code through the looking glass (5):
    18 Mar 2016 - And a linear search to rule them
  5. Find the bug (8):
    29 Feb 2016 - When you can't rely on your own identity
View all series



Main feed Feed Stats
Comments feed   Comments Feed Stats