Ayende @ Rahien

Refunds available at head office

RavenDB And Not Having Foreign Keys

This is something that we hear quite often in the mailing list, and I thought that I would spend the time to answer it in full. There tend to be two kinds of FK references in RDMBS, the essential ones and the one the DBA added just to make my life a living hell.

The first one include things that are essential for internal consistency within a single aggregate. For example, the OrderLines.OrderID FK reference to Orders.ID is quite important, since an order line without an order is meaningless. But what about the association between OrderLine.ProductID and Products.ID ? 

An OrderLine can most certainly exists without the product, in fact, an OrderLine already copied into it all of the properties of the product that are important for the order. But because of the FK happy nature of most DBAs (and developers, for that matter), we have a FK reference between the two. The problem is that it actually is perfectly fine to remove a product that we are no longer selling from the store.

Yes, there are order lines for that product, but they have been completed ages ago. With a RDBMS and a FK, you cannot do that. So you resort to hacks like IsDeleted = false, which in practice gives you the exact same behavior as a deleted product, except that the FK is happy. Your application has a 50/50 change to work or not work with that.

With RavenDB, we make distinctions between internal consistency, which is maintained inside the same document, and external references, which can come and go as they please. You cannot have an order line in RavenDB without an order, because the order is where the order line exists. But you can most certainly remove the product that an order line refers to, because that is outside the scope of the order, it is a separate aggregate.

Tags:

Posted By: Ayende Rahien

Published at

Originally posted at

Comments

Rafal
02/26/2013 10:36 AM by
Rafal

Are you saing that not having inter-document consistency checks in the DB is an advantage over having them? I beg to differ.

alastair
02/26/2013 10:41 AM by
alastair

Many RDBMS will allow you to specify DELETE behaviour such that deleting the parent record will automatically set the childs key to NULL in this situation.

Mango
02/26/2013 11:14 AM by
Mango

Ayende,

...i'm new to raven ...wath's about the costo of separate aggregates life cycles?

tyhanks

Ayende Rahien
02/26/2013 11:27 AM by
Ayende Rahien

Mango I don't follow the question

Jon Samwell
02/26/2013 11:28 AM by
Jon Samwell

@alastair - nullable fk's are evil and will just lead to confusion in the relational db world and cascade deletes cause pain. I think what Oren is trying to say is you cannot approach document db's the same way you approach relational. The things we have to do to keep relational databases happy do not exists when you look at the NOSQL paradigm.

Gilligan
02/26/2013 12:17 PM by
Gilligan

I think we have in the past used foreign keys thinking something like "Hey, now we don't have to think about this relationship in the domain". But despite our efforts to hide aspects of the domain from our minds, it comes back to haunt us. Foreign keys aren't a silver bullet and do end up causing much pain. The number of times I have seen "IsDeleted" and the pain that has also caused is astounding. I think Oren is saying exactly this: we cannot hide these relationships magically from the domain. We must face them head on in our app and deal with them from a business perspective.

Chris Ammerman
02/26/2013 12:51 PM by
Chris Ammerman

It's less common to see in an app database, but in the Data Warehouse world this is dealt with using "slowly changing dimensions." http://en.m.wikipedia.org/wiki/Slowlychangingdimension#section_7 (Usually the "Type 2" kind.)

David Boike
02/26/2013 05:10 PM by
David Boike

I think what Oren glossed over, and is critical in understanding this concept if you're coming from the RDBMS, is that when we model Order and OrderLines in a document database, we note the ProductID but we also bring in a few critical pieces of the Product as well - maybe just the name.

That way, when the product gets deleted later, it doesn't matter. We have all the information we need to display the details of that order I made 2 years ago.

The real kicker is that in a relational model, if i bought a Widget 2 years ago, and then 1 year ago the Widget product got updated to become a Super Widget, my order history would now indicate that I bought a Super Widget! But I didn't! I only bought a Widget!

By bringing the product name into the Order + OrderLines model, we maintain that historical accuracy no matter what happens to the product at some later date, PLUS we don't have to always remember to select where IsDeleted = 0.

Tim Murphy
02/26/2013 07:13 PM by
Tim Murphy

+1 David. I was going to make the same comment.

Whether it is a NoSQL or SQL database OrderLines must shield itself from changes to the Products by at least keeping the name and unit price at time of order.

Scooletz
02/26/2013 08:40 PM by
Scooletz

@Rafal +1 having no option of adding or not an FK it's not a feature. It's implementation/design decision which one have to live with. I'm not saying it's needed for document database like Raven. I'm saying that not having it is not a plus for the db.

Phil Degenhardt
02/26/2013 11:32 PM by
Phil Degenhardt

Snapshotting the relevant data is perfectly fine but it does not have to mean that the association should be discarded. Why not maintain both the association and the relevant snapshot? The association connects this entity to other entities in the system and thus permits analysis, which wasn't initially conceived, to be bolted onto the system later and applied retrospectively. Discarding the association seems to be taking YAGNI one step too far.

Steve Py
02/27/2013 03:06 AM by
Steve Py

@phil - Because to be in any way reliable this requires temporal data. Otherwise you have two pieces of related data at some point in time that will certainly conflict with one another. An example of this would be when a product's price changes. A customer comes back to dispute a price, the order says it was $500, the product says $450. The system has no record whether the product price was updated, let alone whether it was updated before or after the order was received. Real world scenarios will be more complicated, but keeping that product reference is only practically useful if the data supports a full append-only timestamp model. If it isn't crucially important to have that, then YAGNI.

Mike Minutillo
02/27/2013 03:31 AM by
Mike Minutillo

I really dislike these inter-aggregate FKs in RDBMS because it's so common to see someone generate an EF (or LLBLGen, or whatever) layer over the top and end up with a bunch of navigation properties that are never (or rarely) used. These tend to either cause unintended lazy-loading or end up as null-reference exceptions.

Justin
02/27/2013 04:18 PM by
Justin

Temporal data is no excuse for denormalizing or removing FK's. This problem was solved in the late 90's for relational databases using bitemporal tables.

Read http://www.cs.arizona.edu/~rts/tdbbook.pdf if you want to implement orders with products while still being able to see the product as it was at the time of the order or as it is now or any time in between without duplicating data elements in multiple places.

Oracle, DB2 and Teradata all have the SQL temporal standards implemented so you don't have to do it manually.

Gene Hughson
02/27/2013 05:43 PM by
Gene Hughson

I have to agree with all those who think snapshotting data is a poor attempt to fix the issue of a changing product dimension. It may prevent changes to the product data from corrupting the line items table, but it slams the door on a lot of other likely needs (e.g. tracking sales of the same product across name changes, analyzing price change effects on sales, etc.). If your particular application doesn't need that kind of referential integrity, then fine, but for something like the example, You Are Gonna Need It (and will be poorly positioned to get it).

Ayende Rahien
02/28/2013 06:57 AM by
Ayende Rahien

Gene, In any such scenarios, you usually have the product id as well as any other pertinent information.

Phillip Haydon
03/01/2013 12:59 AM by
Phillip Haydon

I wrote up something similar last year, but I refer to them as Relationships or References.

http://www.philliphaydon.com/2012/07/ravendb-what-am-i-persisting-what-am-i-querying-part-2/

Matt Johnson
03/29/2013 09:13 PM by
Matt Johnson

@Justin & @Steve - RavenDB can actually work with full temporal and bitemporal data using https://github.com/mj1856/RavenDB-TemporalVersioning

Comments have been closed on this topic.