RavenDB And Not Having Foreign Keys
This is something that we hear quite often in the mailing list, and I thought that I would spend the time to answer it in full. There tend to be two kinds of FK references in RDMBS, the essential ones and the one the DBA added just to make my life a living hell.
The first one include things that are essential for internal consistency within a single aggregate. For example, the OrderLines.OrderID FK reference to Orders.ID is quite important, since an order line without an order is meaningless. But what about the association between OrderLine.ProductID and Products.ID ?
An OrderLine can most certainly exists without the product, in fact, an OrderLine already copied into it all of the properties of the product that are important for the order. But because of the FK happy nature of most DBAs (and developers, for that matter), we have a FK reference between the two. The problem is that it actually is perfectly fine to remove a product that we are no longer selling from the store.
Yes, there are order lines for that product, but they have been completed ages ago. With a RDBMS and a FK, you cannot do that. So you resort to hacks like IsDeleted = false, which in practice gives you the exact same behavior as a deleted product, except that the FK is happy. Your application has a 50/50 change to work or not work with that.
With RavenDB, we make distinctions between internal consistency, which is maintained inside the same document, and external references, which can come and go as they please. You cannot have an order line in RavenDB without an order, because the order is where the order line exists. But you can most certainly remove the product that an order line refers to, because that is outside the scope of the order, it is a separate aggregate.
Comments
Are you saing that not having inter-document consistency checks in the DB is an advantage over having them? I beg to differ.
Many RDBMS will allow you to specify DELETE behaviour such that deleting the parent record will automatically set the childs key to NULL in this situation.
Ayende,
...i'm new to raven ...wath's about the costo of separate aggregates life cycles?
tyhanks
Mango I don't follow the question
@alastair - nullable fk's are evil and will just lead to confusion in the relational db world and cascade deletes cause pain. I think what Oren is trying to say is you cannot approach document db's the same way you approach relational. The things we have to do to keep relational databases happy do not exists when you look at the NOSQL paradigm.
I think we have in the past used foreign keys thinking something like "Hey, now we don't have to think about this relationship in the domain". But despite our efforts to hide aspects of the domain from our minds, it comes back to haunt us. Foreign keys aren't a silver bullet and do end up causing much pain. The number of times I have seen "IsDeleted" and the pain that has also caused is astounding. I think Oren is saying exactly this: we cannot hide these relationships magically from the domain. We must face them head on in our app and deal with them from a business perspective.
It's less common to see in an app database, but in the Data Warehouse world this is dealt with using "slowly changing dimensions." http://en.m.wikipedia.org/wiki/Slowly_changing_dimension#section_7 (Usually the "Type 2" kind.)
I think what Oren glossed over, and is critical in understanding this concept if you're coming from the RDBMS, is that when we model Order and OrderLines in a document database, we note the ProductID but we also bring in a few critical pieces of the Product as well - maybe just the name.
That way, when the product gets deleted later, it doesn't matter. We have all the information we need to display the details of that order I made 2 years ago.
The real kicker is that in a relational model, if i bought a Widget 2 years ago, and then 1 year ago the Widget product got updated to become a Super Widget, my order history would now indicate that I bought a Super Widget! But I didn't! I only bought a Widget!
By bringing the product name into the Order + OrderLines model, we maintain that historical accuracy no matter what happens to the product at some later date, PLUS we don't have to always remember to select where IsDeleted = 0.
+1 David. I was going to make the same comment.
Whether it is a NoSQL or SQL database OrderLines must shield itself from changes to the Products by at least keeping the name and unit price at time of order.
@Rafal +1 having no option of adding or not an FK it's not a feature. It's implementation/design decision which one have to live with. I'm not saying it's needed for document database like Raven. I'm saying that not having it is not a plus for the db.
Snapshotting the relevant data is perfectly fine but it does not have to mean that the association should be discarded. Why not maintain both the association and the relevant snapshot? The association connects this entity to other entities in the system and thus permits analysis, which wasn't initially conceived, to be bolted onto the system later and applied retrospectively. Discarding the association seems to be taking YAGNI one step too far.
@phil - Because to be in any way reliable this requires temporal data. Otherwise you have two pieces of related data at some point in time that will certainly conflict with one another. An example of this would be when a product's price changes. A customer comes back to dispute a price, the order says it was $500, the product says $450. The system has no record whether the product price was updated, let alone whether it was updated before or after the order was received. Real world scenarios will be more complicated, but keeping that product reference is only practically useful if the data supports a full append-only timestamp model. If it isn't crucially important to have that, then YAGNI.
I really dislike these inter-aggregate FKs in RDBMS because it's so common to see someone generate an EF (or LLBLGen, or whatever) layer over the top and end up with a bunch of navigation properties that are never (or rarely) used. These tend to either cause unintended lazy-loading or end up as null-reference exceptions.
Temporal data is no excuse for denormalizing or removing FK's. This problem was solved in the late 90's for relational databases using bitemporal tables.
Read http://www.cs.arizona.edu/~rts/tdbbook.pdf if you want to implement orders with products while still being able to see the product as it was at the time of the order or as it is now or any time in between without duplicating data elements in multiple places.
Oracle, DB2 and Teradata all have the SQL temporal standards implemented so you don't have to do it manually.
I have to agree with all those who think snapshotting data is a poor attempt to fix the issue of a changing product dimension. It may prevent changes to the product data from corrupting the line items table, but it slams the door on a lot of other likely needs (e.g. tracking sales of the same product across name changes, analyzing price change effects on sales, etc.). If your particular application doesn't need that kind of referential integrity, then fine, but for something like the example, You Are Gonna Need It (and will be poorly positioned to get it).
Gene, In any such scenarios, you usually have the product id as well as any other pertinent information.
I wrote up something similar last year, but I refer to them as Relationships or References.
http://www.philliphaydon.com/2012/07/ravendb-what-am-i-persisting-what-am-i-querying-part-2/
@Justin & @Steve - RavenDB can actually work with full temporal and bitemporal data using https://github.com/mj1856/RavenDB-TemporalVersioning
Comment preview