It Is Called Humor


Ted Neward is talking about Gavin's "In Defence of the Relational Database".

Sure enough, as people (apparently in this case, myself) start to talk about approaches to persistence that don't involve Hibernate, Gavin feels compelled to point to these other technologies using inflammatory terms and a certain amount of FUD.

I think that Ted is missing the point: the post is intentionally inflammatory, in order to be funny. There is usually a distinct difference between an inflammatory rant and a post like that. For one, it is a rare rant that can make me guffaw.

Object-relational mapping isn't that hard, so there's no need to eliminate it. Sorry, Gavin, but the fact is, this remains, and always will remain, a point of difference between you and I, and between you and a fairly large number of developers I've spoken to over the years at conferences and consulting engagements and classes. For simple table-to-class mappings, you're right, it's a pretty simple thing. It is, however, still a "dual schema" problem, in that now you have two competing "sources of truth" that have to be reconciled to one another, the database schema, and the object model.

There are two cases here: one where you have a legacy DB, and one where you get to create your own. In the first case, the complexity of the mapping depends on how messed up the database is and how far from the existing schema you want your domain model to be. In the second case, you simply let the tool handle all the persistence concerns, and don't really think about persistence until you have to.

Oh, and the comment that "If you just want to "throw some objects in the database", you'll never need to write a single mapping annotation." really sort of proves the point I try to make in the paper: if you just want to "throw some objects in the database", why do you bother having an RDBMS in the first place?

Because I need to handle imports from the Monster DB, handle queries, reports, etc. I also need to scale well for cross-machine work (the DB server is a separate machine, across the network), and the OR/M is a mature technology that is built on an even more mature technology. Finding bottlenecks and fixing them is very easy, and you get to see the data very easily in tools such as Management Studio. I can probably do the same with an OODBMS, I don't know, but I don't have a real need for that, and an RDBMS is very simple and well understood.

There are DBAs that are in open revolt at the idea, particularly since you've also just conveniently left out any sort of indexing or other tuning decisions that will make the database perform at all reasonably.

Well, I think that the worst way to scale a database is described here; it is also one of the best answers to why I would like to move logic outside the database. I have a really nice DB schema that is generated by NHibernate, and I haven't had a single issue with having to query it, so I don't think that a DBA would object to the schema. Ted's point about indexing is a very good one, but I prefer to leave that until later, when I actually have data in the DB and can run real queries on it and see where the cost is. Premature optimization applies to DB indexes as well.
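To make the "index later, once you can measure" idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than NHibernate or SQL Server, and the orders table, the query, and the index name idx_orders_customer are all made up for illustration. It asks the database for its query plan, adds an index only for the query that turned out to be expensive, and checks the plan again.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's human-readable plan for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the descriptive text.
    return " | ".join(row[-1] for row in rows)


# Hypothetical schema and data, standing in for a generated schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

# A query we found to be costly once real data was in place (an assumption).
slow_query = "SELECT id, total FROM orders WHERE customer_id = ?"

plan_before = query_plan(conn, slow_query, (42,))  # full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, slow_query, (42,))   # now uses the index
```

The exact wording of the plan text differs between SQLite versions, so the sketch only relies on the index name showing up in the second plan.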

The most common approach to improving database performance is to reduce the number of queries. This tends to change the queries sent to the database significantly. I don't see a lot of value in trying to place indexes on the "easy on the db, hard on the network" kind of queries, which is usually what you start with. That said, going to production without indexes is stupid indeed.
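As a rough illustration of what reducing the number of queries looks like, the sketch below computes the same result twice: once with one query per customer (the "easy on the db, hard on the network" shape) and once with a single join. It uses Python's sqlite3 module; the customers and orders tables and the CountingConnection wrapper are hypothetical names invented for this example, not part of any OR/M.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Ann"), (2, "Bob"), (3, "Cy")])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5)])
    return conn


class CountingConnection:
    """Wraps a connection and counts the statements sent through it."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def totals_chatty(db):
    """One query for the customers, then one more per customer."""
    result = {}
    customers = db.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    for customer_id, name in customers:
        row = db.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        result[name] = row[0]
    return result


def totals_batched(db):
    """The same answer from a single join, so a single round trip."""
    rows = db.execute("""
        SELECT c.name, COALESCE(SUM(o.total), 0)
        FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.id
    """).fetchall()
    return dict(rows)
```

With three customers the chatty version sends four statements and the batched version sends one. Note that the batched query is a different query, with a different access pattern, which is why indexes tuned for the first shape may not be the ones you end up needing.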

(It's not like any self-respecting DBA is going to want to take your slapdash relational schema, anyway...)

I would disagree strongly on that, but I would like to hear more about the reasons for that first.

...rather than display his own benchmark that directly contradicts the benchmark offered by the OODBMS folks...

Benchmarks are an inefficient way to judge performance, unless you have tailored the benchmark to your own situation. Gavin has responded to such a benchmark here, and there is a Q&A about performance in Hibernate here that discusses why there isn't a standard Hibernate benchmark. Specifically: "the number of variables in any decent benchmark make it almost impossible to transfer these results into reasonable conclusions about the performance of your own application"

But in a situation where you're just "throwing objects into the database", and nobody else is connecting to this data (in other words, you can be tightly coupled to the data storage), why take that overhead if it's not necessary? Choosing an out-of-proc database because "somebody may want to get to this data someday" is YAGNI, pure and simple.

Scaling, reliability, etc. I want to run it in a web farm and have the DB clustered, with failover. As for YAGNI, I challenge anyone to take any reasonably sized application that is using an in-process DB and try to run the DB on a separate machine. The technical issues are not the relevant part; that is mostly a configuration change. The main issue is that you have vastly different trade-offs for an in-process DB vs. an out-of-process DB. The number of remote calls is the key factor here.
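A back-of-the-envelope model shows why the number of calls dominates once the database moves out of process. Every number below is an assumption picked for illustration (25 ms of total database work, a 1 ms network round trip, 500 calls versus 5), not a measurement of any real system.

```python
def elapsed_us(calls, total_work_us, round_trip_us):
    """Total time: the work itself plus one round trip per call."""
    return total_work_us + calls * round_trip_us


# Assumed figures, in microseconds, for illustration only.
TOTAL_WORK_US = 25_000      # the database work is the same in every scenario
IN_PROCESS_TRIP_US = 0      # a call into an in-process DB is treated as free
REMOTE_TRIP_US = 1_000      # assumed round trip to a DB on another machine

chatty_in_process = elapsed_us(500, TOTAL_WORK_US, IN_PROCESS_TRIP_US)
chatty_remote = elapsed_us(500, TOTAL_WORK_US, REMOTE_TRIP_US)
batched_remote = elapsed_us(5, TOTAL_WORK_US, REMOTE_TRIP_US)
```

Under these assumptions the chatty design costs 25,000 µs in process and 525,000 µs against a remote database, while a batched design costs 30,000 µs remotely. The configuration change is trivial; the call pattern that was harmless in process is what hurts.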

"... the problem is that existing, mature RDBMS systems happen to not be written in Java (see Benefit #3)." Ouch. Don't let the Cloudscape developers hear you say that. Granted, HSQL is not what I'd call a "heavy-duty" RDBMS, but Gavin, not everything has to be stored in Oracle. Sometimes a lighter-weight database--MySQL, HSQL, Postgres, or even (gasp!) Access--is good enough. Or are you advocating that everybody should be using clustered J2EE servers to build their 5-user department calendar app?

Ted, in what cases would you recommend putting HSQL in production?

I also think that you are ignoring a key part of Gavin's point. Hibernate is meant for use in scalable, high-performance applications. If you need to write a 5-user department app, then Hibernate can offer a nicer programming model, but you wouldn't care about performance, now would you?

When you do care about performance, you usually move to those clustered servers against a central (or RACed, or whatever) DB, and then you really want to have support from the tools for that.

OODBMS benchmarks suck because they measure ORM with caching turned off. As well they should, because not all ORM users can use caching. Particularly if they need to bypass the ORM for particularly sophisticated straight-up SQL queries.

If you can't use caching in your application, you are left with scaling at the database level, and that hurts. And since Hibernate has a fairly sophisticated way of dropping things out of the cache if you tell it to (SessionFactory.Evict***()) and several mechanisms that allow you to use SQL directly in Hibernate, I would say that this is an invalid assumption to make.
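The general pattern of caching plus explicit eviction can be sketched in a few lines. This is not Hibernate's API: EntityCache, build_products, and load_price are hypothetical names, and Python's sqlite3 module stands in for the real database. The sketch shows a cached read, a raw SQL update that bypasses the cache, and an eviction that makes the next read go back to the database.

```python
import sqlite3


class EntityCache:
    """A toy read-through cache with explicit eviction (hypothetical)."""

    def __init__(self, loader):
        self._loader = loader   # function that fetches a value by key
        self._items = {}
        self.loads = 0          # how many times we actually hit the database

    def get(self, key):
        if key not in self._items:
            self.loads += 1
            self._items[key] = self._loader(key)
        return self._items[key]

    def evict(self, key):
        """Drop one entry so the next get() reloads it."""
        self._items.pop(key, None)


def build_products():
    """Create a tiny in-memory table with one made-up product."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")
    conn.execute("INSERT INTO products VALUES (1, 10.0)")
    return conn


def load_price(conn, product_id):
    row = conn.execute(
        "SELECT price FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    return row[0]
```

A usage sketch: after a direct `UPDATE products SET price = 12.5 WHERE id = 1`, `cache.get(1)` still returns the stale 10.0 until `cache.evict(1)` is called, after which it returns 12.5. Bypassing the cache with SQL is workable as long as the code that does so also tells the cache what it touched.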

If an O/R-M is doing more stuff than an OODBMS, but the end result is the same from the programmer's perspective, the fact that the O/R-M has to do more stuff shouldn't be held against it?

See the previous points: the OR/M isn't doing more for nothing, it is doing more because it is focused on building scalable solutions. So the end result is not the same at all.

The problem is with the general approach of trying to manage the associations of the object model and the fact that the complete object graph (which doesn't have to be a hierarchy, by the way) frequently is larger than the programmer wants to pull across the wire.

There are well-understood solutions for that; that isn't the real issue. The issue is that for hierarchical queries, there isn't a way right now to do them in a cross-db fashion. You can use the OR/M to do this, as shown here, and it isn't very hard, but it would be nicer to have it baked into the OR/M itself. I guess that will have to wait until we have a standard way of doing that.
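One database-neutral way to handle a hierarchy, and not necessarily the one the linked post uses, is to fetch the flat rows with a single plain SELECT and assemble the tree in application code. The sketch below does that with Python's sqlite3 module; the categories table with its parent_id column and the sample data are made up for illustration.

```python
import sqlite3
from collections import defaultdict


def build_categories():
    """Create a small adjacency-list hierarchy (made-up sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE categories (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
    )
    conn.executemany("INSERT INTO categories VALUES (?, ?, ?)", [
        (1, None, "Root"),
        (2, 1, "Hardware"),
        (3, 1, "Software"),
        (4, 2, "Disks"),
        (5, 3, "Databases"),
        (6, 3, "ORMs"),
    ])
    return conn


def load_tree(conn):
    """One plain SELECT, no vendor-specific hierarchical syntax."""
    rows = conn.execute(
        "SELECT id, parent_id, name FROM categories ORDER BY id"
    ).fetchall()
    names = {}
    children = defaultdict(list)
    for node_id, parent_id, name in rows:
        names[node_id] = name
        children[parent_id].append(node_id)
    return names, children


def descendants(names, children, root_id):
    """Names of all nodes below root_id, depth-first, in id order."""
    found = []
    stack = list(reversed(children.get(root_id, [])))
    while stack:
        node_id = stack.pop()
        found.append(names[node_id])
        stack.extend(reversed(children.get(node_id, [])))
    return found
```

The trade-off is that the whole table crosses the wire, which is fine for small hierarchies and exactly the problem Ted describes for large ones. A standard, database-side way of doing this would remove that cost.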