The fallacy of IRepository

I haven't had a controversial title in a few days :-)

The cause for this post is a post by Rob Conery where he suggests removing the RDBMS from the equation, at least during development. In particular, he suggests using an OODB during development and switching to an RDBMS late in the game, when going to production. The reason for doing so is to reduce the friction of having to maintain a database during development.

One thing that I feel I have to point out: the DB is a friction point only under certain conditions. If your tool doesn't support your rapid changes, that is usually an indication of a problem with the tool. I can tell you that on my most recent project, there wasn't a day in which the DB schema wasn't changed in some way, and several times I had to make significant changes (rearranging the entire model). NHibernate takes away a lot of the pain of using a DB, because you don't really have to care about what is going on. And using Active Record attributes or Fluent NHibernate makes it an even easier task. I don't know what the state of convention-based configuration is for Fluent NHibernate, but that is a very promising direction.

Anyway, that is not the point of this post.

I agree with a lot of the points that Rob is making, and I'll expand on them in an additional post, but right now I wanted to actually address a comment I made on Rob's post, which I feel wasn't clear enough.

One big problem is that for most applications, trying to change OODB to RDBMS would not work without a LOT of work.

There are a couple of things that I tried to put into a very terse comment. The first is that you should practice the way you play, and that includes putting any constraints that you have for production into the development environment. But even this isn't the point of this post.

If you look at the title, you'll see that I am decrying the fallacy of IRepository. In particular, this is what I disagree with:

Hi Oren - if you implemented IRepository<T> as I've done here, how would this not work? Can you be specific in terms of "a lot of work" and what that means?

In this case, the problem is that the interface for IRepository contains a lot of unspoken assumptions about the way you deal with persistent storage. Let us take an example of moving an IRepository between OODB and RDBMS. OODB query access patterns are completely different from the ones that you would use for an RDBMS. A trivial difference that has profound implications is getting a Blog with all its Posts and all their Comments. The only way of doing this with an RDBMS in a single statement is using joins, which is going to cause a Cartesian product, which is expensive in the DB and has to be dealt with in the app layer. In the case of an OODB, you just let the OODB handle that and move on. It is not using relational algebra, and it can handle this specific scenario pretty well.
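To make the row multiplication concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The schema, the column names and the row counts (one blog, three posts, two comments per post) are made up for illustration; they are not taken from Rob's code or from any real project. The single joined statement returns one row per comment, repeating the blog and post columns each time, and the app layer then has to fold those flat rows back into a graph.

```python
import sqlite3

# Hypothetical schema: one blog, 3 posts, 2 comments per post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Blog    (Id INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE Post    (Id INTEGER PRIMARY KEY, BlogId INTEGER, Title TEXT);
    CREATE TABLE Comment (Id INTEGER PRIMARY KEY, PostId INTEGER, Body TEXT);
""")
conn.execute("INSERT INTO Blog VALUES (1, 'My blog')")
for post_id in range(1, 4):
    conn.execute("INSERT INTO Post VALUES (?, 1, ?)", (post_id, f"Post {post_id}"))
    for n in range(2):
        conn.execute(
            "INSERT INTO Comment (PostId, Body) VALUES (?, ?)",
            (post_id, f"Comment {n} on post {post_id}"),
        )

# One statement that loads the whole Blog -> Posts -> Comments graph.
rows = conn.execute("""
    SELECT Blog.Id, Blog.Title, Post.Id, Post.Title, Comment.Id, Comment.Body
    FROM Blog
    LEFT JOIN Post    ON Post.BlogId = Blog.Id
    LEFT JOIN Comment ON Comment.PostId = Post.Id
""").fetchall()

# The app layer has to de-duplicate the repeated blog and post columns.
blogs = {}
for blog_id, blog_title, post_id, post_title, comment_id, body in rows:
    blog = blogs.setdefault(blog_id, {"title": blog_title, "posts": {}})
    if post_id is None:
        continue  # blog without posts
    post = blog["posts"].setdefault(post_id, {"title": post_title, "comments": []})
    if comment_id is not None:
        post["comments"].append(body)

print(len(rows), "rows for", len(blogs), "blog")  # 6 rows for 1 blog
```

With this toy data the blog's columns come back six times for a single blog. The repetition grows with the number of posts and comments, which is the cost the paragraph above is pointing at.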

Let us take it from the other direction now. All my recent IRepository implementations have been using the future pattern, in which they return an IEnumerable<T> implementation that is aggregated with all the other queries for the request and then sent to the DB as a single remote call. That works really well. But what is going to happen if the OODB doesn't support this notion? (A cursory search didn't reveal anything enlightening, so I am assuming it is not supported for now.)
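For readers who haven't seen the future pattern, the idea can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the NHibernate implementation: `FakeDatabase`, `FutureBatch` and `FutureResult` are hypothetical names, and the "database" just counts round trips instead of doing any I/O. Queries are only registered when you ask for them; the first enumeration of any result sends everything that is pending in one call.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple


class FakeDatabase:
    """Hypothetical stand-in for a remote DB; counts round trips, does no I/O."""

    def __init__(self, tables: Dict[str, List[Any]]) -> None:
        self.tables = tables
        self.round_trips = 0

    def execute_batch(self, table_names: List[str]) -> List[List[Any]]:
        self.round_trips += 1  # one remote call, however many queries it carries
        return [list(self.tables[name]) for name in table_names]


class FutureResult:
    """Lazy result; enumerating it forces the whole pending batch to run."""

    def __init__(self, batch: "FutureBatch") -> None:
        self._batch = batch
        self.rows: Optional[List[Any]] = None

    def __iter__(self) -> Iterator[Any]:
        if self.rows is None:
            self._batch.flush()
        return iter(self.rows or [])


class FutureBatch:
    """Collects queries for the current request and sends them together."""

    def __init__(self, db: FakeDatabase) -> None:
        self._db = db
        self._pending: List[Tuple[str, FutureResult]] = []

    def query(self, table_name: str) -> FutureResult:
        future = FutureResult(self)
        self._pending.append((table_name, future))
        return future

    def flush(self) -> None:
        if not self._pending:
            return
        names = [name for name, _ in self._pending]
        results = self._db.execute_batch(names)
        for (_, future), rows in zip(self._pending, results):
            future.rows = rows
        self._pending = []


db = FakeDatabase({"blogs": ["b1"], "posts": ["p1", "p2"], "comments": ["c1"]})
batch = FutureBatch(db)
blogs = batch.query("blogs")
posts = batch.query("posts")
comments = batch.query("comments")
trips_before = db.round_trips   # nothing has been sent yet

post_list = list(posts)         # first enumeration sends all three queries
blog_list = list(blogs)         # already loaded, no extra call
comment_list = list(comments)
trips_after = db.round_trips
```

The calling code only sees something it can enumerate, so nothing in the repository interface says that batching is happening. That is exactly why a backend that cannot batch would change the behavior without changing a single signature.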

Your code previously assumed 1 remote call for N queries, but now you are faced with N remote calls for N queries. Even assuming that each query takes constant time, the performance difference between the two is significant and crippling.
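A back-of-the-envelope model shows where the difference comes from. The numbers below (10 queries per request, 2 ms of query time, 5 ms per round trip) are invented purely for illustration and are not measurements; the model also assumes the calls happen one after another and ignores everything except round trip latency and query time.

```python
def total_time_ms(queries: int, query_ms: float, round_trip_ms: float, batched: bool) -> float:
    """Rough cost model: every round trip pays the latency, every query pays its own time."""
    round_trips = 1 if batched else queries
    return round_trips * round_trip_ms + queries * query_ms


# Hypothetical numbers, chosen only to show the shape of the difference.
batched_ms = total_time_ms(queries=10, query_ms=2, round_trip_ms=5, batched=True)
unbatched_ms = total_time_ms(queries=10, query_ms=2, round_trip_ms=5, batched=False)

print(batched_ms, unbatched_ms)  # 25 70
```

The query time is identical in both cases; only the latency term is multiplied by N, so the gap widens as the round trip gets slower or the number of queries per request goes up.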

IRepository is a good way of decoupling you from the nitty-gritty details of how things work, but it doesn't decouple you from the abstract notions. Not for any real world implementation, at least.

Tags:

Design

Comments

16 Nov 2008
09:56 AM

Tuna Toksoz

Linq would make this transition easier, I think. Linq provides another level of abstraction so that you don't have to do joins _manually_. Even so, I have to admit that this assumption usually isn't a good one, because not all the providers are perfect, and one has to work around problems for one provider and not for another. I mean, db4o may not fail for one specific expression but Linq 2 SQL would (or would cause a perf problem such as N+1), and this would hide the problem until the transition to SQL occurs.

16 Nov 2008
12:28 PM

Tobin Harris

Correction, I think a Cartesian product is where you don't have a join?

16 Nov 2008
12:29 PM

Andrew Peters

"The only way of doing this with RDBMS is using joins (in a single statement), which is going to cause Cartesian product, which is expensive in the DB and have to be dealt with in the app layer"

Not true. Batched nested result sets is another approach.

16 Nov 2008
12:50 PM

Mark Nijhof

Why not stick with the OODB? Depending on your needs of course, but colleagues have been very happy with Cache on a large project and it was faster than Oracle or SQL Server. And that can of course be bad query design :)

-Mark

16 Nov 2008
13:52 PM

Simon

You mention that your current IRepository implementations uses the future pattern. Have you got an example of it up anywhere I could read, please?

16 Nov 2008
14:20 PM

Ryan Roberts

Using db4o remotely appears to be a relatively rare scenario; batching becomes much less useful when everything is in process.

The repository interface I have chosen where I intend to take db4o into production is narrower than Rob's - my generic base only supports fetches of aggregates by their identifier. It's just too dangerous to expose its pretty immature linq support (it's quite possible to confuse it with even simple queries that involve generics) outside of the repository, as well as it lacking index support in some scenarios. It can't construct an index on object types for one thing, and I have an EAV scenario where the types of some things cannot be known at compilation time.

I am building a Lucene index on commit and using that for complex queries, as well as supporting projections from stored index fields to avoid activations in situations where I do not want to have to drill into a large number of returned aggregates.

16 Nov 2008
14:30 PM

Bunter

Why not just prove it - create a sample application supporting first an OODBMS and then switch it to an RDBMS... It doesn't even have to be UI based, just a little bit of core going beyond a two table/object setup and some real world queries as test cases.

Lot of warm air would be spared...

16 Nov 2008
15:20 PM

Demis

I guess it would depend on the way you work. Personally, for every service I develop, I first start with building the domain model, i.e. a set of POCO classes that represents the domain of the application I'm creating with all the data and relations it needs to maintain. The end result is the 'ideal domain model' without any considerations for an RDBMS back end or UI front end. To persist the domain objects to an RDBMS I would translate them to a set of NHibernate data classes (which are mapped 1:1 with a db table) and save those. Basically my NHibernate data classes do not represent my domain model, they are just relational data objects I use to persist to an RDBMS.

To move to an OODB I just do away with the translator classes and just persist the domain model directly.

DB4O is a very functional OODB, with transactional support and optional client/server access. I use it for all my transactional and 'process services' (i.e. transient data) though for my repository/catalogue db's I still opt to use a RDBMS as I still like the security/ future proofing/full-text searching that a mature RDBMS can provide.

16 Nov 2008
15:23 PM

Tuna Toksoz

Simon,

ayende.com/.../Future-Query-Of-implemented.aspx
Check this and also the implementation in that link.

16 Nov 2008
18:56 PM

Ayende Rahien

Tobin,

Huh?

select * from Blog join Post on Blog.Id = Post.BlogId

will result in a Cartesian product, with each blog being repeated for each of its posts.

16 Nov 2008
18:56 PM

Ayende Rahien

Andrew,

That is not a single statement.

You can see that I referred to that later in the post.

16 Nov 2008
19:00 PM

Ayende Rahien

Ryan,

You just made my point.

In particular, the part about using the DB locally vs. remotely has huge implications on the application.

16 Nov 2008
19:02 PM

Ayende Rahien

Bunter,

something like that would take _time_. I don't have much of that.

16 Nov 2008
22:27 PM

Simon

Thanks Ayende

16 Nov 2008
22:49 PM

Tobin Harris

I thought that a query with join conditions isn't a cartesian product? Basically, I thought the cartesian product would be this:

select * from Blog, Post

16 Nov 2008
22:58 PM

Andrew Peters

@Tobin,

It's because he's referring to an Outer Join.

@Ayende,

Sure, but it is a single _call_. And, one that returns much less redundant data and can span a wider/deeper load graph.

16 Nov 2008
23:09 PM

Tobin Harris

@Andrew

Thanks. Wouldn't an outer join just ensure rows are returned even if a join condition is not satisfied? It wouldn't result in a cartesian product, would it? Maybe I need to hit the SQL books again!...

Heard the news about Microsoft btw, congrats :)

16 Nov 2008
23:46 PM

Andrew Peters

Tobin,

Yeah, this could just be some loose terminology.

Extremely jealous of your guitar setup!

17 Nov 2008
07:34 AM

Andrey Shchekin

"The only way of doing this with RDBMS is using joins (in a single statement), which is going to cause Cartesian product, which is expensive in the DB and have to be dealt with in the app layer."

Why does it "have to be dealt with in the app layer" ? If you have either .SelectMany() from Repository or some way to say that you want to preload all posts/comments (considering that you do want to preload them and lazy load does not work well enough), why should the app layer care about how the Repository does it?

Comments have been closed on this topic.

Oren Eini

CEO of RavenDB