Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 3 min | 540 words

With regards to my recommendation to avoid the repository, Stan asks:

You returned to explain why repository pattern is evil. Just interesting to know what are you doing when you need in your model to access another aggregate. Do you reference NH from your model? I prefer to leave my model POCOed and wrap DB calls by repository pattern. Sleeping good with it.

This is an interesting question. There are several ways to answer that. To start with, assuming that we are using DDD model (which the usage of a repository would imply), you don’t have references between aggregates.

But let us assume that we somehow need that, Stan seems to suggest something like this proposed solution:

public class Person 
{
    public static readonly DateTime ImportantDate;
    public BirthPlace BirthPlace { get; set; } 

    public DateTime BirthDate 
    { 
        get; private set;
    } 

    public void CorrectBirthDate(IRepository<BirthPlace> birthPlaces, DateTime date)
    {
        if (BirthPlace != null && date < ImportantDate && BirthPlace.IsSpecial) 
        { 
            BirthPlace = birthPlaces.GetForDate(date); 
        }
    }
}

Here we have a business rule that states that this is required.

But do we actually need a repository here? What if we just said that whoever calls us need to provide us with a way to get the birth place by date?

public void CorrectBirthDate(
        Func<DateTime, BirthPlace> getBirthPlaceFordate, 
        DateTime date)
{
    if (BirthPlace != null && date < ImportantDate && BirthPlace.IsSpecial) 
    { 
        BirthPlace = getBirthPlaceFordate(date); 
    }
}

This can be done with a simple delegate, no need to introduce a heavy weight abstraction. This is a local solution for a local problem. It keeps the database out from your entities and more importantly, it allows you to actually craft the appropriate response to this at the time of the call.

time to read 2 min | 386 words

Matthew Bonig asks, with regards to a bug in RavenDB MVC Integration (RavenDB Profiler) that caused major slow down on this blog.:

I'd be very curious to know how this code got published to a production environment without getting caught. I would have thought this problem would have occurred in any testing environment as well as it did here. Ayende, can you comment on where the process broke down and how such an obvious bug was able to slip through?

Well, the answer for that comes in two parts. The first part  is that no process broke down. We use our own assets for final testing of all our software, that means that whenever there is a stable RavenDB release pending (and sometimes just when we feel like it) we move our infrastructure to the latest and greatest.

Why?

Because as hard as you try testing, you will never be able to catch everything. Production is the final test ground, and we have obvious incentives of trying to make sure that everything works.  It is dogfooding, basically. Except that if we get a lemon, that is a very public one.

It means that whenever we make a stable release, we can do that with high degree of confidence that everything is going to work, not just because all the tests are passing, but because our production systems had days to actually see if things are right.

The second part of this answer is that this is neither an obvious bug nor one that is easy to catch. Put simply, things worked. There wasn’t even an infinite loop that would make it obvious that something is wrong, it is just that there was a lot of network traffic that you would notice only if you either had a tracer running, or were trying to figure out why the browser was suddenly so busy.

Here is a challenge, try to devise some form of an automated test that would catch something like this error, but do so without actually testing for this specific issue. After all, it is unlikely that someone would have written a test for this unless they run into the error in the first place. So I would be really interested in seeing what sort of automated approaches would have caught that.

time to read 1 min | 149 words

With regards to my quests against repositories, Matt asks:

…if my aggregate root query should exclude entities that have, for example, and IsActive = false flag, I also don't want to repeatedly exclude the IsActive = false entities. Using the repository pattern I can expose my Get method where internally it ALWAYS does this.

The problem with this question is that it make a false assumption, then go ahead and follow on that false assumption. The false assumption here is that the only way to handle the IsActive = false in by directly querying that. But that is wrong.

With NHibernate, you can define that with a where condition, or as a filter. With RavenDB, you can define that inside a query listener. You can absolutely set those things up as part of your infrastructure, and you won’t need to create any abstractions for that.

time to read 2 min | 271 words

With regards to my quests against repositories, Matt asks:

For example, you dismiss the repository pattern, but what are the alternatives? For example, in an ASP.NET web application you have controllers. I do NOT want to see this code in my controllers:

var sessionFactory = CreateSessionFactory();

using (var session = sessionFactory.OpenSession()) { using (var transaction = session.BeginTransaction()) { // do a large amount of work

// save entities
session.SaveOrUpdate(myEntity);

transaction.Commit();

} }

That is ugly, repetitive code. I want in my service methods to Get, update, save, and not have to worry about the above.

This is a straw dummy. Set up the alternative as nasty and unattractive as possible, then call out the thing you have just set up as nasty and unattractive. It is a good tactic, except that this isn’t the alternative at all.

If you go with the route that Matt suggested, you are going to get yourself into problems. Serious ones. But that isn’t what I recommend. I talked about this scenario specifically in this post. This is how you are supposed to set things up. In a way that doesn’t get in the way of the application. Everything is wired in the infrastructure, and we can just rely on that to be there. And in your controller, you have a Session property that get the current property, and that is it.

For bonus points, you can move your transaction handling there as well, so you don’t need to handle that either. It makes the code so much easier to work with, because you don’t care about all those external concerns, they are handled elsewhere.

time to read 3 min | 421 words

With regards to my recommendation to not use repositories, Remyvd asks:

… if you have several kind of data sources in different technologies, then it would be nice if you have one kind of interface. Also when an object (like Customer) is combined from data out of different data sources, the repository is for me a good place to initialize the object and return it. How would you solve this cases?

My answer is: System.ArgumentException: Question assumes invalid state.

More fully, this is one of those times where, in order to actually answer the question, we have to correct the question. Why do I say that?

Well, the question makes the assumption that actually combining the customer entity out of different data stores is desirable. Having made that assumption, it proceed to see what is the best way to do that. I am not going to recommend a way to do that, because the underlying assumption is wrong.

If your Customer information is stored in multiple data stores, you have to ask yourself, is it actually the same thing in all places? For example, we may have Customer entity in our main database, Customer Billing History in the billing database, Customer credit report accessible over a web service, etc. Note what happens when we start actually drilling down into the entity design. It suddenly becomes clear that that information is in different data stores for a reason.

Those aren’t the druids you are looking for might be a good quote here. The fact that the information is split usually means that there is a reason for that. The information is handled differently, usually by different teams and applications, it deals with different aspects of the entity, etc.

Trying to abstract that away behind a repository layer loses that very important distinction. It also forces us to do a lot of additional work, because we have to load the customer entity from all of the different data stores every time we need it. Even if most of the data that we need is not relevant for the operation at hand.

If would be much easier, simpler and maintainable to actually expose the idea of the multiple data stores to the application at large. You don’t end up with a leaky abstraction and it is easy to see when and how you actually need to combine the different data stores, and what the implications of that are for the specific scenarios that requires it.

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}