Ayende @ Rahien

Refunds available at head office

RavenDB on .NET Rocks

Carl and Richard talk to Oren Eini, aka Ayende Rahien, about RavenDB. RavenDB is a NoSQL JSON document database. Oren explains how he came to the realization that he needed to build his own data store, and the advantages of document databases over relational databases. Is SQL dead? Not hardly, but RavenDB is an interesting addition to your data solution!

You can listen to it here.

The NHibernate Course arrives in Paris

My NHibernate Course (4 days of digging into the heart & core of NHibernate) is coming to Paris.

On April 18th, Sebastien Lambla will be teaching this course (in French).

During the course we build a practical application together that demonstrates all the important data management patterns in NHibernate. You will learn how to use this O/R mapping tool efficiently in your applications, to save time and effort when communicating with your database.

Click here to register.

Case Study: the first RavenDB production deployment story

When I sat down a few years ago to actually decide how to go about dealing with that document database idea that wouldn't let me sleep, I did several things. One of them was to sketch out the architecture of the system, and how to approach the technical challenges that we were seeing. But a much more interesting aspect was deciding to make this into a commercial product. One of the things that I did there was build a business plan for RavenDB. According to that business plan, we were supposed to hit the first production deployment of RavenDB right about now.

In practice, adoption of RavenDB has far outstripped my hopes, and we actually had production deployments of RavenDB within a few months of the public release. What actually took longer than deploying RavenDB was getting the stories back about it :) I am going to start posting those as soon as I get authorization from the customers to do so.

The following is from the first production deployment of RavenDB…


1. Who are you? (Name, company, position)

Henning Christiansen, Senior Consultant at Webstep Fokus, Norway (www.webstep.no). I am working on a development team for a client in the financial sector.

2. In what kind of project / environment did you deploy RavenDB?

The client is, naturally, very focused on delivering solutions and value to the business at a high pace. The solutions we build are for both internal and external end users. On top of the result-oriented environment, there is a heavy focus on building sustainable and maintainable solutions. It is crucial that modules can be changed, removed or added in the future.

In this particular project, we built a solution based on NServiceBus for communication and RavenDB for persistence. This project is part of a larger development effort, and integrates with both old and new systems. The project sounds unimpressive when described in short, but I'll give it a go:

The project's main purpose was to replace an existing system for distribution of financial analysis reports. Analysts/researchers work on reports, and submit them to a proprietary system which adds additional content such as tables and graphs of relevant financial data, and generates the final report as XML and PDF. One of the systems created during this project is notified when a report is submitted, pulls it out of the proprietary system, tags it with relevant metadata and stores it in a RavenDB instance before notifying subscribing systems that a new report is available. The reports are instantly available on the client's website for customers with research access.

One of the subscribing systems is the distribution system, which distributes the reports by email or SMS depending on the customer's preferences. The customers have very fine-grained control over their subscriptions, and can filter them on sector, company, and report type, among other things. The user preferences are stored in RavenDB. When a user changes preferences, notifications are given to other systems so that other actions can be performed based on what the customers are interested in.

3. What made you select a NoSQL solution for your project?

We knew the data would be read a lot more than it would be written, and it needed to be read fast. A lot of the team members were heavily battle-scarred from struggling with ORMs in the past, and with a very tight deadline we weren't very interested in spending a lot of time maintaining schemas and mappings.

Most of what we would store in this project could be considered a read model (à la CQRS) or an aggregate root (DDD), so a NoSQL solution seemed like a perfect fit. Getting rid of the impedance mismatch couldn't hurt either. We had a lot of reasons that nudged us in the direction of NoSQL, so if it hadn't been RavenDB it would have been something else.

4. What made you select RavenDB as the NoSQL solution?

It was the new kid on the block when the project started, and had a few very compelling features such as a native .NET API which is maintained and shipped with the database itself. Another thing was the transaction support. A few of us had played a bit with RavenDB, and compared to other NoSQL solutions it seemed like the most hassle-free way to get up and running in a .NET environment. We were of course worried about RavenDB being in an early development stage and without reference projects, so we had a plan B in case we should hit a roadblock with RavenDB. Plan B was never put into action.

5. How did you discover RavenDB?

I subscribe to your blog (http://ayende.com/Blog/) :)

There was a lot of fuss about NoSQL at the time, and RavenDB received numerous positive comments on Twitter and in blog posts.

6. How long did it take to learn to use RavenDB?

I assume you're asking about the basics, as there's a lot to be learnt if you want to. What the team struggled with the most was indexes. This was before dynamic indexes, so we had to define every index up front, and make sure they were in the database before querying. Breaking free from the RDBMS mindset and wrapping our heads around how indexes work, map/reduce, and how and when to apply the different analyzers took some time, and the documentation was quite sparse back then. The good thing is that you don't really need to know a lot about this stuff to be able to use RavenDB on a basic level.
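
(For reference, a static index in the RavenDB .NET client of that era was defined as a class and pushed to the server at startup; a minimal sketch, with illustrative document and field names:)

// Illustrative static index: before dynamic indexes existed, every
// index had to be defined and created like this before querying it.
public class Reports_BySector : AbstractIndexCreationTask<Report>
{
    public Reports_BySector()
    {
        Map = reports => from report in reports
                         select new { report.Sector, report.Company };
    }
}

// At application startup, make sure the index exists on the server:
IndexCreation.CreateIndexes(typeof(Reports_BySector).Assembly, documentStore);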

The team members had different levels of exposure to RavenDB, so guessing how long it took to learn is hard. But in general I think it's fair to say that indexes were the team's biggest hurdle when learning how to use RavenDB.

7. What are you doing with RavenDB?

On this particular project we weren't using a lot of features, as we were learning to use RavenDB while racing to meet a deadline.

Aside from storing documents, we use custom and dynamic indexes, projections, the client API, and transactions. We're also doing some hand-rolled Lucene queries.

On newer projects, however, with more experience and confidence with RavenDB, and as features and bundles keep on coming, we're doing our best to keep up with the development and to make the best use of RavenDB's features to solve our problems.

8. What was the experience, compared to other technologies?

For one thing, getting up and running with RavenDB is super-easy and only takes a couple of minutes. This is very different from the RDBMS+ORM experience, which in comparison seems like a huge hassle. Working with an immature and rapidly changing domain model was also a lot easier, as we didn't need to maintain mappings. Also, since everything is a document, which in turn easily maps to an object, you're sort of forced to always work through your aggregate roots. This requires you to think through your domain model perhaps a bit more carefully than you would with other technologies, which might more easily allow you to take shortcuts and thus compromise your domain model.
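
(For reference, the "couple of minutes" experience boils down to roughly this with the .NET client; a minimal sketch, with an illustrative server URL and document type:)

// Point the client at a running RavenDB server and start storing
// documents; no schema or mappings to maintain.
var store = new DocumentStore { Url = "http://localhost:8080" };
store.Initialize();

using (var session = store.OpenSession())
{
    session.Store(new Report { Title = "Q1 Analysis", Sector = "Energy" });
    session.SaveChanges(); // the whole document is persisted in one roundtrip
}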

9. What do you consider to be RavenDB strengths?

It's fast, easy to get started with, and it has a growing community of helpful and enthusiastic users. Our support experience has also been excellent; any issues we've had have usually been fixed within hours. The native .NET API is also a huge benefit if you're working in a .NET environment.

10. What do you consider to be RavenDB weaknesses?

If we're comparing apples to apples, I can't think of any weaknesses compared to the other NoSQL solutions out there, aside from the fact that it's new. Hence it's not as heavily tested in production environments as some of the older NoSQL alternatives might be. The relatively limited documentation, which admittedly has improved tremendously over the last few months, was also a challenge. The community is very helpful, so anything you can't find in the documentation can normally be answered by someone on the forum. There are also a lot of blog posts and example applications out there.

I find the current web admin UI a bit lacking in functionality, but hopefully the new Silverlight UI will take care of that.

11. Now that you are in production, do you think that choosing RavenDB was the right choice?

Yes, definitely. We've had a few pains and issues along the way, but that's the price you have to pay for being an early adopter. They were all quickly sorted out, and now everything's been ticking along like clockwork for months. I'm confident that choosing RavenDB over another persistence technology has allowed us to develop faster and spend more time on the problem at hand.

12. What would you tell other developers who are evaluating RavenDB?

I have little experience with other document databases, but I obviously tested a bit and read blog posts when evaluating NoSQL solutions for this project. Since we decided to go with RavenDB there's been a tremendous amount of development done, and at this time none of the competitors are even close feature-wise.

Looking at the RavenDB structure

Originally posted at 3/15/2011

I am trying to explain how RavenDB is structured, and we finally generated the following diagram:

image

I was actually surprised, because I don't generally think about it this way, but having the diagram in front of us was really helpful in explaining where everything goes.

And, of course, the inevitable nitpick: OMG it takes a loooong time to lay out the graph.

There is no one correct way

One of the major complaints about my recent series of code reviews is that I only say what you shouldn’t do, not what you should.

Well, this blog currently has 4,782 posts, many of which actually talk about how you should design systems. I am sorry, but I really can't repeat everything for each post.

I can specifically recommend the following series:

But really you are going to have to go over the archives and look. I am pretty good at assigning categories, and you might want to start with the Design category.

If you want to see how I write code, at last count, I had quite a few projects here: https://github.com/ayende

And last, but certainly not least, I can't really tell you what you should do. There are too many factors involved to be able to do that honestly. People who assume that there is one correct way to do something also assume that all projects are the same, with the same requirements, constraints and resources. That is not the case, and I try very hard not to provide any prescriptive design out of thin air.

Scenes from the office

cross-posted from the company's blog

Posted by Ayende

That is me, working on a new feature for RavenDB:

WP_000167

Itamar is busy working on performance improvements for RavenDB (we are trying to avoid writing our own JSON parser):

image

Alon is thinking about the website, it seems:

image

And Fitzchak is working on better Oracle support, the secret feature for UberProf:

image

The wages of sin: Hit that database one more time…

Originally posted at 3/11/2011

This time, this is a review of the Sharp Commerce application. Again, I have stumbled upon the application by pure chance, and I have very little notion about who wrote it.

You might have wondered why I named this blog series the way I did; I named it because of the method outlined below. Please note that I had to invent a new system to visualize the data access behavior in this system:

image

  • In red, we have queries that are executed once: 3 queries total.
  • In aqua, we have queries that are executed once for each item in the order: 2 queries per product in the order.
  • In purple, we have queries that are executed once for each attribute of each of the products in the order: 1 query per attribute per product in the order.

Now, just to give you some idea, let us say that I order 5 items, and each item has 5 attributes…

We get the following queries:

  • 3 queries – basic method cost
  • 10 queries – 2 queries per product
  • 25 queries – 1 query per attribute per product

That totals 38 queries for creating a fairly simple order. After seeing this during the course review of the application, I used that term for this method, because it is almost too easy to make these sorts of mistakes.
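
Since the method itself only survives here as a screenshot, a minimal sketch of the shape of the problem (the names are illustrative, not the actual Sharp Commerce code):

// Illustrative sketch of the data access pattern described above.
public Order BuildOrder(int orderId)
{
    var order = orderRepository.Get(orderId); // one of the 3 fixed-cost queries
    foreach (var line in order.Lines)
    {
        // 2 queries per product in the order
        var product = productRepository.Get(line.ProductId);
        var pricing = pricingRepository.GetFor(line.ProductId);

        // 1 query per attribute per product in the order
        foreach (var attributeId in product.AttributeIds)
            line.Attributes.Add(attributeRepository.Get(attributeId));
    }
    return order;
}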

The wages of sin: Inverse leaky abstractions

This time, this is a review of the Sharp Commerce application. Again, I have stumbled upon the application by pure chance, and I have very little notion about who wrote it.

As you might have noticed, I am a pretty big fan of the NHibernate Futures feature (might be because I wrote it :)), so I was encouraged when I saw it used in the project. It can drastically help the performance of the system.

Except that then I saw how it was being used.

image

Which is then called from CurrencyService:

image

Which is then called…

image

Do you see that? We take the nice future query that we have and turn it into a standard query, blithely crushing any chance to actually optimize the data access of the system.

The problem is that from the service API, there is literally nothing to tell you that you need to order your calls so that all the data access happens up front, and it is easy to make mistakes such as this. Meaning is lost at each additional abstraction layer, and pretty soon, seven layers removed from whatever good intention you had when you wrote a particular layer, you can't really remember the best way to approach something.
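
Since the screenshots don't reproduce here, a minimal sketch of the anti-pattern being described (illustrative names, not the project's actual code). The whole point of a future is that execution is deferred until a result is enumerated, so that several queries can be batched into one roundtrip; calling ToList() inside the service executes the query immediately and forfeits that:

public class CurrencyService
{
    private readonly ISession session;

    public CurrencyService(ISession session)
    {
        this.session = session;
    }

    // What futures are for: defer execution so that multiple queries
    // can be batched into a single database roundtrip.
    public IEnumerable<Currency> GetCurrenciesDeferred()
    {
        return session.QueryOver<Currency>().Future<Currency>();
    }

    // The anti-pattern: ToList() enumerates the future right here,
    // so the query executes immediately and can never be batched.
    public IList<Currency> GetCurrencies()
    {
        return session.QueryOver<Currency>().Future<Currency>().ToList();
    }
}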

The wages of sin: Proper and improper usage of abstracting an OR/M

Originally posted at 3/11/2011

This time, this is a review of the Sharp Commerce application. Again, I have stumbled upon the application by pure chance, and I have very little notion about who wrote it.

In this case, I want to focus on the ProductRepository:

image

In particular, those methods also participate in the Useless Abstraction For The Sake Of Abstraction anti-pattern. Here is how they are implemented:

public AttributeItem GetAttributeItem(int attributeItemId)
{
    return Session.Get<AttributeItem>(attributeItemId);
}

public Attribute GetAttribute(int attrbuteId)
{
    return Session.Get<Attribute>(attrbuteId);
}

public IEnumerable<Attribute> GetAllAttributes()
{
    return Session.QueryOver<Attribute>()
        .Future<Attribute>();
}

public void SaveOrUpdate(Attribute attribute)
{
    Session.SaveOrUpdate(attribute);
}

And here is how they are called (from ProductService):

public AttributeItem GetAttributeItem(int attributeItemId)
{
    return productRepository.GetAttributeItem(attributeItemId);
}

public Attribute GetAttribute(int attrbuteId)
{
    return productRepository.GetAttribute(attrbuteId);
}

public void SaveAttribute(Attribute attribute)
{
    productRepository.SaveOrUpdate(attribute);
}

 public IList<Product> GetProducts()
 {
     return productRepository.GetAll();
 }

 public Product GetProduct(int id)
 {
     return productRepository.Get(id);
 }

 public void SaveOrUpdate(Product product)
 {
     productRepository.SaveOrUpdate(product);
 }

 public void Delete(Product product)
 {
     productRepository.Delete(product);
 }

 public IEnumerable<Attribute> GetAllAttributes()
 {
     return productRepository.GetAllAttributes();
 }

Um… why exactly?

But as I mentioned, this post is also about the proper usage of abstracting the OR/M. A repository was originally conceived as a way to abstract messy data access code behind nicer-to-use code. The product repository has one method that actually does something meaningful, the Search method:

public IEnumerable<Product> Search(IProductSearchCriteria searchParameters, out int count)
{
    string query = string.Empty;
    if (searchParameters.CategoryId.HasValue && searchParameters.CategoryId.Value > 0)
    {
        var categoryIds = (from c in Session.Query<Category>()
                           from a in c.Descendants
                           where c.Id == searchParameters.CategoryId
                           select a.Id).ToList();

        query = "Categories.Id :" + searchParameters.CategoryId;
        foreach (int categoryId in categoryIds)
        {
            query += " OR Categories.Id :" + categoryId;
        }
    }

    if (!string.IsNullOrEmpty(searchParameters.Keywords))
    {
        if (query.Length > 0)
            query += " AND ";

        query += string.Format("Name :{0} OR Description :{0}", searchParameters.Keywords);
    }

    if (query.Length > 0)
    {
        query += string.Format(" AND IsLive :{0} AND IsDeleted :{1}", true, false);

        var countQuery = global::NHibernate.Search.Search.CreateFullTextSession(Session)
            .CreateFullTextQuery<Product>(query);

        var fullTextQuery = global::NHibernate.Search.Search.CreateFullTextSession(Session)
            .CreateFullTextQuery<Product>(query)
            .SetFetchSize(searchParameters.MaxResults)
            .SetFirstResult(searchParameters.PageIndex * searchParameters.MaxResults);

        count = countQuery.ResultSize;

        return fullTextQuery.List<Product>();
    }
    else
    {
        var results = Session.CreateCriteria<Product>()
            .Add(Restrictions.Eq("IsLive", true))
            .Add(Restrictions.Eq("IsDeleted", false))
            .SetFetchSize(searchParameters.MaxResults)
            .SetFirstResult(searchParameters.PageIndex * searchParameters.MaxResults)
            .Future<Product>();

        count = Session.CreateCriteria<Product>()
            .Add(Restrictions.Eq("IsLive", true))
            .Add(Restrictions.Eq("IsDeleted", false))
            .SetProjection(Projections.Count(Projections.Id()))
            .FutureValue<int>().Value;

        return results;
    }
}

I would quibble about whether this is the best way to actually implement this method, but there is little doubt that something like this is messy. I would want to put it in a very distant corner of my code base, but it does provide a useful abstraction. I wouldn't put it in a repository, though. I would probably put it in a Search Service instead, but that isn't that important.

What is important is to understand that there is a big distinction between code that merely wraps other code for the sake of increasing the abstraction level, and code that provides a useful abstraction over an operation.

The wages of sin: Re-creating the Stored Procedure API in C#

Originally posted at 3/10/2011

This time, this is a review of the Sharp Commerce application. Again, I have stumbled upon the application by pure chance, and I have very little notion about who wrote it. The problem is that this system seems to be drastically more complicated than it should be.

In this case, I want to look at the type of API that is exposed:

image

If this reminds you of the bad old days of having only a Stored Procedure API available, that is not by chance. Far worse than that, however, are the call paths where this is used.

  • IEmailTemplateRepository.Get(EmailTemplateLookup) is implemented as
    public EmailTemplate Get(EmailTemplateLookup emailId)
    {
        return Get((int)emailId);
    }
    and is only used in:
    • EmailService.Get(EmailTemplateLookup) whose implementation is:
      public EmailTemplate GetEmail(EmailTemplateLookup template)
      {
          return emailTemplateRepository.Get(template);
      }
  • ICategoryRepository.GetParentCategories is only used from:
    • CategoryService.GetParentCategories which is implemented as:
      public IEnumerable<Category> GetParentCategories()
      {
          IEnumerable<Category> categories = categoryRepository.GetParentCategories();
      
          return categories;
      }
  • ICurrencyRepository.GetEnabledCurrencies is only used from:
    • CurrencyService.GetEnabledCurrencies which is implemented as:
      public IEnumerable<Currency> GetEnabledCurrencies()
      {
           return currencyRepository.GetEnabledCurrencies();
      }
  • For that matter, let us take a look at the entire CategoryService class, shall we?

    public class CategoryService : ICategoryService
    {
        private readonly ICategoryRepository categoryRepository;
    
        public CategoryService(ICategoryRepository categoryRepository)
        {
            this.categoryRepository = categoryRepository;
        }
    
        public IList<Category> GetCategories()
        {
            return categoryRepository.GetAll();
        }
    
        public Category GetCategory(int id)
        {
            return categoryRepository.Get(id);
        }
    
        public void SaveOrUpdate(Category categoryModel)
        {
            categoryRepository.SaveOrUpdate(categoryModel);
        }
    
        public void Delete(Category category)
        {
            categoryRepository.Delete(category);
        }
    
        public IEnumerable<Category> GetParentCategories()
        {
            IEnumerable<Category> categories = categoryRepository.GetParentCategories();
    
            return categories;
        }
    }

To be honest, I really don't see the point.

Now, just a hint about the next few posts: there are places where I think wrapping the usage of the NHibernate API was a good idea, even if I strongly disagree with how it was done.

The wages of sin: Over architecture in the real world

Originally posted at 3/10/2011

This time, this is a review of the Sharp Commerce application. Again, I have stumbled upon the application by pure chance, and I have very little notion about who wrote it. The problem is that this system seems to be drastically more complicated than it should be.

I am going to focus on different parts of the system in each of these posts. In this case, I want to focus on the very basis for the application's data access:

image

Are you kidding me? This is before you sit down to write a single line of code, mind you. Just the abstract definitions for everything make my head hurt.

It really hits you over the head when you get to this trivial implementation:

public class EmailTemplateRepository : Repository<EmailTemplate>, IEmailTemplateRepository
{
    public EmailTemplate Get(EmailTemplateLookup emailId)
    {
        return Get((int)emailId);
    }
}

Yes, this is the entire class. I am sorry, but I really don't see the point. The mental weight of all of this is literally crushing.

Why am I so hung up on Unbounded Result Sets?

Originally posted at 3/10/2011

You might have noticed that I routinely point out that there are issues, drastic issues, with any piece of code that issues a query without setting a limit on the number of results.

I got several replies saying that I am worrying too much about this issue. I don't really understand that sort of thinking. Leaving aside the possibility of literally killing our application (as a result of an Out of Memory exception), unbounded result sets are dangerous. They aren't just a theoretical problem; they are a problem that happens at regular intervals in production systems.

The real issue is not just System Down scenarios; that might actually be the best scenario. In most production deployments, you are actually paying for the amount of data that you are passing around. When you start dealing with unbounded result sets, you are literally writing a blank check and handing it to strangers.

I don't know many people who can do something like this with equanimity.

It doesn't take a lot to get to the point where this sort of thing really hurts. Moreover, there are truly very few cases where you actually need access to the entire result set. For the most part, when I see developers doing that, it is usually out of sheer laziness.

Architecting in the pit of doom: The evils of the repository abstraction layer

Originally posted at 3/10/2011

This is part of the series of posts about the Whiteboard Chat project. In this post, I am going to explore the data access methodology and why it is so painful to work with. In particular, I am going to focus on the problems arising in this method:

[Transaction]
[Authorize]
[HttpPost]
public ActionResult GetLatestPost(int boardId, string lastPost)
{
    DateTime lastPostDateTime = DateTime.Parse(lastPost);

    IList<Post> posts =
        _postRepository
        .FindAll(new GetPostLastestForBoardById(lastPostDateTime, boardId))
        .OrderBy(x=>x.Id).ToList();

    //update the latest known post
    string lastKnownPost = posts.Count > 0 ?
        posts.Max(x => x.Time).ToString()
        : lastPost; //no updates

    Mapper.CreateMap<Post, PostViewModel>()
        .ForMember(dest => dest.Time, opt => opt.MapFrom(src => src.Time.ToString()))
        .ForMember(dest => dest.Owner, opt => opt.MapFrom(src => src.Owner.Name));

    UpdatePostViewModel update = new UpdatePostViewModel();
    update.Time = lastKnownPost;
    Mapper.Map(posts, update.Posts);

    return Json(update);
}

You can clearly see that there was a lot of thought dedicated to making sure that the architecture was done right. Here is the overall data access approach:

image

So we are using the Specification & Repository patterns, which seem okay, at first glance. Except that the whole system is seriously over-engineered, and it doesn't really provide you with any benefits. The whole purpose of a repository is to provide an in-memory collection interface to a data source, but that pattern was established at a time when the newest thing on the block was raw SQL calls. Consider what NHibernate is doing, and you can see that here, one implementation of an in-memory collection interface on top of a data store is wrapped in another, and it does nothing for us except add additional code and complexity.

As an aside, even ignoring the uselessness of the repository pattern here, I really don't like that the FindOne() method returns an IQueryable<T>, and I don't really see a reason for that there.

Then there is the issue of the Specification itself. It is not quite pulling its own weight. To be frank, it is actually sinking the entire ship, rather than helping any. For example, let us look at the GetPostLastestForBoardById implementation:

public class GetPostLastestForBoardById : Specification<Post>
{
    private readonly DateTime _lastPost;
    private readonly int _boardId;

    public GetPostLastestForBoardById(DateTime lastPost, int boardId)
    {
        _lastPost = lastPost;
        _boardId = boardId;
    }

    public override Expression<Func<Post, bool>> MatchingCriteria
    {
        get
        {
            return x =>
                x.Time > _lastPost &&
                x.Board.Id == _boardId;
        }
    }
}

Are you kidding me? All of this infrastructure, all of those nested generics, for something like this? Is this really something that deserves its own class? I don't think so.

Worse than that, this sort of design implies that there is value in sharing queries among different parts of the application. The problem with that sort of thinking is that the premise is usually false. Except for querying by id, most queries in an application tend to be fairly unique to their source, and even when they are the same physical query, they usually have a different logical meaning. For example, "select all the late rentals" is a query that we might use as part of the rental system home screen, and also as part of the business logic to charge people for late rentals. While on the surface this query is the same, the meaning of what we are doing is usually drastically different. Because of that, trying to reuse that query in some manner is actually going to cause us problems. We have a single piece of code that now has two masters, and two reasons for change. That is not a good place to be. You'll probably have to touch that piece of code too often, and make it more complicated than it would otherwise be, because it serves dual purposes.

You can see an example of how this is problematic in the method above. In my last post, I pointed out that there is a SELECT N+1 issue in the code. For each of the loaded posts, we also load the Owner's name. NHibernate provides easy ways to fix this issue, by simply specifying the fetch paths that we want to use.
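
With the NHibernate LINQ provider, for example, eager loading the owner is a one-line change at the query site (a minimal sketch, not the project's code):

// Fetch the posts and their owners in a single query, instead of
// issuing one extra query per post for the Owner.
var posts = session.Query<Post>()
    .Where(p => p.Board.Id == boardId && p.Time > lastPostDateTime)
    .Fetch(p => p.Owner)
    .OrderBy(p => p.Id)
    .ToList();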

Except… this architecture makes it drastically harder to do something like that. The repository abstraction layer hides the NHibernate API that allows you to specify the fetch paths. So you are left with a few options. The most common approach for this issue is usually to re-architect the specifications so that they can also provide the fetch paths for the query. It looks something like this:

public class GetPostLastestForBoardById : Specification<Post>
{
    private readonly DateTime _lastPost;
    private readonly int _boardId;

    public GetPostLastestForBoardById(DateTime lastPost, int boardId)
    {
        _lastPost = lastPost;
        _boardId = boardId;
    }

    public override Expression<Func<Post, bool>> MatchingCriteria
    {
        get
        {
            return x =>
                x.Time > _lastPost &&
                x.Board.Id == _boardId;
        }
    }

    public override IEnumerable<Expression<Func<Post, object>>> FetchPaths
    {
        get
        {
            yield return x => x.Owner;
        }
    }
}

Note that this design allows you to specify multiple fetch paths, which seems to fix the issue that we had in our API. Except that it doesn't really work: even identical queries near one another (say, in different actions of the same controller) usually have different fetching requirements.

Oh, and we haven't even talked about the need to handle projections.

Okay, so we can do it differently: we can give the developer the option to configure the fetch paths for each query at the location where that query is made. Of course, you then get the following architecture:

image

If you think that this is ridiculous, I agree. If you don't think so… well, never mind, I was told I can't actually revoke someone's Touch-The-Keyboard license.

But let us say that you have solved even that problem somehow, probably by adding yet another abstraction layer for specifying fetch paths. At that point, your code is so abstract that it deserves a place in the Museum of Modern Art, but I'll let you get away with it.

The problem is that fetch paths are actually only part of the solution for things like SELECT N+1. One very powerful option that we have with NHibernate is the use of Future Queries, and we can utilize that feature to significantly reduce both the number of times that we have to go to the database and the size of the data that we have to read from it.
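
For example, futures let several queries from the same request travel to the database together (a minimal sketch, not the project's code):

// Neither query executes yet; both are queued as futures.
var posts = session.QueryOver<Post>()
    .Where(p => p.Board.Id == boardId)
    .Future<Post>();

var postCount = session.QueryOver<Post>()
    .Where(p => p.Board.Id == boardId)
    .ToRowCountQuery()
    .FutureValue<int>();

var postList = posts.ToList(); // both queries execute in one roundtrip here
var total = postCount.Value;   // already loaded, no extra roundtrip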

Except… this design means that I am pretty much unable to implement this sort of behavior. I will have to drastically change the system design before I can actually make a meaningful performance improvement.

I see this sort of problem all the time when I am doing code reviews. And the problem with this is that it is usually a stop-ship issue, because we literally cannot fix things properly without making a lot of very painful changes.

Now, allow me to explain the underlying logic behind this post. Reading from the database is a common operation, and it should be treated as such. There should be very few abstractions involved, because there isn't much to abstract. For the vast majority of cases, there is no logic involved in read operations, and even when there is, it is most often cross-cutting logic. The answer that I give for things like "well, so how can I apply security filtering?" is that you throw that into an extension method and do something like this:

var query =
    (
        from post in session.Query<Post>()
        where post.DueDate > DateTime.Now
        select post
    )
    .FilterAccordingToUserPermissions(CurrentUser)
    .ToList();

This makes it very easy to review such queries and see that the appropriate actions were taken, and this type of action is only required when you can't handle it truly as part of the infrastructure.
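
The extension method itself stays trivial and lives with the rest of the infrastructure; a sketch of what it might look like (the method body and the permission model here are illustrative, not from any project):

public static class QueryFilters
{
    // Cross-cutting security filtering as a composable extension method.
    public static IQueryable<Post> FilterAccordingToUserPermissions(
        this IQueryable<Post> query, User user)
    {
        // Illustrative permission rule; substitute the real one.
        return query.Where(post => post.Board.AllowedUsers.Contains(user));
    }
}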

It is also important to note that I am talking specifically about read operations here. For write operations, the situation is (slightly) different. Write operations usually have business logic, and you want to keep that separate. I would still want to avoid doing anything too abstract, and I would certainly not introduce a repository for writes.

What I would do is create a service layer for writes that handles the actual business of doing writes, but (and this is important) I would only do so if there is actual logic to be executed. If this is merely a simple CUD operation that we are executing, there is really very little point in going with a service. Yes, that means using NHibernate directly from the controller method to save whatever it is that you are editing, assuming that there is nothing that requires business logic there.

To be frank, for the pieces that actually require business logic, I would probably prefer to avoid an explicit service implementation and move to using an internal publish / subscribe (event publisher) in order to have a nicer internal architecture. But at that point, it is more of a preference than anything else.

To summarize, do try to avoid needless complexity. Getting data from the database is a common operation, and should be treated as such. Adding additional layers of abstraction usually only makes it harder. Worse, it creates, in your own application code, the sort of procedural operations that made it so hard to work with systems using Stored Procedures.

You want to avoid that; it was a bad time, and we don't want to go back there. But code such as the above is literally something that you would write for those sorts of systems, where you have only the pre-defined paths for accessing the data store.

It is no longer the case, and it is time that our code started acknowledging this fact.

Reviewing OSS Project: Whiteboard Chat–The Select N+1 issue

As a reminder, I am reviewing the problems that I found while reviewing the Whiteboard Chat project during one of my NHibernate courses. Here is the method:

[Transaction]
[Authorize]
[HttpPost]
public ActionResult GetLatestPost(int boardId, string lastPost)
{
    DateTime lastPostDateTime = DateTime.Parse(lastPost);

    IList<Post> posts =
        _postRepository
        .FindAll(new GetPostLastestForBoardById(lastPostDateTime, boardId))
        .OrderBy(x=>x.Id).ToList();

    //update the latest known post
    string lastKnownPost = posts.Count > 0 ?
        posts.Max(x => x.Time).ToString()
        : lastPost; //no updates

    Mapper.CreateMap<Post, PostViewModel>()
        .ForMember(dest => dest.Time, opt => opt.MapFrom(src => src.Time.ToString()))
        .ForMember(dest => dest.Owner, opt => opt.MapFrom(src => src.Owner.Name));

    UpdatePostViewModel update = new UpdatePostViewModel();
    update.Time = lastKnownPost;
    Mapper.Map(posts, update.Posts);

    return Json(update);
}

In this post, I am going to discuss the SELECT N+1 issue that exists in this method.

Can you see the issue? It is actually hard to figure out.

Yep, it is in the Mapper.Map call: for each post that we return, we are going to load the post's owner, so we can provide its name.

So far, I have had exactly 100% success rate in finding SELECT N+1 issues in any application that I reviewed. To be fair, that also includes applications that I wrote.

And now we get to the real point of this blog post. How do you fix this?

Well, if you were using plain NHibernate, that would have been as easy as adding a Fetch call. But since this application has gone overboard with adopting the repository and query object model, it is actually not an issue of just fixing this.

Welcome to the re-architecting phase of the project, because your code cannot be fixed to work efficiently.

I'll discuss this in more detail in my next post…

Code review ranking methods

Originally posted at 3/8/2011

I use a sort of a ranking sheet when I am doing code reviews. I recently ran into one that was really horrible…

image

Reviewing OSS Project: Whiteboard Chat–Unbounded Result Sets and Denial of Service Attacks

Originally posted at 3/8/2011

As a reminder, I am reviewing the problems that I found while reviewing the Whiteboard Chat project during one of my NHibernate courses. Here is the method:

[Transaction]
[Authorize]
[HttpPost]
public ActionResult GetLatestPost(int boardId, string lastPost)
{
    DateTime lastPostDateTime = DateTime.Parse(lastPost);

    IList<Post> posts =
        _postRepository
        .FindAll(new GetPostLastestForBoardById(lastPostDateTime, boardId))
        .OrderBy(x=>x.Id).ToList();

    //update the latest known post
    string lastKnownPost = posts.Count > 0 ?
        posts.Max(x => x.Time).ToString()
        : lastPost; //no updates

    Mapper.CreateMap<Post, PostViewModel>()
        .ForMember(dest => dest.Time, opt => opt.MapFrom(src => src.Time.ToString()))
        .ForMember(dest => dest.Owner, opt => opt.MapFrom(src => src.Owner.Name));

    UpdatePostViewModel update = new UpdatePostViewModel();
    update.Time = lastKnownPost;
    Mapper.Map(posts, update.Posts);

    return Json(update);
}

In this post, I want to focus on a very common issue that I see over & over. The problem is that people usually don't notice these sorts of issues.

The problem is quite simple: there is no limit to the amount of information that we can request from this method. What this means is that we can send it 1900-01-01 as the date, and force the application to get all the posts in the board.

Assuming even a relatively busy board, we are talking about tens or hundreds of thousands of posts that are going to be loaded. That is going to put a lot of pressure on the database, on the server memory, and on the amount of money that you'll pay at the end of the month for the network bandwidth.

There is a reason why I strongly recommend always using a limit, especially in cases like this, where it is practically shouting at you that the number of items can be very big.
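
With NHibernate, that limit is a single call at the query site; a minimal sketch (the page size is an illustrative choice):

const int MaxPostsPerRequest = 100; // illustrative cap

// Even a 1900-01-01 date can now load at most one page of posts.
IList<Post> posts = session.Query<Post>()
    .Where(p => p.Board.Id == boardId && p.Time > lastPostDateTime)
    .OrderBy(p => p.Id)
    .Take(MaxPostsPerRequest)
    .ToList();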

In the next post, we will analyze the SELECT N+1 issue that I found in this method (so far, my record is 100% success in finding this type of issue in any application that I reviewed)…

NHibernate Profiler update: Client Profile & NHibernate 3.x updates

We just finished a pretty major change to how the NHibernate Profiler interacts with NHibernate. That change was mostly driven by the desire to fully support running under the client profile, and to allow us to support the new logging infrastructure in NHibernate 3.x.

The good news: this is done :) You can now use NH Prof from the client profile, and you don't need to do anything to make it work with NHibernate 3.x.

The slightly bad news is that if you were relying on log4net configuration to configure NH Prof, there is a breaking change that affects you. Basically, you need to update your configuration. You can find the full details on how to do this in the documentation.

Reviewing OSS Project: Whiteboard Chat–setup belongs in the initializer

Originally posted at 3/8/2011

As a reminder, I am reviewing the problems that I found while reviewing the Whiteboard Chat project during one of my NHibernate courses. Here is the method:

[Transaction]
[Authorize]
[HttpPost]
public ActionResult GetLatestPost(int boardId, string lastPost)
{
    DateTime lastPostDateTime = DateTime.Parse(lastPost);

    IList<Post> posts =
        _postRepository
        .FindAll(new GetPostLastestForBoardById(lastPostDateTime, boardId))
        .OrderBy(x=>x.Id).ToList();

    //update the latest known post
    string lastKnownPost = posts.Count > 0 ?
        posts.Max(x => x.Time).ToString()
        : lastPost; //no updates

    Mapper.CreateMap<Post, PostViewModel>()
        .ForMember(dest => dest.Time, opt => opt.MapFrom(src => src.Time.ToString()))
        .ForMember(dest => dest.Owner, opt => opt.MapFrom(src => src.Owner.Name));

    UpdatePostViewModel update = new UpdatePostViewModel();
    update.Time = lastKnownPost;
    Mapper.Map(posts, update.Posts);

    return Json(update);
}

In this post, I want to focus on the usage of AutoMapper, in the form of the Mapper.CreateMap() call.

It was fairly confusing to figure out what that was doing there. In fact, since I am not a regular user of AutoMapper, I assumed that this was the correct way of doing things, which bothered me. And then I looked deeper, and figured out just how troubling this thing is.

Usage of Mapper.CreateMap is, in general, part of system initialization. Putting it inside the controller method results in quite a few problems…

To start with, we are drastically violating the Single Responsibility Principle, since configuring AutoMapper really has very little to do with serving the latest posts.

But it actually gets worse: AutoMapper assumes that you'll make those calls at system initialization, and there is no attempt to make sure that those static method calls are thread-safe.

In other words, you can only run the code in this action on a single thread. Once you start running it on multiple threads, you are open to race conditions, undefined behavior and plain out weirdness.
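
The fix is to move the configuration to application startup, so it runs exactly once, on one thread; a minimal sketch using the same mapping as the method above (Global.asax.cs placement assumed):

// Global.asax.cs
protected void Application_Start()
{
    // Configure AutoMapper once, at startup, before any requests run.
    Mapper.CreateMap<Post, PostViewModel>()
        .ForMember(dest => dest.Time, opt => opt.MapFrom(src => src.Time.ToString()))
        .ForMember(dest => dest.Owner, opt => opt.MapFrom(src => src.Owner.Name));
}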

In the next post, I'll focus on the security vulnerability that exists in this method. Can you find it?

Reviewing OSS Project: Whiteboard Chat–overall design

As a reminder, I am reviewing the problems that I found while reviewing the Whiteboard Chat project during one of my NHibernate courses. Here is the method:

[Transaction]
[Authorize]
[HttpPost]
public ActionResult GetLatestPost(int boardId, string lastPost)
{
    DateTime lastPostDateTime = DateTime.Parse(lastPost);

    IList<Post> posts =
        _postRepository
        .FindAll(new GetPostLastestForBoardById(lastPostDateTime, boardId))
        .OrderBy(x=>x.Id).ToList();

    //update the latest known post
    string lastKnownPost = posts.Count > 0 ?
        posts.Max(x => x.Time).ToString()
        : lastPost; //no updates

    Mapper.CreateMap<Post, PostViewModel>()
        .ForMember(dest => dest.Time, opt => opt.MapFrom(src => src.Time.ToString()))
        .ForMember(dest => dest.Owner, opt => opt.MapFrom(src => src.Owner.Name));

    UpdatePostViewModel update = new UpdatePostViewModel();
    update.Time = lastKnownPost;
    Mapper.Map(posts, update.Posts);

    return Json(update);
}

In this post, I'll just cover some of the easy-to-notice issues:

The method name is GetLatestPost, but it is actually getting the latest posts (plural). Yes, it is minor, but it is annoying.

Action filter attributes for authorization or transactions shouldn't be there. To be rather more exact, the [Authorize] filter attribute shows up on all the methods, and transactions with NHibernate should always be used.

Using them as attributes means that you can choose to use them or not (and indeed, there are some actions where the [Transaction] attribute is not found).

Stuff that is everywhere should be defined once, not re-defined all over the place, as shown in the sketch below.
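
In ASP.NET MVC 3, which this project appears to target, that means registering the filters globally once; a minimal sketch (the [Transaction] attribute is the project's own, so its constructor here is assumed):

// Global.asax.cs - cross-cutting filters registered once, instead of
// being repeated (and occasionally forgotten) on every action.
public static void RegisterGlobalFilters(GlobalFilterCollection filters)
{
    filters.Add(new AuthorizeAttribute());
    filters.Add(new TransactionAttribute()); // the project's NHibernate transaction filter (assumed)
}

protected void Application_Start()
{
    RegisterGlobalFilters(GlobalFilters.Filters);
}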

Reviewing OSS Project: Whiteboard Chat

I am currently giving a course, and one of the things that we do during the course is put an OSS project on the board and analyze it. The project for this course is the Whiteboard Chat project.

Overall, it seems to be a nice project. There are some problems, but most of them are "rich men's problems", the kind of problems that are sort of good to have. That said, I intend to write a series of blog posts on the following method:

[Transaction]
[Authorize]
[HttpPost]
public ActionResult GetLatestPost(int boardId, string lastPost)
{
    DateTime lastPostDateTime = DateTime.Parse(lastPost);

    IList<Post> posts =
        _postRepository
        .FindAll(new GetPostLastestForBoardById(lastPostDateTime, boardId))
        .OrderBy(x=>x.Id).ToList();

    //update the latest known post
    string lastKnownPost = posts.Count > 0 ?
        posts.Max(x => x.Time).ToString()
        : lastPost; //no updates

    Mapper.CreateMap<Post, PostViewModel>()
        .ForMember(dest => dest.Time, opt => opt.MapFrom(src => src.Time.ToString()))
        .ForMember(dest => dest.Owner, opt => opt.MapFrom(src => src.Owner.Name));

    UpdatePostViewModel update = new UpdatePostViewModel();
    update.Time = lastKnownPost;
    Mapper.Map(posts, update.Posts);

    return Json(update);
}

This method is a pretty good example of a multitude of anti-patterns and other issues that annoy me.

That said, I would like to clarify that I am merely using this method as an example of some bad issues. I wouldn't want to give the impression that this is any sort of attack on the project or its authors. I have literally looked at the project for the first time today, and I haven't even checked who the authors are.

More on that in the next posts. In the meantime, I'll let you figure out how many issues there are that I am going to talk about…

Support dynamic fields with NHibernate and .NET 4.0

A common theme in many applications is the need to support custom / dynamic fields. In other words, the system admin may decide that the Customer needs to have a few additional fields that aren't part of the mainline development.

In general, there are a few ways of handling that:

  • DateField1, DateField2, StringField1, StringField2, etc, etc – and heaven help you if you need more than 2 string fields.
  • Entity Attribute Value – store everything in an EAV model, which basically means that you are going to have tables named: Tables, Rows, Attributes and Values.
  • Dynamically updating the schema.

In general, I would recommend anyone that needs dynamic fields to work with a data storage solution that supports it (like RavenDB :), for example). But sometimes you have to use a relational database, and NHibernate has a really sweet solution.

First, let us consider versioning. We are going to move all of the user's custom fields to their own table, so we will have the Customers table and the Customers_Extensions table. That way, we are free to modify our own entity however we like. Next, we want to allow nice syntax both for querying and for using it, even if there is custom code written against our code.

We can do it using the following code:

public class Customer
{
    private readonly IDictionary attributes = new Hashtable();
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }

    public virtual dynamic Attributes
    {
        get { return new HashtableDynamicObject(attributes); }
    }
}

Where HashtableDynamicObject is implemented as:

public class HashtableDynamicObject : DynamicObject
{
    private readonly IDictionary dictionary;

    public HashtableDynamicObject(IDictionary dictionary)
    {
        this.dictionary = dictionary;
    }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        result = dictionary[binder.Name];
        return dictionary.Contains(binder.Name);
    }

    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        dictionary[binder.Name] = value;
        return true;
    }

    public override bool TryGetIndex(GetIndexBinder binder, object[] indexes, out object result)
    {
        if (indexes == null)
            throw new ArgumentNullException("indexes");
        if (indexes.Length != 1)
            throw new ArgumentException("Only support a single indexer parameter", "indexes");
        result = dictionary[indexes[0]];
        return dictionary.Contains(indexes[0]);
    }

    public override bool TrySetIndex(SetIndexBinder binder, object[] indexes, object value)
    {
        if (indexes == null)
            throw new ArgumentNullException("indexes");
        if (indexes.Length != 1)
            throw new ArgumentException("Only support a single indexer parameter", "indexes");
        dictionary[indexes[0]] = value;
        return true;
    }
}

This is fairly basic so far, and not really interesting. We expose a hashtable as the entry point for a dynamic object that exposes all the dynamic fields. The really interesting part happens in the NHibernate mapping:

<class name="Customer" table="Customers">

    <id name="Id">
        <generator class="identity"/>
    </id>
    <property name="Name" />

    <join table="Customers_Extensions" optional="false">
        <key column="CustomerId"/>
        <dynamic-component name="Attributes" access="field.lowercase">
            <property name="EyeColor" type="System.String"/>
        </dynamic-component>
    </join>
</class>

As you can see, we used both a <join/> and a <dynamic-component/> to do the work. We used the <join/> to move the fields to a separate table, and then mapped those fields via a <dynamic-component/>, which is exposed as an IDictionary.

Since we want to allow a nicer API, we don't expose the IDictionary directly, but rather expose a dynamic object that provides us with nicer syntax.

The following code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    session.Save(
        new Customer
        {
            Name = "Ayende",
            Attributes =
            {
                EyeColor = "Brown"
            }
        });

    tx.Commit();
}

Will produce the following SQL:

image

And that is quite a nice solution all around :D
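
The post stops at storing, but the same mapping makes the dynamic fields queryable as well; a sketch of what an HQL query against the dynamic component might look like (usage assumed from the mapping above):

// Query on a dynamic field; dynamic-component properties can be
// addressed in HQL like regular component properties (assumed).
var brownEyed = session.CreateQuery(
        "from Customer c where c.Attributes.EyeColor = :eyeColor")
    .SetParameter("eyeColor", "Brown")
    .List<Customer>();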

New Profiler Feature: Avoid Writes from Multiple Sessions In The Same Request

Because I keep getting asked, this feature is available for the following profilers:

This new feature detects a very interesting bad practice: writing to the database from multiple sessions in the same web request.

For example, consider the following code:

public void SaveAccount(Account account)
{
    using(var session = sessionFactory.OpenSession())
    using(session.BeginTransaction())
    {
        session.SaveOrUpdate(account);
        session.Transaction.Commit();
    }
}
public Account GetAccount(int id)
{
    using(var session = sessionFactory.OpenSession())
    {
        return session.Get<Account>(id);
    }
}

It is bad for several reasons; micro-managing the session is just one of them, but the worst part is yet to come…

public void MakePayment(int fromAccount, int toAccount, decimal amount)
{
    var from = Dao.GetAccount(fromAccount);
    var to = Dao.GetAccount(toAccount);
    from.Total -= amount;
    to.Total += amount;
    Dao.SaveAccount(from);
    Dao.SaveAccount(to);
}

Do you see the error here? There are actually several; let me count them:

  • We are using 4 different connections to the database in a single method.
  • We don't have transactional safety!!!!

Think about it: if the server crashed between the fifth and sixth lines of this method, where would we be?

We would be in that wonderful land where money disappears into thin air, and we stare at that lovely lawsuit folder and then jump from a high window into a stormy sea.
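
The fix, beyond heeding the profiler's warning, is to do all the work in one session and one transaction; a minimal sketch of the same method under that design:

public void MakePayment(int fromAccount, int toAccount, decimal amount)
{
    // One session, one transaction: either both balance changes
    // commit together, or neither does.
    using (var session = sessionFactory.OpenSession())
    using (var tx = session.BeginTransaction())
    {
        var from = session.Get<Account>(fromAccount);
        var to = session.Get<Account>(toAccount);
        from.Total -= amount;
        to.Total += amount;
        tx.Commit(); // NHibernate flushes both updates atomically
    }
}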

Or, of course, you could use the profiler, which will tell you that you are doing something that should be avoided:

image

Isn't that better than swimming with the sharks?