Ayende @ Rahien

It's a girl

Ask Ayende: What about the QA env?

Matthew Bonig asks, with regard to a bug in the RavenDB MVC Integration (RavenDB Profiler) that caused a major slowdown on this blog:

I'd be very curious to know how this code got published to a production environment without getting caught. I would have thought this problem would have occurred in any testing environment as well as it did here. Ayende, can you comment on where the process broke down and how such an obvious bug was able to slip through?

Well, the answer for that comes in two parts. The first part is that no process broke down. We use our own assets for the final testing of all our software. That means that whenever a stable RavenDB release is pending (and sometimes just when we feel like it), we move our infrastructure to the latest and greatest.

Why?

Because as hard as you try testing, you will never be able to catch everything. Production is the final testing ground, and we have obvious incentives to make sure that everything works. It is dogfooding, basically. Except that if we get a lemon, it is a very public one.

It means that whenever we make a stable release, we can do so with a high degree of confidence that everything is going to work, not just because all the tests are passing, but because our production systems have had days to show whether things are actually right.

The second part of this answer is that this is neither an obvious bug nor one that is easy to catch. Put simply, things worked. There wasn’t even an infinite loop that would make it obvious that something is wrong, it is just that there was a lot of network traffic that you would notice only if you either had a tracer running, or were trying to figure out why the browser was suddenly so busy.

Here is a challenge: try to devise some form of automated test that would catch an error like this one, but do so without testing for this specific issue. After all, it is unlikely that someone would have written a test for this unless they had run into the error in the first place. So I would be really interested in seeing what sort of automated approaches would have caught it.
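One possible direction, offered only as a sketch and not as a claim about what would have worked here, is a generic request budget: load each page in a test harness, let it go quiet, and fail if it issued more requests than some fixed number. The harness below is hypothetical and simulates the browser with a queue; `onComplete` stands in for whatever a page runs after a request finishes, and the URLs and the budget of 20 are made-up assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical harness: replays a page's requests through a queue and counts
// how many are issued before the page goes quiet (or a hard cap is hit).
int CountRequests(IEnumerable<string> initialRequests,
                  Func<string, IEnumerable<string>> onComplete,
                  int hardCap = 1000)
{
    var pending = new Queue<string>(initialRequests);
    var issued = 0;
    while (pending.Count > 0 && issued < hardCap)
    {
        var url = pending.Dequeue();
        issued++;
        // Follow-up requests triggered by this completion join the queue.
        foreach (var next in onComplete(url))
            pending.Enqueue(next);
    }
    return issued;
}

const int RequestBudget = 20; // assumed per-page budget

// A well-behaved page: two data calls and no follow-ups.
var quietPage = CountRequests(new[] { "/api/posts", "/api/comments" },
                              _ => Array.Empty<string>());

// A page whose completion handler always fires one more request.
var chattyPage = CountRequests(new[] { "/api/posts" },
                               _ => new[] { "/profiler/results" });

Console.WriteLine($"quiet page: {quietPage} requests, over budget: {quietPage > RequestBudget}");
Console.WriteLine($"chatty page: {chattyPage} requests, over budget: {chattyPage > RequestBudget}");
```

The chatty page stops only because of the hard cap, which is the kind of signal a budget check turns into a failing test without knowing anything about the profiler itself.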


Bug Hunt: What made this blog slow?

A while ago the blog started taking 100% CPU on the client machines. Obviously we were doing something very wrong there, but what exactly was it?

We tracked down the problem to the following code:

image

image

As you can probably guess, the problem is that we have what is effectively an infinite loop. On any Ajax request, we generate a new Ajax request. And that applies to our own requests as well.

The fix was pretty obvious when we figured out what was going on, but until then…

image
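The code itself only survives as screenshots, so the following is a hypothetical C# simulation of the pattern described above, not the profiler's actual JavaScript. A global hook issues a profiler request after every completed Ajax request; without a guard it also reacts to its own requests, and the obvious fix is to skip those. The `/profiler/results` URL and the request cap are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

const string ProfilerUrl = "/profiler/results"; // hypothetical endpoint

// Simulates the browser: every completed request runs a global
// "after each Ajax request, fetch the profiler results" hook.
int Simulate(bool skipOwnRequests, int maxRequests = 500)
{
    var queue = new Queue<string>();
    queue.Enqueue("/api/comments"); // the page's one legitimate request
    var sent = 0;
    while (queue.Count > 0 && sent < maxRequests)
    {
        var completed = queue.Dequeue();
        sent++;
        // The fix: do not react to the hook's own requests.
        if (skipOwnRequests && completed == ProfilerUrl)
            continue;
        // Without the fix this also runs after profiler requests, so every
        // request begets another one and the page never settles.
        queue.Enqueue(ProfilerUrl);
    }
    return sent;
}

var withoutGuard = Simulate(skipOwnRequests: false);
var withGuard = Simulate(skipOwnRequests: true);

Console.WriteLine($"without guard: {withoutGuard} requests (stopped by the cap)");
Console.WriteLine($"with guard: {withGuard} requests");
```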


Bug Hunt: What made this blog slow?

A while ago the blog started taking 100% CPU on the client machines. Obviously we were doing something very wrong there, but what exactly was it?

We tracked down the problem to the following code. Can you figure out what the problem is?

image

image


Northwind Starter Kit Review: Conclusion

This is a review of the Northwind Starter Kit project; this review covers revision 94815 from Dec 18 2011.

A while ago I said:

Seriously?!  22(!) projects to do a sample application using Northwind?

And people took me to task about it. The criticism was mostly focused on two points:

  • I didn’t get that the project wasn’t about Northwind, but about being a sample app for architectural design patterns.
  • I couldn’t actually decide that a project was bad simply by looking at the project structure and some minor code browsing.

I am sad to say that after taking a detailed look at the code, I am even more firmly convinced of my original conclusion. I started to review the UI code, but there really is no need to do so.

The entire project, as I said in the beginning, is supposed to be a sample application for Northwind. Northwind is a CRUD application. Well, not exactly: NSK is supposed to be an example of an online store, which is something much bigger than just Northwind. But it isn’t.

Say what you will, the Northwind Starter Kit is a CRUD application. It does exactly that, and nothing else. It does so in an incredibly complicated fashion, mind, but that is what it does.

Well, it doesn’t do updates, or deletes, or creates. So it is just an R application (I certainly consider the codebase to be R rated, not for impressionable developers).

If you want to have a sample application to show off architectural ideas, make sure that the application can actually, you know, show them. The only thing that NSK does is load data from the database. Try as I might, I found no real piece of business logic, and no reason at all for it to be so complicated.

So, to the people who commented on that: it isn’t a good project. If you like it, I am happy for you; there are also people who love this guy:

Personally, I would call pest control.

Ask Ayende: Handling filtering

With regards to my quests against repositories, Matt asks:

…if my aggregate root query should exclude entities that have, for example, and IsActive = false flag, I also don't want to repeatedly exclude the IsActive = false entities. Using the repository pattern I can expose my Get method where internally it ALWAYS does this.

The problem with this question is that it makes a false assumption, then goes ahead and follows through on it. The false assumption here is that the only way to handle IsActive = false is by querying for it directly. But that is wrong.

With NHibernate, you can define that with a where condition, or as a filter. With RavenDB, you can define that inside a query listener. You can absolutely set those things up as part of your infrastructure, and you won’t need to create any abstractions for that.
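As a rough, library-free illustration of the idea (it does not use NHibernate’s filter API or a RavenDB query listener, and every name in it is hypothetical), the sketch below registers a default predicate once at the infrastructure level and routes all queries through it, so application code never has to mention IsActive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var customers = new List<(string Name, bool IsActive)>
{
    ("Alfreds", true),
    ("Bottom-Dollar", false),
    ("Chop-suey", true),
};

// Configured once at startup by the infrastructure (a hypothetical stand-in for
// an NHibernate where condition/filter or a RavenDB query listener).
Func<(string Name, bool IsActive), bool> defaultFilter = c => c.IsActive;

// Every query starts here, so the filter cannot be forgotten.
IEnumerable<(string Name, bool IsActive)> Query() => customers.Where(defaultFilter);

// Application code just queries; it never mentions IsActive.
var allVisible = Query().Select(c => c.Name).ToList();
var startingWithC = Query()
    .Where(c => c.Name.StartsWith("C", StringComparison.Ordinal))
    .Select(c => c.Name)
    .ToList();

Console.WriteLine(string.Join(", ", allVisible)); // Alfreds, Chop-suey
```

In the real setups mentioned above, the equivalent lives in the mapping or in the session/store configuration, so no extra abstraction layer is needed.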


Northwind Starter Kit Review: That CQRS thing

This is a review of the Northwind Starter Kit project; this review covers revision 94815 from Dec 18 2011.

It is obvious from reading the code that there was some attention given to CQRS. Unfortunately, I can’t really figure out what for.

To start with, both the Read Model and the Domain Model actually sit in the same physical location. If you are doing that, there is a 95% chance that you don’t need CQRS. If you use it anyway, you are going to waste a lot of time and effort and are very unlikely to get anything out of it.

In the case of NSK, here is the domain model vs. the read model for the customer.

image

image

I marked the difference.

I am sorry, but there is nothing here that justifies a separate model. Just needless complexity.

Remember, our job is to make things simpler, not to make the application hard to work with.

Northwind Starter Kit Review: It is all about the services

This is a review of the Northwind Starter Kit project; this review covers revision 94815 from Dec 18 2011.

Okay, enough about the data access parts. Let us take a look at a few of the other things that are going on in the application. In particular, this is supposed to be an application with…

Domain logic is implemented by means of a Domain Model, onto a layer of services adds application logic. The model is persisted by a DAL designed around the principles of the "Repository" patterns, which has been implemented in a LINQ-friendly way.

Let us try to figure this out one at a time, okay?

The only method in the domain model that has even a hint of domain logic is the CalculateTotalIncome method. Yes, you read that right: a method, singular. And that method should be replaced with a query; it has no business being in the domain model.

So let us move to the services, okay? Here are the service definitions in the entire project:

image

Look at the methods carefully. Do you see any action at all? You don’t, the entire thing is just about queries.

And queries should be simple, not abstracted away and made very hard to figure out.

The rule of thumb is that you try hard not to abstract queries; it is operations that you try to abstract. Operations are where you usually find the business logic.

Northwind Starter Kit Review: From start to finish–tracing a request

This is a review of the Northwind Starter Kit project; this review covers revision 94815 from Dec 18 2011.

One of the things that I repeatedly call out is the forwarding type of architecture: a simple operation hidden away behind a large number of abstractions that serve no real purpose.

Instead of a controller, let us look at a web service, just to make things slightly different. We have the following:

image

Okay, let us dig deeper:

image

I really like the fact that this repository actually has a FindById method, which this service promptly ignores in favor of using the IQueryable<Customer> implementation. If you want to know how that is implemented, just look (this is the EF Code First repository implementation; the others are fairly similar):

image

 

All in all, the entire thing only serves to make things harder to understand and maintain.

Does anyone really think that this abstraction adds anything? What is the point?!

Ask Ayende: Life without repositories, are they worth living?

With regards to my quests against repositories, Matt asks:

For example, you dismiss the repository pattern, but what are the alternatives? For example, in an ASP.NET web application you have controllers. I do NOT want to see this code in my controllers:

var sessionFactory = CreateSessionFactory();

using (var session = sessionFactory.OpenSession())
{
    using (var transaction = session.BeginTransaction())
    {
        // do a large amount of work

        // save entities
        session.SaveOrUpdate(myEntity);

        transaction.Commit();
    }
}

That is ugly, repetitive code. I want in my service methods to Get, update, save, and not have to worry about the above.

This is a straw man. Set up the alternative to look as nasty and unattractive as possible, then call out the thing you have just set up as nasty and unattractive. It is a good tactic, except that this isn’t the alternative at all.

If you go the route that Matt suggested, you are going to get yourself into problems. Serious ones. But that isn’t what I recommend. I talked about this scenario specifically in this post. This is how you are supposed to set things up: in a way that doesn’t get in the way of the application. Everything is wired up in the infrastructure, and we can just rely on it being there. In your controller, you have a Session property that gets the current session, and that is it.

For bonus points, you can move your transaction handling there as well, so you don’t need to handle that either. It makes the code so much easier to work with, because you don’t care about all those external concerns, they are handled elsewhere.
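A minimal, library-free sketch of that arrangement, assuming nothing about NHibernate or ASP.NET MVC: the infrastructure opens a unit of work per request, hands it to the action, commits if the action succeeds and drops the pending changes if it throws. The dictionary-based “session” and the `HandleRequest` function are hypothetical stand-ins for an ISession and an action filter.

```csharp
using System;
using System.Collections.Generic;

var database = new Dictionary<string, string>();

// Infrastructure: one "session" per request, committed only on success.
// (Hypothetical stand-in for an action filter that opens a session and a transaction.)
bool HandleRequest(Action<Dictionary<string, string>> action)
{
    var session = new Dictionary<string, string>(); // pending changes
    try
    {
        action(session);
    }
    catch (Exception)
    {
        return false; // roll back: the pending changes are simply dropped
    }
    foreach (var change in session)
        database[change.Key] = change.Value; // commit
    return true;
}

// Controller code: no session factory, no using blocks, no transaction handling.
var ok = HandleRequest(session => session["customers/1"] = "Alfreds");
var failed = HandleRequest(session =>
{
    session["customers/2"] = "Bottom-Dollar";
    throw new InvalidOperationException("validation failed");
});

Console.WriteLine($"{ok} {failed} {database.Count}"); // True False 1
```

A real filter would roll back the transaction and let the exception propagate rather than return a flag; the flag just keeps the sketch small.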


Northwind Starter Kit Review: If you won’t respect the database, there will be pain

This is a review of the Northwind Starter Kit project; this review covers revision 94815 from Dec 18 2011.

The database is usually a pretty important piece in your application, and it likes to remind you that it should be respected. If you don’t take care of that, it will make sure that there will be a lot of pain in your future. Case in point, let us look at this method:

image

It looks nice, it is certainly something that looks like a business service. So let us dig down and see how it works.

image

It seems like a nice thing. Besides the bug where you get a 100% discount if you buy enough, and the dissonance between the comment and the code, the code is fairly clear. And it seems that we have service logic and entity logic, which is always nice.

Except that this piece of code issues the following queries (let us assume a customer with 50 orders).

One query to load the customer, on line 34 of this code. Now let us look at line 35; what is actually going on here:

image

Okay, so we have an additional query for loading the customer’s orders. Let us dig deeper.

image

And for each order, we have another query to load all of that order’s items. Does it get worse?

image

Phew! I was worried here for a second.

But it turns out that we only have a Select N+2 here, where N is the number of orders that a customer has.

What do you want? Calculating the discount for the order is complicated; it is supposed to take a lot of time. Of course, the entire thing can be expressed in SQL as:

SELECT 
  SUM((UnitPrice * Quantity) * (1 - Discount)) AS Income
FROM OrderItems o
WHERE o.OrderID in (
  SELECT Id FROM Orders
  WHERE CustomerId = @CustomerId
)

But go ahead and try putting that optimization in. The architecture for the application will actively fight you on that.
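To make the cost concrete, here is an in-memory C# simulation that counts round trips for the 50-order customer assumed above. The data, prices and the `queries` counter are invented; `LoadOrders` and `LoadItems` are hypothetical stand-ins for the lazy loads the real code triggers, and the last query mirrors the SQL above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Invented data: one customer with 50 orders, 3 line items each.
var orders = Enumerable.Range(1, 50).Select(id => (Id: id, CustomerId: 1)).ToList();
var items = orders
    .SelectMany(o => Enumerable.Range(1, 3)
        .Select(_ => (OrderId: o.Id, UnitPrice: 10m, Quantity: 2, Discount: 0.1m)))
    .ToList();

var queries = 0; // counts simulated database round trips

List<(int Id, int CustomerId)> LoadOrders(int customerId)
{
    queries++;
    return orders.Where(o => o.CustomerId == customerId).ToList();
}

List<(int OrderId, decimal UnitPrice, int Quantity, decimal Discount)> LoadItems(int orderId)
{
    queries++;
    return items.Where(i => i.OrderId == orderId).ToList();
}

// Select N+2: the customer (1), its orders (1), then each order's items (N).
queries++; // simulated load of the customer itself
var incomeNPlus2 = LoadOrders(1)
    .Sum(o => LoadItems(o.Id).Sum(i => i.UnitPrice * i.Quantity * (1 - i.Discount)));
var nPlus2Queries = queries;

// One aggregate query, equivalent to the SQL above.
queries = 0;
queries++;
var incomeSingle = items
    .Where(i => orders.Any(o => o.Id == i.OrderId && o.CustomerId == 1))
    .Sum(i => i.UnitPrice * i.Quantity * (1 - i.Discount));

Console.WriteLine($"N+2: {nPlus2Queries} queries, single: {queries} query, same result: {incomeNPlus2 == incomeSingle}");
```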

Ask Ayende: Repository for abstracting multiple data sources?

With regards to my recommendation to not use repositories, Remyvd asks:

… if you have several kind of data sources in different technologies, then it would be nice if you have one kind of interface. Also when an object (like Customer) is combined from data out of different data sources, the repository is for me a good place to initialize the object and return it. How would you solve this cases?

My answer is: System.ArgumentException: Question assumes invalid state.

More fully, this is one of those times where, in order to actually answer the question, we have to correct the question. Why do I say that?

Well, the question makes the assumption that combining the customer entity out of different data stores is actually desirable. Having made that assumption, it proceeds to ask what the best way to do that is. I am not going to recommend a way to do that, because the underlying assumption is wrong.

If your Customer information is stored in multiple data stores, you have to ask yourself, is it actually the same thing in all places? For example, we may have Customer entity in our main database, Customer Billing History in the billing database, Customer credit report accessible over a web service, etc. Note what happens when we start actually drilling down into the entity design. It suddenly becomes clear that that information is in different data stores for a reason.

“These aren’t the droids you are looking for” might be a good quote here. The fact that the information is split usually means that there is a reason for that. The information is handled differently, usually by different teams and applications, it deals with different aspects of the entity, etc.

Trying to abstract that away behind a repository layer loses that very important distinction. It also forces us to do a lot of additional work, because we have to load the customer entity from all of the different data stores every time we need it. Even if most of the data that we need is not relevant for the operation at hand.

It would be much easier, simpler and more maintainable to actually expose the idea of multiple data stores to the application at large. You don’t end up with a leaky abstraction, and it is easy to see when and how you actually need to combine the different data stores, and what the implications of that are for the specific scenarios that require it.


Northwind Starter Kit Review: Refactoring to an actual read model

This is a review of the Northwind Starter Kit project; this review covers revision 94815 from Dec 18 2011.

In my previous post, I talked about how the CatalogController.Product(id) action was completely ridiculous in the level of abstraction that it used to do its work and promised to show how to do the same work on an actual read model in a much simpler fashion. Here is the code.

image

There are several things that you might note about this code:

  • It is all inline, so it is easy to analyze, optimize and figure out what the hell is going on.
  • It doesn’t go to the database to load data that it already has.
  • The code actually does something meaningful.
  • It only does one thing, and it does it elegantly.

This is actually using a read model. By, you know, reading from it, instead of abstracting it.

But there is a problem here, I hear you shout (already reaching for the pitchfork, at least let me finish this post).

Previously, we had hidden the logic of discontinued products and available products behind the following abstraction:

image

Now we are embedding this inside the query itself. What happens if this logic changes? We would need to go everywhere we used this logic and update it.

Well, yes. Except that there are two mitigating factors.

  • This nice abstraction is never used elsewhere.
  • It is easy to create our own abstraction.

Here is an example of how to do this without adding additional layers of abstraction.

image

Which means that our action method now looks like this:

image

Simple, easy, performant, maintainable.

And it doesn’t make my head hurt or cause me to stay up until 4 AM feeling like an XKCD item.
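The actual code is only shown in screenshots, so here is a hypothetical in-memory approximation of the same idea: the availability rule gets a name once and is used inline in the query, with no extra layer in between. The product rows and the exact rule (not discontinued, units in stock) are assumptions. Against a real LINQ provider such as Entity Framework, the rule would usually need to be an expression tree (`Expression<Func<T, bool>>`) rather than a method, so that it can be translated to SQL.

```csharp
using System;
using System.Linq;

// Invented read-model rows; in the real application these come from the database.
var products = new[]
{
    (Id: 1, Name: "Chai", Discontinued: false, UnitsInStock: 39),
    (Id: 2, Name: "Chang", Discontinued: true, UnitsInStock: 17),
    (Id: 3, Name: "Aniseed Syrup", Discontinued: false, UnitsInStock: 0),
};

// The availability rule, named once and reused inline (assumed definition).
bool IsAvailable((int Id, string Name, bool Discontinued, int UnitsInStock) p) =>
    !p.Discontinued && p.UnitsInStock > 0;

// The action itself: a single, readable query over the read model.
var availableNames = products.Where(IsAvailable).Select(p => p.Name).ToList();

Console.WriteLine(string.Join(", ", availableNames)); // Chai
```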

A meta post about negative code reviews

A lot of people seem to have problems whenever I post a code review. The general theme of the comments is mostly along the lines of:

  1. You are an evil person and a cyber bully for doing those sorts of things and humiliating people.
  2. You have something personal against the author, and you set out to get back at them.
  3. There is no value in doing a negative code review.
  4. You never do any good reviews, only bad ones.
  5. You only tell us what is wrong, not what is right.

There is a bunch of other stuff, but those are the main points.

For point 1 & 2, I have the same answer:

I talk about the code, not the person. I am actually very careful with my phrasing whenever I do this sort of a review. The code or the architecture is wrong, not the person. There is a difference, and a big one.

For point 3:

Most of those code reviews are generated because someone asks me to do them. And given some of the responses to the posts, I feel that they serve a very good purpose. Here is one such comment. I redacted the project name, because it doesn’t really matter, but the point stands:

I grew up on *** as it was my first layered application, almost 5 years ago and I personally believe that the effort ****** has put into this project during the years is simply amazing. The architecture reflect the most common architectural design patterns and represents almost the concepts expressed by Fowler and Evans. *** is not a project to teach you how to work with Northwind and it is not a project designed exclusively for Nhb. It is designed to show how a layered application should be architected in .NET and there is also a book wrote around this project.

My professional opinion, backed by over a decade of practical experience and work in the field, tells me that the project in question is actually not a good template for an application. And I feel that by pointing out exactly why, I am doing a service to the community.

Look, it is actually quite simple. The major reason that I do those negative code reviews is because I keep seeing the same types of mistakes repeated over and over again at customer sites. And the major reason for those mistakes is that those projects follow best practices as they see them. The problem is that they usually ignore the context of those best practices, so the result becomes a horrible mess.

What is worse, there is the issue of the non-coding architect: someone who doesn’t actually have responsibility for the output making decisions about how it is going to be built. And those things are actually hard to fight against, precisely because they are considered to be best practices. One of the reasons that I am pointing out the problems in those projects is to serve as a reference point for other people when they need a way to escape the over-architecture.

For point 4:

It takes something out of the ordinary to get me to actually post something to the blog. The bar for a negative code review is how much the code annoys me. The bar for a positive code review is how much it impresses me. It is easier to annoy me than to impress me, admittedly, but I quite frequently do good code reviews.

The problem is that most good code bases are actually fairly boring. That is pretty much the definition of a good codebase, of course, so there isn’t really that much to talk about.

For point 5:

That usually comes up when I do negative code reviews: “you only show the bad stuff, it doesn’t teach us how to do it right”. Well, that is pretty much the point of a negative code review: to show the bad stuff so you know how to avoid it. There are literally thousands of posts in this blog, and quite a few of them are devoted to discussions of how to do it right. I have very little inclination to repeat that advice in every post.

Even a blog post must obey the single responsibility principle.


Northwind Starter Kit Review: Data Access review thoughts

This is a review of the Northwind Starter Kit project; this review covers revision 94815 from Dec 18 2011.

In my last few posts, I have gone over the data access strategy used in NSK. I haven’t been impressed. In fact, I am somewhere between horrified, shocked and amused, in the same way you feel when you see a clown slip on a banana peel. Why do I say that? Let us trace a single call from the front end all the way to the back.

The case in point is the CatalogController.Product(id) action. This is something that should just display the product on the screen, so it should be fairly simple, right? Here is how it works when drawn as UML:

image

To simplify things, I decided to skip any method calls on the same objects (there are more than a few).

Let me show you what this looks like in actual code:

image

Digging deeper, we get:

image

We will deal with the first method call to CatalogServices now, which looks like:

image

I’ll skip going deeper, because this is just a layer of needless abstraction on top of Entity Framework and that is quite enough already.

Now let us deal with the second call to CatalogServices, which is actually more interesting:

image

Note the marked line? This is generating a query. This is interesting, because we have already loaded the product. There is no way of optimizing that, of course, because the architecture doesn’t let you.

Now, you need all of this just to show a single product on the screen. I mean, seriously.

You might have noticed some references to things like Read Model in the code. Which I find highly ironic. Read Models are about making the read side of things simple, not drowning the code in abstraction on top of abstraction on top of abstraction.

In my next post, I’ll show a better way to handle this scenario, one that is actually simpler and makes use of an actual read model rather than infinite levels of indirection.

Northwind Starter Kit Review: The parents have eaten sour grapes, and the children’s teeth are set on edge

This is a review of the Northwind Starter Kit project; this review covers revision 94815 from Dec 18 2011.

In my previous posts, I focused mostly on the needlessness of the repositories implementation and why you want to avoid that (especially implementing it multiple times). In this post, I want to talk about other problems regarding the data access. In this case, the sudden urge to wash my hands that occurred when I saw this:

image

I mean, you are already using an OR/M. I don’t like the Repository implementation, but dropping down to SQL (and unparameterized SQL at that) seems uncalled for.

By the way, the most logical reason for this to be done is to avoid mapping the Picture column to the category, since the OR/M in use here (EF) doesn’t support the notion of lazy properties.

Again, this is a problem when you are trying to use multiple OR/Ms, and that is neither required nor really useful.

Okay, enough about the low-level data access details. In my next post I’ll deal with how those repositories are actually being used.

Northwind Starter Kit Review: Data Access and the essence of needless work, Part II

This is a review of the Northwind Starter Kit project; this review covers revision 94815 from Dec 18 2011.

Update: Andrea, the project’s leader, has posted a reply to this series of posts.

Yes, this is another “repositories are evil if you are using an OR/M” post.

That is probably going to cause some reaction, so I am going to back this up with code from this NSK project. Let us talk about repositories, in particular. Let us see what we have here:

image

Okaay…

Now here are a few problems that I have with this:

  • There is no value gained by introducing this abstraction. You aren’t adding any capability whatsoever.
  • In fact, since all OR/Ms provide an abstraction that isn’t dependent on type, creating IRepository<T> and things like ICustomerRepository just makes things more complicated.
  • There are going to be differences in behavior between different repository implementations that will break your code.

Let us see what we actually have as a result. This is the Entity Framework POCO implementation:

image

You can probably guess how the rest of it is actually implemented. Yes, we have a LOT of code dedicated solely to this sort of forwarding operation.

And then we have the actual implementation of the delete:

image

Just to remind you, here is the NHibernate implementation of the same function:

image

Leaving aside the atrocious error handling code, the EF POCO version will do an immediate delete. The NHibernate version will wait for the transaction to be committed.

And don’t worry, I do remember the error handling. This is simply wrong.
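A small, hypothetical simulation of why that difference matters to callers: two delete functions with the same shape, one that removes immediately and one that takes effect only on commit, as the post describes for the EF POCO and NHibernate implementations. Neither is the real EF or NHibernate code.

```csharp
using System;
using System.Collections.Generic;

// Shared fake store and a list of deletes waiting for the transaction to commit.
var store = new HashSet<int> { 1, 2, 3 };
var pendingDeletes = new List<int>();

// "EF POCO" style: the row is gone as soon as the call returns.
void DeleteImmediately(int id) => store.Remove(id);

// "NHibernate" style: the delete is queued and applied on commit.
void DeleteOnCommit(int id) => pendingDeletes.Add(id);
void Commit()
{
    foreach (var id in pendingDeletes) store.Remove(id);
    pendingDeletes.Clear();
}

// Same calling code, different observable behavior before commit.
DeleteImmediately(1);
var goneAfterImmediate = !store.Contains(1);

DeleteOnCommit(2);
var goneBeforeCommit = !store.Contains(2); // still there until Commit()
Commit();
var goneAfterCommit = !store.Contains(2);

Console.WriteLine($"{goneAfterImmediate} {goneBeforeCommit} {goneAfterCommit}"); // True False True
```

Code that was written and tested against one of these behaviors can quietly depend on it, which is how swapping implementations behind the same interface breaks callers.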

And then we have implementations such as this:

image

This is for the Entity Framework Code First implementation. There is a message here that is coming to me loud and clear. This code wants to be deleted. It is neglected and abused and doesn’t serve any purpose in life except to gobble up valuable disk space that could be filled with the much more valuable result of reading from /dev/random.

Northwind Starter Kit Review: Data Access and the essence of needless work, Part I

This is a review of the Northwind Starter Kit project; this review covers revision 94815 from Dec 18 2011.

Update: Andrea, the project’s leader, has posted a reply to this series of posts.

I like to start reviewing applications from their database interactions. That is usually low level enough to tell me what is actually going on, and it is critical to the app, so a lot of thought usually goes there.

In good applications, I have a hard time finding the data access code, because it isn’t there. It is in the OR/M or the server client API (in the case of RavenDB). Some applications, those that work against legacy databases, without the benefit of an OR/M, or against a strange data source (such as a remote web service), may need an explicit data layer, but most don’t.

NSK actually has 5 projects dedicated solely to data access. I find this... scary.

image

Okay, let me start outlining things in simple terms. You don’t want to handle data access the way NSK does.

Let us explore all the ways it is broken. First, in terms of actual ROI. There is absolutely no reason to have multiple implementations with different OR/Ms. There is really not a shred of reason to do so. The OR/M is already doing the work of abstracting away the database layer, and the only things that you get are an inconsistent API, an inability to actually use important features, and a whole lot more work that doesn’t get you anything.

Second, there are the abstractions used:

image

I don’t like repositories, because they abstract important details about the way you work with the database. But let us give this the benefit of the doubt and look at the implementation. There is only one implementation of IRepository, which uses NHibernate.

image

As you can see, this is pretty basic stuff. You can also see that there are several methods that aren’t implemented. That is because they make no sense for data access. The reason they are there is that IRepository<T> inherits from ICollection<T>. And the reason for that is likely this:

Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects.

The fact that this is totally the wrong abstraction to use doesn’t seem to enter into the design.

Note that we also violate the contract of ICollection<T>.Remove:

true if item was successfully removed from the ICollection<T>; otherwise, false. This method also returns false if item is not found in the original ICollection<T>.

There are other reasons to dislike this sort of thing, but I’ll touch on those in my next post.
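The ICollection<T>.Remove contract is easy to check against a BCL collection. The post does not show exactly how NSK’s Remove misbehaves beyond the screenshot, so `RepositoryRemove` below is a hypothetical example of one common way to break the contract: issuing a delete and reporting success unconditionally.

```csharp
using System;
using System.Collections.Generic;

var customers = new List<string> { "ALFKI", "ANATR" };

// BCL behavior, as documented: Remove reports whether anything was removed.
var removedExisting = customers.Remove("ALFKI"); // true
var removedMissing = customers.Remove("NOSUCH"); // false: item not found

// Hypothetical repository-style Remove that fires a delete and always claims success.
bool RepositoryRemove(string id)
{
    // (would issue a DELETE for id here, without checking whether a row was affected)
    return true; // violates the contract when id does not exist
}

var repoRemovedMissing = RepositoryRemove("NOSUCH"); // true, although nothing was removed

Console.WriteLine($"{removedExisting} {removedMissing} {repoRemovedMissing}"); // True False True
```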

Orders Search in RavenDB

One of the things that we have been working on recently is our internal ordering system. We did a major upgrade on how it works, and along the way, we moved it to RavenDB. This post is to talk specifically about one feature in the system, Order Search.

As you can probably guess, order search is something that is quite important. It is annoying to an extent, because users come to us with all sorts of information about an order, from “I ordered last year some stuff” to “with order id E12312312”. The latter is very easy; the former tends to be somewhat of a challenge to find.

So I set out to try to figure out if we can do something better than just the standard gigantic search screen. For example, here is how you can search an order using Plimus:

image

There is way too much stuff there. Not to mention that the backend of such a screen is likely complex. Instead, I chose to go with a different metaphor:

image

The question is, how to do it?

It turns out that this is really quite simple to do with RavenDB, here is the full index:

public class Orders_Search : AbstractIndexCreationTask<Order, Orders_Search.ReduceResult>
{
    public class ReduceResult
    {
        public string Query { get; set; }
        public DateTime LastPaymentDate { get; set; }
    }

    public Orders_Search()
    {
        Map = orders => from order in orders
                        select new
                        {
                            Query = new object[]
                            {
                                order.FirstName, 
                                order.LastName, 
                                order.OrderNumber, 
                                order.Email, 
                                order.Email.Split('@'),
                                order.CompanyName,
                                order.Payments.Select(payment => payment.PaymentIdentifier),
                                order.LicenseIds
                            },
                            LastPaymentDate = order.Payments.Last().At
                        };
    }
}

And here is the UI for that:

image

But the question here is, how does this work?

Well, it turns out that RavenDB has some really interesting behavior when you feed it an IEnumerable. Instead of trying to index the IEnumerable as a single value, it indexes it as a multi-value field.

That means that for each order, we are going to actually store the following data in the Query field:

  • Ayende (FirstName)
  • Rahien (LastName)
  • E11111111 (Order Number)
  • ayende@ayende.com (Email)
  • ayende (Email.Split(‘@’) first part)
  • ayende.com (Email.Split(‘@’) second part)
  • Hibernating Rhinos (Company)
  • E111111 (PaymentIdentifier for first payment)
  • E111234 (PaymentIdentifier for second payment)
  • E123412 (PaymentIdentifier for third payment)
  • EFA32826-1E09-48FA-BC0E-9A9AAF0FDD70 (LicenseIds #1)
  • 95CDDED2-2D19-48AF-991D-1284446CB7A3 (LicenseIds #2)

Because we store all of those values inside a single field, we can query for either one of them. Hell, we can even enable full text search on the field and allow even more interesting searches.

And the best thing is that it pretty much Just Works. The user can type in anything that might identify the order, and we will figure it out for him. And since the user is often me, I am very happy about it.


RavenDB Course in London: 28–29 February

Itamar will be giving our 2-day RavenDB course in London on 28–29 February. You can find more details on the course info page.

Itamar Syn-Hershko is a software developer who writes mostly for .NET, but also in Java and C/C++. He has been a core developer of RavenDB since joining Hibernating Rhinos in 2011.

The author of open-source projects like HebMorph and NAppUpdate, and an active participant in others (CLucene, for example), Itamar strongly believes in the power of open-source projects and the innovation they can bring to the table.

Itamar's current focus is on modern Information Retrieval, mostly search engines and databases, and he blogs about those and other topics at http://code972.com

There is also some time available for on-site consulting, so if you want one of the core RavenDB developers at your company to advise you on how best to use RavenDB, please ping us.


Architecture > Code

Steve Py asks an interesting question in one of the comments to my On Infinite Scalability post:

Can you elaborate more on: "Note, those changes are not changes to the code, they are architectural and system changes. Where before you had a single database, now you have many. Where before you could use ACID, now you have to use BASE. You need to push a lot more tasks to the background, the user interaction changes, etc."

When you talk about jumping from 1 server to multiple servers, ACID to BASE, and how user interaction changes, how do you quantify that this is done without code changes?

The answer to that is that there is a mistaken assumption here. Changing the architecture is going to change the code. But the code changes are rarely the relevant part, because changing the architecture is a big change in its own right. If you are moving from a single DB to multiple databases, for example, there are going to be code changes, but that isn’t what you worry about. The major change is in the architecture itself (how do you split the data, how do you do reporting, can some of the databases be down, etc.).

Moving from ACID to BASE is an even greater change. The code might change a little or change drastically, but that isn’t where most of the effort goes. Just defining the new system behavior in those scenarios is going to be much more complex. For example, something as simple as “user names are unique” would move from being a unique constraint in the database to a rule that the system itself has to enforce, and handle violations of, in a reasonable fashion.

Depending on your original architecture, it might be anything from replacing a single service implementation to re-writing significant parts of the code.
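As one illustration of what handling the unique user name rule without a database constraint can look like, here is a minimal sketch of a common approach, assumed here rather than taken from the post: the system makes an atomic claim on the user name itself, keyed by the name, and a registration that loses the race gets a definite answer. A ConcurrentDictionary stands in for whatever atomic create-if-absent operation the real data store provides; TryReserveUserName and the user ids are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;

// Stand-in for a store with an atomic "create only if absent" operation on a key.
// Keys are user names, compared case-insensitively; values are the owning user ids.
var reservations = new ConcurrentDictionary<string, string>(StringComparer.OrdinalIgnoreCase);

// Hypothetical helper: claim a user name. TryAdd is atomic, so when two
// registrations race for the same name, exactly one of them wins.
bool TryReserveUserName(string userName, string userId) =>
    reservations.TryAdd(userName, userId);

Console.WriteLine(TryReserveUserName("ayende", "users/1")); // True
Console.WriteLine(TryReserveUserName("Ayende", "users/2")); // False: the name is already taken
```

The code is the easy part. What the user sees when the claim fails (an immediate error, or a later notification if registration is processed in the background) is the behavior that has to be defined, and that is where most of the effort goes.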

On Infinite Scalability

Udi Dahan posted about the myth of infinite scalability. It is a good post, and I recommend reading it. I have my own 2 cents to add, though.

YAGNI!

When building a system, I always assume an order of magnitude increase in the work the system has to do (usually, we are talking about the number of requests per time period).

In other words, if we have 50,000 users, and we have roughly 5,000 active users during work hours, we typically have 100,000 requests per hour. My job is to make sure that we can manage to get up to 1,000,000 requests per hour. (Just to give you the numbers, we are talking about moving from ~30 requests per second to ~300 requests per second).
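The figures above are just a unit conversion plus a factor of ten. A tiny sketch of that arithmetic (the function names are made up for illustration):

```csharp
using System;

// Convert a load expressed in requests per hour into requests per second.
double PerSecond(double requestsPerHour) => requestsPerHour / 3600;

// The planning target: one order of magnitude above the current load.
double WithBuffer(double requestsPerHour) => requestsPerHour * 10;

var current = 100_000.0;           // requests per hour with ~5,000 active users
var target = WithBuffer(current);  // 1,000,000 requests per hour

Console.WriteLine(Math.Round(PerSecond(current))); // 28, i.e. the "~30" above
Console.WriteLine(Math.Round(PerSecond(target)));  // 278, i.e. the "~300" above
```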

Why do we need that buffer?

There are several reasons to want to do that. First, we assume that the system is going to be in production for a relatively long time, so the number of users or their activity is going to grow. Second, in most systems, only some percentage of the users is active at any given time, but there are usually times when significantly more users are active. For example, an accounting system can be expected to see greater utilization at the end of the tax year, as well as at the end of every month.

And within that buffer, we don’t need to change the system architecture; we just need to add more resources to the system.

And why a full order of magnitude?

Put simply, because it is relatively easy to get there. Take the end of the year again: now we have 15,000 active users (vs. the “normal” 5,000). But the amount of work we do isn’t directly related to the number of users. It is related to how active they are and what operations they are doing. In such a scenario, it is more likely that the users will be more active and stress the system more.

Finally, there is another aspect. Usually, moving a single order of magnitude up is no problem. In fact, I feel comfortable running on a single server (maybe replicated for HA) from 1 request per second to 50 or so. That is 3,600 requests per hour to 180,000 requests per hour. But beyond that, we are going to start talking about multiple servers. We can handle a million requests per hour on a web farm without many problems, but moving to the next level is likely to require more changes, and when you get beyond that, you require more changes still.
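Those rules of thumb can be written down as rough tiers. This is only an illustration of the numbers in the paragraph above (about 50 requests per second, i.e. 180,000 requests per hour, for a single server, and about a million requests per hour for a web farm); the thresholds are not hard limits, and the Tier function is hypothetical.

```csharp
using System;

// Rough deployment tiers, using the rule-of-thumb numbers from the paragraph above.
// These are illustrative thresholds, not hard limits.
const double SingleServerMax = 50 * 3600;  // ~50 req/s = 180,000 requests per hour
const double WebFarmMax = 1_000_000;       // about a million requests per hour

string Tier(double requestsPerHour) =>
    requestsPerHour <= SingleServerMax ? "single server (maybe replicated for HA)"
    : requestsPerHour <= WebFarmMax ? "web farm"
    : "architectural changes";

foreach (var load in new double[] { 3_600, 180_000, 1_000_000, 10_000_000 })
    Console.WriteLine($"{load} requests/hour -> {Tier(load)}");
```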

Note, those changes are not changes to the code, they are architectural and system changes. Where before you had a single database, now you have many. Where before you could use ACID, now you have to use BASE. You need to push a lot more tasks to the background, the user interaction changes, etc.

Of course, you can probably start your application design with a target of 10,000,000 requests per hour. That is big enough that you can move to a hundred million requests per hour with relative ease. But, and this is important, is this something that is valuable? Is this really something that you can justify to the guys writing the checks?

Scalability is a cost function, as Udi has mentioned. And you need to be aware that this cost is incurred both during development and in production. If your expected usage can be handled (including the buffer) with a simpler architecture, go for it. If you grow beyond an order of magnitude, there are three things that will happen:

  1. You have the money to deploy a new system for the new load.
  2. You now have a better idea on the actual usage, rather than just guessing about the future.
  3. You are going to change how the system works anyway, from the UI to the internal workings.

The last one is important. I am guessing that many people would say “but we can do it right the first time”. And the problem is that you really can’t. You don’t know what users will do, what they will like, etc. Whatever you are guessing now, you are bound to be wrong. I have had that happen to me on multiple systems.

More than that, remember that time to market is a big feature as well.

Out of context, architecture is nothing but modern art

When I think about modern art, I think about an experience that I had that was remarkably similar to this one:

I got a lot of comments regarding my review of the Northwind Starter Kit project.

Here is the deal: if you want to demonstrate complex ways to solve a problem, you had better make sure that you are actually solving a problem that requires a complex solution. If you are demonstrating how to solve a simple problem in a complex way, you are basically doing a disservice to the reader.

When I wanted to write a sample app to demonstrate something, I either chose to demonstrate the actual technology (writing a ToDo app) or I spent dozens of posts establishing the context (yes, that is Macto, I’ll get back to it).

But since so many people seem to have been offended by my dismissing the project based on what seemed like just the number of projects in it, I’ll do a full review series on that. My point was to make it clear that creating complex solutions for simple problems is wrong, especially if you are trying to demonstrate a real workable system. Without the proper context, all of this stuff is just cargo cult.

Application review: Northwind Starter Kit

I continue to try to find a sample application for Northwind to use as my contrasting example for an article that I am writing, and I found the Northwind Starter Kit project. Even the project summary page gave me a few twitches; here is a small piece with the twitch-inducing stuff bolded:

The application has been designed using common patterns, such as the ones defined within the "classic" "Designs Patterns" by Erich Gamma et al. and "Pattern of Enterprise Application Architecture", by Martin Fowler; though not required, these lectures are strongly recommended.

Guys! This is Northwind; the likelihood that you’ll need design patterns to build this application is nil to none! That just screams complexity overload.

Domain logic is implemented by means of a Domain Model, onto a layer of services adds application logic. The model is persisted by a DAL designed around the principles of the "Repository" patterns, which has been implemented in a LINQ-friendly way.

Northwind is, at its core, a CRUD app. All of those things add complexity without really adding much value. In fact, they are going to create just noise, and make working with things that much harder.

And then I opened the project, and I got this:


I mean, really? Seriously?! 22(!) projects to do a sample application using Northwind?

This is the point where I gave up on this as something that could be useful, but here are a few other gems as well:


I really like how the Update method does what it is meant to do, right?

Note that in either implementation, we are looking at totally and drastically different behavior.

Let us look at the interface, too:


The design is straight out of Patterns of Enterprise Application Architecture, and it is totally the wrong design to use with a modern OR/M.

Seriously, this is Northwind we are talking about, why make things so freaking complex?!


Structuring your Unit Tests, why?

I am a strong believer in automated unit tests. And I read this post by Phil Haack with part amusement and part wonder.

RavenDB currently has close to 1,400 tests in it. We routinely ask for failing tests from users and fix bugs by writing tests to verify fixes.

But structuring them in terms of source code? That seems to be very strange.

You can take a look at the source code layout of some of our tests here: https://github.com/ayende/ravendb/tree/master/Raven.Tests/Bugs

It is a dumping ground, basically, for tests. That is, for the most part, I view tests as very important in telling me “does this !@#! work or not?” and that is about it. Spending a lot of time organizing them seems to be something of little value from my perspective.

If I need to find a particular test, I have R# code browsing to help me, and if I need to find who is testing a piece of code, I can use Find References to get it.

In the end, it boils down to the fact that I don’t consider tests to be, by themselves, a value to the product. Their only value is their binary ability to tell me whether the product is okay or not. Spending a lot of extra time on the tests distracts from creating real value: shippable software.

What I do deeply care about with regards to structuring the tests is the actual structure of each test. It is important to make sure that all the tests look very much the same, because I should be able to look at any of them and figure out what is going on rapidly.

I am not going to use the RavenDB example, because that is system software and usually different from most business apps (although we use a similar approach there). Instead, here are a few tests from our new ordering system:

[Fact]
public void Will_send_email_after_trial_extension()
{
    Consume<ExtendTrialLicenseConsumer, ExtendTrialLicense>(new ExtendTrialLicense
    {
        ProductId = "products/nhprof",
        Days = 30,
        Email = "you@there.gov",
    });

    var email = ConsumeSentMessage<NewTrial>();

    Assert.Equal("you@there.gov", email.Email);
}

[Fact]
public void Trial_request_from_same_customer_sends_email()
{
    // First request: creates the Trial document and sends an email.
    Consume<NewTrialConsumer, NewTrial>(new NewTrial
    {
        ProductId = "products/nhprof",
        Email = "who@is.there",
        Company = "h",
        FullName = "a",
        TrackingId = Guid.NewGuid()
    });
    Trial firstTrial;
    using (var session = documentStore.OpenSession())
    {
        firstTrial = session.Load<Trial>(1);
    }
    Assert.NotNull(ConsumeSentMessage<SendEmail>());

    // Second request from the same customer, reusing the first trial's TrackingId and Email.
    Consume<NewTrialConsumer, NewTrial>(new NewTrial
    {
        TrackingId = firstTrial.TrackingId,
        Email = firstTrial.Email,
        Profile = firstTrial.ProductId.Substring("products/".Length)
    });

    var email = ConsumeSentMessage<SendEmail>();
    Assert.Equal("Hibernating Rhinos - Trials Agent", email.ReplyToDisplay);
}

As you can probably see, we have a structured way to send input to the system, and we can verify the output and the side effects (creating the trial, for example).

This leads to a system that can be easily tested, but doesn’t force us to spend too much time in the ceremony of tests.
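The post doesn't show the helpers these tests rely on (Consume<TConsumer, TMessage> and ConsumeSentMessage<T>). Here is a minimal sketch of how such a harness could work, under a few assumptions: consumers are plain delegates rather than classes, a List<object> stands in for the bus, and messages are named tuples only to keep the example self-contained (a real system would use message classes like ExtendTrialLicense and NewTrial). All the names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Messages "sent" by consumers during a test; a stand-in for a real bus.
var sent = new List<object>();

// Hypothetical harness: hand a message to a consumer and capture whatever it sends.
void Consume<TMessage>(Action<TMessage, Action<object>> consumer, TMessage message) =>
    consumer(message, sent.Add);

// Hypothetical harness: remove and return the first sent message of type T, or fail the test.
T ConsumeSentMessage<T>()
{
    var position = sent.FindIndex(m => m is T);
    if (position < 0)
        throw new InvalidOperationException($"Expected a {typeof(T).Name} message to be sent");
    var found = (T)sent[position];
    sent.RemoveAt(position);
    return found;
}

// A toy consumer: extending a trial sends an email to the customer.
void ExtendTrialConsumer((string ProductId, int Days, string Email) request, Action<object> send) =>
    send((To: request.Email, Subject: $"Your {request.ProductId} trial was extended by {request.Days} days"));

// Same structure as the tests above: push a message in, then assert on what came out.
Consume<(string ProductId, int Days, string Email)>(
    ExtendTrialConsumer, ("products/nhprof", 30, "you@there.gov"));
var email = ConsumeSentMessage<(string To, string Subject)>();
Console.WriteLine(email.To); // you@there.gov
```

Whatever the real helpers look like, the idea is the same: every test pushes a message in and asserts on what came out, so the tests all end up looking alike.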