Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 1 min | 177 words

I have wasted the entire day trying to troubleshoot some idiotic issues with MS CRM.

Allow me to reiterate my previous statements on Microsoft CRM. It is an application built on feet of clay, marketed as a development platform and undeserving of consideration as anything but a vanilla install.

Highlights of the CRM:

  • Trying to update an entity via web service call. You get a "platform error" message. You have no input why this is not working working. Fast forward a day, turn out a PreUpdate callout is throwing an exception, which the CRM just swallows and abort.
  • Trying to update an entity via web service call. The call completes successfully, but the update is never made. Fast forward a day or two, turn out a PostUpdate callout is throwing an exception, which the CRM will swallow, appear to continue and discard the supposedly saved data.
  • Changing the type of a field in the entity will ensure that you cannot do an import/export from that entity ever again. You have to do a reinstall.

Yuck!

Tricky Code

time to read 1 min | 121 words

Without running the code, what is the result of this masterpiece?

class Program
{
	static void Main(string[] args)
	{
		DoSomething("a","b");
	}

	public static void DoSomething<T>(IList<T> set)
	{
		Console.WriteLine(set.Count);
	}

	public static void DoSomething<T>(params T[] items)
	{
		List<T> set = new List<T>();
		foreach (T t in items)
		{
			if (t == null)
				continue;
			set.Add(t);
		}
		DoSomething(set);
	}
}

It surprised the hell out of me until I figured out what was going on, then I was very amused. This code works exactly as it should be, to produce a very different result than the expected one.

time to read 2 min | 364 words

I am coming back to a project that I haven't been a part of for about six months. It was in active development during that time, and I was happy to get back to it, since it is a really good code base, and a fun project beside. The team that handle this project is top notched, but my first reaction was something in the order: "how could you let it turn so hard!"

Then I toned it down a bit and started Big Refactoring. About fifteen minutes later, I was done, which didn't match at all my initial response, and took some thinking. Why did I react this way? Why did it took ~15 minutes to turn something that I found awkward to something that was really pleasant to work with?

How could the other team members, all of whom are very good, have gotten into this situation?

The answer is both that I over reacted and that the expectations that I had from the project where high. We have made working on this project very smooth experience, but as time passed, it started to get awkward to work with. By that time, I wasn't around, and the developers just dealt with that. It isn't painful or hard, it is not bad or annoying. It is just that it began to get awkward, in a project that was really smooth sailing.

Those 15 minutes were spent mostly in breaking apart a few services, setting up a registration rule in Binsor and relaxing in the glow of content that Things Just Work once again.

I came to the conclusion that different pain tolerance levels are responsible for my reaction. I have very high expectations from my code, and I expected that it would continue to be as easy as it was in the start. The devs in the team would have likely performed the same actions as I did, at a later date. But I found it painful now.

It is especially interesting in light of the recent discussion about code size and scaling a project higher.

I think that having a lower pain tolerance level is a good thing, if kept in check.

time to read 5 min | 818 words

When is a DSL applicable? When will using a DSL make our job easier?

There are several layers to the answer, depending on the type of DSL that we want, and the context that we want to apply it.

When building a technical DSL, we will generally use that as bootstrapping and configuration mechanism, to make it easier to modify and change a part of the system.

In general, those DSL are focused enabling recurring types of tasks, usually of one off nature. Configuration is a common scenario, since it is probably the simplest to explain and to start, but there are many other example. Build scripts comes to mind, in fact, scripting in general is a common area for technical DSL. Combining the power of a flexible language with a DSL directed at the task at hand makes for a powerful tool.

Another interesting task for DSL is mapping layers, we have the destination in mind, usually some domain object or DTO, and we get some input that we transform to that object. Here we use the DSL for ease of modification, and ease of just adding new handlers.

Again, technical DSL are shortcuts to code and to avoiding pain. You can do everything you do in a technical DSL using code. The DSL should make it easier, but the main benefit is one-of-a-kind solution to a type of task.

Note that this one of a kind solution doesn't mean throw-away code. it means that you would usually have a singular need in an application. Configuring the IoC container is a need that you have once per application, for example, but it is critically important part of the application, and something that you go back. For the last year and half or so, we have used Binsor to just that, as a DSL that can configure the IoC container for us. It allowed very good flexibility in that it would allow use to define per project conventions very easily.

Technical DSL are usually the glue that holds stuff together.

What about business DSL?

Here, the situation is different. Business DSL are almost always about rules and the actions to be taken when those rules are met.

This sounds like a very narrow space, doesn't it? But let me state in another way, the place of a business DSL is to define policy, while the application code defined the actual operations. A simple examples will be defining the rules for order processing. Those, in turn, will affect the following domain objects:

  • Discounts
  • Payment plans
  • Shipping options
  • Authorization Rules

The application code then takes this and act upon it.

Policy is usually the place where we make most changes, while the operations of the system are mostly fixed. We are also not limited to a single DSL per application, in fact, we will probably have several, both technical and business focused DSL. Each of those will handle a specific set of scenarios (processing orders, authorizing payments, suggesting new products, etc).

What about building the entire system as a set of DSL?

That may be a very interesting approach to the task. In this scenario, we inverse the usual application code to DSL metrics, and decide that the application code that we would like to build would be about mostly infrastructure concerns and the requirements of the DSL. I would typically like to use this type of approach in backend processing systems. Doing UI on top of a DSL is certainly possible, but at this point, I think that we will hit the point of diminishing returns. Mostly because UI are usually complex, and you want to be able to handle them appropriately. A complex DSL is a programming language, and at the point, you would probably want to use a programming language to work with rather than a DSL.

There is an exception to that, assuming that your application code is in Boo. They you are working with a programming language, and then you can build a technical DSL that will work in concrete with the actual frameworks that you are using. Rails is a good example of this approach.

Assuming that you write your application code in a language that is not suited for DSL, however, you would probably want to have a more strict separation of the two approaches. Using DSL to define policy and using the application code to define the framework and the operations that can be executed by the system.

Building such a system turns out to be almost trivial, because all you need to do is apply the operations (usually fairly well understood) and then can play around with the policy at will. If you have done your job well, you'll likely have the ability to sit down with the customer and define the policy and have them review it at the same time.

I wonder how and why you would test those...

time to read 3 min | 496 words

Daniel and I are having an interesting discussion about mock frameworks, and he just posted this: What's wrong with the Record/Reply/Verify model for mocking frameworks.

Daniel also pulled this quote from the Rhino Mocks documentation:

Record & Replay model - a model that allows for recording actions on a mock object and then replaying and verifying them. All mocking frameworks uses this model. Some (NMock, TypeMock.Net, NMock2) use it implicitly and some (EasyMock.Net, Rhino Mocks) use it explicitly.

Daniel go on to say:

I find this record/replay/verify model somewhat unnatural.

I suggest that you would read the entire post, since this is a response to that. I just want to point out that I still hold this view. Mock frameworks all use this model, because verification is a core part of mocking.

The problem that I have with what Daniel is saying is that we seem to have a fundamental difference in opinion about what mocking is. What Daniel calls mocks I would term stubs. Indeed, Daniel's library, by design, is not a mock framework library. It is a framework for providing stubs.

Going a little bit deeper than that, it seems that Daniel is mostly thinking about mocks in tests as a way to make the test pass. I am thinking of those in terms of testing the interactions of an object with its collaborators.

This project does a good job of showing off how I think mocking should be used, and it represent the accumulation of several years of knowledge and experience in testing and using mocks. It also represent several spectacular failures in using both (hint: mocking the database when going to production may not be a good idea), from which I learned quite a bit.

Mocking can be abused to cause hard to change tests, so can other methods. Deciding to throw the baby with the bath water seems to be a waste to me. There is a lot of guidance out there about correct application of mocking, including how to avoid the over specified tests. The first that comes to mind is Hibernating Rhinos #1 web cast, which talks about Rhino Mocks and its usage.

Rhino Mocks can be used in the way that Daniel is describing. I would consider this appropriate at certain times, but I think that to say that this is the way it should be is a mistake. The Rhino Mocks web cast should do a good job in not only showing some of the more interesting Rhino Mocks features, but also where to use them.

To conclude, the record, replay, verify model stands at the core of mocking. You specify what you think that should happen, you execute the code under test and then you verify that the expected happened. Taking this to the limit would produce tests that are hard to work with, but I am in the opinion that taking a worst practice and starting to apply conclusions from that is not a good idea.

time to read 2 min | 348 words

One of the problems of multi threading is that there are a lot of intricacies that you have to deal with. Recently I run into issues that dictated that I would have to write an AsyncBulkInserterAppender  for log4net.

One of the reasons that I want to do that is to avoid locking the application if the database is down or the logging table is locked.I just had a recent issue where this casued a problem.

When I implemented that, I started to worry about what would happen if the database is locked for a long duration. There is a chance that this async logging would block for a long time, and then another async batch would start, also blocking, etc. Eventually, it will fill the thread pool and halt the entire system.

This is the approach I ended up with, it should ensure that there is at most, only two threads that are writing to the database at a time. Since I wrote it, I already found at least two bugs in there. It looks fine now, but I can't think of any good way to really test that.

I am afraid that multi threading can't really be tested successfully. This is something where code review is required.

Here is the code:

protected override void SendBuffer(LoggingEvent[] events)
{
	// we accept some additional complexity here
	// in favor of better concurrency. We don't want to
	// block all threads in the pool (train) if we have an issue with
	// the database. Therefor, we perform thread sync to ensure
	// that only a single thread will write to the DB at any given point
	ThreadPool.QueueUserWorkItem(delegate
	{
		lock (syncLock)
		{
			eventsList.AddLast(events);
			if (anotherThreadAlreadyHandlesLogging)
				return;
anotherThreadAlreadyHandlesLogging = true; } while (true) { LoggingEvent[] current; lock (syncLock) { if(eventsList.Count == 0) { anotherThreadAlreadyHandlesLogging = false; return; } current = eventsList.First.Value; eventsList.RemoveFirst(); } PerformWriteToDatabase(current); } }); }
time to read 7 min | 1276 words

Via Frans, I got into these two blog posts:

In both posts, Steve & Jeff attack code size as the #1 issue that they have with projects. I read the posts with more or less disbelieving eyes. Some choice quotes from them are:

Steve: If you have a million lines of code, at 50 lines per "page", that's 20,000 pages of code. How long would it take you to read a 20,000-page instruction manual?

Steve: We know this because twenty-million line code bases are already moving beyond the grasp of modern IDEs on modern machines.

Jeff: If you personally write 500,000 lines of code in any language, you are so totally screwed.

I strongly suggest that you'll go over them (Steve's posts is long, mind you), and then return here to my conclusions. 

Frans did a good job discussing why he doesn't believe this to be the case, he takes a different tack than mine, however, but that is mostly business in usual between us. I think that the difference is more a matter of semantics and overall approach than the big gulf it appears at time.

I want to focus on Steve's assertion that at some point, code size makes project exponentially harder. 500,000 LOC is the number he quotes for the sample project that he is talking about. Jeff took that number and asserted that at that point you are "totally screwed".

Here are a few numbers to go around:

  • Castle: 386,754
  • NHibernate: 245,749
  • Boo: 212,425
  • Rhino Tools: 142,679

Total LOC: 987,607

I think that this is close enough to one million lines of code to make no difference.

This is the stack on top of which I am building my projects. I am often in & out of those projects.

1 million lines of code.

I am often jumping into those projects to add a feature or fix a bug.

1 million lines of code.

I somehow manage to avoid getting "totally screwed", interesting, that.

Having said that, let us take a look at the details of Steve's post. As it turn out, I fully agree with a lot of the underlying principals that he base his conclusion on. 

Duplication patterns - Java/C# doesn't have the facilities to avoid duplication that other languages do. Let us take the following trivial example. I run into it a few days ago, I couldn't find a way to remove the duplication without significantly complicating the code.

DateTime start = DateTime.Now;
// Do some lengthy operation
DateTime duration = DateTime.Now - start;
if (duration > MaxAllowedDuration)
{
    SlaViolations.Add(duration, MaxAllowedDuration, "When performing XYZ with A,B,C as parameters");
}

I took this example to Boo and extended the language to understand what SLA violation means. Then I could just put the semantics of the operations, without having to copy/paste this code.

Design patterns are a sign of language weakness - Indeed, a design pattern is, most of the time, just a structured way to handle duplication. Boo's [Singleton] attribute demonstrate well how I would like to treat such needs. Write it once and apply it everywhere. Do not force me to write it over and over again, then call it a best practice.

There is value in design patterns, most assuredly. Communication is a big deal, and having a structured way to go about solving a problem is important. That doesn't excuse code duplication, however.

Cyclomatic complexity is not a good measure of the complexity of a system - I agree with this as well. I have seen unmaintainable systems with very low CC scores. It was just that changing anything in the system require a bulldozer to move the mountains of code required. I have seen very maintainable systems that had a high degree of complexity at parts. CC is not a good indication.

 Let us go back to Steve's quotes above. It takes too long to read a million lines of code. IDE breaks down at 20 millions lines of code.

Well, of the code bases above, I can clearly and readily point outs many sections that I have never read, have no idea about how they are written or what they are doing. I never read those million lines of code.

As for putting 20 millions lines of code in the IDE...

Why would I want to do that?

The secret art of having to deal with large code bases is...

To avoid dealing with large code bases.

Wait, did I just agree with Steve? No, I still strongly disagree with his conclusions. It is just that I have a very different approach than he seems to have for this.

Let us look at a typical project structure that I would have:

image

Now, I don't have the patience (or the space) to do it in a true recursive manner, but imagine that each of those items is also composed of smaller pieces, and each of those are composed of smaller parts, etc.

The key hole is that you only need to understand a single part of the system at a time. You will probably need to know some of the infrastructure, obviously, but you don't have to deal with it.

Separation of concerns is the only way to create maintainable software. If your code base doesn't have SoC, it is not going to scale. What I think that Steve has found was simply the scaling limit of his approach in a particular scenario. That approach, in another language, may increase the amount of time it takes to hit that limit, but it is there nevertheless.

Consider it the inverse of the usual "switch the language for performance" scenario, you move languages to reduce the amount of things you need to handle, but that scalability limit is there, waiting. And a language choice is only going to matter about when you'll hit it.

I am not even sure that the assertion that 150,000 lines of dynamic language code would be that much better than the 500,000 lines of Java code. I think that this is utterly the wrong way to look at it.

Features means code, no way around it. If you state that code size is your problem, you also state that you cannot meet the features that the customer will eventually want.

My current project is ~170,000 LOC, and it keeps growing as we add more features. We haven't even had a hitch in our stride so far in terms of the project complexity. I can go in and figure out what each part of the system does in isolation. If I can't see this in isolation, it is time to refactor it out.

On another project, we have about 30,000 LOC, and I don't want to ever touch it again.

Both projects, to be clear, uses NHiberante, IoC, DDD (to a point). The smaller project has much higher test coverage as well and much higher degree of reuse.

The bigger project is much more maintainable (as a direct result of learning what made the previous one hard to maintain).

To conclude, I agree with many of the assertions that Steve makes. I agree that C#/Java encourage duplication, because there is no way around it. I even agree that having to deal with a large amount of code at a time is bad. What I don't agree is saying that the problem is with the code. The problem is not with the code, the problem is with the architecture. That much code has no business being in your face.

Break it up to manageable pieces and work from there. Hm... I think I have heard that one before...

time to read 6 min | 1194 words

Over a year ago I was asked how we can query a many to many association with NHibernate using the criteria API. At the time, that was not possible, but the question came up again recently, and I decided to give it another try.

First, let us recall the sample domain model:

Blog
    m:n Users
    1:m Posts
        n:m Categories
        1:m Comments

And what we want to do, which is to find all Posts where this condition is met:

Blog.Users include 'josh' and Categories includes 'Nhibernate'  and a Comment.Author = 'ayende'.

At the time, it wasn't possible to express this query using the criteria API, although you could do this with HQL. Doing this with HQL, however, meant that you were back to string concat for queries, which I consider a bad form.

I did mention that a year have passed, right?

Now it is possible, and easy, to do this using the criteria API. Here is the solution:

DetachedCriteria blogAuthorIsJosh = DetachedCriteria.For<User>()
	.Add(Expression.Eq("Username", "josh")
	.CreateCriteria("Blogs", "userBlog")
	.SetProjection( Projections.Id())
	.Add(Property.ForName("userBlog.id").EqProperty("blog.id"));

DetachedCriteria categoryIsNh = DetachedCriteria.For(typeof(Category),"category")
    .SetProjection(Projections.Id())
    .Add(Expression.Eq("Name", "NHibernate"))
    .Add(Property.ForName("category.id").EqProperty("postCategory.id "));

session.CreateCriteria(typeof (Post),"post")
    .CreateAlias("Categories", "postCategory")
    .Add(Subqueries.Exists(categoryIsNh))
    .CreateAlias("Comments", "comment")
    .Add(Expression.Eq("comment.Name", "ayende"))
    .CreateAlias("Blog", "blog")
    .Add(Subqueries.Exists(blogAuthorIsJosh))
    .List();

And this produces the following SQL:

SELECT This_.Id              AS Id1_3_,
       This_.Title           AS Title1_3_,
       This_.TEXT            AS Text1_3_,
       This_.Postedat        AS Postedat1_3_,
       This_.Blogid          AS Blogid1_3_,
       This_.Userid          AS Userid1_3_,
       Blog3_.Id             AS Id7_0_,
       Blog3_.Title          AS Title7_0_,
       Blog3_.Subtitle       AS Subtitle7_0_,
       Blog3_.Allowscomments AS Allowsco4_7_0_,
       Blog3_.Createdat      AS Createdat7_0_,
       Comment2_.Id          AS Id4_1_,
       Comment2_.Name        AS Name4_1_,
       Comment2_.Email       AS Email4_1_,
       Comment2_.Homepage    AS Homepage4_1_,
       Comment2_.Ip          AS Ip4_1_,
       Comment2_.TEXT        AS Text4_1_,
       Comment2_.Postid      AS Postid4_1_,
       Categories7_.Postid   AS Postid__,
       Postcatego1_.Id       AS Categoryid,
       Postcatego1_.Id       AS Id3_2_,
       Postcatego1_.Name     AS Name3_2_
FROM   Posts This_
       INNER JOIN Blogs Blog3_
         ON This_.Blogid = Blog3_.Id
       INNER JOIN Comments Comment2_
         ON This_.Id = Comment2_.Postid
       INNER JOIN Categoriesposts Categories7_
         ON This_.Id = Categories7_.Postid
       INNER JOIN Categories Postcatego1_
         ON Categories7_.Categoryid = Postcatego1_.Id
WHERE  EXISTS (SELECT This_0_.Id AS Y0_
               FROM   Categories This_0_
               WHERE  This_0_.Name = @p0
                      AND This_0_.Id = Postcatego1_.Id)
       AND Comment2_.Name = @p1
       AND EXISTS (SELECT This_0_.Id AS Y0_
                   FROM   Users This_0_
                          INNER JOIN Usersblogs Blogs3_
                            ON This_0_.Id = Blogs3_.Userid
                          INNER JOIN Blogs Userblog1_
                            ON Blogs3_.Blogid = Userblog1_.Id
                   WHERE  This_0_.Username = @p2
                          AND Userblog1_.Id = Blog3_.Id);

I am pretty sure that this is already in 1.2, but I don't have that handy to check.

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}