Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

You can reach me by:

oren@ravendb.net

+972 52-548-6969



time to read 3 min | 425 words

I have a tremendous amount of respect for Michael Feathers, so it was a no-brainer to see his presentation.

Michael is talking about why Global Variables are not evil. We already have global state in the application; removing it is bad or impossible. Avoiding global variables leads to very deep argument passing chains, where something needs an object and it is passed through dozens of objects that just pass it down. We already have notions of how to test systems that use globals (Singletons). He also talks about Repository Hubs & Factory Hubs – which provide the scope for the usage of a global variable.

  • Refactor toward explicit seams, do not rely on accidental seams, make them explicit.
  • Test Setup == Coupling, excessive setup == excessive coupling.
  • Slow tests indicate insufficient granularity of coupling <- I am not sure that I agree with that; see my previous posts about testing for why.
  • It is often easier to mock outward interfaces than inward interfaces (try to avoid mocking stuff that returns data)
  • One of the hardest things in legacy code is making a change and not knowing what it is affecting. Functional programming makes it easier, because of immutability.
  • Seams in functional languages are harder. You parameterize functions in order to get those seams.
  • TUF – Test Unfriendly Feature – IO, database, long computation
  • TUC – Test Unfriendly Construct – static method, ctor, singleton
  • Never Hide a TUF within a TUC
  • No Lie principle – Code should never lie to you. Ways that code can lie:
    • Dynamically replacing code in the source
    • Addition isn’t a problem
    • System behavior should be “what I see in the code + something else”, never “what I see minus something else”
    • Weaving & aspects
    • Impact on inheritance
  • The Fallacy of Restricted Languages
  • You want to rewrite only if the architecture itself is bad; if you just have issues making changes rapidly, it is time to refactor the rough edges out.
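The TUF/TUC rule above is easy to miss in the abstract, so here is a hypothetical illustration (my own sketch in Python, not from the talk): the Test Unfriendly Feature (file IO) is buried inside a Test Unfriendly Construct (a singleton reached through a static accessor), so no test can get between them; parameterizing the function creates an explicit seam instead.

```python
# A TUF (IO) hidden inside a TUC (singleton behind a static accessor):
# nothing a test can do short of providing a real file.
class Config:
    _instance = None

    @staticmethod
    def get():
        if Config._instance is None:
            with open("/etc/app.conf") as f:   # IO behind a static: TUF in TUC
                Config._instance = f.read()
        return Config._instance

# Parameterizing creates an explicit seam: a test substitutes a stub
# reader and never touches the file system.
def load_config(read=lambda: open("/etc/app.conf").read()):
    return read()

assert load_config(read=lambda: "key=value") == "key=value"
```

The seam here is deliberate and visible in the signature, which is exactly the "refactor toward explicit seams" point.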
time to read 4 min | 777 words

If you have heard me speak, you are probably aware that I tend to use this analogy a lot: any 3rd party system that I have to integrate with was written by a drunken monkey typing with his feet.

So far, I am sad to say, that assumption has been quite accurate over a large range of projects.

You are probably already familiar with the concepts of System Boundary and Anti Corruption Layer, and how to apply them in order to keep the crazy monkey shtick away from your system, so I am not going to talk about those.

What I do want to talk about now is something slightly different. It is not related to the actual 3rd party system itself; it is related to its management.

One of the more annoying things about 3rd party stuff is that it is usually broken in… interesting ways. So you really want to be able to test that thing during the integration phase, so you know what to expect. That seems like a very simple concept, right? All you have to do is hit the QA env. and have a ball. That brings us back to the system management, and the really big screw ups that are happening there.

Before we continue, I want to state that when I am integrating with a 3rd party, I usually do so out of some business reason, so it tends to be the case that I want to give them money, customers, pay per operation, or something else that is to the benefit of that 3rd party! Making integration with your software hard has a direct effect on the number of people who are trying to give you money!

That is why I am so surprised by the amount of trouble that you have to go through in some organizations (I would say, the majority of organizations). Here is a partial list of things that I ran into recently:

  • Having no QA env. – We were told to basically just work off of the spec and push it to production. You can imagine how big a success that was.
  • Having a QA env. that is a different version than the one in production, and not telling the people who are integrating with you anything about that.
  • A variation of the above: having a QA env. that is significantly different from production.
  • Having a QA env. that has real world effects. For example, if I am testing my bulk mail integration, I do not expect all those emails to be actually sent! Or, in another memorable case, if I am integrating with a merchant provider and testing authorization, I do not want to see those in my real credit report!
  • Having a QA env. that requires frequent human intervention. For example:
    • In order to validate that your integration has been successful, you have to call someone and wait until they verify that yes, the appropriate values are there in the appropriate system. Each and every time.
    • The integration is a one way operation (imagine something like CreateUser, which would fail if it already exists), and you cannot use any dummy values (imagine that you need to pass a valid credit card to that function), you have to have real ones. So every time you test the integration, you have to call someone and have them reset that information.
  • Having a QA env. that is down for two weeks just as we were supposed to test the integration.

That is why I am saying that the system management is such a crucial thing. And why I am so surprised and disappointed to see so many organizations get it wrong in ways that are so not funny.

If you are building a system that people will integrate with, consider, as soon as possible, the implications of not having a good testing environment for your system. As a matter of fact, I suggest building QA hooks from day one, so you can pass a flag to the system that tells it "this is only for tests", which means that any external action of the system is skipped, but all the logic and behavior are retained.
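A minimal sketch of such a day-one QA hook, with hypothetical names (Python for brevity; the idea is language neutral): a test_mode flag suppresses the external action, here actually sending mail, while all the logic and bookkeeping still run.

```python
class BulkMailer:
    def __init__(self, test_mode=False):
        self.test_mode = test_mode
        self.sent_log = []                 # auditing happens either way

    def send(self, recipient, body):
        self.sent_log.append((recipient, body))   # behavior retained
        if self.test_mode:
            return "queued-for-test"              # external action skipped
        return self._really_send(recipient, body)

    def _really_send(self, recipient, body):
        raise NotImplementedError("wire up the real SMTP call here")

# Integrators run against the same logic, without real world effects:
mailer = BulkMailer(test_mode=True)
assert mailer.send("user@example.com", "hi") == "queued-for-test"
assert mailer.sent_log == [("user@example.com", "hi")]
```

The important property is that only the final external action is cut off; everything up to that point behaves exactly as production would.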

time to read 4 min | 626 words

Recently I had a few conversations about tooling, lack thereof and the impact that this has on the overall system.

I don’t think that anyone can argue that starting from scratch is a very good idea in most scenarios. This is especially true when you are talking about any significantly complicated system, which we all do.

There is still a place for quick & dirty solutions, but that is for throwaway utilities. Hell, even calc.exe isn’t simple (arbitrary precision numbers) anymore.

Therefore, I am constantly surprised by people who choose to go without. I could understand it if they chose that path out of ignorance, but all too often it is a conscious decision.

It is based on reasons such as: “It would take me too long to get approval to buy this tool”, or “I have to go through the Proper Channels to use that tool”.

I don’t like those reasons, but they are real, and I had encountered them several times.

By now you have probably understood that I have firm opinions about how to go about building successful software systems.

Those opinions are backed by the tools that I am using, but are not constrained to them (just check the commit logs when I am doing something new :-) ).


So, where is this headed?

Right now I am working on top of the Naked CLR, which means that I mostly cannot use my usual stack. This is not fun, and I assure you that I bitch about this more than enough.

Nevertheless, the inaccessibility of my usual stack doesn’t mean that I can just ignore my experience so far. I have the aforementioned opinions for a reason.

IoC and AoP are two simple concepts that I have embraced whole heartedly. Auto registration and letting the application figure it out is paramount to the way I develop software.

I tried to do otherwise, and I found myself constrained and annoyed.

How do I solve this issue? By using Field Expedient replacements.

What do I mean by that?

A container, auto registration, method interception and AoP are fairly easy to build. You can take a look at those implementations, to get some ideas.

I implore you, however, not to use those. They are replacements: hacks, and temporary. They are there so I can work using familiar methods and practices, although not with my familiar tools.

If you’ll tell me the implementation is bad, I’ll agree. But it is there, and it can be used. As a scaffolding if nothing else, but it can be used.
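To show how little code a field-expedient container actually needs, here is a hypothetical sketch (my own, in Python; not the author's implementation): register types by name, auto-wire constructor parameters by resolving them recursively, and cache instances as singletons.

```python
import inspect

class TinyContainer:
    def __init__(self):
        self._types = {}
        self._instances = {}

    def register(self, name, cls):
        self._types[name] = cls

    def resolve(self, name):
        if name in self._instances:
            return self._instances[name]
        cls = self._types[name]
        # auto-wiring: every constructor parameter is itself resolved by name
        params = inspect.signature(cls.__init__).parameters
        deps = {p: self.resolve(p) for p in params if p != "self"}
        instance = cls(**deps)
        self._instances[name] = instance
        return instance

# Hypothetical application classes, wired up without any manual plumbing:
class Database:
    def __init__(self):
        self.connected = True

class OrderService:
    def __init__(self, database):
        self.database = database

c = TinyContainer()
c.register("database", Database)
c.register("order_service", OrderService)
svc = c.resolve("order_service")
assert svc.database.connected
assert c.resolve("order_service") is svc   # singleton caching
```

It has none of the lifecycle, configuration, or interception features of a real container, which is precisely the point: it is a crutch, but a usable one.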

This post is mostly to note that not having the tools is not a good enough reason. You can build the tools.

This post is written to make a point that most people seem to miss. You don’t need to get it perfect. You don’t need to drop down to assembly to get every erg of speed. You don’t need it to be functional for the general case; you don’t even need it to be pretty.

The so called container that I have created is a good candidate for the Daily WTF.


I think that it took about 4 hours overall to build everything that I needed. I didn’t build it all in one go, just as I needed it. You can actually take a look at the history and see how it went.


Don’t mistake what I am saying, however. I am calling those Field Expedients for a reason. They are crutches, and are not nearly as capable as the real things. But they are close enough, and I am told that I am good at mocking.

time to read 2 min | 327 words

Roy is talking about legacy code, and has some great tips about it. I would like to reiterate the recommendation of TypeMock for legacy code. I still think that it is preferable to limit TypeMock to legacy code, but that is another issue.

There is another approach that I would like to mention, which is fairly radical and takes a while.

Break The Code.

You can do a lot with R# just by blindly doing Extract Method (Ctrl+Alt+M). At one point I had a class with:

  • CommandHandler.Run_Maybe_ParameterValidation()
  • CommandHandler.Run_SecondPart()
  • CommandHandler.Run_WhyDoWeCallTheWebServiceHere()
  • CommandHandler.Run_UnnecessaryThreadingCode_WTF()

Those were an intermediate state, just to allow me to try to figure out what the hell CommandHandler.Run() should do. For large code bases, Extract & Delegate is also an option: you push the functionality to another class, but keep the same interface and behavior.
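A hypothetical sketch of that intermediate state (Python stand-in for the original C#): a monolithic Run() blindly cut into named pieces with Extract Method, so the shape of the logic becomes visible before any real cleanup. The odd names record what is not yet understood.

```python
class CommandHandler:
    def run(self, command):
        self._run_maybe_parameter_validation(command)
        result = self._run_second_part(command)
        self._run_why_do_we_call_the_web_service_here(command)
        return result

    def _run_maybe_parameter_validation(self, command):
        if not command:
            raise ValueError("empty command")

    def _run_second_part(self, command):
        return command.upper()          # the part that seems to matter

    def _run_why_do_we_call_the_web_service_here(self, command):
        pass                            # mystery call, stubbed for now

assert CommandHandler().run("ship order") == "SHIP ORDER"
```

Each extracted method is a question as much as a refactoring; once the answers arrive, the names (and often the methods themselves) get replaced.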

I had several cases where I not only had code bases with red tests, but the code could not compile for long periods of time (two hours!) as I worked to push everything into place. I tend to scare some people because I am very trigger happy on the delete key (Ctrl+Z & Subversion), to the point where I hear this tiny "Eek!" when they see that I am selecting code :-)

Just to be clear, this is rarely a valid approach unless you are familiar with the code base and/or sure that you have a reasonable chance of at least manual QA. The worst position is where you can understand neither the code nor its results. (That is part of the reason that I hate doing troubleshooting on other people's code; I can't just delete stuff that I don't understand.)

Even so, just following the structure of the code can give you some ideas about how to do some extraction, which can be of great help in understanding the code. Unfortunately, all of the above are time consuming tasks; maybe a better approach would be Cover & Ignore.

time to read 2 min | 268 words

I have updated NHibernate Generics to support NHibernate 1.2. This update is very minimal and was done to ensure that you can move code to NHibernate 1.2 without breaking stuff.

Please consider the NHibernate Generics library deprecated with NHibernate 1.2 and above.

NHibernate Generics (NG from now on) always had two purposes. The first was to give generics capabilities to NHibernate, so I wouldn't have to do casts all over the place. The second was to automatically handle associations between objects using anonymous delegates and a sprinkle of magic.

I wrote this library nearly a year ago, and I have learned a lot since then, not least of which is the cost of magic. There are design assumptions built into the library that have since proven to be false, specifically around when you want (or don't want) to force lazy loading, and around working with the second level cache.

When I find myself scratching my head and trying to understand what is happening in the code I know that it is doing too much behind my back.

We no longer need external generics support, since it is now (as of 1.2) built into NHibernate. And I had a lot of fun and a number of serious issues with the automatic association handling, so I don't see much of a point in continuing that.

You can find the code here; just check out and compile. I am currently not planning to create a binary release.

time to read 1 min | 101 words

Things I discovered today about the legacy project I am working on:

  • If you say the method names with a french accent, they actually sound meaningful. If you keep this up for long enough, you might make a song out of it. I don't think it will be a top 10 hit, though.
  • There are several crucial differences between "DO_CR_DA" and "DO_DA_CR", although both are called from the same place and look nearly the same.
  • It doesn't matter how long it takes the code to run; you can keep yourself busy for days trying to guess what the method names mean.

My Daily WTF

time to read 3 min | 583 words

The term "churning code" has no direct equivalent in Hebrew, sadly. I did a code churn of ~12,000 Lines of code a couple of weeks ago, and I got some... peculiar results. I recently got a call from a co-worker about a piece of code that I wrote during this churn. The code was something like:

SELECT
    OrderId,
    Avg(Price)
FROM Orders
GROUP BY OrderId, Price

For some reason, this code produced the wrong results. I was gently asked what I was thinking when I wrote this piece, and I really couldn't provide a decent explanation. At least fixing this mistake would have been easy, if I weren't doing much the same thing in four other spots, in big queries.
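The wrongness is easy to reproduce (here with sqlite3 on hypothetical data): adding Price to the GROUP BY makes Avg(Price) average a single value per group, so nothing is actually averaged, and an order appears once per distinct price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderId INT, Price REAL)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [(1, 10.0), (1, 20.0), (2, 30.0)])

# Grouping by OrderId AND Price: every group holds one price,
# so the "average" is just the price itself.
buggy = conn.execute(
    "SELECT OrderId, Avg(Price) FROM Orders GROUP BY OrderId, Price"
).fetchall()
assert sorted(buggy) == [(1, 10.0), (1, 20.0), (2, 30.0)]

# Grouping by OrderId alone gives the presumably intended result.
fixed = conn.execute(
    "SELECT OrderId, Avg(Price) FROM Orders GROUP BY OrderId"
).fetchall()
assert sorted(fixed) == [(1, 15.0), (2, 30.0)]
```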

*Sigh* I heard somewhere that the output of the average programmer is ~20 lines of debugged code a day. It's not really surprising, actually. I knew it during the churn; now I need to find out just how many more such WTFs are awaiting me.

By the way, if you work in a code base long enough, you learn to read the patterns of the code well enough that you no longer need to read the code. From the structure of the code you can figure out what it is trying to do (if not why, or the specifics of how). Do it long enough and you can "restructure"* the code to be much nicer, without reading the code. The problem with this method is the result above.

Boy, is legacy programming fun or what?

* It is not refactoring unless I have tests in place.

time to read 1 min | 139 words

I'm seeing too many cursors around right now. Way too many. I also keep seeing the same code over & over again. I know that I'm in trouble because to work with this code base I need to:
  • constantly use grep (and derivatives, like AstroGrep, which is cool);
  • get a diff program that supports drag & drop from the shell as well as quickly changing only one of the files (found WinMerge, which looks cool);
  • pronounce variable names out loud in order to understand their meaning*;
  • watch ingenious workarounds to all sorts of limitations on the number of concurrent nested cursors in the database;
  • cut code by 70% by deciding that I don't need to verify stuff over & over again.
Argh! Excuse me while I go bang my head against the wall.

* (Hebrew words in English characters)
