Ayende @ Rahien


Career planning: The immortal choices aren't


In response to my previous post, Eric had the following comment (well, tweet):

I guess some baskets last longer or some eggs don't seem to rot e.g. C, C++, SQL, Java*, etc

And that is true, in some sense. There isn't any expected shortage of C or C++ opportunities in the medium to long term. The problem is that it isn't the same language, framework, or environment over time.

In the late 90s / early 2000s I was deep into C++. I read Effective C++ and More Effective C++, I went through the entire STL with a fine-toothed comb, and I was a pretty enthusiastic (and bad) C++ developer. But let's assume that I was a competent C++ dev in the late 90s.

What was the environment like at the time? Pretty much everything was 32 bits, and the STL was still a hotly debated topic. MFC and ATL were all the rage, and making the C++ compiler die via template metaprogramming was extremely common. COM and Windows DNA were everywhere.

Assume that you froze your knowledge at that time, and skip forward 15 years. Where are you at?

Modern C++ has embraced STL, then moved beyond it to Boost. In Windows land, MFC and ATL are only used for legacy stuff. COM is still there, but you try to avoid it. And cross platform code isn't something esoteric.

Now, I stopped doing C++ a few years after getting started with .NET, so I'm pretty sure that the kinds of changes I can see are just the tip of the iceberg.

In short, just because the title of your job didn't change doesn't mean that what you do hasn't changed considerably. And choosing a safe (from a job-prospects standpoint) programming language and sticking to it, knowing that you can always rely on it, is a pretty good way to commit career suicide.

On the other hand, I know people looking for COBOL programmers...

Career planning: The age of least resistance


On Sunday, there was a news program about how tough it is to find work after 40. It was full of the usual stuff about employers only looking for young people who can work 30-hour days*, freezing out anyone too old for their taste, etc.

This is a real problem in many cases, and one that I find abhorrent. Not least because I plan to have a long career in my chosen field, and I really don’t like the idea that there is a certain age after which I should be shuffled off to do data entry tasks, if that. Especially since that age seems not too far off at all. Currently, the oldest person we have in a development role is over fifty, although most of our team is late twenties to mid-thirties.

So we reached out to one of them, asking for a CV so we could take a look. And it took me very little time to realize why this person had a hard time finding a job. In particular, while the news program was about people who are unable to find a job, this particular person actually had a job. It just wasn’t a job that he was happy with. As far as I understand, he was paid a lot less than what he was used to, comparable to someone just starting out, rather than someone with close to two decades of experience.

Looking at the CV, it was obvious why that was. This particular person’s job history included a 15-year stretch at a large municipality, during which he worked mostly on VB6 programs. It has only been in the last couple of years that he started working in .NET.

Now, Microsoft released VB6 in 1998, and announced that it was moving to VB.NET in early 2000, with the .NET Framework being released in 2002. By 2005, VB6 was no longer supported, and by 2008 even extended support had run out. So we are talking about 7 – 8 years in which the main tool at his disposal was quite clearly a dead end.

While I have fond memories of VB6, and I’m pretty sure that there is still a lot of software using it, it is not surprising that demand for people with VB6 expertise has already peaked and is currently in a decline that I don’t really see reversing. At any point in the past decade, you would have had to willfully ignore reality to believe that there was a strong future in VB6.

So we have a person with expertise in obsolete tech, trying to find a job in the market with effectively 1-2 years of experience using C#. It isn’t surprising that he got what is effectively a starter position, even given his age.  It isn’t that his age affected the offered position, it is that it didn’t.

This leads back to the advice I gave previously on the matter of career planning. Saying “I got a nice job” and resting on your laurels is a good way to end up in a marginalized position down the road.

Keeping your skills up to date (ideally as part of your job, but outside of it if that isn’t possible) is crucial. Otherwise you are the guy with one year of actual experience, repeated many times over.

* Not a typo, it is intentionally stupid.

Comparing developers


Recently I had to try to explain to a non technical person how I rate the developers that I work with. In technical terms, it is easy to do:

int Compare(devA, devB, ctx)

But it is very hard to do:

int Compare(devA, devB);

var score = Evaluate(dev);

What do I mean by that? It is pretty hard (at least for me) to give an objective measure of a developer in the absence of anyone to compare him to. But it is very easy to compare two developers, and even then, only in a given context.
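
To make the distinction concrete, here is a minimal C# sketch (Developer, Context, and the track-record idea are all hypothetical, purely for illustration). A pairwise comparison has something to anchor to once you fix the context; an absolute score does not.

using System;
using System.Collections.Generic;

public class Developer
{
    public string Name;
    // Hypothetical: how well this dev has done on each kind of task so far.
    public Dictionary<string, int> TrackRecord = new Dictionary<string, int>();
}

public class Context
{
    public string TaskKind; // e.g. "storage engine debugging" or "HTML5 UI work"
}

public static class DeveloperComparisons
{
    // Easy: a relative comparison, anchored to a specific kind of work.
    public static int Compare(Developer devA, Developer devB, Context ctx)
    {
        int a, b;
        devA.TrackRecord.TryGetValue(ctx.TaskKind, out a);
        devB.TrackRecord.TryGetValue(ctx.TaskKind, out b);
        return a.CompareTo(b);
    }

    // Hard: an absolute score. Which tasks count, and with what weights?
    // Any numbers chosen here would be arbitrary, which is exactly the problem.
    public static int Evaluate(Developer dev)
    {
        throw new NotImplementedException("No objective basis for a single number");
    }
}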

An objective evaluation of a developer is pretty hard, because there isn’t much that you can objectively measure. I’m sure that no reader of mine would suggest doing something like measuring lines of code, although I wish it was as easy as that.

How do you measure the effectiveness of a developer?

Well, to start with, you need to figure out the area in which you are measuring them. Trying to evaluate yours truly on his HTML5 dev skills would be… a negative experience. But in their areas of expertise, measuring the relative effectiveness of two people is much easier. I know that if I give a particular task to Joe, he will get it done on his own. But if I give it to Mark, it will require some guidance, but he will finish it much more quickly. And Scott is great at finding the root cause of a problem, but is prone to analysis paralysis unless prodded.

This came up when I tried to explain why a person spending 2 weeks on a particular problem was a reasonable thing, and that in many cases you need a… spark of inspiration for certain things to just happen.

All measurement techniques that I’m familiar with are subject to the observer effect, which means that you might get a pretty big and nasty surprise when people adapt their behavior to match the required observations.

The problem is that most of the time, development is about putting one foot after the other, getting things progressively better by making numerous minor changes that have a major cumulative effect. And then you have a need for brilliance. A customer with a production problem that requires someone to hold the entire system in their head all at once to figure out. A way to optimize a particular approach, etc.

And the nasty part is that there is very little you can do to actually produce those sparks of inspiration. But there is usually a correlation between certain people and the number of sparks of inspiration they get per time period. And one person’s spark can lead another to the right path, and then you have an avalanche of good ideas.

But I’ll talk about the results of this in another post. :-)

Boldly & confidently fail, it is better than the alternative


Recently I had the chance to sit with a couple of the devs in the RavenDB Core Team to discuss “keep & discard” habits*.

The major problem we now have with RavenDB is that it is big. And there are a lot of things going on there that you need to understand. I ran the numbers, and it turns out that the current RavenDB codebase contains:

  • 835,000 Lines of C#
  •   67,500 Lines of TypeScript
  •   87,500 Lines of HTML

That is divided into many areas of functionality, but it is still a big chunk of stuff to go through. And that is ignoring things that require us to understand additional components (like Esent, Lucene, etc.). What is more, a lot of the expertise is in understanding how things play out in the full picture: we limit this value here because too much of it would result in high memory consumption under this set of circumstances, for example.

The problem is that it takes time, and sometimes a lot of it, to get a good understanding of how things all come together. In order to handle that, we typically assign new devs issues from all around the code base. The idea isn’t so much to give them a chance to become experts in a particular field, but to make sure that they get a general idea of how the code is structured and how the project comes together.

Over time, people tend to gravitate toward a particular area (M** is usually the one handling the SQL Replication stuff, for example), but that isn’t fixed (T fixed the most recent issue there), and the areas of responsibility shift (M is doing a big task, we don’t want to disturb him, let H do that).

Anyway, back to the discussion that we had. What I realized is that we have a problem. Most of our work is either new features or fixing issues. That means that nearly all the time, we don’t really have any fixed template to give developers: “here is how you do this”. A recent example was an issue where invoking the smuggler with a particular set of filters would result in a very high cost. The task was to figure out why, and then fix it. But the next task for this developer is to implement sharded bulk insert.

I’m mentioning this to explain part of the problem. We don’t see a lot of “exactly the same as before”, and a new dev on the team leans on the other members quite heavily initially. That is expected, of course, and encouraged. But we identified a key problem in the process. Because the other team members also don’t have a ready-made answer, they need to dig into the problem before they can offer assistance, which sometimes (all too often, to be honest) leads to a “can you slide the keyboard my way?” and taking over the hunt. The result is that the new dev does learn, but a key part of the process is missing: finding out what is going on.

We are going to ask both sides of this interaction to keep track of that, and stop it as soon as they realize that this is what is going on.

The other issue that was raised was fear. RavenDB is a big system, and it can be quite complex. It is a quite reasonable apprehension: what if I break something by mistake?

Here it comes back to the price of failure. Trying something out means that at worst you wasted a work day, nothing else. We are pretty confident in our QA process and systems, so we can allow people to experiment. Analysis paralysis is a much bigger problem. And I wasn’t quite right just now: trying the wrong thing isn’t even wasting a day, since you learned what doesn’t work, and hopefully also why.

“I have not failed. I've just found 10,000 ways that won't work.”
Thomas A. Edison

* Keep & discard is a literal translation of a term that is very common in the IDF. After most activities, there is an investigation performed, and one of the first questions asked is what we want to keep (good things that happened that we need to preserve for the next time we do this) and what we need to discard (bad things that we need to watch out for).

** The actual people are not relevant for this post, so I’m using letters only.

Project Tamar


I’m happy to announce that despite the extreme inefficiencies involved in the process, the performance issues, and what are sure to be multiple stop-ship bugs in the way the release process is handled, we have successfully completed Project Tamar.

The result showed up as a 2.852 kg bundle, and is currently sleeping peacefully. I understand that this is a temporary condition, and lack of sleep shall ensue shortly. I’ll probably regret saying that, but I can’t wait.

 

This little girl is named Tamar, and to celebrate, I’m offering a 28.52% discount on all our products. This includes:

Just use coupon code: Tamar

This will be valid for the next few days.

Lambda methods and implicit context


The C# compiler is lazy, which is usually a very good thing, but that can also give you some issues. We recently tracked down a memory usage issue to code that looked roughly like this.

var stats = new PerformatStats
{
    Size = largeData.Length
};
stats.OnCompletion += () => this.RecordCompletion(stats);

Write(stats, o =>
{
    var sp = new Stopwatch();
    foreach (var item in largeData)
    {
        sp.Restart();
        // do something with this
        stats.RecordOperationDuration(sp);
    }
});

On the surface, this looks good. We are only using largeData for a short while, right?

But behind the scenes, something evil lurks. Here is what this is actually translated to by the compiler:

__DisplayClass3 cDisplayClass3_1 = new __DisplayClass3
{
    __this = this,
    largeData = largeData
};
cDisplayClass3_1.stats = new PerformatStats { Size = cDisplayClass3_1.largeData.Length };

cDisplayClass3_1.stats.OnCompletion += new Action(cDisplayClass3_1, __LambdaMethod1__);

Write(cDisplayClass3_1.stats, new Action(cDisplayClass3_1, __LambdaMethod2__));

You need to pay special attention to what is going on. We need to maintain the local state of the variables, so the compiler lifts the local variables into an object (called __DisplayClass3).

Creating spurious objects is something that we want to avoid, so the C# compiler says: “Oh, I have two lambdas in this scope that need access to the local variables. Instead of creating two objects, I can create just a single one and share it among both calls, thereby saving some space”.

Unfortunately for us, there is a slight issue here. The lifetime of the stats object is pretty long (we use it to report stats). But we also hold a reference to the completion delegate (we use that to report on stuff later on). Because the completion delegate holds the same lifted parameters object, and because that holds the large data object, we ended up holding a lot of stuff in memory far beyond the time it was useful.

The annoying thing is that it was pretty hard to figure out, because we were looking at the source code, and the lambda that we know is likely to be long running doesn’t look like it is going to hold a reference to the largeData object.

Ouch.
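
For what it’s worth, here is a minimal sketch of one way to break the shared closure apart (illustrative only, and assuming largeData is a byte[]; this isn’t necessarily the exact fix we applied). Creating each delegate in its own method forces the compiler to generate a separate display class per lambda, so the long-lived completion delegate no longer keeps largeData reachable:

var stats = new PerformatStats
{
    Size = largeData.Length
};
stats.OnCompletion += CreateCompletionHandler(stats);

Write(stats, CreateWorkAction(stats, largeData));

// Instance method; its lambda captures only 'stats' (and 'this'),
// so the long-lived delegate does not drag largeData along.
private Action CreateCompletionHandler(PerformatStats stats)
{
    return () => this.RecordCompletion(stats);
}

// Static method; its lambda captures 'stats' and 'largeData' in a
// separate display class, which becomes collectible once the Write
// call is done with the action.
private static Action<object> CreateWorkAction(PerformatStats stats, byte[] largeData)
{
    return o =>
    {
        var sp = new Stopwatch();
        foreach (var item in largeData)
        {
            sp.Restart();
            // do something with this
            stats.RecordOperationDuration(sp);
        }
    };
}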

Buffer Managers, production code and alternative implementations


We are porting RavenDB to Linux, and as such, we run into a lot of… interesting issues. Today we ran into a really annoying one.

We make use of the BufferManager class inside RavenDB to reduce memory allocations. On the .NET side of things, everything works just fine, and we never really had any issues with it.

On the Mono side of things, we started getting all sorts of weird errors, from ArgumentOutOfRangeException to NullReferenceException to just plain weird stuff. That was the time to dig in and look into what is going on.

On the .NET side of things, the BufferManager implementation selects between large (more than 85KB) and small buffers. For large buffers, there is a single large pool that is shared among all the users of the pool. For small buffers, the BufferManager uses a pool per active thread as well as a global pool, etc. In fact, looking at the code, we see that it is really nice, and a lot of effort has been made to harden it and make it work nicely for many scenarios.
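
For those who haven’t used it, this is roughly what the API looks like (a minimal usage illustration, not RavenDB code):

using System.ServiceModel.Channels;

var manager = BufferManager.CreateBufferManager(
    maxBufferPoolSize: 16 * 1024 * 1024, // cap on the total memory the pool may hold
    maxBufferSize: 512 * 1024);          // largest buffer the pool will serve

byte[] buffer = manager.TakeBuffer(4096); // may return a larger array than requested
try
{
    // ... use the buffer for I/O ...
}
finally
{
    manager.ReturnBuffer(buffer); // return it for reuse instead of leaving it to the GC
}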

The Mono implementation, on the other hand, decides to blithely discard the API contract by ignoring the maximum buffer pool size, seemingly because “no user code is designed to cope with this”. Considering the fact that RavenDB is certainly designed to cope with it, I’m somewhat insulted, but it seems par for the course for Linux, where “memory is infinite until we kill you”* is the way to go.

But what is far worse is that this class is absolutely not thread safe. That was a lot of fun to discover. Considering that this piece of code is pretty central for the entire WCF stack, I’m not really sure how that worked. We ended up writing our own BufferManager impl for Mono, to avoid those issues.
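
To give a rough idea of the shape of such a replacement, here is a minimal sketch of a thread-safe pool that actually honors a maximum size (illustrative only; the real implementation deals with multiple buffer sizes and more):

using System.Collections.Concurrent;
using System.Threading;

public class SimpleBufferManager
{
    private readonly ConcurrentStack<byte[]> pool = new ConcurrentStack<byte[]>();
    private readonly int bufferSize;
    private readonly int maxPooledBuffers;
    private int pooledCount;

    public SimpleBufferManager(int bufferSize, int maxPooledBuffers)
    {
        this.bufferSize = bufferSize;
        this.maxPooledBuffers = maxPooledBuffers;
    }

    public byte[] TakeBuffer(int size)
    {
        if (size > bufferSize)
            return new byte[size]; // too big for the pool, allocate directly

        byte[] buffer;
        if (pool.TryPop(out buffer))
        {
            Interlocked.Decrement(ref pooledCount);
            return buffer;
        }
        return new byte[bufferSize];
    }

    public void ReturnBuffer(byte[] buffer)
    {
        if (buffer.Length != bufferSize)
            return; // not one of ours to pool

        // Honor the configured maximum; excess buffers are left for the GC.
        if (Interlocked.Increment(ref pooledCount) > maxPooledBuffers)
        {
            Interlocked.Decrement(ref pooledCount);
            return;
        }
        pool.Push(buffer);
    }
}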

* Yes, somewhat bitter here, I’ll admit. The next post will discuss this in detail.

Long running async and memory fragmentation


We have been working on performance a lot lately, but performance isn’t just a question of how fast you can do something; it is also a question of how many resources you use while doing it. One of the things we noticed was that we were using more memory than we would like, even after we had freed the memory we were using. Digging into the memory usage, we found that we were suffering from fragmentation inside the managed heap.

More to the point, this isn’t large object heap fragmentation, but fragmentation in the standard heap. The underlying reason is that we are issuing a lot of outstanding async I/O requests, especially to serve things like the Changes() API, to wait for incoming HTTP requests, etc.

Here is what this looks like inside dotProfiler.

(image: memory snapshot in dotProfiler)

As you can see, we are actually using almost no memory, but heap fragmentation is killing us in terms of memory usage.

Looking deeper, we see:

(image: a closer look at the fragmented heap)

We suspect that the issue is that we have pinned instances that we sent to async I/O, and that matches what we have found elsewhere about this issue, but we aren’t really sure how to deal with it.
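
For reference, here is a minimal sketch of one commonly suggested mitigation for this class of problem (under the assumption that the pinned objects are indeed the I/O buffers): allocate all the buffers up front from a single large slab. The slab lands on the large object heap, which is not compacted anyway, so pinning segments of it for async I/O can no longer punch holes in the normal heap. This is essentially the pattern the SocketAsyncEventArgs documentation recommends for avoiding pinning-induced fragmentation.

using System;
using System.Collections.Concurrent;

public class SlabBufferPool
{
    private readonly byte[] slab; // one big allocation; lives on the LOH and never moves
    private readonly ConcurrentQueue<ArraySegment<byte>> free =
        new ConcurrentQueue<ArraySegment<byte>>();

    public SlabBufferPool(int segmentSize, int segmentCount)
    {
        slab = new byte[segmentSize * segmentCount];
        for (int i = 0; i < segmentCount; i++)
            free.Enqueue(new ArraySegment<byte>(slab, i * segmentSize, segmentSize));
    }

    // Hand out a segment for an async read/write; pinning it pins the
    // slab, not a small object in the middle of the normal heap.
    public ArraySegment<byte> Take()
    {
        ArraySegment<byte> segment;
        if (free.TryDequeue(out segment))
            return segment;
        throw new InvalidOperationException("Buffer pool exhausted");
    }

    public void Return(ArraySegment<byte> segment)
    {
        free.Enqueue(segment);
    }
}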

Ideas are more than welcome.

.NET Packaging mess


In the past few years, we had:

  • .NET Full
  • .NET Micro
  • .NET Client Profile
  • .NET Silverlight
  • .NET Portable Class Library
  • .NET WinRT
  • Core CLR
  • Core CLR (Cloud Optimized)*
  • MessingWithYa CLR

* Can’t care enough to figure out if this is the same as the previous one or not.

In each of those cases, they offered similar, but not identical, APIs and options. And that is completely ignoring the versioning side of things, where we have .NET 2.0 (1.0 finally died a while ago), .NET 3.5, .NET 4.0 and .NET 4.5. I don’t think that something can be done about versioning, but the packaging issue is painful.

Here is a small example why:

(image)

In each case, we need to subtly tweak the system to accommodate the new packaging option. This is pure additional cost to the system, with zero net benefit. Each time that we have to do that, we add a whole new dimension to the testing and support matrix, leaving aside the fact that the complexity of the solution is increasing.
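
To make “subtly tweak” concrete, here is the kind of conditional compilation that tends to pile up (an illustrative sketch, not actual RavenDB code; the define symbols are the conventional ones for each platform, and PortableFiles is a hypothetical helper):

#if SILVERLIGHT
using System.IO;
using System.IO.IsolatedStorage;
#elif NETFX_CORE
using System.IO;
using System.Threading.Tasks;
using Windows.Storage;
#else
using System.IO;
#endif

public static class PortableFiles
{
#if SILVERLIGHT
    // Silverlight: no direct file system access; go through isolated storage.
    public static Stream OpenRead(string name)
    {
        var store = IsolatedStorageFile.GetUserStoreForApplication();
        return store.OpenFile(name, FileMode.Open);
    }
#elif NETFX_CORE
    // WinRT: file access goes through Windows.Storage and is async only,
    // so even the shape of the API changes for every caller.
    public static async Task<Stream> OpenReadAsync(string name)
    {
        var file = await ApplicationData.Current.LocalFolder.GetFileAsync(name);
        return await file.OpenStreamForReadAsync();
    }
#else
    // Full .NET: plain file system access.
    public static Stream OpenRead(string name)
    {
        return File.OpenRead(name);
    }
#endif
}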

I wouldn’t mind it so much if it weren’t for the fact that a lot of those feel like drive-bys. Silverlight took a lot of effort, and it is dead. WinRT took a lot of effort, and it is effectively dead.

This adds a real cost in time and effort, and it is hurting the platform as a whole.

Now users are running into issues with the Core CLR not supporting stuff that we use. So we need to rip out MEF from some of our code and implement the parts we need ourselves, just to get things back to where they were before.
