Ayende @ Rahien

Unnatural acts on source code

When using the Task Parallel Library, Wait() is a BAD warning sign

Take a look at the following code:

public static Task ParseAsync(IPartialDataAccess source, IPartialDataAccess seed, Stream output, IEnumerable<RdcNeed> needList)
{
    return Task.Factory.StartNew(() =>
    {
        foreach (var item in needList)
        {
            switch (item.BlockType)
            {
                case RdcNeedType.Source:
                    source.CopyToAsync(output, Convert.ToInt64(item.FileOffset), Convert.ToInt64(item.BlockLength)).Wait();
                    break;
                case RdcNeedType.Seed:
                    seed.CopyToAsync(output, Convert.ToInt64(item.FileOffset), Convert.ToInt64(item.BlockLength)).Wait();
                    break;
                default:
                    throw new NotSupportedException();
            }
        }
    });
}

Do you see the problem in here?

It is a result of a code review comment about improper use of async in a project. This resulted in a lot of Task showing up in the return methods, but not in any measurable improvement in the actual codebase use of asynchronicity.

The problem is that when you need to work with such things in C# 4.0, you have to do some annoying things to get the code to work properly. In particular, this method was modified to be:

public static Task ParseAsync(IPartialDataAccess source, IPartialDataAccess seed, Stream output, IList<RdcNeed> needList, int position = 0)
{
  if(position>= needList.Count)
  {
        return new CompletedTask();
  }
  var item = needList[position];
  Task task;
            
  switch (item.BlockType)
  {
        case RdcNeedType.Source:
            task = source.CopyToAsync(output, Convert.ToInt64(item.FileOffset), Convert.ToInt64(item.BlockLength));
            break;
        case RdcNeedType.Seed:
            task = seed.CopyToAsync(output, Convert.ToInt64(item.FileOffset), Convert.ToInt64(item.BlockLength));
            break;
        default:
            throw new NotSupportedException();
  }

  return task.ContinueWith(resultTask =>
    {
        if (resultTask.Status == TaskStatus.Faulted)
            resultTask.Wait(); // throws
        return ParseAsync(source, seed, output, needList, position + 1);
    }).Unwrap();
}

This code is more complex, but it is actually making proper use of the TPL. We have changed the loop into a recursive function, so we can take advantage of ContinueWith to the next iteration of the loop.

And no, I can’t wait to get to C# 5.0 and have proper await work.

Security decisions: Separate Operations & Queries

The question came up several times in the mailing list with regards to how the RavenDB Authorization Bundle operates, and I think it serves a broader discussion.

Let us imagine a system where we have contracts, which may be in several states:

  • Mine – Contracts that an employee signed.
  • Done – Standard users can view, Lawyers assigned to the company can sign.
  • Draft – Lawyers can view / edit, Partners can approve.
  • Proposed – Lawyers can create / edit, but only the lawyer that created it can view it, Partners can accept.

So far, fairly simple, right? Except the pure hell that you are going to get into when you are trying to show the users all of the contracts that they can see, sorted by edit date and in the NDA category.

Why am I being so negative here? Well, let us look at what we are going to have to do in the most trivial of cases:

image

In this sort of system, we are going to have to show the user all of the contracts that they are allowed to see, and show them some indication what operations they can do on each.

The problem is that generating this sort of view is expensive. Especially when you have large amount of data to work through. More interesting, from a UX perspective, it also doesn’t really work that well. Most users would want a better separation of the things that they can do, probably something like this:

image

This allows us to do a first level filtering on the data itself, rather than try to apply security rules to it.

In the first case, we need to get all the contracts that we are allowed to see. The security rules above are really simple, mind. But trying to translate them into an efficient query is going to be pretty hard. Both in terms of the code requires and the cost to actually perform the query on the server. There are other things that are involved as well, such as paging and sorting in such an environment.  I have created several such systems in the past, Rhino Security is probably the most well known of them, and it gets really hard to optimize things and make sure that everything works when you start getting more complex security rules (especially when you have a user editable security system, which is a common request).

The second case is cheaper because we can limit the choices that we see in the query itself. We may still need to apply security concerns, but those goes through the query directly, rather than a security sub system. This kind of change usually force people to be more explicit in what they want, and it result in a system that tends to be simpler. The security rules aren’t just something arbitrary that can be defined, they are actually visible on the screen (My Contracts, Drafts, etc). Changing them isn’t something that is done on an administrator’s whim.

Yes, this is a way to manage the client and their expectations, but that is important. But what about the complex security that they want?

That might still be there, certainly, but that would be active mostly for operations (stuff that happen on a single entity), not on things that happen over all entities. It is drastically easier to make a single entity security decisions work efficiently than make it work over the whole set inside the database.

Hiding values, API keys and other fun stuff

This post is mostly about fun ideas. In one scenario, we had the need to show data to the user, but there was some concern with regards to the hackability of the URL.

In general, you should be handle such things within your code, checking permissions, etc. But I decided to see if I can do something nice with things, and I got this:

private static object HideValues(string entityId, string tenantId, byte[] key, byte[] iv)
{
    using (var rijndael = Rijndael.Create())
    {
        rijndael.Key = key;
        rijndael.IV = iv;
        var memoryStream = new MemoryStream();
        using (var cryptoStream = new CryptoStream(memoryStream, rijndael.CreateEncryptor(), CryptoStreamMode.Write))
        using (var binaryWriter = new BinaryWriter(cryptoStream))
        {
            binaryWriter.Write(entityId);
            binaryWriter.Write(tenantId);
            binaryWriter.Flush();

            cryptoStream.Flush();
        }
        var bytes = memoryStream.ToArray();
        var sb = new StringBuilder();
        for (int index = 0; index < bytes.Length; index++)
        {
            var b = bytes[index];
            sb.Append(b.ToString("X"));
            if (index % (bytes.Length/4) == 0 && index > 0)
                sb.Append('-');
        }
        return sb;
    }
}

This will generate a “guid looking” value that we can send to the user. When they send it back to us, we can decrypt it and figure out what is actually going on in there.

Because it is encrypted, we know that this is a valid key, because otherwise we wouldn’t be able to decrypt it to valid data.

Passing 15 and 32 as the first two values, I got the following value back: 2A8AC8888-46B92092-BFD81393-7A6FB1

And it handle larger values as easily, of course. Quite fun, even if I say so myself. Not sure if this is useful, but I got into writing code because it is a great hobby.

Tags:

Published at

Originally posted at

Comments (15)

The economics of continuous deployment

One of the things that I did, almost by accident, when we started Hibernating Rhinos was to create a CI server and a public daily build server. And every single successful build ended up in customer hands. That was awesome in many respects, it removed a lot of the “we have got to make a new release” pressure, because we were making new releases, sometimes multiple times a day.

When we started with RavenDB, it was obvious to me that this was what we were going to do with it as well, because the advantages to this approach as so clear. With RavenDB, we needed a two stage system, but still, every single build gets to the customer hands.

Awesome, great, outstanding, exceptional and other such synonyms. As long as you look at this from one angle, the one in which we are only concerned about the technical challenges of delivering software .The problem is that there are additional things to note here. Economic challenges.

Let us take the profiler as a good example. It was released in beta on the Jan 1, 2009, and since then we had 920 separate builds, adding a ton of new features, capabilities, improving performance, making things smoother and in general making it a better product.

That is over 3 years without a major release, mostly because we never had the need to do this, we kept delivering software on a day to day basis.

During that time, we delivered features such as viewing the result set, checking the query plan of a query (in all major databases), exporting the entire session to HTML so you can send it to your DBA, CI integration and so much more. It has been wonderful.

Except… this has one implications that I didn’t think of at the time. If you bought NH Prof on the 1st Jan, 2009 you got 3 years of product updates, for no additional costs. And unless we create a new major version, you can keep using the software, including all the updates and improvements, without paying.

That is great for the very early customers, but not so good for the people who need to eat so they can work on the profiler. Let us think about the implications of this a bit more, okay?

In order for us to actually make money, we have to:

  • keep expanding our one-off customer base, which is going to hit a limit at some point.
  • create a new version, getting the old customer to purchase the updates.

Seems simple, right? This is what most companies do, and how most software is sold. You get a license for version 1 and you buy a license for version 2.

So far, so good. But let us consider the implications of that. In order to get the old users to buy the new one, I have to put some really nice stuff in the next version. Which means that I have to do a lot of “secret” development because I can’t just release it on our usual continuous deployment mode. That sucks. And it also means that features that are already coded are actually disabled because we defer them to the next version.

So, the next version of the profilers is going to have to have some interesting features to get people to buy it. One of them is production profiling. It has actually been around for quite a while. It has simply been #ifdef’ed out of the product, because it is something that we keep for the next version.

I just checked, and I was acutely surprised by what I found. The initial work for production profiling was done in Jan 2010, it is working since then. I got side tracked with RavenDB so I never had the chance to actually complete the rest of the features for 2.x and release them all.

In mid 2010 we started experimenting with subscriptions. Instead of having a one time payment model, we moved to a pay as you go. So as long as you were using the profiler, you were paying for it, and in return, we provided all of those new features.

I have been thinking about this a lot lately. I strongly lean toward making the next version of the profiler (coming soon, and it will have a bunch of nice features) subscription only.

My current thinking it to allow two modes of buying the product. Monthly / yearly subscription and a one time fee that give you 18 months of usage (and doesn’t re-charge). That would allow us to keep producing software in incremental steps, without having to go away for a while and work in secret on big ticket features just so we can have enough stuff to put on “why you should buy 2.x” list.

I would appreciate any feedback that you may have.

Tags:

Published at

Originally posted at

Comments (54)

Architecture > Code

Steve Py asks an interesting question in one of the comments to my On Infinite Scalability post:

Can you elaborate more on: "Note, those changes are not changes to the code, they are architectural and system changes. Where before you had a single database, now you have many. Where before you could use ACID, now you have to use BASE. You need to push a lot more tasks to the background, the user interaction changes, etc."

When you talk about jumping from 1 server to multiple servers, ACID to BASE, and how user interaction changes, how do you quantify that this is done without code changes?

The answer to that is that there is a mistaken assumption here. Changing the architecture is going to change the code. But usually that is rarely relevant, because changing the architecture is a big change. If you are moving from a single DB to multiple database, for example, there are going to be code changes, but that isn’t what you worry about. The major change is the architecture differences (how do you split the data, how do you do reporting, can some of the dbs be down, etc).

Moving from ACID to BASE is an even greater change. The code might change a little or change drastically, but that isn’t where a lot of the effort is. Just defining the new system behavior on those scenarios is going to be much more complex. For example, taking something as simple as “user names are unique” would move from being a unique constraint in the database to something that needs to be able to handle those sort of things in a reasonable fashion.

Depending on your original architecture, it might be anything from replacing a single service implementation to re-writing significant parts of the code.

Expanding your horizons: Actions

In theory, there is no difference between theory and real life.

In my previous blog post, I discussed my belief that the best value you get from learning is learning the very basic of how our machines operate. From learning about memory management in operating systems to the details of how network protocols like TCP/IP work.

Some of that has got to be theoretical study, actually reading about how those things work, but theory isn’t enough. I don’t care if you know the TCP specs by heart, if you haven’t actually built a real system with it, and experienced the pain points, it isn’t really meaningful. The best way to learn, at least from my own experiences, is to actually do something.

Because that teaches you several very interesting things:

  • What are the differences between the spec and what is actually implemented?
  • How to resolve common (and not so common problems)?

The later is probably the most important thing. I think that I learned most of what I know about HTTP in the process of building an RSS feed reader. I learned a lot about TCP from implementing a proxy system, and I did a lot of learning from a series of failed projects regarding distributed programming in general.

I learned a lot about file systems and how to work with file based storage from Practical File System Design and from building Rhino Queues and Rhino DHT. In retrospect, I did a lot of very different projects in various areas and technologies.

The best way that I know to get better is to do, to fail, and to learn from what didn’t work. I don’t know of any shortcuts, although I am familiar with plenty of ways of making the road much longer (and usually not very pleasant).

In short, if you want to get better, pick something that you don’t know how to do, and then do it. You might fail, you likely will, but you’ll learn a lot from failing.

I keep drawing a blank when people ask me to suggest options for things to try building, so I thought that I would ask the readers of this blog. What sort of things do you think would be useful to build? Things that would push most people out of their comfort zone and make them learn the fundamentals of how things work.

Tags:

Published at

Originally posted at

Comments (13)

Expanding your horizons

One of the questions that I routinely get asked is “how do you learn”. And the answer that I keep giving is that I had accidently started learning things from the basic building blocks. I still count a C/C++ course that I took over a decade ago as one of the chief reasons why I have a good grounding in how computers actually operate. During that course, we had to do anything from building parts of the C standard library on our own to construct much of the foundation of C++ features in plain C.

That gave me enough understanding of how things are actually implemented to be able to grasp how things are behaving elsewhere. Digging deep into the implementation is almost never a wasted effort. And if you can’t peel away the layer of abstractions, you can’t really say that you know what you are doing.

For example, I count myself ignorant in all manners about WCF, but I have full confidence that I can build a system using it. Not because I understand WCF itself, but because I understand the arena in which it plays. I don’t need to really understand how a certain technology works, if I already know what are the rules it has to play with.

Picking on WCF again, if you don’t know firewalls and routers, you can’t really build a WCF system, regardless of how good your memory is about the myriad ways of configuring WCF to do you will. If you can’t use WireShark to figure out why the system is slow to respond to requests, it doesn’t matter if you can compose an WCF envelope message literally on the back of a real world envelope.  If you don’t grok the Fallacies of Distributes Computing, you shouldn’t be trying to build a real system where WCF is used, regardless of whatever certificate you have from Microsoft.

The interesting bit is that for most of what we do, the rules are fairly consistent. We all have to play in Turing’s sand box, after all.

What this means is that learning the details of IP and TCP will be worth it over and over again. Understanding things like memory fetch latencies would be relevant in 5 years and in ten. Knowing what actually goes on in the system, even if it at a somewhat abstracted level is important. That is what make you the master of the system, instead of its slave.

Some of the things that I especially value, and that is of the top of my head and isn’t a closed list are:

  • TCP / UDP – how do they actually work.
  • HTTP – and implications (for example, state management).
  • The Fallacies of Distributed Computing.
  • Disk based storage – efficiently working with it, how file system works.
  • Memory management in OS and your environment.

Obviously, this is a very short list, and again, it isn’t comprehensive.  It is just meant to give you some indications for things that I have found to be useful over and over and over again.

That kind of knowledge isn’t something that is replaced often, and it will help you understand how anyone else has to interact with the same constraints. In fact, it often allows you to accurately guess how they solve a certain problem, because you are aware of the same alternatives that the other side had to solve.

In short, if you seek to be a better developer, dig deep and learn the real basic building blocks for our profession.

In my next post, I’ll discuss strategies for doing that.

Tags:

Published at

Originally posted at

Comments (25)

Frictionless development: Web.config and connection strings

This is something that I actually run into a lot at customer sites. They have a lot of friction during development between different connection strings that developers use during development. For example, we may have one developer using:

<add name="RavenDB" connectionString="Url=http://localhost:8080" />

While another is using:

<add name="RavenDB" connectionString="Url=http://localhost:8191;Database=TheApp" />

This usually causes a hell of a lot of trouble in most teams (or maybe you have developers that use SQL Express, and some who installed SQL Development, etc).

That is friction, and you want to deal with that as soon as possible. The easiest thing to do is actually:

<add name="Ayende-PC" connectionString="Url=http://localhost:8080" />
<add name="Ayende-Laptop" connectionString="Url=http://localhost:8191;Database=TheApp" />

This works, because the connection string name is now the machine name (System.Environment.MachineName). That is a great first step, because it means that you can get things done without fighting over the connection string in the web.config.

Another alternative is to have a default connection string, and allow to “override” it with the specification of a connection string specific for the machine.

It is a small thing, but it actually helps quite a lot. You can extend this to other settings as well. For apps that have a lot of settings, I usually take them out of the Web.config into a Default.config file, and the configuration reader is set to look for [MachineName].config first, and only then at the Default.config file.

Tags:

Published at

Originally posted at

Comments (30)

Negative hiring decisions, Part II

Another case of a candidate completing a task at home and sending it to me which resulted in a negative hiring decision is this:

protected void Button1_Click(object sender, EventArgs e)
{
    string connectionString = @"Data Source=OFFICE7-PC\SQLEXPRESS;Integrated Security=True";
    string sqlQuery = "Select UserName From  [Users].[dbo].[UsersInfo] Where UserName = ' " + TextBox1.Text + "' and Password = ' " + TextBox2.Text+"'";
    
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        SqlCommand command = new SqlCommand(sqlQuery, connection);
        connection.Open();
        SqlDataReader reader = command.ExecuteReader();
        try
        {
            while (reader.Read())
            {
           
            }
        }
        finally
        {
            if (reader.HasRows)
            {
                reader.Close();
                Response.Redirect(string.Format("WebForm2.aspx?UserName={0}&Password={1}", TextBox1.Text, TextBox2.Text));
            }
            else
            {
                reader.Close();
                Label1.Text = "Wrong user or password";
            }
        }
    }
}

The straw that really broke the camel’s back in this case was the naming of WebForm2. I could sort of figure out the rest, but not bothering to give a real name to the page was over the top.

Full Throttle at TekPub

Rob Conery pinged me a few weeks back, wanting to do another TekPub episode. So I show up, very early in the morning, and he starts throwing curve balls at me:

The website is down and we aren’t accepting orders, DO SOMETHING ABOUT THIS NOW!

And that was in the first minute or so!

Wait a minute! I thought we were doing a show about…

Oh, this is the Full Throttle episode…

Basically, Rob is just piling requirements and I am trying to play catch up. It is just over an hour, but I think it should be interesting to watch.  I really liked Rob’s post about the show. But you can also skip this and go directly to the actual show.

Tags:

Published at

Originally posted at

Comments (5)

Hiring Questions–The phone book–responding to commentary

Wow, this post got a lot more attention than I thought it would. Most of it was along the same lines, so I’ll answer it here.

Anyone suggesting, SQLite, Excel, Access, Esent, Embedded RavenDB, Munin, Embedded FireBird, MS SQL CE, DB4O or anything like it – that isn’t the purpose of the question. I am not trying to figure out if the candidate knows about embedded databases.

Performance isn’t much of a concern, we expect up to several thousands entries per phonebook, but not beyond that, and speed should be in the human response time range (hundreds of milliseconds).

I rejected JSON / XML file formats because I wanted to make the task harder than just using the built-in Linq API and serializing to a file. 

Out of this question, I want actual code that I can try out, not just some high level design. I estimate that if you know what you are doing, this should take less than half an hour. At high school, I think it took about two hours, and that was in unmanaged land.

Some people questioned what is the purpose of this question, under what scenarios is it valid, etc.

Put simply, it is valid because it tests a wide range of topics in a candidate abilities. I don’t feel that I need to go into world building to setup a scenario for an interview question.

Mike McG put it beautifully:

He spells out the requirements for a basic database engine with indexing. It's a multifaceted problem that can expose a lot about a candidate in their solution. Are concerns separated logically? How is performance addressed against disk I/O? How is code correctness validated (e.g. testing)? To submit a project that just wraps another database is disingenuous. He's looking for people that can solve problems head-on, not just pass the buck.

Exactly. More to the point, it forces a candidate to actually do a fairly complex task that still can be done in a short amount of time. It shows me how they think, whatever they have any idea how computers actually work. If you can’t complete this task, you don’t understand basic file IO. That means that you might be a great front end developer, but I need someone who can do more than that.

Tags:

Published at

Originally posted at

Comments (16)

Hiring Questions–The phone book

One of the things that we ask some of our interviewees is to give us a project that would answer the following:

We need a reusable library to manage phone books for users. User interface is not required, but we do need an API to create, delete and edit phone book entries. An entry contains a Name (first and last), type (Work, Cellphone or Home) and number. Multiple entries under the same name are allowed. The persistence format of the phone book library is a file, and text based format such as XML or Json has been ruled out.

In addition to creating / editing / deleting, the library also need to support iterating over the list in alphabetical order or by the first or last name of each entry.

The fun part with this question is that it is testing so many things at the same time, it gives me a lot of details about the kind of candidate that I have in front of me. From their actual ability to solve a non trivial problem, the way they design and organize code, the way they can understand and implement a set of requirements, etc.

The actual problem is something that I remember doing as an exercise during high school (in Pascal, IIRC).

Tags:

Published at

Originally posted at

Comments (32)

Pet Projects and Hiring Decisions

Wow! The last time I had a post that roused that sort of reaction, I was talking about politics and religion. Let us see if I can clarify my thinking and also answer many of the people who commented on the previous post.

One of the core values that I look for when getting new people is their passion for the profession. Put simply, I think about what I do as a hobby that I actually get paid for, and I am looking for people who have the same mentality as I do. There are many ways to actually check for that. We recently made an offer to a candidate simply because her eyes lighted up when she spoke about how she build games to teach kids how to learn Math.

There were a lot of fairly common issues with my approach.

Indignation – how dare you expect me to put my own personal time into something that looks like work. That also came with a lot of explanation about family, kids and references to me being a slave driver bastard.

Put simply, if you can’t be bothered to improve your own skills, I don’t want you.

Hibernating Rhinos absolutely believes that you need to invest in your developers, and I strongly believe that we are doing that. It starts from basic policies like if you want a tech book, we’ll buy it for you. Sending developers to courses and conferences, sitting down with people and making sure that they are thinking about their career properly. And a whole lot of other things aside.

Personally, my goal is to keep everyone who works for us right now for at least 5 years, and hopefully longer. And I think that the best way of doing that is to ensure that developers are appreciated, they have a place to grow in professionally and are having to deal with fun, complex and interesting problems for significant parts of their time at work. This does not remove any obligations on your part to maintain your own skills.

imageIn the same sense that I would expect a football player to be playing on a regular basis even in the off session, in the same sense that I wouldn’t visit a doctor who doesn’t spend time getting updated on what is changing in his field, in the same sense that I wouldn’t trust a book critic that doesn’t read for fun – I don’t believe that you can abjurate your own responsibility to keeping yourself aware of what is actually going on out there.

And I am sorry, I don’t care if you read a blog or two or ten. If you want to actually learn new stuff in development, you actually have to sit down and write some code. Anything that isn’t code isn’t really meaningful in our profession. And it is far too easy to talk the talk without being able to walk the walk. My company have absolutely no intention of doing anything with Node.js in the future, I still wrote some code in Node.js, just to be able to see how it feels like to actually do that. I still spend time writing code that is never going to reach production or be in any way useful, just for me to see if I can do something.

If you are a developer, your output is code, and that is what prospective employees will look for. From my perspective, it is like hiring a photographer without looking at any of their pictures. Like getting a cook without tasting anything that he made.

And yes, it is your professional responsibility to make sure that you are hirable. That means that you keep your skills up to date and that you have something to show to someone that will give them some idea about what you are doing.

Time – I can’t find none.

There are 168 hours in a week, if you can put 4 – 6 hours a week to hone your own skills, to try things, to just explore… well, that probably indicate something about your priorities. I would like to hire people who think about what they do as a hobby. I usually work 8 – 10 hours days, 5 – 6 days a week. I am married, we’ve got two dogs, I usually read / watch TV for at least 1 – 3 hours a day.

I have been at the Work Work Work Work All The Time Work Work Work And Some More Work parade. I got the T Shirt, I got the scary Burn Out Episode. I fully understand that working all the time is a Bad Idea. It is Bad Idea for you, it is Bad Idea for the company. This isn’t what I am talking about here.

Think about the children – I have kids, I can’t spend any time out of work doing this.

That one showed up a lot, actually. I am thinking about the children. I think it is absolutely irresponsible for someone with kids not to make damn sure that he is hirable. I am not talking about spending 8 hours at the office, 8 hours doing pet projects and 1.5 minutes with your children (while you got some code compiling). And updating your skills and maintaining a portfolio of projects is something that I think is certainly part of that.

I read professionally, but I don’t code  - this is a variation on all of the other excuses, basically. Here is a direct quote: “I often find that well written blog entry/article will provide more education that can be picked up in a few minutes reading than several hours coding. And I can do that in my lunch break.”

That is nice, I also like to read a lot of Science Fiction, but I am a terrible writer. If you don’t actually practice, you aren’t any good. Sure, reading will teach you the high level concepts, but it doesn’t teach you how to apply them. You can read about WCF all day long, but it doesn’t teach you how to handle binding errors. Actually doing things will teach you that. You need actual practice to become good at something. In theory, there is no difference between reality and theory, and all of that.

I legally can’t - You signed a contract that said that you can’t do any pet projects, or that all of your work is owned by the company.

I sure do hope that you are well compensated for that, because it is going to make it harder for you to get hired.

You have a life – therefor you can’t spend time on pet projects.

So do I, I managed. If you can’t, I probably don’t want you.

Wrong this to do - I shouldn’t want someone who is on Stack Overflow all the time, or will spend work time on pet projects.

This is usually from someone who think that the only thing that I care about is lines of code committed. If someone is on Stack Overflow a lot, or reading a lot of blogs, or writing a lot of blogs. That is awesome, as long as they manages to complete their tasks I a reasonable amount of time. I routinely write blog posts during work. It helps me think, it clarify my thinking, and it usually gets me a lot of feedback on my ideas. That is a net benefit for me and for the company.

Some people hit a problem and may spin on that for hours, and VS will be the active window for all of that time. That isn’t a good thing!  Others will go and answer totally unrelated questions on Stack Overflow while they are thinking on the problem, they come back to VS and resolve the problem in 2 minutes. As long as they manage to do the work, I don’t really care. In fact, having them in Stack Overflow probably means that answers about our products will be answered faster.

As for working on their own personal projects during work. The only thing that you need to do is somehow tie it to actual work. For example, that pet project may be converted to be a sample application for our products. Or it can be a library that we will use, or any of a number of options that you can use to make sure things interesting.

You should note as well that I am speaking here from our requirements from the candidate, not from what I consider to be our responsibilities toward our employees, I’ll talk about those in another post in more detail.

Then there was this guy, who actively offended me.

The author is a selfish ego maniac who only cares about himself. As an employer, you can choose to consume a resource (employee), get all you can out of it, then discard it. Doing so adds to the evil and bad in the world.

This is spoken like someone who never actually had to recruit someone, or actually pay to recruit someone. It costs a freaking boatload of money and take a freaking huge amount of time to actually find someone that you want to hire. Treating employees as disposable resources is about as stupid as you can get, because we aren’t talking about getting someone that can flip burgers at minimum wage here.

We are talking about 3 – 6 months training period just to get to the point where you can get good results out of a new guy. We are talking about 1 – 3 months of actually looking for the right person before that. I consider employees to be a valuable resource, something that I actively need to encourage, protect and grow. Absolutely the last thing that I want is to try to have a chain of disposable same-as-the-last-one developers in my company.

I have kicked people out of the office with instructions to go home and rest, because I would like to have them available tomorrow and the next day and month and year. Doing anything else is short sighted, morally repugnant and stupid.

If you don’t have pet projects, I don’t think I want you

I am busy hiring people now, and it got me thinking a lot about the sort of things that I want from my developers. In particular, I was inundated in CVs, and I used the following standard reply to help me narrow things down.

Thank you for your CV. Do you have any projects that you wrote that I can review? Have you done any OSS work that I can look at?

The replies are fairly interesting. In particular, I had a somewhat unpleasant exchange with one respondent. In reply for my question, the reply was:

My employer doesn’t allow any sharing of code. I can find some old projects that I did a while ago and send them to you, I guess.

Obviously, I don’t want to read any code that belong to someone without that someone’s explicit authorization. Someone sending me their current company code is about as bad manner as someone setting up an invite for a job interview on their work calendar (the later actually happened today).

After getting the projects and looking them over a bit, I replied that I don’t think this would be the appropriate position for this respondent. I got the following reply:

Wait a minute…

Can I know why? I took the trouble to send you stuff that I have done, maybe not the highest quality and caliber, but what I could send right now. You didn’t even interview me.

How exactly did you reach the unequivocal conclusion that I am not a good fit for this job?

My response to that was:

Put simply, we are looking for a .NET developer and one of the most important things that we look for is passion. In general, we have found that people that care and are interested in what they are doing tend to do other stuff rather than just their work assignments.

In other words, they have their own pet projects, it can be a personal site, a project for a friend, or just some code written to get familiar with some technology.

When you tell me that your only projects outside of work are 5+ years old, that is a bad indication for us.

There is more, but it gets to the details and not really relevant for this discussion.

Let me try to preempt the nitpickers. Not having pet projects doesn’t mean that you are a bad developer, nor vice versa.

But I don’t really care about experience, and assuming that you already know the syntax and has some basic knowledge in the framework, we can use you. But the one thing that I learned you can’t give people is the passion for the field. And that is critical. Not only because it means that they are either already good or going to be good (it is pretty hard to be passionate about something that you sucks at), but because it means that they care.

And if they care, it means two very important things:

  • The culture of the company is about caring for doing the appropriate thing.
  • The end result is going to be as awesome as we can get.

Now, if you’ll excuse me, I am going to check out SignalR, because I don’t feel like doing any more RavenDB work today.

Tags:

Published at

Originally posted at

Comments (170)

Node.cs

No, the title is not a typo. There is so much noise around Node.js, I thought it would be fun to make a sample of how it would work in C# using the TPL. Here is how the hello world sample would look like:

public class HelloHandler : AbstractAsyncHandler
{
    protected override Task ProcessRequestAsync(HttpContext context)
    {
        context.Response.ContentType = "text/plain";
        return context.Response.Output.WriteAsync("Hello World!");
    }
}

And the code to make this happen:

public abstract class AbstractAsyncHandler : IHttpAsyncHandler
{
    protected abstract Task ProcessRequestAsync(HttpContext context);

    private Task ProcessRequestAsync(HttpContext context, AsyncCallback cb)
    {
        return ProcessRequestAsync(context)
            .ContinueWith(task => cb(task));
    }

    public void ProcessRequest(HttpContext context)
    {
        ProcessRequestAsync(context).Wait();
    }

    public bool IsReusable
    {
        get { return true; }
    }

    public IAsyncResult BeginProcessRequest(HttpContext context, AsyncCallback cb, object extraData)
    {
        return ProcessRequestAsync(context, cb);
    }

    public void EndProcessRequest(IAsyncResult result)
    {
        if (result == null)
            return;
        ((Task)result).Dispose();
    }
}

And you are pretty much done. I combined this with a HttpHandlerFactory which does the routing, and you get fully async, and quite beautiful code.

Tags:

Published at

Originally posted at

Comments (38)

The evil tricks of “It Works On My Machine”, in reverse

The following is quite annoying. I am trying to make an authenticated request to a web server. In this instance, we are talking about communication from one IIS application to another IIS application (the second application is RavenDB hosted inside IIS).

The part that drives me CRAZY is that I am getting this error when I am trying to make an authenticated request, but using the exact same code and credentials, from another machine, I can make this work just fine.

image

It took me a while to figure out that most important part, when running locally, I was running under my own account, when running remotely, the application run under IIS account. The really annoying part was that even when running on IIS locally, it still worked locally and failed remotely.

It took me even longer to figure out that the local IIS was configure to run under my own account as well Sad smile

Once that discovery was made, I was able to figure out what is wrong and implement a work around (run the remote IIS site under a custom user name). I know that the actual problem is something relating to permissions on certificate store, but I have no idea how, or, for that matter, which certificate?

This is plain old HTTP auth, no SSL, client certs, etc. I am assuming that this is raised because we are using Windows Auth, but I am not sure what is going on here with regards to which certificate should I grant to which user, or even how to do this.

Any ideas?

Generating API Documentation

I want to generate API documentation for Raven DB. I decided to take the plunge and generate XML documentation, and to ensure that I’ll be good, I marked the “Warnings as Errors” checkbox.

864 compiler errors later (I spiked this with a single project), I had a compiling build and a pretty good initial XML documentation. The next step was to decide how to go from simple XML documentation to a human readable documentation.

Almost by default, I checked Sandcastle, unfortunately, I couldn’t figure out how to get this working at all. Sandcastle Help File Builder seems the way to go, though. And using that GUI, I had a working project in a minute or two. There are a lot of options, but I just went with the default ones and hit build. While it was building, I had the following thoughts:

  • I didn’t like that it required installation.
  • I didn’t know whatever you can easily put it in a build script.

The problem was that generating the documentation (a CHM) took over three and a half minutes.

I then tried Docu. It didn’t work, saying that my assembly was in the wrong format. Took me a while to figure out exactly why, but eventually it dawned on me that Raven is a 4.0 project, while Docu is a 3.5. I tried recompiling it for 4.0, but it gave me an error there as well, something about duplicate mscorlib, at which point I decided to drop it. That was also because I didn’t really like the format of the API it generated, but that is beside the point.

I then tried doxygen, which I have used in the past. The problem here is the sheer number of options you have. Luckily, it is very simple to setup using Doxywizard, and the generated documentation looks pretty nice as well. It also take a while, but significantly faster than Sandcastle.

Next on the list was MonoDoc, where I could generate the initial set of XML files, but was unable to run the actual mono doc browser. I am not quite sure why that happened, but it kept complaining that the result was too large.

I also checked VSDocman, which is pretty nice.

All in all, I think that I’ll go with doxygen, it being the simplest by far and generating (OOTB) HTML format that I really like.

Being a product owner

A while ago I have come to the realization that it is impossible for me to everything alone, so I got other people to help me build my projects. In some cases, that was done using OSS, by soliciting contribution from the community. In others, it is a simple commercial transaction, where I give someone money for code.

I think that I have gotten too used to the OSS model, because I got the following reply for this:

image

That was somewhat of a rude shock, I was using the same loose language that I always hated when I got specs to implement.

Those are all really good questions.

The Law of Conservation of Tradeoffs

Every so often I see a group of people or a company come up with a new Thing. That new Thing is supposed to solve a set of problems. The common set of problems that people keep trying to solve are:

  • Data access with relational databases
  • Building applications without needing developers

And every single time that I see this, I know that there is going to be a catch involved. For the most part, I can usually even tell you what the catches involved going to be.

It isn’t because I am smart, and it is certain that I am not omnificent. It is an issue  of knowing the problem set that is being set out to solve.

If we will take data access as a good example, there aren’t that many ways tat you can approach it, when all is said and done. There is a set of competing tradeoffs that you have to make. Simplicity vs. usability would probably be the best way to describe it. For example, you can create a very simple data access layer, but you’ll give up on doing automatic change tracking. If you want change tracking, then you need to have Identity Map (even data sets had that, in the sense that every row represented a single row :-) )

When I need to evaluate a new data access tool, I don’t really need to go too deeply into how it does things, all I need to do is to look at the set of tradeoffs that this tool made. Because you have to make those tradeoffs, and because I know the play field, it is very easy for me to tell what is actually going on.

It is pretty much the same thing when we start talking about the options for building applications without developers (a dream that the industry had chased for the last 30 – 40 years or so, unsuccessfully). The problem isn’t in lack of trying, the amount of resources that were invested in the matter are staggering. But again you come into the realm of tradeoffs.

The best that a system for non developers can give you is CRUD. Which is important, certainly, but for developers, CRUD is mostly a solved problem. If we want plain CRUD screens, we can utilize a whole host of tools and approaches to do them, but beyond the simplest departmental apps, the parts of the application that really matter aren’t really CRUD. For one application, the major point was being able to assign people to their proper slot, a task with significant algorithmic complexity. In another, it was fine tuning the user experience so they would have a seamless journey into the annals of the organization decision making processes.

And here we get to the same tradeoffs that you have to makes. Developer friendly CRUD system exists in abundance, ASP.Net MVC support for Editor.For(model) is one such example. And they are developer friendly because they give you he bare bones of functionality you need, allow you to define broad swaths.  of the application in general terms, but allow you to fine tune the system easily where you need it. They are also totally incomprehensible if you aren’t a developer.

A system that is aimed at paradevelopers focus a lot more of visual tooling to aid the paradeveloper achieve their goal. The problem is that in order to do that, we give up the ability to do things in broad strokes, and have to pretty much do anything from scratch for everything that we do. That is acceptable for a paradeveloper, without the concepts of reuse and DRY, but those same features that make it so good for a paradeveloper would be a thorn in a developer’s side. Because they would mean having to do the same thing over & over & over again.

Tradeoffs, remember?

And you can’t really create a system that satisfy both. Oh, you can try, but you are going to fail. And you are going to fail because the requirement set of a developer and the requirement set of a paradeveloper are so different as to be totally opposed to one another. For example, one of the things that developers absolutely require is good version control support. And by good version control support i mean that you can diff between two versions of the application and get a meaningful result from the diff.

A system for paradeveloper, however, is going to be so choke full of metadata describing what is going on that even if the metadata is in a format that is possible to diff (and all too often it is located in some database, in a format that make it utterly impossible to work with using source control tools).

Paradeveloper systems encourage you to write what amounts to Bottun1_Click handlers, if they give you even that. Because the paradevelopers that they are meant for have no notion about things like architecture. The problem with that approach when developers do that is that it is obviously one that is unmaintainable.

And so on, and so on.

Whenever I see a new system cropping up in a field that I am familiar with, I evaluate it based on the tradeoffs that it must have made. And that is why I tend to be suspicious of the claims made about the new tool around the block, whatever that tool is at any given week.

Horror stories from the trenches: My very first project

My first day at work, and I am but a young whippersnapper. I arrived, did the usual meet & greet with everybody (and promptly forgot their names), then I was shown to a desk, given a laptop and one of my new coworkers rummaged in a desk drawer for a while before throwing me a document. Well, I say a document, what I mean is a book.

It was couple hundred pages, and it was titled: Specs for the Ergaster module in the Heidelbergensis System. When asked what I was supposed to do with it, I was given a very simple answer: Implement.

I later found on some very interesting aspects on the spec:

  • The document represented over a year and a half of work.
  • The module was being developed for a client of our client.
  • The module was meant to be both specific for the sub client needs and at the same time generic enough to used by other clients of our clients.
  • The system that we were supposed to be a module in was currently under development and wasn’t in a position to be used by us.

Can you say image

Well, I spent close to two years working on this project. I learned a lot, from how to proof myself from upstream teams to how not to design a system. I poured a whole lot of work and effort into this.

Along the way, I also learned something very interesting about the spec. That was the first time that I actually had to reverse engineer a specification to understand what the customer actually wanted.

Two years after we handed off the module to the client, I had the chance to go to lunch with the team leader from that client, so I asked him about the module. He informed me that they still haven’t deployed to production.

I was depressed for a week afterward.

Pricing software

It seems that people make some rather interesting assumptions regarding software pricing. When talking about commercial products, there is a wide variety of strategies that you might want to push, depending on what exactly you are trying to achieve.

You may be trying to have a loss leader product, gain reputation or entry to a new market, in which case your goals are skewed toward optimizing other things. What I want to talk about today is pricing software with regards to maximizing profits.

Probably the first thing that you need to do consider the base price of the product. That is important for several levels. For the obvious reason that this is what you will make out of it, but also because the way you price your product impacts its Perception of Value.

If you try to sell your product at a significant cost below what the market expect it to be, the perception is that it is crap, and you are trying to get rid of it.

Another issue that you have to face is that it is very hard to change the base price once you settled on it. It is fairly obvious why you have a hard time jacking the price upward, but pushing the price down is also something to approach with trepidation. It has a very negative impact on the product image. Now, you can effectively price it at a lower rate very easily, but doing things like offers, discounts, etc. Those do not affect the base price and so generally do not affect how the product is perceived.

And just as bad as with setting the price significantly below market expectations, setting the price significantly higher than market expectations. The problem here is that you come out as a loony with delusions of grandeur. Even if your product is wonderful, and even if it will return its investment in a single week, pricing it too high generate a  strong rejection reaction. It also means that you need to actively combat that perception, and that takes a lot of time & effort.

Finally, in the range between too little and too much, you have a wide room to play. And that is where the usual supply & demand metrics come into play. Since supply isn’t a problem with software, what we are talking about is demand vs. price. Everyone in imageeconomics is familiar with the graph, I assume.

As price goes up, demands goes down, and vice versa. So the question is what is the optimum price point that would generate the most revenue. Here are some numbers, showing the probable correlation between price and sales.

Price Users $
$ 2,000.00 1.00 $ 2,000.00
$ 1,000.00 1.00 $ 1,000.00
$ 200.00 100.00 $ 20,000.00
$ 100.00 200.00 $ 20,000.00
$ 10.00 5000.00 $ 50,000.00

It might be easier to look at in its graphical form to the right.

You can see that when the price point is too high, very few people will buy it, if at all. When the price point is in the range that the market expects, we make some money, drastically more than we would when the price was above market expectations.

However, if we set the price way down, we get a lot more users, and a lot more money pouring in. We will ignore the question of damage to the reputation for now.

This seems to indicate that by lowering the price we can make a lot more money, right? Except that it isn’t really the case. Let us see why.

1 in 10 users has a question, requires support, etc. Let us say that each question cost 100$, and now let us look at the numbers.

Price Users Calls Support Cost Income Revenue
$ 2,000.00 1.00 0.00 $ - $ 2,000.00 $ 2,000.00
$ 1,000.00 1.00 0.00 $ - $ 1,000.00 $ 1,000.00
$ 200.00 100.00 10.00 $ 1,000.00 $ 20,000.00 $ 19,000.00
$ 100.00 200.00 20.00 $ 2,000.00 $ 20,000.00 $ 18,000.00
$ 10.00 5000.00 500.00 $ 50,000.00 $ 50,000.00 $ -

And that tells us a very different story indeed. The end goal is increasing profit, not just the amount of money coming in, after all.

Oh, and a final thought, the following coupon code EXR-45K2D462FD is my pricing experiment, it allows the first 10 users to buy NH Prof, L2S Prof or EF Prof at a 50% discount.

Since all the coupon codes run out in a short order, I am going to continue the experiment, this coupon code ER2-45K2D462FJ will give the first 25 users to buy NH Prof, L2S Prof or EF Prof at a 25% discount.

The NIH dance

I started thinking about all the type of stuff that I had to write or participated at, and I find it… interesting.

  1. Database – Rhino.DivanDB (hobby project).
  2. Data Access – Multitude of those.
  3. OR/M – NHibernate, obviously, but a few others as well.
  4. Distributed caching systems – NMemcached – several of those.
  5. Distributed queuing systems – Rhino Queues actually have ~7 different implementations.
  6. Distributed hash table – Rhino DHT is in production.
  7. Persistent hash tables – Rhino PHT, of course, but I actually had to write a different implementation for the profiler as well.
  8. Mocking framework – Rhino Mocks, obviously.
  9. Web frameworks– I am referring to MonoRail, although I only dabbled there, to be truthful. Rhino Igloo was a lot of fun, too, if only because I had to.
  10. Text templating language – Brail
  11. Inversion of Control containers – Windsor, and a few custom ones.
  12. AOP – I actually built several implementation, the most fun was with the Code DOM approach :-)
  13. Dynamic Proxies & IL weaving – Castle Dynamic Proxy, not the recommended way to learn IL, I must say.
  14. CMS systems – several, but I really like Impleo and the concept behind it.
  15. ETL system – Took 3 times to get right.
  16. Security system – Rhino Security was fun to design, and quite interesting to implement.
  17. Licensing framework – because trying to buy one commercially just didn’t work.
  18. Service Bus – which I consider to be one of my best coding efforts.
  19. CI Server – so I can get good GitHub integration.
  20. Domain Specific Language framework – well, I did write the book on that :-)
  21. Source control server – SvnBridge

I haven’t written a testing framework, though.

I am probably forgetting a lot of stuff, actually…

NIH alert! Writing my own RPC systems

Let us see if you can help me here, I found myself facing a rather unpleasant realization, in that I need to communicate between two processes (that compose a single system) with some rather draconian measures about how they are going to be used.

Quite simply, I need to expose an interface with ~25 methods on it to the second process. So far, it is pretty easy to do, right?

The problem is that the first process is a standard .NET executable, which may also be run on the Mono platform while the second is a Silverlight application. My first thought, just use remoting to handle this, failed because remoting isn’t available to Silverlight. My second thought, to use WCF, failed because that isn’t available on Mono.

Building a simple RPC system is pretty easy, so that doesn’t worry me. The reason I am reluctant to do so is that I really don’t want to build yet another infrastructure. At some point, even to me, NIH flag starts to pop up.

Tags:

Published at

Originally posted at

Comments (29)

Fighting the profiler memory obesity

When I started looking into persisting profiler objects to disk, I had several factors that I had to take into account:

  • Speed in serializing / deserializing.
  • Ability to intervene in the serialization process at a deep level.
  • Size (also effect speed).

The first two are pretty obvious, but the third requires some explanation. The issue is, quite simply, that I can apply some strategies to significantly reduce both speed & size of serialization by making sure that the serialization pipeline knows exactly what is going on (string tables & flyweight objects).

I started looking into the standard .NET serialization pipeline, but that was quickly ruled out. There are several reasons for that, first, you literally cannot hook deep enough into the serialization pipeline to do the sort of things that I wanted to do (you cannot override how System.String get persisted), and it is far too slow for my usages.

My test data started as a ~900Mb of messages, which I loaded into the profiler (resulting in a 4 GB footprint during processing and a 1.5GB footprint when processing is done). Persisting the in memory objects using BinaryFormatter resulted in a file whose size is 454Mb and whose deserialization I started before I started writing this post and at this point in time has not completed yet. Currently the application (simple cmd line test app that only does deserialization, takes 1.4 GB).

So that was utterly out. So I set out to write my own serialization format. Since I wanted it to be fast, I couldn’t use reflection, (BF app currently takes 1.6 GB) but by the same token, writing serialization by hand is labor intensive, error prone method. That lives aside the question of handling changes in the objects down the road, that is not something that I would like to do.

Having come to that conclusion, I decided to make use of CodeDOM to generate a serialization assembly on the fly. That would give me the benefits of no reflection, handle addition of new members to the serialized objects and would allow me to incrementally improve how (BF app now takes 2.2 GB, and I am getting ready to kill it). My first attempt in doing so, applying absolutely not optimization techniques, result in a 381 Mb file and an 8 seconds parsing time.

That is pretty good, but I wanted to do a bit more.

Now, note that this is an implementation specific for a single use. After applying a simple string table optimization, the results of the serialization are two files, the string table is 10Mb in length and the actual saved data is 215Mb and de-serialization takes ~10 seconds. Taking a look at what actually happened, it looked like the cost of maintaining string table is quite high. Since I care more about responsiveness than file size, and since the code for maintaining the string table is complex, I dropped that in favor of in memory only MRU string interning.

Initial testing shows that this should be quite efficient in reducing memory usage. In fact, in my test scenario, memory consumption during processing dropped down 4 GB to just 1.8 – 1.9 GB and 1.2 GB when processing is completed. And just using the application shows that the user level performance is pretty good, even if I say so myself.

There are additional options that I intend to take, but I’ll talk about them in a later post.

How would you learn a new platform?

Here is an interesting problem that I am facing. I have a pretty good working knowledge of computing, and while I can usually manage to get the gist of a new technology in a short amount of time, that is only useful for talking about it, not actually applying that. I am currently trying to figure out the parts where I am missing, and plug the holes. And I am running into a bit of a problem here.

A case in point, I can read Java, and I can probably write C# code in Java, but I can’t write a Java application. Not for lack of technical skills, but just because I lack the practical knowledge on how to do so.  The problem is that it is going to take too long for me to slog through everything myself, especially since my main problems are in things like IDEs and usage patterns, which aren’t really stuff that you learn from reading about it.

I am thinking on taking a Java course, not so much for the content of the course, as the ability to actually build Java apps in an environment where I can ask questions. On the other hand, picking the right course is problematic, I am not going to sit through “this is a for loop", but I don’t want to be the guy with the blank stare.