Ayende @ Rahien

It's a girl

Happy New Year, and across the board profiler discount

To celebrate the new year, I decided to offer a single day coupon that will get you a 35% discount on all my profiler products.

The coupon code is valid until the 1st of January 2011, but I won’t mention in which timezone, so you might want to hurry up.

The coupon code is:

HNY-45K2D465DD

You can use it to buy:

Happy new year!

Public Service Announcement: Git master repositories for the Rhino Tools projects

There have been some changes, and it seems that it is hard to track them. Here is where you can find the master repositories for the Rhino Tools projects:

We are hiring!

In 2005, I got out of the army and looked for a job. I felt that I had all the necessary qualifications (at the time, I had already written Rhino Mocks), but I ran into an interesting problem: I didn’t have any commercial experience. I am sure that you are familiar with the tale. All the jobs require at least 2 – 3 years of experience, and the list of qualifications looked like someone had just raided the TLA storage facilities.

That was pretty annoying at the time, and I wished I could do something about it. Now I can.

We aren’t hiring additional full time employees (at least not right now), but we are hiring interns. The main idea is to enable people to gain commercial experience while working on cool software.

Some information that you probably need to know:

  • You have to know how to program in C#. If you have OSS credentials, so much the better, but it is not required.
  • This is a paid position; you are not expected to work for free.
  • It is not a remote position, our offices are located in the Southern Industrial Area in Hadera, Israel.
  • I’m not going to ask you to do all the annoying work. There is some annoying stuff to do (mostly docs), but look below to see what exactly I have in mind.

This position entails:

Working on our shipping products (the Uber Prof line of products and the Raven DB / Raven MQ line).

Among other things, I have earmarked the following features for you:

  • Raven DB’s query optimizer.
  • Raven DB’s auto sharding & scaling.
  • Raven DB’s clustering support.
  • Uber Prof’s production profiling feature.

The other things that are involved include all the usual stuff, bug fixes, doc writing and standard (in other words, not that exciting) features.

Duration:

This is a three to six months position.

What is required:

Commercial experience is not required, but I would want to see code that you wrote.

Answer: This code should never hit production

Originally posted at 12/15/2010

Yesterday I asked what is wrong with the following code:

public ISet<string> GetTerms(string index, string field)
{
    if(field == null) throw new ArgumentNullException("field");
    if(index == null) throw new ArgumentNullException("index");
    
    var result = new HashSet<string>();
    var currentIndexSearcher = database.IndexStorage.GetCurrentIndexSearcher(index);
    IndexSearcher searcher;
    using(currentIndexSearcher.Use(out searcher))
    {
        var termEnum = searcher.GetIndexReader().Terms(new Term(field));
        while (field.Equals(termEnum.Term().Field()))
        {
           result.Add(termEnum.Term().Text());

            if (termEnum.Next() == false)
                break;
        }
    }

    return result;
}

The answer to that is quite simple: this code doesn’t have any paging available. What this means is that if we execute this piece of code on a field with a very high number of unique items (such as, for example, email addresses), we would return all the results in one shot. That is, if we can actually fit all of them into memory. Anything that can run over a potentially unbounded result set should have paging as part of its basic API.

This is not optional.

Here is the correct piece of code:

public ISet<string> GetTerms(string index, string field, string fromValue, int pageSize)
{
    if(field == null) throw new ArgumentNullException("field");
    if(index == null) throw new ArgumentNullException("index");
    
    var result = new HashSet<string>();
    var currentIndexSearcher = database.IndexStorage.GetCurrentIndexSearcher(index);
    IndexSearcher searcher;
    using(currentIndexSearcher.Use(out searcher))
    {
        var termEnum = searcher.GetIndexReader().Terms(new Term(field, fromValue ?? string.Empty));
        if (string.IsNullOrEmpty(fromValue) == false)// need to skip this value
        {
            while(fromValue.Equals(termEnum.Term().Text()))
            {
                if (termEnum.Next() == false)
                    return result;
            }
        }
        while (field.Equals(termEnum.Term().Field()))
        {
            result.Add(termEnum.Term().Text());

            if (result.Count >= pageSize)
                break;

            if (termEnum.Next() == false)
                break;
        }
    }

    return result;
}

And that is quite efficient, even for searching large data sets.

For bonus points, the calling code ensures that pageSize cannot be too big :-)
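To make the paging contract concrete, here is a sketch of a hypothetical caller (the index name, field name, and page size below are my own illustrative choices, not from the post) that walks all the terms of a field one page at a time, feeding the last term of each page back in as fromValue:

```csharp
// Hypothetical caller of the GetTerms method shown above.
const int PageSize = 128; // an arbitrary, caller-enforced cap

string lastTerm = null;
while (true)
{
    ISet<string> page = GetTerms("Users/ByEmail", "Email", lastTerm, PageSize);
    if (page.Count == 0)
        break; // no more terms

    foreach (var term in page)
        Console.WriteLine(term);

    // Terms come back in Lucene's lexicographic order, so the largest
    // term in the page is the continuation point for the next request.
    lastTerm = page.Max(StringComparer.Ordinal);

    if (page.Count < PageSize)
        break; // a short page means we reached the end
}
```

The key design point is that fromValue turns the API into a cursor: no matter how many unique terms the field has, each call touches at most pageSize of them.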

Challenge: This code should never hit production

This code should never have the chance to go to production; it is horribly broken in a rather subtle way. Do you see it?

public ISet<string> GetTerms(string index, string field)
{
    if(field == null) throw new ArgumentNullException("field");
    if(index == null) throw new ArgumentNullException("index");
    
    var result = new HashSet<string>();
    var currentIndexSearcher = database.IndexStorage.GetCurrentIndexSearcher(index);
    IndexSearcher searcher;
    using(currentIndexSearcher.Use(out searcher))
    {
        var termEnum = searcher.GetIndexReader().Terms(new Term(field));
        while (field.Equals(termEnum.Term().Field()))
        {
           result.Add(termEnum.Term().Text());

            if (termEnum.Next() == false)
                break;
        }
    }

    return result;
}

As usual, I’ll post the answer tomorrow.

Detailed Answer: Your own ThreadLocal

Originally posted at 12/15/2010

In the last post, we have shown this code:

public class CloseableThreadLocal
{
    [ThreadStatic]
    public static Dictionary<object, object> slots;

    private readonly object holder = new object();
    private Dictionary<object, object> capturedSlots;

    private Dictionary<object, object> Slots
    {
        get
        {
            if (slots == null)
                slots = new Dictionary<object, object>();
            capturedSlots = slots;
            return slots;
        }
    }


    public /*protected internal*/ virtual Object InitialValue()
    {
        return null;
    }

    public virtual Object Get()
    {
        object val;

        lock (Slots)
        {
            if (Slots.TryGetValue(holder, out val))
            {
                return val;
            }
        }
        val = InitialValue();
        Set(val);
        return val;
    }

    public virtual void Set(object val)
    {
        lock (Slots)
        {
            Slots[holder] = val;
        }
    }

    public virtual void Close()
    {
        GC.SuppressFinalize(this);
        if (capturedSlots != null)
            capturedSlots.Remove(this);
    }

    ~CloseableThreadLocal()
    {
        if (capturedSlots == null)
            return;
        lock (capturedSlots)
            capturedSlots.Remove(holder);
    }
}

And then I asked whether there are additional things that you may want to do here.

The obvious one is to think about locking. For one thing, we have now introduced locking for everything. Would that be expensive?

The answer is: probably not. We can expect to have very little contention here; most of the operations are always going to be on the same thread, after all.

I would probably just change this to be a ConcurrentDictionary, though, and remove all the explicit locking. And with that, we need to think about whether it would make sense to make this a static variable, rather than a thread static variable.
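A rough sketch of what that change might look like (my own take, not tested code): the ConcurrentDictionary is safe to touch from both the owning thread and the finalizer thread, so all the explicit locks go away:

```csharp
using System;
using System.Collections.Concurrent;

public class CloseableThreadLocal
{
    // Still thread static, but now safe to mutate from the finalizer thread.
    [ThreadStatic]
    private static ConcurrentDictionary<object, object> slots;

    private readonly object holder = new object();
    private ConcurrentDictionary<object, object> capturedSlots;

    private ConcurrentDictionary<object, object> Slots
    {
        get
        {
            if (slots == null)
                slots = new ConcurrentDictionary<object, object>();
            capturedSlots = slots;
            return slots;
        }
    }

    public virtual object InitialValue()
    {
        return null;
    }

    public virtual object Get()
    {
        // GetOrAdd replaces the lock + TryGetValue + Set dance.
        return Slots.GetOrAdd(holder, _ => InitialValue());
    }

    public virtual void Set(object val)
    {
        Slots[holder] = val;
    }

    public virtual void Close()
    {
        GC.SuppressFinalize(this);
        object ignored;
        if (capturedSlots != null)
            capturedSlots.TryRemove(holder, out ignored);
    }

    ~CloseableThreadLocal()
    {
        object ignored;
        if (capturedSlots != null)
            capturedSlots.TryRemove(holder, out ignored);
    }
}
```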

Answer: Your own ThreadLocal

Originally posted at 12/15/2010

Well, the problem with our last answer was that we didn’t protect ourselves from multi-threaded access to the slots variable. Here is the code with this fixed:

public class CloseableThreadLocal
{
    [ThreadStatic]
    public static Dictionary<object, object> slots;

    private readonly object holder = new object();
    private Dictionary<object, object> capturedSlots;

    private Dictionary<object, object> Slots
    {
        get
        {
            if (slots == null)
                slots = new Dictionary<object, object>();
            capturedSlots = slots;
            return slots;
        }
    }


    public /*protected internal*/ virtual Object InitialValue()
    {
        return null;
    }

    public virtual Object Get()
    {
        object val;

        lock (Slots)
        {
            if (Slots.TryGetValue(holder, out val))
            {
                return val;
            }
        }
        val = InitialValue();
        Set(val);
        return val;
    }

    public virtual void Set(object val)
    {
        lock (Slots)
        {
            Slots[holder] = val;
        }
    }

    public virtual void Close()
    {
        GC.SuppressFinalize(this);
        if (capturedSlots != null)
            capturedSlots.Remove(this);
    }

    ~CloseableThreadLocal()
    {
        if (capturedSlots == null)
            return;
        lock (capturedSlots)
            capturedSlots.Remove(holder);
    }
}

Is this it? Are there still issues that we need to handle?

Wrong Answer #3: Your own ThreadLocal

Originally posted at 12/15/2010

Last time, we forgot that the slots dictionary is a thread static variable, and that the finalizer is going to run on another thread… Let us fix this, too:

public class CloseableThreadLocal
{
    [ThreadStatic]
    public static Dictionary<object, object> slots;

    private readonly object holder = new object();
    private Dictionary<object, object> capturedSlots;

    private Dictionary<object, object> Slots
    {
        get
        {
            if (slots == null)
                slots = new Dictionary<object, object>();
            capturedSlots = slots;
            return slots;
        }
    }


    public /*protected internal*/ virtual Object InitialValue()
    {
        return null;
    }

    public virtual Object Get()
    {
        object val;

        if (Slots.TryGetValue(holder, out val))
        {
            return val;
        }
        val = InitialValue();
        Set(val);
        return val;
    }

    public virtual void Set(object val)
    {
        Slots[holder] = val;
    }

    public virtual void Close()
    {
        GC.SuppressFinalize(this);
        if (capturedSlots != null)
            capturedSlots.Remove(this);
    }

    ~CloseableThreadLocal()
    {
        if (capturedSlots == null)
            return;
        capturedSlots.Remove(holder);
    }
}

And now it works!

Except… Under some very rare scenarios, it will not do so.

What are those scenarios? Why do we care? And how do we fix this?

Wrong Answer #2: Your own ThreadLocal

Originally posted at 12/15/2010

Well, last time we tried to introduce a finalizer, but we forgot that we were using our own instance as the key to the slots dictionary, which prevented us from being collected, and hence the finalizer never ran.

All problems can be solved by adding an additional level of indirection… instead of using our own instance, let us use a separate instance:

public class CloseableThreadLocal
{
    [ThreadStatic] public static Dictionary<object, object> slots;

    object holder = new object();
    public static Dictionary<object, object> Slots
    {
        get { return slots ?? (slots = new Dictionary<object, object>()); }
    }

    public /*protected internal*/ virtual Object InitialValue()
    {
        return null;
    }

    public virtual Object Get()
    {
        object val;

        if (Slots.TryGetValue(holder, out val))
        {
            return val;
        }
        val = InitialValue();
        Set(val);
        return val;
    }

    public virtual void Set(object val)
    {
        Slots[holder] = val;
    }

    public virtual void Close()
    {
        GC.SuppressFinalize(this);
        if (slots != null)// intentionally using the field here, to avoid creating the instance
            slots.Remove(holder);
    }

    ~CloseableThreadLocal()
    {
        Close();
    }
}

Now it should work, right?

Except that it doesn’t… We still see that the data isn’t cleaned up properly, even though the finalizer runs.

Why?

Wrong answer #1: Your own ThreadLocal

Originally posted at 12/15/2010

Well, one way we can use to solve the problem is by introducing a finalizer, like so:

public class CloseableThreadLocal
{
    [ThreadStatic] private static Dictionary<object, object> slots;

    public static Dictionary<object, object> Slots
    {
        get { return slots ?? (slots = new Dictionary<object, object>()); }
    }

    public /*protected internal*/ virtual Object InitialValue()
    {
        return null;
    }

    public virtual Object Get()
    {
        object val;

        if (Slots.TryGetValue(this, out val))
        {
            return val;
        }
        val = InitialValue();
        Set(val);
        return val;
    }

    public virtual void Set(object val)
    {
        Slots[this] = val;
    }

    public virtual void Close()
    {
        GC.SuppressFinalize(this);
        if (slots != null)// intentionally using the field here, to avoid creating the instance
            slots.Remove(this);
    }

   ~CloseableThreadLocal()
   {
         Close();
   }
}

But this will not actually work, executing this code shows that we still have a memory leak:

internal class Program
{
    private static void Main(string[] args)
    {
        UseThreadLocal();
        GC.Collect(int.MaxValue);
        GC.WaitForPendingFinalizers();
        Console.WriteLine(CloseableThreadLocal.slots.Count);
    }

    private static void UseThreadLocal()
    {
        var tl = new CloseableThreadLocal();
        tl.Set("hello there");
        Console.WriteLine(tl.Get());
    }
}

Why? And how can we fix this?

Challenge: Your own ThreadLocal

I found a bug in Lucene.NET that resulted in a memory leak in RavenDB. Take a look here (I have simplified the code to some extent, but the same spirit remains):

public class CloseableThreadLocal
{
    [ThreadStatic] private static Dictionary<object, object> slots;

    public static Dictionary<object, object> Slots
    {
        get { return slots ?? (slots = new Dictionary<object, object>()); }
    }

    public /*protected internal*/ virtual Object InitialValue()
    {
        return null;
    }

    public virtual Object Get()
    {
        object val;

        if (Slots.TryGetValue(this, out val))
        {
            return val;
        }
        val = InitialValue();
        Set(val);
        return val;
    }

    public virtual void Set(object val)
    {
        Slots[this] = val;
    }

    public virtual void Close()
    {
        if (slots != null)// intentionally using the field here, to avoid creating the instance
            slots.Remove(this);
    }
}

As you can imagine, this is a fairly elegant way of doing this (please note, .NET 4.0 has the ThreadLocal class, which I strongly recommend using). But it has one very serious flaw. If you don’t close the instance, you are going to leak some memory. As you can imagine, that is a pretty bad thing to do.
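For comparison, here is what the same per-thread storage looks like with the built-in .NET 4.0 class; ThreadLocal&lt;T&gt; is IDisposable, so cleanup happens through the normal disposal pattern rather than a Close() discipline you can forget:

```csharp
using System;
using System.Threading;

class ThreadLocalExample
{
    static void Main()
    {
        // The value factory plays the role of InitialValue().
        using (var local = new ThreadLocal<string>(() => "initial value"))
        {
            Console.WriteLine(local.Value); // first access runs the factory
            local.Value = "hello there";
            Console.WriteLine(local.Value);
        }
        // Dispose releases the per-thread storage for all threads.
    }
}
```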

In general, I consider such designs as pretty bad bugs when writing managed code. We have the GC for a reason, and writing code that forces the user to manually manage memory is BAD for you. Here is an example showing the problem:

class Program
{
    static void Main(string[] args)
    {
        UseThreadLocal();
        GC.Collect(2);
        GC.WaitForPendingFinalizers();
        Console.WriteLine(CloseableThreadLocal.slots.Count);
    }

    private static void UseThreadLocal()
    {
        var tl = new CloseableThreadLocal();
        tl.Set("hello there");
        Console.WriteLine(tl.Get());
    }
}

This will show that after UseThreadLocal() runs and we force a full collection, the value is still there.

Without using the builtin ThreadLocal, can you figure out a way to solve this?

Points go to whoever does this with the minimum amount of changes to the code.

The right storage model

Originally posted at 12/15/2010

What are the advantages that you get when using RavenDB? The easy answers are:

  • Sparse data
  • Dynamic data
  • Better scaling across nodes

But there is another one that is even more important, in my eyes. It is the simple issue that with RavenDB, a document is a transactional boundary. That makes it very easy to model interactions on root aggregates that are much harder in relational database.

The most commonly used example is Order & OrderLines. In a relational database, those would be in two separate tables, and it is possible to modify order lines without touching the order. With RavenDB, on the other hand, the OrderLines are embedded in the Order document, so you literally cannot modify them without touching the order as well. This has a lot of implications. The most obvious one is that you just got rid of a big burden in terms of managing concurrency. You no longer have to be very careful about how you update parts of your model, as you have to be when using relational databases. You can just make any update you want, and any concurrent updates are detected at the level of the whole document, without you having to do anything at all.
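As an illustration, the class shapes might look like this (my own sketch, not code from RavenDB; the property names are hypothetical). The whole aggregate is one object graph that serializes into a single document:

```csharp
using System.Collections.Generic;

// One root aggregate == one document == one transactional boundary.
public class Order
{
    public string Id { get; set; }             // e.g. "orders/1"
    public string CustomerId { get; set; }
    public List<OrderLine> Lines { get; set; } // embedded, not a separate table
}

public class OrderLine
{
    public string ProductId { get; set; }
    public int Quantity { get; set; }
    public decimal Price { get; set; }
}
```

Saving an Order stores its lines with it, so a concurrent change to any line is, by construction, a concurrent change to the whole document.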

Another benefit of this approach is that because a document is the transactional boundary, accessing all of it is a very cheap operation. Coming back again to the Order, it is a very cheap operation to access all of the Order details, which can be very costly in a relational database (having to access Orders, OrderLines, Addresses, etc… ).

The strange world of products psychology

Originally posted at 12/13/2010

A while ago my company started to offer Commercial Support for NHibernate. It seems like a complementary product to what we are already doing, but we actually had a few other reasons for doing so.

Put simply, my pool of potential clients for the NHibernate Profiler are… NHibernate users. Anything that increases the pool of NHibernate’s users also increase the pool of potential clients for NH Prof.

The reason that we offered commercial support for NHibernate in the first place was to increase the trust people have in NHibernate. It reduces a barrier to entry, since if a company asks about the support options, they know that there are commercial support options out there.

With that in mind, I set out to design the pricing structure for the commercial support very carefully. One of the most important aspects of offering commercial support is that you want to have it behave just like insurance. For the most part, people need the support contract very rarely, and even if you end up spending three days working on a customer problem, that cost is amortized with other customers who did not call.

Knowing that, I explicitly added an Ad Hoc option to the commercial support offering for NHibernate. Ad Hoc support doesn’t really make a lot of sense, to tell you the truth. Not as a product, at least. It basically means that someone can call you, pay a fixed amount, and then require a potentially unlimited amount of time / effort. The reason that I did it anyway is that having an Ad Hoc support option is a very compelling feature from the customer’s point of view. For precisely the same reason.

I was looking to make NHibernate itself more attractive with that offer, not to actually have a viable product.

Now, however, I feel that NHibernate no longer needs the prop of the Ad Hoc support option, and thus it is no longer part of our commercial support offering for NHibernate.

What does it mean? Advanced

Originally posted at 12/8/2010

I posted the following a few days ago:

I’ll be giving my Advanced NHibernate course in March 2011 in Dallas. We are talking about 3 days of intensive dive into NHibernate: how it works, fully utilizing its capabilities, and actually grokking NHibernate’s zen.

You can register for the course here: http://dallas-nhibernate.eventbee.com

And I got a very interesting comment about that:

You mention in your post that your having an advanced nhibernate course, but that doesn't seem to agree with your syllabus which devotes 1/3 of the class to intro concepts.
Just thought I'd say this in hopes that the course could be more advanced in nature.

The answer is that the syllabus is correct, and the course is advanced. But how can I call a course advanced if it contains things like these?

  • Basic NHibernate configuration
  • Getting started with queries and mapping
  • Session and transaction management basics
  • Simple NHibernate queries

Put simply, it is because most people, even those who have been using NHibernate for a while, barely scratch the surface of its capabilities. What we are doing in the course is going deep. We dig into how NHibernate does some things, why it does them in such a fashion, and what alternatives there are.

Another very interesting thing that I noted with most people coming to my courses is that they may have a lot of knowledge in a particular area of NHibernate, but lack the same level of knowledge in other areas.

And then you’ll make it up in volume, right?

Originally posted at 12/6/2010

I recently had a few discussions with several startups in the process of going to the cloud. One of the first questions that I ask is “what is your licensing/revenue model?”.

At first, I usually get some very funny looks, I am not there to do business consulting, but to do technical stuff. Why do I stick my nose into things that have absolutely nothing to do with technical stuff?

The answer is quite simple: you have to consider the money side of things in your architecture. Let me give you a real world example. A startup that I was talking with wanted to create a SAAS system on the cloud. Each client would pay a flat fee of $150 per month. When I started talking with them, the idea was to use EC2’s ability to just create a new node on the fly, and give each client their own server. That would allow very easy development and deployment, and remove a whole bunch of complexity.

I even agreed, except that I raised one small problem. Each server would cost them roughly $250 per month. I had to ask them: “Do you intend to make it up in volume?”

We have always had to make those sorts of calls, but when we are using the cloud it becomes so much clearer. We have to take into account the actual real world costs of our architecture choices, and in many cases, we have to adapt our practices to the actual economic realities.


You don’t scale a mutli tenant environment

Originally posted at 12/6/2010

That is actually an inaccurate statement, for the most part. A more accurate way of saying this is that, for the most part, you don’t need to think about scaling a multi tenant environment. Most multi tenant environments use some form of a user based licensing method, which means that for the vast majority of tenants, there aren’t going to be a lot of users. That means that we have a very natural way of isolating things. Handling a single tenant is very easy, then, because we are actually handling a small number of users and a (relatively) small amount of information.

The challenge begins when we need to consider how to work with all the tenants. My general approach to that is actually to think upfront about the sort of things that would prevent me from scaling things, and avoid those, but not do anything else about it. Let us consider what I mean…

The easiest way to handle a multi tenant application is to create total separation between tenants. It is fairly easy to do; all you have to ensure is a separate data store (note that you really need to ensure separate caches, as well), and everything else more or less flows from there. In most multi tenant applications, there are four things that may vary between tenants:

  • Schema
  • Data
  • Behavior
  • User Interface

The first two are handled at the data store level (and are usually very easy), while the last two are just application code (be it a separate dll, tenant scripts, configuration, etc).

Given all of that, scaling multi tenant applications is usually a process that is handled by the load balancer. What you need is to specify which tenants are served by which servers, and to ensure that you don’t overload each server’s capacity. If you want to be really smart about it, you can do load based load balancing, but for the most part, even static routing that gives each server X number of tenants would work.
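The static routing described above can be as simple as a lookup table that the load balancer (or a routing tier in front of it) consults per request. A toy sketch, with hypothetical tenant and server names:

```csharp
using System;
using System.Collections.Generic;

// Toy static router: each tenant is pinned to exactly one server.
public class TenantRouter
{
    private readonly Dictionary<string, string> tenantToServer =
        new Dictionary<string, string>
        {
            { "northwind", "app-server-1" },
            { "contoso",   "app-server-1" },
            { "acme",      "app-server-2" },
        };

    public string ServerFor(string tenantId)
    {
        string server;
        if (tenantToServer.TryGetValue(tenantId, out server))
            return server;
        throw new ArgumentException("Unknown tenant: " + tenantId);
    }
}
```

Because every request for a given tenant lands on the same server, that server can keep the tenant’s cache purely in memory.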

This helps a lot because it means that for the most part, even though you are writing a distributed app that may have a very large number of total users, your actual application can behave as if it was a small application. That means that you can do things like in-memory caching, instead of distributed caching, because all of a tenant’s users are going to be served from the same server.

And what happens if you have a tenant with thousands of users?

At that point, doing this sort of trick won’t work, but presumably you are charging them enough to just allocate a single server for this tenant. In fact, assuming standard user based licensing, you have actually gotten to the point where you have, as we say in Israel, a Rich Man’s Problem. Even if they are big enough to require more than a single server, you can usually just split them across several; that is where the pre-planning for the scaling stoppers starts to pay off. But in all honesty, it doesn’t happen that often.


Data roles don’t scale up/down dynamically

Originally posted at 12/5/2010

One question that comes up frequently with regards to running RavenDB on the cloud is how to take advantage of the cloud’s ability to scale up & down dynamically. The answer to that is actually quite interesting.

You don’t.

That seems to give most people pause, because it is totally unexpected. On the cloud, people expect to be able to dynamically adjust the number of servers they have based on their current load. After all, this is what you do with web and worker roles, no?

The problem with that logic chain is that it assumes equality between a database and a web/worker role. For the most part, web/worker roles are pretty much stateless; they may have caches, but that is about it. That makes it very easy to add new servers when there is heavy load and remove them when the load goes down.

But for data roles, you can’t really do that. What is going to happen to the data in that node when you take it down when there is less work to be done?

There are actually solutions for that, to tell you the truth, at least for RavenDB, because we can manipulate offline databases very easily, so we can shuffle them off & on machines by just copying the document. But for the most part, it is actually too much work. Even for very large loads, a small number of sharded servers can more than keep up with your application, and while it is theoretically nice to have the ability to do so, you usually don’t care.


Querying relative information with RavenDB

Originally posted at 12/3/2010

Let us say that we have the following document:

{
  "Name": "Ayende",
  "LastScore": 239.2,
  "MaxScore": 392.6
}

And we want to find all documents whose last score is within a certain range from their max score. Note that for different queries, the range that I can use may be different.

RavenDB doesn’t offer the option of doing computation in the where clause. Mostly, that is because such computations are going to perform badly unless special care is taken to avoid that. Instead, we are going to create a computed field in the index.

First, we define:

from u in docs.Users
where u.MaxScore > 0
select new { Score = u.LastScore / u.MaxScore }

This computed field now allows us to query on it very easily. Moreover, when we query, we are still querying over pre-computed data, which is going to be blindingly fast.
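A query against that computed field might then look something like this (a sketch only: the index name "Users/ByRelativeScore", the User class, and the exact client API calls are my assumptions and may differ from the real RavenDB client):

```csharp
// Find users whose last score is at least 80% of their max score.
// "Score" is the computed field defined by the index above.
var closeToTheirBest = session.Advanced
    .LuceneQuery<User>("Users/ByRelativeScore")
    .WhereGreaterThanOrEqual("Score", 0.8)
    .ToList();
```

The 0.8 threshold can vary per query, which is exactly the point: the division was done once at indexing time, and each query just does a cheap range comparison over the precomputed values.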


NH Prof New Feature: Alert on bad ‘like’ query

Originally posted at 12/3/2010

One of the things coming to NH Prof is more smarts in the analysis part. We now intend to create a lot more alerts and guidance. One of the new features already there as part of this strategy is detecting bad ‘like’ queries.

For example, let us take a look at this

image

This is generally not a good idea, because that sort of query cannot use an index, and requires the database to perform a table scan, which can be pretty slow.
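For readers who cannot see the screenshots: the problematic pattern is a LIKE with a leading wildcard. An illustrative NHibernate query (my own example, not taken from the post; the User entity is hypothetical):

```csharp
// A leading wildcard defeats the index: an index on Name is ordered by
// prefix, so '%ayende%' forces the database to scan every row.
var users = session.CreateCriteria<User>()
    .Add(Restrictions.Like("Name", "%ayende%"))
    .List<User>();

// Generated SQL, roughly:
//   SELECT ... FROM Users WHERE Name LIKE '%ayende%'

// By contrast, a prefix-only pattern such as 'ayende%' can use the index.
```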

Here is how it looks from the query perspective:

image

And NH Prof (and all the other profilers) will now detect this and warn about it.

image

In fact, it will even detect queries like this:

image


Minimum Sellable Features

Originally posted at 12/3/2010

I also heard this term as “Minimum Viable Product”.

I use this term to describe the minimum set of features that we need before we can sell the product. This is something that is very important when you are talking about product development. Defining the point at which I can sell the product defines a lot of other aspects of the product.

It defines the release schedule, it defines the architecture and the process we use for building, it defines what is going to be in the Beta and in the 1.0 versions, and what is going to be deferred for vNext.

Note that Minimum Sellable Features is not the product roadmap. It is simply the point when you decide that charging money for the product is not going to be ripping off your customers. My personal preference is that the moment you reach this point, you start selling the product.

It is also very important to understand that Minimum Sellable Features has a minimum in the name. That is intentional, and descriptive. You want to make the list of required features to be as small as possible. The worst case is that your Minimum Sellable Features is also your Complete Features List. My rule of thumb is that Minimum Sellable Features should be about 5% of the Complete Features List.

Why is that?

Because that means that you only need enough money to get to the Minimum Sellable Feature point. Afterward, you should be able to generate cash flow and use that to continue development.

Why is this important?

It is all about feedback, it is much cheaper to get to Minimum Sellable Features and give customers the chance to tell us if they like it (by buying it) or why they don’t like it (via feedback).

What if no one buys this?

Great! That means that you didn’t have to burn as much money as you would have to discover that you are building something that no one would buy.

This is important, because you have to be very clear about your goals. My goals, when building a product, are:

  • Have fun
  • Make money

If I can’t make it do both, then there are other rules that apply:

  • Fun but no money - becomes an OSS project. At which point I can skip doing all the unpleasant things like documentation.
  • Money but not fun – outsource the project.
  • Not fun, no money – bye bye.

But the most important issue here is that your product gets into the hands of the users very early. There are many strategies for reducing the number of features that have to be in the Minimum Sellable Features list. For example, using a disabled UI button to show that this is an upcoming feature.

With Minimum Sellable Features, you usually sell the product at some discount, because it isn’t complete yet. You might be familiar with the idea as beta discount.

Defining your Minimum Sellable Features is a good way to get quickly to a state where you can actually get money from the users. It is important to be careful here and not ship something that is not functional, but assuming that you have something that is, the users’ feedback (and money) is invaluable.


Feature selection strategies for NH Prof

Originally posted at 12/3/2010

I recently had a discussion on how I select features for NH Prof. The simple answer is that I started with features that would appeal to me. My dirty little secret is that the only reason NH Prof even exists is that I wanted it so much and no one else had built it already.

But while that approach lasted for a good while, I eventually got to the point where NH Prof does everything that I need it to do. So, what next?

Feature selection is a complex topic, and it is usually performed in the dark, because you have to guess at what people are using. A while ago I set up NH Prof so I can get usage reports (they are fully anonymous, and were covered on this blog previously). Those usage reports come in very handy when I need to understand how people are using NH Prof. Think about it like a user study, but without the cost, and without the artificial environment.

Here are the (real) numbers for NH Prof:

| Action | % | What it means |
|---|---|---|
| Selection | 62.76% | Selecting a statement |
| Session-Statements | 20.58% | Looking at a particular session’s statements |
| Recent-Statements | 8.67% | The recent statements (default view) |
| Unique-Queries | 2.73% | The unique queries report |
| Listening-Toggle | 1.10% | Stop / start listening to connections |
| Session-Usage | 0.91% | Showing the session usage tab for a session |
| Session-Entities | 0.54% | Looking at the loaded entities in a session |
| Query-Execute | 0.50% | Show the results of a query |
| Connections-Edit | 0.38% | Editing a connection string |
| Queries-By-Method | 0.34% | The queries by method report |
| Queries-By-Url | 0.27% | The queries by URL report |
| Overall-Usage | 0.25% | The overall usage report |
| Show-Settings | 0.23% | Show settings |
| Aggregate-Sessions | 0.21% | Selecting more than 1 session |
| Reports-Queries-Expensive | 0.16% | The expensive queries report |
| Session-Remove | 0.13% | Remove a session |
| Queries-By-Isolation-Level | 0.08% | The queries by isolation level report |
| File-Load | 0.04% | Load a saved session |
| File-Save | 0.03% | Save a session |
| Html-Export | 0.02% | Exporting to HTML |
| Sessions-Diff | 0.01% | Diffing two sessions |
| Sort-By-ShortSql | 0.01% | Sort by SQL |
| Session-Rename | 0.01% | Rename a session |
| Sort-By-Duration | 0.01% | Sort by duration |
| Sort-By-RowCount | > 0.00% | Sort by row count |
| GoToSession | > 0.00% | Go from report to statement’s session |
| Sort-By-AvgDuration | > 0.00% | Sort by duration (in reports) |
| Production-Connect | > 0.00% | (Not publicly available) Connect to production server |
| Sort-By-QueryCount | > 0.00% | Sort by query count (in reports) |
| Sort-By-Alerts | > 0.00% | Sort by alerts (for statements) |
| Sort-By-Count | > 0.00% | Sort by row count |

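The percentages above are just each action’s share of the total event count. As a minimal sketch (in Python, with made-up action names and numbers; this is not NH Prof’s actual reporting code), the aggregation looks like this:

```python
from collections import Counter

def usage_breakdown(events):
    """Aggregate anonymous usage events into a percentage breakdown,
    sorted from most used to least used (like the table above)."""
    counts = Counter(events)
    total = sum(counts.values())
    return [(action, round(100.0 * count / total, 2))
            for action, count in counts.most_common()]

# Illustrative event stream (hypothetical counts, not real data):
events = (["Selection"] * 63 + ["Session-Statements"] * 21
          + ["Recent-Statements"] * 9 + ["Unique-Queries"] * 3)
breakdown = usage_breakdown(events)
for action, pct in breakdown:
    print(f"{action}: {pct}%")
```

Because each row is a fraction of the same total, the column always sums to (roughly) 100%, which is why rare features show up as “> 0.00%” once rounded.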
There is nothing really earth shattering here: by far, people are using NH Prof as a tool to show them the SQL. Note how most of the other features are used far more rarely. This doesn’t mean that they are not valuable, but it does show that a feature that isn’t immediately available on the “show me the SQL” usage path is going to be used very rarely.

There is another aspect to feature selection: will this feature increase my software sales?

Some features are Must Have: your users won’t buy the product without them. Some features are Nice To Have, with no impact on the buy / no buy decision. And some features are very effective in driving sales.

In general, there is a balancing act between how complex a feature is, how often people will use it, and how useful it would be in marketing.

I learned quickly that having better analysis (alerts) is a good competitive advantage, which is why I optimized the hell out of that development process. In contrast, things like reports are much less interesting, because once you have the Must Have ones, adding more doesn’t seem to be an effective way of going about things.

And then, of course, there are the features whose absence annoys me…
