Code Data Mining

time to read 2 min | 298 words

I just wrote this piece of code:

class ExpressionInserterVisitor : DepthFirstVisitor
{
    public override bool Visit(Node node)
    {
        using(var con = new SqlConnection("data source=localhost;Initial Catalog=Test;Trusted_Connection=yes"))
        using (var command = con.CreateCommand())
        {
            con.Open();
            command.CommandText = "INSERT INTO Expressions (Expression) VALUES(@P1)";
            command.Parameters.AddWithValue("@P1", node.ToString());
            command.ExecuteNonQuery();
        }
        Console.WriteLine(node);
        return base.Visit(node);
    }
}

As you can imagine, this is disposable code, but why did I write that?

I run this code on the entire DSL code base that I have, and then started applying metrics to it. In particular, I was interested in trying to find repeated concepts that has not been codified.

For example, if this would have shown 7 uses of:

user.IsPreferred and order.Total > 500 and (order.PaymentMethod is Cash or not user.IsHighRisk)

Then this is a good indication that I have a business concept waiting to be discovered here, and I turn that into a part of my language:

IsGoodDealForVendor (or something like that)

Here we aren't interested in the usual code quality metrics, we are interested in business quality metrics :-) And the results were, to say the least, impressive.