Ayende @ Rahien

It's a girl

It really happened, legacy programmers’ tales

Fairy tales always start with “Once upon a time”, and programmers’ tales start with “when I was at a client”…

Two days ago I was at a client, and the discussion turned to bad code bases, as it often does. One story that I had a hard time understanding was the Super If.

Basically, it looked like this:

image

I had a hard time accepting that someone could write an if condition that long. I kept assuming that they meant that the if statements were 50 lines long, but that wasn’t the case.

And then yesterday I had an even more horrifying story. A WCF service making a call to the database always timed out on the first request, but worked afterward. What would be your first suspicion? Mine was that it took time to establish the database connection, and that after the first call the connection resided in the connection pool.

They laughed at my naivety, for it wasn’t connecting to the database that caused the timeout, it was JITting the method that the WCF service ended up calling.

Yep, you got that right: JITting a single method (the runtime only JITs a single method at a time). I had an even harder time believing that, until they explained to me how that method was built:

image

Some interesting stats:

  • It had a Cyclomatic Complexity of either 4,000 or 8,000, the client couldn’t remember.
  • The entire Rhino Mocks codebase fits in 13,000 LOC, so this single method could contain it several times over.
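
If you ever run into this yourself, one common mitigation is to force the JIT to compile the problematic method during startup, rather than on the first request. It doesn’t make the method any smaller, it just moves the cost out of the request path. A minimal sketch (the service type and method name below are hypothetical, not from the client’s system):

using System;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class Warmup
{
    // Ask the runtime to JIT a method now, instead of on its first call.
    public static void PreJit(Type type, string methodName)
    {
        MethodInfo method = type.GetMethod(methodName,
            BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.Static);

        if (method != null && !method.IsAbstract && !method.ContainsGenericParameters)
            RuntimeHelpers.PrepareMethod(method.MethodHandle);
    }
}

// e.g. during service host startup (MyService / Execute are made-up names):
// Warmup.PreJit(typeof(MyService), "Execute");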

But you know what the really scary part is?

I upgraded from the Super If to Black Hole Methods, and I am afraid to see what happens today, because if I get something that tops the Black Hole Method, I may have to hand back my keyboard and go raise olives.

Entity != Table

I recently had a chance to work on an interesting project, doing a POC of moving from a relational model to RavenDB. And one of the most interesting hurdles along the way wasn’t technical at all; it was trying to decide what an entity is. We are so used to making the assumption that Entity == Table that we have started to associate the two. With a document database, an entity is a document, and that maps much more closely to an aggregate root than to an RDBMS entity.

That gets very interesting when we start looking at tables and having to decide if they represent data that is standalone (and therefore deserves to live in separate documents) or whether they should be embedded in the parent document. That led to a very interesting discussion on each table. What I found remarkable is that it was partly a discussion that seemed to come directly from the DDD book, about aggregate roots, responsibilities and the abstract definition of an entity, and partly a discussion that focused on meeting the different modeling requirements of a document database.
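
To make that concrete, here is the shape of the decision we kept making, sketched with made-up Order/Customer classes (not the actual project’s model): order lines have no life of their own, so they are embedded in the order document, while customers are referenced by id and live as their own documents.

using System;
using System.Collections.Generic;

// One document: the order is the aggregate root, and its lines live inside it.
public class Order
{
    public string Id { get; set; }              // e.g. "orders/1234"
    public string CustomerId { get; set; }      // reference to a separate document
    public DateTime CreatedAt { get; set; }
    public List<OrderLine> Lines { get; set; }  // embedded, not a separate "table"
}

public class OrderLine
{
    public string Product { get; set; }
    public int Quantity { get; set; }
    public decimal Price { get; set; }
}

// A separate document: customers are referenced from many orders
// and have a life cycle of their own.
public class Customer
{
    public string Id { get; set; }              // e.g. "customers/42"
    public string Name { get; set; }
}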

I think that we did a good job, but I most valued the discussion and the insight. What was most interesting to me was how right RavenDB was for the problem set, because a whole range of issues just went away when we started to move the model over.

I ain’t going against my professional judgment pro bono

I had an interesting conversation with a guy about some problem he was having. This was just one of those “out of the blue” contacts that happen, when someone contacts me to ask a question. He presented a problem that I see all too often, trying to create a system in which the entities are doing everything, and he ran into problems with that (to be fair, he ran into a unique set of problems with that). I gave him a list of blog posts and articles to read, suggesting the right path to go down. After a few days, he replied with:

I went over your advised reading in depth, but let me describe in short the properties and functions of our system, which I think causes the system to be an exception to those methods.

He then proceeded to lay out his problem and a proposed solution, and then asked a very specific NHibernate question that was the stumbling block to getting ahead with the solution he wanted. My reply was that he had taken the wrong approach, a suggestion for how to resolve it in a different manner, and a link to our NHibernate commercial support option.

Database assisted denormalization – Oracle edition

I decided to take a chance (installing Oracle is a big leap :-) ) and see how things match in Oracle.

I decided to run the following query:

SELECT deptno, 
       dname, 
       loc, 
       (SELECT COUNT(*) 
        FROM   emp 
        WHERE  emp.deptno = dept.deptno) AS empcount 
FROM   dept 
WHERE  deptno = 20 

Please note that I ran it on a database that had maybe 100 records in total, so the results may be skewed.

image

Like in the SQL Server case, we need to create an index on the FK column. I did so, after which I got:

image

Then I dropped that index and created a simple view:

CREATE VIEW depswithempcount 
AS 
  SELECT deptno, 
         dname, 
         loc, 
         (SELECT COUNT(*) 
          FROM   emp 
          WHERE  emp.deptno = dept.deptno) AS empcount 
  FROM   dept 

Querying on top of that gives me the same query plan as before. Trying to create a materialized view out of this fails because of the subquery expression; I’ll have to express the view in terms of joins instead, like this:

SELECT dept.deptno, 
       dname, 
       loc, 
       COUNT(*) empcount 
FROM   dept 
       LEFT JOIN emp 
         ON dept.deptno = emp.deptno 
WHERE  dept.deptno = 20 
GROUP  BY dept.deptno, 
          dname, 
          loc 

Interestingly enough, this is a different query plan than the subquery version; with SQL Server, those two queries exhibit identical query plans.

image

Now, to turn that into a materialized view.

CREATE materialized VIEW deptwithempcount 
AS SELECT dept.deptno, 
          dname, 
          loc, 
          COUNT(*) empcount 
   FROM   dept 
          left join emp 
            ON dept.deptno = emp.deptno 
   GROUP  BY dept.deptno, 
             dname, 
             loc 

And querying on this gives us very interesting results:

select * from deptwithempcount 
where deptno = 20

image

Unlike SQL Server, we can see that Oracle is reading everything from the view. But let us try one more thing, before we conclude this with a victory.

update emp 
set deptno = 10
where deptno = 20;

select * from deptwithempcount 
where deptno = 20

But now, when we re-run the materialized view query, we see the results as they were at the creation of the view.

There appears to be a set of options to control that, but the one that I want (REFRESH FAST), which updates the view as soon as the data changes, will not work with this query, since Oracle considers it too complex. I didn’t investigate too deeply, but it seems that this is another dead end.

The Profiler New Features: Starring & Renaming

An interesting thing happened recently. When I started to build the profiler, a lot of the features were what I call Core Features, the things without which we wouldn’t have a product: detecting SQL, merging it into sessions, providing reports, etc. What I find myself doing recently with the profiler is not so much building Core Features, but building UX features. In other words, now that we have this in place, let us see how we can make better use of it.

Case in point, the new features that were just released in build 713. They aren’t big, but they are there to improve how people commonly use the product.

Renaming a session:

image

This is primarily useful if you are in a long profiling session and you want to mark a specific session with some notation:

image

Small feature, and individually not very useful. But you might have noticed that the sessions are marked with stars around them. They weren’t there in previous builds, so what are they?

image

They are a way to tell the profiler that you really like those sessions :-)

More to the point, such sessions will not be removed when you clear the current state. That lets you keep the previous state of the application around as a baseline while you work to improve it. Besides, it makes it much easier to locate them visually.

And finally, as a quicker way to do that, you can just ask the profiler to clear all but the starred sessions.

image

Not big features, but nice ones, I think.

LightSwitch on the wire

This is going to be my last LightSwitch post for a while.

I wanted to talk about something that I found which was at once both very surprising and a “Doh!” moment.

Take a look here:

image

What you don’t know is this was generated from a request similar to this one:

wget http://localhost:22940/Services/LSTest-Implementation-ApplicationDataDomainService.svc/binary/AnimalsSet_All?$orderby=it.Id&$take=45&$includeTotalCount=

What made me choke was that the size of the response for this was 2.3 MB.

Can you guess why?

The image took up most of the data, obviously. In fact, I just dropped an image from my camera, so it was a pretty big one.

And that leads to another problem. It is obviously a really bad idea to send that image over the wire all the time, but LightSwitch makes it so easy; indeed, even after I noticed the size of the response, it took me a while to understand what exactly was causing the issue.

And there doesn’t seem to be any easy way to tell LightSwitch that we want to keep the property here, but only load it in certain circumstances. For that matter, I would generally want to make the image accessible via HTTP, which means that I gain advantages such as parallel downloads, caching, etc.

But there doesn’t seem to be any (obvious) way to do something as simple as binding a property to an Image control’s Url property.
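
What I have in mind is something along these lines: serve the picture from a plain HTTP endpoint and let the Image control point at its URL, so the client gets caching and parallel downloads for free. A rough sketch of the server side (AnimalImageHandler and its data access are made up; LightSwitch doesn’t offer anything like this out of the box):

using System;
using System.Web;

public class AnimalImageHandler : IHttpHandler
{
    public bool IsReusable
    {
        get { return true; }
    }

    public void ProcessRequest(HttpContext context)
    {
        int id = int.Parse(context.Request.QueryString["id"]);

        byte[] pic = LoadPicture(id); // data access omitted, depends on the model
        context.Response.ContentType = "image/jpeg";
        context.Response.Cache.SetCacheability(HttpCacheability.Public);
        context.Response.BinaryWrite(pic);
    }

    private static byte[] LoadPicture(int id)
    {
        // hypothetical: load the Pic column for the given animal id
        throw new NotImplementedException();
    }
}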

LightSwitch & Source Control

Something that I have found many high-level tools to be really bad at is source control, so I thought that I would give LightSwitch a chance there.

I created a Git repository and shoved everything into it, then I decided that I would rename a property and see what is going on.

I changed the Animals.Species to Animals.AnimalType, which gives me:

image

This is precisely what I wanted to see.

Let us see what happens when I add a new table. That created a new set in the ApplicationDefinition.lsml file.

Overall, this is much better than I feared.

I am still concerned about having everything in a single file (which is a recipe for a lot of merge conflicts), but at least you can diff & work with it, assuming that you know how the file format works, and it seems like it is at least a semi-reasonable one.

Nevertheless, as promised:

True story, I used to have a lot of ravens in my backyard, but they seem to have gone away since my dog killed one of them, about a week after RavenDB’s launch.

Analyzing LightSwitch data access behavior

I thought it would be a good idea to see what sort of data access behavior LightSwitch applications have. So I hooked it up with the Entity Framework Profiler and took it for a spin.

It is interesting to note that every operation seems to be running in the context of a distributed transaction:

image

There is a time & place to use DTC, but in general, you should avoid it until you really need it. I assume that this is actually being triggered by WCF behavior, and isn’t intentional.

Now, let us look at what a simple search looks like:

image

This search results in:

image

That sound? Yes, the one that you just heard. That is the sound of a DBA somewhere expiring. The presentation about LightSwitch touted how you can search every field. And you certainly can. You can also swim across the English channel, but I found that taking the train seems to be an easier way to go about doing this.

Doing this sort of searching is going to:

  • Be very expensive once you have any reasonable amount of data.
  • Prevent the use of indexes to optimize performance.

In other words, this is an extremely brute force approach, and it is going to be pretty bad from a performance perspective.

Interestingly, it seems that LS is using optimistic concurrency by default.

image

I wonder why they use the slowest method possible for this, instead of using version numbers.
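
For contrast, this is roughly what a version number approach looks like when you control the model yourself, for example with Entity Framework code first (this isn’t something LightSwitch exposes, and the Animal class here is just illustrative):

using System.ComponentModel.DataAnnotations;

public class Animal
{
    public int Id { get; set; }
    public string Name { get; set; }

    // A single rowversion column used as the concurrency token means the
    // UPDATE only needs to check Id + Version, instead of comparing every column.
    [Timestamp]
    public byte[] Version { get; set; }
}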

Now, let us see how it handles references. I think I ran into something that is a problem; consider:

image

Which generates:

image

This makes sense only if you think in terms of the underlying data model. It certainly seems backward to me.

I fixed that, and created four animals, each as the parent of the other:

image

Which is nice, except that here is the SQL required to generate this screen:

-- statement #1
SELECT [GroupBy1].[A1] AS [C1]
FROM   (SELECT COUNT(1) AS [A1]
        FROM   [dbo].[AnimalsSet] AS [Extent1]) AS [GroupBy1]

-- statement #2
SELECT   TOP ( 45 ) [Extent1].[Id]              AS [Id],
                    [Extent1].[Name]            AS [Name],
                    [Extent1].[DateOfBirth]     AS [DateOfBirth],
                    [Extent1].[Species]         AS [Species],
                    [Extent1].[Color]           AS [Color],
                    [Extent1].[Pic]             AS [Pic],
                    [Extent1].[Animals_Animals] AS [Animals_Animals]
FROM     (SELECT [Extent1].[Id]                      AS [Id],
                 [Extent1].[Name]                    AS [Name],
                 [Extent1].[DateOfBirth]             AS [DateOfBirth],
                 [Extent1].[Species]                 AS [Species],
                 [Extent1].[Color]                   AS [Color],
                 [Extent1].[Pic]                     AS [Pic],
                 [Extent1].[Animals_Animals]         AS [Animals_Animals],
                 row_number()
                   OVER(ORDER BY [Extent1].[Id] ASC) AS [row_number]
          FROM   [dbo].[AnimalsSet] AS [Extent1]) AS [Extent1]
WHERE    [Extent1].[row_number] > 0
ORDER BY [Extent1].[Id] ASC

-- statement #3
SELECT [Extent1].[Id]              AS [Id],
       [Extent1].[Name]            AS [Name],
       [Extent1].[DateOfBirth]     AS [DateOfBirth],
       [Extent1].[Species]         AS [Species],
       [Extent1].[Color]           AS [Color],
       [Extent1].[Pic]             AS [Pic],
       [Extent1].[Animals_Animals] AS [Animals_Animals]
FROM   [dbo].[AnimalsSet] AS [Extent1]
WHERE  1 = [Extent1].[Id]

-- statement #4
SELECT [Extent1].[Id]              AS [Id],
       [Extent1].[Name]            AS [Name],
       [Extent1].[DateOfBirth]     AS [DateOfBirth],
       [Extent1].[Species]         AS [Species],
       [Extent1].[Color]           AS [Color],
       [Extent1].[Pic]             AS [Pic],
       [Extent1].[Animals_Animals] AS [Animals_Animals]
FROM   [dbo].[AnimalsSet] AS [Extent1]
WHERE  2 = [Extent1].[Id]

-- statement #5
SELECT [Extent1].[Id]              AS [Id],
       [Extent1].[Name]            AS [Name],
       [Extent1].[DateOfBirth]     AS [DateOfBirth],
       [Extent1].[Species]         AS [Species],
       [Extent1].[Color]           AS [Color],
       [Extent1].[Pic]             AS [Pic],
       [Extent1].[Animals_Animals] AS [Animals_Animals]
FROM   [dbo].[AnimalsSet] AS [Extent1]
WHERE  3 = [Extent1].[Id]

I told you that there is a Select N+1 built into the product, now didn’t I?

Now, to make things just that much worse, it isn’t actually a Select N+1 that you’ll easily recognize, because it doesn’t happen in a single request. Instead, we have a multi-tier Select N+1.

image

What is actually happening is that in this case, we make the first request to get the data, then we make an additional web request per returned result to get the data about the parent.

And I think that you’ll have to admit that a Parent->>Children association isn’t something that is out of the ordinary. In a typical system, where you may have many associations, this “feature” alone is going to slow the system to a crawl.
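
For reference, in hand-written Entity Framework code the usual fix for this class of problem is eager loading, so the parent rows come back as part of the original query instead of an additional request per row. A sketch (the ModelContext type and the Parent navigation property are made up for the example):

using System;
using System.Linq;

static void LoadAnimalsWithParents()
{
    // Eager-load the parent so it arrives in the same query,
    // instead of an additional web request per returned row.
    using (var context = new ModelContext()) // hypothetical EF context
    {
        var animals = context.Animals
            .Include("Parent")               // hypothetical navigation property
            .OrderBy(a => a.Id)
            .Take(45)
            .ToList();

        Console.WriteLine(animals.Count);
    }
}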

Profiling LightSwitch using Entity Framework Profiler

This post is to help everyone who wants to understand what LightSwitch is going to do under the covers. It allows you to see exactly what is going on with the database interaction, using Entity Framework Profiler.

In your LightSwitch application, switch to file view:

image

In the server project, add a reference to HibernatingRhinos.Profiler.Appender.v4.0, which you can find in the EF Prof download.

image

Open the ApplicationDataService file inside the UserCode directory:

image

Add a static constructor with a call to initialize the entity framework profiler:

public partial class ApplicationDataService
{
    static ApplicationDataService()
    {
        HibernatingRhinos.Profiler.Appender.EntityFramework.EntityFrameworkProfiler.Initialize();
    }
}

This is it!

You’re now able to work with the Entity Framework Profiler and see what sort of queries are being generated on your behalf.

image

Tags:

Published at

LightSwitch: Initial thoughts

As promised, I intend to spend some time today with LightSwitch, and see how it works. Expect a series of posts on the topic. In order to make this a real scenario, I decided that a simple app recording animals and their feeding schedule would be appropriate.

I created the following table:

image

Note that it has a calculated field, which is computed using:

image

There are several things to note here:

  • ReSharper doesn’t work with LightSwitch, which is a big minus to me.
  • The decision to use partial methods had resulted in really ugly code.
  • Why is the class called Animals? I would expect to find an inflector at work here.
  • Yes, the actual calculation is crap, I know.

This error kept appearing at random:

image

It appears to be a known issue, but it is incredibly annoying.

This is actually really interesting:

image

  • You can’t really work with the app unless you are running in debug mode. That isn’t the way I usually work, so it is a bit annoying.
  • More importantly, it confirms that this is indeed KittyHawk, which was a secret project at the 2008 MVP Summit that had some hilarious aspects.

There is something else that is really interesting: it takes roughly 5 – 10 seconds to start a LS application. That is a huge amount of time. I am guessing, but I would say that a lot of that is because the entire UI is built dynamically from the data source.

That would be problematic, but acceptable, except that it takes seconds to load data even after the app has been running for a while. For example, take a look here:

image

This is running on a quad core, 8 GB machine, in 2-tier mode. It takes about 1 – 2 seconds to load each screen. I was actually able to capture a screen halfway loaded. Yes, it is a beta, I know. Yes, perf probably isn’t a priority yet, but that is still worrying.

Another issue is that Visual Studio is very slow, busy about 50% of the time, whether the LS app is running or not. As a side issue, it is hard to know if the problem is with LS or with VS, because of all the problems that VS has normally.

image

As an example of that, this is me trying to open the UserCode directory; it took about 10 seconds to do so.

What I like about LS is that getting to a working CRUD sample is very quick. But the problems there are pretty big, even at a cursory examination. More detailed posts touching each topic are coming shortly.

Runtime code compilation & collectible assemblies are no go

The problem is quite simple: I want to be able to support certain operations in Raven. In order to support those operations, the user needs to be able to submit a Linq query to the server. In order to allow this, we need to accept a string, compile it and run it.

So far, it is pretty simple. The problem begins when you consider that assemblies can’t be unloaded. I was very hopeful when I learned about collectible assemblies in .NET 4.0, but they focus exclusively on assemblies generated from System.Reflection.Emit, while my scenario is compiling code on the fly (so I invoke the C# compiler to generate an assembly, then use that).
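
For reference, this is roughly what the “compile a string on the fly” path looks like; a sketch with error handling omitted, but the important point is that every call produces a brand new assembly that can never be unloaded from the current AppDomain:

using System;
using System.CodeDom.Compiler;
using Microsoft.CSharp;

class CompileOnTheFly
{
    static void Main()
    {
        var provider = new CSharpCodeProvider();
        var parameters = new CompilerParameters { GenerateInMemory = true };

        // Each call to CompileAssemblyFromSource loads another assembly into
        // this AppDomain, and there is no way to get rid of it afterward.
        CompilerResults results = provider.CompileAssemblyFromSource(parameters,
            "public static class UserQuery { public static int Run() { return 42; } }");

        var type = results.CompiledAssembly.GetType("UserQuery");
        var answer = (int)type.GetMethod("Run").Invoke(null, null);
        Console.WriteLine(answer);
    }
}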

Collectible assemblies don’t help in this case. Maybe, in C# 5.0, the compiler will use SRE, which will help, but I don’t hold much hope there. I also checked out the Mono.CSharp assembly, hoping that maybe it could do what I wanted, but it suffers from the same memory leak as well.

So I turned to the one solution that I knew would work: generating those assemblies in another app domain, and unloading that app domain when it became too full. I kept thinking that I couldn’t do that because of the slowdown of cross app domain communication, but then I figured that I was violating one of the first rules of performance: you don’t know until you measure it. So I set out to test it.

I am only interested in testing the speed of cross app domain communication, not anything else, so here is my test case:

public class RemoteTransformer : MarshalByRefObject
{
    private readonly Transformer transformer = new Transformer();

    public JObject Transform(JObject o)
    {
        return transformer.Transform(o);
    }
}

public class Transformer
{
    public JObject Transform(JObject o)
    {
        o["Modified"] = new JValue(true);
        return o;
    }
}

Running things in the same app domain (base line):

static void Main(string[] args)
{
    var t = new RemoteTransformer();
    
    var startNew = Stopwatch.StartNew();

    for (int i = 0; i < 100000; i++)
    {
        var jobj = new JObject(new JProperty("Hello", "There"));

        t.Transform(jobj);

    }

    Console.WriteLine(startNew.ElapsedMilliseconds);
}

This consistently gives results under 200 ms (185ms, 196ms, etc). In other words, we are talking about over 500 operations per millisecond.

What happens when we do this over an AppDomain boundary? The first problem I ran into was that the Json objects weren’t serializable, but that was easy to fix. Here is the code:

static void Main(string[] args)
{
    var appDomain = AppDomain.CreateDomain("remote");
    var t = (RemoteTransformer)appDomain.CreateInstanceAndUnwrap(typeof(RemoteTransformer).Assembly.FullName, typeof(RemoteTransformer).FullName);

    var startNew = Stopwatch.StartNew();

    for (int i = 0; i < 100000; i++)
    {
        var jobj = new JObject(new JProperty("Hello", "There"));

        t.Transform(jobj);
    }

    Console.WriteLine(startNew.ElapsedMilliseconds);
}

And that ran for close to 8 seconds (7,871 ms). That is over 40 times slower, or just about 12 operations per millisecond.

To give you some indication about the timing, this means that an operation over 1 million documents would spend about 1.3 minutes just serializing data across app domains.

That is… long, but it might be acceptable. I need to think about this more.

Database assisted denormalization

Let us say that I have the homepage of the application, where we display Blogs with their Post count, using the following query:

select 
    dbo.Blogs.Id, 
    dbo.Blogs.Title,
    dbo.Blogs.Subtitle,
    (select COUNT(*) from Posts where Posts.BlogId = Blogs.Id) as PostCount
 from dbo.Blogs 

Given what I think about denormalization, and read vs. write costs, it seems a little wasteful to run the aggregate all the time.

I can always add a PostCount column to the Blogs table, but that would require me to manage it myself, and I thought that I might see whether the database can do it for me.

This isn’t a conclusive post; it details what I tried and what I think is happening, but it isn’t the end-all be-all. Moreover, I ran my tests on SQL Server 2008 R2 only, not on anything else. I would like to hear what you think of this.

My first thought was to create this as a persisted computed column:

ALTER TABLE Blogs
ADD PostCount AS (select COUNT(*) from Posts where Posts.BlogId = Blogs.Id) PERSISTED

But you can’t create computed columns that use subqueries. I could more easily understand why not if it were only for persisted computed columns, because that would give the database a hell of a time figuring out when that computed column needs to be updated, but I am actually surprised that normal computed columns don’t support subqueries.

Given that my first attempt failed, I decided to try to create a materialized view for the data that I needed. Materialized views in SQL Server are called indexed views, and there are several things to note here. You can’t use subqueries here either (likely because the DB couldn’t figure out which row in the index to update if you were using subqueries); you have to use joins.

I created a data set of 1,048,576 rows in the blogs table and 20,971,520 posts, which I think should be enough to give me real data.

Then, I issued the following query:

select 
        dbo.Blogs.Id, 
        dbo.Blogs.Title,
        dbo.Blogs.Subtitle,
        count_big(*) as PostCount
from dbo.Blogs left join dbo.Posts
        on dbo.Blogs.Id = dbo.Posts.BlogId
where dbo.Blogs.Id = 365819
group by dbo.Blogs.Id,
        dbo.Blogs.Title,
        dbo.Blogs.Subtitle

This is before I created anything, just to give me some idea about what kind of performance (and query plan) I can expect.

Query duration: 13 seconds.

And the execution plan:

image

The index suggestion feature is one of the best reasons to move to SSMS 2008, in my opinion.

Following the suggestion, I created:

CREATE NONCLUSTERED INDEX [IDX_Posts_ByBlogID]
ON [dbo].[Posts] ([BlogId])

And then I reissued the query. It completed in 0 seconds with the following execution plan:

image

After building Raven, I have a much better understanding of how databases operate internally, and I can completely follow how the introduction of this index can change the game for this query.

Just to point out, the results of this query are:

Id          Title                 Subtitle               PostCount
----------- --------------------- ---------------------- --------------------
365819      The lazy blog         hibernating in summer  1310720

I decided to see what using a view (and then indexed view) will give me. I dropped the IDX_Posts_ByBlogID index and created the following view:

CREATE VIEW BlogsWithPostCount 
WITH SCHEMABINDING
AS 
select 
    dbo.Blogs.Id, 
    dbo.Blogs.Title,
    dbo.Blogs.Subtitle,
    count_big(*) as PostCount
 from dbo.Blogs join dbo.Posts
    on dbo.Blogs.Id = dbo.Posts.BlogId
 group by dbo.Blogs.Id,
    dbo.Blogs.Title,
    dbo.Blogs.Subtitle

After which I issued the following query:

select 
        Id, 
        Title,
        Subtitle,
        PostCount
from BlogsWithPostCount
where Id = 365819

This had the exact same behavior as the first query (13 seconds and the suggestion for adding the index).

I then added the following index to the view:

CREATE UNIQUE CLUSTERED INDEX IDX_BlogsWithPostCount
ON BlogsWithPostCount (Id)

And then reissued the same query on the view. It had absolutely no effect on the query (13 seconds and the suggestion for adding the index). This makes sense, if you understand how the database actually treats this.

The database just created an index on the results of the view, but it only indexed the columns that we told it about, which means that it still needs to compute the PostCount. To make things more interesting, you can’t add the PostCount to the index (which would save the need to recalculate it).

Some points that are worth talking about:

  • Adding the IDX_Posts_ByBlogID index resulted in a significant speed increase
  • There doesn’t seem to be a good way to perform materialization of the query in the database (this applies to SQL Server only, mind you, maybe Oracle does better here, I am not sure).

In other words, the best solution that I have for this is to either accept the per-read cost on the RDBMS and mitigate it with proper indexes, or create a PostCount column in the Blogs table and manage it yourself. I would like your critique of my attempt, and additional information about whether what I am trying to do is possible in other RDBMSs.
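
For completeness, the “manage it yourself” option just means that every code path that adds or removes a post also touches the counter. A trivial sketch (the entity classes are the obvious ones, not tied to any particular ORM):

using System.Collections.Generic;

public class Blog
{
    public int Id { get; set; }
    public string Title { get; set; }
    public int PostCount { get; set; }         // the denormalized column
    public IList<Post> Posts { get; set; }
}

public class Post
{
    public int Id { get; set; }
    public Blog Blog { get; set; }
}

public static class BlogPosts
{
    // Every write path that adds a post must also update the counter,
    // which is exactly the maintenance burden I was hoping to push to the database.
    public static void AddPost(Blog blog, Post post)
    {
        post.Blog = blog;
        blog.Posts.Add(post);
        blog.PostCount += 1;
    }
}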

Finding Chrome bugs

That one was annoying to figure out. Take a look at the following code:

static void Main(string[] args)
{
    var listener = new HttpListener();
    listener.Prefixes.Add("http://+:8080/");
    listener.Start();

    Console.WriteLine("Started");

    while(true)
    {
        var context = listener.GetContext();
        context.Response.Headers["Content-Encoding"] = "deflate";
        context.Response.ContentType = "application/json";
        using(var gzip = new DeflateStream(context.Response.OutputStream, CompressionMode.Compress))
        using(var writer = new StreamWriter(gzip, Encoding.UTF8))
        {
            writer.Write("{\"CountOfIndexes\":1,\"ApproximateTaskCount\":0,\"CountOfDocuments\":0}");
            writer.Flush();
            gzip.Flush();
        }
        context.Response.Close();
    }
}

Firefox and IE have no trouble using this. But here is how it looks in Chrome.

image

To make matters worse, pay attention to the conditions of the bug:

  • If I use Gzip instead of deflate, it works.
  • If I use “text/plain” instead of “application/json”, it works.
  • If I tunnel this through Fiddler, it works.

I hate stupid bugs like that.

Hunt the bug

The following code will throw under certain circumstances, what are they?

public class Global : HttpApplication
{
    public void Application_Start(object sender, EventArgs e)
    {
        HttpUtility.UrlEncode("Error inside!");
    }
}

Hint: the exception will not be raised because of transient conditions such as low memory.

What are the conditions in which it would throw, and why?

Hint #2: I had to write my own HttpUtility (well, take the one from Mono and modify it) to avoid this bug.

ARGH!

Application databases and external integration points

Dave has an interesting set of requirements in his project:

We're not in control of where the data is located, how it's stored and in what configuration. In most cases employees need to be retrieved from a Active Directory (There's is no 'login', the Window Identity determines what a user can or can't do). Customer contacts are usually handled by the helpdesk department and each contact moment is logged in a helpdesk database. The customer (account information) itself often needs to be retrieved from an IBM DB2 database.

What you have is not one application that needs to access different data sources. That would be the wrong way to think about this, because it introduces a whole lot of complexity into the application.

image

It is much better to structure the application as an independent application with each integration point made explicit. Instead of touching the DB2 database, you put a service in front of it and access that.

image

This isn’t just “oh, SOA nonsense again”, it is an important distinction. When you tie yourself directly to so many external integration points, you are also ensuring that whenever there is a change in one of them, you are going to be impacted. When you put a service boundary between you and the integration point (even if you have to build the service yourself), the effect is much less noticeable.
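
To make that concrete, the application ends up depending on small, explicit contracts instead of on foreign schemas. Something along these lines (the names are illustrative, not from Dave’s system):

// The application only knows about this contract; the service implementation
// is the one place that talks to the DB2 database (or the helpdesk database,
// or Active Directory) and can absorb changes in those systems.
public class CustomerAccount
{
    public string AccountNumber { get; set; }
    public string Name { get; set; }
}

public interface ICustomerAccountService
{
    CustomerAccount GetAccount(string accountNumber);
}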

Also, did you notice the blue lines going from the databases? Those are background ETL processes, replicating data to/from the databases. They allow us to handle situations where the integration points are not available.

In short, design your application so it doesn’t stick its nose into other people’s databases. If you need data from another database, put a service there, or replicate it. You’ll thank me when your app stays up.

NH Prof & usage data

There seems to be some suspicion about the usage data from NH Prof that I published recently.

I would like to apologize for responding late to the comments. I know that there are some people who believe that I have a 3G chip installed directly in my head, but I was actually busy in the real world and didn’t look at my email until recently. The blog runs on auto pilot just so I’ll be able to do that, but sometimes it does give the wrong impression.

So, what does NH Prof “phone home” about?

Well, the data is actually divided into two distinct pieces. Most of the data (numbers, usages, geographic location, etc) actually comes from looking at the server logs for the update check.

Another piece of data that the profiler reports is feature usage. There are about 20 – 30 individual features that are being tracked for usage. What does it mean to track a feature?

Well, here are three examples that show what gets reported:

image

image

image

There is no way to correlate this data to an individual user, nor is there a way to track the behavior of a single user.

I use this data mainly in order to see what features are being used most often (therefore deserving the most attention, optimizations, etc).

Those are mentioned in the product documentation.

To summarize:

  • I am not stealing your connection strings.
  • I don’t gather any personally identifying data (and I am somewhat at a loss to understand what I would do with it even if I did).
  • There is never any data about what you are profiling being sent anywhere.

I hope this clears things up.

How to become a speaker?

I get asked that quite frequently. More to the point, how to become an international speaker?

I was recently at a gathering where no fewer than three different people asked me this question, so I thought that it might make a good post.

Note: this post isn’t meant for someone who isn’t already speaking. And if you are speaking but are bad at it, this isn’t for you. The underlying assumption here is that you can speak and are reasonably good at it.

Note II: For this post, speaking is used to refer to presenting some technical content in front of an audience.

Why would you want to be a speaker anyway?

I heard that it is actually possible to make a living as a speaker. I haven’t found it to be the case, but then again, while I speak frequently, I don’t speak that frequently.

There are several reasons to want to be a speaker:

  • reputation (and in the end, good reputation means you get to raise your rates, get more work, etc).
  • contacts (speaking puts you in front of dozens or hundreds of people, and afterward you get to talk with the people who are most interested in what you talked about)
  • advertising for your product (all those “lap around Visual Studio 2010” talks are actually an hour long ad that you paid to see :-) ).

I’ll focus on the first two; reputation & contacts give you a much wider pool of potential work to choose from, increase the money you can make, etc.

So how do I do that, damn it?

Honestly, I have no idea. The first time that I presented at a technical conference, it was due to a mixup in communication. Apparently, in the US “it would have been delightful” means “we regret to inform”, but in Israel we read that as “great, let us do it” and put the guy on the spot, so he had to scramble and do something.

Okay, I lied, I do have some idea about how to do this.

Again, I am assuming you are a reasonably good speaker (for myself, I know that my accent is a big problem when speaking English), but there are a lot of reasonably good speakers out there.

So, what is the answer? Make yourself different.

Pick a topic that is near & dear to your heart (or to your purse, which also works) and prepare a few talks on it. Write about it in a blog, comment on other people’s blogs about the topic. Your goal should be that when people think about topic X, your name will be on that list. Forums like Stack Overflow can help, as can writing articles (whether for pay or in places like CodeProject). Join a mailing list and be active there (and helpful). Don’t focus on regionally associated forums / mailing lists, though. The goal is international acknowledgement.

It will probably take at least a year for people to start recognizing your name (it took over 2 years for me). If it is possible, produce a set of tools that relate to your topic. Publish them for free, and write it off as an investment in your future.

For myself, NHibernate Query Analyzer was a huge boost in terms of getting recognized. And Rhino Mocks was probably what clinched the deal. I honestly have no idea how much time & effort I put into Rhino Mocks, but Ohloh estimates that project at $12,502,089 (!). While I disagree with that number, I did put a lot of effort into it, and it paid for itself several times over.

If you don’t have a blog, get one. Don’t get one at a community site, either. Community sites like blogs.microsoft.co.il are good for getting your stuff read, but they have a big weakness in terms of branding yourself. You don’t want to get lost in the crowd; you want people to notice who you are. And most people are going to read your posts in a feed reader, and they are going to notice that the community feed is interesting, not that you are interesting.

Post regularly. I try to have a daily post, but that will probably not be possible for you; try to post at least once a week, and try to time it so it is always on the same day & time. Monday at midnight usually works.

Okay, I did all of that, what now?

Another note, this is something that you may want to do in parallel to the other efforts.

Unless you become very well known, you won’t be approached; you’ll have to submit session suggestions. Keep an eye on the conferences that interest you, and wait until they have a call for sessions. Submit your stuff. Don’t get offended if they reject you.

If you live in a place that hosts international conferences (which usually rules Israel out), a good bet is to try to get accepted as a speaker there. You would be considerably cheaper than bringing someone in from out of town/country, and that also plays a role. Usually, if you manage to get into a conference once, they’ll be much more likely to have you again. They have your speaker eval, and unless you truly sucked (like going on stage in Denmark and starting to speak in Hebrew), that gives them more confidence in bringing you back a second time.

And that is about it for now.

Contrasting UberProf & RavenDB from business perspective

I was recently asked to contrast the business decisions related to the profiler and RavenDB. I thought that it would make an excellent post.

There are a lot of aspects to think about here, actually. The profiler is an add-on tool; it is only useful if you are using one of the supported OR/Ms, but if you do… it:

  • has a very low barrier to entry: you need to reference the dll and add a single line of code.
  • provides immediate value: you can see the benefits that it gives you right away.
  • has very few moving parts that users can break.

NH Prof was released on Jan 1st, 2009. The first sale happened on Jan 2nd, 2009 (thanks Yann!).

The lead time for the profiler tends to be very short, because there is very little that you need to invest and there is a lot that you gain. Yesterday I introduced a guy to the profiler as a way to help him see what his app is doing; he made a purchase about an hour later.

That is excellent news from my point of view. :-)

RavenDB, on the other hand:

  • has a very high barrier to entry, not so much from a technical perspective, but from an adoption one.
  • requires you to make significant changes to the way you work.
  • takes time to show why it is beneficial.
  • requires payment only when you actually go live.
  • requires a much higher degree of support for users.

That means that while it takes a few minutes to decide if you want the profiler (and the rest of the 30-day trial is spent getting corporate to approve it :-) ), for RavenDB the lead time until you pull out your credit card is much longer.

That has some interesting implications. I actually spent a lot more (time & money) on the profiler than I spent (outright) on RavenDB. But the major difference is what type of investment each one represents.

There is a term in economics called sunk cost: all the costs associated with building a product up to the point you released it. That is money already spent. But what usually matters a lot more is whether, once you reach the release point, the cash flow from the product can justify the continued work on it (and maybe, at some point, pay back the product development).

NH Prof was a big investment for me, but money started coming in shortly afterward, and it became apparent that it was a sustainable product. For RavenDB, the costs have actually been a lot lower (since the majority of them represented my own time), but the expectation is that it will take about a year or two before it is possible to say whether RavenDB is a sustainable product.

In that sense, RavenDB represents a much riskier investment. If RavenDB hadn’t been rattling around in my head for so long, I would probably have gone with something with a much shorter lead time.

It is interesting to me to see how many factors there are in these sorts of decisions. So many things to balance.

European NHibernate Day

I thought that I would announce that, following JAOO, I am going to head off to the European NHibernate Day, a full-day conference dedicated to NHibernate.

I am going to show off a lot of the new features in NHibernate 3.0, and Steve Strong is going to discuss Linq to NHibernate and what you can do with it. For extra fun, I am also going to spend an hour discussing worst practices in NHibernate. That is going to be an hour full of ranting & raving, which should be amusing.

Profiler Usage Analysis

I have been doing some studying of how people are using the profiler, and it shows some interesting results.

  • A typical profiler session is:
    • NH Prof : 1:15 hours
    • Hibernate Profiler: 1:05 hours
    • EF Prof: 42 minutes
    • L2S Prof: 50 minutes
  • 83% of the profiler users have used it more than once. In fact, here is the # of usages:
    image
    So we have over 50% that use it regularly.
  • Most people use it predominantly to view the statements executed:
    image
    This means that the reports are getting comparatively little attention.
  • The results per geographical location are also interesting:
    image

JAOO 2010

I’ll be speaking at JAOO 2010, giving an Introduction to RavenDB in the NoSQL track. This is going to be the first time that I show RavenDB at a major conference, and I am just a tiny bit nervous. This is going to be interesting, because I am going to present to people who are experienced in NoSQL solutions.

In addition to my talk (obviously the highlight of the entire conference :-) ), there are other sessions that I really want to be at:

  • Rx: curing your asynchronous programming blues - Erik Meijer. This is something that has been popping into my sights for a while, but never long enough to sit down and really study it. So I think I’ll take a shortcut through this session :-)

  • Lessons Learned in Large HTTP-Centric Systems – Jim Webber. There are two reasons to go to one of Jim’s talks. The first is the content, the second is the actual presentation style. Take a look at some of his talks to see what I mean.

  • Building a Pet Store that will Survive Cyber - Cameron Purdy. This presentation interests me mostly because I don’t believe that what is suggested can be delivered (virtually unlimited scaling in a generic fashion), so it would be very interesting to see what is going on there.

  • Where to put data - Michael T. Nygard. I usually learn new things from Michael, so I’m looking forward to seeing what he has to say about this.

And the conference gods have actually managed to set things up so I’ll be able to be in all of those sessions, and not be busy giving a parallel session.

Abolishing guids

This seems like a minor thing, but it was raised during the design phase of SilverQueues/AgQueues/LuminQ (can someone spot the jokes?).

How does a client identify itself to the server? Consider the fact that there are likely to be clients popping up all the time, so we can’t pre-assign them with meaningful names. The suggestion was brought up to use GUIDs to identify the client. That has the benefit of simplicity. It is easy to write, easy to implement and easy to understand from a conceptual level.

It got shot down quickly, because while it is easy, I have never met a GUID that I could honestly say I have seen before. Recognizing clients by GUIDs is going to make it much harder to work with the system, because GUIDs are so opaque.

Instead of doing that, we will probably go with “clients/329392” or something similar, because that one is human readable. In the end, if you can make it easier to work with, it pays off, big time.
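
For illustration only, generating that kind of identifier is nothing more than a prefix plus a number from some shared counter. In practice the number would come from something like a hilo generator or a database sequence; the in-memory counter here is just to show the idea:

using System.Threading;

public static class ClientIds
{
    private static int counter;

    // Produces ids like "clients/1", "clients/2", ... - something a human
    // can actually recognize and talk about, unlike a GUID.
    public static string Next()
    {
        return "clients/" + Interlocked.Increment(ref counter);
    }
}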

Frustration, thy name is Apple

Anyone who raves about Apple products has never seen what happens when you leave the straight & narrow:

image

Yes, I uninstalled & installed it (multiple times).

No idea what is going on, but pretty annoyed.

On Fluent NHibernate

I noticed some jokes going around on Twitter, people saying stuff like “Using Fluent NHibernate, better not let @ayende find out about it”.

I think that I had better clarify a few things in this regard. When FNH initially came out, what I really wanted was auto mapping, and it didn’t have that at the time (almost two years ago), but it does now. I also have some disagreements with the API choices (specifically, the decision to rename things from the NH mapping names).

But I am generally grouchy about code (whether I wrote it or not), so that isn’t unusual. I have used FNH in the past in a few of my projects, but it always came to a point where I needed more from NHibernate and FNH was limiting me. Please note, I am explicitly referring to myself here, not the general public. I have been working with NHibernate for over 6 years now, and working at NHibernate’s level is the place where I am most comfortable.

That isn’t the case for a wide variety of other users. Today I had an interesting experience, guiding a customer from no application to a full-blown app using FNH. The entire process (with someone for whom it was their first introduction to NHibernate) took less than an hour, was remarkably pleasant, and was focused on exactly the right things.

The Fluent NHibernate team has managed to create a truly remarkable project, well done.