Ayende @ Rahien


Tip: Searching Stored Procedures Text

Here is another tip for those who need to explore big SQL databases with a lot of stored procedures and quite a bit of logic. This gives you a list of all the procedures that contain a certain string:

CREATE PROCEDURE dev_FindInSP
      @str nvarchar(max)
AS
select distinct name from sys.procedures procs
join syscomments comments on procs.object_id = comments.id
where [text] like '%' + @str + '%'
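
Usage is simple; for example, to find every procedure that mentions a Posts table:

EXEC dev_FindInSP 'Posts'

One caveat worth noting: syscomments stores long object definitions in 4,000-character chunks, so a match that happens to straddle a chunk boundary can be missed.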

VB.Net oddity

Can someone explain why VB.Net has a keyword called "AddressOf"? I would expect such a thing from a low level language, and even C# could be excused if it had this. But AddressOf seems to me like a throwback to ancient times, when passing function pointers around was the norm.


How to kill the C# Compiler


My legendary skills in breaking code have now reached a new level: I managed to break the C# compiler. Check out this code:

 

public delegate void Proc();

public class CrashAndBurnAttribute : System.Attribute
{
    public CrashAndBurnAttribute(Proc p)
    { }
}

public class CrashAndBurnClient
{
    [CrashAndBurn(delegate { return; })]
    public void Foo()
    {
    }
}

If you try to compile it from VS.Net you’ll get a nice crash dialog, and the result of running the compilation from the command line is:

Microsoft (R) Visual C# 2005 Compiler version 8.00.50727.42
for Microsoft (R) Windows (R) 2005 Framework version 2.0.50727
Copyright (C) Microsoft Corporation 2001-2005. All rights reserved.

crash.cs(11,16): error CS1706: Expression cannot contain anonymous methods
error CS0583: Internal Compiler Error (0xc0000005 at address 5A10A8E0): likely culprit is 'COMPILE'.

        An internal error has occurred in the compiler. To work around this problem, try simplifying or changing the program near the locations listed below. Locations at the top of the list are closer to the point at which the internal error occurred. Errors such as this can be reported to Microsoft by using the /errorreport option.

crash.cs(11,25): error CS0585: Internal Compiler Error: stage 'COMPILE'
crash.cs(12,14): error CS0584: Internal Compiler Error: stage 'COMPILE' symbol 'CrashAndBurnClient.Foo()'
crash.cs(12,14): error CS0584: Internal Compiler Error: stage 'COMPILE' symbol 'CrashAndBurnClient.Foo()'
crash.cs(12,14): error CS0584: Internal Compiler Error: stage 'COMPILE' symbol 'CrashAndBurnClient.Foo()'
crash.cs(9,14): error CS0584: Internal Compiler Error: stage 'COMPILE' symbol 'CrashAndBurnClient'
crash.cs: error CS0584: Internal Compiler Error: stage 'COMPILE' symbol '<global namespace>'
crash.cs: error CS0586: Internal Compiler Error: stage 'COMPILE'
error CS0587: Internal Compiler Error: stage 'COMPILE'
error CS0587: Internal Compiler Error: stage 'BEGIN'

This is actually sad, since I really wanted to be able to pass delegates to attributes.
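
Since attribute arguments must be compile-time constants, the closest legal alternative I know of is to pass the method's name and resolve it to a delegate at runtime. A rough sketch (the resolution helper here is my own invention, not something the framework provides):

public class CrashAndBurnAttribute : System.Attribute
{
    private readonly string methodName;

    public CrashAndBurnAttribute(string methodName)
    {
        this.methodName = methodName;
    }

    // Resolves the named public static method on 'type' into a Proc delegate.
    public Proc ResolveProc(System.Type type)
    {
        return (Proc)System.Delegate.CreateDelegate(
            typeof(Proc), type.GetMethod(methodName));
    }
}

Usage would then be [CrashAndBurn("SomeStaticMethod")], with the consumer calling ResolveProc when it actually needs the delegate.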

Performance tidbits


I recently had to write a lazy loaded property that picked what its value would be based on the environment it runs in. Basically, I had the same old problem of keeping data for the current context only. This usually means the current thread, but in ASP.Net, it is the current request.

My solution for it was simple: a single static CurrentData class that hands out the right store for the environment. I can store and retrieve data without caring what environment I’m running in, by simply using:

 

CurrentData.Items["current-db-connection"] = new SqlConnection();

I need this ability to store data that is important for the current logical execution thread (be that an ASP.Net request or a real thread). The usual way of implementing the Items property on the CurrentData class would be something like:

private static ICurrentDataStore items;

public static ICurrentDataStore Items
{
    get
    {
        if (items == null)
            items = CreateCurrentDataStoreForEnvironment();
        return items;
    }
}

 

But I thought that I could get rid of the null check; that should allow the JIT to treat the property as a simple accessor, so it would be just as fast as accessing the field directly. I could have simply instantiated the items field to the proper value based on the environment, but I wanted to mimic a real lazy loaded property. I ended up with this class:

 

public class DataStoreSwitcher : ICurrentDataStore
{
    private Action<ICurrentDataStore> set;

    public DataStoreSwitcher(Action<ICurrentDataStore> set)
    {
        this.set = set;
    }

    #region ICurrentDataStore Members

    public object this[object key]
    {
        get
        {
            ICurrentDataStore real = EnsureCorrectDataStoreForEnvironment();
            return real[key];
        }
        set
        {
            ICurrentDataStore real = EnsureCorrectDataStoreForEnvironment();
            real[key] = value;
        }
    }

    private ICurrentDataStore EnsureCorrectDataStoreForEnvironment()
    {
        ICurrentDataStore real;
        if (HttpContext.Current == null)
            real = new CurrentThreadDataStore();
        else
            real = new CurrentRequestDataStore();
        set(real);
        return real;
    }

    #endregion
}

 

I initialize the variable like this:

 

private static ICurrentDataStore items =
    new DataStoreSwitcher(
        delegate(ICurrentDataStore real)
        {
            items = real;
        });
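
The two environment-specific stores aren't shown above; here is a minimal sketch of what they might look like (the bodies are my assumption, not code from the original post):

public class CurrentThreadDataStore : ICurrentDataStore
{
    // One Hashtable per thread; [ThreadStatic] fields must be static.
    [ThreadStatic]
    private static System.Collections.Hashtable data;

    public object this[object key]
    {
        get
        {
            if (data == null)
                data = new System.Collections.Hashtable();
            return data[key];
        }
        set
        {
            if (data == null)
                data = new System.Collections.Hashtable();
            data[key] = value;
        }
    }
}

public class CurrentRequestDataStore : ICurrentDataStore
{
    // HttpContext.Current.Items is already scoped to the current request.
    public object this[object key]
    {
        get { return System.Web.HttpContext.Current.Items[key]; }
        set { System.Web.HttpContext.Current.Items[key] = value; }
    }
}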

 

At this particular point there is very little to be gained from this technique, but if the case involved a hefty resource (choosing a database connection, for instance), it could certainly be useful. Besides, I like using delegates :-), and it’s a cool thing to do.

 


Performance tuning mystery

I had a stored procedure that was taking an undue amount of time to run; it was about half an hour, and that was really bad. I tried looking at the execution plan and adding an index to make it faster, but that wasn't very effective.

The query was an update from a nested select (which contained a rather complex condition and a couple of joins), and I couldn't really figure out how to make it better. I tried running the pieces individually, and they ran reasonably fast, so it was the interaction between them that killed the performance. The execution plan showed that the joins were very costly.

During my attempts to figure it out, I moved the select into a temporary table, and then updated from it, like this:

SELECT [To], SUM(p.Amount) TotalAmount
INTO #TempPaymentsAmounts
FROM Payments p, Payments p2, Orders o
WHERE ComplexCondition
GROUP BY [To]

UPDATE PaymentsTotal
SET Total = TotalAmount
FROM #TempPaymentsAmounts
WHERE SimplerCondition

That query ran in less than 10 seconds.

The only thing that I did was move the select into a temporary table, but I'm not sure why this should have such a profound effect on the performance of the query.


Code reuse, cursors and NIH, oh my!

I recently discovered that a certain class of programmers has a really bad case of Not Invented Here. It got so bad that they decided to implement JOIN and GROUP BY using cursors. Of course, due to the nature of the problem, they were so excited by their new discovery that they immediately rushed to propagate it throughout the code base, using none other than the tried and true method of copy-paste-change-one-character-and-then-move-on-never-looking-back.

I won't speak of the performance of this method, but I think that you can guess that there are orders of magnitude difference between this new way of dealing with the database and the standard method of operation in the industry.
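
For contrast, here is what the two approaches boil down to (table and column names are invented for illustration):

-- The cursor version walks the rows one at a time:
DECLARE @id int, @amount money
DECLARE c CURSOR FOR
    SELECT CustomerId, Amount FROM Orders ORDER BY CustomerId
OPEN c
FETCH NEXT FROM c INTO @id, @amount
WHILE @@FETCH_STATUS = 0
BEGIN
    -- accumulate per-customer totals by hand, row by row...
    FETCH NEXT FROM c INTO @id, @amount
END
CLOSE c
DEALLOCATE c

-- The set-based version says the same thing in one statement,
-- and lets the optimizer do its job:
SELECT CustomerId, SUM(Amount) AS Total
FROM Orders
GROUP BY CustomerId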


SQL Frustrations

I knew that it was bound to happen.

I started running into highly mysterious errors in SQL Server. The strangest among them is the Missing Columns Mystery.

When I run a stored procedure that calls other SPs, and so on, I get an error about invalid columns somewhere deep in the stack. But when I run the SP directly, it passes perfectly.

Of course, when I check, the table has the supposedly missing columns. It drives me crazy, and I can't really get the error consistently; if I run it a couple of times, it seems to fix itself, but I've no idea how.


Hierarchical queries with Common Table Expressions

[Via Mitch Denny's Blog ] [Via Sql Down Under]

It's usually hard to efficiently get hierarchical data using SQL. But apparently SQL Server can do it easily. The key is to create a common table expression that recursively refers to itself, like this:

DECLARE @employee_id int
SET @employee_id = 1 -- the root of the hierarchy; value is illustrative

WITH AllEmployeesOf([EmployeeId], [Name], [ManagerName])
AS
(
      SELECT m.[EmployeeId], m.[Name], convert(nvarchar(50), null) as ManagerName
      FROM [Employees] m WHERE @employee_id = m.[EmployeeId]
      UNION ALL
      SELECT e.[EmployeeId], e.[Name], AllEmployeesOf.[Name] as ManagerName
      FROM [Employees] e JOIN AllEmployeesOf ON
            e.[ManagerId] = AllEmployeesOf.[EmployeeId]
)
SELECT * FROM AllEmployeesOf

Before this I would resort to either getting all of the rows or issuing several queries. Nice.
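
One caveat worth knowing: SQL Server stops a recursive CTE after 100 recursion levels by default, so a cycle in the manager chain fails fast rather than looping forever. The limit can be adjusted with a query hint on the statement that uses the CTE:

SELECT * FROM AllEmployeesOf
OPTION (MAXRECURSION 500)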


Put down that cursor and step away from the database, Mister!

I'm seeing too many cursors around right now. Way too many. I also keep seeing the same code over & over again. I know that I'm in trouble because to work with this code base I need to:
  • constantly use grep (and derivatives, like AstroGrep, which is cool);
  • get a diff program that supports drag & drop from the shell as well as quickly changing only one of the files (found WinMerge, which looks cool);
  • pronounce variable names out loud in order to understand their meaning*;
  • watch ingenious workarounds for all sorts of limitations on the number of concurrently nested cursors in the database;
  • cut code by 70% by deciding that I don't need to verify stuff over & over again.
Argh! Excuse me while I go bang my head against the wall.

* (Hebrew words in English characters)


Some thoughts about Indigo

So I was at the Israeli C# User Group meeting today, and the topic was Indigo, or, by its odious official name, Windows Communication Foundation. Ido had a great presentation about it. I was surprised by how familiar the concepts were to me. There was quite a bit of stuff that was very familiar from Castle (like TransactionScope, TransactionalAttribute, etc.).

The architecture looks very extensible and well thought out, and I like the extensibility points that they have there. The ability to intercept a method is invaluable in complex scenarios, since you can add behavior to it easily.

The Indigo way of doing it is just cool:

public class IndigoCacheAttribute : Attribute, IOperationInvoker 
{
...
}

The attribute is both the marker and the actor in this case. 

I wish I had something to do with it now, beyond writing the umpteenth calculator service, of course. One really nice thing that I liked was that you don't see XML if you don't want to (and I usually won't want to :-)). The programming model seems very easy to work with, with one exception.

You'd better not expose ordinary method signatures, but rather expose methods that take and return DTOs (or messages, if you like to think about it that way). The reason is that you can add stuff later without breaking your code. I know that SOA guys usually like to say Message This & Message That, but when I see something like:

[DataContract]
public class Book
{
 [DataMember]
 public string Title;
 [DataMember]
 public string ISBN;
}

I think DTO, not Message. The fact that it is XML under the covers had better remain under the covers; I'd much rather work with objects than with raw XML. Now I have another new thing to add to the stack of stuff waiting to be explored; it's just about my height now, and it's getting bigger.
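
For completeness, here is the shape of contract that the take-and-return-DTOs advice implies; a sketch only, with the service and its operation invented for illustration:

using System.Runtime.Serialization;
using System.ServiceModel;

// Because the operation takes and returns data contracts, new [DataMember]
// fields can be added later without breaking the method signature.
[ServiceContract]
public interface IBookCatalog
{
    [OperationContract]
    Book FindBook(BookQuery query);
}

[DataContract]
public class BookQuery
{
    [DataMember]
    public string Title;
}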


Israeli C# User Group Meeting

I was at the C# User Group meeting today, and Ido gave a talk about Indigo (Windows Communication Foundation - horrible name) which I found fascinating. I'll post about it shortly, I hope. There was also a huge raffle, but I didn't win anything.

It was the first time that I managed to dedicate time to go to the user group meeting, and if all the talks are at this level, it's well worth it.


Tiny IIS Tip

Yesterday I wrote a long post and lost it because my session timed out without any warning. Today I got a question about this subject: how to move a client's browser to a logout page when their session expires.

The solution is actually very easy (though it won't help with losing work because of a session timeout, unfortunately): you can use this tag in the HTML:

<META HTTP-EQUIV="Refresh" CONTENT="1170; URL=/logout.aspx">

This requires work, though, and can get a little tedious after a while. But what I found is that you can set IIS to send it automatically on all requests, so you set it up once and forget about it. It is very simple to set up (Web Site Properties -> HTTP Headers -> Add), and it will work on everything without changing a single line of code.
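
Configured that way, IIS sends the equivalent raw HTTP response header on every response, something like this (value chosen to match the META tag above):

Refresh: 1170; URL=/logout.aspx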

If you want to prevent users from losing work, you need to use javascript to pop up a warning, or make sure that state is not lost when the client session times out. (The warning on the ati.com site, for instance, is extremely annoying.)

Sql Tips

The sp_helptext stored procedure can give you the text of any non-table object in the database. I find this incredibly useful, as it saves the need to locate the object in the Management Studio UI.

Here is an example:
sp_helptext Very_Nasty_Trigger

Results:

Text
------------
CREATE TRIGGER Very_Nasty_Trigger
   ON  NASTY INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    Declare @stmt nvarchar(max)
    SET @stmt = (select [text] from INSERTED);
    exec sp_executesql @stmt

END
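
As an aside, in SQL Server 2005 the same text is also exposed through the sys.sql_modules catalog view, which is handy when you want to query it rather than just print it:

SELECT definition
FROM sys.sql_modules
WHERE object_id = OBJECT_ID('Very_Nasty_Trigger')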



Now I need to find something that works at the table level as well.

Triggers & Auditing: Take 2

Okay, my post about auditing & triggers generated some good questions, so here are a couple of answers.

Before anything else: I made a mistake in the schema and didn't change the definition of [Posts_Audit_Trail]'s post_id; it shouldn't be an identity. Here is the corrected schema:

CREATE TABLE [dbo].[Posts_Audit_Trail] (
    [audit_post_id] [int] IDENTITY (1, 1) NOT NULL,
    [post_id] [int] NOT NULL ,
    [post_title] [varchar] (50) NULL ,
    [post_blogid] [int] NULL ,
    [changed_by] nvarchar(255),
    [changed_on] datetime,
    [changed_action] nchar(10)
) ON [PRIMARY]

First, what about NHibernate and other tools that rely on the changed rows count? Here is a simple way to avoid breaking them, using the SET NOCOUNT ON/OFF option (I'm showing it for the update trigger only, but it should be clear how to apply it elsewhere):

CREATE TRIGGER Posts_AuditUpdate ON [Posts] FOR UPDATE
AS
      SET NOCOUNT ON
      INSERT INTO [Posts_Audit_Trail]
      (
            [post_id],[post_title],[post_blogid],
            [changed_by],[changed_on],[changed_action]
      )
      SELECT
            post_id,post_title,post_blogid,
            CURRENT_USER,GETDATE(), 'UPDATE-NEW'
      FROM Inserted
      UNION ALL
      SELECT
            post_id,post_title,post_blogid,
            CURRENT_USER,GETDATE(), 'UPDATE-OLD'
      FROM Deleted;
      SET NOCOUNT OFF

Second, about getting the application user instead of the database user (which is a problem in any 3-tier application): this one requires help from the application. Usually the user will be written to the table anyway (last_changed_by, or something like that), and can then be copied to the auditing table.
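
A sketch of what the trigger's SELECT looks like in that case, assuming Posts carries an application-maintained last_changed_by column (the column name is illustrative):

      SELECT
            post_id, post_title, post_blogid,
            last_changed_by, GETDATE(), 'UPDATE-NEW'
      FROM Inserted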

Third, using this in complex scenarios. This is a fairly simple scenario, where all you need is some sort of tracking of what happened to your data. If you need complex things to happen (logging outside of SQL Server, sending mail when sensitive rows change, etc.), that should happen in the business layer. While you can make it happen in the DB, you really don't want it to; think about poor you in 5 months' time, staring at the screen trying to figure out why the CEO is getting flooded with emails. Like I mentioned, triggers are usually invisible, and that can be bad when you want to find out why stuff is happening. I wouldn't want to track down an email sent from the database by a trigger (and boy, have I seen much worse done in the DB layer).

Just think about the time it would take before you'd even think of looking at the triggers. This is something for the quick & simple solutions; no logic whatsoever, just log all the changes and forget about it until you need it.

For anything that requires real logic, go with the code; if you've a good DAL, it should be easy enough to add that in (even with extensibility) without affecting the rest of the application.

Changes

I look back at my posts for the last couple of weeks, and they are full of SQL stuff. One of the nicest things about having a blog is that you can go back and see what you did at various points in time. I knew that I'd blogged about it, and indeed, I found out that I wrote my first trigger less than three months ago. That was a half day ordeal involving me and a couple of co-workers, checking out all sorts of crazy stuff (recursive triggers, for instance). But writing the last post took me 5 minutes for the SQL, and ten for the text between the snippets :-)

Triggers and auditing

There is some discussion on the Castle mailing list about auditing using ActiveRecord. I firmly believe that auditing is one of those things that should be done at the database layer. It's very simple to set up, and then you can forget about it until you need it. Here is an example of how simple it is.

Like most of my database samples, it is going to be a Blog->Posts sample. The issue is auditing in a multiple-users-per-blog environment. Here is how the Posts table looks:

CREATE TABLE [dbo].[Posts] (
    [post_id] [int] IDENTITY (1, 1) NOT NULL ,
    [post_title] [varchar] (50) NULL ,
    [post_blogid] [int] NULL
) ON [PRIMARY]

Now, let's add support for auditing. First, the auditing table:

CREATE TABLE [dbo].[Posts_Audit_Trail] (
    [post_id] [int] IDENTITY (1, 1) NOT NULL ,
    [post_title] [varchar] (50) NULL ,
    [post_blogid] [int] NULL ,
    [changed_by] nvarchar(255),
    [changed_on] datetime,
    [changed_action] nchar(10)
) ON [PRIMARY]

As you can see, it's an exact duplicate of the Posts table, but with a couple of extra columns to track what happened, when and by whom. Next step, writing a trigger for saving deleted posts:

CREATE TRIGGER Posts_AuditDelete ON [Posts] FOR DELETE
AS
      INSERT INTO [Posts_Audit_Trail]
      (
            [post_id],[post_title],[post_blogid],
            [changed_by],[changed_on],[changed_action]
      )
      SELECT
            post_id,post_title,post_blogid,
            CURRENT_USER,GETDATE(), 'DELETE'
      FROM Deleted;

As you can see, there is nothing much to it, we shove all the deleted records into the audit trail table, and we are done. Doing the same for inserts is very easy:

CREATE TRIGGER Posts_AuditInsert ON [Posts] FOR INSERT
AS
      INSERT INTO [Posts_Audit_Trail]
      (
            [post_id],[post_title],[post_blogid],
            [changed_by],[changed_on],[changed_action]
      )
      SELECT
            post_id,post_title,post_blogid,
            CURRENT_USER,GETDATE(), 'INSERT'
      FROM Inserted;

The only mildly tricky part is saving both the old and the new version of a post when we are doing an update, and even that is very simple:

CREATE TRIGGER Posts_AuditUpdate ON [Posts] FOR UPDATE
AS
      INSERT INTO [Posts_Audit_Trail]
      (
            [post_id],[post_title],[post_blogid],
            [changed_by],[changed_on],[changed_action]
      )
      SELECT
            post_id,post_title,post_blogid,
            CURRENT_USER,GETDATE(), 'UPDATE-NEW'
      FROM Inserted
      UNION ALL
      SELECT
            post_id,post_title,post_blogid,
            CURRENT_USER,GETDATE(), 'UPDATE-OLD'
      FROM Deleted;
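
A quick smoke test for the three triggers (this assumes the corrected audit schema from the follow-up post above, where post_id is not an identity and an audit_post_id identity column exists):

INSERT INTO Posts (post_title, post_blogid) VALUES ('Hello', 1)
UPDATE Posts SET post_title = 'Hello, world' WHERE post_title = 'Hello'
DELETE FROM Posts WHERE post_title = 'Hello, world'

-- Expect four audit rows: INSERT, UPDATE-NEW, UPDATE-OLD and DELETE
SELECT changed_action, post_title, changed_by, changed_on
FROM Posts_Audit_Trail
ORDER BY audit_post_id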

As you can see, it's very simple to add auditing at the database layer. The next step would be writing some sort of UI for this (if it is needed by someone other than the sysadmin), but you would need to do that anyway.

The complexity of doing it somewhere else is not trivial. If you've a good DAL, you can add it there, but why bother when the database can do it for you for free?

The only reasons I can think of are that your database doesn't support triggers, or that you need to have some business rules involved with the auditing.

UPDATE: What about NHibernate? Since it checks the modified rows count, wouldn't that break it (and other tools that rely on the rows count)? In general, yes, but there is a very simple solution. In this case I didn't bother, but all you need to do is use:

SET NOCOUNT ON  

-- Trigger actions

SET NOCOUNT OFF

Feeling Great

I went to work today expecting to be there until it all worked, and I expected it to start working (if at all) only very late into the night. I got off today before I usually leave, with a really great feeling. Everything that should've worked by tonight worked. It is a great feeling. I'm not sure how we did it, because last week I would've probably said that it couldn't be done in the time frame we had, but we did :-)

Now it's show & tell, and then off to the next stage: spit & polish.


Exception Handling - the ugly

If this was an art, I would name it "Without Words" 

HelperObj obj;
int Foo()
{
  try
  {
     return obj.DoWork();
  }
  catch (NullReferenceException)
  {
    obj = new HelperObj();
    return Foo();
  }
}

Some people seem to like exceptions all too much.
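
For contrast, here is the boring way to get the same lazy initialization, without riding a NullReferenceException for control flow:

HelperObj obj;
int Foo()
{
    if (obj == null)
        obj = new HelperObj();
    return obj.DoWork();
}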

History Of Languages

Go and read these: a history of C & a history of Basic.

I laughed so hard I couldn't breathe.


How to debug your head against the wall


I just watched a very nice diagram of SSIS doing its work, and throwing 70% of the input rows into the garbage. I was a bit concerned about that, as you may imagine. I've talked before about the experience of debugging a problem with SSIS; I present this as a case study.
First, I should mention that I pipe all the errors to a table that contains three columns (which are provided by SSIS). The columns are ErrorColumn, ErrorCode and ErrorOutput. So when I get an error, I just check the error table and see where it happened. In this case, I got the following results (times 70 thousand):

ErrorCode      ErrorColumn   ErrorOutput
-1073442796    162           null


"Hm, that is strange", I think, and goes to check the SSIS package for the column. I get to the point of the failure, and then I need to use the advance editor to go through each and every one of the dozens of columns that I have there, searching for a matching LineageId.
I have no idea what a LineageId is, by the way. I can't really think of why I would want to use that as my error column identifier rather than the rather nice column name.
Leaving that aside, I'm left with the entirely too opaque error code. Searching Google for this gives nothing, nor does search MSDN. So I'm basically getting told: "I know what the error was, but I am not going to tell you what it was."
After checking that the connection manager was fine, I noticed that it was converting a string to an int, so I tried trimming the string before converting, but it wasn't that, I kept getting the same error on the same column.
I outputted the offensive column as text to the table, and looked at the values (it sounds easy to do, but it takes over 5 minutes to do this, and adjust everything that it will work correctly).
All the values were perfect numbers, and SQL Server had no problem to convert them to numbers.
This is usually the point where I resort to physical violance and hit my head against the wall until I either find the answer or pass out.
After picking myself from the floor and passing the "Who I am? Where am I? What is this big bulge on my head?" stage, I had tried looking over the raw data. That was fine, but I noticed that several numbers were quite a bit bigger than the others. Then I checked the type of the column in the data base numeric(20,0) and the type I was converting to numeric(3,0), and it was obvious what the problem was.
Now, if I could get something like "truncation would occur", that would have saved me several hours of head butting and severe annoyance.
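
The failure reduces to this; here is the same conversion in plain T-SQL, where the error message at least names the problem:

DECLARE @big numeric(20,0)
SET @big = 12345678901234567 -- a value that fits the source column
SELECT CONVERT(numeric(3,0), @big)
-- Msg 8115: Arithmetic overflow error converting numeric to data type numeric.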


More annoying stuff about SSIS

  • You can't close a package after execution; you have to manually stop debugging and then close the package.
  • There is no way that I could find to look at just the errors/warnings in the log.
  • The Flat File Source just blithely ignored the fact that the file doesn't exist and carried on with its work. I actually need this behaviour in several places, so it would be useful to get it on purpose, but I've no idea why or how this works.
  • You can't edit connection managers without a live connection to use.
  • You don't get any results when you search for the errors on Google.

I'm not sure how I got into the situation below, but I did get it. I'm pretty sure that this shouldn't happen. It got so much stuff wrong that I don't know where to begin checking. More so, I don't know how to check this stuff.


SQL Management Studio


Here is one of the more annoying things in SQL Management Studio: if you try to use the Summary tab to skip the [dbo] prefix in the treeview, you often get this error:



I'm getting this three or four times a day, and then the Summary tab stops showing stuff.