SSIS: The backlash

Jamie Thomson has responded to my I Hate SSIS post. He agreed that most of my points are valid concerns, but he also brought up some counterarguments that I want to respond to. The first thing to mention is that JT has a solution for watching variable contents, and I have updated the previous post and the "I Hate SSIS" page accordingly.

Now for the parts I disagree with:

Ayende: I wish I had a dollar for every time that SSIS kept track of something it shouldn't. Be it the old configuration, hard coding the connection string inside the package and completely ignoring the configuration
JT: I have never seen this happen in three years of using the product. If it seems as though configurations are not being used then they have been setup wrongly. That is not to say that the process of setting them up couldn't be improved.

I have it happening pretty much every day. Here is a simple story: I had canceled the package configuration, reconfigured a data source to point to the test database, and run the package. It executed against the production database! Once I found that out, I managed to see it happen twice (while the data source pointed to the test database!), but I haven't been able to consistently reproduce it since. I can assure the reader that I have taken the time to understand how this thing works, out of sheer necessity. It still manages to mess with me.

Ayende: Security? Who needs that
JT: Is this a serious comment?

Please do not take my words out of context. The full statement was: "Security? Who needs that: I should also mention that SSIS packages requires sysadmin rights to run when scheduled as a job. Which of course it will not tell you until you have run the job. I am aware of the agent proxy solution"

Ayende: I should also mention that SSIS packages requires sysadmin rights to run when scheduled as a job
JT: This is completely untrue. It is possible to setup proxy accounts that are not sysadmin in order to run packages.

Again, please do not quote out of context. As you can see above, the very next statement acknowledged the existence of proxy solutions. I still want to understand why this requirement exists.

To my comments about the bad configuration scheme and their unpredictability:

JT: Back to my point above, if this is happening then the configurations have been setup wrongly. It NEVER chooses configurations at random. It would be good if the person making the point could make some suggestion as to how it could be improved because if people are experiencing this then there needs to be improvement somewhere. And what's the issue with environment variables?

The issue with environment variables is that they are something I would never consider for configuration. Putting a connection string in an environment variable is strange. JT, let us start with a concept that doesn't hard-code configuration information into the package. I want to point to a configuration file that is in the same directory as the package, and it doesn't let me do that. I want to choose one of three databases for configuration, depending on when I want to do that, etc.
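To make the request concrete, here is a minimal sketch in Python of the behavior I am asking for. It is not how SSIS works. The file name `connections.json`, the environment names, and the function `load_connection_string` are all made up for illustration.

```python
import json
from pathlib import Path


def load_connection_string(package_path, environment):
    """Read the connection string for one environment from a config file
    that sits in the same directory as the package.

    Hypothetical layout: connections.json maps an environment name
    (for example "dev", "test", "prod") to a connection string.
    """
    # Resolve the config file relative to the package, not to the machine.
    config_path = Path(package_path).with_name("connections.json")
    with config_path.open(encoding="utf-8") as f:
        connections = json.load(f)
    if environment not in connections:
        # Fail loudly instead of silently falling back to another database.
        raise KeyError(f"no connection configured for {environment!r}")
    return connections[environment]
```

The point of the design is that nothing about the target database lives inside the package itself, and an unknown environment is an error, not a silent fallback.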

As for their unpredictability, it may have a system for that, but as I pointed out above, even with the configuration OFF it will still do things that I don't want.

On UPSERT support:

JT: Hmm...not sure about this one. UPSERT is an operation that would have to be supported by the database platform being inserted/updated wouldn't it? Not sure why this is SSIS's fault. Perhaps I'm misunderstanding in which case I'm happy to be put straight.

It is quite trivial to allow update / insert based on a given set of key fields, and given the common need for this, it is certainly something that I would expect to see in an ETL product. Even something that was DB specific would be welcome.
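As a rough illustration of what I mean by update / insert on key fields, here is a minimal sketch in Python against SQLite. It is not an SSIS feature. The `upsert` helper is hypothetical, it assumes the row has at least one non-key column, and it assumes the table and column names come from trusted code, since they are interpolated into the SQL.

```python
import sqlite3


def upsert(conn, table, key_fields, row):
    """Update the row identified by key_fields, or insert it if none matches.

    `row` is a dict of column name -> value. Table and column names are
    interpolated into the SQL, so they must come from trusted code;
    the values themselves are passed as bound parameters.
    """
    non_keys = [c for c in row if c not in key_fields]
    assignments = ", ".join(f"{c} = ?" for c in non_keys)
    where = " AND ".join(f"{k} = ?" for k in key_fields)
    params = [row[c] for c in non_keys] + [row[k] for k in key_fields]

    # Try the update first; rowcount tells us whether the key matched.
    cur = conn.execute(f"UPDATE {table} SET {assignments} WHERE {where}", params)
    if cur.rowcount > 0:
        return "updated"

    # No existing row with this key: insert a new one.
    columns = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(
        f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
        list(row.values()),
    )
    return "inserted"
```

This is the database-agnostic version: one UPDATE, then an INSERT if nothing matched. A DB-specific version could lean on whatever native merge statement the platform offers instead.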

SSIS speed, or lack thereof:

JT: There is room for improvement here in the bloated VS shell but mainly its important to understand WHY this is happening. When a package opens up it tries to validate all external connections. If this is taking a long time then the blame is on the external connections and the network in between, not on SSIS. It is possible to turn off this validation by selecting 'Work Offline' from the SSIS menu.

Um, there is something that is called a background thread, and it is used to do work without freezing the UI. I don't care about the time that it takes to validate things. I want to get things done, so let the tool sort those out without interrupting me. Working offline is not a valid option, because then you get a whole lot of validation errors, just for the fun of it.
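For anyone unfamiliar with the idea, here is a minimal sketch in Python of validating connections on a background thread while the caller stays free to do other work. It says nothing about how the SSIS designer is built. `validate_connection` is a hypothetical stand-in for a slow network check, and `start_validation` is a made-up helper name.

```python
import queue
import threading
import time


def validate_connection(name):
    """Hypothetical stand-in for a slow check against a remote server."""
    time.sleep(0.05)  # simulate network latency
    return not name.startswith("bad")


def start_validation(names):
    """Validate connections on a background thread.

    Returns immediately with the thread and a queue; the worker puts one
    (name, is_valid) tuple on the queue per connection as it finishes.
    """
    results = queue.Queue()

    def worker():
        for name in names:
            results.put((name, validate_connection(name)))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, results
```

A UI loop would poll the queue with `results.get_nowait()` on a timer and mark each connection as valid or broken when its result arrives, so opening a package never blocks on the network.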

And last:

JT: SSIS can easily be used in a multi-developer environment. I know this because I'm currently working in one.

Good, how do you handle two developers working on the same package? How do you handle branching and merging?
