Ayende @ Rahien

Oct 28 2007

SQL Gotchas

time to read 2 min | 203 words

Tags:

SQL

I think you can imagine the amount of paint involved in having a query behave in an unexpected manner. I have run into both of those recently. This one had me doubting my sanity (imagine this on a table with several hundred thousands records, with a fairly complex query around it:

select 1

where 1 not in (2,3,null)

And then there is this interesting little query:

select 1 where 'e' = 'e '

I refused to believe the result until I saw it myself.

Aug 12 2007

NHibernate for SQL

time to read 1 min | 149 words

Tweet Share Share 4 comments

Tags:

Here is a little known fact, you can ask NHibernate to take any arbitrary result set from the database and turn it into an object.

return session.CreateSqlQuery(@"
select customer.Name as CustomerName,
           count( case order.Type when 1 then 1 else 0 end ) as FastOrder,
           count( case order.Type when 1 then 2 else 0 end ) as SlowOrder,
           count( case order.Type when 1 then 3 else 0 end ) as AirMailOrder,
from Orders order join Customers customer
on order.Customer = customer.Id
group by customer.Name
")
.AddScalar("CustomerName", NHibernateUtil.String)
.AddScalar("FastOrder", NHibernateUtil.Int64)
.AddScalar("SlowOrder", NHibernateUtil.Int64)
.AddScalar("AirMailOrder", NHibernateUtil.Int64)
.SetResultTransformer(new AliasToBeanResultTransformer(typeof (CustomerOrderTypesCount)))
.List();

As you can probably imagine, this will return a list of CustomerOrderTypesCount objects, with the respective properties filled with the values from the result set.
You can also do it in the configuration, and even map full blown entities along side other interesting values.

Aug 12 2007

SQL Challenge, getting historical dataPerformance characteristics

time to read 1 min | 127 words

Tweet Share Share 0 comments

Tags:

SQL

After posting my solution to the problem I got some feedback about the possible performance of the query.
I just run the query over a table with 120,000~ rows, and it completed in less than second and change.
After playing with it a bit, I decided to skip the function and put the results in a table and map that, it is easier to handle it that way with NHibernate, and I can put an index on it that would save a table scan on the months result.
The major cost is a clustered index scan (95%) and nested loops join (5%). Fun stuff :-)

I am getting deep into NHibernate's queries recently, extremely powerful, especially if you combine it with your own customer dialect.

Aug 10 2007

SQL Challenge, getting historical datasolution

time to read 2 min | 349 words

Tweet Share Share 10 comments

Tags:

SQL

As it turned out, this isn't that hard, all I needed to do was remember my trusty DateRange function, modify it to on months instead of days, and it was off to the races. This type of code does make my head hurt a tiny bit, it packs a lot into it.

DECLARE @start DATETIME,
           @end DATETIME

SET @start = '2007-04-01'
set @end = '2007-10-01'

SELECT YEAR(Currentdate) Year,
         MONTH(Currentdate) Month,
         COUNT(Bug.Id) OpenedBugCount
    FROM Dbo.MonthRange(@start,@end )
         JOIN Bugs Bug
           ON YEAR(Currentdate) >= YEAR(Bug.Openedat)
              AND MONTH(Currentdate) >= MONTH(Bug.Openedat)
              AND YEAR(Currentdate) <= YEAR(ISNULL(Bug.Closedat,Currentdate))
              AND MONTH(Currentdate) < MONTH(ISNULL(Bug.Closedat,DateAdd(MONTH,1,Currentdate)))
GROUP BY YEAR(Currentdate),MONTH(Currentdate)

I would get it into NHibernate on Sunday, should be fun, it is about the ninth select is the super report... :-)

Aug 10 2007

SQL Challenge, getting historical data

time to read 1 min | 125 words

Tweet Share Share 9 comments

Tags:

SQL

Here is an interesting problem that I had to face at work. I need to get historical data from this table. Basically, I need to count the number of opened bugs per months.

Id  OpenedAt   ClosedAt
--- ---------- -----------
1   2007-04-02  2007-09-01
2   2007-07-01  NULL
3   2007-07-01  2007-07-23

A solution to this with the data above should print (for 2007-04 to 2007-10):

Month	Year	Opened Bugs
4	2007	1
5	2007	1
6	2007	1
7	2007	2
8	2007	2
9	2007	1
10	2007	1

Have fun :-)

Jul 10 2007

Sending arrays to SQL Server: Xml vs. Comma Separated Values

time to read 4 min | 629 words

Tweet Share Share 4 comments

Tags:

I spoke before about using the XML capabilities of SQL Server in order to easily pass list of values to SQL Server. I thought that this was a pretty good way to go, until I started to look at the performance numbers.

Let us take a look at this simple query:

DECLARE @ids xml
SET @ids = '<ids>
<id>ALFKI</id>
...
<id>SPLIR</id>
</ids>'

SELECT * FROM Customers
WHERE CustomerID IN (SELECT ParamValues.ID.value('.','NVARCHAR(20)')
FROM @ids .nodes('/ids/id') as ParamValues(ID) )

This simple query has a fairly involved execution plan:

This looks to me like way too much stuff for such a simple thing, especially when I see this:

So the XML stuff is taking up 98% of the query?

I then checked the second route, using fnSplit UDF from here. Using it, I got this result:

So it looks like it is significantly more efficient than the XML counter part.

But what about the larger scheme? Running the fnSplit over 9,100 items got me a query that took nearly 45 seconds, while a XML approach over the same set of data had no measurable time over just 91 records.

I then tried a simple SqlCLR function, and got this the same performance from it:

The code for the function is:

 [SqlFunction(FillRowMethodName = "FillRow", TableDefinition = "str NVARCHAR(MAX)")]
 public static IEnumerable Split(SqlString str)
 {
     if (str.IsNull)
         return null;
     return str.Value.Split(',');
 }

 public static void FillRow(object obj, out SqlString str)
 {
     string val = (string) obj;

     if (string.IsNullOrEmpty(val))
         str = new SqlString();
     else
         str = new SqlString(val);
 }

As you can probably guess, there are all sorts of stuff that you can do to make it better if you really want, but this looks like a very good approach already.

Tripling the size of the data we are talking about to ~30,000 items had no measurable difference that I could see.

Obviously, when you are talking about those numbers, an IN is probably not something that you want to use.

Jul 02 2007

Why I have SSIS?!

time to read 1 min | 9 words

Tweet Share Share 0 comments

Tags:

SQL

ERROR: Unused variable declaration, line: "DateTime goingAwayFromOffice = DateTime.Now.AddMinutes(-45);"

Jun 27 2007

SSIS Scheduling Conflicts?

time to read 1 min | 50 words

Tweet Share Share 6 comments

Tags:

SQL

I have two SSIS packages that cannot run concurrently, however, one must run once every 5 minutes, and the second every hour, so they are assured to conflict very soon. I can handle this situation manually, but I wanted to know if there was a builtin way to handle that.

Apr 05 2007

SSIS Integration Woes

time to read 3 min | 491 words

Tweet Share Share 9 comments

Tags:

I am not sure if it is me or SSIS, but something is very wrong here. I just spend an ungodly amount of time trying to solve this simple scenario:

Source database is Oracle, Destination database is SQL Server
A table with large amount of rows
The table has a LastModifed field
Every few minutes, an SSIS process should wake and grab the latest updates from the table, then update a state table, with the last updated date, for the next run

It is something that should take minutes to handle with code, but even now, I just can't make it happen. The major issue here is very simple, I can't pass parameters to the query. The OLE DB Source won't accept parameters when using Oracle, and when using the DateReader source, it simply has no way to pass input parameters that I can find.

Parameters cannot be extracted from the SQL command. The provider might not help to parse parameter information from the command. In that case, use the "SQL command from variable" access mode, in which the entire SQL command is stored in a variable.

ADDITIONAL INFORMATION:

Provider cannot derive parameter information and SetParameterInfo has not been called. (Microsoft OLE DB Provider for Oracle)

I am not an SSIS expert, but I consider myself technically able, and I certainly knows how to search google. No answer on this one. I went and wrote the logic in an Oracle stored procedure, then I faced another interesting issue. There doesn't seem to be a way to call an Oracle SP from SSIS.

SSIS supposed to stand for Integration, and while I wouldn't assume that it would integrate with the DB engine that I wrote last month in 2AM, I don't think that Oracle is such a minor player that integration with it should be so hard.

Please, someone, make my day and show me that I am stupid, because I am very close to throwing the whole thing out and writing it in an envrionment that doesn't force me to jump through hops just to get the simplest things done. (And have much better error reporting, as an aside)

Apr 02 2007

Remote Keys: Cross Database Foreign Keys

time to read 2 min | 264 words

Tweet Share Share 6 comments

Tags:

Here is an interesting product (via Larkware). RemoteKeys Adds cross-database constraints to SQL Server, which is something that is really nice in many scenarios.

Specifically, I am a firm believer that database == application, and that separate applications should use separate databases. (Versioning, deployment, WhoBrokeMyQuery, etc). This product seems to give a nice answer to Referential integrity concerns.

I was interested to know how they did it, and it looks like a good UI on top of Linked Server + INSTEAD OF Triggers. I really like the idea, but a couple of things needs to be considered when using it are:

Availability concerns - Does both databases have the same availability? If not, what happens when either database goes down? The way it works, both databases needs to talk to each other for CUD.
Performance concerns - If we are talking about remote databases, then it is something that you really should consider carefully. The example on their site has US Orders and Europe Orders with remote FK to Customers Database. The problem is that this means that the Customers database needs to issue two remote calls (probably per statement). There may be some smarts there about consolidating checks, but I didn't check.

Cool idea nonetheless, although if I ever see the need, I will create the trigger via code gen and be done with it.

Oren Eini

Oren Eini

CEO of RavenDB

SQL Gotchas

NHibernate for SQL

SQL Challenge, getting historical dataPerformance characteristics

SQL Challenge, getting historical datasolution

SQL Challenge, getting historical data

Sending arrays to SQL Server: Xml vs. Comma Separated Values

Why I have SSIS?!

SSIS Scheduling Conflicts?

SSIS Integration Woes

Remote Keys: Cross Database Foreign Keys

FUTURE POSTS

RECENT SERIES

RECENT COMMENTS

Syndication

Main feed
Comments feed