Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive

SQL Gotchas

time to read 2 min | 203 words

I think you can imagine the amount of paint involved in having a query behave in an unexpected manner. I have run into both of those recently. This one had me doubting my sanity (imagine this on a table with several hundred thousands records, with a fairly complex query around it:

select 1

where 1 not in (2,3,null)

And then there is this interesting little query:

select 1 where 'e' = 'e   '

I refused to believe the result until I saw it myself.

time to read 1 min | 149 words


Here is a little known fact, you can ask NHibernate to take any arbitrary result set from the database and turn it into an object.

return session.CreateSqlQuery(@"
select customer.Name as CustomerName,
           count( case order.Type when 1 then 1 else 0 end )  as FastOrder,
           count( case order.Type when 1 then 2 else 0 end )  as SlowOrder,
           count( case order.Type when 1 then 3 else 0 end )  as AirMailOrder,
from Orders order join Customers customer
on order.Customer = customer.Id
group by customer.Name
")
.AddScalar("CustomerName", NHibernateUtil.String)
.AddScalar("FastOrder", NHibernateUtil.Int64)
.AddScalar("SlowOrder", NHibernateUtil.Int64)
.AddScalar("AirMailOrder", NHibernateUtil.Int64)
.SetResultTransformer(new AliasToBeanResultTransformer(typeof (CustomerOrderTypesCount)))
.List();

As you can probably imagine, this will return a list of CustomerOrderTypesCount objects, with the respective properties filled with the values from the result set.
You can also do it in the configuration, and even map full blown entities along side other interesting values.
time to read 1 min | 127 words

After posting my solution to the problem I got some feedback about the possible performance of the query.
I just run the query over a table with 120,000~ rows, and it completed in less than second and change.
After playing with it a bit, I decided to skip the function and put the results in a table and map that, it is easier to handle it that way with NHibernate, and I can put an index on it that would save a table scan on the months result.
The major cost is a clustered index scan (95%) and nested loops join (5%). Fun stuff :-)

I am getting deep into NHibernate's queries recently, extremely powerful, especially if you combine it with your own customer dialect.
time to read 2 min | 349 words

As it turned out, this isn't that hard, all I needed to do was remember my trusty DateRange function, modify it to on months instead of days, and it was off to the races. This type of code does make my head hurt a tiny bit, it packs a lot into it.

DECLARE  @start DATETIME,
           @end DATETIME
               
SET @start = '2007-04-01'
set @end = '2007-10-01'

  SELECT YEAR(Currentdate) Year,
         MONTH(Currentdate) Month,
         COUNT(Bug.Id) OpenedBugCount
    FROM Dbo.MonthRange(@start,@end )
         JOIN Bugs Bug
           ON YEAR(Currentdate) >= YEAR(Bug.Openedat)
              AND MONTH(Currentdate) >= MONTH(Bug.Openedat)
              AND YEAR(Currentdate) <= YEAR(ISNULL(Bug.Closedat,Currentdate))
              AND MONTH(Currentdate) < MONTH(ISNULL(Bug.Closedat,DateAdd(MONTH,1,Currentdate)))
GROUP BY YEAR(Currentdate),MONTH(Currentdate)

I would get it into NHibernate on Sunday, should be fun, it is about the ninth select is the super report... :-)

time to read 1 min | 130 words

Here is an interesting problem that I had to face at work. I need to get historical data from this table. Basically, I need to count the number of opened bugs per months.

image

Id  OpenedAt   ClosedAt
--- ---------- -----------
1   2007-04-02  2007-09-01
2   2007-07-01  NULL
3   2007-07-01  2007-07-23

 

 

A solution to this with the data above should print (for 2007-04 to 2007-10):

Month Year Opened Bugs
4 2007 1
5 2007 1
6 2007 1
7 2007 2
8 2007 2
9 2007 1
10 2007 1

Have fun :-)

time to read 4 min | 640 words

I spoke before about using the XML capabilities of SQL Server in order to easily pass list of values to SQL Server. I thought that this was a pretty good way to go, until I started to look at the performance numbers.

Let us take a look at this simple query:

DECLARE @ids xml
SET @ids = '<ids>
      <id>ALFKI</id>
...
      <id>SPLIR</id>
</ids>'

SELECT * FROM Customers
WHERE CustomerID IN (SELECT ParamValues.ID.value('.','NVARCHAR(20)')
FROM @ids .nodes('/ids/id') as ParamValues(ID) )

This simple query has a fairly involved execution plan:

image

This looks to me like way too much stuff for such a simple thing, especially when I see this:

image

So the XML stuff is taking up 98% of the query?

I then checked the second route, using fnSplit UDF from here. Using it, I got this result:

image

So it looks like it is significantly more efficient than the XML counter part.

But what about the larger scheme? Running the fnSplit over 9,100 items got me a query that took nearly 45 seconds, while a XML approach over the same set of data had no measurable time over just 91 records.

I then tried a simple SqlCLR function, and got this the same performance from it:

image

The code for the function is:

 [SqlFunction(FillRowMethodName = "FillRow", TableDefinition = "str NVARCHAR(MAX)")]
 public static IEnumerable Split(SqlString str)
 {
     if (str.IsNull)
         return null;
     return str.Value.Split(',');
 }

 public static void FillRow(object obj, out SqlString str)
 {
     string val = (string) obj;

     if (string.IsNullOrEmpty(val))
         str = new SqlString();
     else
         str = new SqlString(val);
 }

As you can probably guess, there are all sorts of stuff that you can do to make it better if you really want, but this looks like a very good approach already.

Tripling the size of the data we are talking about to ~30,000 items had no measurable difference that I could see.

Obviously, when you are talking about those numbers, an IN is probably not something that you want to use.

time to read 1 min | 50 words

I have two SSIS packages that cannot run concurrently, however, one must run once every 5 minutes, and the second every hour, so they are assured to conflict very soon. I can handle this situation manually, but I wanted to know if there was a builtin way to handle that.

time to read 3 min | 491 words

I am not sure if it is me or SSIS, but something is very wrong here. I just spend an ungodly amount of time trying to solve this simple scenario:

  • Source database is Oracle, Destination database is SQL Server
  • A table with large amount of rows
  • The table has a LastModifed field
  • Every few minutes, an SSIS process should wake and grab the latest updates from the table, then update a state table, with the last updated date, for the next run

It is something that should take minutes to handle with code, but even now, I just can't make it happen. The major issue here is very simple, I can't pass parameters to the query. The OLE DB Source won't accept parameters when using Oracle, and when using the DateReader source, it simply has no way to pass input parameters that I can find.

Parameters cannot be extracted from the SQL command. The provider might not help to parse parameter information from the command. In that case, use the "SQL command from variable" access mode, in which the entire SQL command is stored in a variable.

ADDITIONAL INFORMATION:

Provider cannot derive parameter information and SetParameterInfo has not been called. (Microsoft OLE DB Provider for Oracle)

I am not an SSIS expert, but I consider myself technically able, and I certainly knows how to search google. No answer on this one. I went and wrote the logic in an Oracle stored procedure, then I faced another interesting issue. There doesn't seem to be a way to call an Oracle SP from SSIS.

SSIS supposed to stand for Integration, and while I wouldn't assume that it would integrate with the DB engine that I wrote last month in 2AM, I don't think that Oracle is such a minor player that integration with it should be so hard.

Please, someone, make my day and show me that I am stupid, because I am very close to throwing the whole thing out and writing it in an envrionment that doesn't force me to jump through hops just to get the simplest things done. (And have much better error reporting, as an aside)

time to read 2 min | 264 words

Here is an interesting product (via Larkware). RemoteKeys Adds cross-database constraints to SQL Server, which is something that is really nice in many scenarios.

Specifically, I am a firm believer that database == application, and that separate applications should use separate databases. (Versioning, deployment, WhoBrokeMyQuery, etc). This product seems to give a nice answer to  Referential integrity concerns.

I was interested to know how they did it, and it looks like a good UI on top of Linked Server + INSTEAD OF Triggers. I really like the idea, but a couple of things needs to be considered when using it are:

  • Availability concerns - Does both databases have the same availability? If not, what happens when either database goes down? The way it works, both databases needs to talk to each other for CUD.
  • Performance concerns - If we are talking about remote databases, then it is something that you really should consider carefully. The example on their site has US Orders and Europe Orders with remote FK to Customers Database. The problem is that this means that the Customers database needs to issue two remote calls (probably per statement). There may be some smarts there about consolidating checks, but I didn't check.

Cool idea nonetheless, although if I ever see the need, I will create the trigger via code gen and be done with it.

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}