A lesson in performance
A while ago I posted a performance analysis of Sending arrays to SQL Server: Xml vs. Comma Separated Values. The context for that was sending a large number of parameters to the server to be processed in an IN expression. We have hit the 2,100-parameter limit of SQL Server a few times, and that became critical.
The reason we had so many items to send to the IN expression is that we do both caching and calculation in the application code, and then need to fetch the matching data from the database.
As it turned out, I got a call today saying that the string split method was causing performance issues when the server is under load, while the XmlIn method had no such problem. I am not sure why this happens, but the change was quick & painless.
Comments
Could it be the reallocation of @sInputList in the fnSplit UDF is chomping memory?
@sItem=RTRIM(LTRIM(SUBSTRING(@sInputList,1,CHARINDEX(@sDelimiter,@sInputList,0)-1))),
@sInputList=RTRIM(LTRIM(SUBSTRING(@sInputList,CHARINDEX(@sDelimiter,@sInputList,0)+LEN(@sDelimiter),LEN(@sInputList))))
Seems like it would be better to extract the list elements bit-by-bit by storing the current position as you roll through the string and passing this current position to charindex.
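A minimal sketch of that position-tracking approach (the function name and column names here are illustrative, not from the original UDF): rather than rewriting @sInputList on every iteration, keep a cursor position and pass it to CHARINDEX as its start_location argument, so the input string is never copied.

```sql
-- Illustrative sketch: split by advancing a position through the string
-- instead of repeatedly reallocating the input variable.
CREATE FUNCTION dbo.fnSplitByPosition (@list varchar(MAX), @delim char(1))
RETURNS @items TABLE (Item varchar(8000))
AS
BEGIN
    DECLARE @pos int
    DECLARE @next int

    SET @pos = 1
    SET @next = CHARINDEX(@delim, @list, @pos)
    WHILE @next > 0
    BEGIN
        INSERT INTO @items (Item)
            VALUES (RTRIM(LTRIM(SUBSTRING(@list, @pos, @next - @pos))))
        SET @pos = @next + 1
        SET @next = CHARINDEX(@delim, @list, @pos)
    END

    -- trailing element after the last delimiter
    IF @pos <= LEN(@list)
        INSERT INTO @items (Item)
            VALUES (RTRIM(LTRIM(SUBSTRING(@list, @pos, LEN(@list) - @pos + 1))))

    RETURN
END
```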
For example, check out the f_split_string function that is used in the ASP.Net Sql membership provider sprocs.
Looking back at the original post you referenced, I see a comment mentioning that "even though you are returning a table from the UDF, you can't index it, so it doesn't have the same performance as joining 2 real tables".
In fact, this is not true. There's no reason you can't set a primary key on the table returned from the split function. Here's the code I've been using in my database (it differs from yours in that it uses two columns, the first being the position of the item in the list, and that is the column that is indexed):
ALTER function [Private].[StringToKeyedIntTable] (@ValueList varchar(MAX), @Delim char)
returns @Rows table
(
    Number INT IDENTITY PRIMARY KEY,
    Value int
)
as
begin
    declare @i int
    declare @v varchar(40)

    set @i = charindex(@Delim, @ValueList)
    while (@i > 0)
    begin
        set @v = left(@ValueList, @i - 1)
        insert into @Rows (Value) values (cast(@v as int))
        set @ValueList = substring(@ValueList, @i + 1, len(@ValueList))
        set @i = charindex(@Delim, @ValueList)
    end

    -- last (or only) element, after the final delimiter
    if (len(@ValueList) > 0)
        insert into @Rows (Value) values (cast(@ValueList as int))

    return
end
Often - surprisingly - this task is better accomplished in a different way. Sending large IN clauses typically burns a lot of database time generating query plans, and parsing the lists inside the database is very slow.
You'll find that often the fastest way of doing this is to actually do a SQL Bulk Insert of the data into a table, do an INNER JOIN against it, then rollback the transaction.
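A sketch of that pattern (the table and column names are made up for illustration; the bulk insert itself is driven from the client, e.g. via SqlBulkCopy in .NET):

```sql
-- Hypothetical sketch: stage the keys, join against them, discard the staging rows.
BEGIN TRANSACTION

CREATE TABLE #Keys (Id int PRIMARY KEY)

-- the client bulk-copies its list of ids into #Keys at this point

SELECT o.*
FROM Orders o
    INNER JOIN #Keys k ON k.Id = o.Id

ROLLBACK TRANSACTION
```

The staging table gives the optimizer real statistics and an index to join against, which is what the IN-clause and string-splitting approaches both lack.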
This is often also the fastest way to join searches with data from external engines (e.g. lists of primary keys from Lucene).
When I need to pass numeric data as an array to SQL Server, I use the following technique: serialize the array to a binary stream, send it to SQL Server as varbinary(max), and deserialize it in a CLR table-valued function. A sample for an int array is here:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=999945&SiteID=1
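Assuming such a CLR table-valued function has been deployed (the name dbo.BinaryToIntTable, its signature, and the Orders table below are all hypothetical), the SQL side of this technique reduces to a join:

```sql
-- @ids is a varbinary(max) parameter holding the serialized int array;
-- dbo.BinaryToIntTable is a hypothetical CLR TVF that deserializes it
-- into a single-column table of ints.
SELECT o.*
FROM Orders o
    INNER JOIN dbo.BinaryToIntTable(@ids) t ON t.Value = o.Id
```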