SQL Challenge, getting historical datasolution
As it turned out, this isn't that hard, all I needed to do was remember my trusty DateRange function, modify it to on months instead of days, and it was off to the races. This type of code does make my head hurt a tiny bit, it packs a lot into it.
DECLARE @start DATETIME,
@end DATETIME
SET @start = '2007-04-01'
set @end = '2007-10-01'
SELECT YEAR(Currentdate) Year,
MONTH(Currentdate) Month,
COUNT(Bug.Id) OpenedBugCount
FROM Dbo.MonthRange(@start,@end )
JOIN Bugs Bug
ON YEAR(Currentdate) >= YEAR(Bug.Openedat)
AND MONTH(Currentdate) >= MONTH(Bug.Openedat)
AND YEAR(Currentdate) <= YEAR(ISNULL(Bug.Closedat,Currentdate))
AND MONTH(Currentdate) < MONTH(ISNULL(Bug.Closedat,DateAdd(MONTH,1,Currentdate)))
GROUP BY YEAR(Currentdate),MONTH(Currentdate)
I would get it into NHibernate on Sunday, should be fun, it is about the ninth select is the super report... :-)
More posts in "SQL Challenge, getting historical data" series:
- (12 Aug 2007) Performance characteristics
- (10 Aug 2007) solution
Comments
Seeing all the stuff from the answers to your challenge, compared to the cleaner thing you wrote here, I find just another proof that a solid UDF is a life-saver, T-SQL-wise.
I'm not sure how well this query will scale and as you've mentioned it is a little hard to grasp.
I think I would go with a variable which is initialized with the amount of bugs that were opened and not closed before the start date (or were not closed at all) and two GROUP BY queries for the report period - one for OpenedAt and one for ClosedAt. I would then iterate through the months in question and increment (or decrement) the variable accordingly.
My aim is to reduce the number of table-scans the DBMS performs and make the code easier to handle.
Yoni,
It is hard to grasp because you need to understand the join condition. I usually dislike putting logic there, but this is the best place for this.
Can you show the code that you are thinking of, I have a hunch that it is going to be significantly more complex.
Beyond that, can you explain why this would require more table scans from the DB? It looks to me that it can be optimized with a simple index.
Ayende,
I may be getting too creative here but back in the days I was coding a lot of reports and was faced with a lot of problematic queries which resulted with bad execution plans. I think your query will require at least one or two index-scans per month in the report if not more.
So I was thinking of a solution which would limit the amount of scans (which cost more as the table becomes larger) and would also translate to coherent, intent revealing, code:
delegate Dictionary<DateTime, int> GetBugCountGroupedByMonths(string dateFieldToGroupBy);
This way may seem more complex but it does remove complexity from the execution plan of the DBMS and will probably perform and scale better.
I also think that your original query would be more intent revealing if it contained a nested select for counting the bugs instead of a join.
Yoni,
Just a note:
getBugCountGroupedByMonths("getdate()); DROP DATABASE Production; --");
Now, the real problem here is that you method requires N+2 round trips to the database, where N is the number of months in the selected range. It also means N+2 index scans on the table, etc.
I don't see how it can perform better.
Ayende,
Dynamic SQL can be fun and harmless, SQL injection is something you should solve at the user input level...
If you look closer at the code there are actually 3 round-trips to the database. The rest of the code uses the in-memory dictionaries which contain up to N key-value-pairs which can hardly affect performance.
Outside T-SQL currentdate is a function. perhaps you should name the set returned from month range to aid in readability.
DECLARE @start DATETIME, @end DATETIME
SET @start = '2007-04-01'
set @end = '2007-10-01'
SELECT YEAR(Currentdate) Year,
FROM Dbo.MonthRange(@start,@end ) AS Dates
JOIN Bugs Bug ON
I made a simple solution for a similar problem before and it can be pretty simple.
I think we can focus on the start date for begining calculations as we are counting bugs opened and close the same month.
We can also limitate the range by defining OpenedAt >= start date and ClosedAt <= end date.
Then, if we use sum instead of count, we can make simple counts without using an alternate table for date ranges (and not to forget the months where there is no bug closed.
Create function fGetOpenedBugsByPeriod (
)
RETURNS TABLE
AS
RETURN (
SELECT [Year] = Year(Openedat),
FROM dbo.Bugs
WHERE 1=1
GROUP BY
)
GO
With this system, we can easilly add another calculation directly in the same table,as here for exemple the number of bugs closed the month after.
Rui,
How would you get the opened bugs count for the last two years?
Looks to me like you would need a different statement per date range.
Ayende,
I may be in error, but for me it works in all the cases, even if you work on a range greater than 2 years. I supposed we are talking about bugs opened and close during a same calendar month, not about bugs closed in less than 1 month since a date, did I make a misunderstanding? Thank you
Comment preview