Solving a Tricky SQL Problem
I'm starting to get quite a bit of mail from this blog. Some of those questions are about subjects I can answer immediately, some require a fair amount of work (which can be had, if you really want), and the more interesting ones are those that require some thinking, but does not require too much time. This question from Dave is the best one so far, and I got his permission to blog about it, so I'm doubly happy.
The issue is working against a legacy database to get the data for further processing. I'll let Dave explain the issue, since he does it much better:
I have to write a query to generate a report over some interesting data. It's basically scheduling which days people are working. The data looks like this:
Id | EmpName | StartDate | EndDate | Roster |
1 | Bob | 12/06/2006 | 18/06/2006 | _*___** |
2 | Mary | 12/06/2006 | 18/06/2006 | *_*__*_ |
The trick is, the roster field contains a string with a _ or * depending on wether the person is scheduled to work that day or not, but the first character always starts on the sunday. The startdate and enddate can be any day of the week.
In the example above, the 12-jun is a monday, so monday corresponds to the second character in the roster string, so Bob's working and Mary's not.
The roster string wraps around, so the first character of the roster string actually corresponds with the enddate here! Now, this roster string could be 7, 10, 14 days long.
I could get the report out if I can write a query to get it to this:
Employee DateWorking Bob 12/06/2006 Bob 16/06/2006 Mary 13/06/2006 Mary 16/06/2006
By the way, I haven't asked, but I'll bet that this schema has originated from a MainFrame, if not currently, than in its recent past.
First I created the schema I needed:
CREATE TABLE Schedules
(
Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
EmpName NVARCHAR(255) NOT NULL,
StartDate DATETIME NOT NULL,
EndDate DATETIME NOT NULL,
Roster NVARCHAR(50) NOT NULL
);
GO
INSERT INTO Schedules
SELECT 'Bob','12-Jun-06','18-Jun-06','_*___**'
UNION ALL
SELECT 'Mary','12-Jun-06','18-Jun-06','*_*__*_'
GO
Then, I started playing with DatePart(), getting the day of the week of StartDate from each row. This gave me the index I needed into the Roster column. But, this only told me whatever the employee worked or didn't work on the start date, which isn't very helpful. What I needed was a way to check for all the values between StartDate and EndDate.
I posted about this issue a while ago, and I made use of this techqnique here:
CREATE FUNCTION DateRange ( @start datetime, @end datetime )
RETURNS @DateRange TABLE ( CurrentDate datetime )
AS
BEGIN
WHILE (@start <= @end)
BEGIN
INSERT INTO @DateRange(CurrentDate) VALUES(@start)
SELECT @start = DATEADD(day,1,@start)
END
RETURN
END
GO
Conceptually, what I wanted was this:
SELECT
IndexInRoster = DatePart(dw,StartDate) + DateDiff(day,StartDate, CurrentDate)
FROM Schedules, DateRange(StartDate,EndDate)
Unfortantely, DateRange() is a table valued function, and what this query ask from SQL Server is to join each row in the Schedules table to another table. This is not possible, of course.
I settled on faking it using this appraoch:
WITH AllDatesInTable(CurrentDate) AS
(
SELECT CurrentDate FROM dbo.DateRange(
(SELECT MIN(StartDate) FROM Schedules),
(SELECT Max(EndDate) FROM Schedules) )
)
SELECT
TestIndexInRoster = DatePart(dw,StartDate) + DateDiff(day,StartDate, CurrentDate)
FROM Schedules JOIN AllDatesInTable
ON CurrentDate BETWEEN StartDate AND EndDate
This query uses Common Table Expression to define a table that has all the dates in the Schedules table. Notice that I constrained it to all the dates in the current row. In essense, this give me a row per each date in the date range of each row. This is the basis of solving this problem.
The other issue is the wrapping of the day index in the roster. This is a bit complicated because we need to take into account three things. SQL Server string handling is 1 base, not 0 based (argh!), we are shifting based on the start date functionality, and we need to wrap around correctly. In order to handle this issue I created this function:
CREATE FUNCTION IndexInRoster(@StartDate DATETIME, @CurrentDate DATETIME, @RosterLen INT)
RETURNS INT AS
BEGIN
DECLARE @Result int
SET @Result = (DATEDIFF(day,@StartDate,@CurrentDate) + DATEPART(dw,@StartDate)) % (@RosterLen)
IF @Result = 0
RETURN @RosterLen
RETURN @Result
END
GO
The check for @Result equals 0 is there because SQL Server is using 1 based string handling.
Brining it all together, we get this:
WITH AllDatesInSchedulesTable(CurrentDate) AS
(
SELECT CurrentDate FROM dbo.DateRange(
(SELECT MIN(StartDate) FROM Schedules),
(SELECT Max(EndDate) FROM Schedules) )
)
SELECT
EmpName,
CONVERT(NVARCHAR, CurrentDate,103) Date,
HasWorked = CASE SUBSTRING(Roster,
dbo.IndexInRoster(StartDate,CurrentDate,LEN(Roster)), 1)
WHEN '_' THEN 0
WHEN '*' THEN 1
ELSE NULL
END
FROM Schedules JOIN AllDatesInSchedulesTable
ON CurrentDate BETWEEN StartDate AND EndDate
And the result of this query:
EmpName | Date | HasWorked |
Bob | 12/06/2006 | 1 |
Mary | 12/06/2006 | 0 |
Bob | 13/06/2006 | 0 |
Mary | 13/06/2006 | 1 |
Bob | 14/06/2006 | 0 |
Mary | 14/06/2006 | 0 |
Bob | 15/06/2006 | 0 |
Mary | 15/06/2006 | 0 |
Bob | 16/06/2006 | 1 |
Mary | 16/06/2006 | 1 |
Bob | 17/06/2006 | 1 |
Mary | 17/06/2006 | 0 |
Bob | 18/06/2006 | 0 |
Mary | 18/06/2006 | 1 |
And from here it is trivial to get to whatever format you want.
Comments
Comment preview