Solving a Tricky SQL Problem

time to read 55 min | 10901 words

I'm starting to get quite a bit of mail from this blog. Some of those questions are about subjects I can answer immediately, some require a fair amount of work (which can be had, if you really want), and the more interesting ones are those that require some thinking, but does not require too much time. This question from Dave is the best one so far, and I got his permission to blog about it, so I'm doubly happy.

The issue is working against a legacy database to get the data for further processing. I'll let Dave explain the issue, since he does it much better:

I have to write a query to generate a report over some interesting data.  It's basically scheduling which days people are working.  The data looks like this:

Id EmpName StartDate EndDate Roster
1 Bob 12/06/2006 18/06/2006 _*___**
2 Mary 12/06/2006 18/06/2006 *_*__*_

The trick is, the roster field contains a string with a _ or * depending on wether the person is scheduled to work that day or not, but the first character always starts on the sunday.  The startdate and enddate can be any day of the week.
In the example above, the 12-jun is a monday, so monday corresponds to the second character in the roster string, so Bob's working and Mary's not.
The roster string wraps around, so the first character of the roster string actually corresponds with the enddate here!  Now, this roster string could be 7, 10, 14 days long. 

I could get the report out if I can write a query to get it to this:

 

Employee DateWorking
Bob 12/06/2006
Bob 16/06/2006
Mary 13/06/2006
Mary 16/06/2006

 

By the way, I haven't asked, but I'll bet that this schema has originated from a MainFrame, if not currently, than in its recent past.

First I created the schema I needed:

 

CREATE TABLE Schedules

(

        Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,

        EmpName NVARCHAR(255) NOT NULL,

        StartDate DATETIME NOT NULL,

        EndDate DATETIME NOT NULL,

        Roster NVARCHAR(50) NOT NULL

);

GO

 

INSERT INTO Schedules

SELECT 'Bob','12-Jun-06','18-Jun-06','_*___**'

UNION ALL

SELECT 'Mary','12-Jun-06','18-Jun-06','*_*__*_'

 

GO

 

Then, I started playing with DatePart(), getting the day of the week of StartDate from each row. This gave me the index I needed into the Roster column. But, this only told me whatever the employee worked or didn't work on the start date, which isn't very helpful. What I needed was a way to check for all the values between StartDate and EndDate.

I posted about this issue a while ago, and I made use of this techqnique here:

 

CREATE FUNCTION DateRange ( @start datetime, @end datetime )

RETURNS @DateRange TABLE ( CurrentDate datetime )

AS

BEGIN

      WHILE (@start <= @end)

      BEGIN

            INSERT INTO @DateRange(CurrentDate) VALUES(@start)

            SELECT @start = DATEADD(day,1,@start)

      END

      RETURN

END

GO

Conceptually, what I wanted was this:

SELECT

      IndexInRoster = DatePart(dw,StartDate) + DateDiff(day,StartDate, CurrentDate)

FROM Schedules, DateRange(StartDate,EndDate)

Unfortantely, DateRange() is a table valued function, and what this query ask from SQL Server is to join each row in the Schedules table to another table. This is not possible, of course.

I settled on faking it using this appraoch:

WITH AllDatesInTable(CurrentDate) AS

(

        SELECT CurrentDate FROM dbo.DateRange(

                  (SELECT MIN(StartDate) FROM Schedules),

                        (SELECT Max(EndDate) FROM Schedules) )

)

SELECT

      TestIndexInRoster = DatePart(dw,StartDate) + DateDiff(day,StartDate, CurrentDate)

FROM Schedules JOIN AllDatesInTable

ON CurrentDate BETWEEN StartDate AND EndDate

This query uses Common Table Expression to define a table that has all the dates in the Schedules table. Notice that I constrained it to all the dates in the current row. In essense, this give me a row per each date in the date range of each row. This is the basis of solving this problem.

The other issue is the wrapping of the day index in the roster. This is a bit complicated because we need to take into account three things. SQL Server string handling is 1 base, not 0 based (argh!), we are shifting based on the start date functionality, and we need to wrap around correctly. In order to handle this issue I created this function:

CREATE FUNCTION IndexInRoster(@StartDate DATETIME, @CurrentDate DATETIME, @RosterLen INT)

RETURNS INT AS

BEGIN

      DECLARE @Result int

      SET @Result = (DATEDIFF(day,@StartDate,@CurrentDate) + DATEPART(dw,@StartDate)) % (@RosterLen)

      IF @Result = 0

            RETURN @RosterLen

     

      RETURN @Result

END

GO

The check for @Result equals 0 is there because SQL Server is using 1 based string handling.

Brining it all together, we get this:

WITH AllDatesInSchedulesTable(CurrentDate) AS

(

        SELECT CurrentDate FROM dbo.DateRange(

                  (SELECT MIN(StartDate) FROM Schedules),

                        (SELECT Max(EndDate) FROM Schedules) )

)

SELECT

        EmpName,

        CONVERT(NVARCHAR, CurrentDate,103) Date,

        HasWorked = CASE SUBSTRING(Roster,

                dbo.IndexInRoster(StartDate,CurrentDate,LEN(Roster)), 1)

                        WHEN '_' THEN 0

                        WHEN '*' THEN 1

                        ELSE NULL

            END

FROM Schedules JOIN AllDatesInSchedulesTable

ON CurrentDate BETWEEN StartDate AND EndDate

And the result of this query:

EmpName Date HasWorked
Bob 12/06/2006 1
Mary 12/06/2006 0
Bob 13/06/2006 0
Mary 13/06/2006 1
Bob 14/06/2006 0
Mary 14/06/2006 0
Bob 15/06/2006 0
Mary 15/06/2006 0
Bob 16/06/2006 1
Mary 16/06/2006 1
Bob 17/06/2006 1
Mary 17/06/2006 0
Bob 18/06/2006 0
Mary 18/06/2006 1

And from here it is trivial to get to whatever format you want.