Wednesday, July 16, 2008

SQL Server Cursors

What are cursors for?
--------------------------
Cursors were created to bridge the 'impedence mismatch' between the 'record- based' culture of conventional programming and the set-based world of the relational database.
They had a useful purpose in allowing existing applications to change from ISAM or KSAM databases, such as DBaseII, to SQL Server with the minimum of upheaval. DBLIB and ODBC make extensive use of them to 'spoof' simple file-based data sources. Relational database programmers won't need them but, if you have an application that understands only the process of iterating through resultsets, like flicking through a card index, then you'll probably need a cursor.


Where would you use a Cursor?
----------------------------------------
An simple example of an application for which cursors can provide a good solution is one that requires running totals.
A cumulative graph of monthly sales to date is a good example, as is a cashbook with a running balance.
We'll try four different approaches to getting a running total...

/*so lets build a very simple cashbook */
CREATE TABLE #cb ( cb_ID INT IDENTITY(1,1),--sequence of entries 1..n
Et VARCHAR(10), --entryType
amount money)--quantity
INSERT INTO #cb(et,amount) SELECT 'balance',465.00
INSERT INTO #cb(et,amount) SELECT 'sale',56.00
INSERT INTO #cb(et,amount) SELECT 'sale',434.30
INSERT INTO #cb(et,amount) SELECT 'purchase',20.04
INSERT INTO #cb(et,amount) SELECT 'purchase',65.00
INSERT INTO #cb(et,amount) SELECT 'sale',23.22
INSERT INTO #cb(et,amount) SELECT 'sale',45.80
INSERT INTO #cb(et,amount) SELECT 'purchase',34.08
INSERT INTO #cb(et,amount) SELECT 'purchase',78.30
INSERT INTO #cb(et,amount) SELECT 'purchase',56.00
INSERT INTO #cb(et,amount) SELECT 'sale',75.22
INSERT INTO #cb(et,amount) SELECT 'sale',5.80
INSERT INTO #cb(et,amount) SELECT 'purchase',3.08
INSERT INTO #cb(et,amount) SELECT 'sale',3.29
INSERT INTO #cb(et,amount) SELECT 'sale',100.80
INSERT INTO #cb(et,amount) SELECT 'sale',100.22
INSERT INTO #cb(et,amount) SELECT 'sale',23.80

/* 1) Running Total using Correlated sub-query */
SELECT [Entry Type]=Et, amount,
[balance after transaction]=(
SELECT SUM(--the correlated subquery
CASE WHEN total.Et='purchase'
THEN -total.amount
ELSE total.amount
END)
FROM #cb total WHERE total.cb_id <= #cb.cb_id )
FROM #cb
ORDER BY #cb.cb_id /*
2)Running Total using simple inner-join and group by clause */

SELECT [Entry Type]=MIN(#cb.Et), [amount]=MIN (#cb.amount),
[balance after transaction]= SUM(CASE WHEN total.Et='purchase' THEN -total.amount ELSE total.amount END)
FROM #cb total INNER JOIN #cb
ON total.cb_id <= #cb.cb_id
GROUP BY #cb.cb_id
ORDER BY #cb.cb_id

/* 3)and here is a very different technique that takes advantege of the quirky behavionr of SET in an UPDATE command in SQL Server */
DECLARE @cb TABLE(cb_ID INT,--sequence of entries 1..n Et VARCHAR(10), --entryType amount money,--quantity total money)
DECLARE @total money SET @total = 0
INSERT INTO @cb(cb_id,Et,amount,total)
SELECT cb_id,Et,CASE
WHEN Et='purchase' THEN -amount
ELSE amount END,0
FROM #cb UPDATE @cb
SET @total = total = @total + amount FROM @cb SELECT [Entry Type]=Et, [amount]=amount, [balance after transaction]=total FROM @cb ORDER BY cb_id

/* 4)or you can give up trying to do it a set-based way and iterate through the table */
DECLARE @ii INT, @iiMax INT, @CurrentBalance money DECLARE @Runningtotals TABLE (cb_id INT, Total money)
SELECT @ii=MIN(cb_id), @iiMax=MAX(cb_id),@CurrentBalance=0 FROM #cb
WHILE @ii<=@iiMax
BEGIN
SELECT @currentBalance=@currentBalance +CASE WHEN Et='purchase' THEN -amount
ELSE amount END FROM #cb WHERE cb_ID=@ii
INSERT INTO @runningTotals(cb_id, Total)
SELECT @ii,@currentBalance SELECT @ii=@ii+1
END
SELECT [Entry Type]=Et,amount,total

/* or alternatively you can use....A CURSOR!!! the use of a cursor will normally involve a
DECLARE, OPEN, several FETCHs, a CLOSE and a DEALLOCATE */

SET Nocount ON DECLARE @Runningtotals TABLE (cb_id INT, Et VARCHAR(10), --entryType amount money, Total money)
DECLARE @CurrentBalance money, @Et VARCHAR(10), @amount money --Declare the cursor --declare current_line cursor -- SQL-92 syntax--only scroll forward
DECLARE current_line CURSOR fast_forward--SQL Server only
--only scroll forward
FOR SELECT Et,amount FROM #cb ORDER BY cb_id FOR READ ONLY --now we open the cursor to populate any temporary tables (in the case of -- cursors) etc..
--Cursors are unusual because they can be made GLOBAL to the connection.
OPEN current_line
--fetch the first row
FETCH NEXT FROM current_line INTO @Et,@amount WHILE @@FETCH_STATUS = 0--whilst all is well
BEGIN
SELECT @CurrentBalance = COALESCE(@CurrentBalance,0) +CASE
WHEN @Et='purchase'
THEN -@amount ELSE @amount
END
INSERT INTO @Runningtotals (Et, amount,Total)
SELECT @Et,@Amount,@CurrentBalance -- This is executed as long as the previous fetch succeeds.
FETCH NEXT FROM current_line INTO @Et,@amount
END
SELECT [Entry Type]=Et,amount,Total FROM @Runningtotals ORDER BY cb_id
CLOSE current_line
--Do not forget to close when its result set is not needed. --especially a global updateable cursor!
DEALLOCATE current_line -- although the Cursor code looks bulky and complex, on small tables it will
-- execute just as quickly as a simple iteration, and will be faster with tables
-- of any size if you forget to put an index on the table through which you're iterating!
-- The first two solutions are faster with small tables but slow down
-- exponentially as the table size grows.
Result
---------
balance 465.00 465.00
sale 56.00 521.00
sale 434.30 955.30
purchase 20.04 935.26
purchase 65.00 870.26
sale 23.22 893.48
sale 45.80 939.28
purchase 34.08 905.20
purchase 78.30 826.90
purchase 56.00 770.90
sale 75.22 846.12
sale 5.80 851.92
purchase 3.08 848.84
sale 3.29 852.13
sale 100.80 952.93
sale 100.22 1053.15
sale 23.80 1076.95
Global Cursors
------------------------
If you are doing something really complicated with a listbox, or scrolling through a rapidly-changing table whilst making updates,
a GLOBAL cursor could be a good solution,
but it is very much geared for traditional client-server applications, because cursors have a lifetime only of the connection.
Each 'client' therefore needs their own connection.
The GLOBAL cursors defined in a connection will be implicitly deallocated at disconnect.
Global Cursors can be passed too and from stored procedure and referenced in triggers.
They can be assigned to local variables. A global cursor can therefore be passed as a parameter to a number of stored procedures
Here is an example, though one is struggling to think of anything useful in a short example
CREATE PROCEDURE spReturnEmployee ( @EmployeeLastName VARCHAR(20), @MyGlobalcursor CURSOR VARYING OUTPUT )
AS
BEGIN
SET NOCOUNT ON
SET @MyGlobalcursor = CURSOR STATIC FOR
SELECT lname, fname FROM pubs.dbo.employee
WHERE lname = @EmployeeLastName
OPEN @MyGlobalcursor
END
DECLARE @FoundEmployee CURSOR, @LastName VARCHAR(20), @FirstName VARCHAR(20)
EXECUTE spReturnEmployee 'Lebihan', @FoundEmployee OUTPUT
--see if anything was found
--note we are careful to check the right cursor!
IF CURSOR_STATUS('variable', '@FoundEmployee') = 0 SELECT 'no such employee'
ELSE BEGIN FETCH NEXT FROM @FoundEmployee INTO @LastName, @FirstName
SELECT @FirstName+' '+@LastName END CLOSE @FoundEmployee
DEALLOCATE @FoundEmployee
Transact-SQL cursors are efficient when contained in stored procedures and triggers.
This is because everything is compiled into one execution plan on the server and there is no overhead of network traffic whilst fetching rows.
Are Cursors Slow?
----------------------------
So what really are the performance differences?
Let's set up a test-rig. We'll give it a really big cashbook to work on and give it a task that doesn't disturb SSMS/Query analyser too much.
We'll calculate the average balance, and the highest and lowest balance.
Now, which solution is going to be the best?
--recreate the cashbook but make it big!
DROP TABLE #cb
CREATE TABLE #cb (cb_ID INT IDENTITY(1,1),
--sequence of entries 1..n Et VARCHAR(10),
--entryType amount money)
--quantity
INSERT INTO #cb(et,amount)
SELECT 'balance',465.00
DECLARE @ii INT
SELECT @ii=0
WHILE @ii<20000
BEGIN
INSERT INTO #cb(et,amount)
SELECT CASE WHEN RAND()<0.5
THEN 'sale' ELSE 'purchase'
END, CAST(RAND()*180.00 AS money)
SELECT @ii=@ii+1
END
--and put an index on it
CREATE CLUSTERED INDEX idxcbid ON #cb(cb_id)
--first try the correlated subquery approach...
DECLARE @StartTime Datetime
SELECT @StartTime= GETDATE()
SELECT MIN(balance), AVG(balance), MAX(balance) FROM ( SELECT [balance]=( SELECT SUM(--the correlated subquery
CASE WHEN total.Et='purchase'
THEN -total.amount
ELSE total.amount END)
FROM #cb total WHERE total.cb_id <= #cb.cb_id )
FROM #cb)g
SELECT [elapsed time (secs)]=DATEDIFF(second,@StartTime,GETDATE())
elapsed time (secs)
---------------------------
250
-- Now let's try the "quirky" technique using SET
DECLARE @StartTime Datetime
SELECT @StartTime= GETDATE()
DECLARE @cb TABLE(cb_ID INT,
--sequence of entries 1..n
Et VARCHAR(10),
--entryType amount money,--quantity total money)
DECLARE @total money SET @total = 0
INSERT INTO @cb(cb_id,Et,amount,total)
SELECT cb_id,Et,
CASE WHEN Et='purchase'
THEN -amount ELSE amount
END,0
FROM #cb
UPDATE @cb
SET @total = total = @total + amount
FROM @cb
SELECT MIN(Total), AVG(Total), MAX(Total) FROM @cb
SELECT [elapsed time (secs)]=DATEDIFF(second,@StartTime,GETDATE())
elapsed time (secs)
-----------------------------
1
almost too fast to be measured in seconds
-- now the simple iterative solution
DECLARE @StartTime Datetime
SELECT @StartTime= GETDATE()
SET nocount ON
DECLARE @ii INT, @iiMax INT, @CurrentBalance money
DECLARE @Runningtotals TABLE (cb_id INT, Total money)
SELECT @ii=MIN(cb_id), @iiMax=MAX(cb_id),@CurrentBalance=0 FROM #cb
WHILE @ii<=@iiMax
BEGIN
SELECT @currentBalance=@currentBalance +CASE
WHEN Et='purchase'
THEN -amount
ELSE amount
END
FROM #cb
WHERE cb_ID=@ii
INSERT INTO @runningTotals(cb_id, Total)
SELECT @ii,@currentBalance
SELECT @ii=@ii+1
END
SELECT MIN(Total), AVG(Total), MAX(Total) FROM @Runningtotals
SELECT [elapsed time (secs)]=DATEDIFF(second,@StartTime,GETDATE())
/* elapsed time (secs)
-------------------
2
thats a lot better than a correlated subquery but slower than using the SET trick now what about a cursor? */
SET Nocount ON
DECLARE @StartTime Datetime
SELECT @StartTime= GETDATE()
DECLARE @Runningtotals TABLE (cb_id INT, Total money)
DECLARE @CurrentBalance money, @Et VARCHAR(10), @amount money
--Declare the cursor --declare current_line cursor
-- SQL-92 syntax
---scroll forward only
DECLARE current_line CURSOR fast_forward--SQL Server only
---scroll forward FOR SELECT Et,amount FROM #cb
ORDER BY cb_id FOR READ ONLY
--now we open the cursor to populate any temporary tables (in the case of -- cursors) etc..
--Cursors are unusual because they can be made GLOBAL to the connection.
OPEN current_line --fetch the first row
FETCH NEXT FROM current_line INTO @Et,@amount
WHILE @@FETCH_STATUS = 0--whilst all is well
BEGIN SELECT @CurrentBalance = COALESCE(@CurrentBalance,0) +CASE
WHEN @Et='purchase' THEN -@amount ELSE @amount
END
INSERT INTO @Runningtotals (Total) SELECT @CurrentBalance
-- This is executed as long as the previous fetch succeeds.
FETCH NEXT FROM current_line INTO @Et,@amount
END SELECT MIN(Total), AVG(Total), MAX(Total) FROM @Runningtotals
CLOSE current_line
--Do not forget to close when result set is not needed.
--especially a global updateable cursor!
DEALLOCATE current_line SELECT [elapsed time (secs)]=DATEDIFF(second,@StartTime,GETDATE())
/* elapsed time (secs)
------------------------------
2
The iterative solution takes exactly the same time as the cursor!
(I got it to be slower (3 secs) with a SQL92 standard cursor
Cursor Variables
------------------------
--@@CURSOR_ROWS
The number of rows in the cursor --@@FETCH_STATUS Boolean value, success or failure of most recent fetch
---2 if a keyset FETCH returns a deleted row So here is a test harness just to see what the two variables will give at various points.
Try changing the cursor type to see what @@Cursor_Rows and @@Fetch_Status returns.
It works on our temporary Table */
--Declare the cursor
DECLARE @Bucket INT
--declare current_line cursor
--we only want to scroll forward
DECLARE current_line CURSOR keyset
--we scroll about (no absolute fetch)
/* TSQL extended cursors can be specified
[LOCAL or GLOBAL]
[FORWARD_ONLY or SCROLL]
[STATIC, KEYSET, DYNAMIC or FAST_FORWARD]
[READ_ONLY, SCROLL_LOCKS or OPTIMISTIC]
[TYPE_WARNING]*/
FOR SELECT 1 FROM #cb SELECT @@FETCH_STATUS, @@CURSOR_ROWS
OPEN current_line --fetch the first row
FETCH NEXT --NEXT , PRIOR, FIRST, LAST, ABSOLUTE n or RELATIVE n
FROM current_line INTO @bucket WHILE @@FETCH_STATUS = 0
--whilst all is well
BEGIN SELECT @@FETCH_STATUS, @@CURSOR_ROWS
FETCH NEXT FROM current_line INTO @Bucket
END
CLOSE current_line
DEALLOCATE current_line
/* if you change the cursor type definition routine above you'll notice that @@CURSOR_ROWS returns different values a negative value >1
is the number of rows currently in the keyset.
If it is -1 The cursor is dynamic.
A 0 means that no cursors are open or no rows qualified for the last opened cursor or the last-opened cursor is closed or deallocated.
a positive integer represents the number of rows in the cursor the most important type of cursors are...
FORWARD_ONLY
you can only go forward in sequence from data source, and changes made
to the underlying data source appear instantly.
DYNAMIC
Similar to FORWARD_ONLY, but You can access data using any order.
STATIC
Rows are returned as 'read only' without showing changes to the underlying data source. The data may be accessed in any order.
KEYSET
A dynamic data set with changes made to the underlying data appearing instantly, but insertions do not appear.

Cursor Optimization
---------------------------------
Use them only as a last resort. Set-based operations are usually fastest (but not always-see above), then a simple iteration, followed by a cursor
. Make sure that the cursor's SELECT statement contains only the rows and columns you need
. To avoid the overhead of locks, Use READ ONLY cursors rather than updatable cursors, whenever possible.
. , static and keyset cursors cause a temporary table to be created in TEMPDB, which can prove to be slow
. Use FAST_FORWARD cursors, whenever possible, and choose FORWARD_ONLY
cursors if you need updatable cursor and you only need to FETCH NEXT.

No comments:

Blog Archive