Sunday, July 20, 2008

More on Flat File Bulk Import methods speed comparison in SQL Server 2005

1. Database recovery model was set to BULK_LOGGED. (This improved performance by 2-3 seconds for each method; see the snippet after this list.)

2. I tried BULK INSERT and BCP with and without TABLOCK option.
3. I also measured general processor utilization and RAM. IO reads and writes were similar in all methods.

Everything else was the same.
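
For reference, switching the recovery model looks like this (testImport is a placeholder database name, not the one from the test):

--switch to minimal logging of bulk operations before the load
ALTER DATABASE testImport SET RECOVERY BULK_LOGGED
--...run the bulk load...
--switch back afterwards and take a log backup
ALTER DATABASE testImport SET RECOVERY FULL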

Method --- Time (ms) --- CPU utilization (%) --- RAM change (MB)
Bulk insert with TABLOCK --- 4648 --- 75 --- +5
SSIS FastParse --- 5812 --- 75 --- +30
BCP with TABLOCK --- 6300 --- 100 --- +5
Bulk insert --- 7812 --- 75 --- +5
OpenRowset --- 8750 --- 95 --- +5
BCP --- 11250 --- 100 --- +5

SSIS's SQL Server destination also has a bulk insert TABLOCK option, which was set to true, so my guess is that the roughly one-second overhead comes from package startup time. However, SSIS uses more memory than BULK INSERT.

So if you can use the TABLOCK option, BULK INSERT is the way to go. If not, SSIS is.

Bear in mind that the test flat file I used had only four integer columns, so the FastParse option could be set on all of them. Real-world data rarely offers such ideal conditions. And since I had only integer columns, I didn't have to specify collations.

The modified scripts are:

BULK INSERT testBulkInsert
FROM 'd:\work\test.txt'
WITH (
FORMATFILE='d:\work\testImport-f-n.Fmt',
TABLOCK
)

insert into testOpenRowset(c1, c2, c3, c4)
SELECT t1.c1, t1.c2, t1.c3, t1.c4
FROM OPENROWSET
(
BULK 'd:\work\test.txt',
FORMATFILE = 'd:\work\testImport-f-n.Fmt'
) AS t1(c1, c2, c3, c4);

exec master..xp_cmdshell 'bcp test.dbo.testBCP in d:\work\test.txt -T -b1000000 -fd:\work\testImport-f-n.Fmt -h"tablock"'
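
Note that xp_cmdshell is disabled by default in SQL Server 2005, so the BCP call above may need it enabled first:

EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'xp_cmdshell', 1
RECONFIGURE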

Wednesday, July 16, 2008

SQL Server Cursors

What are cursors for?
--------------------------
Cursors were created to bridge the 'impedance mismatch' between the record-based culture of conventional programming and the set-based world of the relational database.
They had a useful purpose in allowing existing applications to migrate from ISAM or KSAM databases, such as dBase II, to SQL Server with a minimum of upheaval. DBLIB and ODBC make extensive use of them to 'spoof' simple file-based data sources. Relational database programmers won't need them but, if you have an application that understands only the process of iterating through result sets, like flicking through a card index, then you'll probably need a cursor.


Where would you use a Cursor?
----------------------------------------
A simple example of an application for which cursors can provide a good solution is one that requires running totals.
A cumulative graph of monthly sales to date is a good example, as is a cashbook with a running balance.
We'll try four different approaches to getting a running total...

/* so let's build a very simple cashbook */
CREATE TABLE #cb (
	cb_ID INT IDENTITY(1,1), --sequence of entries 1..n
	Et VARCHAR(10), --entry type
	amount money) --quantity
INSERT INTO #cb(et,amount) SELECT 'balance',465.00
INSERT INTO #cb(et,amount) SELECT 'sale',56.00
INSERT INTO #cb(et,amount) SELECT 'sale',434.30
INSERT INTO #cb(et,amount) SELECT 'purchase',20.04
INSERT INTO #cb(et,amount) SELECT 'purchase',65.00
INSERT INTO #cb(et,amount) SELECT 'sale',23.22
INSERT INTO #cb(et,amount) SELECT 'sale',45.80
INSERT INTO #cb(et,amount) SELECT 'purchase',34.08
INSERT INTO #cb(et,amount) SELECT 'purchase',78.30
INSERT INTO #cb(et,amount) SELECT 'purchase',56.00
INSERT INTO #cb(et,amount) SELECT 'sale',75.22
INSERT INTO #cb(et,amount) SELECT 'sale',5.80
INSERT INTO #cb(et,amount) SELECT 'purchase',3.08
INSERT INTO #cb(et,amount) SELECT 'sale',3.29
INSERT INTO #cb(et,amount) SELECT 'sale',100.80
INSERT INTO #cb(et,amount) SELECT 'sale',100.22
INSERT INTO #cb(et,amount) SELECT 'sale',23.80

/* 1) Running Total using Correlated sub-query */
SELECT [Entry Type]=Et, amount,
[balance after transaction]=(
SELECT SUM(--the correlated subquery
CASE WHEN total.Et='purchase'
THEN -total.amount
ELSE total.amount
END)
FROM #cb total WHERE total.cb_id <= #cb.cb_id )
FROM #cb
ORDER BY #cb.cb_id

/* 2) Running Total using a simple inner join and GROUP BY clause */

SELECT [Entry Type]=MIN(#cb.Et), [amount]=MIN (#cb.amount),
[balance after transaction]= SUM(CASE WHEN total.Et='purchase' THEN -total.amount ELSE total.amount END)
FROM #cb total INNER JOIN #cb
ON total.cb_id <= #cb.cb_id
GROUP BY #cb.cb_id
ORDER BY #cb.cb_id

/* 3) and here is a very different technique that takes advantage of the quirky behavior of SET in an UPDATE command in SQL Server */
DECLARE @cb TABLE(
	cb_ID INT, --sequence of entries 1..n
	Et VARCHAR(10), --entry type
	amount money, --quantity
	total money)
DECLARE @total money
SET @total = 0
INSERT INTO @cb(cb_id, Et, amount, total)
SELECT cb_id, Et,
	CASE WHEN Et = 'purchase' THEN -amount ELSE amount END,
	0
FROM #cb
--the 'quirky update': @total accumulates row by row as the table is scanned
UPDATE @cb SET @total = total = @total + amount FROM @cb
SELECT [Entry Type]=Et, [amount]=amount, [balance after transaction]=total
FROM @cb ORDER BY cb_id

/* 4) or you can give up trying to do it a set-based way and iterate through the table */
DECLARE @ii INT, @iiMax INT, @CurrentBalance money
DECLARE @Runningtotals TABLE (cb_id INT, Total money)
SELECT @ii=MIN(cb_id), @iiMax=MAX(cb_id), @CurrentBalance=0 FROM #cb
WHILE @ii<=@iiMax
BEGIN
	SELECT @CurrentBalance = @CurrentBalance
		+ CASE WHEN Et = 'purchase' THEN -amount ELSE amount END
	FROM #cb WHERE cb_ID=@ii
	INSERT INTO @Runningtotals(cb_id, Total)
	SELECT @ii, @CurrentBalance
	SELECT @ii=@ii+1
END
SELECT [Entry Type]=Et, amount, Total
FROM #cb INNER JOIN @Runningtotals r ON #cb.cb_id = r.cb_id
ORDER BY #cb.cb_id

/* or alternatively you can use....A CURSOR!!! The use of a cursor will normally involve a
DECLARE, OPEN, several FETCHes, a CLOSE and a DEALLOCATE */

SET NOCOUNT ON
DECLARE @Runningtotals TABLE (
	cb_id INT IDENTITY(1,1), --IDENTITY preserves the running order for the final ORDER BY
	Et VARCHAR(10), --entry type
	amount money,
	Total money)
DECLARE @CurrentBalance money, @Et VARCHAR(10), @amount money
--Declare the cursor
--DECLARE current_line CURSOR --SQL-92 syntax: only scroll forward
DECLARE current_line CURSOR fast_forward --SQL Server only: only scroll forward
FOR SELECT Et, amount FROM #cb ORDER BY cb_id
FOR READ ONLY
--now we open the cursor, which populates any temporary tables (in the case of STATIC or KEYSET cursors) etc.
--Cursors are unusual because they can be made GLOBAL to the connection.
OPEN current_line
--fetch the first row
FETCH NEXT FROM current_line INTO @Et, @amount
WHILE @@FETCH_STATUS = 0 --whilst all is well
BEGIN
	SELECT @CurrentBalance = COALESCE(@CurrentBalance, 0)
		+ CASE WHEN @Et = 'purchase' THEN -@amount ELSE @amount END
	INSERT INTO @Runningtotals (Et, amount, Total)
	SELECT @Et, @amount, @CurrentBalance
	--this is executed as long as the previous fetch succeeds
	FETCH NEXT FROM current_line INTO @Et, @amount
END
SELECT [Entry Type]=Et, amount, Total FROM @Runningtotals ORDER BY cb_id
CLOSE current_line
--Do not forget to close a cursor when its result set is no longer needed,
--especially a global updatable cursor!
DEALLOCATE current_line
--Although the cursor code looks bulky and complex, on small tables it will
--execute just as quickly as a simple iteration, and it will be faster with tables
--of any size if you forget to put an index on the table through which you're iterating!
--The first two solutions are faster with small tables but slow down
--quadratically as the table grows: each row's subquery re-scans all the earlier rows.
Result
---------
balance 465.00 465.00
sale 56.00 521.00
sale 434.30 955.30
purchase 20.04 935.26
purchase 65.00 870.26
sale 23.22 893.48
sale 45.80 939.28
purchase 34.08 905.20
purchase 78.30 826.90
purchase 56.00 770.90
sale 75.22 846.12
sale 5.80 851.92
purchase 3.08 848.84
sale 3.29 852.13
sale 100.80 952.93
sale 100.22 1053.15
sale 23.80 1076.95
Global Cursors
------------------------
If you are doing something really complicated with a listbox, or scrolling through a rapidly-changing table whilst making updates,
a GLOBAL cursor could be a good solution,
but it is very much geared for traditional client-server applications, because cursors have a lifetime only of the connection.
Each 'client' therefore needs their own connection.
The GLOBAL cursors defined in a connection will be implicitly deallocated at disconnect.
Global cursors can be passed to and from stored procedures and referenced in triggers.
They can be assigned to local variables, so a global cursor can be passed as a parameter to a number of stored procedures.
Here is an example, though it is hard to think of anything genuinely useful to do in such a short example:
CREATE PROCEDURE spReturnEmployee (
	@EmployeeLastName VARCHAR(20),
	@MyGlobalcursor CURSOR VARYING OUTPUT )
AS
BEGIN
	SET NOCOUNT ON
	SET @MyGlobalcursor = CURSOR STATIC FOR
		SELECT lname, fname FROM pubs.dbo.employee
		WHERE lname = @EmployeeLastName
	OPEN @MyGlobalcursor
END
GO
DECLARE @FoundEmployee CURSOR, @LastName VARCHAR(20), @FirstName VARCHAR(20)
EXECUTE spReturnEmployee 'Lebihan', @FoundEmployee OUTPUT
--see if anything was found
--note we are careful to check the right cursor!
IF CURSOR_STATUS('variable', '@FoundEmployee') = 0
	SELECT 'no such employee'
ELSE
BEGIN
	FETCH NEXT FROM @FoundEmployee INTO @LastName, @FirstName
	SELECT @FirstName + ' ' + @LastName
END
CLOSE @FoundEmployee
DEALLOCATE @FoundEmployee
Transact-SQL cursors are efficient when contained in stored procedures and triggers.
This is because everything is compiled into one execution plan on the server and there is no overhead of network traffic whilst fetching rows.
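
As a minimal sketch of that point (spCashbookBalance is an invented name, and it assumes the #cb table from the examples above already exists in the session), the whole fetch loop is compiled into one plan and only the final result crosses the network:

CREATE PROCEDURE spCashbookBalance
AS
BEGIN
	SET NOCOUNT ON
	DECLARE @Et VARCHAR(10), @amount money, @balance money
	SET @balance = 0
	DECLARE cb_lines CURSOR LOCAL FAST_FORWARD
	FOR SELECT Et, amount FROM #cb ORDER BY cb_ID
	OPEN cb_lines
	FETCH NEXT FROM cb_lines INTO @Et, @amount
	WHILE @@FETCH_STATUS = 0
	BEGIN
		SET @balance = @balance + CASE WHEN @Et = 'purchase' THEN -@amount ELSE @amount END
		FETCH NEXT FROM cb_lines INTO @Et, @amount
	END
	CLOSE cb_lines
	DEALLOCATE cb_lines
	SELECT [final balance] = @balance --a single row is all that is returned to the client
END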
Are Cursors Slow?
----------------------------
So what really are the performance differences?
Let's set up a test-rig. We'll give it a really big cashbook to work on and give it a task that doesn't disturb SSMS/Query analyser too much.
We'll calculate the average balance, and the highest and lowest balance.
Now, which solution is going to be the best?
--recreate the cashbook but make it big!
DROP TABLE #cb
CREATE TABLE #cb (
	cb_ID INT IDENTITY(1,1), --sequence of entries 1..n
	Et VARCHAR(10), --entry type
	amount money) --quantity
INSERT INTO #cb(et,amount)
SELECT 'balance',465.00
DECLARE @ii INT
SELECT @ii=0
WHILE @ii<20000
BEGIN
INSERT INTO #cb(et,amount)
SELECT CASE WHEN RAND()<0.5
THEN 'sale' ELSE 'purchase'
END, CAST(RAND()*180.00 AS money)
SELECT @ii=@ii+1
END
--and put an index on it
CREATE CLUSTERED INDEX idxcbid ON #cb(cb_id)
--first try the correlated subquery approach...
DECLARE @StartTime Datetime
SELECT @StartTime= GETDATE()
SELECT MIN(balance), AVG(balance), MAX(balance)
FROM (
	SELECT [balance] = (
		SELECT SUM( --the correlated subquery
			CASE WHEN total.Et = 'purchase' THEN -total.amount ELSE total.amount END)
		FROM #cb total WHERE total.cb_id <= #cb.cb_id )
	FROM #cb ) g
SELECT [elapsed time (secs)]=DATEDIFF(second,@StartTime,GETDATE())
elapsed time (secs)
---------------------------
250
-- Now let's try the "quirky" technique using SET
DECLARE @StartTime Datetime
SELECT @StartTime= GETDATE()
DECLARE @cb TABLE(
	cb_ID INT, --sequence of entries 1..n
	Et VARCHAR(10), --entry type
	amount money, --quantity
	total money)
DECLARE @total money
SET @total = 0
INSERT INTO @cb(cb_id,Et,amount,total)
SELECT cb_id,Et,
CASE WHEN Et='purchase'
THEN -amount ELSE amount
END,0
FROM #cb
UPDATE @cb
SET @total = total = @total + amount
FROM @cb
SELECT MIN(Total), AVG(Total), MAX(Total) FROM @cb
SELECT [elapsed time (secs)]=DATEDIFF(second,@StartTime,GETDATE())
elapsed time (secs)
-----------------------------
1
almost too fast to be measured in seconds
-- now the simple iterative solution
DECLARE @StartTime Datetime
SELECT @StartTime= GETDATE()
SET nocount ON
DECLARE @ii INT, @iiMax INT, @CurrentBalance money
DECLARE @Runningtotals TABLE (cb_id INT, Total money)
SELECT @ii=MIN(cb_id), @iiMax=MAX(cb_id),@CurrentBalance=0 FROM #cb
WHILE @ii<=@iiMax
BEGIN
SELECT @currentBalance=@currentBalance +CASE
WHEN Et='purchase'
THEN -amount
ELSE amount
END
FROM #cb
WHERE cb_ID=@ii
INSERT INTO @runningTotals(cb_id, Total)
SELECT @ii,@currentBalance
SELECT @ii=@ii+1
END
SELECT MIN(Total), AVG(Total), MAX(Total) FROM @Runningtotals
SELECT [elapsed time (secs)]=DATEDIFF(second,@StartTime,GETDATE())
/* elapsed time (secs)
-------------------
2
that's a lot better than a correlated subquery, but slower than the SET trick. Now what about a cursor? */
SET Nocount ON
DECLARE @StartTime Datetime
SELECT @StartTime= GETDATE()
DECLARE @Runningtotals TABLE (cb_id INT, Total money)
DECLARE @CurrentBalance money, @Et VARCHAR(10), @amount money
--Declare the cursor
--DECLARE current_line CURSOR --SQL-92 syntax: scroll forward only
DECLARE current_line CURSOR fast_forward --SQL Server only: scroll forward only
FOR SELECT Et, amount FROM #cb ORDER BY cb_id
FOR READ ONLY
--now we open the cursor, which populates any temporary tables (in the case of STATIC or KEYSET cursors) etc.
--Cursors are unusual because they can be made GLOBAL to the connection.
OPEN current_line
--fetch the first row
FETCH NEXT FROM current_line INTO @Et, @amount
WHILE @@FETCH_STATUS = 0 --whilst all is well
BEGIN
	SELECT @CurrentBalance = COALESCE(@CurrentBalance, 0)
		+ CASE WHEN @Et = 'purchase' THEN -@amount ELSE @amount END
	INSERT INTO @Runningtotals (Total) SELECT @CurrentBalance
	--this is executed as long as the previous fetch succeeds
	FETCH NEXT FROM current_line INTO @Et, @amount
END
SELECT MIN(Total), AVG(Total), MAX(Total) FROM @Runningtotals
CLOSE current_line
--Do not forget to close a cursor when its result set is no longer needed,
--especially a global updatable cursor!
DEALLOCATE current_line
SELECT [elapsed time (secs)]=DATEDIFF(second,@StartTime,GETDATE())
/* elapsed time (secs)
------------------------------
2
The iterative solution takes exactly the same time as the cursor!
(I got it to be slower, 3 secs, with a SQL-92 standard cursor.) */

Cursor Variables
------------------------
/*
@@CURSOR_ROWS -- the number of rows in the cursor
@@FETCH_STATUS -- success (0) or failure (-1) of the most recent fetch;
                  -2 if a KEYSET fetch returns a row that has since been deleted
So here is a test harness just to see what the two variables give at various points.
Try changing the cursor type to see what @@CURSOR_ROWS and @@FETCH_STATUS return.
It works on our temporary table. */
--Declare the cursor
DECLARE @Bucket INT
--DECLARE current_line CURSOR --we only want to scroll forward
DECLARE current_line CURSOR keyset --we scroll about (no ABSOLUTE fetch)
/* TSQL extended cursors can be specified
[LOCAL or GLOBAL]
[FORWARD_ONLY or SCROLL]
[STATIC, KEYSET, DYNAMIC or FAST_FORWARD]
[READ_ONLY, SCROLL_LOCKS or OPTIMISTIC]
[TYPE_WARNING] */
FOR SELECT 1 FROM #cb
SELECT @@FETCH_STATUS, @@CURSOR_ROWS
OPEN current_line
--fetch the first row
FETCH NEXT --NEXT, PRIOR, FIRST, LAST, ABSOLUTE n or RELATIVE n
FROM current_line INTO @Bucket
WHILE @@FETCH_STATUS = 0 --whilst all is well
BEGIN
	SELECT @@FETCH_STATUS, @@CURSOR_ROWS
	FETCH NEXT FROM current_line INTO @Bucket
END
CLOSE current_line
DEALLOCATE current_line
/* If you change the cursor type in the declaration above, you'll notice that @@CURSOR_ROWS returns different values:
a negative value below -1 means the cursor is being populated asynchronously, and its magnitude is the number of rows currently in the keyset;
-1 means the cursor is dynamic;
0 means that no cursor is open, no rows qualified for the last opened cursor, or the last-opened cursor is closed or deallocated;
a positive integer is the number of rows in the cursor.

The most important types of cursor are:

FORWARD_ONLY
You can only go forward in sequence through the data source, and changes made
to the underlying data source appear instantly.
DYNAMIC
Similar to FORWARD_ONLY, but you can access the data in any order.
STATIC
Rows are returned as 'read only' without showing changes to the underlying data source. The data may be accessed in any order.
KEYSET
Membership is fixed when the cursor is opened: changes made to the underlying data appear instantly, but insertions do not. */

Cursor Optimization
---------------------------------
- Use them only as a last resort. Set-based operations are usually fastest (but not always; see above), then a simple iteration, followed by a cursor.
- Make sure that the cursor's SELECT statement contains only the rows and columns you need.
- To avoid the overhead of locks, use READ ONLY cursors rather than updatable cursors whenever possible.
- STATIC and KEYSET cursors cause a temporary table to be created in tempdb, which can prove to be slow.
- Use FAST_FORWARD cursors whenever possible, and choose FORWARD_ONLY cursors if you need an updatable cursor and only need to FETCH NEXT.
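
Putting those rules together, a minimal read-only template looks like this (the cursor and variable names are just placeholders for illustration):

--sketch of a cursor that follows the guidelines above
DECLARE @id INT
DECLARE quick_scan CURSOR LOCAL FAST_FORWARD --read-only, forward-only: the cheapest kind
FOR SELECT cb_ID FROM #cb --select only the columns you need
OPEN quick_scan
FETCH NEXT FROM quick_scan INTO @id
WHILE @@FETCH_STATUS = 0
BEGIN
	--...process @id here...
	FETCH NEXT FROM quick_scan INTO @id
END
CLOSE quick_scan
DEALLOCATE quick_scan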

Tuesday, July 15, 2008

Maximum Capacity Specifications for SQL Server 2005

Object --- SQL Server 2005 (32-bit) --- SQL Server 2005 (64-bit)
Batch size --- 65,536 * Network Packet Size --- 65,536 * Network Packet Size
Bytes per short string column --- 8,000 ---8,000
Bytes per text, ntext, image, varchar(max), nvarchar(max),
varbinary(max), or xml column --- 2^31 - 1 bytes --- 2^31 - 1 bytes
Bytes per GROUP BY, ORDER BY --- 8,060 --- 8,060
Bytes per index --- 900 --- 900
Bytes per foreign key --- 900 --- 900
Bytes per primary key --- 900 --- 900
Bytes per row --- 8,060 --- 8,060
Bytes in source text of a stored procedure --- Lesser of batch size or 250 MB --- Lesser of batch size or 250 MB
Clustered indexes per table --- 1 --- 1
Columns in GROUP BY, ORDER BY --- Limited only by number of bytes --- Limited only by number of bytes
Columns or expressions in a GROUP BY WITH CUBE or WITH ROLLUP
statement --- 10 --- 10
Columns per index --- 16 --- 16
Columns per foreign key --- 16 --- 16
Columns per primary key --- 16 --- 16
Columns per base table --- 1,024 --- 1,024
Columns per SELECT statement --- 4,096 --- 4,096
Columns per INSERT statement --- 1,024 --- 1,024
Connections per client --- Maximum value of configured connections --- Maximum value of configured connections
Database size --- 1,048,516 terabytes --- 1,048,516 terabytes
Databases per instance of SQL Server --- 32,767 --- 32,767
Filegroups per database --- 32,767 --- 256
Files per database --- 32,767 --- 32,767
File size (data) --- 32 terabytes --- 32 terabytes
File size (log) --- 32 terabytes --- 32 terabytes
Foreign key table references per table --- 253 --- 253
Identifier length (in characters) --- 128 --- 128
Instances per computer --- 50 --- 50
Length of a string containing SQL statements (batch size) --- 65,536 * Network packet size --- 65,536 * Network packet size
Locks per connection --- Maximum locks per server --- Maximum locks per server
Locks per instance of SQL Server --- Up to 2,147,483,647 --- Limited only by memory
Nested stored procedure levels --- 32 --- 32
Nested subqueries --- 32 --- 32
Nested trigger levels --- 32 --- 32
Nonclustered indexes per table --- 249 --- 249
Objects concurrently open in an instance of SQL Server --- 2,147,483,647 per database (depending on available memory) --- 2,147,483,647 per database (depending on available memory)
Objects in a database --- 2,147,483,647 --- 2,147,483,647
Parameters per stored procedure --- 2,100 --- 2,100
Parameters per user-defined function --- 2,100 --- 2,100
REFERENCES per table --- 253 --- 253
Rows per table --- Limited by available storage --- Limited by available storage
Tables per database --- Limited by number of objects in a database --- Limited by number of objects in a database
Tables per SELECT statement --- 256 --- 256
Triggers per table --- Limited by number of objects in a database --- Limited by number of objects in a database
UNIQUE indexes or constraints per table --- 249 nonclustered and 1 clustered --- 249 nonclustered and 1 clustered

Set password on SQL Server backup file

Use the BACKUP DATABASE command with the PASSWORD option. A sample is shown below:
----------------------------------------------------
BACKUP DATABASE DBName TO DISK='c:\DBName.bak' WITH INIT,PASSWORD='abc'

Restore with password
---------------------
RESTORE DATABASE DBName
FROM DISK = 'C:\DBName.bak'
WITH PASSWORD = 'abc',
MOVE 'DBName_Data' TO 'd:\MSSQL\MSSQL\DATA\DBName_Data.MDF',
MOVE 'DBName_Log' TO 'd:\MSSQL\MSSQL\DATA\DBName_Log.LDF'
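
If you don't know the logical file names to use in the MOVE clauses, RESTORE FILELISTONLY will list them; the backup set password applies there as well. Note that BOL warns this password protection is weak: it deters casual restores but does not encrypt the backup.

RESTORE FILELISTONLY
FROM DISK = 'C:\DBName.bak'
WITH PASSWORD = 'abc'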

A Story of the Deleted Transaction Log

After getting over the ridiculousness of the issue, I began researching how to resolve the problem. I ran into a few hints in BOL and a few more on the Internet. After all the research, I determined the best option would be DBCC CHECKDB with the REPAIR_ALLOW_DATA_LOSS option. BOL stated this option could repair a transaction log, and what could be more broken about a transaction log than not having one at all?

Once I determined I would go down this path, the first problem was that the db would not go online, not even in emergency mode. So I stopped the SQL Server service and renamed the MDF file. Let's say the database name was DbLive; I renamed the MDF from DbLive.mdf to DbBad.mdf. I then started the SQL Server service and created a new DbLive db with a corresponding DbLive_log.ldf file. Next I stopped the SQL service again, deleted the new DbLive.mdf file, renamed DbBad.mdf back to DbLive.mdf, and started the SQL service once more.

You can imagine that these steps didn't fix the problem. The db did not come online, but at least it now had a transaction log, albeit one from a different database. As far as SQL Server was concerned the log was there; it was just in a jacked-up state (yes, that is a technical term).

The next thing I did was run ALTER DATABASE DbLive SET EMERGENCY. This put the db in emergency mode, allowing SQL Server to work with it at an emergency level. Next I ran "sp_dboption 'DbLive', 'single user', 'true'" to set the db to single-user mode, preparing it for the DBCC command. Finally I ran "DBCC CHECKDB ('DbLive', REPAIR_ALLOW_DATA_LOSS)". After about an hour the db was ONLINE!



/*
Stop SQL service, rename DbLive.mdf to DbBad.mdf.
started SQL service, created fake DbLive db (with log etc)
Stopped SQL service
Deleted DbLIve.mdf
Renamed DbBad.MDF to DbLive.MDF
Started SQL service.
Ran following script:
*/
ALTER DATABASE DbLive SET EMERGENCY
EXEC sp_dboption 'DbLive', 'single user', 'true'
DBCC CHECKDB ('DbLive', REPAIR_ALLOW_DATA_LOSS)
EXEC sp_dboption 'DbLive', 'single user', 'false'
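
In this case DBCC CHECKDB brought the db online by itself; if a database stays stuck in emergency mode after a repair, it can be set back explicitly:

ALTER DATABASE DbLive SET ONLINE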

Monday, July 14, 2008

Search text in SPs @ SQL Server 2005

SELECT Name
FROM sys.procedures
WHERE OBJECT_DEFINITION(object_id) LIKE '%foobar%'

SELECT OBJECT_NAME(object_id)
FROM sys.sql_modules
WHERE Definition LIKE '%foobar%'
AND OBJECTPROPERTY(object_id, 'IsProcedure') = 1

SELECT ROUTINE_NAME
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_DEFINITION LIKE '%foobar%'
AND ROUTINE_TYPE = 'PROCEDURE'

In SQL Server 2000
Let's say you are searching for 'foobar' in all your stored procedures. You can do this using the INFORMATION_SCHEMA.ROUTINES view, or syscomments:

SELECT ROUTINE_NAME, ROUTINE_DEFINITION
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_DEFINITION LIKE '%foobar%'
AND ROUTINE_TYPE='PROCEDURE'
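
The syscomments route looks like this. Bear in mind that syscomments stores long definitions chopped into nvarchar(4000) chunks, so a search string that happens to straddle a chunk boundary can be missed:

SELECT DISTINCT OBJECT_NAME(id)
FROM syscomments
WHERE [text] LIKE '%foobar%'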
