Friday, August 08, 2008

Codd's 12 rules

Codd's 12 rules are a set of thirteen rules (numbered zero to twelve) proposed by Edgar F. Codd, a pioneer of the relational model for databases,
designed to define what is required from a database management system in order for it to be considered relational, i.e., an RDBMS.

The rules

Rule 0: The system must qualify as relational, as a database, and as a management system.

For a system to qualify as a relational database management system (RDBMS), that system must use its relational facilities (exclusively) to manage the database.

Rule 1 : The information rule:

All information in the database is to be represented in one and only one way, namely by values in column positions within rows of tables.

Rule 2: The guaranteed access rule:

All data must be accessible with no ambiguity. This rule is essentially a restatement of the fundamental requirement for primary keys. It says that every individual scalar value in the database must be logically addressable by specifying the name of the containing table, the name of the containing column and the primary key value of the containing row.
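As a rough T-SQL illustration (the Employees table here is hypothetical, not something from the rule itself), every scalar value is reachable by naming the table, the column, and the primary key value of its row:

CREATE TABLE Employees (
    EmployeeID INT NOT NULL PRIMARY KEY,  -- the primary key gives each row a logical address
    LastName VARCHAR(50) NOT NULL,
    Salary MONEY NOT NULL
)

-- The value is addressed by table name + column name + primary key value,
-- never by physical position.
SELECT Salary
FROM Employees
WHERE EmployeeID = 42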

Rule 3: Systematic treatment of null values:

The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number," in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.
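A quick sketch of what "distinct from zero" means in practice; the table and the values are purely illustrative.

CREATE TABLE SalesReps (
    RepID INT PRIMARY KEY,
    Commission MONEY NULL   -- NULL means "unknown or inapplicable", not 0
)
INSERT INTO SalesReps (RepID, Commission) VALUES (1, 0)
INSERT INTO SalesReps (RepID, Commission) VALUES (2, NULL)

SELECT RepID FROM SalesReps WHERE Commission = 0       -- returns only rep 1
SELECT RepID FROM SalesReps WHERE Commission IS NULL   -- returns only rep 2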

Rule 4: Active online catalog based on the relational model:

The system must support an online, inline, relational catalog that is accessible to authorized users by means of their regular query language. That is, users must be able to access the database's structure (catalog) using the same query language that they use to access the database's data.
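SQL Server, for example, exposes its catalog through the standard INFORMATION_SCHEMA views, so the structure is read with the same SELECT statements used for data:

-- Describe every column of every table using ordinary queries
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
ORDER BY TABLE_NAME, ORDINAL_POSITION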

Rule 5: The comprehensive data sublanguage rule:

The system must support at least one relational language that

(a) Has a linear syntax,
(b) Can be used both interactively and within application programs, and
(c) Supports data definition operations (including view definitions), data manipulation operations (update as well as retrieval), security and integrity constraints, and transaction management operations (begin, commit, and rollback).
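T-SQL is one example of such a sublanguage. A minimal sketch (the table and view are hypothetical) showing definition, manipulation, integrity, and transaction control expressed in the one language:

BEGIN TRANSACTION                                   -- transaction management

CREATE TABLE Orders (                               -- data definition
    OrderID INT NOT NULL PRIMARY KEY,
    Amount MONEY NOT NULL CHECK (Amount >= 0)       -- integrity constraint
)

INSERT INTO Orders (OrderID, Amount) VALUES (1, 19.99)   -- data manipulation
UPDATE Orders SET Amount = 24.99 WHERE OrderID = 1

COMMIT TRANSACTION
GO

CREATE VIEW LargeOrders AS                          -- view definition
    SELECT OrderID, Amount FROM Orders WHERE Amount > 20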

Rule 6: The view updating rule:

All views that are theoretically updatable must be updatable by the system.

Rule 7: High-level insert, update, and delete:

The system must support set-at-a-time insert, update, and delete operators. This means that data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.
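In other words, a single statement can operate on an arbitrary set of rows. A hedged sketch against hypothetical Products and DiscontinuedProducts tables:

-- One UPDATE touches every row that satisfies the predicate (set-at-a-time),
-- not one row per call as a record-oriented interface would
UPDATE Products
SET Price = Price * 1.10
WHERE CategoryID = 3

-- A set-oriented INSERT driven by a query rather than a single row of literals
INSERT INTO DiscontinuedProducts (ProductID, ProductName)
SELECT ProductID, ProductName
FROM Products
WHERE Discontinued = 1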

Rule 8: Physical data independence:

Changes to the physical level (how the data is stored, whether in arrays or linked lists etc.) must not require a change to an application based on the structure.

Rule 9: Logical data independence:

Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data independence.

Rule 10: Integrity independence:

Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints as and when appropriate without unnecessarily affecting existing applications.

Rule 11: Distribution independence:

The distribution of portions of the database to various locations should be invisible to users of the database. Existing applications should continue to operate successfully:

(a) when a distributed version of the DBMS is first introduced; and
(b) when existing distributed data are redistributed around the system.

Rule 12: The nonsubversion rule:

If the system provides a low-level (record-at-a-time) interface, then that interface cannot be used to subvert the system, for example, by bypassing a relational security or integrity constraint.

ER Model: http://www.utexas.edu/its/archive/windows/database/datamodeling/dm/erintro.html

Tuesday, August 05, 2008

Setting Up SQL 2005 Deployments - Migrating from SQL 2000

Setting up Microsoft SQL Server 2005 and migrating existing data from SQL 2000 is a relatively easy task if your implementation is self-contained and you do not touch default settings. There are, however, many subtleties that can cause SQL migrations to go horribly wrong. If you understand the SQL Server model and the changes that have been made to SQL Server, many of these problems can easily be sidestepped.

In this entry, I’ll cover a few basic principles of SQL 2005 and a suggested migration path so that you can avoid these pitfalls.

Collation

First and most importantly, SQL 2005 has changed the “collation” model. Collation essentially defines how character data is compared and sorted, which in turn determines how indexes are built and how comparison operators behave. If you are not careful, changing collation during setup can cause a mess of problems when migrating SQL 2000 data. Although the internals of your database will operate without problems, new SQL 2005 databases will not be able to communicate with and pull data from your migrated 2000 database.

This issue also spans deployments. If one SQL Server instance runs one collation and another instance runs a different one, you may experience problems with log shipping, mirroring, and cross-database communication.

SQL 2005 has shifted to a new collation model. SQL Analysis Services 2005 will not use the old collations at all; however, the SQL Server engine itself is capable of running databases in the SQL 2000 collations for backwards compatibility. If you are running in a mixed environment, I suggest that you set up your SQL 2005 servers to run in a backwards-compatible fashion. If you plan to migrate completely to 2005, I see no reason not to drop the SQL 2000 collation in favor of the SQL 2005 collation; this will ensure that future upgrades go by with much greater ease.

By default (for English-locale installations), SQL 2005 is configured to run in “SQL_Latin1_General_CP1_CI_AS”, or “SQL Server; Latin1 General; code page 1252; Case Insensitive; Accent Sensitive”. This and all other collations beginning with “SQL_” are SQL 2000 compatibility collations. If you need to configure SQL Server to talk with 2000 database servers, the default is the one you want. If you want a clean SQL 2005 installation, the 2005 equivalent, “Latin1_General_CI_AS”, will serve you better.
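If you want to confirm what you are currently running, or pin a new database to a specific collation, something along these lines works; the database name here is just an example.

-- Server-level (default) collation chosen at setup time
SELECT SERVERPROPERTY('Collation') AS ServerCollation

-- A new database can override the server default explicitly
CREATE DATABASE SalesDB
COLLATE Latin1_General_CI_AS

-- Verify the database collation
SELECT DATABASEPROPERTYEX('SalesDB', 'Collation') AS DatabaseCollation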

Choosing the default collation is important, as it affects tempdb (where temporary tables live), master, and all newly created databases. This is not a setting that should be overlooked.

Although the collation can be set per database and per object (stored procedure, table, or even column level), I would recommend sticking with a single collation. Multiple collations not only complicate deployments, but also cause comparison issues across data fields. The only scenario where I can see a collation-specific object being necessary is for data such as case-sensitive passwords, where you may want a CS (case-sensitive) collation. There are, however, better ways of achieving case sensitivity for those types of comparisons.
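For illustration only (the table is hypothetical), a column-level collation looks like this, and the query-time COLLATE clause is one of those "better ways" of getting a case-sensitive comparison without changing the stored collation.

-- Column-level collation: only this column compares case-sensitively
CREATE TABLE AppUsers (
    UserName VARCHAR(50) NOT NULL,
    Password VARCHAR(50) COLLATE Latin1_General_CS_AS NOT NULL
)

-- Alternative: keep the default collation and force case sensitivity per query
SELECT UserName
FROM AppUsers
WHERE Password = 'SeCrEt' COLLATE Latin1_General_CS_AS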

Database Compatibility Mode

SQL Server 2005 is capable of running in several “database compatibility modes”, allowing 2000 and 7.0 databases to run within a SQL 2005 instance. This compatibility mode has benefits, but will restrict your feature use (such as .NET UDT’s) when running in down-level versions. If you will be performing any transactions where SQL 2000 will be manipulating or requesting your SQL 2005 data, you may want to run the SQL 2005 database in a backwards-compatible mode to ensure that queries will run. If you are working in a full SQL 2005 deployment, the compatibility mode should be set to “SQL Server 9.0” (2005).

Database compatibility mode can be accessed by right clicking on a database in the SQL Server Management Studio, clicking properties, then the “options” tab.
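The same setting can be read and changed from T-SQL; in SQL 2005 the (since deprecated) sp_dbcmptlevel procedure does it. The database name below is a placeholder.

-- Check current compatibility levels for all databases
SELECT name, compatibility_level FROM sys.databases

-- Set a database to SQL Server 9.0 (2005) compatibility
EXEC sp_dbcmptlevel 'MyDatabase', 90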

SQL Authentication, Mixed Mode and Windows Authentication

Note that it is suggested that you use “Windows Authentication” mode in your SQL deployments unless you have a reason to do otherwise; it is the most secure implementation. Mixed mode will work fine (and I use it in my deployment for reasons I won’t explain here), but note that you will still be creating Windows and/or domain accounts for things such as SSIS (formerly DTS) packages and scheduled jobs.

Deployment and Migration

Deploying SQL Server 2005 and migrating data from SQL 2000 is not a difficult task, but if you wish to modify collations and compatibility modes of your databases, this deployment process must be done carefully. Although backing up your database on 2000 and restoring it on 2005 works, changes in collation mode and other SQL 2005 changes may cause unexpected problems. It is best to do a clean migration as I have described in this document. This will ensure your transition from SQL 2000 to SQL 2005 is smooth and your databases are consistent.

My migration solution includes 4 basic steps:

  • Install SQL 2005 and set collation modes correctly.

  • Script SQL 2000 databases.

  • Execute CREATE scripts on the SQL 2005 instance.

  • Import data from the SQL 2000 databases into the newly created SQL 2005 databases.

Pre-Install

Ensure you backup your SQL 2000 databases and uninstall SQL 2000 before continuing.
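A minimal full backup, run against the SQL 2000 instance for each user database before it is uninstalled; the database name and path are placeholders.

BACKUP DATABASE MyDatabase
TO DISK = 'D:\Backups\MyDatabase_pre2005.bak'
WITH INIT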

Installing SQL

SQL Installation is fairly straightforward, and I will not document it in detail. I would like to note, however, that it is useful to start ALL SQL services after the installation is complete, and that you will need to choose your collation settings at this point. Use my guide and description above to gear collation towards your needs.

Setup Your Existing Databases

Setting up your existing databases is where the fun begins. For this example, I will be focusing on a same-machine deployment. New-machine deployments are handled in a very similar fashion, so this is still a good guide (a few extra steps are required for new-machine deployments). This setup assumes that you will want to upgrade your databases to SQL 9.0 (2005) compatibility mode (so that you can use 2005-specific features).

  1. Add any existing SQL logins you have in 2000 to the logins in SQL 2005. You don’t need to configure their databases, but rather ensure they exist.

  2. Restore your database backup into the new SQL 2005 installation under a new name. Neither the name of the database nor the location where its files are stored should match the final name and location of the database.

  3. Run “EXEC sp_change_users_login 'Update_One', '<UserNamePattern>', '<LoginName>'” for each user in the database so that they are mapped back to their SQL login in the new setup (see the sketch after this list). More information on sp_change_users_login can be found here: http://www.transactsql.com/html/sp_change_users_login.html

  4. Right click on your database, and choose “Properties”.

  5. Click the “Options” tab and ensure the database compatibility mode is set to SQL 9.0. This ensures all objects (including SQL users) are scripted.

  6. Right click on the restored database, select “Tasks > Generate Scripts”.

  7. Ensure the correct database is selected to script, and check “Script all objects in the selected database”.

  8. Next, you are presented with a list in “Choose Script Options”. You will want to change:

    1. “Script Database Create” to true

    2. “Script Object-Level Permissions” to true

    3. “Script USE DATABASE” to true

    4. DO NOT set “Script Collation” to true.

  9. Choose “Script to New Query Window” and execute the scripting process.

  10. When the new window opens, do a find-and-replace in the document to change the restored database name to the final database name.

  11. Change the file paths in the “CREATE DATABASE” statement to reflect the final file path of the database.

  12. Execute the script.

  13. Refresh the databases in the object explorer and right click on the newly created database. Select “Tasks > Import Data”.

  14. Choose “SQL Native Client” as your data source, the SQL 2005 machine name as the server, and “Windows Authentication”. Select the restored 2000 database.

  15. Choose “SQL Native Client” for the destination, and ensure your final database is selected.

  16. Select “Copy data from one or more tables or views”.

  17. Select all tables, and make sure no views are checked off.

    1. Check “Optimize for many tables”

    2. Check “Run in a transaction”

  18. NOTE: You may have to set timestamp/rowversion columns to not be copied (click “Edit” on the mapping of a table, and set the destination of that column to “ignore”).

  19. Click Next and run the import process.
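As referenced in step 3 above, here is a rough sketch of re-linking an orphaned database user after the restore; the database, user, and login names are placeholders, not values from this article.

USE RestoredDb   -- the database restored under its temporary name
GO

-- Report database users whose SIDs no longer match a server login
EXEC sp_change_users_login 'Report'

-- Re-link one orphaned user to the matching SQL login
EXEC sp_change_users_login 'Update_One', 'AppUser', 'AppUser'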

At this point, you should have a fully working SQL 2005 structured database!

After Thoughts

Many of the new SQL 2005 features are ones that an upgrade alone won’t give you the full benefit of. You may want to consider creating a few .NET UDTs for more complex fields in your database. This may help you eliminate an abundance of rows that comprise one value, or values stored in a non-standard format for comparison purposes (e.g., the .NET “Version” type cannot be stored as a string and still compare correctly with ‘<’ or ‘>’).

Also, one of the big advantages of SQL 2005 is schemas. If you are properly authenticating systems (not using ‘sa’ for everything), you may want to consider sectioning off your applications, or pieces of specific applications, with schemas. This helps maintain data integrity, security, and overall cleanliness (and gives some order to those long stored procedure lists).
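A small sketch of the idea; the schema, table, and user names are made up. Each application area gets its own schema, and permissions can then be granted at the schema level in one statement.

-- One schema per application area
CREATE SCHEMA Billing AUTHORIZATION dbo
GO

CREATE TABLE Billing.Invoices (
    InvoiceID INT IDENTITY(1,1) PRIMARY KEY,
    Total MONEY NOT NULL
)
GO

-- Grant an application user access to the whole schema at once
GRANT SELECT, INSERT, UPDATE ON SCHEMA::Billing TO BillingAppUser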

Sunday, July 20, 2008

More on Flat File Bulk Import methods speed comparison in SQL Server 2005

1. The database recovery model was set to Bulk-logged. (This improved performance by 2-3 seconds for each method.)

2. I tried BULK INSERT and BCP with and without the TABLOCK option.
3. I also measured general processor utilization and RAM. IO reads and writes were similar in all methods.

Everything else was the same.

Method --- Time (ms) --- CPU utilization (%) --- RAM change (MB)
Bulk insert with TABLOCK --- 4648 --- 75 --- +5
SSIS FastParse --- 5812 --- 75 --- +30
BCP with TABLOCK --- 6300 --- 100 --- +5
Bulk insert --- 7812 --- 75 --- +5
OpenRowset --- 8750 --- 95 --- +5
BCP --- 11250 --- 100 --- +5

SSIS also has a bulk insert TABLOCK option, set to true in the SQL Server destination. So my guess is that the approximately one-second overhead comes from package startup time. However, SSIS uses more memory than BULK INSERT.

So if you can use the TABLOCK option, BULK INSERT is the way to go. If not, SSIS is.

It must be taken into account that the test flat file I used had only 4 integer columns, so the FastParse option could be set on all of them. In the real world there are no such ideal conditions. And since I had only integer columns, I didn't have to specify collations.

Modified scripts are:

BULK INSERT testBulkInsert
FROM 'd:\work\test.txt'
WITH (
FORMATFILE='d:\work\testImport-f-n.Fmt',
TABLOCK
)

insert into testOpenRowset(c1, c2, c3, c4)
SELECT t1.c1, t1.c2, t1.c3, t1.c4
FROM OPENROWSET
(
BULK 'd:\work\test.txt',
FORMATFILE = 'd:\work\testImport-f-n.Fmt'
) AS t1(c1, c2, c3, c4);

exec master..xp_cmdshell 'bcp test.dbo.testBCP in d:\work\test.txt -T -b1000000 -fd:\work\testImport-f-n.Fmt -h"tablock"'

Wednesday, July 16, 2008

SQL Server Cursors

What are cursors for?
--------------------------
Cursors were created to bridge the 'impedance mismatch' between the 'record-based' culture of conventional programming and the set-based world of the relational database.
They had a useful purpose in allowing existing applications to change from ISAM or KSAM databases, such as dBase II, to SQL Server with the minimum of upheaval. DBLIB and ODBC make extensive use of them to 'spoof' simple file-based data sources. Relational database programmers won't need them but, if you have an application that understands only the process of iterating through resultsets, like flicking through a card index, then you'll probably need a cursor.


Where would you use a Cursor?
----------------------------------------
A simple example of an application for which cursors can provide a good solution is one that requires running totals.
A cumulative graph of monthly sales to date is a good example, as is a cashbook with a running balance.
We'll try four different approaches to getting a running total...

/*so lets build a very simple cashbook */
CREATE TABLE #cb (
    cb_ID INT IDENTITY(1,1), --sequence of entries 1..n
    Et VARCHAR(10),          --entryType
    amount money)            --quantity
INSERT INTO #cb(et,amount) SELECT 'balance',465.00
INSERT INTO #cb(et,amount) SELECT 'sale',56.00
INSERT INTO #cb(et,amount) SELECT 'sale',434.30
INSERT INTO #cb(et,amount) SELECT 'purchase',20.04
INSERT INTO #cb(et,amount) SELECT 'purchase',65.00
INSERT INTO #cb(et,amount) SELECT 'sale',23.22
INSERT INTO #cb(et,amount) SELECT 'sale',45.80
INSERT INTO #cb(et,amount) SELECT 'purchase',34.08
INSERT INTO #cb(et,amount) SELECT 'purchase',78.30
INSERT INTO #cb(et,amount) SELECT 'purchase',56.00
INSERT INTO #cb(et,amount) SELECT 'sale',75.22
INSERT INTO #cb(et,amount) SELECT 'sale',5.80
INSERT INTO #cb(et,amount) SELECT 'purchase',3.08
INSERT INTO #cb(et,amount) SELECT 'sale',3.29
INSERT INTO #cb(et,amount) SELECT 'sale',100.80
INSERT INTO #cb(et,amount) SELECT 'sale',100.22
INSERT INTO #cb(et,amount) SELECT 'sale',23.80

/* 1) Running Total using Correlated sub-query */
SELECT [Entry Type]=Et, amount,
[balance after transaction]=(
SELECT SUM(--the correlated subquery
CASE WHEN total.Et='purchase'
THEN -total.amount
ELSE total.amount
END)
FROM #cb total WHERE total.cb_id <= #cb.cb_id )
FROM #cb
ORDER BY #cb.cb_id

/* 2) Running Total using simple inner-join and group by clause */

SELECT [Entry Type]=MIN(#cb.Et), [amount]=MIN (#cb.amount),
[balance after transaction]= SUM(CASE WHEN total.Et='purchase' THEN -total.amount ELSE total.amount END)
FROM #cb total INNER JOIN #cb
ON total.cb_id <= #cb.cb_id
GROUP BY #cb.cb_id
ORDER BY #cb.cb_id

/* 3) and here is a very different technique that takes advantage of the quirky behaviour of SET in an UPDATE command in SQL Server */
DECLARE @cb TABLE(
    cb_ID INT,      --sequence of entries 1..n
    Et VARCHAR(10), --entryType
    amount money,   --quantity
    total money)
DECLARE @total money
SET @total = 0
INSERT INTO @cb(cb_id,Et,amount,total)
SELECT cb_id, Et,
       CASE WHEN Et='purchase' THEN -amount
            ELSE amount END,
       0
FROM #cb

UPDATE @cb
SET @total = total = @total + amount
FROM @cb

SELECT [Entry Type]=Et, [amount]=amount, [balance after transaction]=total
FROM @cb
ORDER BY cb_id

/* 4)or you can give up trying to do it a set-based way and iterate through the table */
DECLARE @ii INT, @iiMax INT, @CurrentBalance money
DECLARE @Runningtotals TABLE (cb_id INT, Total money)
SELECT @ii=MIN(cb_id), @iiMax=MAX(cb_id), @CurrentBalance=0 FROM #cb
WHILE @ii<=@iiMax
BEGIN
   SELECT @CurrentBalance=@CurrentBalance + CASE WHEN Et='purchase' THEN -amount
                                                 ELSE amount END
   FROM #cb WHERE cb_ID=@ii
   INSERT INTO @Runningtotals(cb_id, Total)
   SELECT @ii, @CurrentBalance
   SELECT @ii=@ii+1
END
SELECT [Entry Type]=Et, amount, Total
FROM #cb INNER JOIN @Runningtotals r ON #cb.cb_id = r.cb_id
ORDER BY #cb.cb_id

/* or alternatively you can use....A CURSOR!!! the use of a cursor will normally involve a
DECLARE, OPEN, several FETCHs, a CLOSE and a DEALLOCATE */

SET NOCOUNT ON
DECLARE @Runningtotals TABLE (
    cb_id INT IDENTITY(1,1), --IDENTITY so the results can be ordered in insert sequence
    Et VARCHAR(10),          --entryType
    amount money,
    Total money)
DECLARE @CurrentBalance money, @Et VARCHAR(10), @amount money
--Declare the cursor
--declare current_line cursor             -- SQL-92 syntax: only scroll forward
DECLARE current_line CURSOR fast_forward  --SQL Server only: only scroll forward
FOR SELECT Et, amount FROM #cb ORDER BY cb_id
FOR READ ONLY
--now we open the cursor to populate any temporary tables (in the case of cursors) etc.
--Cursors are unusual because they can be made GLOBAL to the connection.
OPEN current_line
--fetch the first row
FETCH NEXT FROM current_line INTO @Et, @amount
WHILE @@FETCH_STATUS = 0 --whilst all is well
BEGIN
SELECT @CurrentBalance = COALESCE(@CurrentBalance,0) +CASE
WHEN @Et='purchase'
THEN -@amount ELSE @amount
END
INSERT INTO @Runningtotals (Et, amount,Total)
SELECT @Et,@Amount,@CurrentBalance -- This is executed as long as the previous fetch succeeds.
FETCH NEXT FROM current_line INTO @Et,@amount
END
SELECT [Entry Type]=Et,amount,Total FROM @Runningtotals ORDER BY cb_id
CLOSE current_line
--Do not forget to close the cursor when its result set is not needed,
--especially a global updateable cursor!
DEALLOCATE current_line
-- Although the cursor code looks bulky and complex, on small tables it will
-- execute just as quickly as a simple iteration, and will be faster with tables
-- of any size if you forget to put an index on the table through which you're iterating!
-- The first two solutions are faster with small tables but slow down
-- exponentially as the table size grows.
Result
---------
balance 465.00 465.00
sale 56.00 521.00
sale 434.30 955.30
purchase 20.04 935.26
purchase 65.00 870.26
sale 23.22 893.48
sale 45.80 939.28
purchase 34.08 905.20
purchase 78.30 826.90
purchase 56.00 770.90
sale 75.22 846.12
sale 5.80 851.92
purchase 3.08 848.84
sale 3.29 852.13
sale 100.80 952.93
sale 100.22 1053.15
sale 23.80 1076.95
Global Cursors
------------------------
If you are doing something really complicated with a listbox, or scrolling through a rapidly-changing table whilst making updates,
a GLOBAL cursor could be a good solution,
but it is very much geared for traditional client-server applications, because cursors have a lifetime only of the connection.
Each 'client' therefore needs their own connection.
The GLOBAL cursors defined in a connection will be implicitly deallocated at disconnect.
Global cursors can be passed to and from stored procedures and referenced in triggers.
They can be assigned to local variables. A global cursor can therefore be passed as a parameter to a number of stored procedures.
Here is an example, though one struggles to think of anything useful to do in such a short example:
CREATE PROCEDURE spReturnEmployee ( @EmployeeLastName VARCHAR(20), @MyGlobalcursor CURSOR VARYING OUTPUT )
AS
BEGIN
SET NOCOUNT ON
SET @MyGlobalcursor = CURSOR STATIC FOR
SELECT lname, fname FROM pubs.dbo.employee
WHERE lname = @EmployeeLastName
OPEN @MyGlobalcursor
END
DECLARE @FoundEmployee CURSOR, @LastName VARCHAR(20), @FirstName VARCHAR(20)
EXECUTE spReturnEmployee 'Lebihan', @FoundEmployee OUTPUT
--see if anything was found
--note we are careful to check the right cursor!
IF CURSOR_STATUS('variable', '@FoundEmployee') = 0
   SELECT 'no such employee'
ELSE
   BEGIN
   FETCH NEXT FROM @FoundEmployee INTO @LastName, @FirstName
   SELECT @FirstName+' '+@LastName
   END
CLOSE @FoundEmployee
DEALLOCATE @FoundEmployee
Transact-SQL cursors are efficient when contained in stored procedures and triggers.
This is because everything is compiled into one execution plan on the server and there is no overhead of network traffic whilst fetching rows.
Are Cursors Slow?
----------------------------
So what really are the performance differences?
Let's set up a test-rig. We'll give it a really big cashbook to work on and give it a task that doesn't disturb SSMS/Query analyser too much.
We'll calculate the average balance, and the highest and lowest balance.
Now, which solution is going to be the best?
--recreate the cashbook but make it big!
DROP TABLE #cb
CREATE TABLE #cb (
    cb_ID INT IDENTITY(1,1), --sequence of entries 1..n
    Et VARCHAR(10),          --entryType
    amount money)            --quantity
INSERT INTO #cb(et,amount)
SELECT 'balance',465.00
DECLARE @ii INT
SELECT @ii=0
WHILE @ii<20000
BEGIN
INSERT INTO #cb(et,amount)
SELECT CASE WHEN RAND()<0.5
THEN 'sale' ELSE 'purchase'
END, CAST(RAND()*180.00 AS money)
SELECT @ii=@ii+1
END
--and put an index on it
CREATE CLUSTERED INDEX idxcbid ON #cb(cb_id)
--first try the correlated subquery approach...
DECLARE @StartTime Datetime
SELECT @StartTime= GETDATE()
SELECT MIN(balance), AVG(balance), MAX(balance)
FROM (
    SELECT [balance]=(
        SELECT SUM(  --the correlated subquery
            CASE WHEN total.Et='purchase'
                 THEN -total.amount
                 ELSE total.amount END)
        FROM #cb total WHERE total.cb_id <= #cb.cb_id)
    FROM #cb) g
SELECT [elapsed time (secs)]=DATEDIFF(second,@StartTime,GETDATE())
elapsed time (secs)
---------------------------
250
-- Now let's try the "quirky" technique using SET
DECLARE @StartTime Datetime
SELECT @StartTime= GETDATE()
DECLARE @cb TABLE(
    cb_ID INT,      --sequence of entries 1..n
    Et VARCHAR(10), --entryType
    amount money,   --quantity
    total money)
DECLARE @total money
SET @total = 0
INSERT INTO @cb(cb_id,Et,amount,total)
SELECT cb_id,Et,
CASE WHEN Et='purchase'
THEN -amount ELSE amount
END,0
FROM #cb
UPDATE @cb
SET @total = total = @total + amount
FROM @cb
SELECT MIN(Total), AVG(Total), MAX(Total) FROM @cb
SELECT [elapsed time (secs)]=DATEDIFF(second,@StartTime,GETDATE())
elapsed time (secs)
-----------------------------
1
almost too fast to be measured in seconds
-- now the simple iterative solution
DECLARE @StartTime Datetime
SELECT @StartTime= GETDATE()
SET nocount ON
DECLARE @ii INT, @iiMax INT, @CurrentBalance money
DECLARE @Runningtotals TABLE (cb_id INT, Total money)
SELECT @ii=MIN(cb_id), @iiMax=MAX(cb_id),@CurrentBalance=0 FROM #cb
WHILE @ii<=@iiMax
BEGIN
SELECT @currentBalance=@currentBalance +CASE
WHEN Et='purchase'
THEN -amount
ELSE amount
END
FROM #cb
WHERE cb_ID=@ii
INSERT INTO @runningTotals(cb_id, Total)
SELECT @ii,@currentBalance
SELECT @ii=@ii+1
END
SELECT MIN(Total), AVG(Total), MAX(Total) FROM @Runningtotals
SELECT [elapsed time (secs)]=DATEDIFF(second,@StartTime,GETDATE())
/* elapsed time (secs)
-------------------
2
that's a lot better than a correlated subquery, but slower than using the SET trick. Now, what about a cursor? */
SET Nocount ON
DECLARE @StartTime Datetime
SELECT @StartTime= GETDATE()
DECLARE @Runningtotals TABLE (cb_id INT, Total money)
DECLARE @CurrentBalance money, @Et VARCHAR(10), @amount money
--Declare the cursor
--declare current_line cursor             -- SQL-92 syntax: scroll forward only
DECLARE current_line CURSOR fast_forward  --SQL Server only: scroll forward
FOR SELECT Et, amount FROM #cb
ORDER BY cb_id
FOR READ ONLY
--now we open the cursor to populate any temporary tables (in the case of cursors) etc.
--Cursors are unusual because they can be made GLOBAL to the connection.
OPEN current_line
--fetch the first row
FETCH NEXT FROM current_line INTO @Et,@amount
WHILE @@FETCH_STATUS = 0--whilst all is well
BEGIN
   SELECT @CurrentBalance = COALESCE(@CurrentBalance,0) + CASE WHEN @Et='purchase' THEN -@amount
                                                               ELSE @amount END
   INSERT INTO @Runningtotals (Total) SELECT @CurrentBalance
   -- This is executed as long as the previous fetch succeeds.
   FETCH NEXT FROM current_line INTO @Et, @amount
END
SELECT MIN(Total), AVG(Total), MAX(Total) FROM @Runningtotals
CLOSE current_line
--Do not forget to close when result set is not needed.
--especially a global updateable cursor!
DEALLOCATE current_line
SELECT [elapsed time (secs)]=DATEDIFF(second,@StartTime,GETDATE())
/* elapsed time (secs)
------------------------------
2
The iterative solution takes exactly the same time as the cursor!
(I got it to be slower, 3 secs, with a SQL-92 standard cursor.)
Cursor Variables
------------------------
--@@CURSOR_ROWS   The number of rows in the cursor
--@@FETCH_STATUS  Success or failure of the most recent fetch; -2 if a keyset FETCH returns a deleted row
So here is a test harness just to see what the two variables will give at various points.
Try changing the cursor type to see what @@Cursor_Rows and @@Fetch_Status return.
It works on our temporary Table */
--Declare the cursor
DECLARE @Bucket INT
--declare current_line cursor
--we only want to scroll forward
DECLARE current_line CURSOR keyset
--we scroll about (no absolute fetch)
/* TSQL extended cursors can be specified
[LOCAL or GLOBAL]
[FORWARD_ONLY or SCROLL]
[STATIC, KEYSET, DYNAMIC or FAST_FORWARD]
[READ_ONLY, SCROLL_LOCKS or OPTIMISTIC]
[TYPE_WARNING]*/
FOR SELECT 1 FROM #cb
SELECT @@FETCH_STATUS, @@CURSOR_ROWS
OPEN current_line
--fetch the first row
FETCH NEXT  --NEXT, PRIOR, FIRST, LAST, ABSOLUTE n or RELATIVE n
FROM current_line INTO @Bucket
WHILE @@FETCH_STATUS = 0
--whilst all is well
BEGIN
   SELECT @@FETCH_STATUS, @@CURSOR_ROWS
FETCH NEXT FROM current_line INTO @Bucket
END
CLOSE current_line
DEALLOCATE current_line
/* if you change the cursor type in the definition above, you'll notice that @@CURSOR_ROWS returns different values. A negative value below -1
means the cursor is still being populated asynchronously; its absolute value is the number of rows currently in the keyset.
If it is -1 the cursor is dynamic.
A 0 means that no cursors are open, or no rows qualified for the last opened cursor, or the last-opened cursor is closed or deallocated.
A positive integer represents the number of rows in the cursor. The most important types of cursor are...
FORWARD_ONLY
you can only go forward in sequence from data source, and changes made
to the underlying data source appear instantly.
DYNAMIC
Similar to FORWARD_ONLY, but you can access data in any order.
STATIC
Rows are returned as 'read only' without showing changes to the underlying data source. The data may be accessed in any order.
KEYSET
A dynamic data set with changes made to the underlying data appearing instantly, but insertions do not appear.

Cursor Optimization
---------------------------------
Use them only as a last resort. Set-based operations are usually fastest (but not always; see above), then a simple iteration, followed by a cursor.
  • Make sure that the cursor's SELECT statement contains only the rows and columns you need.
  • To avoid the overhead of locks, use READ ONLY cursors rather than updatable cursors whenever possible.
  • Static and keyset cursors cause a temporary table to be created in tempdb, which can prove to be slow.
  • Use FAST_FORWARD cursors whenever possible, and choose FORWARD_ONLY cursors if you need an updatable cursor and only need to FETCH NEXT.

Tuesday, July 15, 2008

Maximum Capacity Specifications for SQL Server 2005

Batch size --- 65,536 * Network Packet Size --- 65,536 * Network Packet Size
Bytes per short string column --- 8,000 ---8,000
Bytes per text, ntext, image, varchar(max), nvarchar(max),
varbinary(max), or XML column --- 2^31 - 1 bytes --- 2^31 - 1 bytes
Bytes per GROUP BY, ORDER BY --- 8,060 --- 8,060
Bytes per index --- 900 --- 900
Bytes per foreign key --- 900 --- 900
Bytes per primary key --- 900 --- 900
Bytes per row --- 8,060 --- 8,060
Bytes in source text of a stored procedure --- Lesser of batch size or 250 MB --- Lesser of batch size or 250 MB
Clustered indexes per table --- 1 --- 1
Columns in GROUP BY, ORDER BY --- Limited only by number of bytes --- Limited only by number of bytes
Columns or expressions in a GROUP BY WITH CUBE or WITH ROLLUP
statement --- 10 --- 10
Columns per index --- 16 --- 16
Columns per foreign key --- 16 --- 16
Columns per primary key --- 16 --- 16
Columns per base table --- 1,024 --- 1,024
Columns per SELECT statement --- 4,096 --- 4,096
Columns per INSERT statement --- 1,024 --- 1,024
Connections per client --- Maximum value of configured connections --- Maximum value of configured connections
Database size --- 1,048,516 terabytes --- 1,048,516 terabytes
Databases per instance of SQL Server --- 32,767 --- 32,767
Filegroups per database --- 32,767 --- 256
Files per database --- 32,767 --- 32,767
File size (data) --- 32 terabytes --- 32 terabytes
File size (log) --- 32 terabytes --- 32 terabytes
Foreign key table references per table --- 253 --- 253
Identifier length (in characters) --- 128 --- 128
Instances per computer --- 50 --- 50
Length of a string containing SQL statements (batch size) --- 65,536 * Network packet size --- 65,536 * Network packet size
Locks per connection --- Maximum locks per server --- Maximum locks per server
Locks per instance of SQL Server --- Up to 2,147,483,647 --- Limited only by memory
Nested stored procedure levels --- 32 --- 32
Nested subqueries --- 32 --- 32
Nested trigger levels --- 32 --- 32
Nonclustered indexes per table --- 249 --- 249
Objects concurrently open in an instance of SQL Server --- 2,147,483,647 per database (depending on available memory) --- 2,147,483,647 per database (depending on available memory)
Objects in a database --- 2,147,483,647 --- 2,147,483,647
Parameters per stored procedure --- 2,100 --- 2,100
Parameters per user-defined function --- 2,100 --- 2,100
REFERENCES per table --- 253 --- 253
Rows per table --- Limited by available storage --- Limited by available storage
Tables per database --- Limited by number of objects in a database --- Limited by number of objects in a database
Tables per SELECT statement --- 256 --- 256
Triggers per table --- Limited by number of objects in a database --- Limited by number of objects in a database
UNIQUE indexes or constraints per table --- 249 nonclustered and 1 clustered --- 249 nonclustered and 1 clustered

Set password on SQL Server backup file

Use the BACKUP DATABASE command with the PASSWORD option. A sample is shown below:
----------------------------------------------------
BACKUP DATABASE DBName TO DISK='c:\DBName.bak' WITH INIT,PASSWORD='abc'

Restore with password
---------------------
RESTORE DATABASE DBName
FROM DISK = 'C:\DBName.bak'
WITH PASSWORD = 'abc',
MOVE 'DBName_Data' TO 'd:\MSSQL\MSSQL\DATA\DBName_Data.MDF',
MOVE 'DBName_Log' TO 'd:\MSSQL\MSSQL\DATA\DBName_Log.LDF'

A Story of the Deleted Transaction Log

After getting over the ridiculousness of the issue, I began researching how to resolve the problem. I ran into a few hints in BOL and then other hints on the Internet. After all the research, I determined the best option would be to use DBCC CHECKDB with the REPAIR_ALLOW_DATA_LOSS option. BOL stated this option could repair a transaction log. Well, what could be more broken about a transaction log than not having one at all?

Once I determined I would go down this path the first problem was that the db would not go online, not even in emergency mode. So what I did was stop the SQL Server service and then I renamed the MDF file. Let's say the database name was DbLive and I renamed the MDF from DbLive.mdf to DbBad.mdf. I then started the SQL Server service and created a new DbLive db with a corresponding DbLive_log.ldf file. I then stopped the SQL service again and deleted the DbLive.mdf file. I then renamed DbBad.mdf back to DbLive.mdf. I then started the SQL service again.

You can imagine that these steps didn't fix the problem. The db did not come on line, but at least the db now had a transaction log, albeit one from a different database. But as far as SQL server was concerned the log was there now. It was just in a jacked-up state (yes that is a technical term).

The next thing I did was run ALTER DATABASE DbLive SET EMERGENCY. This put the db in emergency mode and allowed SQL server to work with the db at an emergency level. Next I ran "sp_dboption 'DbLive', 'single user', 'true' ". This set the db to single user mode preparing it for the DBCC that I would run next. Finally I ran "DBCC CHECKDB ('DbLive', REPAIR_ALLOW_DATA_LOSS)". After about an hour the db was ONLINE!



/*
Stop SQL service, rename DbLive.mdf to DbBad.mdf.
started SQL service, created fake DbLive db (with log etc)
Stopped SQL service
Deleted DbLIve.mdf
Renamed DbBad.MDF to DbLive.MDF
Started SQL service.
Ran following script:
*/
ALTER DATABASE DbLive SET EMERGENCY
sp_dboption 'DbLive', 'single user', 'true'
DBCC CHECKDB ('DbLive', REPAIR_ALLOW_DATA_LOSS)
sp_dboption 'DbLive', 'single user', 'false'

Monday, July 14, 2008

Search text in SPs @ SQL Server 2005

SELECT Name
FROM sys.procedures
WHERE OBJECT_DEFINITION(object_id) LIKE '%foobar%'

SELECT OBJECT_NAME(object_id)
FROM sys.sql_modules
WHERE Definition LIKE '%foobar%'
AND OBJECTPROPERTY(object_id, 'IsProcedure') = 1

SELECT ROUTINE_NAME
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_DEFINITION LIKE '%foobar%'
AND ROUTINE_TYPE = 'PROCEDURE'

In SQL Server 2000
Let's say you are searching for 'foobar' in all your stored procedures. You can do this using the INFORMATION_SCHEMA.ROUTINES view, or syscomments:

SELECT ROUTINE_NAME, ROUTINE_DEFINITION
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_DEFINITION LIKE '%foobar%'
AND ROUTINE_TYPE='PROCEDURE'
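And the syscomments variant mentioned above; keep in mind that long procedure definitions span multiple syscomments rows, so a match can be missed if the search string straddles a row boundary.

SELECT DISTINCT OBJECT_NAME(id)
FROM syscomments
WHERE text LIKE '%foobar%'
AND OBJECTPROPERTY(id, 'IsProcedure') = 1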

Monday, May 12, 2008

History of SQL Server

In 1988, Microsoft released its first version of SQL Server.

It was designed for the OS/2 platform and was developed jointly by Microsoft and Sybase.

During the early 1990s, Microsoft began to develop a new version of SQL Server for the NT platform. While it was under development, Microsoft decided that SQL Server should be tightly coupled with the NT operating system.

In 1992, Microsoft assumed core responsibility for the future of SQL Server for NT. In 1993, Windows NT 3.1 and SQL Server 4.2 for NT were released. Microsoft's philosophy of combining a high-performance database with an easy-to-use interface proved to be very successful.

Microsoft quickly became the second most popular vendor of high-end relational database software.

In 1994, Microsoft and Sybase formally ended their partnership.

In 1995, Microsoft released version 6.0 of SQL Server. This release was a major rewrite of SQL Server's core technology. Version 6.0 substantially improved performance, provided built-in replication, and delivered centralized administration.

In 1996, Microsoft released version 6.5 of SQL Server. This version brought significant enhancements to the existing technology and provided several new features.

In 1997, Microsoft released version 6.5 Enterprise Edition. In 1998, Microsoft released version 7.0 of SQL Server, which was a complete rewrite of the database engine.

In 2000, Microsoft released SQL Server 2000. SQL Server version 2000 is Microsoft's most significant release of SQL Server to date. This version further builds upon the SQL Server 7.0 framework. According to the SQL Server development team, the changes to the database engine are designed to provide an architecture that will last for the next 10 years.

Rather than listing all the new features and enhancements found in 2000, I've decided to list my favorite changes. The remainder of this chapter is dedicated to discussing these new features found in version 2000.

SQL Server 2005

SQL Server 2005, released in November 2005, is the successor to SQL Server 2000. It included native support for managing XML data, in addition to relational data. For this purpose, it defined an xml data type that could be used either as a data type in database columns or as literals in queries. XML columns can be associated with XSD schemas; XML data being stored is verified against the schema. XML is converted to an internal binary data type before being stored in the database. Specialized indexing methods were made available for XML data. XML data is queried using XQuery; SQL Server 2005 added some extensions to the T-SQL language to allow embedding XQuery queries in T-SQL. In addition, it also defines a new extension to XQuery, called XML DML, that allows query-based modifications to XML data. SQL Server 2005 also allows a database server to be exposed over web services using TDS packets encapsulated within SOAP requests. When the data is accessed over web services, results are returned as XML.[27]

For relational data, T-SQL has been augmented with error handling features and support for recursive queries. SQL Server 2005 has also been enhanced with new indexing algorithms and better error recovery systems. Data pages are checksummed for better error resiliency, and optimistic concurrency support has been added for better performance. Permissions and access control have been made more granular and the query processor handles concurrent execution of queries in a more efficient way. Partitions on tables and indexes are supported natively, so scaling out a database onto a cluster is easier. SQL CLR was introduced with SQL Server 2005 to let it integrate with the .NET Framework.


SQL Server 2008

The next version of SQL Server is SQL Server 2008, code-named "Katmai", slated to launch on February 27, 2008 and release (RTM) in Q3 2008. The most recent CTP was made available on February 19, 2008. SQL Server 2008 aims[33] to make data management self-tuning, self-organizing, and self-maintaining with the development of SQL Server Always On technologies, to provide near-zero downtime. SQL Server 2008 will also include support for structured and semi-structured data, including digital media formats for pictures, audio, video and other multimedia data. In current versions, such multimedia data can be stored as BLOBs (binary large objects), but they are generic bitstreams. Intrinsic awareness of multimedia data will allow specialized functions to be performed on them. According to Paul Flessner, senior Vice President, Server Applications, Microsoft Corp., SQL Server 2008 can be a data storage backend for different varieties of data: XML, email, time/calendar, file, document, spatial, etc., as well as perform search, query, analysis, sharing, and synchronization across all data types.[30]

Other new data types include specialized date and time types and a Spatial data type for location-dependent data.[34] Better support for unstructured and semi-structured data is provided via the new FILESTREAM[35] data type, which can be used to reference any file stored on the file system.[36] Structured data and metadata about the file is stored in the SQL Server database, whereas the unstructured component is stored in the file system. Such files can be accessed both via Win32 file handling APIs as well as via SQL Server using T-SQL; doing the latter accesses the file data as a binary BLOB. Backing up and restoring the database backs up or restores the referenced files as well.[37] SQL Server 2008 also natively supports hierarchical data, and includes T-SQL constructs to deal with it directly, without using recursive queries.

SQL Server Release History
Version --- Year --- Release Name --- Codename
4.21 (WinNT) --- 1993 --- SQL Server 4.21 --- -
6.0 --- 1995 --- SQL Server 6.0 --- SQL95
6.5 --- 1996 --- SQL Server 6.5 --- Hydra
7.0 --- 1999 --- SQL Server 7.0 --- Sphinx
- --- 1999 --- SQL Server 7.0 OLAP Tools --- Plato
8.0 --- 2000 --- SQL Server 2000 --- Shiloh
8.0 --- 2003 --- SQL Server 2000 64-bit Edition --- Liberty
9.0 --- 2005 --- SQL Server 2005 --- Yukon
10.0 --- 2008? --- SQL Server 2008 --- Katmai

SQL Server 2000 Vs SQL Server 2005 (Differences)

Differences in DBA

Feature --- SQL Server 2000 --- SQL Server 2005
Security --- Owner = Schema, hard to remove old users at times. --- Schema is separate. Better granularity in easily controlling security. Logins can be authenticated by certificates.
Encryption --- No options built in; expensive third-party options with proprietary skills required to implement properly. --- Encryption and key management built in.
High Availability --- Clustering or Log Shipping require Enterprise Edition. Expensive hardware. --- Clustering, Database Mirroring or Log Shipping available in Standard Edition. Database Mirroring can use cheap hardware.
Scalability --- Limited to 2 GB RAM and 4 CPUs in Standard Edition. Limited 64-bit support. --- 4 CPUs, no RAM limit in Standard Edition. More 64-bit options offer chances for consolidation.


Differences in Development

Feature --- SQL Server 2000 --- SQL Server 2005
Server Programming Extensions --- Limited to extended stored procedures, which are difficult to write and can impact server stability. --- The incorporation of the CLR into the relational engine allows managed code written in .NET languages to run. Different levels of security can protect the server from poorly written code.
T-SQL Error Handling --- Limited to checking @@ERROR; not much flexibility. --- Addition of TRY/CATCH allows more mature error handling. More ERROR_xx() functions can gather additional information about errors.
T-SQL Language --- SQL language enhanced from previous versions, providing strong data manipulation capabilities. --- All the power of SQL Server 2000 with the addition of CTEs for complex, recursive problems, enhanced TOP capabilities, PIVOT/APPLY/ranking functions, and ROW_NUMBER.
Auditing --- Limited support using triggers to audit changes. --- Robust event handling with EVENT NOTIFICATIONS, the OUTPUT clause, and DDL triggers.
Large Data Types --- Limited to 8 KB for normal data without moving to TEXT datatypes. TEXT is hard to work with in programming environments. --- Includes the new varchar(max) types that can store up to 2 GB of data in a single column/row.
XML --- Limited to transforming relational data into XML with SELECT statements, and some simple query work with transformed documents. --- Native XML datatype, support for schemas, and full XPath/XQuery querying of data.
ADO.NET --- v1.1 of ADO.NET included enhancements for client development. --- v2 has more features, including automatic failover for database mirroring, support for multiple active result sets (MARS), tracing of calls, statistics, new isolation levels, and more.
Messaging --- No messaging built into SQL Server. --- Includes Service Broker, a full-featured asynchronous messaging system that has evolved from Microsoft Message Queue (MSMQ), which is integrated into Windows.
Reporting Services --- An extremely powerful reporting environment, but a 1.0 product. --- Numerous enhancements: run-time sorting, direct printing, viewer controls, and an enhanced developer experience.
ETL --- DTS is a very easy-to-use and intuitive tool. Limited capabilities for sources and transformations. Some constructs, such as loops, were very difficult to implement. --- Integration Services is a true programming environment allowing almost any source of data to be used and many more types of transformations to occur. Very complex environment that is difficult for non-DBAs to use. Requires programming skills.
Full-Text Search --- Workable solution, but limited in its capabilities. Cumbersome to work with in many situations. --- More open architecture, allowing integration and plug-ins of third-party extensions. Much more flexible in search capabilities.
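As a small illustration of the TRY/CATCH row in the table above, this is roughly what SQL Server 2005 error handling looks like; the Accounts table is hypothetical.

BEGIN TRY
    BEGIN TRANSACTION
    -- Any error in here transfers control to the CATCH block
    UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1
    UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2
    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    IF XACT_STATE() <> 0
        ROLLBACK TRANSACTION
    -- The ERROR_xx() functions replace polling @@ERROR after every statement
    SELECT ERROR_NUMBER() AS ErrorNumber,
           ERROR_MESSAGE() AS ErrorMessage,
           ERROR_LINE() AS ErrorLine
END CATCH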

Wednesday, August 08, 2007

PL/SQL FAQs


· What is PL/SQL and what is it used for?
· Should one use PL/SQL or Java to code procedures and triggers?
· How can one see if somebody modified any code?
· How can one search PL/SQL code for a string/key value?
· How can one keep a history of PL/SQL code changes?
· How can I protect my PL/SQL source code?
· Can one print to the screen from PL/SQL?
· Can one read/write files from PL/SQL?
· Can one call DDL statements from PL/SQL?
· Can one use dynamic SQL statements from PL/SQL?
· What is the difference between %TYPE and %ROWTYPE?
· What is the result of comparing NULL with NULL?
· How does one get the value of a sequence into a PL/SQL variable?
· Can one execute an operating system command from PL/SQL?
· How does one loop through tables in PL/SQL?
· How often should one COMMIT in a PL/SQL loop? / What is the best commit strategy?
· I can SELECT from SQL*Plus but not from PL/SQL. What is wrong?
· What is a mutating and constraining table?
· Can one pass an object/table as an argument to a remote procedure?
· Is it better to put code in triggers or procedures? What is the difference?
· Is there a PL/SQL Engine in SQL*Plus?
· Is there a limit on the size of a PL/SQL block?
· Where can one find more info about PL/SQL?


What is PL/SQL and what is it used for?
PL/SQL is Oracle's Procedural Language extension to SQL. PL/SQL's language syntax, structure and data types are similar to
those of Ada. The PL/SQL language includes object-oriented programming techniques such as encapsulation, function
overloading, information hiding (all but inheritance). PL/SQL is commonly used to write data-centric programs to manipulate
data in an Oracle database.

Should one use PL/SQL or Java to code procedures and triggers?
Internally the Oracle database supports two procedural languages, namely PL/SQL and Java. This leads to questions like
"Which of the two is the best?" and "Will Oracle ever desupport PL/SQL in favour of Java?".
Many Oracle applications are based on PL/SQL and it would be difficult for Oracle to ever desupport PL/SQL. In fact, all
indications are that PL/SQL still has a bright future ahead of it. Many enhancements are still being made to PL/SQL. For
example, Oracle 9iDB supports native compilation of Pl/SQL code to binaries.
PL/SQL and Java appeal to different people in different job roles. The following table briefly describes the difference between
these two language environments:
PL/SQL:
· Data centric and tightly integrated into the database
· Proprietary to Oracle and difficult to port to other database systems
· Data manipulation is slightly faster in PL/SQL than in Java
· Easier to use than Java (depending on your background)
Java:
· Open standard, not proprietary to Oracle
· Incurs some data conversion overhead between the Database and Java type systems
· Java is more difficult to use (depending on your background)

How can one see if somebody modified any code?
Code for stored procedures, functions and packages is stored in the Oracle Data Dictionary. One can detect code changes by
looking at the LAST_DDL_TIME column in the USER_OBJECTS dictionary view. Example:
SELECT OBJECT_NAME,
TO_CHAR(CREATED, 'DD-Mon-RR HH24:MI') CREATE_TIME,
TO_CHAR(LAST_DDL_TIME, 'DD-Mon-RR HH24:MI') MOD_TIME,
STATUS
FROM USER_OBJECTS
WHERE LAST_DDL_TIME > '&CHECK_FROM_DATE';

How can one search PL/SQL code for a string/ key value?
The following query is handy if you want to know where a certain table, field or expression is referenced in your PL/SQL source
code.
SELECT TYPE, NAME, LINE
FROM USER_SOURCE
WHERE UPPER(TEXT) LIKE '%&KEYWORD%';

How can one keep a history of PL/SQL code changes?
One can build a history of PL/SQL code changes by setting up an AFTER CREATE schema (or database) level trigger (available
from Oracle 8.1.7). This way one can easily revert to previous code should someone make any catastrophic changes. Look at
this example:
CREATE TABLE SOURCE_HIST -- Create history table
AS SELECT SYSDATE CHANGE_DATE, USER_SOURCE.*
FROM USER_SOURCE WHERE 1=2;

CREATE OR REPLACE TRIGGER change_hist -- Store code in hist table
AFTER CREATE ON SCOTT.SCHEMA -- Change SCOTT to your schema name
DECLARE
BEGIN
if DICTIONARY_OBJ_TYPE in ('PROCEDURE', 'FUNCTION',
'PACKAGE', 'PACKAGE BODY', 'TYPE') then
-- Store old code in SOURCE_HIST table
INSERT INTO SOURCE_HIST
SELECT sysdate, user_source.* FROM USER_SOURCE
WHERE TYPE = DICTIONARY_OBJ_TYPE
AND NAME = DICTIONARY_OBJ_NAME;
end if;
EXCEPTION
WHEN OTHERS THEN
raise_application_error(-20000, SQLERRM);
END;
/
show errors

How can I protect my PL/SQL source code?
PL/SQL V2.2, available with Oracle7.2, implements a binary wrapper for PL/SQL programs to protect the source code.
This is done via a standalone utility that transforms the PL/SQL source code into portable binary object code (somewhat larger
than the original). This way you can distribute software without having to worry about exposing your proprietary algorithms
and methods. SQL*Plus and SQL*DBA will still understand and know how to execute such scripts. Just be careful, there is no
"decode" command available.
The syntax is:
wrap iname=myscript.sql oname=xxxx.plb

Can one print to the screen from PL/SQL?
One can use the DBMS_OUTPUT package to write information to an output buffer. This buffer can be displayed on the screen
from SQL*Plus if you issue the SET SERVEROUTPUT ON; command. For example:
set serveroutput on
begin
dbms_output.put_line('Look Ma, I can print from PL/SQL!!!');
end;
/
DBMS_OUTPUT is useful for debugging PL/SQL programs. However, if you print too much, the output buffer will overflow. In
that case, set the buffer size to a larger value, e.g.: set serveroutput on size 200000
If you forgot to SET SERVEROUTPUT ON, type SET SERVEROUTPUT ON once you remember, and then run EXEC NULL;. If you haven't
cleared the DBMS_OUTPUT buffer with the disable or enable procedure, SQL*Plus will display the entire contents of the buffer
when it executes this dummy PL/SQL block.

Can one read/write files from PL/SQL?
Included in Oracle 7.3 is an UTL_FILE package that can read and write operating system files. The directory you intend writing
to has to be in your INIT.ORA file (see UTL_FILE_DIR=... parameter). Before Oracle 7.3 the only means of writing a file was to
use DBMS_OUTPUT with the SQL*Plus SPOOL command.
Copy this example to get started:
DECLARE
fileHandler UTL_FILE.FILE_TYPE;
BEGIN
fileHandler := UTL_FILE.FOPEN('/tmp', 'myfile', 'w');
UTL_FILE.PUTF(fileHandler, 'Look ma, I''m writing to a file!!!\n');
UTL_FILE.FCLOSE(fileHandler);
EXCEPTION
WHEN utl_file.invalid_path THEN
raise_application_error(-20000, 'ERROR: Invalid path for file or path not in INIT.ORA.');
END;
/

Can one call DDL statements from PL/SQL?
One can call DDL statements like CREATE, DROP, TRUNCATE, etc. from PL/SQL by using the "EXECUTE IMMEDIATE" statement.
Users running Oracle versions below 8i can look at the DBMS_SQL package (see FAQ about Dynamic SQL).
begin
EXECUTE IMMEDIATE 'CREATE TABLE X(A DATE)';
end;
NOTE: The DDL statement in quotes should not be terminated with a semicolon.

Can one use dynamic SQL statements from PL/SQL?
Starting from Oracle8i one can use the "EXECUTE IMMEDIATE" statement to execute dynamic SQL and PL/SQL statements
(statements created at run-time). Look at these examples. Note that statements are NOT terminated by semicolons:
EXECUTE IMMEDIATE 'CREATE TABLE x (a NUMBER)';

-- Using bind variables...
sql_stmt := 'INSERT INTO dept VALUES (:1, :2, :3)';
EXECUTE IMMEDIATE sql_stmt USING dept_id, dept_name, location;

-- Returning a cursor...
sql_stmt := 'SELECT * FROM emp WHERE empno = :id';
EXECUTE IMMEDIATE sql_stmt INTO emp_rec USING emp_id;
One can also use the older DBMS_SQL package (V2.1 and above) to execute dynamic statements. Look at these examples:
CREATE OR REPLACE PROCEDURE DYNSQL AS
cur integer;
rc integer;
BEGIN
cur := DBMS_SQL.OPEN_CURSOR;
DBMS_SQL.PARSE(cur, 'CREATE TABLE X (Y DATE)', DBMS_SQL.NATIVE);
rc := DBMS_SQL.EXECUTE(cur);
DBMS_SQL.CLOSE_CURSOR(cur);
END;
/
More complex DBMS_SQL example using bind variables:
CREATE OR REPLACE PROCEDURE DEPARTMENTS(NO IN DEPT.DEPTNO%TYPE) AS
v_cursor integer;
v_dname char(20);
v_rows integer;
BEGIN
v_cursor := DBMS_SQL.OPEN_CURSOR;
DBMS_SQL.PARSE(v_cursor, 'select dname from dept where deptno > :x', DBMS_SQL.V7);
DBMS_SQL.BIND_VARIABLE(v_cursor, ':x', no);
DBMS_SQL.DEFINE_COLUMN_CHAR(v_cursor, 1, v_dname, 20);
v_rows := DBMS_SQL.EXECUTE(v_cursor);
loop
if DBMS_SQL.FETCH_ROWS(v_cursor) = 0 then
exit;
end if;
DBMS_SQL.COLUMN_VALUE_CHAR(v_cursor, 1, v_dname);
DBMS_OUTPUT.PUT_LINE('Deptartment name: '||v_dname);
end loop;
DBMS_SQL.CLOSE_CURSOR(v_cursor);
EXCEPTION
when others then
DBMS_SQL.CLOSE_CURSOR(v_cursor);
raise_application_error(-20000, 'Unknown Exception Raised: '||sqlcode||' '||sqlerrm);
END;
/

What is the difference between %TYPE and %ROWTYPE?
The %TYPE and %ROWTYPE constructs provide data independence, reduce maintenance costs, and allow programs to adapt
as the database changes to meet new business needs.
%ROWTYPE is used to declare a record with the same types as found in the specified database table, view or cursor. Example:
DECLARE
v_EmpRecord emp%ROWTYPE;
%TYPE is used to declare a field with the same type as that of a specified table's column. Example:
DECLARE
v_EmpNo emp.empno%TYPE;

What is the result of comparing NULL with NULL?
NULL is neither equal to NULL, nor unequal to NULL. Any comparison with NULL evaluates to NULL. Look at this code
example to convince yourself.
declare
a number := NULL;
b number := NULL;
begin
if a=b then
dbms_output.put_line('True, NULL = NULL');
elsif a<>b then
dbms_output.put_line('False, NULL <> NULL');
else
dbms_output.put_line('Undefined NULL is neither = nor <> to NULL');
end if;
end;

How does one get the value of a sequence into a PL/SQL variable?
As you might know, one cannot use sequences directly from PL/SQL. Oracle (for some silly reason) prohibits this:
i := sq_sequence.NEXTVAL;
However, one can use embedded SQL statements to obtain sequence values:
select sq_sequence.NEXTVAL into :i from dual;
Thanks to Ronald van Woensel

Can one execute an operating system command from PL/SQL?
There is no direct way to execute operating system commands from PL/SQL in Oracle7. However, one can write an external
program (using one of the precompiler languages, OCI or Perl with Oracle access modules) to act as a listener on a database
pipe (SYS.DBMS_PIPE). Your PL/SQL program then puts requests to run commands in the pipe; the listener picks them up and runs
the requests. Results are passed back on a different database pipe. For a Pro*C example, see chapter 8 of the Oracle
Application Developers Guide.
In Oracle8 one can call external 3GL code in a dynamically linked library (DLL or shared object). One simply writes a library in C/
C++ to do whatever is required, and defining this C/C++ function to PL/SQL makes it executable; a sketch is shown
below.
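A minimal sketch of such an external procedure definition, using the Oracle 8i-style call specification (the library path, library name and C function name are only illustrative; the C code itself and the extproc listener configuration are not shown):
-- Register the shared object that contains the C function
CREATE OR REPLACE LIBRARY shell_lib AS '/u01/app/oracle/local/lib/libshell.so';
/
-- Expose the C function to PL/SQL as a callable procedure
CREATE OR REPLACE PROCEDURE run_os_command (cmd IN VARCHAR2)
AS LANGUAGE C
LIBRARY shell_lib
NAME "run_os_command"
PARAMETERS (cmd STRING);
/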
· Back to top of file

How does one loop through tables in PL/SQL?
Look at the following nested loop code example.
DECLARE
CURSOR dept_cur IS
SELECT deptno
FROM dept
ORDER BY deptno;
-- Employee cursor: all employees for a given department number
CURSOR emp_cur (v_dept_no DEPT.DEPTNO%TYPE) IS
SELECT ename
FROM emp
WHERE deptno = v_dept_no;
BEGIN
FOR dept_rec IN dept_cur LOOP
dbms_output.put_line('Employees in Department '||TO_CHAR(dept_rec.deptno));
FOR emp_rec in emp_cur(dept_rec.deptno) LOOP
dbms_output.put_line('...Employee is '||emp_rec.ename);
END LOOP;
END LOOP;
END;
/
· Back to top of file

How often should one COMMIT in a PL/SQL loop? / What is the best commit strategy?
Contrary to popular belief, one should COMMIT less frequently within a PL/SQL loop to prevent ORA-1555 (Snapshot too old)
errors. The more frequently you commit, the sooner the extents in the rollback segments can be reused by other
transactions, and the more likely a long-running cursor is to hit an ORA-1555 error.
To fix this problem one can easily rewrite code like this:
FOR records IN my_cursor LOOP
...do some stuff...
COMMIT;
END LOOP;
... to ...
FOR records IN my_cursor LOOP
...do some stuff...
i := i+1;
IF MOD(i, 10000) = 0 THEN -- Commit every 10000 records (initialise i to 0 before the loop)
COMMIT;
END IF;
END LOOP;
If you still get ORA-1555 errors, ask your DBA to increase the rollback segments.
NOTE: Although fetching across COMMITs works in Oracle, it is not supported by the ANSI standard.
· Back to top of file

I can SELECT from SQL*Plus but not from PL/SQL. What is wrong?
PL/SQL respects object privileges granted directly to the user, but does not observe privileges granted through roles. The
consequence is that a SQL statement can work in SQL*Plus but give an error in PL/SQL. Choose one of the following
solutions:
· Grant direct access on the tables to your user. Do not use roles! For example:
GRANT select ON scott.emp TO my_user;

· Define your procedures with invoker rights (Oracle 8i and higher); see the sketch after this list.
· Move all the tables to one user/schema.
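For the invoker-rights option mentioned above, a minimal sketch (the procedure and table names are only illustrative):
-- With AUTHID CURRENT_USER the SQL runs with the caller's privileges,
-- and roles are honoured at run time
CREATE OR REPLACE PROCEDURE show_emp_count
AUTHID CURRENT_USER
AS
n NUMBER;
BEGIN
EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM scott.emp' INTO n;
DBMS_OUTPUT.PUT_LINE('Rows visible to the caller: '||n);
END;
/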
· Back to top of file

What is a mutating and constraining table?
"Mutating" means "changing". A mutating table is a table that is currently being modified by an update, delete, or insert
statement. When a trigger tries to reference a table that is in a state of flux (being changed), it is considered "mutating" and
raises an error since Oracle should not return data that has not yet reached its final state.
Another way this error can occur is if the trigger has statements to change the primary, foreign or unique key columns of the
table off which it fires. If you must have triggers on tables that have referential constraints, the workaround is to enforce the
referential integrity through triggers as well.
There are several restrictions in Oracle regarding triggers:
· A row-level trigger cannot query or modify a mutating table. (Of course, the :NEW and :OLD values can still be accessed by the
trigger.)
· A statement-level trigger cannot query or modify a mutating table if the trigger is fired as the result of a CASCADE delete.
· Etc.
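To see the error for yourself, a row-level trigger along these lines (table and column names are only illustrative) raises ORA-04091 as soon as an UPDATE fires it:
CREATE OR REPLACE TRIGGER emp_check_sal
BEFORE UPDATE OF sal ON emp
FOR EACH ROW
DECLARE
v_avg NUMBER;
BEGIN
-- EMP is mutating here: querying it from its own row-level trigger fails
SELECT AVG(sal) INTO v_avg FROM emp;
IF :NEW.sal > 2 * v_avg THEN
RAISE_APPLICATION_ERROR(-20001, 'Salary more than twice the average');
END IF;
END;
/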
· Back to top of file

Can one pass an object/table as an argument to a remote procedure?
The only way the same object type can be referenced between two databases is via a database link. Note that it is not
enough to just use the same type definitions. Look at this example:
-- Database A: receives a PL/SQL table from database B
CREATE OR REPLACE PROCEDURE pcalled(TabX DBMS_SQL.VARCHAR2S) IS
BEGIN
-- do something with TabX from database B
null;
END;
/

-- Database B: sends a PL/SQL table to database A
CREATE OR REPLACE PROCEDURE pcalling IS
TabX DBMS_SQL.VARCHAR2S@DBLINK2;
BEGIN
pcalled@DBLINK2(TabX);
END;
/
· Back to top of file

Is it better to put code in triggers or procedures? What is the difference?
In earlier releases of Oracle it was better to put as much code as possible in procedures rather than triggers. At that stage
procedures executed faster than triggers, as triggers had to be re-compiled every time before being executed (unless cached). In
more recent releases both triggers and procedures are compiled when created (stored p-code), and one can put as much code
as one likes in either procedures or triggers.
· Back to top of file

Is there a PL/SQL Engine in SQL*Plus?
No. Unlike Oracle Forms, SQL*Plus does not have an embedded PL/SQL engine. Thus, all your PL/SQL code is sent directly to
the database engine for execution. This makes it much more efficient as SQL statements are not stripped off and sent to the
database individually.
· Back to top of file

Is there a limit on the size of a PL/SQL block?
Yes. The maximum size is not an explicit byte limit, but is related to the size of the parse tree that is created when you compile the code. You can
run the following select statement to query the size of an existing package or procedure (remember that object names are stored in upper case in the data dictionary):
SQL> select * from dba_object_size where name = 'PROCEDURE_NAME';


Sunday, July 22, 2007

Informatica Interview Questions with Answers

1. Why do we use the Stored Procedure transformation?
For populating and maintaining databases.
2. Why do we partition a session in Informatica?
Partitioning improves session performance by reducing the time
needed to read the source and load the data into the target.
3. Why do we use Lookup transformations?
Lookup transformations can access data from relational tables that
are not sources in the mapping. With a Lookup transformation, we can
accomplish the following tasks:
Get a related value, e.g. get the employee name from the Employee table based on
the employee ID.
Perform a calculation.
Update slowly changing dimension tables: we can use an unconnected Lookup
transformation to determine whether a record already exists in the target.
4. Why use the Lookup transformation?
To perform the following tasks:
Get a related value. For example, your source table includes the
employee ID, but you want to include the employee name in your target
table to make your summary data easier to read.
Perform a calculation. Many normalized tables include values used in a
calculation, such as gross sales per invoice or sales tax, but not the
calculated value (such as net sales).
Update slowly changing dimension tables. You can use a Lookup
transformation to determine whether records already exist in the
target.
5. Why do we use repository connectivity?
Each time you edit or schedule a session, the Informatica server
communicates with the repository to check whether the session and
users are valid. All the metadata of sessions and mappings is stored
in the repository.
6. While importing a relational source definition from a database, what
source metadata do you import?
Source name
Database location
Column names
Datatypes
Key constraints
7. Which transformation should we use to normalize COBOL and
relational sources?
The Normalizer transformation.
When you drag a COBOL source into the Mapping Designer workspace, the
Normalizer transformation automatically appears, creating input and
output ports for every column in the source.
8. Which transformation do you need when using COBOL sources
as source definitions?
The Normalizer transformation, which is used to normalize the data, since
COBOL sources often consist of denormalized data.
9. Which tool do you use to create and manage sessions and batches and to
monitor and stop the Informatica server?
The Informatica Server Manager.
10. Where should you place a flat file to import its
definition into the Designer?
Place it in a local folder.
11. When does the Informatica server mark a batch as failed?
If one of the sessions is configured to "run if previous completes" and
that previous session fails.
12. What are the two types of processes that the Informatica server uses to run a session?
Load Manager process: starts the session, creates the DTM process,
and sends post-session email when the session completes.
DTM process: creates threads to initialize the session; read,
write, and transform data; and handle pre- and post-session operations.
13. What repository objects are unsupported in a mapplet?
COBOL source definitions
Joiner transformations
Normalizer transformations
Non-reusable Sequence Generator transformations
Pre- or post-session stored procedures
Target definitions
PowerMart 3.5-style Lookup functions
XML source definitions
IBM MQ source definitions
14. What types of metadata are stored in the repository?
The following types of metadata are stored in the repository:
Database connections
Global objects
Mappings
Mapplets
Multidimensional metadata
Reusable transformations
Sessions and batches
Shortcuts
Source definitions
Target definitions
Transformations
15. What types of mapping wizards are provided in
Informatica?
The Designer provides two mapping wizards to help you create mappings
quickly and easily. Both wizards are designed to create mappings for
loading and maintaining star schemas, a series of dimensions related to
a central fact table.
Getting Started Wizard: creates mappings to load static fact and
dimension tables, as well as slowly growing dimension tables.
Slowly Changing Dimensions Wizard: creates mappings to load slowly
changing dimension tables based on the amount of historical dimension
data you want to keep and the method you choose to handle historical
dimension data.
16. What types of mappings does the Getting Started Wizard create?
Simple Pass-Through mapping:
loads a static fact or dimension table by inserting all rows. Use this
mapping when you want to drop all existing data from your table before
loading new data.
Slowly Growing Target mapping:
loads a slowly growing fact or dimension table by inserting new rows.
Use this mapping to load new data when existing data does not require
updates.
17. What are the types of lookups?
Connected and unconnected.
18. What are the types of lookup caches?
Persistent cache: you can save the lookup cache files and reuse them
the next time the Informatica server processes a Lookup transformation
configured to use the cache.
Recache from database: if the persistent cache is not synchronized with
the lookup table, you can configure the Lookup transformation to rebuild
the lookup cache.
Static cache: you can configure a static or read-only cache for any
lookup table. By default, the Informatica server creates a static cache. It
caches the lookup table and lookup values for each row
that comes into the transformation. When the lookup condition is
true, the Informatica server does not update the cache while it
processes the Lookup transformation.
Dynamic cache: if you want to cache the target table and insert new rows
into the cache and the target, you can create a Lookup transformation that uses a
dynamic cache. The Informatica server dynamically inserts data into the
target table.
Shared cache: you can share the lookup cache between multiple
transformations. You can share an unnamed cache between transformations in the
same mapping.
19. What are the types of groups in the Router transformation?
Input group and output groups.
The Designer copies property information from the input ports of the
input group to create a set of output ports for each output group.
There are two types of output groups:
User-defined groups
Default group
You cannot modify or delete the default group.
20. What types of data pass between the Informatica server
and a stored procedure?
Three types of data:
Input/output parameters
Return values
Status codes
21. Which transformations restrict the partitioning of
sessions?
Advanced External Procedure and External Procedure
transformations: these
transformations contain a check box on the Properties tab to allow
partitioning.
Aggregator transformation: if you use sorted ports, you cannot partition the
associated source.
Joiner transformation: you cannot partition the master source for a
Joiner transformation.
Normalizer transformation.
XML targets.
22. What tasks does the Source Qualifier transformation perform?
Join data originating from the same source database.
Filter records when the Informatica server reads source data.
Specify an outer join rather than the default inner join.
Specify sorted records.
Select only distinct values from the source.
Create a custom query to issue a special SELECT statement for the
Informatica server to read source data (a hypothetical example follows
this list).
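As an illustration of the last point, a custom (override) query in the Source Qualifier is simply a hand-written SELECT statement; a hypothetical example against the familiar EMP and DEPT tables:
-- Join and filter at the source instead of in the mapping
SELECT e.empno, e.ename, d.dname
FROM emp e, dept d
WHERE e.deptno = d.deptno
AND e.sal > 1000
ORDER BY d.dname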
23. What tasks does the Load Manager process perform?
Manages session and batch scheduling: when you start the Informatica
server, the Load Manager launches and queries the repository for a list
of sessions configured to run on the Informatica server. When you
configure a session, the Load Manager maintains a list of
sessions and session start times. When you start a session, the Load Manager
fetches the session information from the repository to perform
validations and verifications prior to starting the DTM process.
Locking and reading the session: when the Informatica server starts a
session, the Load Manager locks the session in the repository. Locking
prevents you from starting the same session again while it is running.
Reading the parameter file: if the session uses a parameter
file, the Load Manager reads the parameter file and verifies that the
session-level parameters are declared in the file.
Verifying permissions and privileges: when the session starts, the Load Manager
checks whether the user has the privileges to run the session.
Creating log files: the Load Manager creates a log file that contains the status of the
session.
24. What settings do you use to configure the Joiner
transformation?
Master and detail source
Type of join
Condition of the join
25. What are the session parameters?
Session parameters, like mapping parameters, represent values you might
want to change between sessions, such as database connections or source files.
The Server Manager also allows you to create user-defined session
parameters. The following are user-defined session parameters:
Database connections
Source file name: use this parameter when you want to change the name or
location of a session source file between session runs.
Target file name: use this parameter when you want to change the name or
location of a session target file between session runs.
Reject file name: use this parameter when you want to change the name or
location of session reject files between session runs.
26. What are the scheduling options to run a session?
You can schedule a session to run at a given time or interval, or you can
run the session manually.
The different scheduling options are:
Run only on demand: the server runs the session only when the user starts the
session explicitly.
Run once: the Informatica server runs the session only once at a specified
date and time.
Run every: the Informatica server runs the session at the regular interval you
configured.
Customized repeat: the Informatica server runs the session at the dates and
times specified in the Repeat dialog box.
27. What are reusable transformations?
Reusable transformations can be used in multiple mappings. When you need
to incorporate such a transformation into a mapping, you add an instance of it
to the mapping. Later, if you change the definition of the transformation, all
instances of it inherit the changes. Since an instance of a reusable
transformation is a pointer to that transformation, you can change the
transformation in the Transformation Developer and its instances
automatically reflect the changes. This feature can save you a great deal
of work.
28. What are the rank caches?
During the session, the Informatica server compares an input row with
rows in the data cache. If the input row out-ranks a stored row, the
Informatica server replaces the stored row with the input row. The
Informatica server stores group information in an index cache and row
data in a data cache.
29. What output files does the Informatica server create
while running a session?
Informatica server log: the Informatica server (on Unix) creates a log for
all status and error messages (default name: pm.server.log). It also
creates an error log for error messages. These files are created in the
Informatica home directory.
Session log file: the Informatica server creates a session log file for each
session. It writes information about the session into the log file, such as the
initialization process, creation of SQL commands for reader and writer
threads, errors encountered, and the load summary. The amount of detail in the
session log file depends on the tracing level that you set.
Session detail file: this file contains load statistics for each
target in the mapping. Session details include information such as table
name and the number of rows written or rejected. You can view this file by
double-clicking on the session in the Monitor window.
Performance detail file: this file contains information known as
session performance details, which helps you see where performance can be
improved. To generate this file, select the performance detail option in
the session property sheet.
Reject file: this file contains the rows of data that the writer does
not write to targets.
Control file: the Informatica server creates a control file and a target file
when you run a session that uses the external loader. The control file
contains information about the target flat file, such as the data format
and loading instructions for the external loader.
Post-session email: post-session email allows you to automatically
communicate information about a session run to designated recipients. You
can create two different messages: one if the session completes
successfully, the other if the session fails.
Indicator file: if you use a flat file as a target, you can configure the
Informatica server to create an indicator file. For each target row, the
indicator file contains a number to indicate whether the row was marked
for insert, update, delete, or reject.
Output file: if a session writes to a target file, the Informatica server
creates the target file based on the file properties entered in the session
property sheet.
Cache files: when the Informatica server creates a memory cache, it also
creates cache files. The Informatica server creates index and data cache files
for the following transformations:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
30. What are the options in the target session of the Update Strategy
transformation?
Insert
Delete
Update
Update as update
Update as insert
Update else insert
Truncate table

Thursday, July 12, 2007

Data Integration: How Times Have Changed

Enterprise data integration has clearly "arrived." The road had many twists and turns, yet data integration has not just survived, it has grown in strength and stature. How do we apply our collective learning from market developments to position ourselves better for 2007 and beyond?

Enterprise data integration, just a few years ago, meant no more than executing bucketfuls of spaghetti-like extract-transform-load (ETL) processes to load bulky and often unwieldy data marts and data warehouses. That was then. Spurred by product development and refocusing, artful solution convergence, and a flurry of mergers and acquisitions, the data integration landscape is now dramatically different. The primary goal remains to bring data from its source(s) to its destination(s) in a timely manner and useful form, but that is now a very loaded statement. You still have ETL, but in addition, you get access to a wide variety of data sources, services and applications in real-time, near-real time and batch modes. There's also data profiling, cleansing and standardization, query federation and virtual data models as well as master data management or "data verticalization" through hubs. These product hubs and customer hubs are glued together with integrated metadata management and service-oriented architectures, ready for consumption in your applications. Driven more by vendor innovation and "big picture" thinking than by customer demand, data integration moves ever closer to being a much-respected fixture in IT shops.

If you haven't looked at data integration solutions lately, do so today. In particular, customers who need data provisioning through enterprise application integration (EAI) and service-oriented and enterprise service bus architectures (SOA/ESB) would do well to take a close look at data integration technologies as well.

So where is enterprise data integration headed? For many vendors and customers, the primary purpose of integrating data across the enterprise is business intelligence (BI) or its latest avatar, corporate performance management (CPM). There's a lesson in this: if you are looking to maximize your return on BI/CPM investments, consider strengthening the "back end" data integration.

BI or CPM need not be the raison d'etre for data integration efforts. As SOA and collaborative solutions flourish in your organization, data integration becomes an integral component of the enterprise architecture and, thus, a key enabler of the business architecture. Data visualization and reporting solutions will remain important beneficiaries of data integration, but let your vision go beyond BI and CPM.
