Friday, August 08, 2008

Codd's 12 rules

Codd's 12 rules are a set of thirteen rules (numbered zero to twelve) proposed by Edgar F. Codd, a pioneer of the relational model for databases,
designed to define what is required from a database management system in order for it to be considered relational, i.e., an RDBMS.

The rules

Rule 0: The system must qualify as relational, as a database, and as a management system.

For a system to qualify as a relational database management system (RDBMS), that system must use its relational facilities (exclusively) to manage the database.

Rule 1 : The information rule:

All information in the database is to be represented in one and only one way, namely by values in column positions within rows of tables.

Rule 2: The guaranteed access rule:

All data must be accessible with no ambiguity. This rule is essentially a restatement of the fundamental requirement for primary keys. It says that every individual scalar value in the database must be logically addressable by specifying the name of the containing table, the name of the containing column and the primary key value of the containing row.

Rule 3: Systematic treatment of null values:

The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number," in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.

Rule 4: Active online catalog based on the relational model:

The system must support an online, inline, relational catalog that is accessible to authorized users by means of their regular query language. That is, users must be able to access the database's structure (catalog) using the same query language that they use to access the database's data.

Rule 5: The comprehensive data sublanguage rule:

The system must support at least one relational language that

(a) Has a linear syntax
(b) Can be used both interactively and within application programs,
(c) Supports data definition operations (including view definitions), data manipulation operations (update as well as retrieval), security and integrity constraints, and transaction management operations (begin, commit, and rollback).

Rule 6: The view updating rule:

All views that are theoretically updatable must be updatable by the system.

Rule 7: High-level insert, update, and delete:

The system must support set-at-a-time insert, update, and delete operators. This means that data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.

Rule 8: Physical data independence:

Changes to the physical level (how the data is stored, whether in arrays or linked lists etc.) must not require a change to an application based on the structure.

Rule 9: Logical data independence:

Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data independence.

Rule 10: Integrity independence:

Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints as and when appropriate without unnecessarily affecting existing applications.

Rule 11: Distribution independence:

The distribution of portions of the database to various locations should be invisible to users of the database. Existing applications should continue to operate successfully :

(a) when a distributed version of the DBMS is first introduced; and
(b) when existing distributed data are redistributed around the system.

Rule 12: The nonsubversion rule:

If the system provides a low-level (record-at-a-time) interface, then that interface cannot be used to subvert the system, for example, bypassing a relational security or integrity constraint.

ER Model: http://www.utexas.edu/its/archive/windows/database/datamodeling/dm/erintro.html

Tuesday, August 05, 2008

Setting Up SQL 2005 Deployments - Migrating from SQL 2000

Setting up Microsoft SQL Server 2005 and migrating existing data from SQL 2000 is a relatively easy task if your implementation is enclosed and you do not touch default settings. There are, however many subtleties that can cause SQL migrations to go horribly wrong. If you understand the SQL Server model, and the changes that have been made to SQL Server, many of these problems can easily be side stepped.

In this entry, I’ll cover a few basic principals of SQL 2005 and a suggested path to migration so that you may avoid these steps.

Collation

First and most importantly, SQL 2005 has changed the “collation” model. Collation is essentially the comparisons on which indices get built and comparison operators operate. If you are not careful, changing collation during setup can cause a mess of problems when migrating SQL 2000 data. Although the internals of your database will operate without problems, new SQL 2005 databases will not be able to communicate and pull data from your migrated 2000 database.

This situation is cross deployment. If you have one SQL server running in one collation and another collation running in another, you may experience problems with log shipping, mirroring, and cross-database communications.

The SQL 2005 model has shifted, using a new model. SQL Analysis Services 2005 will not use the old collations at all, however the SQL server itself is capable of running databases in the SQL 2000 collations for backwards compatibility. If you are running in a mixed environment, I suggest that you setup your SQL 2005 servers to run in a backwards-compatible fashion. If you plan to migrate completely to 2005, I see no reason not to lose the SQL 2000 collation for the SQL 2005 collation; this will ensure that future upgrades will go by with much greater ease.

By default, SQL 2005 is configured to run in “SQL_Latin1_General_CI_AS”, or “SQL Server; Latin1 General; Case Insensitive; Accent Sensitive”. This and all other modes beginning with “SQL”, are SQL 2000 compatibility modes. If you need to configure SQL server to talk with 2000 database servers, the default mode is the one you want. If you want a clean SQL 2005 installation, the 2005 equivalent “Latin1_General_CI_AS” will serve you better.

Choosing the default collation mode is important, as it affects tempdb (the temporary tables), master, and all newly created databases. This is not a setting that should be overlooked.

Although the collation mode can be set on a database, object (stored procedure, table or even column level), I would recommend to sticking with a single collation mode. Multiple collation modes not only complicate deployments, but also cause comparison issues across data fields. The only scenario where I can see a collation-specific object necessary is in locations that handle things like case-sensitive passwords. There you may want to have a CS (case sensitive) collation. There are however, better ways of achieving case-sensitivity for those types of comparisons.

Database Compatibility Mode

SQL Server 2005 is capable of running in several “database compatibility modes”, allowing 2000 and 7.0 databases to run within a SQL 2005 instance. This compatibility mode has benefits, but will restrict your feature use (such as .NET UDT’s) when running in down-level versions. If you will be performing any transactions where SQL 2000 will be manipulating or requesting your SQL 2005 data, you may want to run the SQL 2005 database in a backwards-compatible mode to ensure that queries will run. If you are working in a full SQL 2005 deployment, the compatibility mode should be set to “SQL Server 9.0” (2005).

Database compatibility mode can be accessed by right clicking on a database in the SQL Server Management Studio, clicking properties, then the “options” tab.

SQL Authentication, Mixed Mode and Windows Authentication

Note that it is suggested that you utilize “Windows Authentication” mode in your SQL deployments unless you have reason otherwise. This is the most secure implementation. Mixed mode will work fine (and I use it in my deployment for reasons I won’t explain here), but note that you will be creating Windows and/or Domain accounts to achieve things such as SSIS (formally DTS) and scheduled jobs.

Deployment and Migration

Deploying SQL Server 2005 and migrating data from SQL 2000 is not a difficult task, but if you wish to modify collations and compatibility modes of your databases, this deployment process must be done carefully. Although backing up your database on 2000 and restoring it on 2005 works, changes in collation mode and other SQL 2005 changes may cause unexpected problems. It is best to do a clean migration as I have described in this document. This will ensure your transition from SQL 2000 to SQL 2005 is smooth and your databases are consistent.

My migration solution includes 4 basic steps:

  • Install SQL 2005 and set collation modes correctly.

  • Script SQL 2000 databases.

  • Execute CREATE scripts on the SQL 2005 instance.

  • Import data from the SQL 2000 databases into the newly created SQL 2005 databases.

Pre-Install

Ensure you backup your SQL 2000 databases and uninstall SQL 2000 before continuing.

Installing SQL

SQL Installation is fairly straightforward, and I will not document it in detail. I would like to note however, that it is useful to start ALL SQL services after the installation is complete, and that you will need to choose your collation settings at this point. Use my guide and description above to gear collation towards your needs.

Setup Your Existing Databases

Setting up your existing databases is where the fun begins. For this example, I will be focusing on deployment on same-machine. New machine deployments will be handled in a very similar fashion so this is still a good guide (a few extra steps are required for same-machine deployments. This setup assumes that you will want to upgrade your databases to SQL 9.0 (2005) compatibility mode (so that you can use 2005-specific features).

  1. Add any existing SQL logins you have in 2000 to the logins in SQL 2005. You don’t need to configure their databases, but rather ensure they exist.

  2. Load your database backup into the new SQL 2005 installation with a new name. The name of the database and location it is to be stored should not be equal to the final location of the database.

  3. Run “EXEC sp_change_users_login update_one [@UserNamePattern] [@LoginName]” for each user in the database so that they are mapped back to their SQL Login in the new setup. This will ensure that all objects (including users) are scripted. More information on sp_change_users_login can be found here: http://www.transactsql.com/html/sp_change_users_login.html

  4. Right click on your database, and choose “Properties”.

  5. Click the “Options” tab and ensure the database compatibility mode is set to SQL 9.0. This ensures all objects (including SQL users) are scripted.

  6. Right click on the restored database, select “Tasks > Generate Scripts”.

  7. Ensure the correct database is selected to script, and check “Script all objects in the selected database”.

  8. Next, you are presented with a list in “Choose Script Options”. You will want to change:

    1. Script Database Create” to true

  9. Script Object-Level Permissions” to true

  10. Script USE DATABASE” to true

  11. DO NOT set “Script Collation” to true.

  12. Choose “Script to New Query Window” and execute the scripting process.

  13. When the new window is opened do a find and replace in the document for your restored database name to the final database name.

  14. Change the file paths in the “CREATE DATABASE” statement to reflect the final file path of the database.

  15. Execute the script.

  16. Refresh the databases in the object explorer and right click on the newly created database. Select “Tasks > Import Data”.

  17. Choose “SQL Native Client” as your data source, the SQL 2005 machine name as the database, and “Windows Authentication”. Select the restored 2000 database.

  18. Choose “SQL Native Client” for the destination, and ensure your final database is selected.

  19. Select “Copy data from one or more tables of views”.

  20. Select all tables, make sure no views are checked off.

    1. Check “Optimize for many tables”

  21. Check “Run in a transaction”

  22. NOTE: You may have to update timestamps/rowversions to not be copied (click “edit” on the mapping of a table, and set the destination of a column to “ignore”).

  23. Click next and run the import process.

At this point, you should have a fully working SQL 2005 structured database!

After Thoughts

Many of the new SQL 2005 features are ones that an upgrade won’t give you full benefit of. You may want to consider creating a few .NET UDT’s for more complex fields in your database. This may help you eliminate an abundance of rows that comprise one value, or values stored in a non-standard format for comparison purposes (ex. The “Version” datatype cannot be stored as string and be compared with ‘<’ or ‘>’ ).

Also, one of the big advantages of SQL 2005 is schemas. If you are properly authenticating systems (not using ‘sa’ for everything), you may want to consider sectioning off your applications, or pieces of specific applications with schemas. This helps maintain data integrity, security, and overall cleanliness (gives some order to those long stored procedure lists.

Blog Archive