Introduction to Transact SQL

This is intended to be a complete reference to the best practices and techniques of T-SQL and its many uses, and will give you a flavor for where it came from and how it can be used. We will begin by covering the basics of the language. If you're looking for a particular topic you might want to skip to the relevant chapter, but to really make use of this book you need a solid foundation to build from.

We will cover a lot of topics and answer a number of questions about SQL Server 2000 and Transact SQL in the next few pages. We will begin with some history and build into some practical applications of the language. Specifically in this chapter we will cover topics including:

  • What it is and where it came from
  • When and why you will use it
  • How to use some of the specific features that make this language so unique and powerful
  • What you can do with it

What is it and where did it come from?

Since the advent of the microprocessor and the widespread use of the PC in the early 1980's, affordable technology has finally caught up with the theory of relational databases. Introduced in 1970 by E.F. Codd as a way to manage large amounts of information by using a language based on what he called "Relational Calculus", the relational database has become central to our civilization. Information about our daily lives, what we buy, where we shop, the banks we use, the cars we drive, all the information we need to be able to transact business is kept in relational databases.

Fortunately someone had the good idea to standardize how we interact with these systems, and the Structured Query Language was born. Originally called SEQUEL (Structured English QUEry Language), it was developed by IBM Research in support of an experimental relational database called System R that eventually morphed into DB2. It defines the access paths, not the physical implementation, for managing the information and data structures. Since the 1980's the American National Standards Institute (ANSI) has developed criteria for certifying database engines; SQL-86, SQL-92 and SQL-99 are instances of those criteria that have been released. The current specification is over 2000 pages in length and is available at a cost from ANSI's web site (www.ansi.org) or from the International Organization for Standardization, ISO (www.iso.ch). Software manufacturers use these standards to drive the feature set they include to ensure that their product is certifiably ANSI compliant.

The Transact SQL language is a superset of the ANSI specifications, meaning that it not only implements the required functionality, but provides much more. It includes a number of programming extensions, including variable handling, flow control, looping, and error handling, that allow the developer to write sophisticated logic to be executed on the server. In addition, the language includes facilities for controlling access to the data, declaring referential integrity (DRI), and defining business rules to be enforced at the server level.
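As a quick taste of those extensions, here is a minimal sketch (not tied to any particular schema) that uses variables, a loop, flow control, and the @@ERROR global variable:

DECLARE @Counter int, @Msg varchar(50)
SET @Counter = 1
WHILE @Counter <= 3 BEGIN                   -- looping
    SET @Msg = 'Pass number ' + CONVERT(varchar(2), @Counter)
    PRINT @Msg                              -- variable handling
    SET @Counter = @Counter + 1
END
IF @@ERROR <> 0                             -- rudimentary error handling
    PRINT 'Something went wrong'
ELSE
    PRINT 'All three passes completed'      -- flow control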

A brief history of Microsoft SQL Server

Microsoft came to the relational database arena by a circuitous route that started in 1985 as an offshoot of a joint development effort with IBM to create the next generation operating system, OS/2. The Intel 80286 processor promised release from the constraints of the 640KB memory space and the 8-bit bus of the 8088 processor, and a new operating system was needed to compete with the recently released Macintosh. Shortly after the companies committed to work together, IBM announced they were developing a special higher-end version of OS/2 that would include a Database Manager partially compatible with DB2 on the mainframe and a network manager based on the Systems Network Architecture (SNA) protocol. This left Microsoft holding the bag: who would buy the standard version?

At the time, most PC-based desktop databases were single-user flat file systems, a market dominated by products like dBASE, Paradox and R:base. In order to compete with IBM, Microsoft turned to Sybase, a startup that in 1987 was trying to establish itself in the database market against Informix and Oracle. Sybase had a product called DataServer that they were developing for mid-range UNIX-based computers. By convincing Sybase that PCs would be fertile ground for growing future customers of their midrange product, Microsoft persuaded Sybase to partner with them to port the product to OS/2. And to add more fuel to the fire in this web of alliances and counter-alliances, Microsoft went the additional step of convincing Ashton-Tate to support this new multi-user product and to develop future versions of dBASE to act as a client to the new server-based product. In the fall of 1988, the three companies announced their new product, Ashton-Tate/Microsoft SQL Server for OS/2. The first beta shipped that fall and version 1.0 was released in 1989.

By 1990 the relationship with Ashton-Tate wasn't working. The new version of their product was late and had other problems as well. The gulf between the paradigm of developing for a multi-user environment with SQL and set-based processing and the single-user, record-at-a-time approach of dBASE showed the marriage wouldn't last. In the time since the arrangement was announced the companies had found that their priorities had shifted, and Microsoft and Sybase decided to break with Ashton-Tate, releasing Microsoft SQL Server 1.1 in 1991.

A significant feature of release 1.1 was support for the new client platform, Windows 3.0. SQL Server was one of the first applications to provide a direct interface for developing applications against it using dynamic link library (DLL) calls. Version 1.11 included some bug fixes, but the product was still primarily developed and supported by Sybase. Microsoft was dependent on Sybase to fix bugs and support customers with their issues. Sybase at the time was rapidly growing its mid-range market and found its priorities shifting more towards that market. Microsoft did eventually get permission for read-only access to the source code, but all changes had to be coordinated with Sybase so that the various versions could be kept in sync.

Release 1.0 of Microsoft SQL Server corresponded with the code base of Sybase's release 3.0 on other platforms, and after a couple of years of adding new features and functionality, Sybase was planning to release version 4.0. This new version would include support for stored procedures, unions, and backup devices, among other things.

As some may remember, Windows 3.0 was a 16-bit operating system, but the new version of OS/2, version 2.0, would be 32-bit. Unfortunately IBM was running into problems developing a stable 32-bit product that they could release in a timely fashion. As a result, version 4.2 of Microsoft SQL Server was released for 16-bit OS/2 1.3 in 1992.

At about this time, Microsoft realized that they needed to break from IBM and build their own 32-bit operating system, which they called Windows NT. It would include a symmetric multiprocessing environment with asynchronous I/O. The new operating system would be multi-threaded and would provide standard facilities for logging and management of processes. Microsoft decided to bet the farm on this new platform and moved to make SQL Server one of the first applications for it when Windows NT was released to the public in July of 1993. Within three months Microsoft SQL Server 4.2 for NT was shipping to the general public.

By the end of 1993, the relationship between Microsoft and Sybase had changed as Sybase's core focus had evolved from startup to maturity in the high-tech industry. The new Windows NT platform was proving to be an alternative to the mini-computer UNIX-based environment, not only on performance but also on ease of use. Release 4.2 was the last time that Microsoft and Sybase would share code. Since 1991 Microsoft had been slowly growing its stable of development expertise to be able to stand on its own supporting and evolving the SQL Server product. In April of 1994, the companies announced the end of their relationship and each went on to develop the product line exclusively for its respective platforms.

By the end of 1994 Microsoft began putting together plans for the next version of SQL Server. Originally called SQL95 but later released as version 6.0, it would include support for scrollable cursors, database replication, better performance, a greatly expanded library of system stored procedures, and a better database management tool. For the first time, this release was developed entirely by Microsoft. The SQL Server team delivered it on time and it was a great success in the marketplace.

Version 6.5 came out 10 months later and included support for the Internet, data warehousing and distributed transactions. Another key event was that it gained certification as ANSI SQL compliant. SQL Server 7.0 brought full row-level locking, a rewritten query processor that supports distributed queries and a more efficient scheme for processing ad-hoc queries, and a redistributable version of the engine called the Microsoft Data Engine (MSDE). SQL Server 2000 added user defined functions, new data types, and a host of other new features we will cover later in this chapter.

With each release, SQL Server has expanded its functionality and improved its performance as an enterprise level database engine. While there may be some who still try to deny it, Microsoft has established itself as a serious player in the DBMS industry and has built a core competency in taking a good product and making it better.

What can we do with it?

The Structured Query Language (SQL) is the primary way of interacting with most relational database management systems, and SQL Server is no exception. While we assume you have a base level of understanding, we will quickly cover the basics of the language before we get into how SQL Server has extended it to provide the developer with better ways of using the database. If you find that this discussion goes too fast, there are several good books available, including Robin Dewson's 'SQL Server 2000 Programming' and 'Beginning SQL Programming' (ISBN 1-861001-80-0), that will give you a more complete introduction to the language.

SQL is the interface for managing information in relational databases. With it we tell the database what we want to select, insert, update and delete. Each of these commands follows a standard syntax and uses various clauses to determine exactly which rows are affected. Taken together, these form what is referred to as the Data Manipulation Language, or DML.

Data Manipulation Language (DML)

Data Manipulation Language (DML) provides the language constructs for selecting information from the tables; adding, modifying, and deleting rows of data; and generally providing a standardized access method to the data. Any statement that adds, modifies or views information stored in the database is considered part of the Data Manipulation Language. There are four basic statements that make up the bulk of this category: SELECT, INSERT, UPDATE, and DELETE.

The SELECT statement

One of the first uses of a relational database any developer will have is learning how to retrieve information from the database. A basic SELECT statement consists of a series of clauses that define what information we are interested in, and how we want to view it. These include a list of columns to include in the result set, the names of the tables involved, filter criteria that specify how to limit what we are looking for, and other optional clauses which further refine the results.

Select <column>, <column>, ...
From <table> ...
Where <criteria> ...

For example, to select a list of names from the Suppliers table in the Northwind database, use the following command:

SELECT SupplierID, CompanyName, ContactName, *
FROM Suppliers
WHERE Country <> 'USA'
ORDER BY Country

This is pretty basic stuff in that it returns a simple list of suppliers. The SELECT clause tells the query engine what columns we are interested in. The * is a wildcard that indicates we want all columns in the table to follow the ones that we specified. The FROM clause indicates where the information is stored, and the WHERE clause filters it down to a subset of the whole table. This is typical of most relational databases; the more advanced DBMSs give us the ability to go quite a bit further in refining our queries. Suppose we wanted to find out how many suppliers were from each country. The ANSI SQL language specifies some basic aggregation and summarization functionality that will run unchanged on most databases:

SELECT Country, COUNT(*) as 'COUNT'
FROM Suppliers
GROUP BY Country
ORDER BY 'COUNT' DESC

We've added a GROUP BY clause, which tells the database how we want to aggregate our results, and an ORDER BY clause to sort by the countries where we find the most suppliers. Transact SQL provides 14 aggregate, 23 string, 9 date and time, and 23 mathematical functions out of the box, in addition to several other system and state functions that we can use in developing our code. In addition to the many built-in functions, Transact SQL also provides a number of operators that are worth mentioning.
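To give a flavor of those built-in functions, here is a quick illustration against the Northwind Suppliers table that combines a string, a date and time, and a mathematical function in one query:

SELECT UPPER(CompanyName) as Name,           -- string function
    DATEPART(year, GETDATE()) as ThisYear,   -- date and time function
    ROUND(LEN(CompanyName) / 2.0, 0) as Half -- mathematical and string functions
FROM Suppliers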

Distinct

Eliminating duplicate records from our rowset can be done within our query by specifying the DISTINCT keyword in our SELECT clause. It will cause the query engine to return only unique rows from our query.

SELECT DISTINCT Country
FROM Suppliers

Top

The TOP operator is a Microsoft extension to Transact SQL that allows us to limit the number of rows in our result set. After the query has been processed, the result set is limited to the number of rows specified. We can either use a hard number like 10, or include the PERCENT keyword to indicate that we want to see only a percentage of the full result set.

SELECT TOP 10 ProductID, Sum(UnitPrice * Quantity) as Sales
FROM [Order Details]
GROUP BY ProductID
ORDER BY Sales DESC

Trying to accomplish the same results on other database platforms requires much more work and creativity on the part of the developer!
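The PERCENT variation mentioned above works the same way; a brief sketch that returns the top ten percent of products by sales (note the DESC, so the biggest sellers come first):

SELECT TOP 10 PERCENT ProductID, Sum(UnitPrice * Quantity) as Sales
FROM [Order Details]
GROUP BY ProductID
ORDER BY Sales DESC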

Subqueries

A subquery is a complete SELECT statement that returns a single value and is nested within another SQL statement in place of an expression. We can use a subquery anywhere that we can use an expression, whether it is part of the SELECT clause or the WHERE clause. We can use the subquery to perform lookups on related tables, or to evaluate a condition. We might need to select the orphaned rows, which are those records from the child table which have no parent record. In our Northwind example we need to reduce our supplier count and eliminate suppliers who have no product sales. One way to return the list of suppliers with no orders using subqueries is:

SELECT SupplierID, CompanyName FROM Suppliers S
WHERE NOT EXISTS
    (SELECT 'x' FROM [Order Details] D, Products P
     WHERE D.ProductID = P.ProductID
     AND S.SupplierID = P.SupplierID)
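A subquery can appear in the SELECT list as well. As a quick sketch of that usage, this correlated subquery returns each supplier alongside a count of its products:

SELECT S.SupplierID, S.CompanyName,
    (SELECT COUNT(*) FROM Products P
     WHERE P.SupplierID = S.SupplierID) as ProductCount
FROM Suppliers S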

String Concatenation

One of the basic operations performed in the database is generating human-readable output. Because the process of normalization breaks apart things like addresses and names into multiple parts that need to be reassembled, string concatenation is an important function to understand. SQL Server's approach is to use the + operator between character-based values. For example, if we wanted a report that returns a single column with each employee's name, we could use the following:

SELECT FirstName + ' ' + LastName FROM Employees

We could, for example, write a SQL script whose output is another SQL script. This generates a script which counts the rows in each user table in the current database:

SELECT 'SELECT ''' + name + ''', count(*) from [' + name + ']'
FROM sysobjects
WHERE type = 'U'
ORDER BY name

One note to developers: this syntax is specific to SQL Server and doesn't work on other databases such as Oracle (which uses two pipe characters, ||, to concatenate strings). Also, you may notice the square brackets ('[' and ']') in the code. These are necessary when the database schema contains non-standard names. In the Northwind database the Order Details table's name has a space between Order and Details. In order to reference non-standard names we simply surround them with the brackets.

Comments

A quick note on comments. Even though good code is largely self-documenting, we may find at times that it is helpful to readers of our code to include comments from the author explaining what is going on. Transact SQL recognizes two forms of comments: single-line and multi-line.

The -- operator indicates that from that point on, the rest of the line is comment text and should be ignored by the query engine. In Query Analyzer, the commented text is colored as a comment (green by default).

SELECT GetDate() -- This is a comment from here on, I can type whatever I want

Multi-line comments follow the C++ syntax of /* comment text */, where the comment text can cover multiple lines of text.

/* This is a multi-line comment, none of the statements are ever processed
DECLARE @myDate DATETIME
SET @myDate = GetDate()
PRINT 'The current date is ' + convert(varchar(18), @MyDate, 110)
*/

Cast & Convert

These functions allow you to convert from one data type to another. For example, if you have ever worked on a cross-platform database transformation project in which you are importing data from one DBMS to another, you may have discovered that the date formats recognized by the different vendors are not the same. In Oracle a valid date field has the format '16 MAR 2004' while in SQL Server the default format is '2004-03-16'. If you are exporting data to a comma separated value file to be loaded into Oracle, you need to use the convert function as:

Select convert(varchar(12), mydate, 113)

The first argument is the target datatype, the second is the source value, and the third is an optional style value. The convert function is also useful if you are transforming data from one schema to another and need to concatenate columns but need to restrict the resulting size.

Select convert(varchar(35), RTRIM(FNAME) + ' ' + RTRIM(LNAME))

Another use is if you are working with text or binary data that is larger than the 8000 character limit of the varchar datatype. SQL Server doesn't allow these types of columns to be used for comparison or sorting unless you use the LIKE operator. If you had a table with a column called CODE_TAG that was defined as TEXT, and you wanted to select all records where it equals 'BO', you would have to use the following syntax:

select * from codes where convert(varchar(2), CODE_TAG) = 'BO'
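For portability, note that the ANSI-standard CAST function covers the same ground as CONVERT, minus the style argument. A minimal sketch of the equivalent call:

select * from codes where CAST(CODE_TAG as varchar(2)) = 'BO'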

Case

The CASE expression allows you to evaluate a variable or a condition inline and return one of multiple values as the SQL statement is being run. The basic version takes a source variable or column and depending on the value returns one of a discrete set of values.

Select case datepart(month, GetDate())
    When 1 then 'January is nice'
    When 2 then 'February is cold'
    When 3 then 'Spring is coming'
    Else 'Its not winter anymore'
End

You can also evaluate Boolean conditions. These are evaluated in order until a condition is satisfied and the resulting value is returned.

Select case
    when datepart(month, GetDate()) > 9
        then 'Winter is here'
    when datepart(hour, GetDate()) > 22
        then 'Its getting late'
    when convert(varchar(12), getdate(), 113) = '28 NOV 2003'
        then 'Everythings on SALE!'
    when datepart(weekday, GetDate()) = 6
        then 'TGIF!'
    Else 'Who knows???'
End

Case can be used anywhere an expression can, including in defining views that dynamically transform data based on current conditions. For example, we could use this technique to implement a 10-hour sale on all products for the day after Thanksgiving in the Northwind database by creating a view called PriceLookUp as follows:

CREATE VIEW PriceLookUp AS
Select ProductID, ProductName, CategoryID, UnitPrice,
    Case when getdate() > '28-NOV-2003 08:00' and
              getdate() < '28-NOV-2003 18:00'
         then UnitPrice * .85
         else UnitPrice
    end as SalePrice
from Northwind..Products
go

select GetDate(), UnitPrice, SalePrice, * from PriceLookUp

We could instead have written a batch and scheduled it to run at the appropriate times to update the sale price, but that would have added a dependency on the scheduler and incurred the time it takes to process all the rows (mature retail product tables likely have hundreds of thousands of rows in them).

Coalesce, Nullif, IsNull

Microsoft has provided some commonly used functions to make our lives easier. These are cousins of the CASE expression and could be coded that way. The Coalesce function returns the first non-null expression in a list; if all the expressions are null, it returns null.

Coalesce (expression1, expression2, ...)

-- this is essentially the same as:
case when expression1 is not null then expression1
     when expression2 is not null then expression2
     ...

For example, if we wanted to return the most recent date on which an order's status changed, we could use the following syntax:

select coalesce (ShippedDate, RequiredDate, OrderDate) as LastUpdate
from orders

NullIf and IsNull operate similarly, the former returning null if two expressions are equal, and the latter returning an alternate value if the expression is null.

NullIf(expression1, expression2)

-- NullIf is evaluated as
case when expression1 = expression2 then Null
     else expression1 end

IsNull(expression1, expression2)

-- IsNull is evaluated as
case when expression1 is null then expression2
     else expression1 end

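As a quick sketch of these in practice against Northwind (where Customers.Region is often null, and a zero UnitsInStock would otherwise cause a divide-by-zero error):

select CompanyName, IsNull(Region, 'N/A') as Region
from Customers

select ProductID, UnitsOnOrder / NullIf(UnitsInStock, 0) as OrderRatio
from Products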

SQL Server Books Online has a complete listing of the system functions and provides syntax diagrams for how to use them.

Joining tables

As the name "Relational" implies, information in well-designed databases is stored in more than a single table, and much of our work is to develop means for accessing it effectively. When more than one table is referenced in a query, they are said to be joined. A join is specific criteria which define how two or more tables are related. Joins come in two flavors, inner and outer. The basic difference is that inner joins will only return rows in which both tables have columns matching the join criteria.

[Diagram: the Suppliers table (SupplierID primary key, CmpyName, Country) related to the Products table (ProdID primary key, ProdName, SupplierID foreign key, Units), illustrating inner, left outer, and right outer joins.]


An inner join returns the set of records in which the SupplierID column in the Suppliers table matches the SupplierID column in the Products table. A record must exist in both tables in order to be part of the resulting set of rows. The join criteria can be specified as part of the FROM clause or as part of the WHERE clause. Given the sample data below, the two following statements return identical results:

Suppliers:

SupplierID   CmpyName      Country
1            ABC Widget    USA
2            Sprocket 4U   UK
3            Isa Cmpy      CA

Products:

ProdID   ProdName    SupplierID   Units
1        WidgOne     1            4
2        WidgTwo     1            6
3        Spockit     2            3
4        WidgThree   1            1

-- SQL:89 Inner Join Syntax
SELECT *
FROM Suppliers S, Products P
WHERE S.SupplierID = P.SupplierID
AND P.Units > 2

-- SQL:92 Inner Join Syntax
SELECT *
FROM Suppliers as S JOIN Products as P ON S.SupplierID = P.SupplierID
WHERE P.Units > 2

Running these statements will result in the selection of rows from both tables which meet the join criteria and will exclude rows which do not have a match. In our result set we won't return supplier 3, Isa Cmpy, because they don't have any products, and we don't return WidgThree from the Products table because its Units are less than the selection criteria.

SupplierID   CmpyName      Country   ProdID   ProdName   SupplierID   Units
1            ABC Widget    USA       1        WidgOne    1            4
1            ABC Widget    USA       2        WidgTwo    1            6
2            Sprocket 4U   UK        3        Spockit    2            3

A left join (also called a left outer join) returns all rows from the first table and information from the other if it exists. If it doesn't, then nulls are returned in place of the missing data. The syntax for this would be:

-- SQL:89 Left Outer Join Syntax
SELECT *
FROM Suppliers S, Products P
WHERE S.SupplierID *= P.SupplierID
AND P.Units > 2

-- SQL:92 Left Outer Join Syntax
SELECT *
FROM Suppliers as S LEFT JOIN Products as P
    ON S.SupplierID = P.SupplierID AND P.Units > 2

SupplierID   CmpyName      Country   ProdID   ProdName   SupplierID   Units
1            ABC Widget    USA       1        WidgOne    1            4
1            ABC Widget    USA       2        WidgTwo    1            6
2            Sprocket 4U   UK        3        Spockit    2            3
3            Isa Cmpy      CA        NULL     NULL       NULL         NULL

A right outer join is the converse of the left outer join, except that the table on the right is the driver. Two other types of joins can be made, although you don't see them too often: the full outer join, in which all rows from both tables are returned, and the cross join (also called a Cartesian join), in which every possible combination of rows from both tables is returned. If you forget to add the WHERE clause and you've specified more than one table, you are running a cross join.

Select * from Suppliers, Products

Our example would return 12 rows (3 rows in Suppliers x 4 rows in Products = 12 combinations of the two), but a more realistic scenario involving hundreds of suppliers and thousands of products would result in hundreds of thousands of combinations.

SQL Server supports both styles of syntax, but not all databases do, so to ensure portability of your code you may find the older syntax easier to work with. That said, the ANSI SQL 92 standard recommends making the join criteria part of the FROM clause in order to avoid confusion with the filter criteria of the WHERE clause.

Union

A union combines the results of more than one query into a single record set. The only requirement is that all the queries return the same number of columns, with compatible types, in the same order. For example, if you wanted to return the names of tables and the number of rows in each, you could use the following syntax:

select 'Categories' as TAB, count(*) as CNT from Categories UNION
select 'CustomerCustomerDemo', count(*) from CustomerCustomerDemo UNION
select 'CustomerDemographics', count(*) from CustomerDemographics UNION
select 'Customers', count(*) from Customers UNION
select 'Employees', count(*) from Employees UNION
select 'EmployeeTerritories', count(*) from EmployeeTerritories UNION
select 'Order Details', count(*) from [Order Details] UNION
select 'Orders', count(*) from Orders UNION
select 'Products', count(*) from Products UNION
select 'Region', count(*) from Region UNION
select 'ReplenishWork', count(*) from ReplenishWork UNION
select 'Shippers', count(*) from Shippers UNION
select 'Suppliers', count(*) from Suppliers UNION
select 'Territories', count(*) from Territories

The column header names are taken from the first result set that the query engine sees, in this case TAB and CNT.
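One related point worth knowing: UNION removes duplicate rows from the combined results, which costs a sort. Since each branch above returns a distinct literal, the same list could be produced with UNION ALL, which skips the duplicate check. A two-table sketch:

select 'Orders' as TAB, count(*) as CNT from Orders UNION ALL
select 'Products', count(*) from Products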

Insert

We use the INSERT statement to add new information to the database. The standard SQL statement for inserting a new row is:

INSERT INTO <table> (<column1>, <column2>, ...) values (<value1>, <value2>, ...)

 

You can omit the column list only if you have a value for every column in the row. Adding a new row to the Supplier table would look like this:

INSERT INTO Suppliers (SupplierID, CmpyName, Country)
Values (3, 'Isa Cmpy', 'CA')

SQL Server provides us with another form of the INSERT statement in which we can use an existing table as the source of the new records. In this case we use the syntax:

INSERT INTO <table> (<column1>, <column2>, ...)
SELECT <column1>, <column2>, ... FROM <source table> WHERE <criteria>

For example, suppose we were implementing a reporting database (also called a data warehouse) in which we wanted to perform aggregation and summarization on our vendors. We could define a new table to contain our calculations and provide some level of isolation from the transactional system. Our new table might look like this:

ReplenishWork: SupplierID, CmpyName, ProductsOutOfStock

We can run a SQL batch to load the table using the second form of the INSERT statement as follows:

INSERT INTO ReplenishWork (SupplierID, CmpyName, ProductsOutOfStock)
SELECT P.SupplierID, S.CmpyName, Count(*)
FROM Suppliers as S INNER JOIN Products as P ON S.SupplierID = P.SupplierID
WHERE P.Units < 2
GROUP BY P.SupplierID, S.CmpyName

Update

We manage existing information in the database using the UPDATE statement. SQL Server allows for the basic update statement that looks like this:

UPDATE <table> SET <column> = <value>, <column> = <value>
WHERE <criteria>

 

But it also includes a variation in which you can update one table using another table as your source. This syntax is an extension Microsoft includes beyond the ANSI standard, and it won't necessarily run in other environments.

UPDATE ReplenishWork SET ProductsOutOfStock = P.UnitsInStock
FROM Products as P
WHERE P.SupplierID = ReplenishWork.SupplierID

Delete

To purge the database of unnecessary information, or to enforce relational integrity, we use the DELETE statement. The basic syntax is:

DELETE FROM <table>
WHERE <criteria>

 

The WHERE clause determines which rows will be affected. A word of caution when you are deleting data: it takes a long time to type it back in, so you want to be sure you check what you are getting rid of. You can do this by running a count of the rows that will be affected (SELECT COUNT(*) FROM <table> WHERE ...) before executing the delete. You can further protect yourself from unwanted aggravation by using transactions.
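For example, using our ReplenishWork table (a sketch assuming we want to purge suppliers with nothing out of stock), preview the row count first, then delete:

SELECT count(*) FROM ReplenishWork
WHERE ProductsOutOfStock = 0

DELETE FROM ReplenishWork
WHERE ProductsOutOfStock = 0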

 

Variables

Variables are used to temporarily hold values that may be useful in the processing of statements. We declare them using the DECLARE keyword, specifying the name and the type, and we assign a value to them with the SET statement. You can also assign a value as part of a SELECT statement, but SET is the preferred method.

We reference them as we would column data or constants in the processing logic of our code, using the @ character in front of the name to denote a variable. Global variables, which report the status of the system, the database engine and other server options, are referenced with a double @@ prefix.
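A short sketch showing both assignment styles against Northwind:

DECLARE @MaxPrice money
SET @MaxPrice = 0                               -- assignment with SET
SELECT @MaxPrice = MAX(UnitPrice) FROM Products -- assignment within a SELECT
PRINT 'Highest price: ' + CONVERT(varchar(12), @MaxPrice)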

All user-declared variables are local in scope, meaning they remain accessible only until the current batch of statements completes. For example, executing the following script results in an error because the variable @MyVar is not declared within the batch in which it is referenced:

DECLARE @MyVar varchar(15)
SET @MyVar = 'New Value'
Go
PRINT 'The value of MyVar is ' + @MyVar

Batches

Batches are a grouping of one or more SQL statements that together form a logical unit of work. They are sent together to the SQL Server engine, which creates an execution plan. If errors are generated in creating the plan, none of the statements are executed. Errors that occur after the execution plan has been compiled will cause succeeding statements to fail, but will not affect logic that ran earlier in the batch, unless the batch is part of a transaction, in which case all of the statements are rolled back after the error.

For example, the following Transact SQL code comprises three statements, which are all part of the same batch:

DECLARE @Discount float
SET @Discount = 0.05
SELECT ProductID, UnitPrice, UnitPrice - @Discount * UnitPrice as SalePrice
FROM Products
ORDER BY SalePrice Desc
GO

Because the @Discount variable is used in the successive commands, trying to run either of the last two statements separately will raise an error.

Statements that create objects such as views, defaults, procedures, rules and triggers cannot be combined with other statements in a batch. A batch is completed by the inclusion of the command GO. An important point to remember about GO is that it is not a Transact SQL statement; rather, it is recognized by Query Analyzer and ISQL as a marker to send all statements since the previous GO to the engine and await the result set.

You can see the effect of this when the results of statements before the GO command are visible in the results pane of Query Analyzer before the next batch of statements executes. If you include GO in an ODBC call or from one of the database libraries such as ADO or OLE DB, the SQL Server engine will flag a syntax error. Another point is that local variables declared in a batch go out of scope when the batch ends, which is one reason why you cannot include GO in stored procedure declarations.

Transactions

A transaction is a grouping of statements that must either succeed or fail together as a logical unit of work. You may have heard of the ACID (Atomicity, Consistency, Isolation and Durability) test for a transaction. These are the four properties which a valid transaction must meet.

  • Atomicity means that all the statements in the group must either succeed or fail together. For example, if you were developing a banking application to add new accounts, adding rows to the customer table and the account table might form a logical unit of work in which our data model integrity constraints require that records exist in both tables to be valid (a sketch follows this list).
  • Consistency means that the database is left in a stable state at the end of the transaction, and that all data integrity constraints are met.
  • Isolation means that a transaction sees data either before or after other transactions have modified it, never in an intermediate state. The locking rules of the database prevent this from happening.
  • Durability means that the effects of a committed transaction persist after the transaction ends, even in the event of a system failure. A system failure in the middle of a transaction rolls the database back to the state it was in before the transaction began.
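Here is a minimal sketch of the banking example from the atomicity bullet, using hypothetical Customer and Account tables (the table and column names are illustrative only). In production code, @@ERROR would be tested after each statement:

BEGIN TRAN
INSERT INTO Customer (CustID, CustName) VALUES (1001, 'A. Depositor')
INSERT INTO Account (AcctID, CustID, Balance) VALUES (5001, 1001, 0)
IF @@ERROR <> 0
    ROLLBACK TRAN   -- neither row is persisted
ELSE
    COMMIT TRAN     -- both rows are persisted together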

We begin a transaction with the command BEGIN TRANSACTION, and end it with the command COMMIT or ROLLBACK. While a transaction is open, locks on the database prevent other user sessions from altering the affected data.

For example, suppose we want to implement a sale price on our Products at Northwind.

BEGIN TRAN
DECLARE @Discount float
SET @Discount = 0.05
UPDATE Products
SET UnitPrice = UnitPrice - (@Discount * UnitPrice)
Go

-- Check the results...
Select ProductID, UnitPrice
FROM Products
ORDER BY UnitPrice Desc

If we don't like what we see, we can roll back the transaction and leave the database in its previous state:

ROLLBACK
GO

SELECT ProductID, UnitPrice -- See the original prices!
FROM Products
ORDER BY UnitPrice Desc
GO

Conversely, to make the changes permanent we use the COMMIT operator:

COMMIT
Go

Data Definition Language (DDL)

Data Definition Language (DDL) is used to define tables, views, stored procedures, and other objects that physically and logically express the data model you are working with. Data modeling and the Data Definition Language are a big topic that takes up more space than we've been allotted here. There are some very good books that focus on just this topic, including Louis Davidson's data modeling book (ISBN: ???).

At the core of relational databases are the tables that store information. We can create tables using the graphical tools provided with SQL Server, but this practice lends itself better to the development of unique, custom objects that are still in the process of evolving. Deploying a schema out to multiple servers on a consistent basis typically relies on scripts to complete the tasks involved. Enterprise Manager allows us to generate these scripts, but we should still understand what the scripts are doing in case we need to customize or change them.

Managing Tables

Our first task in making our data model usable is to create the base set of tables. We do this using the CREATE TABLE statement, which specifies how information is stored in the table. Each column definition includes a name, a data type, whether the column allows null values to be stored or not, and other optional attributes. The basic syntax diagram for CREATE TABLE is below:

CREATE TABLE <table name>
( <column definition>, <column definition>, ... )

 

A simple example of the ReplenishWork table can be created with the following statement:

CREATE TABLE ReplenishWork
(SupplierID int NOT NULL,
CmpyName varchar(40) NOT NULL,
ProductsOutOfStock int NULL)

Business changes drive database changes and we don't always have the luxury of recreating our schema in a production environment. Transact SQL allows us to expand column sizes and add new columns to existing tables using the ALTER statement:

ALTER TABLE ReplenishWork
ADD ContactName varchar(20)

We can also change the size and type of columns, drop columns, and manage constraints and default values using the ALTER statement. If we didn't need to preserve existing data and were making changes to the schema that prohibited the use of the ALTER command, we can DROP the table before recreating it. If we needed to reconfigure our ReplenishWork table to support integration of legacy data that used a character-based supplier ID, we could do so with the following batch:

DROP TABLE ReplenishWork
GO
CREATE TABLE ReplenishWork
(SupplierID varchar(16) NOT NULL,
CmpyName varchar(40) NOT NULL,
ProductsOutOfStock int NULL,
ContactName varchar(20) NULL)

Indexes and Keys

Databases manage large amounts of information. In order to be efficient, they use data structures called indexes to map the contents. In SQL Server, indexes come in two flavors: clustered and non-clustered. The difference is that there can be only one clustered index on a table, and this index determines how the data is physically ordered. SQL Server supports up to 249 non-clustered indexes on any given table. To add an index to our ReplenishWork table we could use the following command:

CREATE TABLE ReplenishWork
(SupplierID int PRIMARY KEY CLUSTERED,
CmpyName varchar(40) NOT NULL,
ProductsOutOfStock int NULL)

By default SQL Server will use clustering on the primary key defined for a table. This is important because without at least one clustered index on a large table, the data would be scattered throughout physical storage, and retrieval performance may suffer. You can also create indexes on existing tables using the CREATE INDEX command:

CREATE INDEX IDX_ReplenishWork
ON ReplenishWork (CmpyName)

On the other hand, unnecessary indexes can impact performance as well. Think of an index as a physical data structure that has a cost associated with maintaining it when data changes. If the nature of your system involves capturing transactional information as quickly as possible, you want to run with the fewest indexes possible. Each insert, update and delete must be reflected in each of the indexes on the table, so the fewer the indexes, the less overhead for those operations.

Systems which are primarily decision support and reporting based can afford more indexes, because once the data has been written it isn't updated as much as in a transactional system. You will tend to see more generous use of indexes in data warehousing systems that support a lot of ad-hoc querying.

Create Views

Views allow us to provide an abstraction layer between the physical implementation of the data and how users see the information. They are in essence virtual tables that we can use to limit or expand how we are able to use the base set of tables in our database. In SQL Server we are limited to a single SELECT statement to define the result set. The security architecture of SQL Server allows us to grant users access to a view without giving access to the tables (and views) on which the view is based. For example, if we needed to make an employee phone list available to users but needed to protect other employee information (such as salary, social security number, birth date, etc.) we could create a view of just the name and phone number.

CREATE VIEW PhoneList AS
SELECT FirstName + ' ' + LastName as EmpName, HomePhone
FROM Employees
go

select * from PhoneList

The ALTER and DROP commands operate in the same way they do with tables. One interesting feature of SQL Server is that it supports indexed and updateable views. This means that you can define a view on some base objects, create indexes to optimize access and even alter data stored in the underlying tables, although you need to be aware that there is more overhead in updating a view than if you managed the data in the underlying tables directly.

Schema Binding

In order to create an index on a view, the view must be created with the SCHEMABINDING option. With this option, information about the underlying schema is stored in the system tables, and SQL Server will prevent changes to the underlying objects that would invalidate our view. Note that schema binding requires two-part object names (dbo.Employees rather than Employees):

CREATE VIEW PhoneList
WITH SCHEMABINDING
AS
SELECT FirstName + ' ' + LastName as EmpName, HomePhone
FROM dbo.Employees
go
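With the schema bound, we can index the view. The first index on a view must be unique and clustered, so this sketch assumes the concatenated EmpName values happen to be unique (and that the session SET options required for indexed views, such as QUOTED_IDENTIFIER and ARITHABORT, are on):

CREATE UNIQUE CLUSTERED INDEX IDX_PhoneList
ON PhoneList (EmpName)
go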

Declarative Referential Integrity (DRI)

Business rules, which define how our application behaves (for example, requiring that users of our application set up a customer before they make an order), are implemented using foreign keys and constraints.

Constraints

For example, we can add a primary key constraint to an existing table:

ALTER TABLE dbo.Table1 ADD CONSTRAINT
PK_Customer PRIMARY KEY CLUSTERED
(
CustID
) ON [PRIMARY]

Default values and extended properties

We can further define default values and extended properties for the tables we create, again using the ALTER TABLE statement:

ALTER TABLE dbo.Table1 ADD CONSTRAINT
DF_Table1_CreditLimit DEFAULT 0 FOR CreditLimit

Data Control Language (DCL)

Data Control Language (DCL) allows the owner and administrator of the database objects to define what rights users have to the data and objects in the database. The DBA can manage access to the database by granting and denying users rights to objects. The term CRUD Matrix refers to the rights to Create, Read, Update, and/or Delete information in a given table.

  • GRANT
    GRANT is used to give access to someone who needs to use the information stored in the objects. SQL Server allows us to specify user-based or group-based rights, and to apply them to objects or to groups of objects within a database.
  • REVOKE
    We use the REVOKE statement to reverse the effect of the GRANT statement.
  • DENY
    The DENY statement will prevent a user from accessing objects in the database.
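A few illustrative statements, using our earlier PhoneList view and a hypothetical user named ReportUser:

GRANT SELECT ON PhoneList TO ReportUser     -- allow reads
REVOKE SELECT ON PhoneList FROM ReportUser  -- take the grant back
DENY SELECT ON Employees TO ReportUser      -- actively block access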

Improvements over ANSI SQL

With the release of SQL Server 2000, Microsoft takes the SQL language a step further. Among the many additions in this release, we find new data types, user defined functions, and extensive support for XML, as well as indexing and trigger handling improvements and an expanded suite of system stored procedures. The result is one of the most flexible, scalable, and complete offerings in the database management system product field.

Data Types

While all relational databases implement the basic numeric, string, and date data types for storing data, Microsoft has added some which you may not have heard of. Some of these were added in support of the replication functions built into the database engine; others fit more neatly with the new feature of user defined functions. The new data types include:

  • BIGINT:
    True to its name, this is a very big whole number. You can use it to store values in the range of -9223372036854775808 to 9223372036854775807. That is 19 digits of precision, which can hold some extremely large values. If you are porting data from platforms that include intelligent keys (like the first 4 digits is the location, the next 6 is the customer, the next 3 is the weight, etc.) it would probably be better to normalize your schema and transform the data into multiple columns.
  • UniqueIdentifier:
    Columns of this type store system-generated identifiers known as GUIDs, or Globally Unique Identifiers. The GUID is a 36 character value generated by Windows that is guaranteed to be unique across the world. It takes the same form as the CLSID in the COM world. The NEWID function is used to generate it, and we can store the result in a column of type UniqueIdentifier.

DECLARE @guid uniqueidentifier
select @guid = NEWID()
PRINT 'Value of @guid is: ' + CONVERT(varchar(50), @guid)

Value of @guid is: F4840C7B-F44C-4A49-A12E-097B45AE3CE0

While you may not want to use a GUID as a primary key because of its cumbersome nature for ad-hoc queries (try correctly typing in the 36 character value), it does allow us to implement n-way replication in which data is created on multiple servers. For example, if we were going to deploy the Northwind solution across the country and set up merge replication with a corporate site to contain all orders, using a sequentially generated OrderID could create problems in that the same ID could be created on more than one server. We can avoid this by redefining it as:

CREATE TABLE Orders (
OrderID uniqueidentifier NOT NULL DEFAULT NEWID(),
CustomerID nchar(5) NULL,
EmployeeID int NULL,
OrderDate datetime NULL,
RequiredDate datetime NULL,
ShippedDate datetime NULL,
ShipVia int NULL,
Freight money NULL DEFAULT (0),
ShipName nvarchar(40) NULL,
ShipAddress nvarchar(60) NULL,
ShipCity nvarchar(15) NULL,
ShipRegion nvarchar(15) NULL,
ShipPostalCode nvarchar(10) NULL,
ShipCountry nvarchar(15) NULL
)

While this makes the replication scheme possible, you should define a clustered primary key on some other column set. Ordering the data according to a randomly generated value results in poor performance when processing the data, because the rows are scattered all over the disk in no useful order, which increases seek time.

SQL_Variant:
Similar to the variant data type in Visual Basic, SQL_Variant is used to store data values of all the supported data types except text, ntext, image, and timestamp. We can use it for column definitions, parameters, variables, and for the return values of user defined functions. Microsoft provides the function SQL_VARIANT_PROPERTY to return the base data type and other information about the data stored.

One use for variants is in defining schema to hold attribute data. Suppose you need to keep a variable number of optional attributes about a person, such as hair color, weight, age, number of children, birthdate, favorite pet, etc. In our example we are collecting what information we can, but because there are a lot of optional attributes, we end up with a sparsely populated table.

PersonID   HairColor     Weight   Age     Sex        Birthdate    Pet
(int)      (varchar 8)   (int)    (int)   (char 1)   (datetime)   (varchar 6)

1          Brown                                                  Cat
2                        250      32      M                       Dog
3                                                    1/2/62       Bird

Using SQL Variant we could more efficiently store the same set of information using a schema that looks like this:

PersonID   AttributeName   AttributeValue
(int)      (varchar 12)    (SQL_Variant)

1          HairColor       Brown
1          Pet             Cat
2          Weight          250
2          Age             32
2          Sex             M
2          Pet             Dog
3          Birthdate       1/2/62
3          Pet             Bird

Table:

True to its name, the table data type is a variable that temporarily stores a rowset in memory for later processing and can be used like an ordinary table. It accomplishes this using space in tempdb, but because it has a well-defined scope, it efficiently cleans up after itself. It also requires less locking and logging than tables created in tempdb, which results in fewer recompilations of the execution plan. The table datatype was added in support of a powerful new feature of SQL Server, the user defined function.

You declare a table variable using the DECLARE statement. You can then manage the content of that table variable, inserting, updating and deleting as needed. Since the scope of variables is local to the current batch, other processes cannot access your table variable.

declare @t table(PersonID int, AttrNm varchar(12), AttrVal sql_variant)

insert @t values(1, 'HairColor', 'Brown')
insert @t values(1, 'Pet', 'Cat')
insert @t values(2, 'Weight', 250)
insert @t values(2, 'Age', 32)
insert @t values(2, 'Sex', 'M')
insert @t values(2, 'Pet', 'Dog')
insert @t values(3, 'Birthdate', GetDate())
insert @t values(3, 'Pet', 'Bird')

Select convert(varchar(15), SQL_VARIANT_PROPERTY(AttrVal, 'BaseType')), * from @t

User Defined Functions

Structured programming languages have long provided the ability to create procedures and function calls to break down complex routines into manageable units. The introduction of the user defined function (UDF) with SQL Server 2000 provides us with a new way of organizing and optimizing our use of the database engine. A user defined function takes zero or more input parameters and returns either a scalar or a table-based value. We could use a UDF to calculate the distance between two geographic locations, return a list of projects for a given client, or return an amortization table for a given set of loan parameters.

UDFs are similar to stored procedures in that both use control statements and data manipulation code to define the logic of the module. But they differ in that UDFs don't allow modification of the global state of the calling process or the database environment. In other words, user defined functions must have no side effects. You can't alter data in tables that are not locally defined within the UDF, and you can't use non-deterministic functions within your code. Deterministic functions always return the same value given the same input parameters, such as DATEADD. The GETDATE function is non-deterministic in that its value depends on when you call it.

SQL Server 2000 supports three types of user defined functions: scalar functions, inline table valued functions, and multi-statement table valued functions. Scalar functions are UDFs that return a single discrete value and contain logic embedded between BEGIN and END statements. This flavor of UDF must return a single value each time it is called. We create a UDF using the CREATE FUNCTION statement:

CREATE FUNCTION udfSalePrice (@ProductID int, @TranDate datetime)
RETURNS varchar(50)
AS
BEGIN
declare @UnitPrice money, @ProductName varchar(40)
select @UnitPrice = UnitPrice, @ProductName = ProductName
from Northwind..Products
where ProductID = @ProductID
if @TranDate > '28-NOV-2003 08:00' and
   @TranDate < '28-NOV-2003 18:00'
    return @ProductName + ' on Sale for $' +
        convert(varchar(10), @UnitPrice * .85)
return @ProductName + ' for $' + convert(varchar(10), @UnitPrice)
END
Go

select Northwind.dbo.udfSalePrice(3, GetDate()) as Price

Table valued functions return a table variable that can be built from one or more statements. If the function consists of a single statement and doesn't specify the column set in the declaration, then it doesn't require the BEGIN/END wrapper, and it is considered to be inline.

create function udfRegionOrders (@Region varchar(15))
returns TABLE
as
return (select * from Northwind..Orders where ShipRegion = @Region)
go

select * from northwind..udfRegionOrders ('RJ')

Multi-statement UDFs allow us to do more complex processing in order to derive the result set. Like scalar UDFs, we can declare local variables, use flow control statements, manage data in locally defined table variables, and call stateless extended stored procedures.

Using our Northwind database for this example, suppose we wanted to create a function that returned sales ranking information.

CREATE FUNCTION udfTopSales (@nCount int)
RETURNS @retTable TABLE
    (Rank int, ProductID int, SaleAmt money)
AS
BEGIN
DECLARE @ProdID int, @Sales money, @Rank int
SET @Rank = 1
DECLARE curSales CURSOR FOR
    SELECT ProductID, Sum(UnitPrice * Quantity * (1 - Discount)) as SaleAmt
    FROM [Order Details]
    GROUP BY ProductID
    ORDER BY SaleAmt DESC
OPEN curSales
FETCH FROM curSales into @ProdID, @Sales
WHILE @@FETCH_STATUS = 0 BEGIN
    IF @Rank <= @nCount
        INSERT INTO @retTable
        values (@Rank, @ProdID, @Sales)
    FETCH NEXT FROM curSales into @ProdID, @Sales
    SET @Rank = @Rank + 1
END
CLOSE curSales
DEALLOCATE curSales
RETURN
END
go

select * from dbo.udfTopSales (10)
go

Stored Procedures

While we can run ad-hoc SQL statements within Query Analyzer, and we have the ability to use the Execute method in ADO, frequently performed logic can be encapsulated into batches and saved as stored procedures on the server. The benefits of this approach include not only reducing the work of testing multiple programs and applications to ensure that the logic is correct, but also the execution plan generated on the server when the stored procedure is compiled and saved.

Stored procedures are a collection of Transact SQL statements which perform a defined set of work. This can be as simple as updating a single row or returning a configuration setting, or as complex as implementing a business rule that requires sending an e-mail alert containing crucial information specific to the problem. They can take both input and output parameters, and always return an integer value. There are several types of stored procedures, including the system stored procedures that come with SQL Server, user defined stored procedures, and extended stored procedures.

System Stored Procedures.

Microsoft includes a broad suite of system stored procedures for administering the server engine. There are over 900 system stored procedures, which Books Online breaks into 17 categories. They include procedures for managing system security, setting up and managing distributed queries and linked servers (making tables and data on different machines visible to each other), managing data replication, retrieving information about schema, viewing current performance and usage information, managing SQL Agent (the scheduler that comes with SQL Server), and interfacing with XML data.

For example, to return the schema of a table we can use the sp_help stored procedure:

EXEC sp_help Employees

If the stored procedure call is the first command in the batch, we can omit the EXEC(UTE) command. To see the definition of a view, stored procedure, or other system object we use sp_helptext:

sp_helptext PriceLookUp

User defined Stored Procedures.

We can define our own stored procedures to provide users of the database with a consistent implementation of business logic across applications. This simplifies the administration of business rules by acting as a single point of control on the database server itself. If a rule is changed, we only need to update the associated stored procedures instead of the various applications making use of the data.

Stored procedures are declared with the CREATE PROCEDURE statement.

CREATE PROCEDURE spSimple as
select 'HELLO WORLD!'
go
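Since stored procedures can take input and output parameters and return an integer status, here is a slightly fuller sketch, using a hypothetical procedure named spProductCount that counts Northwind products in a category:

CREATE PROCEDURE spProductCount
    @CategoryID int,
    @ProductCount int OUTPUT
AS
SELECT @ProductCount = count(*)
FROM Products
WHERE CategoryID = @CategoryID
RETURN 0
go

DECLARE @cnt int
EXEC spProductCount @CategoryID = 1, @ProductCount = @cnt OUTPUT
PRINT 'Products in category 1: ' + CONVERT(varchar(10), @cnt)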

For more information on developing stored procedures, see Chapter XX.

Extended stored procedures (XP's).

Extended stored procedures allow us to extend Transact SQL by developing logic outside of SQL Server using tools such as C or Visual Basic. They follow the same calling conventions as a normal stored procedure, complete with parameters and a return code. XPs are implemented as calls to dynamic link libraries (DLLs) that conform to the Extended Stored Procedure API. Because these calls are external to SQL Server, you need to be careful that they are thoroughly tested and checked for aberrant behavior before using them in a production situation.

If we had written a standard event management system to log information to the NT event log and wanted to include it in our SQL environment, we would register it with the stored procedure sp_addextendedproc:

USE Master
go
EXEC sp_AddExtendedProc 'xp_LogEvent', 'c:\winnt\system32\EventLogger.dll'

Where we use T-SQL

The Transact SQL language is the primary interface to the SQL Server database engine. Any time we interact with SQL Server, whether through one of the provided administrative tools, a custom application, or the command line interface ISQL, we are executing SQL statements. The vehicle for this might be ODBC or one of the higher-level data access layers, or it might be DB-Library, the original interface provided for talking to the database. In any case, requests in the form of SQL statements are made to the engine, which processes them and returns the result set.

Query Analyzer

Query Analyzer is the ad-hoc query interface that comes with SQL Server. It is installed as part of the client tools and is typically one of the first places a developer will begin working with SQL. By default it opens to an edit screen in which we can type our code. It provides context-based coloring, which is a nice enhancement over the previous version.

To run a batch of statements, click the Execute button and the results pane will display the output. By default the entire script is run, but you can limit execution to portions of the script by highlighting the statements you want; only the selected statements will be run.