Introduction to Transact SQL
This is intended to be a complete reference to the best practices and techniques of T-SQL and its many uses, and will give you a flavor for where it came from and how it can be used. We will begin by covering the basics of the language. If you're looking for a particular topic you might want to skip to the relevant chapter, but to really make use of this material you need a solid foundation to build from.
We will cover a lot of topics and answer a number of
questions about SQL Server 2000 and Transact SQL in the next few pages. We will
begin with some history and build into some practical applications of the
language. Specifically in this chapter we will cover topics including:
- What it is and where it came from
- When and why you will use it
- How to use some of the specific features that make this language so unique and powerful
- What you can do with it
Since the advent of the microprocessor and the
widespread use of the PC in the early 1980's, affordable technology has finally
caught up with the theory of relational databases. Introduced in 1970 by
E.F. Codd as a way to manage large amounts of information
by using a language based on what he called "Relational Calculus", the
relational database has become central to our civilization. Information about
our daily lives, what we buy, where we shop, the banks we use, the cars we
drive, all the information we need to be able to transact business is kept in
relational databases.
Fortunately someone had the good idea to standardize how we interact with these systems, and the Structured Query Language was born. Originally called SEQUEL (Structured English QUEry Language), it was developed by IBM Research in support of an experimental relational database called System R that eventually morphed into DB2. It defines the access paths, not the physical implementation, for managing the information and data structures. Since the 1980's the American National Standards Institute (ANSI) has published criteria for certifying database engines; SQL-86, SQL-92 and SQL-99 are the versions of those criteria that have been released. The current specification is over 2000 pages in length and is available at a cost from ANSI's web site (www.ansi.org) or from the International Organization for Standardization (ISO) (www.iso.ch). Software manufacturers use these standards to drive the feature set they include to ensure that their product is certifiably ANSI compliant.
The Transact SQL language is a superset of the ANSI specifications, meaning that it not only implements the required functionality, but provides much more. It includes a number of programming extensions, including variable handling, flow control, looping, and error handling, that allow the developer to write sophisticated logic to be executed on the server. In addition, the SQL language includes facilities for controlling access to the data, declaring referential integrity (DRI), and defining business rules to be enforced at the server level.
Microsoft came to the relational database arena by a circuitous route that started as an offshoot of a joint development effort with IBM to create the next generation operating system, OS/2. The Intel 80286 processor promised release from the constraints of the 640KB memory space and the 8-bit architecture of the 8088, and in 1985 a new operating system was needed to compete with the recently released Macintosh. Shortly after the companies committed to work together, IBM announced they were developing a special higher end version of OS/2 that would include a Database Manager partially compatible with DB2 on the mainframe and a network manager based on the Systems Network Architecture (SNA) protocol. This left Microsoft holding the bag: who would buy the standard version?
At the time, most of the PC based desktop databases were single user flat file systems, a market dominated by products like dBASE, Paradox and Rbase. In order to compete with IBM, Microsoft turned to Sybase, a startup that in 1987 was trying to establish itself in the database market against Informix and Oracle. Sybase had a product called DataServer that they were developing for mid-range UNIX based computers. By convincing Sybase that PCs would be fertile ground for growing future customers of their midrange product, Microsoft persuaded Sybase to partner with them to port their product to OS/2. And to add more fodder to the fire in this web of alliances and counter alliances, Microsoft went the additional step of convincing Ashton-Tate to support the new multi-user product and to develop future versions of dBASE to act as a client to the new server based product. In the fall of 1988, the three companies announced their new product, Ashton-Tate/Microsoft SQL Server for OS/2. The first beta shipped that fall, and version 1.0 was released in 1989.
By 1990 the relationship with Ashton-Tate wasn't working. The new version of their product was late and had other problems as well. The clash between developing for a multi-user environment with SQL's set based processing and dBASE's single user, record at a time approach showed the marriage wouldn't last. In the time since the arrangement was announced the companies had found that their priorities had shifted, and Microsoft and Sybase decided to break with Ashton-Tate, releasing Microsoft SQL Server 1.1 in 1991.
A significant feature of note is that release 1.1 included support for the new client platform, Windows 3.0. SQL Server was one of the first applications that provided a direct interface for developing applications against it using dynamic link library (DLL) calls. Version 1.11 included some bug fixes, but the product was still primarily developed and supported by Sybase. Microsoft was dependent on Sybase to fix bugs and support customers with their issues. Sybase at the time was rapidly growing their mid-range market and found their priorities shifting more towards that market. Microsoft did eventually get permission for read-only access to the source code, but all changes had to be coordinated with Sybase so that the various versions could be kept in sync.
Release 1.0 of Microsoft SQL Server corresponded with the code base for Sybase's release 3.0 on other platforms, and after a couple of years of adding new features and functionality, Sybase was planning to release version 4.0. This new version would include support for stored procedures, unions, and backup devices, among other things.
As some may remember, Windows 3.0 was a 16 bit operating system, but the new version of OS/2, version 2.0, would be 32 bit. Unfortunately IBM was running into problems developing a stable 32 bit product that they could release in a timely fashion. As a result, version 4.2 of Microsoft SQL Server was released for the 16 bit OS/2 v1.3 in 1992.
At about this time, Microsoft realized that they
needed to break from IBM and build their own 32 bit operating system, which they
called Windows NT. It would include a symmetric multiprocessing environment with
asynchronous I/O. The new operating system would be multi-threaded and would
provide standard facilities for logging and management of the processes.
Microsoft decided to bet the farm on this new platform and moved to make SQL
Server one of the first applications for it when Windows NT was released to the
public in July of 1993. Within 3 months Microsoft SQL Server 4.2 for NT was
shipping to the general public.
By the end of 1993, the relationship between Microsoft and Sybase had changed as Sybase's core focus evolved from startup to maturity in the high tech industry. The new Windows NT platform was proving to be an alternative to the mini-computer UNIX based environment, not only on performance but also on ease of use. SQL Server 4.2 was the last release in which Microsoft and Sybase shared code. Since 1991 Microsoft had been slowly growing their stable of development expertise to be able to stand on their own supporting and evolving the SQL Server product. In April of 1994, the companies announced the end of their relationship, and each went on to develop the product line exclusively for their respective platforms.
By the end of 1994 Microsoft began putting together plans for the next version of SQL Server. Originally called SQL95, but later released as version 6.0, it would include support for scrollable cursors, database replication, better performance, a greatly expanded library of system stored procedures, and a better database management tool. For the first time, this release would be developed entirely by Microsoft. The SQL Server team delivered it on time, and it was a great success in the marketplace.
Version 6.5 came out 10 months later and included support for the Internet, data warehousing and distributed transactions. Another key event was that it gained certification as ANSI SQL compliant. SQL Server 7.0 brought us full row level locking, a new query processor that supports distributed queries and a more efficient scheme for processing ad-hoc queries, and a freely redistributable version of the engine, the Microsoft Data Engine (MSDE). SQL Server 2000 added user defined functions, new data types, and a host of other new features we will cover later in this chapter.
With each release, SQL Server has expanded its functionality and improved its performance as an enterprise level database engine. While there may be some who still try to deny it, Microsoft has established itself as a serious player in the DBMS industry and has built a core competency at taking a good product and making it better.
The Structured Query Language (SQL) is the primary way of interacting with most relational database management systems, and SQL Server is no exception. While we assume you have a base level of understanding, we will quickly cover the basics of the language before we get into how SQL Server has extended it to provide the developer with better ways of using the database. If you find that this discussion goes too fast, there are several good books available, including Robin Dewson's 'SQL Server 2000 Programming' and 'Beginning SQL Programming' (ISBN 1-861001-80-0), that will give you a more complete introduction to the language.
SQL is the interface for managing information in relational databases. With it we tell the database what we want to select, insert, update and delete. Each of these commands follows a standard syntax and uses various clauses to determine exactly which rows are affected. Taken together, these form what is referred to as the Data Manipulation Language, or DML.
Data Manipulation Language (DML) provides the language constructs for selecting information from the tables; adding, modifying, and deleting rows of data; and generally providing a standardized access method to the data. Any statement that adds, modifies or views information stored in the database is considered part of the Data Manipulation Language. Four basic statements make up the bulk of this category: SELECT, INSERT, UPDATE, and DELETE.
One of the first things any developer using a relational database will learn is how to retrieve information from it. A basic SELECT statement consists of a series of clauses that define what information we are interested in and how we want to view it. These include a list of columns to include in the result set, the names of the tables involved, filter criteria that specify how to limit what we are looking for, and other optional clauses which further refine the results.
SELECT <column1>, <column2>, ...
FROM <table>, ...
WHERE <criteria> ...
For example, to select a list of suppliers located outside the USA from the Suppliers table in the Northwind database, use the following command:
SELECT SupplierID, CompanyName, ContactName, *
FROM Suppliers
WHERE Country <> 'USA'
ORDER BY Country
This is pretty basic stuff in that it returns a simple list of suppliers. The SELECT clause tells the query engine what columns we are interested in. The * is a wildcard that indicates we want all columns in the table to follow the ones that we specified. The FROM clause indicates where the information is stored, and the WHERE clause filters it down to a subset of the whole table. This much is typical of most relational databases; the more advanced DBMSs give us the ability to go quite a bit further in refining our queries. Suppose we wanted to find out how many suppliers were from each country. The ANSI SQL language specifies some basic aggregation and summarization functionality that will run unchanged on most databases:
SELECT Country, COUNT(*) as 'COUNT'
FROM Suppliers
GROUP BY Country
ORDER BY 'COUNT' DESC
We've added a GROUP BY clause which tells the database how we want to aggregate our results, and an ORDER BY clause to sort the output so that the countries with the most suppliers come first. Transact SQL provides 14 aggregate, 23 string, 9 date and time, and 23 mathematical functions out of the box, in addition to several other system and state functions that we can use in developing our code. Beyond the many built in functions, Transact SQL also provides a number of operators that are worth mentioning.
Eliminating duplicate records from our rowset can be done within our query by specifying the DISTINCT keyword in our SELECT clause. It will cause the query engine to return only unique rows from our query.
SELECT DISTINCT Country
FROM Suppliers
The TOP operator is a Microsoft extension in Transact SQL that allows us to limit the number of rows we want in our result set. After the query has been processed, the result set is limited to the number of rows specified. We can either use a hard number like 10, or include the PERCENT keyword to indicate that we want to see only a percentage of the full result set.
SELECT TOP 10 ProductID, Sum(UnitPrice * Quantity) as Sales
FROM [Order Details]
GROUP BY ProductID
ORDER BY Sales DESC
Trying to accomplish the same results on other
database platforms requires much more work and creativity on the part of the
developer!
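The PERCENT variant mentioned above works the same way; as a minimal sketch against the same [Order Details] table, we can ask for the top ten percent of products by sales instead of a fixed count:
SELECT TOP 10 PERCENT ProductID, Sum(UnitPrice * Quantity) as Sales
FROM [Order Details]
GROUP BY ProductID
ORDER BY Sales DESC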
A subquery is a complete SELECT statement nested within another SQL statement in place of an expression. We can use a subquery anywhere that we can use an expression, whether it is part of the SELECT clause or the WHERE clause, to perform lookups on related tables or to evaluate a condition. We might need to select orphaned rows, which are those records from the child table that have no parent record. In our Northwind example we need to reduce our supplier count and eliminate suppliers who have no product sales. One way to return the list of suppliers with no orders using subqueries is:
SELECT SupplierID, CompanyName FROM Suppliers S
WHERE NOT EXISTS
    (SELECT 'x' FROM [Order Details] D, Products P
     WHERE D.ProductID = P.ProductID
     AND S.SupplierID = P.SupplierID)
One of the basic operations performed in the
database is to generate human-readable output. Because the process of
normalization will break apart things like addresses and names into multiple
parts that need to be reassembled, string concatenation is an important function
to understand. SQL Server's approach is to use the + operator between character
based values to concatenate. For example, if we wanted to do a report in which
we wanted to return a single column with our Employee's name we could use the
following:
SELECT FirstName + ' ' + LastName FROM Employees
We could, for example, write a SQL Script where the
output is another SQL Script. This generates a script which counts the user
tables in the current database.
SELECT 'SELECT ''' + name + ''', count(*) from [' + name + ']'
FROM sysobjects
WHERE type = 'U'
ORDER BY name
One note to developers is that this concatenation syntax is specific to SQL Server and doesn't work on other databases such as Oracle (which uses two pipe characters, ||, to concatenate strings). Also you may notice the square brackets ('[' and ']') in the code. These are necessary when the database schema contains non-standard identifiers. In the Northwind database the Order Details table's name contains a space. In order to reference non-standard names we simply surround them with the brackets.
A quick note on comments. Even though good code should be largely self-documenting, we may find at times that it is helpful to others who read our code to find comments from the author explaining what is going on. Transact SQL recognizes two forms of comments: single-line and multi-line.
The -- operator indicates that from that point on
the line is a comment text and should be ignored by the query engine. In Query
Analyzer, the commented text is colored as a comment (green by default).
SELECT GetDate() -- This is a comment from here on, I can type whatever I want
Multi-line comments follow the C++ syntax of /*
comment text */, where the comment text can cover multiple lines of text.
/* This is a multi-line comment, none of the statements are ever processed
DECLARE @myDate DATETIME
SET @myDate = GetDate()
PRINT 'The current date is ' + convert(varchar(18), @MyDate, 110)
*/
Conversion functions such as CAST and CONVERT allow you to convert from one data type to another. For example, if you have ever worked on a cross platform database transformation project in which you are importing data from one DBMS to another, you may have discovered that the date formats recognized by the different vendors are not the same. In Oracle a valid date field has the format '16 MAR 2004' while in SQL Server the default format is '2004-03-16'. If you are exporting data to a comma separated value file to be loaded into Oracle, you need to use the convert function as:
SELECT convert(varchar(12), mydate, 113)
The first argument is the target datatype, the second is the source value, and the third is an optional style value. The convert function is also useful if you are transforming data from one schema to another and need to concatenate columns while restricting the resulting size.
SELECT convert(varchar(35), RTRIM(FNAME) + ' ' + RTRIM(LNAME))
Another use is if you are working with text or
binary data that is larger than the 8000 character limit of the varchar
datatype. SQL Server doesn’t allow these types of columns to be used for
comparison or sorting unless you use the LIKE operator. If you had a table with
a column called CODE_TAG that was defined as TEXT, and you wanted to select
all records where it equals ‘BO’ you would have to use the following syntax:
select * from codes where convert(varchar(2), CODE_TAG) = 'BO'
The CASE expression allows you to evaluate a
variable or a condition inline and return one of multiple values as the SQL
statement is being run. The basic version takes a source variable or column and
depending on the value returns one of a discrete set of values.
Select case datepart(month, GetDate())
    when 1 then 'January is nice'
    when 2 then 'February is cold'
    when 3 then 'Spring is coming'
    else 'Its not winter anymore'
end
You can also evaluate Boolean conditions. These are evaluated in order until a condition is satisfied, and the corresponding value is returned.
Select case
    when datepart(month, GetDate()) > 9 then 'Winter is here'
    when datepart(hour, GetDate()) > 22 then 'Its getting late'
    when convert(varchar(12), getdate(), 113) = '28 NOV 2003' then 'Everythings on SALE!'
    when datepart(weekday, GetDate()) = 6 then 'TGIF!'
    else 'Who knows???'
end
Case can be used anywhere an expression can,
including defining views that dynamically transform data based on current
conditions. For example we could use this technique to implement a 10 hour sale
on all products for the day after Thanksgiving in the Northwind database by
creating a view called PriceLookUp as follows:
CREATE VIEW PriceLookUp AS
Select ProductID, ProductName, CategoryID, UnitPrice,
    case when getdate() > '28-NOV-2003 08:00' and getdate() < '28-NOV-2003 18:00'
        then UnitPrice * .85
        else UnitPrice
    end as SalePrice
from Northwind..Products
go
select GetDate(), UnitPrice, SalePrice, * from PriceLookUp
We could have written a batch and scheduled it to run at the appropriate times to update the sale price, but that would have added a dependency on the scheduler and on the time it would take to process all the rows (mature retailer product tables likely have hundreds of thousands of rows in them).
Microsoft took the liberty of providing us with
commonly used functions to make our lives easier. These are cousins of the CASE
expression and could be coded in that way. The Coalesce function returns the
first non-null expression in a list, otherwise it returns null.
Coalesce(expression1, expression2, ...)
-- this is essentially the same as:
case when expression1 is not null then expression1
     when expression2 is not null then expression2
     ...
end
For example, if we wanted to return the most recent status date recorded for an order, we could take advantage of the fact that Coalesce returns the first non-null expression:
select coalesce(ShippedDate, RequiredDate, OrderDate) as LastUpdate
from orders
NullIf and IsNull operate similarly, the former returning null if a condition is true, and the latter returning an alternate value if the expression is null.
NullIf(expression1, expression2)
-- NullIf is evaluated as
case when expression1 = expression2 then Null
     else expression1
end

IsNull(expression1, expression2)
-- IsNull is evaluated as
case when expression1 is null then expression2
     else expression1
end
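As a concrete illustration, here is a small sketch against the Northwind Orders table (where ShippedDate is nullable) showing IsNull substituting a display value:
select OrderID, IsNull(convert(varchar(12), ShippedDate, 113), 'Not shipped') as Shipped
from Orders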
SQL Server Books Online has a complete listing of the system functions and provides you with syntax diagrams for how to use them.
As the name "relational" implies, information in well-designed databases is stored in more than a single table, and much of our work is to develop means for accessing it effectively. When more than one table is referenced in a query, the tables are said to be joined. A join is the criteria that define how two or more tables are related. Joins come in two flavors, inner and outer. The basic difference is that inner joins will only return rows in which both tables have columns matching the join criteria.
Suppliers
    SupplierID    primary key
    CmpyName      info column
    Country       info column

Products
    ProdID        primary key
    ProdName      info column
    SupplierID    foreign key
    Units         info column
An inner join returns the set of records in which the SupplierID column in the Suppliers table matches the SupplierID column in Products. Records must exist in both tables for a row to be part of the resulting set of rows returned. The join criteria can be specified as part of the FROM clause or as part of the WHERE clause. Consider the following sample data:
SupplierID | CmpyName    | Country
---------- | ----------- | -------
1          | ABC Widget  | USA
2          | Sprocket 4U | UK
3          | Isa Cmpy    | CA

ProdID | ProdName  | SupplierID | Units
------ | --------- | ---------- | -----
1      | WidgOne   | 1          | 4
2      | WidgTwo   | 1          | 6
3      | Spockit   | 2          | 3
4      | WidgThree | 1          | 1

The two following statements return identical results:
-- SQL:89 Inner Join Syntax
SELECT *
FROM Suppliers S, Products P
WHERE S.SupplierID = P.SupplierID
AND P.Units > 2
-- SQL:92 Inner Join Syntax
SELECT *
FROM Suppliers as S JOIN Products as P ON S.SupplierID = P.SupplierID
WHERE P.Units > 2
Running these statements will select the rows from both tables which meet the join criteria and will exclude rows which do not have a match. In our resultset we won't return Supplier 3, Isa Cmpy, because they don't have any products, and we don't return WidgThree from the Products table because its Units value is below the selection criteria.
SupplierID | CmpyName    | Country | ProdID | ProdName | SupplierID | Units
---------- | ----------- | ------- | ------ | -------- | ---------- | -----
1          | ABC Widget  | USA     | 1      | WidgOne  | 1          | 4
1          | ABC Widget  | USA     | 2      | WidgTwo  | 1          | 6
2          | Sprocket 4U | UK      | 3      | Spockit  | 2          | 3
|
A left join (also called a left outer join) returns all rows from the first table and information from the other where it exists. Where it doesn't, nulls are returned in place of the missing data. The syntax for this would be:
-- SQL:89 Left Outer Join Syntax
SELECT *
FROM Suppliers S, Products P
WHERE S.SupplierID *= P.SupplierID
AND P.Units > 2
-- SQL:92 Left Outer Join Syntax (note that the Units test belongs in the ON
-- clause; placed in the WHERE clause it would filter out the unmatched NULL rows)
SELECT *
FROM Suppliers as S LEFT JOIN Products as P
ON S.SupplierID = P.SupplierID AND P.Units > 2
SupplierID | CmpyName    | Country | ProdID | ProdName | SupplierID | Units
---------- | ----------- | ------- | ------ | -------- | ---------- | -----
1          | ABC Widget  | USA     | 1      | WidgOne  | 1          | 4
1          | ABC Widget  | USA     | 2      | WidgTwo  | 1          | 6
2          | Sprocket 4U | UK      | 3      | Spockit  | 2          | 3
3          | Isa Cmpy    | CA      | NULL   | NULL     | NULL       | NULL
|
A Right Outer Join is the converse of the Left Outer Join, except that the table on the right is the driver. Two other types of joins can be made, although you don't see them too often. The first is the Full Outer Join, in which all rows from both tables are returned; the second is the Cross Join (also called a Cartesian join), in which every possible combination of rows from the two tables is returned. If you forget to add the WHERE clause and you've specified more than one table, you are running a Cross Join.
Select * from Suppliers, Products
Our example would return 12 rows (3 rows in Suppliers x 4 rows in Products = 12 combinations of the two), but a more realistic scenario involving hundreds of suppliers and thousands of products would result in hundreds of thousands of combinations.
SQL Server supports both styles of syntax, but not all databases do; to ensure portability of your code, you may find the older syntax easier to work with. The ANSI SQL-92 standard recommends that the join criteria be made part of the FROM clause in order to avoid confusion with the filter criteria of the WHERE clause.
Union
A union combines the results of more than one query into a single record set. The only requirement is that all the queries return the same set of columns, in the same order. For example, if you wanted to return the names of tables and the number of rows in each, you would use the following syntax:
select 'Categories' as TAB, count(*) as CNT from Categories UNION
select 'CustomerCustomerDemo', count(*) from CustomerCustomerDemo UNION
select 'CustomerDemographics', count(*) from CustomerDemographics UNION
select 'Customers', count(*) from Customers UNION
select 'Employees', count(*) from Employees UNION
select 'EmployeeTerritories', count(*) from EmployeeTerritories UNION
select 'Order Details', count(*) from [Order Details] UNION
select 'Orders', count(*) from Orders UNION
select 'Products', count(*) from Products UNION
select 'Region', count(*) from Region UNION
select 'ReplenishWork', count(*) from ReplenishWork UNION
select 'Shippers', count(*) from Shippers UNION
select 'Suppliers', count(*) from Suppliers UNION
select 'Territories', count(*) from Territories
The column header names are taken from the first
resultset that the query engine sees, in this case TAB and CNT.
We use the INSERT statement to add new information
to the database. The standard SQL statement for inserting a new row is:
INSERT INTO <table> (<column1>, <column2>, ...)
VALUES (<value1>, <value2>, ...)
You can omit the column list only if you supply a value for every column in the row. Adding a new row to the Suppliers table would look like this:
INSERT INTO Suppliers (SupplierID, CmpyName, Country)
Values (3, 'Isa Cmpy', 'CA')
SQL Server provides us with another form of the INSERT
statement in which we can use an existing table as the source of the new
records. In this case we use the syntax:
INSERT INTO <table> (<column1>, <column2>, ...)
SELECT <column1>, <column2>, ... FROM <source_table> WHERE ...
For example, suppose we were implementing a reporting database (also called a data warehouse) in which we wanted to perform aggregation and summarization on our vendors. We could define a new table to contain our calculations and provide some level of isolation from the transactional system. Our new table might look like this:
ReplenishWork
    SupplierID
    CmpyName
    ProductsOutOfStock
We can run a SQL batch to load the table using the
second form of the INSERT statement as follows:
INSERT INTO ReplenishWork (SupplierID, CmpyName, ProductsOutOfStock)
SELECT P.SupplierID, S.CmpyName, Count(*)
FROM Suppliers as S INNER JOIN Products as P ON S.SupplierID = P.SupplierID
WHERE P.Units < 2
GROUP BY P.SupplierID, S.CmpyName
We manage existing information in the database using
the UPDATE statement. SQL Server allows for the basic update statement that
looks like this:
UPDATE <table>
SET <column1> = <value1>, <column2> = <value2>
WHERE ...
But it also includes a variation in which you can update one table using another table as your source. This syntax is a Microsoft extension to the ANSI standard and won't necessarily run in other environments.
UPDATE ReplenishWork SET ProductsOutOfStock = P.UnitsInStock
FROM Products as P
WHERE P.SupplierID = ReplenishWork.SupplierID
To purge the database of unnecessary information, or to enforce relational integrity, we use the DELETE statement. The basic syntax is:
DELETE FROM <table>
WHERE ...
The WHERE clause determines which rows will be affected. A word of caution when you are deleting data: it takes a long time to type it back in, so be sure you check what you are getting rid of. You can do this by running a count of the rows that will be affected (SELECT count(*) FROM <table> WHERE ...) before executing the delete. You can further protect yourself from unwanted aggravation by using transactions.
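A minimal sketch of that habit, assuming we are purging the low-stock rows from our sample Products table:
-- Check how many rows the WHERE clause matches...
SELECT count(*) FROM Products WHERE Units < 2
-- ...then run the delete with the identical WHERE clause
DELETE FROM Products WHERE Units < 2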
Variables are used to temporarily hold values that may be
useful in the processing of statements. We declare them using the DECLARE
keyword, specifying the name and the type, and we assign a value to them with
the SET statement. You can also assign a value as part of a SELECT
statement, but using SET is listed as the preferred method.
We reference them as we would column data or constants in the
processing logic of our code using the @ character in front of the name to
denote a variable. Global variables reflect the status of the system, the database engine, and other server options, and are referenced with a double @ prefix (@@).
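A short sketch of the mechanics, using the Northwind Products table (@MaxPrice is just an illustrative name):
DECLARE @MaxPrice money
SET @MaxPrice = 100
SELECT ProductID, UnitPrice FROM Products WHERE UnitPrice < @MaxPrice
PRINT @@VERSION -- a global variable maintained by the server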
All user declared variables are local in scope, meaning they remain accessible only until the current batch of statements completes. For example, executing the following script results in an error because the variable @MyVar is not declared within the batch in which it is referenced.
DECLARE @MyVar varchar(15)
SET @MyVar = 'New Value'
Go
PRINT 'The value of MyVar is ' + @MyVar
Batches are a grouping of one or more SQL statements that
together form a logical unit of work. They are sent together to the SQL Server
engine, which creates an execution plan. If errors are generated in creating the
plan, none of the statements are actually executed. Errors that occur after the
execution plan has been compiled will cause succeeding statements to fail, but
will not affect previous logic in the batch. This is true unless the batch is
part of a transaction in which all of the statements are rolled back after the
error.
For example, the following Transact SQL code comprises three statements, which are all part of the same batch:
DECLARE @Discount float
SET @Discount = 0.05
SELECT
ProductID, UnitPrice, UnitPrice - @Discount * UnitPrice
as SalePrice
FROM Products
ORDER BY SalePrice Desc
GO
Because the @Discount variable is used in the successive
commands, if you try to run either of the last two statements separately you
will raise an error condition.
Statements that create objects such as views, defaults, procedures, rules and triggers cannot be combined with other statements in a batch. A batch is completed by the inclusion of the command GO. An important point to remember about GO is that it is not a Transact SQL statement; rather, it is recognized by Query Analyzer and ISQL as a marker to send all statements since the previous GO to the engine and await the result set.
You can see the effect of this when the results of statements before the GO command appear in the results pane of Query Analyzer before the next batch of statements executes. If you include GO in an ODBC call or from one of the database libraries such as ADO or OLE DB, the SQL Server engine will flag a syntax error. Another point is that local variables declared in a batch go out of scope when the batch ends, which is one reason why you cannot include GO in stored procedure declarations.
A transaction is a grouping of statements that must either succeed or fail together as a logical unit of work. You may have heard of the ACID (Atomicity, Consistency, Isolation and Durability) test for a transaction. These are the four properties which a valid transaction must meet.
- Atomicity means that all the statements in the group must either succeed or fail together. For example, if you were developing a banking application to add new accounts, adding rows to the customer table and the account table might form a logical unit of work in which our data model integrity constraints require that records exist in both tables to be valid.
- Consistency means that the database is left in a stable state at the end of the transaction, and that all data integrity constraints are met.
- Isolation means that the data is modified either before or after other transactions against the same rowset; we cannot alter data that is in an intermediate state. The locking rules of the database prevent this from happening.
- Durability means that the effects of the transaction persist after the end of the transaction, even in the event of a system failure. A system failure in the middle of a transaction rolls the database back to the state it was in before the transaction began.
We begin a transaction with the command BEGIN TRANSACTION, and end it with the command COMMIT or ROLLBACK. While a transaction is open, locks on the database prevent other user sessions from altering the data.
For example, suppose we want to implement a sale price on our
Products at Northwind.
BEGIN TRAN
DECLARE @Discount float
SET @Discount = 0.05
UPDATE Products
SET UnitPrice = UnitPrice - (@Discount * UnitPrice)
Go
-- Check the results...
Select ProductID, UnitPrice
FROM Products
ORDER BY UnitPrice Desc
If we don't like what we see, we can roll back the transaction and leave the database in its previous state:
ROLLBACK
GO
SELECT ProductID, UnitPrice -- See the original prices!
FROM Products
ORDER BY UnitPrice Desc
GO
Conversely, to make the changes permanent, we use the COMMIT command:
COMMIT
Go
Data Definition Language (DDL) is used to define tables, views, stored procedures, and other objects that physically and logically express the data model you are working with. Data modeling and the Data Definition Language are big topics that take up more space than we've been allotted here. There are some very good books that focus just on this topic, including Louis Davidson's data modeling book (ISBN: ???).
At the core of relational databases are the tables that store information. We can create tables using the graphical tools provided with SQL Server, but that practice lends itself better to developing unique, custom objects that are still evolving. Deploying a schema to multiple servers on a consistent basis typically relies on scripts to complete the tasks involved. Enterprise Manager allows us to generate these scripts, but we should still understand what the scripts are doing in case we need to customize or change them.
Our first task in making our data model usable is to create a base set of tables. We do this using the CREATE TABLE statement, which specifies how information is stored in the table. Each column definition includes a name, a data type, whether the column allows null values, and other optional attributes. The basic syntax diagram for CREATE TABLE is below:
CREATE TABLE <table_name>
    (<column_name> <data_type> <column_attributes>, ...)
A simple version of the ReplenishWork table can be created with the following statement:
CREATE TABLE ReplenishWork
(SupplierID int NOT NULL,
 CmpyName varchar(40) NOT NULL,
 ProductsOutOfStock int NULL)
Business changes drive database changes and we don't always
have the luxury of recreating our schema in a production environment. Transact
SQL allows us to expand column sizes and add new columns to existing tables
using the ALTER statement:
ALTER TABLE ReplenishWork
ADD ContactName varchar(20)
We can also change the size and type of columns, drop
columns, manage constraints and default values using the ALTER statement. If we
didn't need to preserve existing data and were making changes to the schema that
prohibited the use of the ALTER command we can DROP the table before recreating
it. If we needed to reconfigure our ReplenishWork table to support integration
of legacy data that used a character based supplier id we could do so with the
following batch:
DROP TABLE ReplenishWork
GO
CREATE TABLE ReplenishWork
(SupplierID varchar(16) NOT NULL,
 CmpyName varchar(40) NOT NULL,
 ProductsOutOfStock int NULL,
 ContactName varchar(20) NULL)
Databases manage large amounts of information. In order to be efficient, they use data structures called indexes to map the contents. In SQL Server, indexes come in two flavors: clustered and non-clustered. The difference is that there can be only one clustered index on a table, and this index determines the ordering in which the data is physically stored. SQL Server supports up to 249 non-clustered indexes for any given table. To add an index to our ReplenishWork table we could use the following command:
CREATE TABLE ReplenishWork
(SupplierID int PRIMARY KEY CLUSTERED,
 CmpyName varchar(40) NOT NULL,
 ProductsOutOfStock int NULL)
By default SQL Server will use clustering on the primary key
defined for a table. This is important because without at least one clustered
index on a large table, all the data would be scattered throughout the physical
storage, and performance on retrieval of data may suffer. You can also create
indexes on existing tables using the CREATE command:
CREATE INDEX IDX_ReplenishWork
ON ReplenishWork (CmpyName)
On the other hand unnecessary indexes can impact performance
as well. Think of an index as a physical data structure that has a cost
associated with maintaining it when data changes. If the nature of your system
involves capturing transactional information as quickly as possible, you want to
run with the least number of indexes. Each insert, update and delete must be
reflected in each of the indexes on the table, so the fewer the indexes the less
the overhead of those operations.
Systems which are primarily decision support and reporting
based can afford to have more indexes because once the data has been written it
isn't updated as much as a transactional system. You will tend to see more
generous use of indexes in Data Warehousing systems that support a lot of Ad-Hoc
querying.
Views allow us to provide an abstraction layer between the physical implementation of the data and how users see the information. They are in essence virtual tables that we can use to limit or expand how we are able to use the base set of tables in our database. In SQL Server we are limited to a single SELECT statement to define the result set. The security architecture of SQL Server allows us to grant users access to a view without giving access to the tables (and views) on which the view is based. For example, if we needed to make an employee phone list available to users but needed to protect other employee information (such as salary, social security number, birth date, etc.), we could create a view of just the name and phone.
CREATE VIEW PhoneList AS
SELECT FirstName + ' ' + LastName as EmpName, HomePhone
FROM Employees
go
select * from PhoneList
The ALTER and DROP commands operate in the same way they do
with tables. One interesting feature of SQL Server is that it supports indexed
and updateable views. This means that you can define a view on some base
objects, create indexes to optimize access and even alter data stored in the
underlying tables, although you need to be aware that there is more overhead in
updating a view than if you managed the data in the underlying tables directly.
In order to create an index on a view, the view must be created with the SCHEMABINDING option. This option causes information about the underlying schema to be stored in the system tables, and SQL Server will then prevent changes that would invalidate our view. Note that a schema-bound view must reference its base tables with two-part names:
CREATE VIEW PhoneList
WITH SCHEMABINDING
AS
SELECT FirstName + ' ' + LastName as EmpName, HomePhone
FROM dbo.Employees
go
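With the view schema-bound, we can then build the index. A sketch of the idea (an indexed view requires a unique clustered index first, so for illustration we assume each name and phone combination is unique):
CREATE UNIQUE CLUSTERED INDEX IDX_PhoneList
ON PhoneList (EmpName, HomePhone)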
The business rules that define how our application behaves, such as requiring users of our application to set up a customer before they place an order, are implemented using keys and constraints.
ALTER TABLE dbo.Table1 ADD CONSTRAINT
PK_Customer PRIMARY KEY CLUSTERED
(CustID) ON [PRIMARY]
We can further define default values and extended properties for the tables we create using a similar syntax:
ALTER TABLE dbo.Table1 ADD CONSTRAINT
DF_Table1_CreditLimit DEFAULT 0 FOR CreditLimit
Data Control Language (DCL) allows the owner and administrator of the database objects to define what rights users have to the data and objects in the database. The DBA can manage access to the database by granting and denying users rights to objects. The term CRUD matrix refers to the rights to Create, Read, Update, and/or Delete information in a given table. A short sketch of these commands follows the list below.
- GRANT: Grant is used to give access to someone who needs to use the information stored in the objects. SQL Server allows us to specify user based or group based rights, and to apply them to objects or to groups of objects within a database.
- REVOKE: We use the REVOKE statement to reverse the effect of the GRANT statement.
- DENY: The DENY statement will prevent a user from accessing objects in the database.
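As a brief sketch of these commands (ReportUser is a hypothetical database user):
GRANT SELECT ON PhoneList TO ReportUser
REVOKE SELECT ON PhoneList FROM ReportUser
DENY SELECT ON Employees TO ReportUser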
With the release of SQL Server 2000 Microsoft takes the SQL
language a step further. Among the many additions in this release, we find new
data types, user defined functions, and extensive support for XML, as well as
indexing, trigger handling, and an expanded suite of system stored procedures.
The result is one of the most flexible, scalable, and complete offerings in the
database management system product field.
While all relational databases implement the basic numeric,
string, and date data-types for storing data, Microsoft has added some which you
may not have heard of. Some of these were added in support of the Replication
functions built into the database engine, others fit more neatly with the new
feature of User Defined Functions. The new data types include:
- BIGINT: True to its name, this is a very big whole number. You can use it to store values in the range of -9223372036854775808 to 9223372036854775807. This is 19 digits of precision that can hold some extremely large values. If you are porting data from platforms that include intelligent keys (like the first 4 digits is the location, the next 6 is customer, the next 3 is the weight, etc.) it would probably be better to normalize your schema and transform the data into multiple columns.
- UniqueIdentifier: Columns of this type store system generated identifiers known as GUIDs, or Globally Unique Identifiers. A GUID is a 36 character value generated by Windows that is guaranteed to be unique across the world. It takes the same form as the CLSID in the COM world. The NEWID function is used to generate one, and we can store them in columns of type UniqueIdentifier.
DECLARE @guid uniqueidentifier
select @guid = NEWID()
PRINT 'Value of @guid is: ' + CONVERT(varchar(50), @guid)

Value of @guid is: F4840C7B-F44C-4A49-A12E-097B45AE3CE0
While you may not want to use a GUID as a primary key because of its cumbersome nature for ad-hoc queries (try correctly typing in the 36 character value), it does allow us to implement n-way replication in which data is created on multiple servers. For example, if we were going to deploy the Northwind solution across the country and set up merge replication with a corporate site to contain all orders, using a sequentially generated OrderID could create problems in that the same ID could be created on more than one server. We can avoid this by redefining it as:
CREATE TABLE Orders (
OrderID uniqueidentifier NOT NULL default NEWID(),
CustomerID nchar (5) NULL ,
EmployeeID int NULL ,
OrderDate datetime NULL ,
RequiredDate datetime NULL ,
ShippedDate datetime NULL ,
ShipVia int NULL ,
Freight money NULL DEFAULT (0),
ShipName nvarchar (40) NULL ,
ShipAddress nvarchar (60) NULL ,
ShipCity nvarchar (15) NULL ,
ShipRegion nvarchar (15) NULL ,
ShipPostalCode nvarchar (10) NULL ,
ShipCountry nvarchar (15) NULL
)
While this makes the replication scheme possible, you should define the clustered index on some other column set. Ordering the data according to a randomly generated value results in poor performance when processing the data, because the rows are scattered across the disk in no useful order and seek time increases.
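One way to apply that advice to the table above (the constraint and index names are illustrative):
ALTER TABLE Orders ADD CONSTRAINT PK_Orders
PRIMARY KEY NONCLUSTERED (OrderID)
GO
CREATE CLUSTERED INDEX IDX_Orders_OrderDate
ON Orders (OrderDate)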
- SQL_Variant: Similar to the variant data type in Visual Basic, SQL_Variant is used to store data values of all the supported data types except text, ntext, image, and timestamp. We can use it for column definitions, parameters, variables, and for the return value of user defined functions. Microsoft provides the function SQL_VARIANT_PROPERTY to return the base data type and other information about the data stored.
One use for variants is in defining schema to hold attribute
data. Suppose you need to keep a variable number of optional attributes about a
person, such as hair color, weight, age, number of children, birthdate, favorite
pet, etc. In our example we are collecting what information we can, but because
there are a lot of optional attributes, we end up with a sparsely populated
table.
PersonID | HairColor   | Weight | Age   | Sex      | Birthdate  | Pet
(int)    | (varchar 8) | (int)  | (int) | (char 1) | (datetime) | (varchar 6)
-------- | ----------- | ------ | ----- | -------- | ---------- | -----------
1        | Brown       |        |       |          |            | Cat
2        |             | 250    | 32    | M        |            | Dog
3        |             |        |       |          | 1/2/62     | Bird
Using SQL Variant we could more efficiently store
the same set of information using a schema that looks like this:
PersonID | AttributeName | AttributeValue
(int)    | (varchar 12)  | (SQL_Variant)
-------- | ------------- | --------------
1        | HairColor     | Brown
1        | Pet           | Cat
2        | Weight        | 250
2        | Age           | 32
2        | Sex           | M
2        | Pet           | Dog
3        | Birthdate     | 1/2/62
3        | Pet           | Bird
- Table: True to its name, the table data type is a variable that temporarily stores a rowset in memory for later processing and can be used like an ordinary table. It accomplishes this using space in TEMPDB, but because it has a well defined scope, it efficiently cleans up after itself. It also requires less locking and logging than tables created in TEMPDB, which results in fewer recompilations of the execution plan. The table datatype was added in support of a powerful new feature of SQL Server, the User Defined Function.
You declare a table variable using the DECLARE
statement. You can then manage the content of that table variable, inserting,
updating and deleting as needed. Since the scope of variables is local to the
current batch, other processes cannot access your table variable.
declare @t table(PersonID int, AttrNm varchar(12), AttrVal sql_variant)
insert @t values(1, 'HairColor', 'Brown')
insert @t values(1, 'Pet', 'Cat')
insert @t values(2, 'Weight', 250)
insert @t values(2, 'Age', 32)
insert @t values(2, 'Sex', 'M')
insert @t values(2, 'Pet', 'Dog')
insert @t values(3, 'Birthdate', GetDate())
insert @t values(3, 'Pet', 'Bird')
Select convert(varchar(15), SQL_VARIANT_PROPERTY(AttrVal, 'BaseType')), * from @t
Structured programming languages have long provided the ability to create procedures and function calls to break down complex routines into manageable units. The introduction of the User Defined Function (UDF) with SQL Server 2000 provides us with a new way of organizing and optimizing our use of the database engine. A User Defined Function takes zero or more input parameters and returns either a scalar or a table based value. We could use a UDF to calculate the distance between two geographic locations, return a list of projects for a given client, or return an amortization table for a given set of loan parameters.
UDF's are similar to stored procedures in that both use control statements and data manipulation code to define the logic of the module. But they differ in that UDF's don't allow modification of the global state of the calling process or the database environment. In other words, User Defined Functions are stateless in that there can be no side effects. You can't alter data in tables that are not locally defined within the UDF, and you can't use non-deterministic functions within your code. Deterministic functions always return the same value given the same input parameters, such as the DATEADD function. The GETDATE function is non-deterministic in that its value depends on when you call it.
SQL Server 2000 supports three types of User Defined Functions: Scalar Functions, Inline Table Valued Functions, and Multiple Statement Table Valued Functions. Scalar Functions are UDF's that return a single discrete value and contain logic embedded between BEGIN and END statements. This flavor of UDF must return a single value each time it is called. We create a UDF using the CREATE FUNCTION statement:
CREATE FUNCTION udfSalePrice (@ProductID int, @TranDate datetime)
RETURNS varchar(50)
AS
BEGIN
    declare @UnitPrice money, @ProductName varchar(40)
    select @UnitPrice = UnitPrice, @ProductName = ProductName
    from Northwind..Products
    where ProductID = @ProductID
    if @TranDate > '28-NOV-2003 08:00' and @TranDate < '28-NOV-2003 18:00'
        return @ProductName + ' on Sale for $' + convert(varchar(10), @UnitPrice * .85)
    return @ProductName + ' for $' + convert(varchar(10), @UnitPrice)
END
Go
select Northwind.dbo.udfSalePrice(3, GetDate()) as Price
Table Valued Functions return a table variable that can be built from one or more statements. If the function consists of a single SELECT and doesn't specify the column set in the declaration, it doesn't require the BEGIN/END wrapper and is considered Inline.
create function udfRegionOrders (@Region varchar(15))
returns TABLE
as
return (select * from Northwind..Orders where ShipRegion = @Region)
go
select * from Northwind..udfRegionOrders('RJ')
Multi-statement UDF's allow us to do more complex processing in order to derive
the result set. Like scalar UDF's we can declare local variables, use flow
control statements, manage data in locally defined table variables and call
stateless extended stored procedures.
Using our
Northwind database for this example, suppose we wanted to create a function that
returned sales ranking information.
CREATE FUNCTION udfTopSales (@nCount int)
RETURNS @retTable TABLE
    (Rank int, ProductID int, SaleAmt money)
AS
BEGIN
    DECLARE @ProdID int, @Sales money, @Rank int
    SET @Rank = 1
    DECLARE curSales CURSOR FOR
        SELECT ProductID, Sum(UnitPrice * Quantity * (1 - Discount)) as SaleAmt
        FROM [Order Details]
        GROUP BY ProductID
        ORDER BY SaleAmt DESC
    OPEN curSales
    FETCH FROM curSales into @ProdID, @Sales
    WHILE @@FETCH_STATUS = 0 AND @Rank <= @nCount BEGIN
        INSERT INTO @retTable values (@Rank, @ProdID, @Sales)
        FETCH NEXT FROM curSales into @ProdID, @Sales
        SET @Rank = @Rank + 1
    END
    CLOSE curSales
    DEALLOCATE curSales
    RETURN
END
go
select * from dbo.udfTopSales(10)
go
While we can run ad-hoc SQL statements within Query Analyzer, and we have the ability to use the Execute method in ADO, frequently performed logic can be encapsulated into batches and saved as Stored Procedures on the server. The benefits of this approach include not only reducing the work of testing multiple programs and applications to ensure that the logic is correct, but also reusing the execution plan generated on the server when the stored procedure is compiled and saved.
Stored Procedures are a collection of Transact SQL statements which perform a defined set of work. This can be as simple as updating a single row or returning a configuration setting, or as complex as implementing a business rule that requires sending an e-mail alert containing crucial information specific to the problem. They can take both input and output parameters, and always return an integer value. There are several types of stored procedures, including those that come with SQL Server, user defined, and extended stored procedures.
Microsoft includes a broad suite of system stored procedures for administering the server engine. There are over 900 system stored procedures, which SQL Server Books Online breaks into 17 categories. They include procedures for managing system security; setting up and managing distributed queries and linked servers (making tables and data on different machines visible to each other); managing data replication; retrieving information about schema; viewing current performance and usage information; managing SQL Agent (the scheduler that comes with SQL Server); and interfacing with XML data.
For example, to return the schema of a table we can
use the sp_help stored procedure:
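EXEC sp_help 'Suppliers'
(Calling sp_help with no arguments lists all the objects in the current database.)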
If the stored procedure call is the first command in
the batch we can omit the EXEC(UTE) command. To see the definition of a view or
a stored procedure or other system objects we use sp_helptext:
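EXEC sp_helptext 'PhoneList'
(Here PhoneList is the view we created earlier in the chapter.)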
User defined Stored Procedures.
We can define our own stored procedures to provide
users of the database with a consistent implementation of work logic across
applications. This simplifies the administration of business rules by acting as
a single point of control on the database server itself. If the rule is changed,
we only need to update the associated stored procedures instead of the various
applications making use of the data.
Stored
procedures are declared with the CREATE PROCEDURE statement.
CREATE PROCEDURE spSimple as
select
'HELLO WORLD!'
go
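A slightly fuller sketch showing input and output parameters and the integer return value (spSupplierProductCount is an illustrative name, not part of Northwind):
CREATE PROCEDURE spSupplierProductCount
    @SupplierID int,             -- input parameter
    @ProductCount int OUTPUT     -- output parameter
AS
SELECT @ProductCount = count(*)
FROM Products
WHERE SupplierID = @SupplierID
RETURN 0  -- every stored procedure returns an integer status
go
DECLARE @cnt int
EXEC spSupplierProductCount 2, @cnt OUTPUT
PRINT 'Products for supplier 2: ' + convert(varchar(10), @cnt)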
For more
information on developing stored procedures see chapter XX
Extended stored procedures (XP's).
Extended stored procedures allow us to extend Transact SQL with logic developed outside of SQL Server using tools such as C or Visual Basic. They follow the same calling standards as a normal stored procedure, complete with parameters and a return code. XP's are implemented as calls to dynamic link libraries (DLLs) that conform to the Extended Stored Procedure API. Because these calls are external to SQL Server, you need to be careful that they are thoroughly tested and checked for aberrant behavior before using them in a production situation.
If we had written a standard event management system
to log information to the NT Event Log and wanted to include it in our SQL
environment we call the stored procedure sp_addextendedproc:
USE Master
go
EXEC sp_addextendedproc 'xp_LogEvent', 'c:\winnt\system32\EventLogger.dll'
Where we use T-SQL
The Transact SQL Language is the primary interface
to the SQL Server database engine. Any time we interact with SQL Server, whether
it is through one of the provided administrative tools, a custom application, or
through the command line interface ISQL, we are executing SQL statements. The
vehicle for this might be ODBC or one of the high level drivers, or it might be
with DB-Library, the original interface provided for talking to the database. In
any case, requests in the form of SQL statements are made to the engine, which
then processes them and returns the result set.
Query Analyzer is the current ad-hoc user interface that comes with SQL Server. It is installed as part of the client tools and is typically one of the first places a developer will begin working with SQL. By default it opens to an edit screen in which we can type our code, and it provides context based coloring, a nice enhancement over the previous version.
To run a batch of statements, you can click on the Execute button and the results pane will display the output of the selected batch. By default the entire script is run, but you can limit execution to portions of the script by highlighting the statements you want; only the selected statements will be run.