Thursday, July 23, 2009

SQL Refresher: It is How You Say it That Matters

While growing up, we were all taught that how you say something is just as important as what you say. Imagine if you went into a restaurant and said to the waitress "Fetch me some coffee you lazy slug!". You probably would not receive the coffee for quite some time ( .. and that is a best case scenario). Well, sometimes we forget that databases are no different.

A recent post on cf-talk, by Brad Wood, highlighted exactly this point. He mentioned that using string manipulation functions on a particular column would prevent the proper usage of indexes. Furthermore, it could negatively impact performance as it would likely require a scan of every record in the table. It is a bit like having to read through a two thousand page book looking for a favorite section. You know it is in there somewhere, but without an index you have not a clue where to begin looking for it. So you must resort to reading all of the pages to find it.

Well, to test this theory I created a small table in MS SQL and populated it with about 60,000 records. Then created a non-clustered index on the WidgetName column.

--- Create the sample table
WidgetName VARCHAR(150),
DateCreated DATETIME

--- Create the indexes
CREATE INDEX idxWidget_WidgetName ON Widget(WidgetName)
CREATE INDEX idxWidget_DateCreated ON Widget(DateCreated, WidgetName)

--- Populate the table
SET @i = 0

WHILE @i < 60000
INSERT INTO Widget(WidgetName, DateCreated)
VALUES ( 'Widget #'+ CAST(@i AS VARCHAR), DATEADD(d, @i, getDate()) )

SET @i = @i + 1

Finally, I created two versions of a query that searched for records where the WidgetName began with the characters "Widget #111%". Now both queries return the exact same information. They just phrase the request a bit differently.

SELECT WidgetName
FROM Widget
WHERE WidgetName LIKE 'Widget #111%'

SELECT WidgetName
FROM Widget
WHERE LEFT(WidgetName, 11) = 'Widget #111'

When I viewed the execution plans, it clearly showed that using the LEFT() function caused the database to perform a less efficient index scan. The index scan essentially touches every record in the index to obtain the desired data. Whereas the index seek locates the data with much greater precision. Notice the overall estimated cost of the second query is very high, relative to the first one (99% versus 1%) as well as its i/o costs. Keep in mind the numbers are relative to the overall batch here. But they do provide a good sense of one query's efficiency over another.

Now this does not just apply to string columns. Say you needed to retrieve a listing of all widgets created in the month of July 2009, sorted by name. In this scenario, you might create a covering index on the DateCreated and WidgetName columns. Again, write two similar queries. Both requesting the same data, just in slightly different ways.

--- Queries on Widget Date
SELECT WidgetName
FROM Widget
WHERE MONTH(DateCreated) = 7
AND YEAR(DateCreated) = 2009
ORDER BY WidgetName

SELECT WidgetName
FROM Widget
WHERE DateCreated >= CONVERT(DATETIME, '20090701', 112)
AND DateCreated < CONVERT(DATETIME, '20090801', 112)
ORDER BY WidgetName

Once again, the execution plans show the query without the functions used the more efficient index seek, with the relative costs being %4 versus %96.

Of course there are always exceptions. For example, if a table contained a small amount of data or a query requests a large percentage of the records from a table, the database engine may decide it is more efficient to perform some sort of scan. So ultimately the database engine makes the final decision. But you can certainly point it in the right direction by writing smarter queries.

Now obviously there is a lot more to analyzing query plans than just reviewing high level numbers. But you can explore the specifics of your database on your own. If you are interested in reading more about one of the more common index types (b-tree), this entry by Matt Fleming is a pretty good read.


  © Blogger templates The Professional Template by 2008

Header image adapted from atomicjeep