Wednesday, April 13, 2005

Notes on SQL

Asterisks in select statements are something of a performance hog. I try not to use them in production code unless I actually need all the columns in the table.

Let me give you a real world example:

The stock version of the IntraLearn ASP 3.0 LMS can handle about 100 to 150 course logins a day before it grinds to a halt. It ships with no indexes.

I set up the performance monitor for about 12 hours and fed the results into the MS SQL "Create Index Wizard." The resulting indexes allowed the system to handle 400 to 500 logins a day.

Next, I decrypted the Cold Fusion scripts in which IntraLearn had been written (CF gives you the option of a rather absurd "encryption" routine to obfuscate your code) and started rewriting queries. All I did was remove asterisks and replace them with references to the actual columns used in the code. The result was a system, still on the same hardware, that could handle 1,300 to 1,500 student logins a day.

Enabling Cold Fusion's query result caching for language variable queries added another 200 daily logins. I did more to optimize IntraLearn after that, but that's not relevant here.

The advice about indexes is good. If you do a select * the only index used is generally the clustered index. It's always better to have indexes that reflect the actual queries you run, but a select * eliminates the possibility of doing that.

One way to see the difference is to run a few select queries in the MS SQL Query Analyzer with "Show Execution Plan" enabled. It adds an "Execution Plan" tab next to "Grids" and "Messages" that tells you, among other things, which indexes if any are being used to process the query.

An easy way to test one query against another is to run them both at the same time, and see what percentage of the total Query cost each one represents. I used this trick at my last job to convince the CTO that a few applications needed some work. Looking up the same data twice and demonstrating the existing query ate up 85% of the Query cost was a simple and persuasive argument. Not terribly scientific, but persuasive.

For example, say you want to get the imageid and expiration date for all the items in photos_def. On my current test database, I run the following queries:

    select * from photos_def

select imageid, expires_on from photos_def

The one with the asterisk represents 77.05% of the total Query cost.

Another advantage to only retrieving the columns you need is the reduced memory and processing overhead. Using the more efficient query means I don't have to iterate through the other values or waste memory assigning them to local variables. Don't underestimate this memory and CPU savings. It adds up FAST.

Matthew (Webmaster of onlineconfessional.com)


This was written as part of an in office conversation about this Usenet post


David Portas Oct 8 2004, 1:00 pm show options
Newsgroups: microsoft.public.sqlserver.server
From: "David Portas" - Find messages by this author
Date: Fri, 8 Oct 2004 21:00:18 +0100
Local: Fri,Oct 8 2004 1:00 pm
Subject: Re: asterisk in select_list in queries...

Best practice is to avoid using SELECT * in queries (except in an EXISTS
subquery or other subqueries that don't return data and aren't referenced by
an outer query).

Listing the column names makes sense in an N-Tier environment because you
want to make efficient use of network resources by returning to the client
only the data that is actually needed. This is an important difference from
a desktop, ISAM database like FoxPro where you have to retrieve a whole
record whether all the data is required or not.

Listing only the required columns also increases the opportunities for SQL
Server to optimize your query by making use of indexes.

Also, listing column names improves reliability and ease of maintenance. If
you later add another column you don't want to break existing code that
doesn't require that column. Use a column list and then just modify the code
that needs to reference the new column. If you use an asterisk in your
SELECT list then potentially more code could need modification and more code
would need to be unit-tested for each schema change.

Shortcuts? In Query Analyzer you can drag a list of column names from the
Object Browser into the query window. That can save you a lot of typing.

--
David Portas
SQL Server MVP

No comments: