Coding Practices

SQL coding does matter on performance. If your SQL code is bad, no any expensive hardware can help. Below are some important SQL coding practices that I think every SQL programmer must follows:

1. Set-based programming
Many programmers think in procedural terms, they build iterative constructs such as cursors and loops, in which it process data one row at a time in specific order. Start thinking in terms of relational sets, avoid cursors and loops. Actually most data processing can be achieved by set-based SQL statements work on multiple rows at a time.

2. Avoid *
SELECT * not only retrieves unnecessary columns from the underlying tables to the query result. It also forces the query optimizer to choose a Scan operator in the execution plan rather than a proper index seek, or imposes RID/Key Lookup operator to retrieve all columns from the heap/clustered index. Both the scanning and key lookup operators impose extra page read. Don't be lazy, type the truly interested columns in all queries especially in production database.

3. SARGable query
Consider the follow two queries:
SELECT * FROM Table1 WHERE col1 = (1 + 1)
SELECT * FROM Table1 WHERE (col1 - 1) = 1
The above queries return exactly the same result set. By the way, let's say there's an index on the integer column col1. The first query performs an Index Seek operator, while the second query performs an Scan operator. SQL Server Query Optimizer is not smart enough to resolve the left hand side of the filtering criterion of the second query into col1 index, it's the responsibility of you SQL programmers to resolve it.

4. Data types in sync
When writing Transact-SQL and ADO.NET code that uses parameters, it is important that you match data types in the queries with column data types in the tables. In some cases SQL Server can do an implicit data type conversion for you (e.g., convert a number to a string). But what you may not realize is that while the conversion of the data types will not affect the results, SQL Server may not be able to use the existing index on the column being queried. This rule also applies on column-to-column comparison and table join.

5. EXISTS vs COUNT(*)
To do row existence checking, uses IF EXISTS rather than IF SELECT COUNT(*)... > 0. The problem with using COUNT(*) is that it will count all the rows that match; instead, it can stop after finding the first match using the EXISTS query.

6. UNION ALL vs UNION
The UNION operation combines the results of two or more queries into a single result set. UNION will remove duplicated rows after the sets have been concatenated. UNION ALL will return all the rows, includes duplicates. In order to remove duplicated rows, the UNION operator has an execution plan that uses an Sort (Distinct Sort) logical operator. This distinct sorting operation can double the query cost. So when writing SQL always consider whether UNION ALL will fit your problem.

7. Avoid scalar function in SELECT
Avoid using scalar function inside a query which returns a large number of rows. Scalar function behave like a cursor when you use Scalar function inside a query which returns large number of rows. Change the scalar function to inline or multiline table function or a view.

8. SET NOCOUNT ON
You must reduce the data transferred on the network. Returning the number of rows affected by the statements executed to the client is unnecessary. Put it in all Stored procedures.

9. Keep transaction short
Blocked process and deadlock are outcomes of long transaction. You must keep the transaction as short as possible. And also refer the object in the same order inside transaction in order to reduce the chance of deadlock.

10. Access the necessary rows only
Use WHERE condition as far as possible. You should limit the rows fetched by the query.

11. Avoid negative operator
Avoid using !=, NOT, NOT IN, NOT EXISTS kind of operator because it causes a table scan. Query engine has to ensure there is not data till the last row is read.

12. Avoid LIKE '%ABC...'
If you use % at the beginning of the searching value, the query will go for table scan. If the application is more text searching kind go for Full Text Index.

13. Using Transactions and SET XACT_ABORT ON to Handle Errors
By default, XACT_ABORT is OFF, which means that in some circumstances SQL Server can continue processing when a T-SQL statement causes a run-time error. In other words, for less severe errors, it may be possible that SQL Server will roll back only the statement that caused the error, and to continue processing other statements in the transaction, and so compromising data integrity. If XACT_ABORT is turned on, SQL Server stops processing as soon as a T-SQL run-time error occurs, and the entire transaction is rolled back. You may think we can use TRY…CATCH to handle errors too, but be aware that TRY…CATCH cannot catch all errors, and it's quite complicated. In many cases, we do not need sophisticated error handling. Quite frequently, all we need to do in case of an error, is roll back all the changes and throw an exception, so that the client knows that there is a problem and will handle it. In such situations, a perfectly reasonable approach is to make use of the XACT_ABORT setting ON.

14. Linked-server query: Distributed (4-part) query vs OPENQUERY
Distributed (4-part) query may scan the remote table and then locally apply filter, any indexes in the remote table may be ignored. But OPENQUERY sends raw query to the remote server, and it's solely the remote server to process the query and return result to the local server. Always use OPENQUERY for better performance.

15. Avoid create Constraint on Temp Table
If a temporary table is created with a named constraint and the temporary table is created within the scope of a user-defined transaction, only one user at a time can execute the statement that creates the temp table. For example, if a stored procedure creates a temporary table with a named primary key constraint, the stored procedure cannot be executed simultaneously by multiple users.

16. Use SCOPE_IDENTITY instead of @@IDENTITY
Because @@IDENTITY is a global identity value, it might have been updated outside the current scope and obtained an unexpected value. Triggers, including nested triggers used by replication, can update @@IDENTITY outside your current scope. To resolve this issue you must replace references to @@IDENTITY with SCOPE_IDENTITY, which returns the most recent identity value in the scope of the user statement.

17. Avoid using sp_ as prefix for stored procs
The sp_ prefix designates system stored procedures. If you use that prefix for your stored procedures, the name of your procedure might conflict with the name of a system stored procedure that will be created in the future.

18. Extract deterministic function calls from WHERE predicates
In a WHERE predicate, a function call is deterministic if its value does not depend on the selected data (deterministic functions always return the same result any time they are called with a specific set of input parameter values, ref.: Deterministic and Nondeterministic Functions). Such calls could cause unnecessary table scans, which decrease database performance. To resolve this issue, you can assign the result of the call to a variable that you use in the WHERE predicate. E.g.
CREATE PROCEDURE [dbo].[Procedure2WithWarning]
@param1 INT = 0,
AS
BEGIN
SELECT [c1], [c2], [c3], [SmallString]
FROM [dbo].[Table1]
WHERE [c2] > ABS(@param1) -- Deterministic function call NOT depend on any selected data
END
The procedure above could cause table scan. You can resolve it by rewriting the procedure as below:
CREATE PROCEDURE [dbo].[Procedure2Fixed]
@param1 INT = 0,
AS
BEGIN
DECLARE @AbsOfParam1 INT
-- Wrap the deterministic function call by a variable before reference it in a WHERE
SET @AbsOfParam1 = ABS(@param1)
SELECT [c1], [c2], [c3], [SmallString]
FROM [dbo].[Table1]
WHERE [c2] > @AbsOfParam1
END

No comments:

Post a Comment