2018-03-22

Myth: Index Seek Always Better Than Table/Index Scan

Many DBAs and developers believe index seek always performs better than scan. But in fact, it depends. Let's take a look on an example:

SET STATISTICS IO, TIME ON;
SELECT TOP 1000 * FROM [Users] WHERE DisplayName LIKE 'B%';
SELECT TOP 1000 * FROM [Users] WITH(FORCESEEK) WHERE DisplayName LIKE 'B%';


In this example, I use the Stack Overflow public database StackOverflow2010 which is free to download. The Users table there has a primary key clustered index on its id column, and a nonclustered index on its DisplayName column, which has no any included columns. Below shows the table schema and the index creation statement:
CREATE NONCLUSTERED INDEX IX_DisplayName ON Users (DisplayName);

When the 1st select query (without any table hints) being executed, the engine picks Clustered Index Scan operator to run it. And for the 2nd select query, as it has a FORCESEEK table hint, Index Seek on IX_DisplayName and Key Lookup on the primary key will be used. Below shows the actual execution plans:

So many people will jump in and suggest to optimize the query by the FORCESEEK hint. No, it's not so simple. Let's take a look on the STATISTICS IO output:

Table 'Users'. Scan count 1, logical reads 1156, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Users'. Scan count 1, logical reads 3081, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The "tuned" 2nd query induces more reads, and so more memory usage and more disk IO. That's why the SQL Server engine decides to scan rather than seek by itself. The extra reads on the 2nd query come from the Key Lookup for each rows returned from the Index Seek operator. Also, Clustered Index Scan in the 1st query isn't really scan the whole table, it's because there's a residual predicate on the DisplayName column, and also the TOP operator already placed a limit on the number of rows to be read.
From this demonstration, we can conclude that scan sometimes can perform better than seek.