2022-03-02

How OPTIMIZE FOR UNKNOWN estimates the number of rows return

OPTIMIZE FOR is a query hint that can help with the parameter sniffing problem, but it requires you to specify a literal constant value for a local variable used in the query. On the other hand, OPTIMIZE FOR UNKNOWN, which was introduced in SQL Server 2008, doesn't require you to specify a constant value, the query optimizer will use statistical data instead to determine the value for a local variable during query optimization. In this blog post, I will explain how the query optimizer engine calculates the estimated number of rows by using the information on the density vector of the column statistics.

To better understand how OPTIMIZE FOR UNKNOWN works, let's create the following test stored procedure in AdventureWorks2017 database:

USE AdventureWorks2017
GO
CREATE OR ALTER PROC test
    @pid int
AS
SET NOCOUNT ON;
SELECT * FROM Sales.SalesOrderDetail
    WHERE ProductID = @pid
    OPTION (OPTIMIZE FOR UNKNOWN) ;
GO

Running this stored procedure, the actual execution plan shows the estimated number of rows 456.079


As the [Sales].[SalesOrderDetail] table has an index [IX_SalesOrderDetail_ProductID] on its [ProductID] column, let's check the density value from the statistics of this index.

Density is information about the number of duplicates in a given column or combination of columns and it is calculated as 1/(number of distinct values). We can verify the density value 0.003759399 from below query.


From the statistics, the total number of rows in the table is 121317

To obtain the estimated number of rows, the query optimizer engine multiplies the column density value by the total number of rows in the table, 0.003759399 * 121317 = 456.079, as shown on the actual execution plan which the first picture illustrated.

No comments:

Post a Comment