2022-04-08

Missing Index Suggestion VS Database Engine Tuning Advisor

This month blog post I would like to share a performance tuning case. Actually I used to rely on missing index DMVs (I had a post too) and get away from Database Engine Tuning Advisor (DTA) as DTA mostly give you too many indexes to add which will slow your database DML workload. But recently I found there are a few cases that missing index suggestion cannot give you the right index to add while DTA can, and these cases are so obvious that even a developer just take a glance can make the index suggestion. Let's take a look at the below experiment on StackOverflow2013 sample database:

USE StackOverflow2013
GO
SELECT MIN(CreationDate) FROM Users;
SELECT MAX(CreationDate) FROM Users;
SELECT TOP 1 CreationDate FROM Users ORDER BY CreationDate;

You can easily infer that an index on column CreationDate can speed up these queries, but look at the actual execution plan, no any missing indexes being suggested, seems SQL Server just happily to run them by Cluster Index Scan:

You may say that because SQL Server thinks the table is too small, so don't bother to consider index seek. But in fact the table has 2 millions rows, and the STATISTICS IO also shows that the scanning operators are not cheap:
Table 'Users'. Scan count 1, logical reads 44530, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Users'. Scan count 1,
logical reads 44530, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Users'. Scan count 5,
logical reads 45184, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Users'. Scan count 5,
logical reads 45184, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

Now let's try the same queries on DTA.

DTA can give you the good suggestion:

CREATE NONCLUSTERED INDEX [_dta_index_Users_12_149575571__K4] ON [dbo].[Users]
(
    [CreationDate] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

Now let's add the new index suggested by DTA, then re-run the workload queries in SSMS to check the actual execution plan and statistics io:

Table 'Users'. Scan count 1, logical reads 3, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Users'. Scan count 1, logical reads 3, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Users'. Scan count 1, logical reads 3, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Users'. Scan count 1, logical reads 3, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

As you can see, the new index suggested by DTA [_dta_index_XXX] is being used, and the logical read is hugely reduced.

The conclusion is, rather than solely rely on missing index suggestions, DTA should also be considered. We should compare both results and merge the suggestions.


2022-03-02

How OPTIMIZE FOR UNKNOWN estimates the number of rows return

OPTIMIZE FOR is a query hint that can help with the parameter sniffing problem, but it requires you to specify a literal constant value for a local variable used in the query. On the other hand, OPTIMIZE FOR UNKNOWN, which was introduced in SQL Server 2008, doesn't require you to specify a constant value, the query optimizer will use statistical data instead to determine the value for a local variable during query optimization. In this blog post, I will explain how the query optimizer engine calculates the estimated number of rows by using the information on the density vector of the column statistics.

To better understand how OPTIMIZE FOR UNKNOWN works, let's create the following test stored procedure in AdventureWorks2017 database:

USE AdventureWorks2017
GO
CREATE OR ALTER PROC test
    @pid int
AS
SET NOCOUNT ON;
SELECT * FROM Sales.SalesOrderDetail
    WHERE ProductID = @pid
    OPTION (OPTIMIZE FOR UNKNOWN) ;
GO

Running this stored procedure, the actual execution plan shows the estimated number of rows 456.079


As the [Sales].[SalesOrderDetail] table has an index [IX_SalesOrderDetail_ProductID] on its [ProductID] column, let's check the density value from the statistics of this index.

Density is information about the number of duplicates in a given column or combination of columns and it is calculated as 1/(number of distinct values). We can verify the density value 0.003759399 from below query.


From the statistics, the total number of rows in the table is 121317

To obtain the estimated number of rows, the query optimizer engine multiplies the column density value by the total number of rows in the table, 0.003759399 * 121317 = 456.079, as shown on the actual execution plan which the first picture illustrated.

2022-02-04

Availability Group Failover History

This blog post illustrates how to check history of failovers of alwayson availability group. Below query can be used to check it by querying the alwayson_health extended events session (created and started automatically when you create the availability group using SSMS New Availability Group Wizard).

DECLARE @FileName NVARCHAR(4000)
SELECT @FileName =
target_data.value('(EventFileTarget/File/@name)[1]','nvarchar(4000)')
FROM (
        SELECT
            CAST(target_data AS XML) target_data
            FROM sys.dm_xe_sessions s
            JOIN sys.dm_xe_session_targets t
                ON s.address = t.event_session_address
            WHERE s.name = N'AlwaysOn_health'
    ) ft
 
SELECT
    XEData.value('(event/@timestamp)[1]','datetime2(3)') AS event_timestamp,
    XEData.value('(event/data/text)[1]', 'VARCHAR(255)') AS previous_state,
    XEData.value('(event/data/text)[2]', 'VARCHAR(255)') AS current_state,
    ar.replica_server_name
FROM (
    SELECT CAST(event_data AS XML) XEData, *
        FROM sys.fn_xe_file_target_read_file(@FileName, NULL, NULL, NULL)
        WHERE object_name = 'availability_replica_state_change'
     ) event_data
JOIN sys.availability_replicas ar
    ON ar.replica_id = XEData.value('(event/data/value)[5]', 'VARCHAR(255)')
ORDER BY event_timestamp DESC;
 
Make sure the AlwaysOn_health session is running first.

We could run the same query on any of the availability replicas to see their state change histories.


2022-01-07

Server Low Memory causes SQL Server hang and Availability Group failover

This month blog post I would like to share a case that I encountered and was solved by calling Microsoft premier support. I have a SQL Server 2016 primary instance that configured high availability with other 3 nodes on an availability group, hosting a mission critical database total 6TB, the primary sql instance only had 64GB hardware memory. Client programs sometimes run a batch job which involves high volume of data import and read on that mission critical database. Database inaccessible occurs occasionally while the batch program running, made the batch failed and needed to restart again. Users complained on this problem. Checking the sql server error log, error messages were found as below:

Process 0:0:0 (0x5748) Worker 0x000001DD933D0160 appears to be non-yielding on Scheduler 7. Thread creation time: 13283945227971. Approx Thread CPU Used: kernel 130515 ms, user 0 ms. Process Utilization 62%. System Idle 33%. Interval: 130527 ms.

***Stack Dump being sent to D:\Program Files\Microsoft SQL Server\MSSQL13.HKRDBSP1\MSSQL\LOG\SQLDump0017.txt

Always On Availability Groups connection with secondary database terminated for primary database 'XXX' on the availability replica 'XXX' with Replica ID: {b94d35e1-6b50-48f3-8421-93fe6bf7c8b0}. This is an informational message only. No user action is required.

Unable to access availability database 'XXX' because the database replica is not in the PRIMARY or SECONDARY role. Connections to an availability database is permitted only when the database replica is in the PRIMARY or SECONDARY role. Try the operation again later.

As an application DBA, when we see dump was generated by SQL Server, what we can do is only call Microsoft for help. Below is the investigation done by Microsoft support engineer:

Below are non-yielding call stacks of the dump. We found SQL SERVER was hung when allocating memory. Kernel time is the majority which means the slowness is in windows OS.

# Call Site

00 sqldk!SOS_MemoryBlockAllocator::CommitBlockAndCheckNumaLocality

01 sqldk!SOS_MemoryBlockAllocator::AllocateBlock

02 sqldk!SOS_MemoryWorkSpace::AllocatePage

03 sqldk!MemoryNode::AllocatePagesInternal

04 sqldk!MemoryNode::AllocatePages

05 sqldk!MemoryClerkInternal::AllocatePagesWithFailureMode

06 sqldk!MemoryClerkInternal::AllocatePages

07 sqldk!TVarPageMgr<0>::PviNewVarPage

08 sqldk!TVarPageMgr<0>::PbAllocate

09 sqldk!CMemObj::InlineAlloc

0a sqldk!CMemObj::Alloc

  

IncidentType                          NonYieldingScheduler

OffenderThreadId                 ~256

IsPreemptive                          False

Wall_clock_time_ms             70215

Kernel_time_ms                     70203

User_time_ms                        0

passes                                     15

Suggestion

1. Please enable ‘Lock pages in memory’ for this SQL SERVER. ‘Lock pages in memory’ will use AWE api to allocate memory. It will be more efficient.

Please follow below official article. Add “Lock pages in memory” privilege to “SQL Service account”.

https://technet.microsoft.com/en-us/library/ms190730(v=sql.105).aspx

2. Please increase physical memory for this VM to 128GB, and increase the sql server max server memory setting to 112GB.


2021-12-13

Missing Index Recommendation Column Order

When SQL Server is processing a query, it will sometimes make a suggestion for an index that it believes will help that query run faster. These are known as missing indexes, which can be obtained from the missing index DMVs, or more directly by showing the graphical execution plan when you execute the query in SSMS. But can you blindly create the missing index that recommended by SQL Server? In this post, I will show you why you cannot blind trust what SQL Server told you to do.

CREATE TABLE Candidates (

id int PRIMARY KEY IDENTITY(1, 1),

gender char(1) NOT NULL,

displayName varchar(10) NOT NULL

);

GO

SET NOCOUNT ON;

INSERT Candidates (gender, displayName) SELECT 'M', substring(replace(newID(), '-', ''), cast(RAND() * (31 - 10) AS INT), 10);

GO 500000

INSERT Candidates (gender, displayName) SELECT 'F', substring(replace(newID(), '-', ''), cast(RAND() * (31 - 10) AS INT), 10);

GO 500000

UPDATE STATISTICS Candidates WITH FULLSCAN;

GO

-- Enable Actual Execution Plan for below query

SELECT displayName, gender FROM Candidates WHERE displayName = 'CF53A906FD' AND gender = 'M' OPTION (RECOMPILE);

And here is the recommended index to create:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>] ON [dbo].[Candidates] ([gender],[displayName]);

As you can see, the column order has no any considerations on selectivity, they are just a comma-delimited list of column order in the table (actually the key columns in the recommended index will be ordered by equality predicate group first, then inequality predicate group, and inside each group follows the column order in table definition). You can double check the column order in the table definition by executing sp_help system stored procedure.

2021-11-04

Solving Last-Page Insert Latch Contention by SQL Server 2019 new feature: OPTIMIZE_FOR_SEQUENTIAL_KEY

Two years ago I wrote a blog post Solving Last-Page Insert Latch Contention by Index Partitioning which explains how to solve a problem called last-page insert PAGELATCH_EX contention, That is if you have an even-increasing index on a table, such as IDENTITY or GETDATE(), then during huge amount of concurrent queries/threads inserting new rows into that table, different threads will contend for exclusive latch PAGELATCH_EX on the last data page of that index in the buffer pool. The solution that I provided on that old post requires adding a computed column on the table, append that column into the clustered index, and enabling table partitioning, which involved quite a lot of table structure changes. SQL Server 2019 introduces a new feature, OPTIMIZE_FOR_SEQUENTIAL_KEY, which is a new index option that intended to address the last page insert contention. Below demonstration shows how you do that.
--When you create the table, e.g.
CREATE TABLE DemoTbl (
Id INT IDENTITY(1, 1) NOT NULL,
col1 varchar(50),
CONSTRAINT PK_DemoTbl PRIMARY KEY CLUSTERED (Id ASC) WITH (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON)
);
GO
--When you create the index, e.g.
CREATE INDEX DemoTbl_Id ON DemoTbl (Id) WITH (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON);
GO
--Or if you need to alter an existing index:
ALTER INDEX DemoTbl_Id ON DemoTbl SET(OPTIMIZE_FOR_SEQUENTIAL_KEY = ON);
GO
--Check which indexes enabled this option in the current database:
SELECT * FROM sys.indexes WHERE optimize_for_sequential_key = 1;
GO
Simple enough :)

2021-10-11

Availability Group Listener RegisterAllProvidersIP Setting

Last month I encountered an issue on our production sql server availability group, the AG listener fails to connect after failover. After spending hours to troubleshoot the problem with Microsoft support engineer, we find out the root cause is the cluster resources didn't register all listener IP addresses. There is only one IP address of the primary server, where our sql server is actually configured across multiple subnets. In order to solve it, we executed PowerShell command Set-ClusterParameter RegisterAllProvidersIP 1 which enables the RegisterAllProvidersIP setting on the AG listener. The lesson learned: When we set up new sql server availability group, make sure the parameter (ClusterParameter RegisterAllProvidersIP 1) setting is correct and other settings please refer following link. Reference: Configure availability group listener - SQL Server Always On | Microsoft Docs It's fun to learn something on sql server that I never met before even I'm already a certified sql server DBA with 10+ years hands-on working experience! :D