2026-03-02

A filtered index can lead to a reduction in query performance

This month, I faced a compelling performance challenge that I am eager to share. My client reported that one of their older data cleanup jobs was executing very slowly. Upon reviewing the Query Store and the table's indexes, I discovered something quite intriguing.

The image below depicts the execution plan obtained from the Query Store.



At its essence, the query is simple: a DELETE statement with a WHERE clause that filters the rows to be deleted on an AccountId column, which already has a nonclustered index.

Interestingly, the execution plan does not just use the nonclustered index on AccountId to retrieve the qualifying rows (visible on the far right of the plan); it also performs a Key Lookup on the table's clustered index.

There is also a missing-index suggestion recommending what amounts to a duplicate index with the ToBeDeleted column added as an INCLUDEd column. This seems strange, as the DELETE query does not reference the ToBeDeleted column at all.

Let’s examine the Key Lookup operator more closely: it retrieves the ToBeDeleted column, which is not present in the nonclustered index used by the operator at the top right of the plan.


Indeed, a filtered index is available on that table, specifically filtered by the ToBeDeleted column, as shown below.


The filtered index has a single key column, AccountId, and a filter predicate on the ToBeDeleted column, but no INCLUDEd columns.

The following blog post explains why the extra Key Lookup appears: because the DELETE must also maintain the filtered index, the engine has to read the filter column for every qualifying row to decide whether that row exists in the filtered index. It advises adding the filtered column as either a key column or an INCLUDEd column.
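A minimal sketch of that advice, with hypothetical table and index names (the real schema is not shown in this post):

```sql
-- Hypothetical schema for illustration; all names are assumptions.
CREATE TABLE dbo.Account
(
    Id          INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    AccountId   INT NOT NULL,
    ToBeDeleted BIT NOT NULL DEFAULT 0
);

-- The nonclustered index the DELETE seeks on:
CREATE NONCLUSTERED INDEX IX_Account_AccountId
    ON dbo.Account (AccountId);

-- A filtered index on the same table forces the DELETE to fetch
-- ToBeDeleted (via a Key Lookup) so it can maintain this index:
CREATE NONCLUSTERED INDEX IX_Account_ToBeDeleted
    ON dbo.Account (AccountId)
    WHERE ToBeDeleted = 1;

-- Adding the filter column to the seek index removes the Key Lookup:
CREATE NONCLUSTERED INDEX IX_Account_AccountId
    ON dbo.Account (AccountId)
    INCLUDE (ToBeDeleted)
    WITH (DROP_EXISTING = ON);
```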

In my opinion, it is important to assess the actual usefulness of a filtered index before adding one, and to verify that it genuinely improves query performance. Also note that the optimizer can typically only match a filtered index when the query compares the filtered column against a literal constant rather than a parameter, so its effectiveness is easy to overestimate.

2026-02-02

SQL Server terminated unexpectedly due to memory exhaustion and an Oracle Linked Server

This month, I want to discuss an incident involving one of my clients, where their production SQL server was unexpectedly terminated. 


Moments before the SQL Server was terminated unexpectedly, an EXCEPTION_ACCESS_VIOLATION error was raised by a stored procedure call, as shown below:


The Stack Dump generated by SQL Server just before the service was shut down indicates that the exception is related to the Oracle Provider for OLE DB (OraOLEDB), as shown below:


Analyzing the minidump (.mdmp file) produced by SQL Server at crash time with the WinDbg tool likewise points to OraOLEDB as the cause of the error, as shown below:


With the Oracle Linked Server Provider option set to "Allow inprocess" as depicted in the image below, an exception in the linked server provider may lead to a crash of the SQL Server.
(Ref.: Create Linked Servers - SQL Server | Microsoft Learn
SQL Server service crashes when you run an Oracle linked server query - SQL Server | Microsoft Learn)
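As a side note, the in-process setting can be toggled with the documented sp_MSset_oledb_prop procedure; running the provider out-of-process isolates its faults from the SQL Server process (at some performance cost). The provider name below is an assumption — check your own linked server configuration:

```sql
-- Run the Oracle OLE DB provider out-of-process so that an access
-- violation inside the provider cannot take down sqlservr.exe.
-- Provider name 'OraOLEDB.Oracle' is an assumption; verify yours in SSMS.
EXEC master.dbo.sp_MSset_oledb_prop N'OraOLEDB.Oracle',
                                    N'AllowInProcess', 0;
```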


To address this issue, I lowered the SQL Server maximum memory limit from the original 95% to 85%. The server has 1.25TB of total memory, and the system administrator at my client's company believed that reserving 5% (64GB) for the Windows OS was sufficient on a server dedicated to SQL Server. However, he had not considered that the Oracle OLEDB Linked Server provider also consumes memory outside the SQL Server buffer pool: as more concurrent SQL sessions invoke Oracle Linked Server queries, its memory usage grows. After I lowered the SQL Server maximum memory, leaving more headroom for the Oracle OLEDB provider, the problem was resolved.
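The adjustment can be made with sp_configure; the MB figure below is my own arithmetic for 85% of 1.25TB and is approximate:

```sql
-- 1.25 TB = 1,280 GB total; 85% = 1,088 GB = 1,114,112 MB.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 1114112;
RECONFIGURE;
```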

2026-01-05

TempDB LOGBUFFER and IO_COMPLETION Waits

We observed numerous SQL sessions waiting on LOGBUFFER and IO_COMPLETION; all of them were executing an INSERT INTO a #Temp table, as illustrated below.


Upon examining the count of Virtual Log Files (VLFs) in the tempdb .ldf file, we discovered that the tempdb transaction log was significantly fragmented.
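On SQL Server 2016 SP2 and later, the VLF count can be checked with the sys.dm_db_log_info DMF; a quick sketch:

```sql
-- Count the Virtual Log Files (VLFs) in the tempdb transaction log.
SELECT COUNT(*) AS VLF_Count
FROM sys.dm_db_log_info(DB_ID('tempdb'));
```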


Subsequently, we defragmented the VLFs in the tempdb transaction log file using the SQL commands below (shrinking the log and then regrowing it in a single step keeps the VLF count low):
use tempdb;
DBCC SHRINKFILE ('templog', 0, TRUNCATEONLY);
ALTER DATABASE [tempdb] MODIFY FILE (NAME = 'templog', SIZE = 150000MB);

The VLF count in the tempdb log file dropped to 32, as shown below:

We also observed that the DB Full Backup job was planned to run at that time. Thus, we rescheduled the backup job to operate during off-peak hours.

We also found that the maximum memory allocation for SQL Server was set to 75% of the total server memory, which we believe came from accepting the SQL Server installer GUI's suggestion.
We therefore raised the max server memory limit to 1216GB, 95% of the 1.25TB total, since the server is chiefly dedicated to SQL Server services; reserving 64GB of RAM for the Windows operating system and other miscellaneous tasks should be adequate.

Finally, we relocated the tempdb transaction log file to a separate disk that offers improved write speed.

2025-12-02

A slow tempdb disk leads to unexpected failovers of the Availability Group

A problem arose for one of my clients, as their production SQL Server availability group kept failing over back and forth. Presented below is the SQL Server error log from when the failover took place.



I recommended that they relocate the tempdb from the overloaded disk.
In the meantime, to minimize the likelihood of unexpected failover, I recommended that they adjust certain cluster settings as outlined below:

1. Set the LeaseTimeout and HealthCheckTimeout values to 60000 in the Availability Group, as depicted below.

2. Raise the heartbeat delay and threshold values, as 1/2 * LeaseTimeout should be lower than SameSubnetThreshold * SameSubnetDelay, by executing the following PowerShell commands:

3. Bring the AG group offline and subsequently online, or perform a switchover, to apply the changes.
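The heartbeat changes in step 2 can be sketched with the FailoverClusters PowerShell module; the specific values below are illustrative assumptions, chosen so that SameSubnetThreshold * SameSubnetDelay (20 * 2000 ms = 40 s) exceeds half of the 60 s LeaseTimeout:

```powershell
# Raise WSFC heartbeat tolerance (values are illustrative assumptions).
Import-Module FailoverClusters
(Get-Cluster).SameSubnetDelay      = 2000   # ms between heartbeats
(Get-Cluster).SameSubnetThreshold  = 20     # missed heartbeats tolerated
(Get-Cluster).CrossSubnetDelay     = 4000
(Get-Cluster).CrossSubnetThreshold = 20

# Verify the new settings:
Get-Cluster | Format-List *Subnet*
```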

2025-11-02

Gather the Usage and Size of all Indexes across every database

EXEC sp_MSforeachdb 'USE [?];
SELECT DB_NAME() AS DB,
    OBJECT_NAME(i.[object_id]) AS [TableName],
    i.name AS [IndexName],
    i.index_id,
    s.user_seeks,
    s.user_scans,
    s.user_lookups,
    s.user_updates,
    i.type_desc AS [IndexType],
    s.last_user_seek,
    s.last_user_scan,
    s.last_user_lookup,
    s.last_user_update,
    SUM(ps.used_page_count) * 8 / 1024 AS [Used Space (MB)],
    SUM(ps.reserved_page_count) * 8 / 1024 AS [Reserved Space (MB)]
FROM
    sys.indexes AS i
    INNER JOIN sys.dm_db_partition_stats AS ps
        ON ps.[object_id] = i.[object_id] AND ps.index_id = i.index_id
    -- LEFT JOIN keeps indexes with no usage recorded since the last restart;
    -- an INNER JOIN here would silently drop never-used indexes.
    LEFT JOIN sys.dm_db_index_usage_stats AS s
        ON s.[object_id] = i.[object_id] AND s.index_id = i.index_id
        AND s.database_id = DB_ID()
WHERE
    OBJECTPROPERTY(i.[object_id], ''IsUserTable'') = 1
GROUP BY
    OBJECT_NAME(i.[object_id]),
    i.name,
    i.index_id,
    s.user_seeks,
    s.user_scans,
    s.user_lookups,
    s.user_updates,
    i.type_desc,
    s.last_user_seek,
    s.last_user_scan,
    s.last_user_lookup,
    s.last_user_update
ORDER BY
    [TableName], [IndexName];
';
 

2025-10-03

Assessing the Latency of Availability Group Database Synchronization

The SQL script provided below can be utilized to assess the real-time latency in seconds of SQL Server AlwaysOn synchronization.

;WITH AG_Stats AS
(
    SELECT
        AR.replica_server_name,
        AG.name AS AGName,
        HARS.role_desc,
        DB_NAME(DRS.database_id) AS DBName,
        DRS.last_commit_time
    FROM sys.dm_hadr_database_replica_states DRS
        INNER JOIN sys.availability_replicas AR
            ON DRS.replica_id = AR.replica_id
        INNER JOIN sys.dm_hadr_availability_replica_states HARS
            ON AR.group_id = HARS.group_id AND AR.replica_id = HARS.replica_id
        INNER JOIN sys.availability_groups AG
            ON AG.group_id = AR.group_id
),
Pri_CommitTime AS
(
    SELECT replica_server_name, AGName, DBName, last_commit_time
    FROM AG_Stats
    WHERE role_desc = 'PRIMARY'
),
Sec_CommitTime AS
(
    SELECT replica_server_name, AGName, DBName, last_commit_time
    FROM AG_Stats
    WHERE role_desc = 'SECONDARY'
)
SELECT
    p.replica_server_name AS PrimaryReplica,
    p.AGName,
    p.DBName AS DatabaseName,
    s.replica_server_name AS SecondaryReplica,
    DATEDIFF(SECOND, s.last_commit_time, p.last_commit_time) AS Sync_Latency_Secs
FROM Pri_CommitTime p
    LEFT JOIN Sec_CommitTime s
        ON s.DBName = p.DBName AND s.AGName = p.AGName;

Example output is shown below:



2025-09-14

Snapshot Replication Agent Profile's Parameter BcpBatchSize

This week, a customer of mine shared the image below, mentioning that their Snapshot Replication was performing poorly.

The customer confirmed that the snapshot replication itself was still operating correctly. I then asked whether the replicated tables had grown in size, and they confirmed they had. Consequently, I recommended reducing the BcpBatchSize setting in the Snapshot Agent Profile from its default of 100000 rows to 50000. This can be done by creating a new agent profile derived from the default profile, setting BcpBatchSize to 50000, and then assigning the new profile to the Snapshot publication.


Lowering it can reduce memory usage and improve stability for large datasets, though it may slow down snapshot generation.
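The profile change can also be scripted at the Distributor with the documented replication procedures; the profile name below is a made-up example:

```sql
-- Run at the Distributor. Creates a custom Snapshot Agent profile
-- (agent_type = 1) and adds a -BcpBatchSize override to it.
-- The profile name is a hypothetical example.
DECLARE @profile_id INT;

EXEC sp_add_agent_profile
     @profile_id   = @profile_id OUTPUT,
     @profile_name = N'Snapshot_SmallBcpBatch',
     @agent_type   = 1,   -- Snapshot Agent
     @description  = N'Default profile with BcpBatchSize lowered to 50000';

EXEC sp_add_agent_parameter
     @profile_id      = @profile_id,
     @parameter_name  = N'-BcpBatchSize',
     @parameter_value = N'50000';
```

The new profile still has to be assigned to the Snapshot Agent (for example in Replication Monitor or Agent Profiles in SSMS) before it takes effect.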