HK SQL Server Doctor: Wait Statistics

Every time a request is forced to wait, SQL Server records the length of the wait, and the cause of the wait, a.k.a. the wait type, which generally indicates the resource on which the request was waiting. These are the wait statistics, and SQL Server exposes them primarily through two Dynamic Management Views:
• sys.dm_os_wait_stats (or sys.dm_db_wait_stats on Windows Azure SQL Database) – aggregated wait statistics for all wait types.
• sys.dm_os_waiting_tasks – wait statistics for currently-executing requests that are experiencing resource waits.

A thread is using the CPU (called RUNNING) until it needs to wait for a resource. It then moves to an unordered list of threads that are SUSPENDED. In the meantime, the next thread on the FIFO (first-in-first-out) queue of threads waiting for the CPU (called being RUNNABLE) is given the CPU and becomes RUNNING. If a thread on the SUSPENDED list is notified that it’s resource is available, it becomes RUNNABLE and is put on the bottom of the RUNNABLE queue. Threads continue this clockwise movement from RUNNING to SUSPENDED to RUNNABLE to RUNNING again until the task is completed.

Below illustration depicts the SQLOS architecture:

The sys.dm_exec_requests DMV [status] column reveals the status of each current request:
• RUNNING – on the CPU.
• SUSPENDED – whenever a thread requires a resource, such as a page that is not in memory, the thread yields the CPU and moves onto an unordered waiter list, with a thread status of SUSPENDED, until the required resource is available.
• RUNNABLE – if a thread is not waiting on any resource, but is not currently on the CPU, for example because the scheduler to which it is assigned currently has another session’s thread running, it will be placed in a first-in-first-out (FIFO) queue called the runnable queue and the thread status will be RUNNABLE.

You can check the waiting tasks currently active or blocked using below sql script:

SELECT blocking.session_id AS blocking_session_id ,
blocked.session_id AS blocked_session_id ,
waitstats.wait_type AS blocking_resource ,
waitstats.wait_duration_ms ,
waitstats.resource_description ,
blocked_cache.text AS blocked_text ,
blocking_cache.text AS blocking_text
FROM sys.dm_exec_connections AS blocking
INNER JOIN sys.dm_exec_requests blocked
ON blocking.session_id = blocked.blocking_session_id
CROSS APPLY sys.dm_exec_sql_text(blocked.sql_handle)
blocked_cache
CROSS APPLY sys.dm_exec_sql_text(blocking.most_recent_sql_handle)
blocking_cache
INNER JOIN sys.dm_os_waiting_tasks waitstats
ON waitstats.session_id = blocked.session_id

The result will be something like:

Then you can check the waiting resources as mentioned.

You can also also perform historical wait statistics analysis, using the data provided in sys.dm_os_wait_stats. The wait times provided by this DMV are running totals, accumulated across all sessions since the server was last restarted or the statistics were manually reset by following command:
DBCC SQLPERF (‘sys.dm_os_wait_stats’, CLEAR)
If the sql server instance has been running for quite a while and then is subjected to a significant change, such as adding an important new index, it’s worth clearing the old wait stats in order to prevent the old cumulative wait stats masking the effect of the change.

SQL Server keeps track of the wait time as total wait time. This time is inclusive of signal wait time.
You need to work out the time spent waiting on the SUSPENDED list (resource wait time) as below:
Resource Wait Time = Total Wait Time - Signal Wait Time

You can check the CPU pressure by following statements:
SELECT SUM(signal_wait_time_ms) AS TotalSignalWaitTime ,
SUM(wait_time_ms) AS TotalWaitTime ,
( SUM(CAST(signal_wait_time_ms AS NUMERIC(20, 2)))
/ SUM(CAST(wait_time_ms AS NUMERIC(20, 2))) * 100 )
AS PercentageSignalWaitsOfTotalTime
FROM sys.dm_os_wait_stats

You can check the primary resource waits by using the script below provided by Paul Randal:
WITH [Waits] AS
(SELECT
[wait_type],
[wait_time_ms] / 1000.0 AS [WaitS],
([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],
[signal_wait_time_ms] / 1000.0 AS [SignalS],
[waiting_tasks_count] AS [WaitCount],
100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage],
ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum]
FROM sys.dm_os_wait_stats
WHERE [wait_type] NOT IN (
N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR',
N'BROKER_TASK_STOP', N'BROKER_TO_FLUSH',
N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE',
N'CHKPT', N'CLR_AUTO_EVENT',
N'CLR_MANUAL_EVENT', N'CLR_SEMAPHORE',
N'DBMIRROR_DBM_EVENT', N'DBMIRROR_EVENTS_QUEUE',
N'DBMIRROR_WORKER_QUEUE', N'DBMIRRORING_CMD',
N'DIRTY_PAGE_POLL', N'DISPATCHER_QUEUE_SEMAPHORE',
N'EXECSYNC', N'FSAGENT',
N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX',
N'HADR_CLUSAPI_CALL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
N'HADR_LOGCAPTURE_WAIT', N'HADR_NOTIFICATION_DEQUEUE',
N'HADR_TIMER_TASK', N'HADR_WORK_QUEUE',
N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP',
N'LOGMGR_QUEUE', N'ONDEMAND_TASK_QUEUE',
N'PWAIT_ALL_COMPONENTS_INITIALIZED',
N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',
N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP',
N'REQUEST_FOR_DEADLOCK_SEARCH', N'RESOURCE_QUEUE',
N'SERVER_IDLE_CHECK', N'SLEEP_BPOOL_FLUSH',
N'SLEEP_DBSTARTUP', N'SLEEP_DCOMSTARTUP',
N'SLEEP_MASTERDBREADY', N'SLEEP_MASTERMDREADY',
N'SLEEP_MASTERUPGRADED', N'SLEEP_MSDBSTARTUP',
N'SLEEP_SYSTEMTASK', N'SLEEP_TASK',
N'SLEEP_TEMPDBSTARTUP', N'SNI_HTTP_ACCEPT',
N'SP_SERVER_DIAGNOSTICS_SLEEP', N'SQLTRACE_BUFFER_FLUSH',
N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
N'SQLTRACE_WAIT_ENTRIES', N'WAIT_FOR_RESULTS',
N'WAITFOR', N'WAITFOR_TASKSHUTDOWN',
N'WAIT_XTP_HOST_WAIT', N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG',
N'WAIT_XTP_CKPT_CLOSE', N'XE_DISPATCHER_JOIN',
N'XE_DISPATCHER_WAIT', N'XE_TIMER_EVENT')
AND [waiting_tasks_count] > 0
)
SELECT
MAX ([W1].[wait_type]) AS [WaitType],
CAST (MAX ([W1].[WaitS]) AS DECIMAL (16,2)) AS [Wait_S],
CAST (MAX ([W1].[ResourceS]) AS DECIMAL (16,2)) AS [Resource_S],
CAST (MAX ([W1].[SignalS]) AS DECIMAL (16,2)) AS [Signal_S],
MAX ([W1].[WaitCount]) AS [WaitCount],
CAST (MAX ([W1].[Percentage]) AS DECIMAL (5,2)) AS [Percentage],
CAST ((MAX ([W1].[WaitS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgWait_S],
CAST ((MAX ([W1].[ResourceS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgRes_S],
CAST ((MAX ([W1].[SignalS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgSig_S]
FROM [Waits] AS [W1]
INNER JOIN [Waits] AS [W2]
ON [W2].[RowNum] <= [W1].[RowNum]
GROUP BY [W1].[RowNum]
HAVING SUM ([W2].[Percentage]) - MAX ([W1].[Percentage]) < 95; -- percentage threshold

GO
The result of this script is ordered by the percentage of waits in descending order. The waits to be concerned about are those at the top of the list. Many benign waits are being filtered out.

網頁

Wait Statistics

No comments:

Post a Comment