2015-04-10

Analyzing Deadlock using built-in "system _health" Extended Event



Background:
By default, SQL Server automatically starts an Extended Events Session called “system_health” when it start-up. This session collects information includes any deadlocks that are detected. (ref. http://blogs.msdn.com/b/psssql/archive/2008/07/15/supporting-sql-server-2008-the-system-health-session.aspx
Without any extra monitoring and loading, this default enabled “system_health” X-Events Session gives us a way to check the deadlock information, including the locked objects (table/index) and SQL statements/stored procedure calls involved in the deadlock.
This “system_health” events session logs detected events into memory, ring buffer size 4MB, when the buffer fills up it will overwrite the oldest entries.

How to do:
  1. Run the following SELECT statement in production server:
    (ref.
    http://blogs.msdn.com/b/sqlserverfaq/archive/2013/04/27/an-in-depth-look-at-sql-server-memory-part-2.aspx)Use Master
SELECT
       xed.value('@timestamp', 'datetime') as Creation_Date,
       xed.query('.') AS Extend_Event
FROM
(
       SELECT CAST([target_data] AS XML) AS Target_Data
       FROM sys.dm_xe_session_targets AS xt
       INNER JOIN sys.dm_xe_sessions AS xs
       ON xs.address = xt.event_session_address
       WHERE xs.name = N'system_health'
       AND xt.target_name = N'ring_buffer'
) AS XML_Data
CROSS APPLY Target_Data.nodes('RingBufferTarget/event[@name="xml_deadlock_report"]') AS XEventData(xed)
ORDER BY Creation_Date DESC

  1. The result is ordered by the time of the deadlock (latest on top) like following:


  2. Click on the xml inside “Extend_Event” in the result, it will show you the deadlock xml report (the xml tags < and > were transformed to &lt; and &gt;, you can copy the XML into a notepad, and replace them for better view). E.g.
<event name="xml_deadlock_report" package="sqlserver" id="123" version="1" timestamp="2015-04-10T04:52:33.351Z">
<data name="xml_report">
<type name="unicode_string" package="package0" />
<value><deadlock-list>
<victim-list>
<victimProcess id="process586d708"/>
<process-list>
<process id="process586d708" taskpriority="0" logused="432" waitresource="RID: 5:1:144:0" waittime="1625" ownerId="90452" transactionname="user_transaction" lasttranstarted="2015-04-10T12:52:21.713" XDES="0x85a15950" lockMode="U" schedulerid="4" kpid="3624" status="suspended" spid="57" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2015-04-10T12:52:21.710" lastbatchcompleted="2015-04-10T12:51:59.060" clientapp="Microsoft SQL Server Management Studio - Query" hostname="WIN-4VPNBDSGB8D" hostpid="2636" loginname="WIN-4VPNBDSGB8D\Administrator" isolationlevel="read committed (2)" xactid="90452" currentdb="5" lockTimeout="4294967295" clientoption1="673319008" clientoption2="390200">
<executionStack>
<frame procname="" line="14" stmtstart="358" stmtend="428" sqlhandle="0x03000500045a3d02c4b2b60076a400000100000000000000">
</frame>
<frame procname="" line="1" sqlhandle="0x0100050001cf4e1e70280380000000000000000000000000">
</frame>
</executionStack>
<inputbuf>
EXEC uspDeadlockTest_2 2002 </inputbuf>
</process>
<process id="process5852988" taskpriority="0" logused="432" waitresource="RID: 5:1:146:0" waittime="2718" ownerId="90449" transactionname="user_transaction" lasttranstarted="2015-04-10T12:52:20.613" XDES="0x856d7950" lockMode="U" schedulerid="3" kpid="2868" status="suspended" spid="55" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2015-04-10T12:52:20.610" lastbatchcompleted="2015-04-10T12:51:33.770" clientapp="Microsoft SQL Server Management Studio - Query" hostname="WIN-4VPNBDSGB8D" hostpid="2636" loginname="WIN-4VPNBDSGB8D\Administrator" isolationlevel="read committed (2)" xactid="90449" currentdb="5" lockTimeout="4294967295" clientoption1="673319008" clientoption2="390200">
<executionStack>
<frame procname="" line="13" stmtstart="354" stmtend="424" sqlhandle="0x0300050059ed607f6ddab50076a400000100000000000000">
</frame>
<frame procname="" line="1" sqlhandle="0x010005008f2a720ac034ac82000000000000000000000000">
</frame>
</executionStack>
<inputbuf>
EXEC uspDeadlockTest_1 101 </inputbuf>
</process>
</process-list>
<resource-list>
<ridlock fileid="1" pageid="144" dbid="5" objectname="" id="lock82cf8680" mode="X" associatedObjectId="72057594038779904">
<owner-list>
<owner id="process5852988" mode="X"/>
</owner-list>
<waiter-list>
<waiter id="process586d708" mode="U" requestType="wait"/>
</waiter-list>
</ridlock>
<ridlock fileid="1" pageid="146" dbid="5" objectname="" id="lock8013dd80" mode="X" associatedObjectId="72057594038845440">
<owner-list>
<owner id="process586d708" mode="X"/>
</owner-list>
<waiter-list>
<waiter id="process5852988" mode="U" requestType="wait"/>
</waiter-list>
</ridlock>
</resource-list>
</deadlock>
</deadlock-list>
</value>
<text />
</data>
</event>

  1. As you can see above, the affected sql statements or stored procedure calls are in the <inputbuf> element. The deadlock victim (process being killed and rolled-back by SQL Server engine) is identified by the process ID in <victimProcess>. The objects (table/index) involved are represented by Resource-IDs in different formats, including TAB (table), PAGE, KEY (index), and RID (row).
    (ref.: https://support.microsoft.com/en-us/kb/224453 – section “Waitresource”)

  1. In order to resolve the waitresource ID into the table/index name, you can run the follow SQL statements.
    (these SQL can also be run in a Non-production sql server with a production database backup image restored on it, better the latest backup, as the object or data-page may not exists if the backup was too old).

The 1st number in all different waitresource ID formats is the db_id, you can check the database name by this SQL:
SELECT DB_NAME(db_id) /* this DB_NAME statement can only be run on production server */

Then you can run the follow SQL for different waitresource formats, by setting the current database as the affected database:
USE <DBName>
GO

For Object ID (OBJECT:db_id:object_id) / Table ID (TAB:db_id:object_id) :
SELECT OBJECT_NAME(object_id);

For HOBT ID (KEY:db_id:hobt_id) :
SELECT o.name AS TableName, i.name AS IndexName, SCHEMA_NAME(o.schema_id) AS SchemaName FROM sys.partitions p JOIN sys.objects o ON p.OBJECT_ID = o.OBJECT_ID JOIN sys.indexes i ON p.OBJECT_ID = i.OBJECT_ID AND p.index_id = i.index_id WHERE p.hobt_id = hobt_id

For Page ID (PAGE:db_id:file_id:page_id) / Row ID (RID:db_id:file_id:page_id:slot) :
DBCC TRACEON(3604)
GO
DBCC PAGE('DBName', file_id, page_id)
The object id will be shown in the result, Metadata: Object_Id = ???
The index id will be shown in the result, Metadata: IndexId = ???
Then you can get the table name and index name by OBJECT_NAME function and sys.indexes DMV.
SELECT OBJECT_NAME(Object_Id)
SELECT name FROM sys.indexes WHERE object_id = object_Id AND index_id = IndexId

No comments:

Post a Comment