2019-02-07

Indexed View Overhead on DML - Part 1 (Non-aggregrated view)

Starting from SQL Server 2000, Indexed View is a powerful tool to speed up complicated SELECT queries. In this series, I would like to dive into the indexed view maintenance overhead on DML statements. In this blog post, I will first focus on indexed view with no any aggregation, i.e. no any group by clause.
Like indexes, every time we modify data in the underlying tables then not only must SQL Server maintain the index entries on those tables, but also the index entries on the indexed view. This can affect write performance. SQL Server maintains the indexed view by embedding extra operations into the execution plan of the DML statement.
Let's say we have two tables, the Account table stores account id, player name, and the current balance; and the PlayerBetLog table stores the players' betting records.

USE [TESTDB]
GO
CREATE TABLE [dbo].[Account] (
 [accid] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
 [name] [varchar](50) NOT NULL,
 [balance] [int] NOT NULL DEFAULT(0)

)
GO
CREATE TABLE [dbo].[PlayerBetLog] (
 [dividendId] [bigint] IDENTITY(1,1) NOT NULL PRIMARY KEY,
 [amount] [decimal](18, 2) NOT NULL,
 [playerId] [int] NOT NULL,
 [gameCat] [int] NOT NULL
);



The balance column in the Account table will be frequently updated, but new records will be seldomly added, and existing records never be deleted. On the other hand, new records will be frequently inserted into the PlayerBetLog table, but never change.
Let's insert some dummy data into the two tables:


Before adding an indexed view, let's run a typical DML workload, to see the execution plan, statistics I/O, and locked resources:


SET NOCOUNT ON;
SET ARITHABORT OFF;
SET STATISTICS IO ON;
BEGIN TRAN
INSERT PlayerBetLog (playerId, gameCat, amount) SELECT 1, 2, 10 UNION ALL SELECT 1, 2, 20 UNION ALL SELECT 2, 2, 100;
UPDATE Account SET balance += 10 WHERE accid = 1;
SET STATISTICS IO OFF;
-- check locked resources
SELECT
 t1.request_session_id AS spid
 ,t1.resource_type
 ,DB_NAME(t1.resource_database_id) AS DB
 ,(CASE resource_type
  WHEN 'DATABASE' THEN ''
  WHEN 'FILE' THEN ''
  WHEN 'OBJECT' THEN OBJECT_NAME(t1.resource_associated_entity_id, t1.resource_database_id)
  WHEN 'EXTENT' THEN ''
  WHEN 'APPLICATION' THEN ''
  WHEN 'METADATA' THEN ''
  WHEN 'ALLOCATION_UNIT' THEN (
   SELECT TOP 1 OBJECT_NAME(p.[object_id])
   FROM sys.partitions AS p JOIN sys.allocation_units AS a
   ON a.container_id = (CASE a.[type] WHEN 0 THEN NULL WHEN 2 THEN p.partition_id ELSE p.hobt_id END)
   WHERE a.allocation_unit_id = t1.resource_associated_entity_id
  )
  ELSE (SELECT TOP 1 OBJECT_NAME([object_id]) FROM sys.partitions WHERE hobt_id = t1.resource_associated_entity_id)
  END) AS [object_name]
 ,i.name AS index_name
 ,t1.request_mode
 ,t1.request_status
FROM sys.dm_tran_locks AS t1 LEFT JOIN sys.dm_os_waiting_tasks AS t2
 ON t1.lock_owner_address = t2.resource_address
LEFT JOIN sys.partitions AS s ON s.hobt_id = t1.resource_associated_entity_id
LEFT JOIN sys.indexes AS i on i.[object_id] = s.[object_id] and i.index_id = s.index_id
LEFT JOIN (SELECT t.[text], p.*
FROM sys.sysprocesses AS p
CROSS APPLY sys.dm_exec_sql_text(p.[sql_handle]) AS t
) AS t3 ON t1.request_session_id = t3.spid
WHERE t3.spid = @@spid
ORDER BY index_name, resource_type

ROLLBACK;




Below is the execution plan:
As you can see in the above picture, the main operator on each DML statement only the Clustered Index Insert / Update operator.


Let's also check the statistics I/O by using http://statisticsparser.com/
Table 'PlayerBetLog'. Scan count 0, logical reads 12, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Account'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


Below are the locked resources:
 

Now let's add an indexed view, which joins the Account table and the PlayerBetLog table, but excluding the frequently updated balance column:
USE [TESTDB]
GO
-- View
CREATE VIEW [dbo].[VwTranReport] WITH SCHEMABINDING AS
 SELECT A.accid, A.name, B.dividendId, B.amount, B.gameCat FROM dbo.Account A JOIN dbo.PlayerBetLog B ON A.accid = B.playerId;
GO
-- Unique Clustered Index on View
CREATE UNIQUE CLUSTERED INDEX [VUX_VwTranReport] ON [dbo].[VwTranReport] ([dividendId]);
GO
-- another Nonclustered Index on View
CREATE NONCLUSTERED INDEX [NCI_VwTranReport_accid] ON [dbo].[VwTranReport] ([accid]);
GO






Let's run again the DML workload, and compare the new execution plan, statistics I/O, and locked resources:





By comparing to the old plan on the left hand side, the new plan on the right hand side added an extra operation branch for the INSERT PlayerBetLog statement, which includes an index seek to get the Account name column value, and Index Insert to maintain the indexes on the indexed view.
On the other hand, as the indexed view excludes the balance column, the UPDATE statement on this column didn't incur any extra operation.


Below is the statistics I/O for the DML statements after indexed view added:
Table 'VwTranReport'. Scan count 0, logical reads 12, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Account'. Scan count 0, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'PlayerBetLog'. Scan count 0, logical reads 12, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Account'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Extra logical read are placed on the Account table in order to get the name column for populating the indexed view, and extra I/O on the indexed view and Worktable in order to maintain the indexed view.


Below picture compares the locked resources between the old plan (left) and the new plan (right), extra locks are being placed on the indexed view in the new plan:


Final thought:
This blog post only investigates the overhead of Non-aggregated indexed view, for aggregated one, much more complicated operations will be added into the DML execution plan. If you find adding an indexed view can really boost up a reporting query, you must do load test first to see whether your production database can afford the extra indexed view maintenance cost on the DML workload.






No comments:

Post a Comment