2020-01-15

Change Data Capture (CDC) for tracking data change

SQL Server provides two features that track changes to table data: Change Data Capture (CDC) and Change Tracking (CT). While CT is a synchronous mechanism and less overhead, it only captures the event that rows in a table were changed, but does not capture the actual data. On the other hand, CDC relies on SQL Server Agent jobs to asynchronously capture changes for a table, both the fact that changes were made and the actual data. Before SQL Server 2016, CDC was an enterprise edition only feature. Starting from SQL Server 2016, CDC is also available in standard edition. Below picture is obtained from online SQL Docs illustrating the architecture of CDC.
Source tables are the user tables in your database that enabled CDC. When DML statements applied on user tables, the database modifications made are recorded on transaction log. A capture agent job is created when the first table in the database is enabled for CDC, this capture job periodically scans the transaction log and adds information about changes into change tables. SQL developers are provided with built-in functions to enumerate the captured changes.

Let's see a demo on how to use CDC.

1. Enable CDC for the current database.
USE TestDB;
GO

EXEC sys.sp_cdc_enable_db; 

2. Create a new table with primary key.
CREATE TABLE TestTable (ID int, col varchar(50), col2 varchar(50), CONSTRAINT PK_TestTable PRIMARY KEY (ID));

3. Enable CDC for the user table.
EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'TestTable', @role_name = NULL;
Two Agent jobs are created with each CDC enabled database: one that is used to capture changes, another is responsible for change table cleanup.
The capture job is started immediately and runs continuously. By default, it captures a maximum of 1000 transactions per cycle with a wait of 5 seconds between cycles. The cleanup job runs once daily at 2AM, and retains change table entries for 4320 minutes. You can check these configurations by executing sys.sp_cdc_help_jobs stored procedure.


4. List the tables which enabled CDC, get the capture_instance name.
EXEC sys.sp_cdc_help_change_data_capture;
By default, the capture instance name is derived from the source schema name plus the source table name in the format schemaname_sourcename, in our example, it's dbo_TestTable.

5. Apply some DML on the table, then uses the fn_cdc_get_all_changes function to get the changes.
DECLARE @begin binary(10), @end binary(10);
SET @begin = sys.fn_cdc_get_min_lsn('dbo_TestTable');
SET @end = sys.fn_cdc_get_max_lsn();
SELECT CASE __$operation WHEN 1 THEN 'delete' WHEN 2 THEN 'insert' WHEN 3 THEN 'PreUpdate' WHEN 4 THEN 'PostUpdate' ELSE 'Unknown' END AS Operation, *
    FROM cdc.fn_cdc_get_all_changes_dbo_TestTable(@begin, @end, N'all update old') ORDER BY __$start_lsn, __$seqval;
GO

As you can see, rolled back DML won't be captured
After inserted one row, the enumerate function returns one row stating an insert for a new row:
When multiple rows were inserted within one transaction, those inserts have the same __$start_lsn:
You can also use fn_cdc_get_net_changes function to get the changes, which only returns one net change row for each source row changed within the specified LSN range.


SQL Server provides quite a few functions and stored procedures that you can use to enumerate and manipulate CDC change tables according to your application requirement. Please refer to the online SQL Docs for all of them.