Transactional Replication Latency

Here're the steps to troubleshoot transactional replication latency:

1. Launch Replication Monitor, insert a tracer token, check the latency is on publisher to distributor (i.e. log reader agent), or distributor to subscriber (i.e. distribution agent).

2. If the bottleneck is the log reader agent (publisher to distributor), check it's running. View its detail history log, if there are many "The Log Reader Agent is scanning the transaction log for commands to be replicated" messages, then mostly the database is either re-indexing or importing bulk data, which is the problem cause.
Suggestions: Schedule the index rebuild task on maintenance window. Split bulk data import into multiple of batches.
Also, if there are many virtual log files (VLFs) inside the transaction log of the published DB, the log reader agent latency will be high. You can use DBCC LOGINFO to check how many VLFs are there, and follow Kimberly's blog (step 8) to reduce the number of VLFs.

3. If the bottleneck is the distribution agent (distributor to subscriber), check it's running.

4. Run "select * From MSdistribution_history" at distribution database, the bottleneck is at the reader thread if there are records with comments start with <stats state="2"

5. Run below commands at publisher/distribution database, find any publications of the database that "immediate_sync" option was enabled. It does impact the overall performance, especially the reader thread of distribution agent.
USE <publishedDB>
GO
EXEC sp_helppublication '<publication>'
GO
USE distribution
GO
select distinct p.publication from MSsubscriptions s inner join MSpublications p
on s.publication_id=p.publication_id
inner join MSarticles a on s.article_id=a.article_id where subscriber_id < 0
Go
select distinct p.publication,p.publisher_db,ser.name as publisherServer from MSsubscriptions s inner join MSpublications p
on s.publication_id=p.publication_id
inner join sys.servers ser on p.publisher_id =ser.server_id
inner join MSarticles a on s.article_id=a.article_id where subscriber_id < 0
go
select *From MSdistribution_agents where subscriber_id < 0

6.Disable the immediate_sync
exec sp_changepublication @publication = 'publicationname' ,@property = 'allow_anonymous' , @value = false
exec sp_changepublication @publication = 'publicationname' , @property = 'immediate_sync' , @value = false

7. Run below command at subscription database several times, you may find the transaction_timestamp never changed, e.g. 0x001323490003A69C0088000000000000
select transaction_timestamp, * From MSreplication_subscriptions

8. Run "select * from MSsubscription_agents" (in distribution database for push subscription; in subscription database for pull subscription), to get the session id of distribution agent, e.g. 74.

9. Run "select *From sys.sysprocesses where spid=74" several times. If the CPU and physical_io kept increasing. it indicates that the distribution agent is updating the subscription.

10. Base on the result of step 7 and step 9, it's suspected that the latency may caused by big transaction.

11. Run command at distribution database
select xact_seqno,count(*) From MSrepl_commands
where xact_seqno > 0x001323490003A69C0088000000000000 -- transaction_timestamp obtained in step 7
group by xact_seqno
order by count(*) desc

12. If there are any transactions with many commands, let's say more than 100000 commands, was delivering into the subscription. Then you find the root cause.
Suggestion: Consider to use "Publishing Store Procedure feature".

No comments:

Post a Comment