2025-12-02

Slow tempdb disk leads to unexpected failover of the Availability Group

One of my clients ran into a problem: their production SQL Server availability group kept failing over back and forth. Below is the SQL Server error log from the time of the failover.



I recommended that they move tempdb off the overloaded disk.
In the meantime, to reduce the likelihood of unexpected failovers, I recommended that they adjust the following cluster settings:

1. Set the LeaseTimeout and HealthCheckTimeout values of the Availability Group to 60000 (milliseconds), as depicted below.

2. Raise the cluster heartbeat delay and threshold values, because 1/2 * LeaseTimeout should be lower than SameSubnetThreshold * SameSubnetDelay. With a LeaseTimeout of 60000 ms, that product has to exceed 30000 ms. Do this by executing the following PowerShell commands:

3. Take the AG resource offline and bring it back online, or perform a switchover, to apply the changes.
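
The three steps can be sketched in PowerShell roughly as follows. This is a minimal sketch, not the exact commands used for the client: the resource name `MyAG` is hypothetical, and the heartbeat values (a 2000 ms delay and a threshold of 20) are assumptions chosen only because they satisfy the inequality from step 2 (20 * 2000 = 40000 ms, which is above 60000 / 2 = 30000 ms). Run it on a cluster node with the FailoverClusters module available.

```powershell
# Sketch only: the resource name and heartbeat values below are assumptions.
Import-Module FailoverClusters

$agResource   = 'MyAG'   # hypothetical name of the AG cluster resource
$leaseTimeout = 60000    # ms, value from step 1
$healthCheck  = 60000    # ms, value from step 1
$delay        = 2000     # ms, assumed SameSubnetDelay
$threshold    = 20       # assumed SameSubnetThreshold (missed heartbeats)

# Guard: 1/2 * LeaseTimeout should be lower than SameSubnetThreshold * SameSubnetDelay.
if (($leaseTimeout / 2) -ge ($threshold * $delay)) {
    throw "Heartbeat window ($($threshold * $delay) ms) is too small for LeaseTimeout $leaseTimeout ms."
}

# Step 1: set the lease and health check timeouts on the AG resource.
Get-ClusterResource -Name $agResource |
    Set-ClusterParameter -Name LeaseTimeout -Value $leaseTimeout
Get-ClusterResource -Name $agResource |
    Set-ClusterParameter -Name HealthCheckTimeout -Value $healthCheck

# Step 2: raise the same-subnet heartbeat delay and threshold on the cluster.
$cluster = Get-Cluster
$cluster.SameSubnetDelay     = $delay
$cluster.SameSubnetThreshold = $threshold

# Step 3: cycle the AG resource so the new values apply.
# This takes the availability group offline briefly; schedule it accordingly.
Stop-ClusterResource  -Name $agResource
Start-ClusterResource -Name $agResource
```

Afterwards, `Get-ClusterResource -Name MyAG | Get-ClusterParameter` and `Get-Cluster | Format-List *Subnet*` should show the new values. For a multi-subnet cluster, the CrossSubnetDelay and CrossSubnetThreshold properties would likely need a matching review.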
