2016-04-14

Notes of SQL Server AlwaysOn Availability Group

The AlwaysOn Availability Groups (AG) feature is a high-availability and disaster-recovery solution that provides an enterprise-level alternative to database mirroring. Introduced in SQL Server 2012, an availability group supports a failover environment for a discrete set of user databases, known as availability databases, that fail over together. An availability group supports a set of read-write primary databases and one to eight sets of corresponding secondary databases. Optionally, secondary databases can be made available for read-only access and/or some backup operations.

AG must be running on the Enterprise Edition of SQL Server 2012.

You can create any type of backup of a primary database. Alternatively, you can create log backups and copy-only full backups of secondary databases.

A given availability group can support up to three synchronous-commit availability replicas, including the current primary replica (i.e. 1 primary + 2 synchronous mode secondaries).

A WSFC Resource Group is created for every Availability Group that you create. The WSFC cluster monitors this resource group to evaluate the health of the primary replica.
The quorum for AG is based on all nodes in the WSFC cluster regardless of whether a given cluster node hosts any availability replicas.
In contrast to database mirroring, there is no witness role in AlwaysOn Availability Groups.

A Failover Cluster Instance (FCI) may be used together with an AG to enhance the availability of an availability replica. However, to prevent potential race conditions in the WSFC cluster, automatic failover of the availability group is NOT supported to or from an availability replica that is hosted on a FCI. SQL Server Failover Cluster Instances (FCIs) do NOT support automatic failover by availability groups, so any availability replica that is hosted by an FCI can only be configured for manual failover.

Do NOT use the Failover Cluster Manager to manipulate availability groups, for example:
Do NOT change any availability group properties, such as the possible owners.
Do NOT use the Failover Cluster Manager to fail over availability groups. You must use Transact-SQL or SQL Server Management Studio.

AG Failovers are NOT caused by database issues such as a database becoming suspect or corruption of a transaction log.
The primary replica sends transaction log records of each primary database to every secondary database. Every secondary replica caches the transaction log records and then applies them to its corresponding secondary database (hardens the log).
Data synchronization occurs between the primary database and each connected secondary database, independently of the other databases in an availability group. Therefore, a secondary database can be suspended or fail without affecting other secondary databases, and a primary database can be suspended or fail without affecting other primary databases.

Availability Modes:
- Asynchronous-commit mode, the primary replica commits transactions WITHOUT waiting for acknowledgement that an asynchronous-commit secondary replica has hardened the log.
- Synchronous-commit mode, before committing transactions, the primary replica waits for a synchronous-commit secondary replica to acknowledge that it has finished hardening the log.

Failover:
Synchronous-commit mode supports planned manual failover and automatic failover, if the target secondary replica is currently synchronized. The support for these forms of failover depends on the setting of the failover mode property on the failover partners. If failover mode is set to "manual" on either the primary or secondary replica, only manual failover is supported for that secondary replica. If failover mode is set to "automatic" on both the primary and secondary replicas, both automatic and manual failover are supported on that secondary replica. Automatic failover requires that both the primary replica and the target secondary replica are running under synchronous-commit mode with the failover mode set to "Automatic". In addition, the secondary replica must already be synchronized, have WSFC quorum, and meet the conditions specified by the flexible failover policy of the availability group. In Asynchronous-commit mode, the only form of failover is forced manual failover (with possible data loss), typically called forced failover.

An availability group listener is associated with a unique DNS name that serves as a virtual network name (VNN), one or more virtual IP addresses (VIPs), and a TCP port number.

The session-timeout period is an availability-replica property that determines how long connection with another availability replica can remain inactive before the AG connection is closed. The primary and secondary replicas ping each other to signal that they are still active. If no ping is received from the other replica within the session-timeout period, the replica times out. Its connection is closed, and the timed-out replica enters the DISCONNECTED state. Even if a disconnected replica is configured for synchronous-commit mode, transactions will not wait for that replica to resynchronize. The default session-timeout period for each availability replica is 10 seconds. Setting the value to less than 10 seconds creates the possibility of a heavily loaded system declaring a false failure. In the resolving role (during failover), the session-timeout period does not apply because pinging does not occur.

Automatic Page Repair:
Each availability replica tries to automatically recover from corrupted pages on a local database. If a secondary replica cannot read a page, the replica requests a fresh copy of the page from the primary replica. If primary replica cannot read a page, it broadcasts a request for a fresh copy to all the secondary replicas and gets the page from the first to respond.

AG Requirements:
1. The sql server instances that host availability replicas for a given AG must reside on separate nodes of a single WSFC cluster. A given instance can host only one availability replica per availability group. However, each instance can be used for many availability groups.

2. To enable Kerberos authentication:
- all sql server instances that host an availability replica for the AG must use the same SQL Server service account;
- domain administrator needs to manually register a Service Principal Name (SPN) with Active Directory on the SQL Server service account for the virtual network name (VNN) of the availability group listener.

3. All sql server instances that host availability replicas for an AG must use the same SQL Server Collation.

4. Enable the AlwaysOn Availability Groups feature on each sql server instance.

5. Each server instance requires a database mirroring endpoint. If a server instance that you select to host an availability replica is running under a domain user account and does not yet have a mirroring endpoint, the New Availability Group Wizard (or Add Replica to Availability Group Wizard) can create the endpoint and grant CONNECT permission to the server instance service account.

6. The user databases must be in the full recovery mode.

7. Possess at least one full database backup, required to initiate the full-recovery log chain.

8. The databases must NOT be configured for database mirroring.

Availability Mode:
If the primary replica is configured for asynchronous-commit mode, it does not wait for any secondary replica to write incoming transaction log records to disk (to harden the log).
If a given secondary replica is configured for asynchronous-commit mode, the primary replica does not wait for that secondary replica to harden the log.
If both the primary replica and a given secondary replica are both configured for synchronous-commit mode, the primary replica waits for the secondary replica to confirm that it has hardened the log (unless the secondary replica fails to ping the primary replica within the primary's session-timeout period). If primary's session-timeout period is exceeded by a secondary replica, the primary replica temporarily shifts into asynchronous-commit mode for that secondary replica. When the secondary replica reconnects with the primary replica, they resume synchronous-commit mode.
If a synchronous-commit secondary replica times out without confirming that it has hardened the log, the primary marks that secondary replica as failed. The connected state of the secondary replica changes to DISCONNECTED, and the primary replica stops waiting for confirmation from the secondary replica. This behavior ensures that a failed synchronous-commit secondary replica does not prevent hardening of the transaction log on the primary replica.

Asynchronous-Commit Mode with Only Forced Failover:
The only form of failover supported by asynchronous-commit mode is forced failover (with possible data loss). The failover target transitions to the primary role, and its copies of the databases become the primary databases. Any remaining secondary databases, along with the former primary databases, once they become available, are suspended until you manually resume them individually. Any transaction logs that the original primary replica had not yet sent to the former secondary replica are lost. This means that the new primary databases might be lacking recently committed transactions (data loss).

Synchronous-Commit Mode with Only Manual Failover:
When these replicas are connected and the database is synchronized, manual failover is supported. If the secondary replica goes down, the primary replica is unaffected. The primary replica runs exposed if NO any SYNCHRONIZED replicas exist (that is, without sending data to any secondary replica).

Synchronous-Commit Mode with Automatic Failover:
To configure an availability group for automatic failover, you need to set both the current primary replica and one secondary replica to synchronous-commit mode with automatic failover. Furthermore, for an automatic failover to be possible at a given time, this secondary replica must be synchronized with the primary replica (that is, the secondary databases are all synchronized), and the Windows Server Failover Clustering (WSFC) cluster must have quorum. If the primary replica becomes unavailable under these conditions, automatic failover occurs. The secondary replica switches to the role of primary, and it offers its database as the primary database.

No comments:

Post a Comment