2013-10-03

Why Database Integrity Checking (CHECKDB) is needed

DBCC CHECKDB is a resource expensive process, which can cause performance problem when run concurrently with business workload. However, it's a must to run it periodically, because it has the non-replaceable ability to check the logical and physical integrity of all the objects in your database. DBCC CHECKDB essentially includes DBCC CHECKALLOC, DBCC CHECKCATALOG, and DBCC CHECKTABLE on every table and view in the database.
Someone suggested stop to run DBCC CHECKDB, and instead rely only on the WITH CHECKSUM option of database full backup. By the way, CHECKSUM can only detect page corruption within the I/O subsystem (disk), but not any corruption within the memory (RAM). CHECKSUM is the default page protection, which means that when a data page is flushed to disk (usually by a periodic CHECKPOINT), the last to do is a checksum is calculated for the 8KB data page contents and written into the page's header. When, subsequently, SQL Server reads a data page into memory from disk, the first to do is that it recalculates the page checksum and checks it against the value stored in the page header. If the checksums do not match, SQL Server knows that something in the I/O subsystem. SQL Server will also verify the page checksums during database backup, but only when the WITH CHECKSUM specified on the BACKUP statement.
So why isn't using WITH CHECKSUM with BACKUP a substitute for consistency checking using DBCC CHECKDB, as both operations will check page checksums? Consider the following scenario: SQL Server reads a data page into memory and then modifies it. Unfortunately, after the page is modified, but before the next checkpoint operation, a faulty memory chip causes a corruption in the 8KB block of memory that is holding the modified data page. When the checkpoint occurs, it calculates a page checksum over the data file page contents, including the portion corrupted by the faulty memory chip, writes the page checksum into the page header, and the page is written out to disk. Later, a query causes SQL Server to read this data page from disk and so it validates the page checksum and, assuming nothing went wrong at the I/O-subsystem, the checksum values will match, and it will not detect any problem. The page is corrupt in memory, but the page checksum algorithm cannot detect it. Similarly, a backup with checksum will not detect in-memory corruption; the backup operation simply reads the data pages, calculates the checksum, compares the value to what it was when the page was last written to disk, finds the values match, and writes the in-memory corrupted page into the backup file.
DBCC CHECKDB interprets the page contents (for example, validating row structures, validating column values, checking linkages in indexes), and so should always be able to detect a corrupt page, even if the corruption happened while in memory. We need to run regular consistency checks by DBCC CHECKDB, but not necessarily on the production server. A good alternative is to restore database backup on another server and then run CHECKDB on it. Or if your database is mirrored, you can run CHECKDB on a database snapshot that was created from the mirror. With respect to corruption, one great feature in SQL Server 2008 is the ability to automatically repair corrupt pages during database mirroring, you can see the article "Automatic Page Repair During a Database Mirroring Session" for more information.
To summarize, DBCC CHECKDB consistency check should always be run regularly, using whatever method allows you to run them that does not impact, or at least minimizes the impact on, your production workload.

No comments:

Post a Comment