2015-03-03

How to reduce the database size - Part 2

In the first part of this series, I explained how to identify the most space-consuming objects and how to reclaim wasted space by defragmenting indexes with low page density. Here I explain how to implement data compression in SQL Server to reduce your data size further.

DATA COMPRESSION

If you are running the Enterprise Edition of SQL Server, you can reduce the data size by implementing Data Compression. Data compression reduces storage costs and can improve query performance by reducing I/O and increasing buffer pool hit rates. SQL Server supports two types of Data Compression for both tables and indexes – ROW and PAGE.

ROW compression addresses the storage inefficiency introduced by fixed-length data types. In a non-compressed row, the size of fixed-length data is determined by the data type, regardless of the actual value. For example, a CHAR(100) column always uses 100 bytes, even when the value is NULL. ROW compression removes that overhead: the CHAR(100) value 'SQL Server 2012' would use just 15 bytes (plus 2 offset bytes) rather than 100 bytes, and zero or NULL values take no data space at all.

PAGE compression is a superset of ROW compression and takes into account redundant data in one or more rows on a given page, using prefix and dictionary compression. In simple terms, the storage engine cuts down on repeated byte sequences in the page by storing the repeated bytes once and then referring to that value from all other occurrences on the same page. To keep the overhead low, PAGE compression is applied only when a page becomes full.
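For reference, you can check which compression type, if any, each index and partition of a table currently uses by querying sys.partitions. The table name dbo.OrderDetails below is just a placeholder – substitute your own table:

-- Show the current compression setting of every index and partition of the table
SELECT OBJECT_NAME(p.object_id) AS table_name,
       i.name AS index_name,
       p.partition_number,
       p.data_compression_desc
FROM sys.partitions AS p
JOIN sys.indexes AS i
    ON p.object_id = i.object_id AND p.index_id = i.index_id
WHERE p.object_id = OBJECT_ID('dbo.OrderDetails');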

Keep in mind that Data Compression works with IN_ROW allocation units only. It does not compress LOB or ROW_OVERFLOW data.
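You can see how much of a table actually lives in in-row pages versus LOB and row-overflow pages with sys.dm_db_partition_stats; only the in-row portion will benefit from compression. Again, dbo.OrderDetails is a placeholder name:

-- Used page counts per allocation unit type; only in-row pages are compressible
SELECT OBJECT_NAME(object_id) AS table_name,
       index_id,
       in_row_used_page_count,
       row_overflow_used_page_count,
       lob_used_page_count
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.OrderDetails');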

While data compression will usually reduce the size of your tables and indexes, you should first evaluate the estimated space savings by using either the sp_estimate_data_compression_savings system stored procedure or the Data Compression Wizard. The estimation works by copying a sample of your data into tempdb, compressing it there, and measuring the result. An intent shared (IS) lock is acquired on the table during this operation, so I suggest running the estimation on a test server rather than directly on the production server. You should also check whether the data is fragmented before considering compression, as you might be able to reduce the size of the index simply by rebuilding it instead.
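As an alternative to the wizard, you can call the stored procedure directly. A minimal example, again using the hypothetical dbo.OrderDetails table:

-- Estimate PAGE compression savings for all indexes and partitions of the table
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'OrderDetails',
    @index_id         = NULL,   -- NULL = all indexes
    @partition_number = NULL,   -- NULL = all partitions
    @data_compression = 'PAGE'; -- NONE, ROW or PAGE

The procedure returns, for each index and partition, the current size and the estimated size with the requested compression setting, in kilobytes.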

To use the Data Compression Wizard, open SQL Server Management Studio, right-click on the table you need to estimate, select Storage and click Manage Compression.

This will launch the Data Compression Wizard. On the Welcome page, click Next.
On the Select Compression Type page, use the drop-down in the Compression Type column to select the compression type, then click the Calculate button to display the estimated compressed size. You can select another compression type and click Calculate again to compare the results of the different compression types.

Once you have decided to enable compression for the table, click Next. On the Select an Output Option page, you can generate a script that you can run later to apply the selected compression to the table.
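The generated script is essentially a rebuild of the table with the chosen compression setting. For the hypothetical dbo.OrderDetails table and PAGE compression, it would look roughly like this:

-- Rebuild the table (heap or clustered index) with PAGE compression
ALTER TABLE dbo.OrderDetails
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE);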

Because the compression process can be very CPU- and disk I/O-intensive, you should apply it on a production server only during a maintenance window. Compressing a table (its clustered index) does NOT automatically compress the table's non-clustered indexes: compression is applied per object (per index, and even per partition). If you want to compress the non-clustered indexes of the table, you have to compress each of them, as shown below.
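For example, assuming an index named IX_OrderDetails_ProductID on the same hypothetical table, each non-clustered index is compressed with its own ALTER INDEX ... REBUILD statement:

-- Compress a single non-clustered index
ALTER INDEX IX_OrderDetails_ProductID
    ON dbo.OrderDetails
    REBUILD WITH (DATA_COMPRESSION = ROW);

-- Or rebuild every index on the table with the same setting
ALTER INDEX ALL
    ON dbo.OrderDetails
    REBUILD WITH (DATA_COMPRESSION = ROW);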

Compression and decompression add CPU load to the system, but they reduce the I/O load. The CPU overhead is relatively small for ROW compression, especially when reading data; for PAGE compression the overhead is more significant. Queries can even run faster, especially on systems that are not heavily CPU-bound, but batch inserts, batch updates, and index maintenance can take longer when the data is compressed. In short, if the system is not heavily CPU-bound, it makes sense to implement ROW compression on indexes with volatile data (values that change frequently). PAGE compression, on the other hand, can be a good choice for old, static data, especially when that data is accessed infrequently. Partitioning lets you apply different compression schemes to different table partitions and reduces index maintenance overhead by allowing you to rebuild an index at the partition level, as in the example below.
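For instance, on a partitioned table you could keep the current, frequently modified partition on ROW compression and use PAGE compression for the older, static partitions; the table and index names and the partition numbers below are only an illustration:

-- PAGE compression for an old, rarely modified partition
ALTER TABLE dbo.OrderDetails
REBUILD PARTITION = 1
WITH (DATA_COMPRESSION = PAGE);

-- ROW compression for the current, frequently modified partition
ALTER TABLE dbo.OrderDetails
REBUILD PARTITION = 4
WITH (DATA_COMPRESSION = ROW);

-- The same can be done per partition for an individual index
ALTER INDEX IX_OrderDetails_ProductID
    ON dbo.OrderDetails
    REBUILD PARTITION = 1
    WITH (DATA_COMPRESSION = PAGE);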
