2021-04-01

Boosting Analytics Query using Columnstore Index

While row store table and B-tree index perform very well for transactional operations include insert/update/delete and selecting small number of rows, analytics query which access large range of rows will incur high I/O on B-tree index. SQL Server 2012 introduced columnstore index which target analytics query, and SQL Server 2016 further enhanced columnstore index to make it updateable nonclustered columnstore index on heap or B-tree. In this blog post I'm going to show you how columnstore index boost up an analytics query, using the StackOverflow2010 database to demo.

Let's say we have the following analytics query to run:
SELECT U.DisplayName, MAX(P.Score)
FROM Users U JOIN Posts P ON U.Id = P.OwnerUserId
GROUP BY U.DisplayName;

Without any nonclustered index, table scan on both tables are required to run this query.

The execution time and I/O incurred are huge (817,869 logical reads, elapsed time 01:05).

Now we add a nonclustered B-tree index with included column:
CREATE INDEX IX_Posts ON Posts (OwnerUserId) INCLUDE(Score);

The analytics query runs faster by using the new index (22,319 logical reads, elapsed time 8 secs).


Let's create a columnstore index and see how it further boost up the query:
CREATE NONCLUSTERED COLUMNSTORE INDEX CI_Posts ON Posts (OwnerUserId, Score);

The analytics query now using the columnstore index which incur lesser I/O and more faster.
Table 'Posts'. Segment reads 5, segment skipped 0.
Table 'Users'. Scan count 5, logical reads 14,783. Elapsed time 3.4 secs.


For a deeper insight into columnstore index design, you can take a look into the design guidance provided by Microsoft.

No comments:

Post a Comment