GAPTHEGURU

Geek with special skills

New columnstore index feature in SQL Server 2012

Problem

A new feature in SQL Server 2012 is the Columnstore Index which can be used to significantly improve query performance. In this tip we will take a look of how it works and how we can use it.

Solution

There are two types of storage available in the database; RowStore and ColumnStore.

In RowStore, data rows are placed sequentially on a page while in ColumnStore values from a single column, but from multiple rows are stored contiguously. So a ColumnStore Index works using ColumnStore storage.

column store versus row store in SQL Server

Now let’s show how we can create a ColumnStore Index and how performance can be improved.

Creating a Column Store Index

Creating a ColumnStore Index is the same as creating a NonClustered Index except we need to add the ColumnStore keyword as shown below.

The syntax of a ColumnStore Index is:

CREATE NONCLUSTERED COLUMNSTORE INDEX ON Table_Name (Column1,Column2,… Column N)

Performance Test

I used the AdventureWorks sample database for performing tests.

--Create the Test Table
USE [AdventureWorks2008R2]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Test_Person](
 [BusinessEntityID] [int] NOT NULL,
 [PersonType] [nchar](2) NOT NULL,
 [NameStyle] [dbo].[NameStyle] NOT NULL,
 [Title] [nvarchar](8) NULL,
 [FirstName] [dbo].[Name] NOT NULL,
 [MiddleName] [dbo].[Name] NULL,
 [LastName] [dbo].[Name] NOT NULL,
 [Suffix] [nvarchar](10) NULL,
 [EmailPromotion] [int] NOT NULL,
 [AdditionalContactInfo] [xml](CONTENT [Person].[AdditionalContactInfoSchemaCollection]) NULL,
 [Demographics] [xml](CONTENT [Person].[IndividualSurveySchemaCollection]) NULL,
 [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
 [ModifiedDate] [datetime] NOT NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
-- We Populated this table with the Data Stored in Table Person.Person.
-- As we need Plenty of data so we ran the loop 100 times.
INSERT INTO [dbo].[Test_Person] 
SELECT P1.*
FROM Person.Person P1
GO 100
-- At this point we have 1,997,200 rows in the table.
-- Create Clustered Index  on Coloun [BusinessEntityID] 
CREATE CLUSTERED INDEX [CL_Test_Person] ON [dbo].[Test_Person]
( [BusinessEntityID])
GO
-- Creating Non - CLustered Index on 3 Columns
CREATE NONCLUSTERED INDEX [ColumnStore__Test_Person]
ON [dbo].[Test_Person]
([FirstName] , [MiddleName],[LastName])

-- Creating Non - CLustered  ColumnStore Index on 3 Columns
CREATE NONCLUSTERED COLUMNSTORE INDEX [ColumnStore__Test_Person]
ON [dbo].[Test_Person]
([FirstName] , [MiddleName],[LastName])

At this point we have created the ColumnStore Index on our test table. Now we will run the SELECT query with and without the ColumnStore Index and analyze performance.

Query Without ColumnStore Index

select [LastName],Count([FirstName]),Count([MiddleName])
from dbo.Test_Person 
group by [LastName]
Order by [LastName]
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)

We have used the OPTION(IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX) query hint to not to use the ColumnStore Index this time.

Query With ColumnStore Index

select [LastName],Count([FirstName]),Count([MiddleName])
from dbo.Test_Person 
group by [LastName]
Order by [LastName]

Here are the Actual Execution Plans for both queries:

sql server query plan for non columnstore query

 

sql server query plan for columnstore query

We can see the cost when using the NonClustered Index is 59 % while using the ColumnStore index is 13%.

Now if we hover the mouse over the Index Scans we can see details for these operations.  The below is a comparison:

execution plan output for columnstore query

It is clear from the results that the query performs extremely fast after creating the ColumnStore Index as the column needed for the query is stored in the same page and the query does not have to go through every single page to read these columns.

Performing INSERT, DELETE or UPDATE Operations

We cannot perform DML ( Insert\ Update \ Delete ) operations on a table having a ColumnStore Index, because this puts the data in a Read Only mode. So one big advantage of using this feature is a Data Warehouse where most operations are read only.

For example, if you perform a DELETE operation on a table with a ColumnStore Index you will get this error:

 Msg 35330, Level 15, State 1, Line 1
DELETE statement failed because data cannot be updated 
in a table with a columnstore index. Consider disabling the 
columnstore index before issuing the DELETE statement, 
then rebuilding the columnstore index after DELETE is complete.

However, to perform the operation we would need to disable the ColumnStore Index before issuing the command as shown below:

ALTER INDEX 'Index Name' on 'table name' DISABLE

Creating a ColumnStore Index using Management Studio

Right click and select New Index and select Non-Clustered Columnstore Index…

create columnstore index using SSMS

Click add to add the columns for the index.

create columnstore index using SSMS select columns

After selecting the columns click OK to create the index.

save columnstore index using SSMS

Limitations of a ColumnStore Index

  1. It cannot have more than 1024 columns.
  2. It cannot be clustered, only NonClustered ColumnStore indexes are available.
  3. It cannot be a unique index.
  4. It cannot be created on a view or indexed view.
  5. It cannot include a sparse column.
  6. It cannot act as a primary key or a foreign key.
  7. It cannot be changed using the ALTER INDEX statement. You have to drop and re-create the ColumnStore index instead. (Note: you can use ALTER INDEX to disable and rebuild a ColumnStore index.)
  8. It cannot be created with the INCLUDE keyword.
  9. It cannot include the ASC or DESC keywords for sorting the index.

04/10/2012 - Posted by | SQL Scripting, Sql Server |

2 Comments »

  1. a very interesting post thanks for writing it!

    Comment by Cheap VPS | 04/21/2012 | Reply


Leave a reply to gaptheguru Cancel reply