What is a columnstore index? Put very simply, it’s a high performance index created on a SQL Server table. This is something we’re going to use in a relational database. Typically it’s used in a data warehouse but could be used on any relational table.
Columnstore indexes are implemented using a technology called xVelocity. This might not be familiar to you, but it’s a new name for something you may already be familiar with. It was previously called Vertipaq, and the brand has been changed to xVelocity.
xVelocity really refers to an underlying technology that Microsoft developed. This technology is a set of algorithms and an approach that can be applied in a variety of ways. Currently xVelocity is used in two SQL Server product features. The features are different, but they use xVelocity in somewhat similar ways.
The first of these is the SQL Server analysis services Tabular mode. You may also know this as PowerPivot–either PowerPivot for SharePoint or PowerPivot for Excel, or with SQL Server 2012 you may use this as the Tabular mode of BISM models.
The second feature is the Columnstore index. Getting into a little more detail about these implementations, the Tabular mode of Analysis Services uses xVelocity or an “in-memory” implementation. In this implementation, all data is pulled out of relational stores (or other data sources), and is stored in an in-memory database that’s hosted by Analysis Services.
The Columnstore index is a little bit different. This implementation is what’s called “memory optimized”. Data is put into this index and is stored initially in disk structures. Then as the index is used it’s pulled into RAM and cached there, yielding much better performance on subsequent queries.
This isn’t that much unlike using pages of SQL Server tables that are then pulled into cache. The biggest difference is that with xVelocity these on-disk indexes are going to be very highly compressed.
Let’s look at the differences between the xVelocity Columnstore and the traditional row store that we’re used to. In a conventional row store strategy pages are stored on disk with sets of rows. So n this example we might have three different rows that are stored on each page, and each row has four columns. That makes up a page that’s stored on disk.
We can compress data on these pages but the amount of data we can pack into each page is limited by the compressibility of the entire row. And then as we build more and more pages we get more and more rows, but each page is only compressed to the extent that the sets of full rows can be compressed in each page.
In a Columnstore strategy, we approach the problem differently. Instead of storing pages with sets of rows, we store pages with full columns. This gives us much better compressibility, because we have a lot more repeating values in a column than we do across a row. If we store a page with just all the values of a single column, we can get incredible compression.
And that’s what happens with Columnstore using xVelocity. We have pages with a lot more data packed into each page, so as pages are loaded back into RAM there’s more data available in RAM. There are other optimizations that xVelocity adds, but this is one of the fundamental concepts and the reason for the name “Columnstore”.
To create a Columnstore index against a table, we use the same CREATE INDEX statement that we’re used to, with a modifier. So here we have the word “COLUMNSTORE”, so we’re creating an index that’s a columnstore index on the table with a list of columns. When we execute this statement the database will pull the columns that we specified for all the rows in the table and compress that into a columnstore index. This is then written back to disk.
When we create the Columnstore index, we can list all the columns in or table if we want, and very often we’ll do exactly that. But there are some limitations. These are the supported column types that we can put in that CREATE INDEX statement. The ones that are missing are generally the BLOB oriented and unstructured column types. If your table has columns that are outside this set of data types, you can still create a Columnstore index, just don’t list any columns with unsupported data types in the CREATE INDEX statement.
So once the Columnstore index exists, the way we query it is no different from the way we query the table. Here’s a very simple query against a table, and this table has the Columnstore index we created. We don’t have to make a note of that in the query.
Once we execute the query a query plan is generated, and it’ll look at whether the Columnstore would likely be faster or not compared to using traditional scans or incorporating other types of secondary indexes If the Columnstore wouldn’t make the query any faster, the engine will just continue and query the table as usual, ignoring the columnstore index.
If the columnstore index would be faster the engine will switch over and pull the rows for the table from the Columnstore index rather than from the table itself.
If there’s a Columnstore index, we can’t update or delete anything in the table while the index is present. This is because in SQL Server 2012 a Columnstore index isn’t incrementally maintained. It can only be built from scratch completely by reading the rows in the table or partition — no updates. But you have some options.
The first is that you can drop the Columnstore index, make changes, and then rebuild the index. If the table isn’t that big, this is probably fine. If the table is big, you probably partitioned it anyway, and you can use partition switching. Create a temporary table, put the updated partition rows into that table, add a columnstore index, and then switch that partition into the partition table.
The third option is that you could use two tables. You could populate a historical table with rows that don’t change anyway, and put a columnstore index on that. Then keep your more recent rows in a different table that doesn’t have a columnstore index. This is the table where you would make updates and deletes.
Then when you query the overall table, use a UNION ALL. This is a little bit of a workaround, but may work fine in many implementations.
Let’s just go through a few tips about columnstore indexes. Some background, limitations and so on.
The column order when you crate the columnstore doesn’t matter. Just put the columns in any order you want. The SQL engine will follow its own ideas about the order of columns according to its own rules anyway.
The most common approach is to add all columns to a columnstore index, particularly in a data warehouse environment. This is probably what you’ll do most of the time. However you can reduce the size of the columnstore index by specifying fewer than all the columns. But then if you query the table on columns that aren’t in the index, the columnstore index isn’t going to do you much good.
The table underneath the columnstore index can have a clustered index, but it doesn’t have to. It can be a heap or a clustered table.
But if the underlying table is going to have a clustered index, it’s not going to be a Columnstore index, because there’s no such thing as a clustered Columnstore index. Columnstore indexes are always secondary indexes.
When you create the columnstore index, if you do have a clustered index, those clustered columns will always be added to the columnstore index–you don’t even need to specify them.
So you might wonder once you create your columnstore index just how big is it? You can query the system tables to figure this out. It’s a query like this one. It’s kind of intimidating, but it’ll give you the number you’re looking for. This query is listed in the MSDN documentation, so you don’t have to copy it down from this slide.
Just a few limitations that you should be aware of when you’re using Columnstore indexes. Filtering is not supported. When you create a Columnstore index, all rows from the table will be added to the index. This is usually going to be fine, however if you do want to have fewer rows included you’ll have to populate a different table and populate that table.
Computed columns can’t be in the index, so don’t try to create a columnstore index with those columns- -it’s not going to work. Columnstore indexes are created only on tables. You cannot create Columnstore indexes on an indexed view. And sparse columns can’t be included in a Columnstore index either.
Most importantly, a table can have only one Columnstore index. Normally this is going to be fine because you’ll usually add all the columns from the table to the Columnstore index anyway. But if you were thinking of having two or more Columnstore indexes–that’s not supported in this version.
Finally I’ll leave you with some links to product documentation and to the TechNet Wiki. Both of these resources have a wealth of information about Columnstore indexes.