Category: Community


So it's T-SQL Tuesday once more, hosted by Jen McCown this month. Thanks Jen!
The topic is around tips, tactics and strategies for managing an Enterprise…

Well, having been in my current role for nearly a year now, managing a number of SQL Servers, it's a great opportunity to share what helps me.

1 – SQL Prompt

This is an add-in for SSMS produced by Redgate. While it does a wide range of awesome things, one of the main things that helps me, from an enterprise perspective, is the ability to colour-code tabs. This makes it really easy to see which environment I'm running code against. For me, Green is dev, Purple is staging (and not quite live), and Red is Production (or requires caution!).

[Image: SQL Prompt colour-coded tabs]

The other really nice feature is that when you close SSMS down (which occasionally happens accidentally, or when Windows Update does its thing, because why would you deliberately close it!), SQL Prompt remembers all the tabs you had open, and reopens (and reconnects) them on restart.

For me, this is great as it means I can concentrate on the actual work, rather than waste time trying to think of a filename for some random query that may be of use at some point.

[Image: SSMS Registered Servers]

2 – Registered Servers

This is a really nice feature that allows me to run the same query on multiple servers. One of the main uses is to get a 'snapshot' of what is running on the servers, with sp_whoisactive. Running a query while connected to one of these groups executes it against each member server, and adds a column to the start of the result set showing the server name.
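
For example, with a query window connected to a server group, something like the call below gives a one-row-per-session view across the whole group. This assumes sp_whoisactive is installed on each member server; the parameters shown are just ones I find useful, not required.

-- Run from a query window connected to a Registered Servers group;
-- SSMS adds a 'Server Name' column to the front of the result set.
EXEC dbo.sp_WhoIsActive
    @get_plans = 1,           -- include the query plan for each active request
    @find_block_leaders = 1;  -- flag sessions at the head of any blocking chain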

3 – Extended Events

I've had the chance to get to grips with this more over the past few months, and I'm really starting to find lots of great uses for it. One of the really helpful (enterprise-related) things I'm using it for is monitoring execution times for a specific stored procedure. It's just one SP, but it's called thousands of times an hour. Using Extended Events for this allows tracking of those times with next to no overhead; a minimal sketch is below.
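
This is only a sketch of the sort of session I mean (the database name, procedure name, and file path below are placeholders, not the real ones), but it shows the shape of it:

-- Capture completion events for one stored procedure, writing them to a
-- file target; module_end includes the duration (in microseconds).
CREATE EVENT SESSION [ProcDuration] ON SERVER
ADD EVENT sqlserver.module_end
(
    WHERE (sqlserver.database_name = N'MyDatabase'    -- placeholder
           AND object_name = N'usp_MyBusyProcedure')  -- placeholder
)
ADD TARGET package0.event_file (SET filename = N'C:\XEvents\ProcDuration.xel')
WITH (MAX_DISPATCH_LATENCY = 30 SECONDS);
GO
ALTER EVENT SESSION [ProcDuration] ON SERVER STATE = START;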

I'm in the process of doing a separate post on this, so keep 'em peeled.

Thanks again to Jen for hosting.

Having seen the request by Redgate to review the Tribal SQL book, I leapt at the chance. I love the idea of these books, and having read the MVP Deep Dives books (Book 1 and Book 2) previously, I wanted to see what this one offered.

As with the MVP Deep Dives books, Tribal SQL's authors have donated their royalties to charity, in this case to Computers For Africa. There are also a lot of well-known names in this book, with Dave Ballantyne, Mark Rasmussen, Bob Pusateri, Stephanie Locke and Matt Velic among them.

Tribal SQL is a good-sized book, weighing in at 454 pages, with 15 chapters covering various topics relevant to DBAs. Each of the chapters has been written by a previously unpublished author, and all are based on their experiences. There are chapters covering most topics, from Internals, Data Compression, Performance Tuning, Auditing and SQL Injection, through to less technical areas such as Reporting and Database Mail. Additionally, there are sections on personal skills, including Communication Skills, Project Management and an Introduction to Agile Database Development.

While some of the sections don't go into as much detail as you'd hope, that isn't the point of this book (as I understand it). It covers each of the areas well, leaves you wanting more, and most of the chapters include links to various sources for further reading.

I enjoyed reading this book, and would highly recommend it to others. The chapters I got the most out of were Internals (Mark Rasmussen), Windowing Functions (Dave Ballantyne; particularly timely since I was doing the SQL 2012 Querying exam), and SQL Injection (Kevin Feasel).

My only suggestion is that I'd like to see an eBook/Kindle version, so I can have it in the eBook library on my tablet and use it as a reference book.

You can read more about this book at the book's website – TribalSQL.com

[Image: TestDay2012]

Unit Testing is a methodology that we should all embrace and understand.

It’s not just for Programmers

Unit Testing Frameworks are available for almost every Platform, from ABAP to XSLT. So if you are a hard-core coder, a SQL DBA, a Web Developer, or a Sys Admin, you can join in!
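
For the SQL Server folk, tSQLt is one such framework. As a flavour of what a test can look like, here's a minimal sketch (the procedure under test, dbo.AddNumbers, and its parameters are made up for illustration):

-- Create a test class (a schema), add a test, then run the class.
EXEC tSQLt.NewTestClass 'CalcTests';
GO
CREATE PROCEDURE CalcTests.[test AddNumbers returns the sum of its inputs]
AS
BEGIN
    DECLARE @result int;
    EXEC dbo.AddNumbers @a = 2, @b = 3, @total = @result OUTPUT;  -- hypothetical procedure
    EXEC tSQLt.AssertEquals @Expected = 5, @Actual = @result;
END;
GO
EXEC tSQLt.Run 'CalcTests';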

While the use of Unit Testing is getting more common, it’s not as common as it could be.

So I’d like to propose TEST DAY 2012!!!

How do you do Testing?

Why not share how you do Unit Testing, why you started it, what your experiences have been, or something else related to Unit Testing?

We can all learn from each other's experiences.

What do you need to do?

  • Write a blog Post and Share it with the internet, so everyone can learn from your experiences.
  • Your blog post must be published between Wednesday, 12th December 2012 00:00:00 GMT and Thursday, 13th December 2012 00:00:00 GMT.
  • If you are on Twitter please tweet your blog using the #TestDay2012 hashtag. I can be contacted there as @nhaslam, in case you have questions or problems with comments/trackback.
  • Either include the TestDay2012 Picture (above) and hyperlink it back to this post, or have a link back to this post.
  • If you don’t see your post in trackbacks, add the link to the comments below.

What will I do?

A week or so later (depending on the number of posts), I’ll do a summary post and cover all the submissions.

I look forward to reading your posts!

Thanks to everyone who posted on T-SQL Tuesday this month. Below is a summary of the posts, so have a look through if you've not had the chance yet.

As an aside, if you’ve not watched the film yet, it’s available here (Google | AmazonUK | AmazonUS)

There were some really interesting, and terrifying, posts this month, so pull up a chair, grab some whiskey, turn the lights down, and have a read through.

Don't forget to keep an eye out for the next TSQL2sDay post, in a couple of weeks' time.

The Posts!

Rob Farley – When someone deletes a shared data source in SSRS

Thomas Rushton – SQL Wildcards

Rick Krueger – Nightmare on TSQL Street – The Case of the Missing Cache

Matthew Velic – Soylent Growth

Ted Krueger – Horrify Me!

Ken Watson – Soylent Green

Chris Shaw – Are you kidding me?

Thomas Rushton (Again!) – Soylent Inbox

Jason Brimhall – High Energy Plankton

Jes Borland – Soylent Green SQL Server

Bob Pusateri – A Horror Story

Steve Jones (Voice of the DBA) – Soylent Green

Chris Yates – Soylent Green

Jeffrey Verheul – Soylent Green

 

[Image: T-SQL Tuesday logo]

Welcome to TSQL2sDay #35, this time hosted by me…

It's a bit last minute, as I stepped in to help Adam out, so bear with me. As always, thanks to Adam for starting this off. I've posted a few articles in previous rounds, and have found other people's posts to be really interesting; I hope this one follows in the same way.

Over the past couple of days I've been attending a training course in Paris, and one evening, to relax, I watched 'Soylent Green', a classic science fiction film. If you've not seen it, I recommend you go and watch it…

So, what I’d like to know is, what is your most horrifying discovery from your work with SQL Server?

We all like to read stories of other people's misfortunes, and in some ways they help to make us better by letting us learn from them. Hopefully there is nothing as bad as Charlton Heston's discovery, but your find may be horrifying in its own way.

A couple of extra thoughts for motivational thinking…

Soylent Brown – You did a post, Great Job!!

Soylent Orange – You did a post, it made me wince!

Soylent Green – You did a post, it made me wince, and it included some T-SQL.

Do you have the words straight?

If you would like to participate in T-SQL Tuesday, please be sure to follow the usual rules:

  • Your blog post must be published between Tuesday, October 9th 2012 00:00:00 GMT and Wednesday, October 10th 2012 00:00:00 GMT.
  • Include the T-SQL Tuesday logo (above) and hyperlink it back to this post.
  • If you don’t see your post in trackbacks, add the link to the comments below.
  • If you are on Twitter please tweet your blog using the #TSQL2sDay hashtag. I can be contacted there as @nhaslam, in case you have questions or problems with comments/trackback.

Thank you all for participating, and special thanks to Adam Machanic (b|t) for all his help and for continuing this series!

Thanks for posting, and I’ll have a follow-up post listing all the contributions as soon as I can.

It's another TSQL2sday post, this time hosted by Rob Volk (b | t). Thanks for hosting, Rob.

So this month, it's about how we fixed a problem, or found help when we couldn't fix one, with a theme based on 'Help!' by The Beatles.

I chose the 2nd verse…

When I was younger, so much younger than today

So, many years ago, when I started out with SQL Server, back in the heady days of 6.5, there was much less of a SQL community; actually, I don't even remember one. The only ways I could get help were through MSDN, or by emailing colleagues I'd met on a SQL training course.

I never needed anybody’s help in any way.

Though that's primarily because I stopped using SQL for a while; just a year or so, but still.

Everyone needs help, at some point, with something. It’s not a weakness, it’s a strength.

But now these days are gone, I’m not so self assured.

In the past few years, I’ve started working more and more with SQL, and found that it is such a huge product that no one can know the whole thing (SSAS, SSIS, SSRS included), and because of that, I’ve found several ways to get help if I need it.

Though, before I get into that, I need to say something about the community. There is a huge SQL community out there, though the first community event I attended wasn't a SQL one. It was a developer event, Remix UK, back in 2008 (http://www.microsoft.com/uk/remix08/default.aspx). It was a great event and I got to meet some great people there, including Scott Guthrie! Getting to this event was pretty much solely due to an ex-colleague, Jes Kirkup. Thanks, Jes!

Since then I’ve started attending community events where I can, including the local DevEvening events (where I’ve done a couple of short presentations), and SQL community events (SQLMaidenhead, SQL in the Evening, and SQLBits of course!). I’ve found that these are a great way of getting a great insight into what skills others in the industry have, and so where I should be targeting my learning. Following on from that, I’ve met some great people, and there are people who I know I could ask for help if I needed to.

Not to mention the #SQLHelp hashtag on Twitter, where there is help pretty much 24 hours a day, the only restriction being the need to phrase your question in 140 characters (less the hashtag).

Now I find I’ve changed my mind and opened up the doors.

Now I find that I am helping people where I get the opportunity, am publishing blog articles (here, like this one!) and am hoping to do more Community presentations. Furthermore, I’m doing internal training courses (next month I’m doing one on SSAS), and have recently started mentoring a colleague in SQL.

It’s great to be able to share knowledge and experience.

Thanks for listening, and reading, and thanks again to Rob for hosting.

T-SQL Tuesday

Thanks to Erin Stellato for hosting this month's #TSQL2sday. Erin wanted to know all about what we do every day!

Interestingly, when I was much younger, I wanted to be a fire-fighter or a pilot. I'm still quite keen on learning to fly, but that's looking less likely as time goes by (eyesight, time, age and cost, in that order).

Now though, and for the past 12 years or so, I work as a Consultant. It’s a nice, vague title. It started out as ‘Technical Consultant’, moved through Systems Consultant, and CRM Consultant. It’s currently bouncing between BI Consultant and Data Warehousing Consultant depending on the project I’m working on.

[Image: My Journey to Work]

[Image: The Office]

My Day!

The day started by sitting in a traffic jam. Pretty common that, unfortunately.

However, when I made it to my desk, I did a couple of checks of a server that I was running maintenance jobs on overnight. All was well, so I dived into email.

There were a couple of interesting items in there: one was a link about a pigeon with a USB stick being faster than UK broadband (BBC link here). There was also an invitation to the Microsoft Hadoop on Azure trial, which looks really interesting, and is something I'll have a look at next week (link here).

The Morning

Then, I started work on a customer project that I'm working on this week. It's effectively adding two additional country feeds (Spain and France, since you asked) to a data warehouse. The customer is using WhereScape RED, so it was a pretty straightforward matter of dragging and dropping the tables from the DB2 source system into the ETL tool. WhereScape RED then generates the stored procedures to allow the ETL process to run, to get the data into the DWH.

It sounds a pretty straightforward process; however, there are 91 tables, each needing a couple of minor modifications, so that took up all of my morning.

The Afternoon

The afternoon was pretty much taken up by an interesting problem with a BusinessObjects (XI4) environment, which was apparently continually running a query against the SQL Server database. We managed to prove it was the BO server doing this by changing the service account it was running as; the query could then be seen in sp_whoisactive (thank you @AdamMachanic) to be run by a different user. The query was taking the server's CPU utilisation to 100%, which meant that the other databases on the server couldn't effectively service user queries.

To temporarily resolve this issue, we put Resource Governor on, which restricted the BusinessObjects service to 25% of the CPU power, thereby letting the other users have some resources.
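
For reference, the setup looked broadly like the sketch below; the pool, group, and service account names here are stand-ins rather than the real ones, and the classifier function has to live in master:

-- Cap the BusinessObjects service account at 25% CPU with Resource Governor.
USE master;
GO
CREATE RESOURCE POOL BOPool WITH (MAX_CPU_PERCENT = 25);
CREATE WORKLOAD GROUP BOGroup USING BOPool;
GO
CREATE FUNCTION dbo.fnRGClassifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    -- Route the (renamed) BO service account into the restricted group.
    IF SUSER_SNAME() = N'DOMAIN\BOService'  -- hypothetical account
        RETURN N'BOGroup';
    RETURN N'default';
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnRGClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;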

I found a really helpful query that let me see the queries being run. The query (from SQLAuthority) is copied here:

SELECT sqltext.TEXT, req.session_id, req.status,
req.command, req.cpu_time, req.total_elapsed_time
FROM sys.dm_exec_requests req
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS sqltext

Copied from http://blog.sqlauthority.com/2009/01/07/sql-server-find-currently-running-query-t-sql/

The final solution to the issue was to apply the BO XI4 SP4 patch, which appears to have resolved the issue.

There were also a couple of questions on licensing, to which both answers were 'If it looks to be too good to be true, it probably is'.

Sadly, I didn’t get any pictures of the Red Arrows flying around the Farnborough Airshow, which is just up the road from us, or any pictures of the White-tailed Kite we saw flying over the motorway.

And that is pretty much my day: a comparatively quiet one, and, for a change, I made it out the door and home at a reasonable time. I hope you found this interesting, and I look forward to reading about your day.

Thanks again to Erin for hosting.

Last night I had the opportunity to do my first community presentation, at the SQL Server in the Evening event, hosted by Gavin Payne and Justin Langford from Coeo. Thanks to both of you for the opportunity to present.

The session I presented was a 15 minute ‘Newcomers’ slot on SQL Server and the CLR. I think the session went well, and I had some positive feedback.

The slides from the session are available here: SQL Server and CLR Session Slides

It’s T-SQL Tuesday again, and this time hosted by Nigel Sammy. Thanks for hosting Nigel, enjoy the post.

Not so long ago, I was lucky enough to go to SQL Bits X. It was a great few days, and I highly recommend it to you!

The keynote session, given by Conor Cunningham, was a 400-level session on the ColumnStore index, which is a new feature in SQL Server 2012.

The demo was, unsurprisingly, really good, and it made me wonder 'is it really that good?'. So I thought I'd give it a go and see.

Having Googled around a bit, I found a useful blog article by Sacha Tomey that went through a few examples. With permission, I'm going to run through a similar process, add a few bits in, and use a different data set.

Part of me really hates the AdventureWorks demo database, so you can imagine my delight when I discovered that there is now a bigger retail data set, structured as a data warehouse. This is the Contoso BI set, and I like it.

Getting down to it

After installing the ContosoBI database, you'll end up with a fact table, factOnlineSales, with approx. 12.6 million rows in it.

First off, I want to try and get a level playing field, so we'll be running with Statistics IO and Statistics Time on, and we'll be clearing the buffers before each query:

set statistics IO on;
set statistics time on;
dbcc dropcleanbuffers;

The Clustered Index

Just to get a comparison, I ran the test query, shown below, to get an idea of the speed against the supplied Clustered Index.

dbcc dropcleanbuffers;
go
SELECT
StoreKey ,SUM(SalesAmount) AS SalesAmount
FROM   factOnlineSales
GROUP BY StoreKey
ORDER BY StoreKey

This gave the following results:

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘FactOnlineSales’. Scan count 5, logical reads 46821, physical reads 1, read-ahead reads 46532, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 8377 ms,  elapsed time = 3476 ms

Just a Heap

Next, I wanted to get rid of the Clustered Index, but since I didn't really want to lose the original table, I ran this code to copy the contents of the factOnlineSales table into a new table, factCleanSales.

select * into factCleanSales from FactOnlineSales

That gave me the same 12.6 million rows. I wanted more, so next I ran this:

insert into factCleanSales
select dateadd(yy,3,DateKey), StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount,
ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost,
UnitCost, UnitPrice, ETLLoadID, dateadd(yy,3,LoadDate), dateadd(yy,3,UpdateDate) from factOnlineSales

This gave me approx. 25 million records, and no Clustered Index. So I ran the test query again. It took a little longer this time.

dbcc dropcleanbuffers;
go
SELECT
StoreKey ,SUM(SalesAmount) AS SalesAmount
FROM   factCleanSales
GROUP BY StoreKey
ORDER BY StoreKey

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘factCleanSales’. Scan count 5, logical reads 505105, physical reads 0, read-ahead reads 504823, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 14976 ms,  elapsed time = 33987 ms.

Nearly 10 times longer to run, and more than 10 times the I/O, but that wasn’t surprising since we had no indexes.

Add one Non-Clustered

So, following Sacha’s lead, I added a compressed, nonclustered index into the pot.

CREATE NONCLUSTERED INDEX [IX_StoreKey] ON [dbo].factCleanSales
(    StoreKey ASC    )
INCLUDE ([SalesAmount]) WITH
(
PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100, DATA_COMPRESSION = PAGE
) ON [PRIMARY]
GO

Clearing the buffers and running the query now resulted in a better experience.

Table ‘factCleanSales’. Scan count 5, logical reads 43144, physical reads 1, read-ahead reads 42999, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 18877 ms,  elapsed time = 5785 ms.

The query time was down to a more reasonable level, though still longer than the Clustered Index.

ColumnStore Time!

Adding the ColumnStore index took a while: just over 2 minutes. The definition is below. Note that the ColumnStore index has all the columns of the table in its definition. You can't have Include columns, and by having all the columns in there, you gain huge flexibility for the index.

Create nonclustered columnstore index [IX_ColumnStore] on  [dbo].factCleanSales
(    OnlineSalesKey, DateKey, StoreKey, ProductKey,
PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber,
SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity,
ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost,
UnitPrice, ETLLoadID, LoadDate, UpdateDate
) with (Drop_Existing = OFF) on [PRIMARY];

Next I ran the test query.

Table ‘factCleanSales’. Scan count 4, logical reads 6378, physical reads 27, read-ahead reads 13347, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 515 ms,  elapsed time = 378 ms.

That's around a tenth of the time the Clustered Index took, and the great thing is, because it's got all the columns in there, you can create more complicated queries and still get amazing speed. Running the query below, we still got great speed!

dbcc dropcleanbuffers;
go
SELECT
year(DateKey), storekey ,SUM(SalesAmount) AS SalesAmount
FROM   factCleanSales with (index ([IX_ColumnStore]))
GROUP BY year(DateKey), storekey
ORDER BY year(DateKey)

Table ‘factCleanSales’. Scan count 4, logical reads 8156, physical reads 78, read-ahead reads 16224, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 4603 ms,  elapsed time = 1522 ms.

Is there a Downside?

Yes. Two actually.

Firstly, it's an Enterprise-only feature. This is annoying; however, it is linked to the second downside: you cannot insert, update or delete directly while a ColumnStore index is present.

Msg 35330, Level 15, State 1, Line 1
UPDATE statement failed because data cannot be updated in a table with a columnstore index. Consider disabling the columnstore index before issuing the UPDATE statement, then rebuilding the columnstore index after UPDATE is complete.

This means that if you are using it on a Data Warehouse, you’ll need to disable the index on the fact table, insert/update the data, then rebuild the index to get it back online. This isn’t ideal, however, there is an alternative. You can use Partition Switching to switch data in and out of the table.
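
The disable / load / rebuild pattern itself is short enough to sketch in full (using the table and index names from this post):

-- Take the columnstore index offline, modify the data, then rebuild it.
ALTER INDEX [IX_ColumnStore] ON dbo.factCleanSales DISABLE;

-- ... run the INSERT / UPDATE / DELETE statements here ...

ALTER INDEX [IX_ColumnStore] ON dbo.factCleanSales REBUILD;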

Effectively, what you'll be doing to insert data is to load the data into a partition table with the same schema as the fact table, and switch it in. For updating or deleting, you'd switch the appropriate partition out, update/delete the data, then switch it back in again. It's more complicated (obviously), but the performance improvement gained by ColumnStore indexes should be worth it. Given that Table Partitioning is an Enterprise feature, it makes sense (kind of) that ColumnStore indexes should be too.

Partition Switching

To demonstrate how inserting into a table with a ColumnStore index on it was working, I dropped the indexes against the factCleanSales table, and partitioned and clustered it using the following:

CREATE PARTITION FUNCTION [myPartFunc](int) AS RANGE RIGHT
FOR VALUES (2003, 2004, 2005, 2006, 2007, 2008, 2009,
2010, 2011, 2012, 2013, 2014, 2015)

CREATE PARTITION SCHEME [myPartScheme] AS PARTITION [myPartFunc] TO
([PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY],
[PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY],
[PRIMARY], [PRIMARY])

CREATE CLUSTERED INDEX [ClusteredIndex_on_myPartScheme_634694274321586358] ON [dbo].[factCleanSales]
([YearPart]) WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [myPartScheme]([YearPart])

Then I added the ColumnStore back onto the table; it is automatically aligned to the partitioning function and scheme above.

CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_ColumnStore] ON [dbo].[factCleanSales]
([OnlineSalesKey], [DateKey], [StoreKey], [ProductKey], [PromotionKey],
[CurrencyKey], [CustomerKey], [SalesOrderNumber], [SalesOrderLineNumber],
[SalesQuantity], [SalesAmount], [ReturnQuantity], [ReturnAmount],
[DiscountQuantity], [DiscountAmount], [TotalCost], [UnitCost],
[UnitPrice], [ETLLoadID], [LoadDate], [UpdateDate], [YearPart]
) WITH (DROP_EXISTING = OFF)

Next, I created a table to switch the data in from, loaded it up, added the ColumnStore index, and then switched the partition in, using the following:

CREATE TABLE [dbo].[factCleanSales_Part](
[OnlineSalesKey] [int] IDENTITY(1,1) NOT NULL,
[DateKey] [datetime] NOT NULL,
[StoreKey] [int] NOT NULL,
[ProductKey] [int] NOT NULL,
[PromotionKey] [int] NOT NULL,
[CurrencyKey] [int] NOT NULL,
[CustomerKey] [int] NOT NULL,
[SalesOrderNumber] [nvarchar](20) NOT NULL,
[SalesOrderLineNumber] [int] NULL,
[SalesQuantity] [int] NOT NULL,
[SalesAmount] [money] NOT NULL,
[ReturnQuantity] [int] NOT NULL,
[ReturnAmount] [money] NULL,
[DiscountQuantity] [int] NULL,
[DiscountAmount] [money] NULL,
[TotalCost] [money] NOT NULL,
[UnitCost] [money] NULL,
[UnitPrice] [money] NULL,
[ETLLoadID] [int] NULL,
[LoadDate] [datetime] NULL,
[UpdateDate] [datetime] NULL,
[YearPart] [int] NULL
)

alter table [factCleanSales_Part] with check add constraint chk2006 check (yearPart=2006)

CREATE CLUSTERED INDEX [ClusteredIndex_on_myPartScheme_634694274321586358] ON [dbo].[factCleanSales_Part]
([YearPart]) WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [myPartScheme]([YearPart])

insert into factCleanSales_Part
select dateadd(yy,-1,DateKey), StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount,
ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost,
UnitCost, UnitPrice, ETLLoadID, dateadd(yy,-1,LoadDate),
dateadd(yy,-1,UpdateDate) , year(dateadd(yy,-1,DateKey)) from factOnlineSales
where year(dateadd(yy,-1,DateKey))=2006

CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_ColumnStore] ON [dbo].factCleanSales_Part
([OnlineSalesKey], [DateKey], [StoreKey], [ProductKey], [PromotionKey],
[CurrencyKey], [CustomerKey], [SalesOrderNumber], [SalesOrderLineNumber],
[SalesQuantity], [SalesAmount], [ReturnQuantity], [ReturnAmount],
[DiscountQuantity], [DiscountAmount], [TotalCost], [UnitCost],
[UnitPrice], [ETLLoadID], [LoadDate], [UpdateDate], [YearPart]
) WITH (DROP_EXISTING = OFF)

Next, to check that there are no records in the partition already for 2006, I ran this:

SELECT YearPart, $PARTITION.myPartFunc(YearPart) AS Partition,
COUNT(*) AS [COUNT] FROM factCleanSales
GROUP BY YearPart, $PARTITION.myPartFunc(YearPart)
ORDER BY Partition


Next, I switched the data in using this, and then checked the partition values using the statement above.

ALTER TABLE [factCleanSales_Part] SWITCH PARTITION $PARTITION.myPartFunc(2006) TO [factCleanSales] PARTITION $PARTITION.myPartFunc(2006)


Delightfully, the fact table now has another partition, and all without removing the ColumnStore index on it.
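
Updating or deleting existing data works the same way in reverse; a sketch of how that could look is below. Note that the staging table's own columnstore index needs the same disable/rebuild treatment around the update, and its 2006 partition must be empty before switching out.

-- Switch the 2006 partition out to the (empty) staging table...
ALTER TABLE dbo.factCleanSales
SWITCH PARTITION $PARTITION.myPartFunc(2006) TO dbo.factCleanSales_Part PARTITION $PARTITION.myPartFunc(2006)

-- ...disable its columnstore index, make the changes, rebuild...
ALTER INDEX [IX_ColumnStore] ON dbo.factCleanSales_Part DISABLE;
-- UPDATE dbo.factCleanSales_Part SET ... ;
ALTER INDEX [IX_ColumnStore] ON dbo.factCleanSales_Part REBUILD;

-- ...then switch the partition back into the fact table.
ALTER TABLE dbo.factCleanSales_Part
SWITCH PARTITION $PARTITION.myPartFunc(2006) TO dbo.factCleanSales PARTITION $PARTITION.myPartFunc(2006)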

For Extra credit…

Now, should you want to get more details out of the ColumnStore index, there are a couple of new DMVs that can be used. They are:

  • sys.column_store_dictionaries
  • sys.column_store_segments

To see useful information like the sizing or number of rows per column, you can use this query:

select object_name(p.object_id) as 'TableName', p.partition_number, p.data_compression_desc,
c.name, csd.entry_count, csd.on_disk_size
from sys.column_store_dictionaries csd
join sys.partitions p on p.partition_id = csd.partition_id
join sys.columns c on c.object_id = p.object_id and c.column_id = csd.column_id
order by p.partition_number, c.column_id

which will return sizing details for each column in the index. Summing the on_disk_size will give you the size in bytes of the index.
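
For a quick total, you can aggregate that. Something like the query below (same DMVs as above; note it sums the dictionary sizes only, and sys.column_store_segments has an on_disk_size too if you want the complete picture):

-- Total dictionary size per table and partition, in MB.
select object_name(p.object_id) as 'TableName', p.partition_number,
sum(csd.on_disk_size) / 1048576.0 as [DictionarySizeMB]
from sys.column_store_dictionaries csd
join sys.partitions p on p.partition_id = csd.partition_id
group by object_name(p.object_id), p.partition_number
order by p.partition_number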

My Demo Environment

Just for transparency, the timings I was getting above weren't on any huge server. They were on a virtual machine, running in VMWare Workstation v8.0.2 on Windows 7 SP1. SQL Server is 2012 (obviously), Developer Edition, 64-bit.


Wrapping up…

I think it’s reasonably safe to say that this is the longest (in size and time) blog post I’ve written, so I apologise if it rambles a bit, but I hope you get the importance of ColumnStore indexes, and I hope you get the chance to use them.

Having been sitting on the fence for a while, I’m finally leaping off, and presenting at some community events. Following a false start with SQLBits (I submitted, but wasn’t voted in, and given the number of attendees I’m a little relieved about that!), I’ll be presenting at the following events over the next couple of months.

Hope to see you there!

24th April (Tues) – SQL Server in the Evening (6:30-6:50) – First Timers Slot (http://sqlserverfaq.com/events/392/Sessions-including-SQL-Server-Parallel-Data-Warehouse-at-the-sixth-SQL-Server-community-event-615pm-April-24th-Reading-Berkshire.aspx)

I'll talk about using the CLR within SQL Server: why and when it should be used, and then how.

25th April (Weds) – DevEvening (http://www.devevening.co.uk/)

26th May (Sat) – DDD Southwest (http://dddsouthwest.com/)
NOTE: This session isn't confirmed yet, and is still reliant on being voted in. You can vote by going to the DDD Southwest site, linked above!

Both DevEvening and DDD Southwest will be the same session, summarised below:

Going Native with SQL Server 2012 and C++

I'll be going through the delights of creating a module to interact with SQL Server 2012: a function in T-SQL (briefly), then a SQL CLR module in C#, and then a look at the performance gains from a C++ application querying SQL Server through the SQL Server Native Client (ODBC). All three sections will do the same job, and we'll cover the advantages and disadvantages of each.

We’ll cover the following:

  • T-SQL, SQL CLR (C#)
  • SQL Server Native Client
  • Advantages and Disadvantages
  • Performance Opportunities
  • How to use it to connect to SQL Server from C++
  • How to query a database
  • Comparison between T-SQL, SQL CLR & C++ solutions

Slides and follow-up articles will be coming soon.