Tag Archive: SQL Server


T-SQL Tuesday

Thanks to Erin Stellato for hosting this month's #TSQL2sday. Erin wanted to know all about what we do every day!

Interestingly, when I was much younger, I wanted to be a firefighter or a pilot. I'm still quite keen on learning to fly, but that's looking less likely as time goes by (eyesight, time, age and cost, in that order).

Now though, and for the past 12 years or so, I work as a Consultant. It’s a nice, vague title. It started out as ‘Technical Consultant’, moved through Systems Consultant, and CRM Consultant. It’s currently bouncing between BI Consultant and Data Warehousing Consultant depending on the project I’m working on.

image: My Journey to Work

image: The Office

My Day!

The day started by sitting in a traffic jam. Pretty common that, unfortunately.

However, when I made it to my desk, I did a couple of checks of a server that I was running maintenance jobs on overnight. All was well, so I dived into email.

There were a couple of interesting items in there: one was a link about a pigeon with a USB stick being faster than UK broadband (BBC link here). There was also an invitation to the Microsoft Hadoop on Azure trial, which looks really interesting and is something I'll have a look at next week (link here).

The Morning

Then, I started work on a Customer project that I’m working on this week. It’s effectively adding two additional country feeds (Spain and France, since you asked), to a data warehouse. The customer is using WhereScape RED, so it was a pretty straightforward matter of dragging and dropping the tables from the DB2 source system, into the ETL tool. WhereScape RED then generates the stored procedures to allow the ETL process to run, to get the data into the DWH.

It sounds like a pretty straightforward process; however, there are 91 tables, each needing a couple of minor modifications, so that took up all of my morning.

The Afternoon

The afternoon was pretty much taken up by an interesting problem with a BusinessObjects (XI 4) environment, which was apparently continually running a query against the SQL Server database. We managed to prove it was the BO server doing this by changing the service account it was running as; the query could then be seen in sp_whoisactive (thank you @AdamMachanic) being run by a different user. The query was taking the server's CPU utilisation to 100%, which meant that the other databases on the server couldn't effectively service user queries.

To temporarily resolve this issue, we turned Resource Governor on, restricting the BusinessObjects service to 25% of the CPU, thereby letting the other users have some resources.
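For reference, a minimal sketch of that kind of Resource Governor setup is below. The pool, group, classifier and login names are illustrative, not the ones from the customer environment.

-- Sketch: cap a specific service login at 25% CPU using Resource Governor
USE master;
GO
CREATE RESOURCE POOL BOPool WITH (MAX_CPU_PERCENT = 25);
CREATE WORKLOAD GROUP BOGroup USING BOPool;
GO
-- Classifier function: route the BusinessObjects service login into BOGroup
CREATE FUNCTION dbo.fn_BOClassifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @wg sysname = N'default';
    IF SUSER_SNAME() = N'DOMAIN\BOServiceAccount'  -- illustrative login name
        SET @wg = N'BOGroup';
    RETURN @wg;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_BOClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;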

I found a really helpful query for identifying the queries that were being run. The query (from SQLAuthority) is copied here:

SELECT sqltext.TEXT, req.session_id, req.status,
req.command, req.cpu_time, req.total_elapsed_time
FROM sys.dm_exec_requests req
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS sqltext

Copied from http://blog.sqlauthority.com/2009/01/07/sql-server-find-currently-running-query-t-sql/

The final solution to the issue was to apply the BO XI4 SP4 patch, which appears to have resolved the issue.

There were also a couple of questions on licensing, and the answer to both was 'if it looks too good to be true, it probably is'.

Sadly, I didn’t get any pictures of the Red Arrows flying around the Farnborough Airshow, which is just up the road from us, or any pictures of the White-tailed Kite we saw flying over the motorway.

And that is pretty much my day; a comparatively quiet one, and for a change, I made it out the door and home at a reasonable time. I hope you found this interesting, and I look forward to reading about your day.

Thanks again to Erin for hosting.


Last night I had the opportunity to do my first community presentation, at the SQL Server in the Evening event, hosted by Gavin Payne and Justin Langford from Coeo. Thanks to both of you for the opportunity to present.

The session I presented was a 15 minute ‘Newcomers’ slot on SQL Server and the CLR. I think the session went well, and I had some positive feedback.

The slides from the session are available here: SQL Server and CLR Session Slides

It’s T-SQL Tuesday again, and this time hosted by Nigel Sammy. Thanks for hosting Nigel, enjoy the post.

Not so long ago, I was lucky enough to go to SQLBits X. It was a great few days, and I highly recommend it to you!

The keynote session, given by Conor Cunningham, was a 400-level session on the ColumnStore index, which is a new feature in SQL Server 2012.

The demo was, unsurprisingly, really good, and it made me wonder: is it really that good? So I thought I'd give it a go and see.

Having Googled around a bit, I found a useful blog article by Sacha Tomey that went through a few examples. With permission, I'm going to run through a similar process, add a few bits in, and use a different data set.

Part of me really hates the AdventureWorks demo database, so you can imagine my delight when I discovered that there is now a bigger retail data set, structured as a data warehouse. This is the Contoso BI set, and I like it.

Getting down to it

After installing the ContosoBI database, you'll end up with a fact table, factOnlineSales, with approx. 12.6 million rows in it.

First off, I want to try to get a level playing field, so we'll be running with Statistics IO and Statistics Time on, and we'll be clearing the buffers before each query:

set statistics IO on;
set statistics time on;
dbcc dropcleanbuffers;

The Clustered Index

Just to get a comparison, I ran the test query, shown below, to get an idea of the speed against the supplied Clustered Index.

dbcc dropcleanbuffers;
go
SELECT
StoreKey ,SUM(SalesAmount) AS SalesAmount
FROM   factOnlineSales
GROUP BY StoreKey
ORDER BY StoreKey

This gave the following results:

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘FactOnlineSales’. Scan count 5, logical reads 46821, physical reads 1, read-ahead reads 46532, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 8377 ms,  elapsed time = 3476 ms

Just a Heap

Next, I wanted to get rid of the Clustered index, but since I didn’t really want to lose the original table, I ran this code to insert the contents of the factOnlineSales table into factCleanSales.

select * into factCleanSales from FactOnlineSales

That gave me 12 million rows, but I wanted more, so next I ran this:

insert into factCleanSales
select dateadd(yy,3,DateKey), StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount,
ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost,
UnitCost, UnitPrice, ETLLoadID, dateadd(yy,3,LoadDate), dateadd(yy,3,UpdateDate) from factOnlineSales

This gave me approx. 25 million records, and no Clustered Index. So I ran the test query again. It took a little longer this time.

dbcc dropcleanbuffers;
go
SELECT
StoreKey ,SUM(SalesAmount) AS SalesAmount
FROM   factCleanSales
GROUP BY StoreKey
ORDER BY StoreKey

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘factCleanSales’. Scan count 5, logical reads 505105, physical reads 0, read-ahead reads 504823, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 14976 ms,  elapsed time = 33987 ms.

Nearly 10 times longer to run, and more than 10 times the I/O, but that wasn’t surprising since we had no indexes.

Add one Non-Clustered

So, following Sacha’s lead, I added a compressed, nonclustered index into the pot.

CREATE NONCLUSTERED INDEX [IX_StoreKey] ON [dbo].factCleanSales
(    StoreKey ASC    )
INCLUDE ([SalesAmount]) WITH
(
PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100, DATA_COMPRESSION = PAGE
) ON [PRIMARY]
GO

Clearing the buffers and running the query now resulted in a better experience.

Table ‘factCleanSales’. Scan count 5, logical reads 43144, physical reads 1, read-ahead reads 42999, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 18877 ms,  elapsed time = 5785 ms.

The query time was down to a more reasonable level, though still longer than the Clustered Index.

ColumnStore Time!

Adding the ColumnStore index took a while: just over 2 minutes. The definition I ran is below. Note that the ColumnStore index has all the columns in its definition; you can't have included columns, and by having all the columns in there, you gain huge flexibility for the index.

Create nonclustered columnstore index [IX_ColumnStore] on  [dbo].factCleanSales
(    OnlineSalesKey, DateKey, StoreKey, ProductKey,
PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber,
SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity,
ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost,
UnitPrice, ETLLoadID, LoadDate, UpdateDate
) with (Drop_Existing = OFF) on [PRIMARY];

Next I ran the test query.

Table ‘factCleanSales’. Scan count 4, logical reads 6378, physical reads 27, read-ahead reads 13347, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 515 ms,  elapsed time = 378 ms.

That's around a tenth of the time the Clustered index took, and the great thing is that, because it's got all the columns in there, you can write more complicated queries and still get amazing speed, as the query below shows.

dbcc dropcleanbuffers;
go
SELECT
year(DateKey), storekey ,SUM(SalesAmount) AS SalesAmount
FROM   factCleanSales with (index ([IX_ColumnStore]))
GROUP BY year(DateKey), storekey
ORDER BY year(DateKey)

Table ‘factCleanSales’. Scan count 4, logical reads 8156, physical reads 78, read-ahead reads 16224, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 4603 ms,  elapsed time = 1522 ms.

Is there a Downside ?

Yes. Two actually.

Firstly, it's an Enterprise-only feature. This is annoying; however, it is linked to the second downside: you cannot insert, update or delete directly while a ColumnStore index is present.

Msg 35330, Level 15, State 1, Line 1
UPDATE statement failed because data cannot be updated in a table with a columnstore index. Consider disabling the columnstore index before issuing the UPDATE statement, then rebuilding the columnstore index after UPDATE is complete.

This means that if you are using it on a data warehouse, you'll need to disable the index on the fact table, insert/update the data, then rebuild the index to get it back online. This isn't ideal; however, there is an alternative: you can use partition switching to switch data in and out of the table.
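Before looking at partition switching, here's a rough sketch of what that disable/load/rebuild cycle looks like, using the index created above:

-- Take the ColumnStore index offline, load the data, then rebuild it
ALTER INDEX [IX_ColumnStore] ON dbo.factCleanSales DISABLE;

-- ... run the INSERT / UPDATE / DELETE statements here ...

ALTER INDEX [IX_ColumnStore] ON dbo.factCleanSales REBUILD;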

Effectively, what you'll be doing to insert data is to load it into a partition table with the same schema as the fact table, and switch it in. For updating or deleting, you'd switch the appropriate partition out, update/delete the data, then switch it back in again. It's more complicated (obviously), but the performance improvement gained by ColumnStore indexes should be worth it. Given that table partitioning is an Enterprise feature, it makes sense (kind of) that ColumnStore indexes should be too.

Partition Switching

To demonstrate how to insert into a table with a ColumnStore index on it, I dropped the indexes on the factCleanSales table, added a YearPart column to partition on, and partitioned and clustered the table using the following:

CREATE PARTITION FUNCTION [myPartFunc](int) AS RANGE RIGHT
FOR VALUES (N'2003', N'2004', N'2005', N'2006', N'2007', N'2008', N'2009',
N'2010', N'2011', N'2012', N'2013', N'2014', N'2015')

CREATE PARTITION SCHEME [myPartScheme] AS PARTITION [myPartFunc] TO
([PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY],
[PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY],
[PRIMARY], [PRIMARY])

CREATE CLUSTERED INDEX [ClusteredIndex_on_myPartScheme_634694274321586358] ON [dbo].[factCleanSales]
( [YearPart] )WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [myPartScheme]([YearPart])

Then I added the ColumnStore index back onto the table, and it is automatically matched to the partitioning function and scheme above.

CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_ColumnStore] ON [dbo].[factCleanSales] (
[OnlineSalesKey], [DateKey], [StoreKey], [ProductKey], [PromotionKey],
[CurrencyKey], [CustomerKey], [SalesOrderNumber], [SalesOrderLineNumber],
[SalesQuantity], [SalesAmount], [ReturnQuantity], [ReturnAmount],
[DiscountQuantity], [DiscountAmount], [TotalCost], [UnitCost],
[UnitPrice], [ETLLoadID], [LoadDate], [UpdateDate], [YearPart]
) WITH (DROP_EXISTING = OFF)

Next, I created a table to switch the data in from, loaded it up, added the ColumnStore index, and then switched the partition in, using the following:

CREATE TABLE [dbo].[factCleanSales_Part](
[OnlineSalesKey] [int] IDENTITY(1,1) NOT NULL,
[DateKey] [datetime] NOT NULL,
[StoreKey] [int] NOT NULL,
[ProductKey] [int] NOT NULL,
[PromotionKey] [int] NOT NULL,
[CurrencyKey] [int] NOT NULL,
[CustomerKey] [int] NOT NULL,
[SalesOrderNumber] [nvarchar](20) NOT NULL,
[SalesOrderLineNumber] [int] NULL,
[SalesQuantity] [int] NOT NULL,
[SalesAmount] [money] NOT NULL,
[ReturnQuantity] [int] NOT NULL,
[ReturnAmount] [money] NULL,
[DiscountQuantity] [int] NULL,
[DiscountAmount] [money] NULL,
[TotalCost] [money] NOT NULL,
[UnitCost] [money] NULL,
[UnitPrice] [money] NULL,
[ETLLoadID] [int] NULL,
[LoadDate] [datetime] NULL,
[UpdateDate] [datetime] NULL,
[YearPart] [int] NULL
)

alter table [factCleanSales_Part] with check add constraint chk2006 check (yearPart=2006)

CREATE CLUSTERED INDEX [ClusteredIndex_on_myPartScheme_634694274321586358] ON [dbo].[factCleanSales_Part] (    [YearPart]
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [myPartScheme]([YearPart])

insert into factCleanSales_Part
select dateadd(yy,-1,DateKey), StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount,
ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost,
UnitCost, UnitPrice, ETLLoadID, dateadd(yy,-1,LoadDate),
dateadd(yy,-1,UpdateDate) , year(dateadd(yy,-1,DateKey)) from factOnlineSales
where year(dateadd(yy,-1,DateKey))=2006

CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_ColumnStore] ON [dbo].factCleanSales_Part (
[OnlineSalesKey],    [DateKey],    [StoreKey],    [ProductKey],    [PromotionKey],
[CurrencyKey],    [CustomerKey],    [SalesOrderNumber],    [SalesOrderLineNumber],
[SalesQuantity],    [SalesAmount],    [ReturnQuantity],    [ReturnAmount],
[DiscountQuantity],    [DiscountAmount],    [TotalCost],    [UnitCost],
[UnitPrice],    [ETLLoadID],    [LoadDate],    [UpdateDate],    [YearPart]
)WITH (DROP_EXISTING = OFF)

Next, to check that there are no records in the partition already for 2006, I ran this:

SELECT YearPart, $PARTITION.myPartFunc(YearPart) AS Partition,
COUNT(*) AS [COUNT] FROM factCleanSales
GROUP BY YearPart, $PARTITION.myPartFunc(YearPart)
ORDER BY Partition

image

Next, I switched the data in using this, and then checked the partition values using the statement above.

alter table [factCleanSales_Part] switch partition $PARTITION.myPartFunc(2006) to [factCleanSales] partition $PARTITION.myPartFunc(2006)

image

Delightfully, the fact table now has another partition, and all without removing the ColumnStore index on it.
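The demo above only covers switching data in; for the update/delete case mentioned earlier, a sketch of the shape of it, reusing the same staging table and partition function, would be:

-- Sketch: switch the 2006 partition out to the staging table
ALTER TABLE dbo.factCleanSales SWITCH PARTITION $PARTITION.myPartFunc(2006)
    TO dbo.factCleanSales_Part PARTITION $PARTITION.myPartFunc(2006);

-- Disable the staging table's ColumnStore index so the rows can be modified
ALTER INDEX [IX_ColumnStore] ON dbo.factCleanSales_Part DISABLE;

-- ... UPDATE / DELETE dbo.factCleanSales_Part here ...

-- Rebuild the index (it must match the fact table's again), then switch back in
ALTER INDEX [IX_ColumnStore] ON dbo.factCleanSales_Part REBUILD;
ALTER TABLE dbo.factCleanSales_Part SWITCH PARTITION $PARTITION.myPartFunc(2006)
    TO dbo.factCleanSales PARTITION $PARTITION.myPartFunc(2006);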

For Extra credit…

Now, should you want to get more details out of the ColumnStore index, there are a couple of new DMVs that can be used. They are:

  • sys.column_store_dictionaries
  • sys.column_store_segments

To see useful information like the sizing or number of rows per column, you can use this query:

select object_name(p.object_id) as 'TableName', p.partition_number, p.data_compression_desc,
c.name, csd.entry_count, csd.on_disk_size
from sys.column_store_dictionaries csd
join sys.partitions p on p.partition_id = csd.partition_id
join sys.columns c on c.object_id = p.object_id and c.column_id = csd.column_id
order by p.partition_number, c.column_id

which will return the dictionary details per column and partition. Summing the on_disk_size will give you the size in bytes of the index.
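For example, a quick way to total that up per table and partition (a sketch following the same joins as the query above; note it only covers the dictionaries, not the segments) is:

-- Total dictionary size per table and partition, in bytes
select object_name(p.object_id) as TableName, p.partition_number,
       sum(csd.on_disk_size) as dictionary_bytes
from sys.column_store_dictionaries csd
join sys.partitions p on p.partition_id = csd.partition_id
group by p.object_id, p.partition_number
order by p.partition_number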

My Demo Environment

Just for transparency, the timings I was getting above weren't on any huge server. They were on a virtual machine, running in VMware Workstation v8.0.2 on Windows 7 SP1. SQL Server is 2012 (obviously), Developer Edition, 64-bit.

image

Wrapping up..

I think it’s reasonably safe to say that this is the longest (in size and time) blog post I’ve written, so I apologise if it rambles a bit, but I hope you get the importance of ColumnStore indexes, and I hope you get the chance to use them.

Having been sitting on the fence for a while, I’m finally leaping off, and presenting at some community events. Following a false start with SQLBits (I submitted, but wasn’t voted in, and given the number of attendees I’m a little relieved about that!), I’ll be presenting at the following events over the next couple of months.

Hope to see you there!

24th April (Tues) – SQL Server in the Evening (6:30-6:50) – First Timers Slot (http://sqlserverfaq.com/events/392/Sessions-including-SQL-Server-Parallel-Data-Warehouse-at-the-sixth-SQL-Server-community-event-615pm-April-24th-Reading-Berkshire.aspx)

I’ll talk about using the CLR within SQL Server, why and when it should be used and then how.

25th April (Weds) – DevEvening (http://www.devevening.co.uk/)

26th May (Sat)– DDD Southwest (http://dddsouthwest.com/)
NOTE: This session isn't confirmed yet, and is still reliant on being voted in. You can vote by going to the DDD Southwest site, linked above!

Both DevEvening and DDD Southwest will be the same session, summarised below:

Going Native with SQL Server 2012 and C++

I'll be going through the delights of creating modules to interact with SQL Server 2012: a function in T-SQL (briefly), then a SQL CLR module written in C#, and then a look into the performance gains from a C++ application querying SQL Server through the SQL Server Native Client (ODBC). All three will do the same job, and we'll cover the advantages and disadvantages of each.

We’ll cover the following:

  • T-SQL, SQL CLR (C#)
  • SQL Server Native Client
  • Advantages and Disadvantages
  • Performance Opportunities
  • How to use it to connect to SQL Server from C++
  • How to query a database
  • Comparison between T-SQL, SQL CLR & C++ solutions

Slides and follow-up articles will be coming soon.

Thanks to Argenis Fernandez for hosting this month.

A Random Start

I've had an interesting (I hope) experience in my career this far. I started out working in a pharmacy. It was there that I wrote my first computer program (well, actually it was at college, but it was for them). Many eons ago, back in the early 90's, I'd been learning Pascal at college, and things are easier to learn if you have an aim, so I wrote a program to assist with the filling in of paperwork. It worked, and it was good. It stored data in a pipe-delimited file rather than any kind of database; not good, for many reasons, but it was only a single-user application and didn't have a huge amount of data.

Know your limits…

After that, I went to university, where they tried to teach me C++, assembly and Haskell. They failed with assembly and Haskell, though the theory is still there. C++ I love, and keep going back to. In fact, with any luck, you may see me present on it at DDD Southwest in May, and at DevEvening in April. Sadly, while I do enjoy C++, I'm not good enough to do it as a career, and I don't think I'd enjoy it as much if it were my bread and butter.

Part of the degree I did (Computer Systems, since you asked) included a year-long work placement. I did this at a small IT company, where I worked on the support desk. This was a pretty small company, so while I did application support for customers, I also managed the internal infrastructure, created their website and did a few other bits. It was a great experience and I really enjoyed it. So much so, that I went back there to do consultancy after I'd completed the degree.

While I was there, I learnt Windows NT, Visual Basic, Btrieve and had my first introductions to SQL Server (6.5 and 7). It was also here that I took my first Microsoft Certifications, covering Windows NT, SQL Server and assorted Networking topics.

Know when it’s time to move

After four years, I was starting to feel claustrophobic, and needed more of a challenge. At the start of 2000, I moved on, and went to work for a Siebel Consultancy. This was a big change, as while I’d done some consultancy work before, I really had to up my game. Not a bad thing, and I really found my feet with Siebel as it was based on SQL Server, and had the ability to have custom components written in Siebel VB or eScript.

More Certifications

After a great couple of years of big Siebel implementations, including a great system linking Siebel to SAP via XML integrations (my first real experience with enterprise-grade XML), the Siebel market for us dropped off after Oracle bought Siebel ($5.8 billion!).

I then moved my skills to Microsoft CRM, starting with V1.2 (unpleasant), then v3.0 (much better), and also SharePoint, all of which had associated MS Certifications which I completed, and all of which were based on SQL Server.

Try to see the Obvious…

At some point, and I can't remember when, I realised that I'd been working with SQL Server for over 12 years, and now it's nearing 15. I hadn't really noticed.

For the past two years I've been working as a Consultant, building data warehouses primarily, though I also do some C# for SQL CLR work, and C++ for fun. I've done a ridiculous number of certifications (mostly Microsoft), and my motivation is to get validation of my skills. I'm working on the MCM: SQL Server certification at the moment, and have the final Lab exam in May.

What Next ?

I don’t know. I’m pretty sure there will be SQL Server involved though. Fortunately the new version of SQL Server is out now, so the new Certifications will be out soon, and that’ll keep me occupied for a while.

From reading this though, the one thing that strikes me is that I've been very lucky to be in a career that keeps my brain occupied. If there are less taxing times, then I have C++ to stretch the grey matter.

One other thing; I’ve also found that it is good to keep pushing yourself. Always try to work just outside your comfort zone. If everything is easy, then you need to push yourself more.

To end, a couple of thoughts from wiser people than me.

image

Recently, I’ve been working on a project where the reference data is stored in SharePoint lists. While it is possible to get the information out of the SQL Server database directly, using something like the T-SQL below, it’s a bit messy.

SELECT dbo.UserData.tp_ID,
       dbo.UserData.tp_ListId,
       dbo.UserData.tp_Author,
       dbo.UserData.nvarchar1,
       dbo.UserData.nvarchar2,
       dbo.UserData.nvarchar3
FROM dbo.Lists
INNER JOIN dbo.UserData ON dbo.Lists.tp_ID = dbo.UserData.tp_ListId
WHERE (dbo.Lists.tp_title like 'TestList')

I wasn’t able to use this to get the data out, as the client doesn’t allow direct access to the SharePoint database, which is entirely reasonable, given that it’s their corporate intranet.

To get around this, I found a very useful set of additional modules for Integration Services (http://sqlsrvintegrationsrv.codeplex.com/), one of which is a SharePoint List Source and Destination. These then allow you to read the data directly.

Using the SharePoint List Source & Destinations

1. The first step is to download the SharePoint List Source and Destination module from http://sqlsrvintegrationsrv.codeplex.com/, and install it.

2. Having done that, you need to start up BIDS (BI Development Studio / VS 2008) and create an ‘Integration Services Package’.

3. You’ll need to add the two new Data flow items into the Toolbox (in Tools > Choose Toolbox Items, in the SSIS Data Flow Items section)

image

4. Add a Dataflow Task to the Control Flow in the SSIS Package.

image

5. Right click on the Connection Manager Section at the bottom of the Control Flow, and choose SPCRED (Connection Manager for SharePoint Connections). Click OK, when the Dialog for the SharePoint Connection opens.

image

6. Then drill into the Data Flow Task, to take you to the Data Flow. In there, drag in a SharePoint List Source

image

7. Right click on the List Source, choose Show Advanced Editor. In the Connection Managers tab, pick the SharePoint Connection you created in step 5.

image

8. Next, click on the Component Properties tab. In this tab, you need to specify the Name of your SharePoint list (SiteListName) and the URL of your SharePoint server (SiteUrl). The SiteUrl is the Parent site within which your List appears. If you want to filter the information from SharePoint, you can modify the CamlQuery section in here, and add a SharePoint CAML query.

image

9. Once you’ve populated this, click on Refresh, and if everything is working, you’ll be able to move to the next tab. If there are errors (such as an incorrect SiteUrl), you’ll get errors like the one below.

image

10. Moving on to the Column Mappings tab then gives you a list of fields and mappings, representing the available fields from SharePoint (on the left) and the fields that will be available to pass out of the List Source (on the right). You can remove fields that are not relevant here if you'd like, then click OK to return to the Data Flow.

image

11. We need to add an OLE DB Connection manager, by right clicking Connection Managers at the bottom, and choosing ‘New OLE DB Connection’.

12. To get the SharePoint list contents into a database table, we need to add an OLE DB Destination, so drag that into the Data Flow and hook the Green output from the SharePoint List Source to the top of the OLE DB Destination. You’ll then see that there is a red X on the OLE DB Destination, so we need to make some changes.

image

13. Double-click on the OLE DB Destination to make those changes. As shown below, we need to specify a table for the SharePoint data to go to. The drop-down list shows the tables in the database connected to the OLE DB Connection Manager, so pick a table (if you've made one already) or click New to create a new table.

image

14. Then click 'Mappings' on the left, and it's possible to link the fields in the source (the SharePoint list) to your destination table.

image

15. You’ll then be able to run this SSIS Package, and assuming all is running successfully, you’ll see green boxes.

image

NOTE: Any text fields stored in SharePoint lists are stored as Unicode strings in the database (so nvarchar).
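If you're creating the destination table yourself, rather than clicking New in step 13, a hypothetical table for a simple list might look like the following; the column names are illustrative and would need to match what you chose in the Column Mappings tab:

-- Hypothetical destination table for a SharePoint list; text columns are nvarchar
CREATE TABLE dbo.SharePointTestList (
    tp_ID int NOT NULL,
    Title nvarchar(255) NULL,
    Description nvarchar(255) NULL
);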

Further documentation on using these adapters is available here.

This is the first in a series of blog posts I’m planning to do, in preparation for a potential SQLBits session in March 2012.

This article will introduce how, at the most basic level, SQL Server can be communicated with using C++ and Native code, rather than using the .NET Framework.

The code shown below follows through the basic process, defined in the general flowchart for ODBC Applications as seen on MSDN. This was created in VS2010.

Using this process, we connect to a server (a local copy of SQL Server with AdventureWorks 2008 R2) and run a straightforward query against it to do a row count of the Person table.

// The bare basics to query SQL Server, using the Native Client, in C++
//
#include "stdafx.h"
#include <iostream>
using namespace std;

#define _SQLNCLI_ODBC_
#include "sqlncli.h"
#include "sqlext.h"

int _tmain(int argc, _TCHAR* argv[])
{
    // Define Handles
    SQLHANDLE hEnv, hDBCCount, hStmtCount;
    SQLINTEGER iRowCount, iRowCountInd;

    char sConnString[120] = "Driver={SQL Server Native Client 10.0};Server=localhost;Database=AdventureWorks2008R2;Trusted_Connection=yes;";

    // Step 1 - Assigning an Environment Variable
    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &hEnv);

    // Step 1 - Declaring the use of ODBCv3
    SQLSetEnvAttr(hEnv, SQL_ATTR_ODBC_VERSION, (SQLPOINTER) SQL_OV_ODBC3, 0);

    // Step 1 - Creating a Connection Handle
    SQLAllocHandle(SQL_HANDLE_DBC, hEnv, &hDBCCount);

    // Step 1 - Setting Connection Attributes
    SQLSetConnectAttr(hDBCCount, SQL_ATTR_AUTOCOMMIT, (SQLPOINTER) SQL_AUTOCOMMIT_OFF, SQL_IS_INTEGER);

    // Step 1 - Initiating the connection to SQL Server
    SQLDriverConnect(hDBCCount, NULL, (SQLTCHAR *) sConnString, SQL_NTS, NULL, 0, NULL, SQL_DRIVER_NOPROMPT);

    // Step 2 - Creating a Handle for the Statement
    SQLAllocHandle(SQL_HANDLE_STMT, hDBCCount, &hStmtCount);

    // Step 3 - Connecting to AdventureWorks2008R2
    SQLExecDirect(hStmtCount, (SQLTCHAR *)"USE AdventureWorks2008R2;", SQL_NTS);
    cout << "USE AdventureWorks2008R2;" << endl;

    // Step 3 - Executing Query against Person.Person table
    SQLExecDirect(hStmtCount, (SQLCHAR *)"select count(1) from Person.Person;", SQL_NTS);
    cout << "select count(1) from Person.Person;" << endl;

    // Step 4a - Assigning a variable to the return column
    SQLBindCol(hStmtCount, 1, SQL_C_ULONG, &iRowCount, 0, &iRowCountInd);

    // Step 4a - Retrieving the data from the return dataset
    SQLFetch(hStmtCount);

    cout << "Rows = " << iRowCount << endl;

    // Step 4a - Remove the Cursor
    SQLCloseCursor(hStmtCount);

    // Step 5 - Closing down and Cleaning up
    SQLDisconnect(hDBCCount);
    SQLFreeHandle(SQL_HANDLE_DBC, hDBCCount);
    SQLFreeHandle(SQL_HANDLE_ENV, hEnv);

    return 0;
}

Over the coming weeks, I'll be expanding on this to get better performance for more complex processes, and going through what each of these sections does.

I hope you find these articles interesting.

SQL Server ODBC on Linux

Disclaimer: I'm not a Linux expert, and I'm sure that doing everything as root is bad, just like doing everything as a Domain Admin account is bad.

Having seen that the CTP version of the Microsoft SQL Server ODBC Driver for Linux has been released, I thought that it would be an interesting thing to play with. Particularly since it might be something I’ll interact with using C++.

Getting Ready

Officially, it's supported on Red Hat Enterprise Linux, but I've not got that, and you have to pay for it (not much, but still). Having downloaded Fedora 16, installed it in a VM (VMware Workstation), and fired it up, I needed to install a number of prerequisites.

Using the Add/Remove Software option in Applications -> System Tools, I installed these packages:

  • Development Libraries
  • Development Tools

I also needed to install wget. Type it into the Filter box, tick the box against the result, and click Apply.

Then download the driver from here: http://www.microsoft.com/download/en/details.aspx?id=28160

Note that you'll also need the unixODBC Driver Manager; the current version is 2.3.1. I couldn't get that working, but 2.3.0 does work, and is available to download here (unixODBC-2.3.0).

Installing it

To get everything to work, I downloaded the files into the Downloads directory, and followed the instructions on the MS Downloads page (copied below, with an item (3) added by me to make life easier).

To install the driver manager:

  1. Make sure that you have root permissions.
  2. Navigate to the directory where you downloaded sqlncli-11.0.1720.0.tar.gz and extract it:
    cd ~/Downloads/
    tar xvf sqlncli-11.0.1720.0.tar.gz.
  3. (added by me) Copy the unixODBC-2.3.0.tar.gz file into the  sqlncli-11.0.1720.0 folder with
    cp unixODBC-2.3.0.tar.gz sqlncli-11.0.1720.0/
  4. Change to the sqlncli-11.0.1720.0 directory, where you can run build_dm.sh to install the unixODBC Driver Manager:
    cd ./sqlncli-11.0.1720.0
    ./build_dm.sh --help
  5. You can install the driver manager by executing the following command:
    ./build_dm.sh
    Note: you can also download the driver manager manually at http://www.unixodbc.org/ and use the downloaded archive locally:
    ./build_dm.sh --download-url=file://unixODBC-2.3.0.tar.gz
  6. Type “YES” to proceed with unpacking the files. This part of the process can take up to five minutes to complete.
  7. After the script stops running, follow the instructions on the screen to install the unixODBC Driver Manager.

Next up, we need to install the driver, again, follow the instructions from the MS Download page (copied here):

To install the driver:

  1. Make sure that you have root permissions.
  2. Navigate to the directory where you downloaded sqlncli-11.0.1720.0.tar.gz and extract it:
    cd ~/Downloads/
    tar xvf sqlncli-11.0.1720.0.tar.gz.
  3. Change to the sqlncli-11.0.1720.0 directory, where you can run install.sh to install the driver:
    cd ./sqlncli-11.0.1720.0
    ./install.sh --help
  4. (Optional) You may want to make a backup of odbcinst.ini. The driver installation will update odbcinst.ini. odbcinst.ini contains the list of drivers that are registered with the unixODBC Driver Manager. Execute the following command to discover the location of odbcinst.ini on your computer:
    odbc_config --odbcinstini.
  5. Before you install the driver, you may run a verify step to check if your computer has the required software to support the Microsoft SQL Server ODBC Driver for Linux:
    ./install.sh verify
  6. When you are ready to install the Microsoft SQL Server ODBC Driver for Linux CTP, run the install script:
    ./install.sh install
  7. After reviewing the license agreement, type “YES” to continue with the installation.
  8. Verify that Microsoft SQL Server ODBC Driver for Linux CTP was registered successfully:
    odbcinst -q -d -n "SQL Server Native Client 11.0"

Resolving library issues

That completed the installation. However, I did get a couple of errors when running sqlcmd. These were down to different versions of a couple of Linux SSL libraries being installed, rather than the expected ones. Having had a root (pun not intended) around, I resolved the issues by adding a couple of symbolic links (kind of like shortcuts, kind of…), like this:

ln -s /lib64/libcrypto.so.1.0.0.e /lib64/libcrypto.so.6
ln -s /usr/lib64/libssl.so.10 /usr/lib64/libssl.so.6

Time to Play!

As if by magic, I can now query a SQL Server database, from Linux!

image

This was surprisingly straightforward, I thought. My next step will be to see if I can communicate with it from code (C++, since it's Linux).

Update – 26/1/2012

It’s been requested that I post the odbc.ini and odbcinst.ini files I used. These are shown below, and are unchanged by me.

ODBC.INI

<empty file>

ODBCINST.INI

[SQL Server Native Client 11.0]
Description=Microsoft SQL Server ODBC Driver V1.0 for Linux
Driver=/opt/microsoft/sqlncli/lib64/libsqlncli-11.0.so.1720.0
UsageCount=1

Update 16/12/2011: I've passed the exam! If you are taking it, the very best of luck!


I’m writing this now, as I have the exam on Monday, and after that point any comments I make on the exam will be influenced by taking it, and therefore restricted by the NDA.

So, here are a list of the Resources I’ve used to prepare for this exam:

Microsoft Exam Preparation Page – This has a lot of links to whitepapers and books. I've read almost all of the whitepapers (all of them by Monday), and 60% of the books.

Online Training Videos / Associated Resources – These are a phenomenal resource: specific, highly detailed training by Paul Randal, Kimberly Tripp, Brent Ozar and Bob Beauchemin. They are useful even if you don't want to do the Master certification.

SQLskills Training IE1 – I attended the IE1 course held by SQLskills in the UK. Ideally, I'd have attended all four, but due to training budgets and time, I couldn't. It was a great course, and I highly recommend their courses. Being taught SQL Server by such highly skilled trainers (Paul Randal and Kimberly Tripp) is an amazing experience.

Community Training Notes – Hosted by Neil Hambly (who was also on the IE1 training), this is a great resource too, and has sections on all of the areas an MCM should know. Thanks Neil.

Community events – There are many: SQLBits, SQLPASS (24 Hours of PASS, and the Nordic Rally event, in particular), SQL Maidenhead, SQL in the Evening; they are all really helpful, and thank you.

SQLBits also have a huge number of videos of the conference sessions available.

Of course, on top of this is the past 12 years of experience I've had with SQL Server (though most of it has been over the past 7-8 years). For that, I'd like to thank my previous employer, JI Software, and most of all my current employer, TAH Limited. I've been fortunate enough to work on some really challenging data warehousing projects over the past few years, including a huge one for Vodafone (read more here).

If you are looking to do the certification, the very best of luck to you!

I passed this exam on Friday. This was a hard one. Having taken it just over two months ago and failed it, I changed the way I prepare for exams.

Previously, I'd been reading the material on the MS learning plan for the exam, and any associated books. For the 70-451 exam I did six weeks or so back, I did a lot of reading of blog articles, MSDN and Books Online. For this exam, however, I didn't find the learning plan overly helpful, and there is a huge number of blog articles on database administration (some of which are contradictory).

So, having failed the exam (which isn't a bad thing, as it proves that the exams aren't so straightforward that everyone can pass first time), I made notes on the areas where I was weakest, then looked for relevant articles in those areas. The difficult part was finding blog articles by people who knew what they were talking about. A couple of the most useful were John Sansom (particularly the Something for the Weekend: SQL Server Links posts), and the articles from Jonathan Kehayias of SQLskills (particularly those on Extended Events and Clustering). Thanks to you both!

I did also find a useful book, and even better, it was one I already had. SQL Server 2008 Administrator’s Pocket Consultant (2nd Edition) by William Stanek was a great help, and while I’d read bits and pieces before, going through it from cover to cover was really helpful.

Having completed this exam, I’ve now completed the prerequisites for the SQL Master Certification, and I’ll be looking to take the Knowledge exam before Christmas.

First though, I’m going to have a break from studying for SQL Server exams. Just for a couple of days. Something different for a change, PowerShell or C++. We’ll see… Just for a couple of days though…