Category: Events


Having been sitting on the fence for a while, I’m finally leaping off, and presenting at some community events. Following a false start with SQLBits (I submitted, but wasn’t voted in, and given the number of attendees I’m a little relieved about that!), I’ll be presenting at the following events over the next couple of months.

Hope to see you there!

24th April (Tues) – SQL Server in the Evening (6:30-6:50) – First Timers Slot (http://sqlserverfaq.com/events/392/Sessions-including-SQL-Server-Parallel-Data-Warehouse-at-the-sixth-SQL-Server-community-event-615pm-April-24th-Reading-Berkshire.aspx)

I’ll talk about using the CLR within SQL Server, why and when it should be used and then how.

25th April (Weds) – DevEvening (http://www.devevening.co.uk/)

26th May (Sat) – DDD Southwest (http://dddsouthwest.com/)
NOTE: This session isn’t confirmed yet, and is still reliant on being voted in. You can vote by going to the DDD Southwest site, linked above!

Both DevEvening and DDD Southwest will be the same session, summarised below:

Going Native with SQL Server 2012 and C++

I’ll be going through the delights of creating modules to interact with SQL Server 2012: a function in T-SQL (briefly), then a SQL CLR module in C#, and then looking at the performance gains from a C++ application using the SQL Server Native Client (ODBC). All three will do the same job, and we’ll cover the advantages and disadvantages of each.

We’ll cover the following:

  • T-SQL, SQL CLR (C#)
  • SQL Server Native Client
  • Advantages and Disadvantages
  • Performance Opportunities
  • How to use it to connect to SQL Server from C++
  • How to query a database
  • Comparison between T-SQL, SQL CLR & C++ solutions

Slides and follow-up articles will be coming soon.

Thanks to Argenis Fernandez for hosting this month’s T-SQL Tuesday.

A Random Start

I’ve had an interesting (I hope) career so far. I started out working in a pharmacy. It was there that I wrote my first computer program (well, actually it was at college, but it was for them). Many eons ago, back in the early 90s, I’d been learning Pascal at college, and things are easier to learn if you have an aim, so I wrote a program to assist with the filling in of paperwork. It worked, and it was good. It stored data in a pipe-delimited file rather than any kind of database. Not good, for many reasons, but it was only a single-user application and didn’t have a huge amount of data.

Know your limits…

After that, I went to university, where they tried to teach me C++, assembly and Haskell. They failed with assembly and Haskell, though the theory is still there. C++ I love, and keep going back to. In fact, with any luck, you may see me present on it at DDD Southwest in May, and at DevEvening in April… Sadly, while I do enjoy C++, I’m not good enough to do that as a career, and I don’t think I’d enjoy it as much if it was my bread and butter.

The degree I did (Computer Systems, since you asked) included a year’s work placement, which I spent at a small IT company working on the support desk. Being a small company, while I did application support for customers I also managed the internal infrastructure, created their website and a few other bits. It was a great experience and I really enjoyed it. So much so, that I went back there to do consultancy after I’d completed the degree.

While I was there, I learnt Windows NT, Visual Basic, Btrieve and had my first introductions to SQL Server (6.5 and 7). It was also here that I took my first Microsoft Certifications, covering Windows NT, SQL Server and assorted Networking topics.

Know when it’s time to move

After four years, I was starting to feel claustrophobic, and needed more of a challenge. At the start of 2000, I moved on, and went to work for a Siebel Consultancy. This was a big change, as while I’d done some consultancy work before, I really had to up my game. Not a bad thing, and I really found my feet with Siebel as it was based on SQL Server, and had the ability to have custom components written in Siebel VB or eScript.

More Certifications

After a great couple of years of big Siebel implementations, including a system linking Siebel to SAP via XML integration (my first real experience with enterprise-grade XML), the Siebel market dropped off for us after Oracle bought Siebel ($5.8 billion!).

I then moved my skills to Microsoft CRM, starting with V1.2 (unpleasant), then v3.0 (much better), and also SharePoint, all of which had associated MS Certifications which I completed, and all of which were based on SQL Server.

Try to see the Obvious…

At some point, and I can’t remember when, I realised that I’d been working with SQL Server for over 12 years; it’s now nearing 15. I hadn’t really noticed.

For the past two years I’ve been working as a Consultant, building Data Warehouses primarily, though I also do some C# for SQL CLR work, and C++ for fun. I’ve done a ridiculous number of Certifications (mostly Microsoft), and my motivation is to get validation of my skills. I’m working on the MCM: SQL Server certification at the moment, and have the final Lab exam in May.

What Next?

I don’t know. I’m pretty sure there will be SQL Server involved though. Fortunately the new version of SQL Server is out now, so the new Certifications will be out soon, and that’ll keep me occupied for a while.

Reading this back, the one thing that strikes me is that I’ve been very lucky to be in a career that keeps my brain occupied. If there are less taxing times, then I have C++ to stretch the grey matter.

One other thing; I’ve also found that it is good to keep pushing yourself. Always try to work just outside your comfort zone. If everything is easy, then you need to push yourself more.

To end, a couple of thoughts from wiser people than me.

[Image: quotations]

It’s MemeMonday: What SQLFamily means to you.

Everyone has a family. Only the lucky few (comparatively) have a #sqlfamily. It’s a great way to think of the people we interact with. Some are cool, some are annoying, but all of them bring their own thoughts and experiences.

I’ve not been actively taking part in the SQL community for long, only the past 6 months or so. Before that I was mainly involved with the UK developer community, and when I say involved, I mean I attended events. My developer skills were nowhere near good enough to keep up with the likes of Jon Skeet, or many of the other great community developers.

I’m hoping to become a more active member of the SQL community in the UK as I go through the SQL MCM certification. I’ve already made a few friends there, and am starting to feel confident enough to do presentations on SQL Server topics. I’ve even started to do presentations on it internally at work. Maybe that makes me the annoying cousin who tries too hard? That’ll have to be for someone else to decide. 🙂

I’m writing this sitting in a Starbucks at London Heathrow, on the way to the SQL Rally Nordic event in Stockholm, where I’ll be able to meet some more #SQLFamily members. Mostly the cool uncles and aunts. It’s shaping up to be a great event.

What does SQLFamily mean to me? I love you guys, and appreciate the openness and way everyone is so willing to share their knowledge. Thank you!

These are the sessions I’m planning on ‘virtually’ attending at the 24 Hours of PASS event today and tomorrow (7th-8th Sept 2011). I’ll add to this post with my thoughts after the sessions.

Session 02 – Start time 13:00 GMT on Sept 7
SAN Basics for DBAs
Presenter: Brent Ozar

Session 03 – Start time 14:00 GMT on Sept 7
Diving Into Extended Events
Presenter: Jonathan Kehayias

Session 05 – Start time 16:00 GMT on Sept 7
Why PowerShell?
Presenter: Aaron Nelson

Session 10 – Start time 21:00 GMT on Sept 7
Secrets of the SQLOS – Leveraging Microsoft SQL Server Internal Operating System for Improved Scalability and Performance
Presenter: Maciej Pilecki

Session 11 – Start time 22:00 GMT on Sept 7
Hardware 301: Diving Deeper into Database Hardware
Presenter: Glenn Berry

I would have liked to see Grant Fritchey presenting on execution plans, but that would finish at 1am in the UK, and that’s too late.

Session 13 – Start time 12:00 GMT on Sept 8
Zero to Cube – Fast Track to SSAS Development
Presenter: Adam Jorgensen

Session 15 – Start time 14:00 GMT on Sept 8
Disaster Recovery Is Not Just About Technology
Presenter: Edwin Sarmiento

Session 18 – Start time 17:00 GMT on Sept 8
Baseline Basics or: Who Broke the Database
Presenter: Adam Machanic

Session 22 – Start time 21:00 GMT on Sept 8
Important Trace Flags That Every DBA Should Know
Presenter: Victor Isakov

Session 23 – Start time 22:00 GMT on Sept 8
Policy-Based Management in a Nutshell
Presenter: Jorge Segarra

T-SQL Tuesday again, and this month it’s hosted by Amit Banerjee at TroubleshootingSQL.

One of the things that I’ve become more aware of, through preparing for the MCM certification and working on larger data warehousing projects, is that multiple database files can pretty much always give you a performance improvement.

From testing that I’ve done, this is even apparent on small local databases.

As part of a series of blog posts that I’m doing around the TPC-H benchmarks, I’ve been loading and reloading a dataset of approximately 8.6 million records (across 8 tables). This dataset is the default 1GB set from the TPC-H benchmark (downloadable here (approx. 276MB), or you can read my previous blog article on creating it yourself).

To get some decent figures to show how much of an improvement you can get with multiple files, I’ve created a script which does the following steps:

    1. Creates the database (2GB per database file, and 512MB for the log file)
    2. Creates the tables
    3. Bulk loads data using a set of flat files
    4. Gives a count of each of the tables

A copy of the script is available here.
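For context, here’s a minimal sketch of the shape of that script. The database name, file paths and BULK INSERT options are placeholders rather than the exact ones from my test runs, and only one of the eight TPC-H tables is shown.

-- 1. Create the database with multiple data files (2GB each) and a 512MB log file.
--    Database name, file names and paths here are illustrative placeholders.
CREATE DATABASE TPCH
ON PRIMARY
    (NAME = tpch_data1, FILENAME = 'C:\SQLData\tpch_data1.mdf', SIZE = 2GB),
    (NAME = tpch_data2, FILENAME = 'C:\SQLData\tpch_data2.ndf', SIZE = 2GB)
LOG ON
    (NAME = tpch_log,   FILENAME = 'C:\SQLData\tpch_log.ldf',   SIZE = 512MB);
GO

USE TPCH;
GO

-- 2. Create the tables (only REGION shown; the other seven follow the same pattern).
CREATE TABLE dbo.REGION
(
    R_REGIONKEY INT          NOT NULL,
    R_NAME      CHAR(25)     NOT NULL,
    R_COMMENT   VARCHAR(152) NULL
);
GO

-- 3. Bulk load from the pipe-delimited flat file produced by dbgen.
--    Depending on how your files were generated (dbgen adds a trailing '|'),
--    you may need ROWTERMINATOR = '|\n' or a format file instead.
BULK INSERT dbo.REGION
FROM 'C:\TPCH\region.tbl'
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', TABLOCK);
GO

-- 4. Row counts for each table.
SELECT COUNT(*) AS region_rows FROM dbo.REGION;
GO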

I carried out a few different tests, based on:

  • Single or Multiple files
  • Different Media
    • Running on the C drive (5400rpm SATA drive)
    • Running on USB Pen Drives
    • Running on an eSATA drive
  • Splitting over multiple media
  • Having the Transaction log stored separately

The results I found are shown below (times are shown as minutes:seconds:milliseconds, mm:ss:ms).

[Image: table of DB Build and Data Load times for each test configuration]

The baseline is the run with a single file on my internal drive.

DB Build is the time to create the database. Note that I’m using Instant File Initialisation, and so should you (unless you have a very good reason not to!).

Data load is the time to build the tables and load them.

Conclusions

Effectively, you can see the following:

    1. Regardless of media, multiple files always give a performance improvement
    2. USB pen drives are rubbish. Don’t use them for databases
    3. A fast drive, separate from the O/S and from the system databases, will give a significant improvement

The best performance I managed to achieve was with multiple database files on the eSATA drive.

However, given that I work primarily on a laptop, the fact that I can get a huge improvement (over a third!) by using multiple database files on the internal drive is impressive.
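If you want to try this against a database you already have, rather than recreating it from scratch, adding an extra data file is straightforward; the database name, file name and path below are placeholders, and you’d then need to rebuild or reload the tables for the existing data to actually spread across the files.

-- Add another data file to the primary filegroup (illustrative names and path).
ALTER DATABASE TPCH
ADD FILE
(
    NAME = tpch_data3,
    FILENAME = 'C:\SQLData\tpch_data3.ndf',
    SIZE = 2GB,
    FILEGROWTH = 512MB
);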

I’d be interested to know how much of an improvement you get with this; how much does your mileage vary?

Thanks for reading, and thanks to Amit for hosting.

So, it’s another T-SQL Tuesday! This time, it’s around Disaster Recovery, and is hosted by Allen Kinsel. Thanks Allen.

For this month, I thought I’d share the experiences I had on a project a few years ago. This was for a multi-site retail organisation, where all the servers were hosted in the head office.

We were in the process of consolidating a number of SQL Servers onto a more powerful, up-to-date SQL Server environment, and the process was going well. We’d got buy-in from the board, and had started the process of consolidation and upgrades.

However (isn’t there always a however?), a couple of weeks before Christmas there was a power outage in the building. All power was gone. Due to the way the communications were configured, all network access for the remote sites went through the head office. Power going down meant no internet access for the sites. It also meant no access to the corporate servers (email, intranet, ERP systems). And it meant the corporate website went down, since that was hosted in-house.

Once power came back up, 4 or 5 hours later, the servers restarted, and all was well again.

In the aftermath, a number of issues were raised. The main ones being:

  • Inability for customers to access the website
  • Corporate systems being offline

The website was the easiest and quickest to resolve, since it could be outsourced. The corporate systems were more complicated, and required a little more thought.

We needed a Disaster Recovery plan. Having spent a good long time going through everything that needed to be covered, and how to ensure that systems would be up, running and resilient enough, but not outrageously overpriced (we thought…), we submitted the plan to the board.

It included things like:

  • Remote SQL Server & mirroring of a dozen or so databases
  • Remote Exchange Server
  • Backup Internet connection
  • Support for all this, so there is someone to call if it does all go wrong…

The first time through it was rejected due to price, so we trimmed a few bits (a slower backup line, changes to SQL Server licensing) and resubmitted. It then went into ‘consideration’ limbo. By the time I moved off the project (8 months later), it was still being considered. Having said that, there hasn’t been another power outage since then (2.5 years ago), so maybe they are lucky, but then again…

So what I’m saying is, make the investment in planning and ensure it’s all up and running, because you never know…

It’s Un-SQL Friday, so it’s all about something SQL-adjacent. This month it’s lessons learned as a speaker. OK, so that’s much easier than the last one I took part in. I’ve done a couple of presentations, and the first, well, that wasn’t so great.

I’m a BI Developer, and aside from the past year or so, I’ve spent pretty much the past 8-10 years writing code, so I’ve not done a huge number of presentations.

However, having been to a number of developer and SQL-focussed presentations by a wide range of people, I allowed myself to be talked into doing a session. It wasn’t a long session, just 15 minutes (it seemed longer…), covering an ORM (Object Relational Mapper). I’d chosen the Telerik OpenAccess one, and would be covering it as part of an ORM-themed evening at our local community group (DevEvening).

So, having prepared a talk and slides, gotten some input from Telerik, and written some example code, I went off to do the session. The problem I found was that coding on the fly is bad. In the majority of cases, it doesn’t end well.

What I’d done was write down the steps I needed to code, and although I’d run through it a few times beforehand, a couple of them with no issues, there is a difference between doing it at your desk and doing it in a presentation situation. Really, I should have had saved projects for each of the examples.

I know this now, and when I do my next one (something SQL-related, I think), I’ll be better prepared for what I’m doing.

Aggregate functions are the topic of this month’s T-SQL Tuesday. An interesting one, and it made me think about what I’ve done that could be considered interesting in relation to aggregation.

One thing that sprang to mind was some work I did on a data warehouse. I worked on a project a while back (a few years now) that included a data source from an ERP system, which was effectively a table populated from a series of Excel worksheets. The table was set up so that each cell in the worksheet had its own row. This resulted in 6,435 rows (cells A1 to I715) per project, per financial period: 6,435 * 200-plus projects * 12 periods, so over 15,444,000 rows per year. The code and table samples below are representative of the process we followed, and the table structures have been appropriately anonymised, but you get the general idea.
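As a rough, anonymised illustration of that structure, the sketch below shows the sort of table involved; the column names match the queries that follow, but the data types and sample values are my assumptions rather than the original schema.

-- One row per spreadsheet cell, per project, per financial period (anonymised sketch).
CREATE TABLE dbo.xl_test
(
    project_id INT          NOT NULL,
    xl_month   TINYINT      NOT NULL,  -- financial period month
    xl_year    SMALLINT     NOT NULL,
    xl_cellref VARCHAR(10)  NOT NULL,  -- e.g. 'A1', 'B7', 'I715'
    xl_value   VARCHAR(255) NULL       -- the cell contents, held as text
);

-- A couple of representative rows for one project and period.
INSERT INTO dbo.xl_test (project_id, xl_month, xl_year, xl_cellref, xl_value)
VALUES (1001, 4, 2010, 'A1', 'Project Alpha'),
       (1001, 4, 2010, 'A2', '125000');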

It wasn’t necessary to load all the source data into the data warehouse, since there was a lot of information that we didn’t need. Effectively, this was the process that we had.

[Image: diagram of the extract and load process]

To get the values out for the project, in the correct form, the following T-SQL was used:

SELECT project_id,xl_month,xl_year,
    MAX(CASE WHEN xl_cellref ='A1' THEN xl_value END) AS 'A1',
    MAX(CASE WHEN xl_cellref ='A2' THEN xl_value END) AS 'A2'
FROM dbo.xl_test
GROUP BY project_id, xl_month,xl_year

After a bit of time with this running, we made some changes, and ended up with the following:

SELECT project_id, xl_month, xl_year,  [A1], [A2]
    FROM (
        SELECT project_id, xl_month, xl_year, xl_cellref, xl_value
        FROM dbo.xl_test) AS xl_test
    PIVOT( MAX(xl_value) FOR  xl_cellref IN
    ([A1],[A2])
    ) AS aPivot

This (and some of the other changes we made) actually improved the performance of the DWH load by approximately 25%; however, I’d imagine a fair chunk of that was down to the fact that PIVOT is quicker than a dozen or so CASE expressions.

Well, this was an interesting one. The point of Un-SQL Friday is something that is SQL-adjacent, which is a little vague, to say the least.

This month, since it’s Valentine’s Day, it’s focussed on sharing the love, and on companies that are ‘doing it right’.

Having thought about this, I could have talked about SQLskills, who are doing great MCM-level SQL training (not to mention acquiring @BrentOzar); I could have talked about Microsoft, who are doing good things with SQL Server (e.g. the features in Denali); or I could’ve talked about Fusion-IO and their great work on IO hardware for getting serious performance out of SQL Server (and for the CrappyCodeGames at SQLBits). But they aren’t really SQL-adjacent.

One of the marks of a company that is doing the ‘right thing’ is that you don’t notice them. You use their wares, and only notice when things go pear-shaped.

Given that, the only company that I use directly, and that is vaguely SQL-adjacent, is Zen Internet (@zeninternet). I’ve been a customer of theirs for nearly 6 years, and have only had to get in touch with them when I’ve wanted to change my services (such as moving my Internet access from ADSL to fibre, or from Linux hosting to Windows hosting).

Zen’s customer service is the best I’ve come across in the UK, and is based in the UK (rather than being offshored).

I am aware that a lot of readers of this will not be UK-based, but a lot of companies could look at how Zen do business and take note.

Some of their key points, from my perspective, are:

  • UK based customer service
  • Prompt and competent support staff
  • No traffic-shaping
  • No spam emails trying to upsell you
  • Respect from BT engineers
All in all: great job, Zen. Thank you, and happy Valentine’s Day!

PS: If you are wondering how this is even vaguely SQL-related, I have a website with a database behind it, and that’s running on SQL Server… Vague, but hey. 😉

This is my first (of many, hopefully) post for T-SQL Tuesday. I’ve been watching for the next invite to come out, and I’m glad to see it’s on something that I can comment on: automation.

I’ve been doing a lot of work recently on Data Warehousing, and thought it’d be useful, and relevant, to run through one of the tools we use to automate the loading of Data Warehouses.

While we could (and sometimes do) use SSIS to load the Data Warehouse, we primarily use a tool called WhereScape RED (trial versions, whitepapers, etc. available here: http://www.wherescape.com). This product makes the process of creating, prototyping and building a Data Warehouse, and its ongoing running and maintenance, much more manageable and efficient.

While you can read all about how to use this product online, I’m going to run through some of the key points, and how it actually interacts with SQL Server.

In RED you’ll see a list of the different types of objects that it can use and generate. If you’ve done any work with data warehousing, the majority of these will be familiar to you.

Connection

These are the connections that are used to link RED with the data warehouse and its sources. This would include items such as flat files (on Windows or Unix), Analysis Services (2000 or later) or a database (using any form of ODBC connection).

Load Tables

Load tables are the source for all the data: whether your data is coming from a file, another database or wherever, it’ll be brought into the system through here, into a table called load_&lt;something&gt;. These are created in a nice straightforward manner, by dragging the file from the Source Browser into the main window. This then creates a table, with a set of fields, which are then used to populate the automatically generated documentation!

It’s worth pointing out that, while each of the steps allows you to customise the T-SQL generated by RED, the load process is the only one that doesn’t let you see the initial script, though you can add T-SQL to run before and after the load.

Stage Tables

Stage tables are where all the work actually takes place. This is where you can modify the SQL to change the data from the load tables into another form, add keys, reformat fields, and so on. It’s possible to have multiple stage tables daisy-chaining into one another, so your final stage table holds the data as it will be just prior to the dimension or fact table; a rough hand-written equivalent is sketched below.
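RED writes this code for you, but to give a flavour of what a stage-table load is doing under the covers, here’s a hand-written sketch in plain T-SQL. The table and column names are invented for illustration; they aren’t RED’s own.

-- Illustrative stage-table load: tidy up the load-table data and attach keys.
-- Table and column names are invented; RED generates (and version-controls) the real code.
TRUNCATE TABLE dbo.stage_customer;

INSERT INTO dbo.stage_customer (customer_code, customer_name, region_key, load_date)
SELECT LTRIM(RTRIM(lc.cust_code)),   -- reformat fields from the load table
       UPPER(lc.cust_name),
       ISNULL(dr.region_key, 0),     -- attach a key, defaulting any unmatched rows
       GETDATE()
FROM dbo.load_customer AS lc
LEFT JOIN dbo.dim_region AS dr
    ON dr.region_code = lc.region_code;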

Dimensions and Fact Tables

Finally, you have the dimension and fact tables, and these are populated by the prior stage tables. By this point, you should have all your data ready to go.

Get to the Point!

By this point, you’re probably thinking this has little to do with T-SQL, or automating it… Well, I’m coming to that. What RED does is generate the T-SQL for you, handle it all using a source control system, and allow you to automate the processing using a scheduler.

In my experience, the generated SQL performs well, and the wizards used to gather the information needed to generate it give you plenty of rope to hang yourself with.

Through the wizards you can choose how the data will be accessed, and add hints if need be.

Jobs and Scheduling

One of the really nice things about RED is the scheduling function. Through this, you can set up multiple tasks (such as load_customer, load_product, stage_customer_1, stage_customer_2, etc.), define the threads that can be used, and how they interact. So you can load the customer and product tables from the source system simultaneously, and likewise with any other sets of tasks. While RED doesn’t suggest how this should be done (it doesn’t know your source systems, or which jobs are likely to interact), it does allow you to specify the parallelism at a high degree of granularity.

You can also set dependencies (don’t run this job until that one has completed), and run commands based on the success or failure of the jobs, which means you can send custom emails based on the results. All job logging is also written to a database, so you can query it to see whether tasks are taking longer over time, and whether there are any spikes in run time; the sort of query involved is sketched below.
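I won’t reproduce RED’s actual logging schema here, but the sort of query you end up writing against the job log looks something like this; the table and column names are placeholders rather than the real ones.

-- Hypothetical job-log query: average and worst run times per task over the last 90 days.
SELECT task_name,
       COUNT(*)                                    AS runs,
       AVG(DATEDIFF(SECOND, start_time, end_time)) AS avg_seconds,
       MAX(DATEDIFF(SECOND, start_time, end_time)) AS max_seconds
FROM dbo.job_task_log
WHERE start_time >= DATEADD(DAY, -90, GETDATE())
GROUP BY task_name
ORDER BY avg_seconds DESC;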

What’s Next?

Hopefully, you’ve found this interesting. I’d highly recommend trying out RED as it’s a great tool for automating the loading of data. I’d welcome any feedback you have about this article.

Happy T-SQL Tuesday!