
So it’s T-SQL Tuesday once more, hosted this month by Jen McCown. Thanks Jen!
The topic is around tips, tactics and strategies for managing an Enterprise…

Well, having been in my current role for nearly a year now, which involves a degree of managing a number of SQL Servers, it’s a great opportunity to share what helps me.

1 – SQL Prompt

This is an add-in for SSMS produced by Redgate. While it does a wide range of awesome things, one of the main things that helps me, from an enterprise perspective, is the ability to colour code tabs. This makes it really easy to see which environment I’m running code against. For me, Green is dev, Purple is staging (and not quite live), and Red is Production (or requires caution!).

SQLPrompt_Tabs

The other really nice feature is that when you close SSMS down (which occasionally happens accidentally, or when Windows Update does its thing, cos why would you deliberately close it!), SQL Prompt remembers all the tabs you had open, and reopens them (and reconnects them) on restart.

For me, this is great as it means that I can concentrate on the actual work, rather than waste time trying to think of a filename for some random query that may be of use at some point.

2 – Registered Servers

SSMS_Registered_Servers

This is a really nice feature that allows me to run the same query on multiple servers. One of my main uses for it is to get a ‘snapshot’ of what is running on the servers, with sp_whoisactive. A query window connected to one of these groups runs the query against each member server, and adds a column to the start of the result set showing the server name.
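As a rough sketch, this is the sort of thing I mean. It assumes the community sp_WhoIsActive procedure is installed on every server in the group; the second query is just an illustrative fallback using a built-in DMV for servers where it isn't. SSMS adds the server name column itself, so the query doesn't need to.

```sql
-- Run from a query window opened against a Registered Servers group.
-- Assumption: sp_WhoIsActive is installed on every member server.
EXEC sp_WhoIsActive;

-- Fallback using a built-in DMV, for servers without sp_WhoIsActive.
SELECT session_id, status, command, start_time, wait_type
FROM sys.dm_exec_requests
WHERE session_id <> @@SPID;
```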

3 – Extended Events

I’ve had the chance to get to grips with this more over the past few months, and I’m really starting to find lots of great uses for it. One of the really helpful (Enterprise-related) things I’m using it for is monitoring execution times for a specific stored procedure. This is just for one SP, and it’s called thousands of times an hour. Using Extended Events for this allows tracking of these times with next to no overhead.
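To give an idea of the shape of this, here is a minimal sketch rather than the session I actually run. The session name, the procedure name (usp_GetOrders) and the file path are all made up for illustration, and it assumes SQL Server 2012 or later. It captures the module_end event for one procedure and then reads the durations back from the event file.

```sql
-- Hypothetical names throughout: TrackProcDuration, usp_GetOrders, C:\XE\...
CREATE EVENT SESSION [TrackProcDuration] ON SERVER
ADD EVENT sqlserver.module_end
(
    ACTION (sqlserver.database_name)
    WHERE ([object_name] = N'usp_GetOrders')   -- only the one procedure
)
ADD TARGET package0.event_file
(
    SET filename = N'C:\XE\TrackProcDuration.xel'
)
WITH (MAX_DISPATCH_LATENCY = 30 SECONDS, STARTUP_STATE = OFF);
GO

ALTER EVENT SESSION [TrackProcDuration] ON SERVER STATE = START;
GO

-- Read the captured events back; duration is reported in microseconds.
SELECT
    x.event_xml.value('(event/@timestamp)[1]', 'datetime2') AS event_time,
    x.event_xml.value('(event/data[@name="duration"]/value)[1]', 'bigint') AS duration_us
FROM
(
    SELECT CAST(event_data AS xml) AS event_xml
    FROM sys.fn_xe_file_target_read_file(N'C:\XE\TrackProcDuration*.xel', NULL, NULL, NULL)
) AS x;
```

Filtering on the object name in the event predicate is what keeps the overhead down, since events for every other module are discarded before they reach the target.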

I’m in the process of doing a separate post on this, so keep ’em peeled.

Thanks again to Jen for hosting.

Recently, I’ve been learning about the Statistical language R. To allow me to do some testing against it, I wanted to get a decent sized dataset into SQL Server.

Having found a good dataset, the Airlines data from the 2009 Data Expo, I downloaded it, and proceeded to try and load it into SQL Server. I failed almost straightaway since the downloaded files were UTF-8 in format, and SQL Server cannot directly load them in using Bulk Copy or BCP.

So, I looked around for another option. I could have used SSIS, however, there seemed to be an issue with the Data Conversion module, where it wasn’t saving the changes having converted the fields from UTF-8 to ASCII.

My next choice was to have a look to see if C# would help. It did. The code below allowed me to take the source file (specified in _filePath) and load it into a database table (specified in _connString).

The code works by creating a DataTable into which the data from the CSV file is loaded. The data is then fed into SQL Server in batches of 100,000 rows (parameterised using _batchSize).

Here is the code. It performs pretty well on my machine, running at 20-25k rows a second.

using Microsoft.VisualBasic.FileIO;
using System.Data;
using System.Data.SqlClient;

    public class LoadUTF8CSVIntoSQL
    {
        protected const string _filePath = @"C:\Inbox\DataSets\Flights\extract\1987.csv";
        protected const string _connString = @"Server=localhost;Database=Datasets;Trusted_Connection=True;";
        protected const int _batchSize = 100000;

        protected static void LoadDataTable(SqlBulkCopy sqlBulkCopy, SqlConnection sqlConnection, DataTable dataTable)
        {
            try
            {
                sqlBulkCopy.WriteToServer(dataTable);

                dataTable.Rows.Clear();
                System.Console.WriteLine("{0}\t- Batch Written",System.DateTime.Now.ToString());
            }
            catch (System.Exception ex)
            {
                System.Console.WriteLine("ERROR : "+ex.Message);
            }
        }

        public static void LoadCsvDataIntoSqlServer(string sFilePath)
        {
            try
            {
                // This should be the full path
                var fileName = "";
                if (sFilePath.Length == 0)
                {
                    fileName = _filePath;
                }
                else
                {
                    fileName = sFilePath;
                }

                var createdCount = 0;

                using (var textFieldParser = new TextFieldParser(fileName))
                {
                    textFieldParser.TextFieldType = FieldType.Delimited;
                    textFieldParser.Delimiters = new[] { "," };
                    textFieldParser.HasFieldsEnclosedInQuotes = false;

                    var connectionString = _connString;

                    var dataTable = new DataTable("airlines");

                    // Specify the columns in the Data Table
                    dataTable.Columns.Add("Yr");
                    dataTable.Columns.Add("Mth");
                    dataTable.Columns.Add("DayofMonth");
                    dataTable.Columns.Add("DayOfWeek");
                    dataTable.Columns.Add("DepTime");
                    dataTable.Columns.Add("CRSDepTime");
                    dataTable.Columns.Add("ArrTime");
                    dataTable.Columns.Add("CRSArrTime");
                    dataTable.Columns.Add("UniqueCarrier");
                    dataTable.Columns.Add("FlightNum");
                    dataTable.Columns.Add("TailNum");
                    dataTable.Columns.Add("ActualElapsedTime");
                    dataTable.Columns.Add("CRSElapsedTime");
                    dataTable.Columns.Add("AirTime");
                    dataTable.Columns.Add("ArrDelay");
                    dataTable.Columns.Add("DepDelay");
                    dataTable.Columns.Add("Origin");
                    dataTable.Columns.Add("Dest");
                    dataTable.Columns.Add("Distance");
                    dataTable.Columns.Add("TaxiIn");
                    dataTable.Columns.Add("TaxiOut");
                    dataTable.Columns.Add("Cancelled");
                    dataTable.Columns.Add("CancellationCode");
                    dataTable.Columns.Add("Diverted");
                    dataTable.Columns.Add("CarrierDelay");
                    dataTable.Columns.Add("WeatherDelay");
                    dataTable.Columns.Add("NASDelay");
                    dataTable.Columns.Add("SecurityDelay");
                    dataTable.Columns.Add("LateAircraftDelay");

                    using (var sqlConnection = new SqlConnection(connectionString))
                    {
                        sqlConnection.Open();

                        // Initialise Bulk Copy Object
                        var sqlBulkCopy = new SqlBulkCopy(sqlConnection)
                        {
                            DestinationTableName = "airlines"
                        };

                        // Define column mappings between Data Table and Target Table
                        sqlBulkCopy.ColumnMappings.Add("Yr", "Yr");
                        sqlBulkCopy.ColumnMappings.Add("Mth", "Mth");
                        sqlBulkCopy.ColumnMappings.Add("DayofMonth", "DayofMonth");
                        sqlBulkCopy.ColumnMappings.Add("DayOfWeek", "DayOfWeek");
                        sqlBulkCopy.ColumnMappings.Add("DepTime", "DepTime");
                        sqlBulkCopy.ColumnMappings.Add("CRSDepTime", "CRSDepTime");
                        sqlBulkCopy.ColumnMappings.Add("ArrTime", "ArrTime");
                        sqlBulkCopy.ColumnMappings.Add("CRSArrTime", "CRSArrTime");
                        sqlBulkCopy.ColumnMappings.Add("UniqueCarrier", "UniqueCarrier");
                        sqlBulkCopy.ColumnMappings.Add("FlightNum", "FlightNum");
                        sqlBulkCopy.ColumnMappings.Add("TailNum", "TailNum");
                        sqlBulkCopy.ColumnMappings.Add("ActualElapsedTime", "ActualElapsedTime");
                        sqlBulkCopy.ColumnMappings.Add("CRSElapsedTime", "CRSElapsedTime");
                        sqlBulkCopy.ColumnMappings.Add("AirTime", "AirTime");
                        sqlBulkCopy.ColumnMappings.Add("ArrDelay", "ArrDelay");
                        sqlBulkCopy.ColumnMappings.Add("DepDelay", "DepDelay");
                        sqlBulkCopy.ColumnMappings.Add("Origin", "Origin");
                        sqlBulkCopy.ColumnMappings.Add("Dest", "Dest");
                        sqlBulkCopy.ColumnMappings.Add("Distance", "Distance");
                        sqlBulkCopy.ColumnMappings.Add("TaxiIn", "TaxiIn");
                        sqlBulkCopy.ColumnMappings.Add("TaxiOut", "TaxiOut");
                        sqlBulkCopy.ColumnMappings.Add("Cancelled", "Cancelled");
                        sqlBulkCopy.ColumnMappings.Add("CancellationCode", "CancellationCode");
                        sqlBulkCopy.ColumnMappings.Add("Diverted", "Diverted");
                        sqlBulkCopy.ColumnMappings.Add("CarrierDelay", "CarrierDelay");
                        sqlBulkCopy.ColumnMappings.Add("WeatherDelay", "WeatherDelay");
                        sqlBulkCopy.ColumnMappings.Add("NASDelay", "NASDelay");
                        sqlBulkCopy.ColumnMappings.Add("SecurityDelay", "SecurityDelay");
                        sqlBulkCopy.ColumnMappings.Add("LateAircraftDelay", "LateAircraftDelay");

                        // Loop through the CSV and load each set of 100,000 records into a DataTable
                        while (!textFieldParser.EndOfData)
                        {
                            if (createdCount == 0)
                            {
                                textFieldParser.ReadFields();
                            }
                            dataTable.Rows.Add(textFieldParser.ReadFields());

                            createdCount++;

                            if (createdCount % _batchSize == 0)
                            {
                                LoadDataTable(sqlBulkCopy, sqlConnection, dataTable);
                            }
                        }

                        // Send a final set to SQL Server
                        LoadDataTable(sqlBulkCopy, sqlConnection, dataTable);

                        sqlConnection.Close();
                    }
                }
            }
            catch (System.Exception ex)
            {
                System.Console.WriteLine("ERROR : "+ex.Message);
            }
        }

        static void Main(string[] args)
        {
            try
            {
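                // Use the first command-line argument as the file path, if one was supplied.
                // An empty string makes LoadCsvDataIntoSqlServer fall back to _filePath.
                var filePath = args.Length > 0 ? args[0] : "";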


                System.Console.WriteLine("{0}\t- Starting load of {1}", System.DateTime.Now.ToString(), filePath);
                LoadUTF8CSVIntoSQL.LoadCsvDataIntoSqlServer(filePath);
                System.Console.WriteLine("{0}\t- Completed load of {1}", System.DateTime.Now.ToString(), filePath);
            }
            catch (System.Exception ex)
            {
                System.Console.WriteLine("ERROR : "+ex.Message);
            }
        }
   
}

Having seen the request by Redgate to review the Tribal SQL book, I leapt at the chance. I love the idea of these books, and having read the MVP Deep Dives books (Book 1 and Book 2) previously, I wanted to see what this one offered.

As with the MVP Deep Dives books, Tribal SQL’s authors have donated their royalties to charity, and in this case they go to Computers For Africa. There are also a lot of well-known names in this book, with Dave Ballantyne, Mark Rasmussen, Bob Pusateri, Stephanie Locke and Matt Velic among them.

Tribal SQL is a good-sized book, weighing in at 454 pages, with 15 chapters covering various topics relevant to DBAs. Each of the chapters has been written by a previously unpublished author, and all are based on their experiences. There are chapters covering most topics, from Internals, Data Compression, Performance Tuning, Auditing and SQL Injection, through to less technical areas such as Reporting and Database Mail. Additionally, there are sections on personal skills, which include Communication Skills, Project Management and an Introduction to Agile Database Development.

While some of the sections don’t go into as much detail as you’d hope, this isn’t the point of this book (from my understanding). It covers each of the areas well, leaves you interested for more, and most of the chapters include links to various sources for further reading.

I enjoyed reading this book, and would highly recommend it to others. The chapters that I got the most out of were Internals (Mark Rasmussen), Windowing Functions (Dave Ballantyne; particularly useful since I was doing the SQL 2012 Querying exam), and SQL Injection (Kevin Feasel).

My only suggestion is that I’d like to see an eBook/Kindle version of it, so I can have it in the eBook library on my tablet and use it as a reference book.

You can read more about this book at the book’s website – TribalSQL.com

Over the past couple of years I’ve been deliberately avoiding using any form of third-party add-in for SQL Server Management Studio. I was doing this since I was pursuing the SQL Server Master Certification, and wouldn’t have had access to these tools during the exams. Since I’m no longer actively doing this, I’ve started looking around.

For several years, I’ve been using the DevExpress CodeRush product for development work in C# and Visual Studio, having made the decision between it and ReSharper. That was a difficult choice to make.

Fortunately, the decision for SQL Server is easier; I’m only aware of one, which is Redgate’s SQL Prompt. I had a look at this a year or so back, but decided to uninstall it due to the reasons given above.

Having started looking at it again, and particularly with the new version, I’m loving the experience.

What I like:

  • Tab History – When you close Management Studio (SSMS), SQL Prompt saves the contents of all your open tabs, and reopens them when you open SSMS again. I close SSMS at least once a day, and this has stopped me kicking myself. You can also see the history of tabs that have been open, and this also has a preview screen (shown below) which is really helpful for all those SQLQuery1.sql files.

image

  • Shortcuts – They are saving me time. You can type a shortcut, and it’ll replace it with the SQL. My favourite so far is st100, which gets replaced with SELECT TOP 100 * FROM
  • Improved Intellisense – It gives a more intelligent set of results to help you type your queries. This even goes down to the level of suggesting join criteria.
  • Performance – One of the best features is that it really doesn’t seem to impact the performance of Management Studio.

What I don’t like:

This was a hard one. The only thing that really stopped me using it (aside from the exam thing) is the cost.

Final Thoughts

Would I recommend it? Yes, definitely. It’s improved my SQL, reduced code errors and allows me to concentrate on what the T-SQL is supposed to do, rather than focussing on what the syntax is.

There are a couple of features I’d like it to have: I’d quite like it to also work with MDX. I’d like it to also recommend code changes, to help with implementing best practices.

Disclaimer:

I was offered a free licence of SQL Prompt, by Redgate, for this review. However, the content of the review has been written by myself, and the benefit of the licence is that it gave me time to write the review.

Parallel SQL in C#

So, I’ve been wanting to get back to playing with C# for a while, and finally have had the opportunity.

I’ve also been wanting to play with the Task library in .NET and see if I could get it to do something interesting. Well, below is the result.

The code below, running in a .NET 4 project, will run two SQL SELECT statements against the AdventureWorks2012 database.

There are three tasks in here: ParallelTask 1 and 2, and a timing task. Each Parallel task takes a connection string and a query as inputs, and passes out status messages. One of the important points with a task is that it has to be self-contained. This is why the connection is instantiated within the task.

I also added in a Timing task (ParallelTiming) so I could pass out a ping message.

The whole thing is controlled by the code in the main section, which is used to start the three tasks, with their appropriate parameters.

After this it awaits the tasks completing, then passes out the resulting return messages.

Try it out; it’s good fun and all you need is SQL Server, AdventureWorks and something to build C# projects.

DISCLAIMER: I am by no means a C# expert… 🙂

You can download the code here

Have fun!

/// Parallel_SQL demonstration code
/// From Nick Haslam
/// http://blog.nhaslam.com
/// 16/9/2013

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Parallel_SQL
{
    class Program
    {
        /// <summary>
        /// First Parallel task
        /// </summary>
        /// <param name="sConnString">Connection string details</param>
        /// <param name="sQuery">Query to execute</param>
        /// <param name="StatusMessage">Status message to pass back</param>
        /// <returns>Task returning a completion message</returns>
        static Task<string> ParallelTask1(string sConnString, string sQuery, Action<string> StatusMessage)
        {
            return Task.Factory.StartNew(() =>
            {
                // The connection is created inside the task so that the task is self-contained
                using (SqlConnection conn = new SqlConnection(sConnString))
                {
                    conn.Open();

                    StatusMessage("Running Query");

                    using (SqlCommand sqlCommand = new SqlCommand(sQuery, conn))
                    using (SqlDataReader reader = sqlCommand.ExecuteReader())
                    {
                        // Pass the first column of each row back to the caller
                        while (reader.Read())
                        {
                            StatusMessage(reader[0].ToString());
                        }
                    }
                }

                return "Task 1 Complete";
            });
        }

        /// <summary>
        /// Second Parallel task
        /// </summary>
        /// <param name="sConnString">Connection string details</param>
        /// <param name="sQuery">Query to execute</param>
        /// <param name="StatusMessage">Status message to pass back</param>
        /// <returns>Task returning a completion message</returns>
        static Task<string> ParallelTask2(string sConnString, string sQuery, Action<string> StatusMessage)
        {
            return Task.Factory.StartNew(() =>
            {
                using (SqlConnection conn = new SqlConnection(sConnString))
                {
                    conn.Open();

                    StatusMessage("Running Query");

                    using (SqlCommand sqlCommand = new SqlCommand(sQuery, conn))
                    using (SqlDataReader reader = sqlCommand.ExecuteReader())
                    {
                        while (reader.Read())
                        {
                            StatusMessage(reader[0].ToString());
                        }
                    }
                }

                return "Task 2 Complete";
            });
        }

        /// <summary>
        /// Timing Task
        /// </summary>
        /// <param name="iMSPause">Milliseconds between ping</param>
        /// <param name="StatusMessage">Status message to pass back</param>
        /// <returns>Task returning a completion message</returns>
        static Task<string> ParallelTiming(int iMSPause, Action<string> StatusMessage)
        {
            return Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < 10; i++)
                {
                    System.Threading.Thread.Sleep(iMSPause);
                    StatusMessage("******************** PING ********************");
                }

                return "Timing task done";
            });
        }

        static void Main(string[] args)
        {
            string sConnString = "server=.; Trusted_Connection=yes; database=AdventureWorks2012;";

            try
            {
                var Task1Control = ParallelTask1(sConnString,
                    "SELECT top 500 TransactionID FROM Production.TransactionHistory", (update) =>
                    {
                        Console.WriteLine(String.Format("{0} - {1}", DateTime.Now, update));
                    });

                var Task2Control = ParallelTask2(sConnString,
                    "SELECT top 500 SalesOrderDetailID FROM sales.SalesOrderDetail", (update) =>
                    {
                        Console.WriteLine(String.Format("{0} - \t\t{1}", DateTime.Now, update));
                    });

                var TimingTaskControl = ParallelTiming(250, (update) =>
                {
                    Console.WriteLine(String.Format("{0} - \t\t\t{1}", DateTime.Now, update));
                });

                // Await completion of the tasks (reading Result blocks until each one finishes)
                Console.WriteLine("Task 1 Status - {0}", Task1Control.Result);
                Console.WriteLine("Task 2 Status - {0}", Task2Control.Result);
                Console.WriteLine("Timing Task Status - {0}", TimingTaskControl.Result);
            }
            catch (Exception e)
            {
                Console.WriteLine(e.ToString());
            }

            Console.ReadKey();
        }
    }
}

On Monday 1st July 2013, the UK Police launched a hub for accessing their data. Read the announcement here: http://www.data.gov.uk/blog/the-launch-of-datapoliceuk

You can find the portal here: http://data.police.uk/

The data extracts are here: http://data.police.uk/data/

I wanted to have a look at this and see what can be done with it, but there are dozens of csv files to load in to get the full view, so this is a quick post on what I’ve been doing this evening. 🙂

Having found an interesting post on loading multiple CSV files in T-SQL, I’ve updated it to load the files in. First though, we need to create a table to take the data.

Create the Crime_Info table

An initial table to receive the data is shown below:

CREATE TABLE [dbo].[Crime_Info]( 
    [Crime ID] [varchar](100) NULL, 
    [Month] [varchar](100) NULL, 
    [Reported by] [varchar](100) NULL, 
    [Falls within] [varchar](100) NULL, 
    [Longitude] [float] NULL, 
    [Latitude] [float] NULL, 
    [Location] [varchar](100) NULL, 
    [LSOA code] [varchar](100) NULL, 
    [LSOA name] [varchar](100) NULL, 
    [Crime type] [varchar](200) NULL, 
    [Last outcome category] [varchar](max) NULL, 
    [Context] [varchar](max) NULL 
) ON [PRIMARY] WITH (DATA_COMPRESSION = PAGE )

Note that I Page compressed the table as I wanted to reduce the space it took up. It went down from 3GB to around 1.5GB.
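If you want an idea of the saving before committing to it, SQL Server can estimate it for you. This is just a sketch, assuming the table is in the dbo schema as created above and that your edition supports data compression.

```sql
-- Estimate the effect of PAGE compression on dbo.Crime_Info.
-- NULL for index and partition means "all of them".
EXEC sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'Crime_Info',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = N'PAGE';
```

The estimate is based on a sample of the data, so treat the figures as a guide rather than a promise.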

Load the Crime files

I’ve modified the bulk load script mentioned above, and you can download it here:

https://dl.dropboxusercontent.com/u/2765900/Load_UK_Crime_Data.sql
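For anyone who just wants the gist, loading a single file looks something like the sketch below. This is not the downloadable script; the file path is made up, and it assumes a header row and fields with no embedded commas (BULK INSERT splits on every comma it sees).

```sql
-- Hypothetical path; repeat (or loop) for each monthly CSV file.
BULK INSERT dbo.Crime_Info
FROM 'C:\Data\Crime\2013-05-street.csv'
WITH
(
    FIRSTROW        = 2,      -- skip the header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    TABLOCK
);
```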

Once loaded, I had 15,668,549 rows of data, sitting in a 1.2GB database.

Then, by playing with PowerPivot you can create ‘fun’ visualisations like this.

image

Have fun, and as they say on Crimewatch, “don’t have nightmares, do sleep well”.

So, this is the second article I’ve written against the TPC-H Benchmark (Part one here). Recently, Amazon announced that their ‘fast, fully managed petabyte-scale data warehouse service’ was available for public consumption. Having finally had some time to play, I thought I’d take it for a spin.

I was able to get a single node cluster up and running pretty quickly, and installed their sample data set easily. You can read how to go about this in their Getting Started Guide.

The initial issue I had with the sample data set was, well, it was pretty small. Ok, it got the concepts over, but I wanted more. I wanted to get an idea of performance and how it compared across the different levels. I wanted more data.

So, I decided to dump my set of test data (1GB TPC-H, see part 1 for creating this) into it, and this post covers how I did it.

Getting Started

I’m going to assume that you’ve made it through steps 1-4 of the Getting Started guide above (which covers Prerequisites, Launching the Cluster, Security setup and Connecting to the cluster).

Shown below are the statements used to create the TPC-H tables, within the Redshift environment. You’ll need to create a connection to the Redshift environment, use SQL Workbench to connect to it, and copy and paste this into the SQL window.

CREATE TABLE customer(
C_CustKey int ,
C_Name varchar(64) ,
C_Address varchar(64) ,
C_NationKey int ,
C_Phone varchar(64) ,
C_AcctBal decimal(13, 2) ,
C_MktSegment varchar(64) ,
C_Comment varchar(120) ,
skip varchar(64)
);

CREATE TABLE lineitem(
L_OrderKey int ,
L_PartKey int ,
L_SuppKey int ,
L_LineNumber int ,
L_Quantity int ,
L_ExtendedPrice decimal(13, 2) ,
L_Discount decimal(13, 2) ,
L_Tax decimal(13, 2) ,
L_ReturnFlag varchar(64) ,
L_LineStatus varchar(64) ,
L_ShipDate datetime ,
L_CommitDate datetime ,
L_ReceiptDate datetime ,
L_ShipInstruct varchar(64) ,
L_ShipMode varchar(64) ,
L_Comment varchar(64) ,
skip varchar(64)
);
CREATE TABLE nation(
N_NationKey int ,
N_Name varchar(64) ,
N_RegionKey int ,
N_Comment varchar(160) ,
skip varchar(64)
);
CREATE TABLE orders(
O_OrderKey int ,
O_CustKey int ,
O_OrderStatus varchar(64) ,
O_TotalPrice decimal(13, 2) ,
O_OrderDate datetime ,
O_OrderPriority varchar(15) ,
O_Clerk varchar(64) ,
O_ShipPriority int ,
O_Comment varchar(80) ,
skip varchar(64)
);

CREATE TABLE part(
P_PartKey int ,
P_Name varchar(64) ,
P_Mfgr varchar(64) ,
P_Brand varchar(64) ,
P_Type varchar(64) ,
P_Size int ,
P_Container varchar(64) ,
P_RetailPrice decimal(13, 2) ,
P_Comment varchar(64) ,
skip varchar(64)
);
CREATE TABLE partsupp(
PS_PartKey int ,
PS_SuppKey int ,
PS_AvailQty int ,
PS_SupplyCost decimal(13, 2) ,
PS_Comment varchar(200) ,
skip varchar(64)
);
CREATE TABLE region(
R_RegionKey int ,
R_Name varchar(64) ,
R_Comment varchar(160) ,
skip varchar(64)
);
CREATE TABLE supplier(
S_SuppKey int ,
S_Name varchar(64) ,
S_Address varchar(64) ,
S_NationKey int ,
S_Phone varchar(18) ,
S_AcctBal decimal(13, 2) ,
S_Comment varchar(105) ,
skip varchar(64)
);

Next up, we need to get some data into it. I’ve had a copy of the TPC-H files sitting on my S3 account for a while, so I was hoping to just point Redshift at that (just like the sample code does). This was where I ran into my first issue. There may be an easier way, but I wanted to do it quickly. The problem was that I couldn’t get the S3 URL syntax to work, and this appears to be because my S3 Buckets are sitting in Ireland (EU). The S3 syntax looks to only work if you are using ‘US Standard’ as your S3 storage. I could be wrong, but I’m not an S3 expert.

Anyway, having created an S3 bucket in US Standard, and transferred the files over, I used the following to copy the contents from these files into the tables created in Redshift.

copy customer from 's3://oldnick-tpch/customer.tbl' CREDENTIALS 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' delimiter '|';
copy orders from 's3://oldnick-tpch/orders.tbl' CREDENTIALS 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' delimiter '|';
copy lineitem from 's3://oldnick-tpch/lineitem.tbl' CREDENTIALS 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' delimiter '|';
copy nation from 's3://oldnick-tpch/nation.tbl' CREDENTIALS 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' delimiter '|';
copy part from 's3://oldnick-tpch/part.tbl' CREDENTIALS 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' delimiter '|';
copy partsupp from 's3://oldnick-tpch/partsupp.tbl' CREDENTIALS 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' delimiter '|';
copy region from 's3://oldnick-tpch/region.tbl' CREDENTIALS 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' delimiter '|';
copy supplier from 's3://oldnick-tpch/supplier.tbl' CREDENTIALS 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' delimiter '|';

You’ll need to replace <Your-Access-Key-ID> with your Amazon access key and <Your-Secret-Access-Key> with your secret key, though I bet you’d guessed that. Also, note that it’s possible to load from a gzipped file by adding the gzip parameter to the copy statement, though I didn’t discover this until after the load.
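For reference, a gzipped load would look something like this. I didn't run it as part of this test, and the .gz file name is an assumption (you'd need to compress and upload the file yourself first).

```sql
-- Assumes customer.tbl has been gzipped and uploaded as customer.tbl.gz
copy customer from 's3://oldnick-tpch/customer.tbl.gz'
CREDENTIALS 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>'
delimiter '|'
gzip;
```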

After waiting a little while, though not too long, for Redshift to bring the data in from S3, you can use these queries to check the counts.

select count(*) from customer;
select count(*) from orders;
select count(*) from lineitem;
select count(*) from nation;
select count(*) from part;
select count(*) from partsupp;
select count(*) from region;
select count(*) from supplier;

Next, the Developer Guide section covering loading data into Redshift says you should run the following statements after loading. Analyze updates the database statistics, and Vacuum then reclaims storage space.

analyze;
vacuum;

So, there we go, now we’ve got a Redshift cluster running the TPC-H tables. So next I thought I’d do a basic test to compare results.

My test query for this is shown below, and just does some aggregation against the lineitem table (6 million or so rows).

select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price, sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge, avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc,  count(*) as count_order
from lineitem
group by l_returnflag, l_linestatus
order by l_returnflag,l_linestatus;

So I ran this on my laptop (i7, 12GB RAM, 512GB SSD) in two ways, once as a straight query and once with a Columnstore index on the table, and timed each both cold (after a restart) and warm (second run).

SQL times are shown based on SET STATISTICS TIME ON times.
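For anyone wanting to repeat the SQL Server side, the setup is roughly as follows. It's a sketch: the index name is made up, and it assumes the SQL Server lineitem table uses the same column names as the Redshift DDL above. Bear in mind that in SQL Server 2012 a nonclustered columnstore index makes the table read-only until the index is dropped or disabled.

```sql
-- Hypothetical index name; covers only the columns the test query touches.
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_CS_lineitem
ON dbo.lineitem
    (L_ReturnFlag, L_LineStatus, L_Quantity, L_ExtendedPrice, L_Discount, L_Tax);
GO

-- Report CPU and elapsed time for each statement in the Messages tab.
SET STATISTICS TIME ON;
GO

-- ... run the test query here ...

SET STATISTICS TIME OFF;
GO
```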

Analysing Redshift was interesting. Since I’ve not done much with PostgreSQL, I had a look through the Redshift documentation to see what is going on. I found an interesting page showing how to determine if a query is running from disk. Working through this I saw that, once I got the query id from the first query below, I could get the query details, including memory used and times, from the second.

Getting the Query Id

select query, elapsed, substring
from svl_qlog
order by query desc
limit 5;

select *
from svl_query_summary
where query = 5931;

image

So, having seen those figures, I had a look at the cluster details.

Initially I was using 1 node, so I went up a notch, to a 2 node cluster of the more powerful nodes.

Single Node Testing
image
Multi Node Testing
image

The Results

Test | Time to Return
Laptop – SQL 2012 (Cold) | 24515ms CPU time, 6475ms elapsed
Laptop – SQL 2012 (Warm) | 24016ms CPU time, 6060ms elapsed
Laptop – SQL 2012 Columnstore (Cold) | 531ms CPU time, 258ms elapsed
Laptop – SQL 2012 Columnstore (Warm) | 389ms CPU time, 112ms elapsed
Redshift (1 node cluster) | 1.24 sec
Redshift (2 node cluster) | 1.4 sec

So, obviously, I’m not stretching the performance of the Redshift cluster.

Part 2b of this will cover similar tests, though I’ll be doing it with a 100GB TPC-H test data set.

Keep ’em peeled for the next post!

Welcome to TSql2sday issue #35, this time hosted by me…

It’s a bit last minute, as I stepped in to help Adam out, so bear with me. As always, thanks to Adam for starting this off, I’ve posted a few articles on previous runs, and have found other people’s posts to be really interesting. I hope this follows in the same way.

Over the past couple of days I’ve been attending a training course in Paris, and one evening, to relax, I watched ‘Soylent Green’, a classic science fiction film. If you’ve not seen it, I recommend you go and watch it…

So, what I’d like to know is, what is your most horrifying discovery from your work with SQL Server?

We all like to read stories of other people’s misfortunes and, in some ways they help to make us better people by learning from them. Hopefully, there is nothing as bad as Charlton Heston’s discovery, but there may be in its own way.

A couple of extra thoughts for motivational thinking…

Soylent Brown – You did a post, Great Job!!

Soylent Orange – You did a post, it made me wince!

Soylent Green – You did a post, it made me wince, and it included some T-SQL.

Do you have the words straight?

Here are the rules, as usual. If you would like to participate in T-SQL Tuesday, please be sure to follow them:

  • Your blog post must be published between Tuesday, October 9th 2012 00:00:00 GMT and Wednesday, October 10th 2012 00:00:00 GMT.
  • Include the T-SQL Tuesday logo (above) and hyperlink it back to this post.
  • If you don’t see your post in trackbacks, add the link to the comments below.
  • If you are on Twitter please tweet your blog using the #TSQL2sDay hashtag. I can be contacted there as @nhaslam, in case you have questions or problems with comments/trackback.

Thank you all for participating, and special thanks to Adam Machanic (b|t) for all his help and for continuing this series!

Thanks for posting, and I’ll have a follow-up post listing all the contributions as soon as I can.

Over the past few evenings, I’ve been playing with SQLIO, to get an idea of how an SSD compares to a couple of servers (one quite old, one a bit newer) that I have access to.

SQLIO can be used to do performance testing of an IO subsystem, prior to deploying SQL Server onto it. It doesn’t actually do anything specifically with SQL, it’s just IO.

If you haven’t looked at SQLIO, I would highly recommend looking at these websites:

http://www.sqlskills.com/BLOGS/PAUL/post/Cool-free-tool-to-parse-and-analyze-SQLIO-results.aspx

http://tools.davidklee.net/sqlio/sqlio-analyzer.aspx

The SQLIO Analyser, created by David Klee, is amazing. It allows you to run the SQLIO package (a preconfigured one is available on the site) and submit the results. It then generates an Excel file that contains various metrics. It’s nice!

Running on my Laptop…

Having run the pre-built package on my laptop, I got the following metrics out of it. As you can see, it’s an SSD (Crucial M4 SSD), and pretty nippy.

image

image

There are some interesting metrics here. One of the key benefits of an SSD is that, regardless of what you are doing, the average latency stays low. For these tests, I was getting:

Avg. Metrics   Sequential Read   Random Read   Sequential Write   Random Write
Latency (ms)   19.28             18.38         23.21              51.51
Avg IOPs       3777              3493          2930               1340
MB/s           236.07            218.3         183                83.7
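The IOPS and MB/s rows are two views of the same measurement, linked by the size of each I/O. The figures above line up with a 64 KB I/O size, but that is inferred from the numbers and is not something the test output here states. A minimal sketch of the arithmetic, where `BLOCK_KB` and `mb_per_sec` are illustrative names:

```python
# Assumption: 64 KB per I/O. This is inferred from the table figures,
# it is not stated in the post or confirmed from the SQLIO parameters.
BLOCK_KB = 64


def mb_per_sec(iops, block_kb=BLOCK_KB):
    """Throughput in MB/s for a given IOPS figure and I/O size in KB."""
    return iops * block_kb / 1024


# Laptop SSD figures from the table above: (average IOPS, reported MB/s)
ssd = {
    "Sequential Read": (3777, 236.07),
    "Random Read": (3493, 218.3),
    "Sequential Write": (2930, 183.0),
    "Random Write": (1340, 83.7),
}

for load, (iops, reported) in ssd.items():
    calculated = mb_per_sec(iops)
    print(f"{load}: calculated {calculated:.2f} MB/s, reported {reported} MB/s")
```

If the I/O size really was 64 KB, each calculated value should land within a rounding error of the reported one, which is a quick sanity check on any set of SQLIO results.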

Running on an older server

So, running this on an older server, connected to a much older (6-8+ years old) SAN, gave me these results. You can see that performance is much lower across the board, and there is a much wider spread across all the metrics, which is down to the spinning disks.

image

image

As you can see from the metrics below, there is a significant drop in the performance of the server, and a lot more variance across the load types.

Avg. Metrics   Sequential Read   Random Read   Sequential Write   Random Write
Latency (ms)   24.81             66.79         373                260
Avg IOPs       1928              710           186                210
MB/s           120               44.3          11.6               13.14

Slightly newer Server

So, next I had the SQLIO package running on a slightly newer server (with a higher spec I/O system, I was told), which gave the following results.

image

image

As expected, this did give generally better results, though it is interesting that sequential read had better throughput on the older server.

Avg. Metrics   Sequential Read   Random Read   Sequential Write   Random Write
Latency (ms)   35.13             44.17         41.81              77.44
Avg IOPs       1474              1021          1314               794
MB/s           92.7              63.8          82.8               49.6

Cracking open VMware

Since I use VMware Workstation for compartmentalising projects on my laptop, I thought I’d run this against a VM. The VM was running on the SSD (at the top of the post), so I could see how much of an impact the VMware drivers had on the process. This gave some interesting results, which you can see below. Obviously there is something screwy going on here, as it’s not likely that the VM can perform that much faster than the drive it’s sitting on. Would be nice if it could though…

image

image

Avg. Metrics   Sequential Read   Random Read   Sequential Write   Random Write
Latency (ms)   7.8               7.5           7.63               7.71
Avg IOPs       12435             13119         15481              14965
MB/s           777               819           967                935

While the whole process was running, Task Manager on the host machine was sitting at around 0-2% for disk utilisation, but the CPU was sitting at 50-60%. So, it was hardly touching the disk.

image

Conclusion

Just to summarise this, in case you didn’t already know, SSDs are really quick. For the testing I was doing, the SSD was giving me approx. double the performance of some pretty expensive hardware (or at least it was 5-10 years ago…)
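To put a rough number on that, the average IOPS from the three physical tests can be compared load by load. This is only a sketch using the figures from the tables earlier in the post. The `IOPS` dictionary and the `speedup` helper are illustrative names, and the VM run is left out because its results clearly weren’t reflecting the disk.

```python
LOADS = ["Sequential Read", "Random Read", "Sequential Write", "Random Write"]

# Average IOPS per load type, copied from the tables in this post.
IOPS = {
    "laptop SSD": [3777, 3493, 2930, 1340],
    "older server": [1928, 710, 186, 210],
    "newer server": [1474, 1021, 1314, 794],
}


def speedup(baseline, system="laptop SSD"):
    """How many times more IOPS `system` achieved than `baseline`, per load."""
    return {
        load: round(fast / slow, 1)
        for load, fast, slow in zip(LOADS, IOPS[system], IOPS[baseline])
    }


for server in ("older server", "newer server"):
    print(f"laptop SSD vs {server}:")
    for load, ratio in speedup(server).items():
        print(f"  {load}: {ratio}x")
```

On these numbers, "double" is about right for sequential reads against the older server and for most loads against the newer one, while the older server’s writes fall a lot further behind.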

Also, take your test results with a grain of salt.

It’s another TSQL2sday post, this time hosted by Rob Volk (b | t ). Thanks for hosting, Rob.

So this month, it’s about how we fixed a problem, or found help when we couldn’t fix a problem, with a theme based on ‘Help’ by The Beatles.

I chose the 2nd verse…

When I was younger, so much younger than today

So, many years ago, when I started out with SQL Server, back in the heady days of 6.5, there was much less of a SQL Community. Actually, I don’t even remember one. The only way I could get help was either through MSDN, or by emailing colleagues I had met on a SQL training course.

I never needed anybody’s help in any way.

Though that’s primarily due to stopping using SQL for a while, just a year or so, but still.

Everyone needs help, at some point, with something. It’s not a weakness, it’s a strength.

But now these days are gone, I’m not so self assured.

In the past few years, I’ve started working more and more with SQL, and found that it is such a huge product that no one can know the whole thing (SSAS, SSIS, SSRS included), and because of that, I’ve found several ways to get help if I need it.

Though, before I get into that, I need to say something about the community. There is a huge SQL Community out there, though the first community event I attended wasn’t a SQL one. It was a Developer event, Remix UK, back in 2008 (http://www.microsoft.com/uk/remix08/default.aspx). It was a great event and I got to meet some great people there, including Scott Guthrie! Getting to this event was pretty much solely due to an ex-colleague, Jes Kirkup. Thanks Jes!

Since then I’ve started attending community events where I can, including the local DevEvening events (where I’ve done a couple of short presentations), and SQL community events (SQLMaidenhead, SQL in the Evening, and SQLBits of course!). I’ve found that these are a great way of getting an insight into what skills others in the industry have, and so where I should be targeting my learning. Following on from that, I’ve met some great people, and there are people who I know I could ask for help if I needed to.

Not to mention the #SQLHelp hash tag on Twitter, where there is help pretty much 24 hours a day, the only restriction being the need to phrase your question in around 130 characters (140 minus the hash tag).

Now I find I’ve changed my mind and opened up the doors.

Now I find that I am helping people where I get the opportunity, am publishing blog articles (here, like this one!) and am hoping to do more Community presentations. Furthermore, I’m doing internal training courses (next month I’m doing one on SSAS), and have recently started mentoring a colleague in SQL.

It’s great to be able to share knowledge and experience.

Thanks for listening, and reading, and thanks again to Rob for hosting.