Category: C#


Recently, I’ve been learning about the Statistical language R. To allow me to do some testing against it, I wanted to get a decent sized dataset into SQL Server.

Having found a good dataset, the Airlines data from the 2009 Data Expo, I downloaded it, and proceeded to try and load it into SQL Server. I failed almost straightaway since the downloaded files were UTF-8 in format, and SQL Server cannot directly load them in using Bulk Copy or BCP.

So, I looked around for another option. I could have used SSIS, however, there seemed to be an issue with the Data Conversion module, where it wasn’t saving the changes having converted the fields from UTF-8 to ASCII.

My next choice was to have a look to see if C# would help. It did. The code below allowed me to take the source file (specified in _filePath) and load it into a database table (specified in _connString).

The code works by creating a DataTable into which the data from the CSV file is loaded. The data is then fed into SQL Server in batches of 100,0000 (parameterised using _batchSize).

Here is the code, it’s pretty well performing on my machine, running at 20-25k rows a second.

using Microsoft.VisualBasic.FileIO;
using System.Data;
using System.Data.SqlClient;

    public class LoadUTF8CSVIntoSQL
    {
        protected const string _filePath = @"C:\Inbox\DataSets\Flights\extract\1987.csv";
        protected const string _connString = @"Server=localhost;Database=Datasets;Trusted_Connection=True;";
        protected const int _batchSize = 100000;

        protected static void LoadDataTable(SqlBulkCopy sqlBulkCopy, SqlConnection sqlConnection, DataTable dataTable)
        {
            try
            {
                sqlBulkCopy.WriteToServer(dataTable);

                dataTable.Rows.Clear();
                System.Console.WriteLine("{0}\t- Batch Written",System.DateTime.Now.ToString());
            }
            catch (System.Exception ex)
            {
                System.Console.WriteLine("ERROR : "+ex.Message);
            }
        }

        public static void LoadCsvDataIntoSqlServer(string sFilePath)
        {
            try
            {
                // This should be the full path
                var fileName = "";
                if (sFilePath.Length == 0)
                {
                    fileName = _filePath;
                }
                else
                {
                    fileName = sFilePath;
                }

                var createdCount = 0;

                using (var textFieldParser = new TextFieldParser(fileName))
                {
                    textFieldParser.TextFieldType = FieldType.Delimited;
                    textFieldParser.Delimiters = new[] { "," };
                    textFieldParser.HasFieldsEnclosedInQuotes = false;

                    var connectionString = _connString;

                    var dataTable = new DataTable("airlines");

                    // Specify the columns in the Data Table
                    dataTable.Columns.Add("Yr");
                    dataTable.Columns.Add("Mth");
                    dataTable.Columns.Add("DayofMonth");
                    dataTable.Columns.Add("DayOfWeek");
                    dataTable.Columns.Add("DepTime");
                    dataTable.Columns.Add("CRSDepTime");
                    dataTable.Columns.Add("ArrTime");
                    dataTable.Columns.Add("CRSArrTime");
                    dataTable.Columns.Add("UniqueCarrier");
                    dataTable.Columns.Add("FlightNum");
                    dataTable.Columns.Add("TailNum");
                    dataTable.Columns.Add("ActualElapsedTime");
                    dataTable.Columns.Add("CRSElapsedTime");
                    dataTable.Columns.Add("AirTime");
                    dataTable.Columns.Add("ArrDelay");
                    dataTable.Columns.Add("DepDelay");
                    dataTable.Columns.Add("Origin");
                    dataTable.Columns.Add("Dest");
                    dataTable.Columns.Add("Distance");
                    dataTable.Columns.Add("TaxiIn");
                    dataTable.Columns.Add("TaxiOut");
                    dataTable.Columns.Add("Cancelled");
                    dataTable.Columns.Add("CancellationCode");
                    dataTable.Columns.Add("Diverted");
                    dataTable.Columns.Add("CarrierDelay");
                    dataTable.Columns.Add("WeatherDelay");
                    dataTable.Columns.Add("NASDelay");
                    dataTable.Columns.Add("SecurityDelay");
                    dataTable.Columns.Add("LateAircraftDelay");

                    using (var sqlConnection = new SqlConnection(connectionString))
                    {
                        sqlConnection.Open();

                        // Initialise Bulk Copy Object
                        var sqlBulkCopy = new SqlBulkCopy(sqlConnection)
                        {
                            DestinationTableName = "airlines"
                        };

                        // Define column mappings between Data Table and Target Table
                        sqlBulkCopy.ColumnMappings.Add("Yr", "Yr");
                        sqlBulkCopy.ColumnMappings.Add("Mth", "Mth");
                        sqlBulkCopy.ColumnMappings.Add("DayofMonth", "DayofMonth");
                        sqlBulkCopy.ColumnMappings.Add("DayOfWeek", "DayOfWeek");
                        sqlBulkCopy.ColumnMappings.Add("DepTime", "DepTime");
                        sqlBulkCopy.ColumnMappings.Add("CRSDepTime", "CRSDepTime");
                        sqlBulkCopy.ColumnMappings.Add("ArrTime", "ArrTime");
                        sqlBulkCopy.ColumnMappings.Add("CRSArrTime", "CRSArrTime");
                        sqlBulkCopy.ColumnMappings.Add("UniqueCarrier", "UniqueCarrier");
                        sqlBulkCopy.ColumnMappings.Add("FlightNum", "FlightNum");
                        sqlBulkCopy.ColumnMappings.Add("TailNum", "TailNum");
                        sqlBulkCopy.ColumnMappings.Add("ActualElapsedTime", "ActualElapsedTime");
                        sqlBulkCopy.ColumnMappings.Add("CRSElapsedTime", "CRSElapsedTime");
                        sqlBulkCopy.ColumnMappings.Add("AirTime", "AirTime");
                        sqlBulkCopy.ColumnMappings.Add("ArrDelay", "ArrDelay");
                        sqlBulkCopy.ColumnMappings.Add("DepDelay", "DepDelay");
                        sqlBulkCopy.ColumnMappings.Add("Origin", "Origin");
                        sqlBulkCopy.ColumnMappings.Add("Dest", "Dest");
                        sqlBulkCopy.ColumnMappings.Add("Distance", "Distance");
                        sqlBulkCopy.ColumnMappings.Add("TaxiIn", "TaxiIn");
                        sqlBulkCopy.ColumnMappings.Add("TaxiOut", "TaxiOut");
                        sqlBulkCopy.ColumnMappings.Add("Cancelled", "Cancelled");
                        sqlBulkCopy.ColumnMappings.Add("CancellationCode", "CancellationCode");
                        sqlBulkCopy.ColumnMappings.Add("Diverted", "Diverted");
                        sqlBulkCopy.ColumnMappings.Add("CarrierDelay", "CarrierDelay");
                        sqlBulkCopy.ColumnMappings.Add("WeatherDelay", "WeatherDelay");
                        sqlBulkCopy.ColumnMappings.Add("NASDelay", "NASDelay");
                        sqlBulkCopy.ColumnMappings.Add("SecurityDelay", "SecurityDelay");
                        sqlBulkCopy.ColumnMappings.Add("LateAircraftDelay", "LateAircraftDelay");

                        // Loop through the CSV and load each set of 100,000 records into a DataTable
                        while (!textFieldParser.EndOfData)
                        {
                            if (createdCount == 0)
                            {
                                textFieldParser.ReadFields();
                            }
                            dataTable.Rows.Add(textFieldParser.ReadFields());

                            createdCount++;

                            if (createdCount % _batchSize == 0)
                            {
                                LoadDataTable(sqlBulkCopy, sqlConnection, dataTable);
                            }
                        }

                        // Send a final set to SQL Server
                        LoadDataTable(sqlBulkCopy, sqlConnection, dataTable);

                        sqlConnection.Close();
                    }
                }
            }
            catch (System.Exception ex)
            {
                System.Console.WriteLine("ERROR : "+ex.Message);
            }
        }

        static void Main(string[] args)
        {
            try
            {
                var filePath = "";

                try
                {
                    filePath = args[0];
                }
                catch (System.Exception)
                {
                    filePath = "";
                }


                System.Console.WriteLine("{0}\t- Starting load of {1}", System.DateTime.Now.ToString(), filePath);
                LoadUTF8CSVIntoSQL.LoadCsvDataIntoSqlServer(filePath);
                System.Console.WriteLine("{0}\t- Completed load of {1}", System.DateTime.Now.ToString(), filePath);
            }
            catch (System.Exception ex)
            {
                System.Console.WriteLine("ERROR : "+ex.Message);
            }
        }
   
}
Advertisements

Parallel SQL in C#

So, I’ve been wanting to get back to playing with C# for a while, and finally have had the opportunity.

I’ve also been wanting to play with the Task library in .NET and see if I could get it to do something interesting, well below is the result.

The code below, running in a .NET 4 project, will run two SQL SELECT statements against the AdventureWorks2012 database.

There are three tasks in here, ParallelTask 1 and 2, and a timing task. The Parallel task takes a Connection String and a query as inputs, and passes out a Status Message. One of the important points with a task is that the task has to be self contained. This is why the connection is instantiated within the task.

I also added in a Timing task (ParallelTiming) so I could pass out a ping message.

The whole thing is controlled by the code in the main section, which is used to start the three tasks, with their appropriate parameters.

After this it awaits the tasks completing, then passes out the resulting return messages.

Try it out; it’s good fun and all you need is SQL Server, AdventureWorks and something to build C# projects.

DISCLAIMER: I am by no means a C# expert… 🙂

You can download the code here

Have fun!

/// Parallel_SQL demonstration code
/// From Nick Haslam
/// http://blog.nhaslam.com
/// 16/9/2013

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Parallel_SQL
{
class Program
{
///
/// First Parallel task
///

///Connection string details ///Query to execute ///Status message to pass back ///
static Task ParallelTask1(string sConnString, string sQuery, Action StatusMessage)
{
return Task.Factory.StartNew(() =>
{
SqlConnection conn = new SqlConnection(sConnString);
conn.Open();

StatusMessage(“Running Query”);

SqlDataReader reader = null;
SqlCommand sqlCommand = new SqlCommand(sQuery, conn);

reader = sqlCommand.ExecuteReader();

while (reader.Read())
{
StatusMessage(reader[0].ToString());
}

return “Task 1 Complete”;
});
}

///
/// Second Parallel task
///

///Connection string details ///Query to execute ///Status message to pass back ///
static Task ParallelTask2(string sConnString, string sQuery, Action StatusMessage)
{
return Task.Factory.StartNew(() =>
{
SqlConnection conn = new SqlConnection(sConnString);
conn.Open();
StatusMessage(“Running Query”);

SqlDataReader reader = null;
SqlCommand sqlCommand = new SqlCommand(sQuery, conn);

reader = sqlCommand.ExecuteReader();

while (reader.Read())
{
StatusMessage(reader[0].ToString());
}

return “Task 2 Complete”;
});
}

///
/// Timing Task
///

///Milliseconds between ping ///Status message to pass back ///
static Task ParallelTiming(int iMSPause, Action StatusMessage)
{
return Task.Factory.StartNew(() =>
{
for (int i = 0; i < 10; i++) { System.Threading.Thread.Sleep(iMSPause); StatusMessage(“******************** PING ********************”); } return “Timing task done”; }); } static void Main(string[] args) { string sConnString = “server=.; Trusted_Connection=yes; database=AdventureWorks2012;”; try { var Task1Control = ParallelTask1(sConnString, “SELECT top 500 TransactionID FROM Production.TransactionHistory”, (update) =>
{
Console.WriteLine(String.Format(“{0} – {1}”, DateTime.Now, update));
});
var Task2Control = ParallelTask2(sConnString,
“SELECT top 500 SalesOrderDetailID FROM sales.SalesOrderDetail”, (update) =>
{
Console.WriteLine(String.Format(“{0} – \t\t{1}”, DateTime.Now, update));
});

var TimingTaskControl = ParallelTiming(250, (update) =>
{
Console.WriteLine(String.Format(“{0} – \t\t\t{1}”, DateTime.Now, update));
});

// Await Completion of the tasks
Console.WriteLine(“Task 1 Status – {0}”, Task1Control.Result);
Console.WriteLine(“Task 2 Status – {0}”, Task2Control.Result);
Console.WriteLine(“Timing Task Status – {0}”, TimingTaskControl.Result);
}
catch (Exception e)
{
Console.WriteLine(e.ToString());
}
Console.ReadKey();
}
}
}

Following on from my previous post on bringing data from Twitter into SQL Server, you’ll recall that we were able to take data from a JSON data feed, using cURL, and parse it into SQL Server, using a custom Function.

This enabled the twitter feed to be loaded into a SQL database at the staggering performance of 38 records a second. In a future post, I hope to do some optimisation on the T-SQL process for getting this data in.

However, I wanted to see how quickly the data could be loaded using a C# application (since I’m a bit of a closet coder).

So, following on from Step 1 and Step 2 from the previous post, so you’ve taken the feeds from Twitter, and loaded the data into a Load table in SQL, I then want to get the data into a table in SQL Server.

There are several JSON libraries that are available, that link to C# (a list is here: http://json.org/). I chose JSON.NET (http://json.codeplex.com/) on the basis that it was quite frequently mentioned on StackOverflow (so if I had any questions, I’d stand a reasonable chance of getting an answer), and the examples looked pretty straightforward.

Having fired up VS 2012, I created a project, and added JSON.Net into it using NuGet (I like NuGet!), using Install-Package Newtonsoft.Json.

using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;
using System.Data.SqlClient;

namespace JSONParse
{

class Program
{
static void Main(string[] args)
{
DateTime dStart, dEnd;
dStart = DateTime.Now;

// Connect to SQL
SqlConnection conn = new SqlConnection(“server=localhost; Trusted_Connection=yes; database=twitter_stream;”);
try
{
conn.Open();
}
catch (Exception e)
{
Console.WriteLine(e.ToString());
}

try
{

// Set up the Reader process
SqlDataReader reader = null;
SqlCommand myCommand = new SqlCommand(
“select * from tweetJSON”, conn);
reader = myCommand.ExecuteReader();

while (reader.Read())
{
try
{
// Convert the JSON Data
dynamic obj = JsonConvert.DeserializeObject(reader[1].ToString());
string tJSID = reader[0].ToString();
Console.WriteLine(“id_str {0}”, obj.id_str);
SqlCommand insertcmd = new SqlCommand()
{
CommandText = @”insert into TweetJSONStaging ( tJSID, Country, id_str,
followers_count,profile_image_url,statuses_count,
profile_background_image_url,created_at,friends_count,
location,name,lang, screen_name, source, geo_loc, text)
VALUES (@tJSID, @country,@id_str,@followers, @profileURL, @statuses,
@backgroundimageurl, @created_at, @friendscount,@location, @name,
@lang, @screenname,@source, @geoloc, @text)”
};
insertcmd.Connection = new SqlConnection(“server=localhost; Trusted_Connection=yes; database=twitter_stream;”);
insertcmd.Parameters.AddWithValue(“@tJSID”, tJSID);
insertcmd.Parameters.AddWithValue(“@id_str”, (string)obj.id_str);
insertcmd.Parameters.AddWithValue(“@followers”, (string)obj.user.followers_count);
insertcmd.Parameters.AddWithValue(“@backgroundimageurl”, (string)obj.user.profile_background_image_url);
insertcmd.Parameters.AddWithValue(“@name”, (string)obj.user.name);
insertcmd.Parameters.AddWithValue(“@profileURL”, (string)obj.user.profile_image_url);
insertcmd.Parameters.AddWithValue(“@statuses”, (string)obj.user.statuses_count);
insertcmd.Parameters.AddWithValue(“@friendscount”, (string)obj.user.friends_count);
insertcmd.Parameters.AddWithValue(“@screenname”, (string)obj.user.screen_name);
insertcmd.Parameters.AddWithValue(“@lang”, (string)obj.user.lang);
insertcmd.Parameters.AddWithValue(“@text”, (string)obj.text);
insertcmd.Parameters.AddWithValue(“@source”, (string)obj.source);
insertcmd.Parameters.AddWithValue(“@created_at”, (string)obj.created_at);
string sCountry = “”, sLocation = “”, sGeoloc = “”;

try
{
sCountry = (string)obj.place.country;
}
catch (Exception e)
{
//Console.WriteLine(e.ToString());
}
insertcmd.Parameters.AddWithValue(“@country”, sCountry);

try
{
sLocation = (string)obj.user.location;
}
catch (Exception e)
{
//Console.WriteLine(e.ToString());
}
insertcmd.Parameters.AddWithValue(“@location”, sLocation);
try
{
sGeoloc = String.Join(“,”, obj.place.bounding_box.coordinates.Last.First);
}
catch (Exception e)
{
// Console.WriteLine(e.ToString());
}
insertcmd.Parameters.AddWithValue(“@geoloc”, sGeoloc);

insertcmd.Connection.Open();
try
{
insertcmd.ExecuteNonQuery();
}
catch (Exception)
{ }
insertcmd.Connection.Close();
}
catch (Exception)
{}
}
}
catch (Exception e)
{
Console.WriteLine(e.ToString());
}

dEnd = DateTime.Now;
Console.WriteLine(“Time taken = “+ dEnd.Subtract(dStart));

Console.ReadKey();

}
}
}

Obviously, this code is an example, and you’d be wanting to have proper error handling, better functional layout, better commenting, etc, etc, etc in there, however, you can see how the code runs, and from a performance perspective, it’s better.

Running this code through the Debug mode in Visual Studio, I managed to get 155 records processed a second, pretty snappy.

Then, running it as an Application, outside Visual Studio I got 393 records a second! Just over 10 times faster than T-SQL with a Cursor.

image

Having been sitting on the fence for a while, I’m finally leaping off, and presenting at some community events. Following a false start with SQLBits (I submitted, but wasn’t voted in, and given the number of attendees I’m a little relieved about that!), I’ll be presenting at the following events over the next couple of months.

Hope to see you there!

24th April (Tues) – SQL Server in the Evening (6:30-6:50) – First Timers Slot (http://sqlserverfaq.com/events/392/Sessions-including-SQL-Server-Parallel-Data-Warehouse-at-the-sixth-SQL-Server-community-event-615pm-April-24th-Reading-Berkshire.aspx)

I’ll talk about using the CLR within SQL Server, why and when it should be used and then how.

25th April (Weds) – DevEvening (http://www.devevening.co.uk/)

26th May (Sat)– DDD Southwest (http://dddsouthwest.com/)
NOTE: This session isn’t confirmed yet, and is still reliant on being voted in. You can vote by going to the DDD Southwest site, linked above! )

Both DevEvening and DDD Southwest will be the same session, summarised below:

Going Native with SQL Server 2012 and C++

I’ll be going through the delights of creating a module to interact with SQL Server 2012, a function in T-SQL (briefly), then using C# to create a SQL CLR module, and then looking into the performance gains by making a C++ application querying the SQL Native Client (ODBC). All three sections will do the same job, and we’ll cover the advantages and disadvantages of each.

We’ll cover the following:

  • T-SQL, SQL CLR (C#)
  • SQL Server Native Client
  • Advantages and Disadvantages
  • Performance Opportunities
  • How to use it to connect to SQL Server from C++
  • How to query a database
  • Comparison between T-SQL, SQL CLR & C++ solutions

Slides and follow-up articles will be coming soon.

DDD9 – My thoughts

On Saturday, it was DDD9 (http://www.developerdeveloperdeveloper.com/ddd9). For me, and a lot of feedback I’ve seen on Twitter, it was the best DDD yet. There were six sessions over four tracks, covering a wide range of topics and there seems to be a pretty good turn out.

First, great thanks to the DDD team for organising the event, and to the presenters.

The sessions I attended were:

Collections Deep Dive by @GaryShort
This was a really good session, and very deep (you got that from the title, right?). As session 1 it was a good start to the day. It was a debate I had with myself as to attend this session, or the Async session by @WestleyL. I went for this one, as it was something I could use straight away (since Async is still based on a CTP).

Gary covered the collections that are part of .NET 4, which, for me, was really useful, since I’ve not used the majority of them (as I primarily am working on legacy .NET 2 apps at the moment).

Real World NHibernate, Fluent NHibernate and Castle Windsor by @ChrisCanal
This was a good session too, and covered how these three projects were used in an MVC app. For me, it could’ve done with an overview slide or two, however it covered a lot of technical content well.

Functional Alchemy by @MarkRendle
This was the best session of the day for me. Mark was a great presenter, and really covered everything well. Highly summarised, it covered how to do functional programming in C#. This was interesting as it helped me to understand how I could change the code I’d written to be better, and be DRY (Don’t Repeat Yourself)

Next we had lunch (Usual DDD sandwiches, crisps, fruit, Twix. 🙂 ), followed by a few grok talks.

The three grok talks (fewer to support the shorter lunch, to allow for more sessions. The sessions were on Creating data visualisation in Silverlight using Silverlight, Visio, and SharePoint 2010 with Dave McMahon (SharePoint MVP), CUDA (with @RobAshton) and a session on IronRuby. The CUDA session was good in that it covered how programming against the GPU works, though at present, it’s using another version of C (nVidia C) to program it. However, at some point, I’m sure there’ll be an API to support GPU processing in C#.

SOLID code with @NathanGloyn
This was an interesting session , as it covered the basics of Agile programming using the SOLID principles, as defined by Robert C. Martin (@UncleBob). The book (Agile Principles, Patterns and Practices in C#) that covers these principle has been on my reading list for a while, and following this session it should move up a bit.

Enforcing Code Beauty with StyleCop by @GuySmithFerrier
This session covered how StyleCop works with Visual studio, what it can be used to achieve, and how it can be modified to suit your needs. I’ve recently started to use this, through CodeRush (with @RoryBecker’s CR_StyleNinja addin). An interesting session, and covered a lot, and as always, well presented by Guy (who, of course, mentioned Internationalisation (i18n) a few times 😉 ).

Introducing PowerShell with @JimmyBoo
Something that has been on my ‘list of tools to learn better’ for a while, it seemed like a good choice. James’ session covered the basics, including what you can do, GUI’s you can use, and how you can use the features of PowerShell to pipe the outputs of one function to the input of another. An interesting session, but it could’ve done with being a little more technical, or having some code to cover creating a PowerShell function (though, that’d then not be an Introduction…) Good session though.

All in all a great day, and covered a lot of topics. As I said, the best session for me was the @MarkRendle Functional alchemy session, closely followed by @GaryShort

What’s Next?
Next up for me is SQLbits 8 in April, and hopefully, followed by DDDSW3 in June. I don’t imagine I’ll be attending DDDScot this year due to other commitments.

I’ve recently been looking at optimising a data load that I’m working on, and have got a quite a nice example of how the performance can be improved significantly by creating a CLR object.

To do this, I created a set of data (which is downloadable from here, is approx. 3mb and is extractable using 7-Zip ) which has approx. 200k records in it. The file has two columns (pipe delimited), which are a row id, and a set of events in a string (in the format “1,2,3,4,5”). Each string has a variable number of events in it (between 1 and 75), which looks something like this.

image

What I’m wanting to do is split this out, to have an Event Id, Sequence_Number and Event with the EventList above being split over multiple records, and a Sequence number showing the position of the event code in the list. Sounds straightforward.

image

I’m effectively looking for a quick way to generate this data.


T-SQL – Custom_Split

A custom script to split the table was created, looking like this:

create function [dbo].[Split] 

    @string nvarchar(4000), 
    @delimiter nvarchar(10) 

returns @table table 

    [Value] nvarchar(4000) 

begin 
    declare @nextString nvarchar(4000) 
    declare @pos int, @nextPos int 
 
    set @nextString = '' 
    set @string = @string + @delimiter 
 
    set @pos = charindex(@delimiter, @string) 
    set @nextPos = 1 
    while (@pos <> 0) 
    begin 
        set @nextString = substring(@string, 1, @pos - 1) 
 
        insert into @table 
        ( 
            [Value] 
        ) 
        values 
        ( 
            @nextString 
        ) 
 
        set @string = substring(@string, @pos + len(@delimiter), len(@string)) 
        set @nextPos = @pos 
        set @pos = charindex(@delimiter, @string) 
    end 
    return 
end 
-- This code was taken from here: http://stackoverflow.com/questions/314824/t-sql-opposite-to-string-concatenation-how-to-split-string-into-multiple-recor/314833#314833

This was executed in Management Studio, with this script:

SELECT EventId,
   IDENTITY (int, 1, 1) as SequenceNo,
   Value as Event
   INTO dbo.SQL_Events   
      FROM dbo.LargeFile
   CROSS  APPLY  dbo.split(dbo.LargeFile.EventList, ‘,’) AS t
   ORDER  BY dbo.LargeFile.EventId

Running this generated a table with 9.8 million rows and took 6 minutes 8 seconds, so handling approx. 26.6k records a second.


CLR – SplitEvents

To split the EventList in a CLR object, I did the following in Visual Studio 2010.

1. Choose the .NET framework, CLR Project

image

2. Right click the project when it’s created, and choose ‘Add –> Stored Procedure’.

In the body of the class, I used the following code:

using System;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;public partial class StoredProcedures
{
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void CLR_Split()
    {
        // Put your code here
         char[] delimiters = new char[] { ‘,’, ‘-‘ };

        using (SqlConnection conn = new SqlConnection(“Context Connection=true”))
        {
            using (SqlCommand cmd = new SqlCommand(“SELECT EventId, EventList FROM LargeFile”, conn))
            {
                SqlPipe pipe = SqlContext.Pipe;
                SqlMetaData[] cols = new SqlMetaData[3];
                cols[0] = new SqlMetaData(“EventId”, SqlDbType.Int);
                cols[1] = new SqlMetaData(“Sequence_No”, SqlDbType.Int);
                cols[2] = new SqlMetaData(“Event”, SqlDbType.Int);

                conn.Open();
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    int iRow = 0;
                    if (reader.HasRows)
                    {
                        SqlDataRecord rec = new SqlDataRecord(cols);
                        pipe.SendResultsStart(rec);
                        while (reader.Read())
                        {
                            int iCTIQueueId = Convert.ToInt32(reader[0]);
                            string sIncoming = reader[1].ToString();
                            if (sIncoming.Length > 0)
                            {
                                string[] sSplit = sIncoming.Split(delimiters);
                                int iSplitCount = 1;
                                foreach (string s in sSplit)
                                {
                                    try
                                    {
                                        int iSplitVal = Convert.ToInt32(s);
                                        rec.SetSqlInt32(0, iCTIQueueId);
                                        rec.SetSqlInt32(1, iSplitCount);
                                        rec.SetSqlInt32(2, iSplitVal);
                                        pipe.SendResultsRow(rec);
                                    }
                                    catch (Exception)
                                    {
                                    }
                                    iSplitCount++;
                                }
                            }
                            iRow++;
                        }
                        pipe.SendResultsEnd();
                    }
                    else
                            pipe.Send(“”);
                }
                conn.Close();
            }
        }
    }

};

Having compiled and deployed this to SQL Server, it could be run using this :

INSERT INTO CLR_Events (EventId, Sequence_No, Event)
EXEC CLR_Split

This process created the 9.8 million rows in approx. 2 minutes 59 seconds, so handling approx. 54.7k records a second


Summary

So, having run these two processes, you can see that the CLR process is much faster.

Process Time Taken Rec / Sec
T-SQL Stored Procedure 6 minutes 8 sec 26.6k
CLR Procedure 2 minutes 59 sec 54.7k

Notes

When running these tests, I was running SQL Server 2008 R2, on a i5-430m laptop, with 4Gb RAM.

The database had plenty of space pre-allocated to it, so there was no performance hit based on growing the database.

Playing with new toys

Recently I’ve had the need to create a new website, to host various sets of media files (including pictures and videos). I’ve been wanting for a while to have a look as ASP.NET MVC, so this was a great opportunity.

I created a website (http://nhaslam.com/primus) and have uploaded it with a set of basic (basic as in, out of the box) templates.  It was an interesting experience creating a set of views to display images, and the main thing I found was  ‘Don’t over complicate things’. 

I also had to load some videos up there, and as there was going to be a fair amount of them (approx 2gb), and I didn’t really want them sitting on my hosting site (sucking bandwidth), I wanted to host them on my Amazon S3 account. Uploading them was easy with Cloudberry Explorer for S3.

Getting them to stream from Amazon was a little more complicated, but was helped by the mass of websites (Google is your friend!) that went through the issues. The main issue being the ClientAccessPolicy.xml file. Making some changes to that, then meant that I could have my video files hosted on S3, and streamed to the visitors Browser.

Thank you Internet.

Comments are welcome. Smile

DDD Southwest writeup.

This is a ‘reprint’ of an article I’ve written for our company (TAH Ltd) magazine.

clip_image002So on the 23rd May 2009 (the last Bank holiday Saturday in May), I was up at 5.30 in the morning, and on the road driving down to Taunton in Somerset. Mad, absolutely. But it was lovely weather, and sunny, before the bank holiday traffic hit the road, and air conditioning in the car always makes for a pleasant drive.

I also got to see Stonehenge appear, through the early morning mist, which was incredible. No picture of that though, as I was driving….

Anyway, why was I up and out that that unearthly time ? well, there was a Microsoft Developer community event running down in Taunton, free too, so why not attend and see what’s new ?

This year there has been a good number of free Community events run, primarily because everyone is aware that most companies are financially strapped at the moment, so realistically, the only way for people to keep up to date on the latest developments is to make it free, and out of working hours. With this in mind, there have been in-person events at Microsoft (such as the Web DD event a month or so ago (see my blog article in the Consultants Blog in SharePoint), and events held by User groups (such as DevEvening in Woking, held in a Pub ). These events have been covering new technologies released by Microsoft (SQL Server 2008, Windows 7 RC), and upcoming products such as Visual Studio 2010, Office 14, and Windows Server 2008 R2. My intention is to write articles about these products (probably not Visual Studio 2010, as you really don’t care about that.).

So, the DDDSW (Developer Developer Developer South West!) event was organised by the Developer user group in Bristol. Its the third Developer event in the past few months (with previous events held in Belfast and Glasgow). It covered a number of interesting (actually interesting) topics including the new version of Silverlight (Microsoft equivalent of Flash), the Microsoft Clould platform, Refactoring , the new version of C#and even the video gaming platform that Microsoft have produced. I’ll run through some of these, though I’ll probably stick to the ones that might be relevant. If you are interested in more on any of these, please ask. I’d be delighted to tell you more.

Firstly, lets talk about the new version of Silverlight. You might not know it, but silverlight is now on it’s third version, and is used by a wide variety of websites (its a web technology…). It was one of the main technologies used to broadcast the Olympic events from Beijing last year over the internet, and is used to varying degrees to make websites more interactive. It has a number of advantages over Flash (currently owned by Adobe), in that the installation is smaller, has more functionality available for it, can be created by a far larger group of people (developers) and the resulting Silverlight applications can (but don’t have to be) be hosted by Microsoft (on their servers). This hosting reduces the impact on your web servers, and thereby increases speed. So what does Silverlight 3 give you? It gives a wider range of graphical effects, allows you create standalone applications (right click on the website, and say Install on desktop, useful for any web application) and it also now supports Hi definition Video. You’ll already be aware of the predominance of videos throughout the internet at the moment (not just Youtube, but the majority of websites have some form of ‘moving media’ which is either Flash, Silverlight or Video). Higher resolution screens, mean that low resolution videos look terrible, hence high resolution videos.

Azure is the Microsoft Cloud platform, and will be used to support some of the features of the next versions of SQL Server, Office, SharePoint and Dynamics CRM. You may already have heard of the Cloud computing platform that Amazon have (Amazon S3 is online storage, and Cloud computing is available for online virtual servers). Cloud services are becoming more and more widely used as they allow companies to access significantly more computing power and storage than would be economically viable to have in house. The advantage for Microsoft is that you pay them a monthly few (a few cents for gigabytes of storage, or hours on a server). The advantage for the customer is that they don’t need to have all the hardware they might need in the future, it is very highly and easily scaleable (one server this week, 20 next week, with seconds notice), and also that the back up is carried out by Microsoft too (all stored data is replicated in at least three places). Speed of access to these services is then based on the speed of your internet connection.

Refactoring is pretty complicated and generally not particularly interesting, even to me, but suffice to say that it makes code and products more reusable by making them more generic. For example, a car or a plane is very specific, whereas a vehicle is more generic, as cars and planes are both vehicles, but they do both have specific features.

C# version 4 (when it is released later in the year) gives a wide range of new features, but the most interesting one by almost all the people at the event was that it would allow you greater flexibility with the development of Office applications. Effectively, it would allow you to develop for Office 2003, and the same produced application would work with Office 2007, and Office 14 (in theory) when released. This is an incredible feature, since it means that all the custom work created in Office 2003 (for everyone who hasn’t moved to Office 2007 yet, and there are a huge number) will just need to recompile the code in C#4 (straightforward) rather than re-write it (complete nightmare).

There was an interesting session on the Windows Communications Foundation (WCF) also, which talked about new ways of integrating systems together. This is all based around XML, and Web services. Most systems (ERP, CRM, back end systems) have methods for creating or recieving XML documents. WCF enables the interaction of these documents between some system and a custom system. The information therein can be transformed, manipulated and forwarded on to some other system. The Microsoft product BizTalk Server takes this one step further by ensuring that the messages (XML Documents) are completely trackable from the point they leave the source system to the point they reach their destination. This then allows corporate systems such as those used by Banks to be utterly resilient to error.

There was also an interesting session on the Microsoft XNA platform, which allows the creation of games. It allows someone to create a reasonably professional looking game within a few hours. It handles all the complicated stuff such as sound, graphics, collision detection and menuing, which then leaves you free to do the design and plot of your game. Does anybody really care about a load of games, created by techies ? well, Microsoft have also made it possible for games created using XNA to be published to the Xbox Marketplace, so you can download them onto your Xbox and play them on there. This then gives you a access to a huge number of potential players. It’s all interesting stuff, if you fancy your hand at designing games, rather than just playing them. I did
used to do this on my Atari ST 15 or 20 years ago (can’t believe it was that long ago…), so it’s interesting for me…

I’m writing this on the train down to Seaford, to see the sea, and my in-laws, and we’re nearly there, so I’m going to stop now. Hope you enjoyed it, and if you have questions, please let me know.

If you’d like to read more, the conference website is here: http://www.dddsouthwest.com/

There’s photos from the event here too: http://www.flickr.com/photos/dddsouthwest

WebDD09 Conference

I attended the Webdd09 conference at Microsoft Reading on Saturday (yes, a Saturday…).

Was a really interesting group of sessions, covering the new features coming in Visual Studio 2010, ASP.NET 4 and Silverlight 3. It also covered some of the features of IIS 7, which has been out for a while now.

IIS 7 can automatically set up URL Rewriting. Sounds dull you say, but it effectively means that it’ll setup your website for Search Engine optimisation. Whats that you say… Well it’ll convert this:

http://www.tah.co.uk/Whatweoffer/Pages/WhereScapeRED.aspx

to this:

http://www.tah.co.uk/Products/WhereScapeRED.aspx

without you having to change anything on your website. This is a good thing as Search Engines, such as Google (84 % of the Market share) index pages based on the URL.  This is a good thing to know as Google have stopped taking into account embedded keywords on websites (the meta tags). In fact, the only meta tag they are now taking account of is the Description one. Useful stuff to know.

If you’d like to see more of the event, including the more technical presentations, you’ll be able to view the videos and PowerPoint’s here soon:

http://developerdeveloperdeveloper.com/webdd09

Ok, this is complicated to explain.

Say, for example, that you want to have a link on a website, that a user can click on, to gain access to their Home Directory (which is retrieved from the Active Directory field).

Ok, kind of a specific need, I know.

What do you need to do this:

1. Create a linked server connection, called ADSI, with the following :

sp_addlinkedserver ‘ADSI’, ‘Active Directory Service Interfaces’, ‘ADSDSOObject’, ‘adsdatasource’

2. Create a website in IIS, ensure that authentication is Windows Integrated, and not Anonymous.

3. Create an ASPX page, and give it a FormView (or data enabled tool).

4. Give it a SQL Server Data source, with a query similar to the one below:

SELECT ‘file:’+REPLACE(homeDirectory,’\’,’/’)
FROM OpenQuery( ADSI,’LDAP://AD_DETAILS;
(&(objectCategory=Person)(objectClass=user));
homeDirectory, sAMAccountName, distinguishedName’)
where sAMAccountName in (select SUBSTRING ( SYSTEM_USER , CHARINDEX(‘\’,SYSTEM_USER)+1, len(SYSTEM_USER)-CHARINDEX(‘\’,SYSTEM_USER)) )

This query will return the home directory, from ADSI Linked Server, for the current user.

5. All you then need to do is bind the returned field to a button. You can do this with the following:

<asp:HyperLink ID=”HyperLink1″ runat=”server” NavigateUrl=’<%# Bind(“homeDirectory”) %>‘> mylink is here</asp:HyperLink>