Category: VS2010


DDD9 – My thoughts

On Saturday it was DDD9 (http://www.developerdeveloperdeveloper.com/ddd9). For me, and judging by a lot of the feedback I’ve seen on Twitter, it was the best DDD yet. There were six sessions over four tracks, covering a wide range of topics, and there seemed to be a pretty good turnout.

First, many thanks to the DDD team for organising the event, and to the presenters.

The sessions I attended were:

Collections Deep Dive by @GaryShort
This was a really good session, and very deep (you got that from the title, right?). As the first session of the day, it got things off to a good start. I debated with myself whether to attend this session or the Async session by @WestleyL, but went for this one, as it was something I could use straight away (Async is still based on a CTP).

Gary covered the collections that are part of .NET 4, which, for me, was really useful, since I’ve not used the majority of them (I’m primarily working on legacy .NET 2 apps at the moment).

Real World NHibernate, Fluent NHibernate and Castle Windsor by @ChrisCanal
This was a good session too, covering how these three projects were used in an MVC app. For me it could’ve done with an overview slide or two, but it covered a lot of technical content well.

Functional Alchemy by @MarkRendle
This was the best session of the day for me. Mark was a great presenter, and really covered everything well. Highly summarised, it covered how to do functional programming in C#. This was interesting, as it helped me to understand how I could change the code I’d written to be better, and to be DRY (Don’t Repeat Yourself).

Next we had lunch (Usual DDD sandwiches, crisps, fruit, Twix. 🙂 ), followed by a few grok talks.

There were three grok talks (fewer than usual, to support the shorter lunch that allowed for the extra sessions). They covered creating data visualisations in Silverlight using Visio and SharePoint 2010 (with Dave McMahon, SharePoint MVP), CUDA (with @RobAshton), and IronRuby. The CUDA talk was good in that it covered how programming against the GPU works, though at present it means using another version of C (nVidia’s C) to program it. However, at some point I’m sure there’ll be an API to support GPU processing in C#.

SOLID code with @NathanGloyn
This was an interesting session, as it covered the basics of Agile programming using the SOLID principles, as defined by Robert C. Martin (@UncleBob). The book that covers these principles (Agile Principles, Patterns, and Practices in C#) has been on my reading list for a while, and following this session it should move up a bit.

Enforcing Code Beauty with StyleCop by @GuySmithFerrier
This session covered how StyleCop works with Visual Studio, what it can be used to achieve, and how it can be modified to suit your needs. I’ve recently started to use it, through CodeRush (with @RoryBecker’s CR_StyleNinja addin). An interesting session that covered a lot and, as always, was well presented by Guy (who, of course, mentioned Internationalisation (i18n) a few times 😉 ).

Introducing PowerShell with @JimmyBoo
PowerShell has been on my ‘list of tools to learn better’ for a while, so this seemed like a good choice. James’ session covered the basics, including what you can do, GUIs you can use, and how you can use PowerShell’s features to pipe the output of one function to the input of another. An interesting session, but it could’ve done with being a little more technical, or having some code to cover creating a PowerShell function (though that’d then not be an introduction…). Good session though.

All in all it was a great day, covering a lot of topics. As I said, the best session for me was @MarkRendle’s Functional Alchemy session, closely followed by @GaryShort’s Collections Deep Dive.

What’s Next?
Next up for me is SQLBits 8 in April, hopefully followed by DDDSW3 in June. I don’t imagine I’ll be attending DDDScot this year due to other commitments.

I’ve recently been looking at optimising a data load that I’m working on, and have got quite a nice example of how the performance can be improved significantly by creating a CLR object.

To do this, I created a set of data (downloadable from here; it’s approx. 3MB and extractable using 7-Zip) which has approx. 200k records in it. The file has two columns (pipe delimited): a row id, and a set of events in a string (in the format “1,2,3,4,5”). Each string has a variable number of events in it (between 1 and 75), and looks something like this.

[Image: sample rows from the source file]
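
For anyone wanting to recreate this without downloading the file, here’s a minimal sketch of a staging table in the shape the later scripts expect, with a couple of made-up rows (the table name and columns match the queries below; the column sizes and values are my assumptions):

create table dbo.LargeFile
(
    EventId   int not null,
    EventList nvarchar(400) not null -- e.g. '1,2,3,4,5'
)

insert into dbo.LargeFile (EventId, EventList)
values (1, '1,2,3'),
       (2, '7,12,15,22')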

What I want to do is split this out, to give an EventId, Sequence_Number and Event, with the EventList above split over multiple records, and a sequence number showing the position of each event code in the list. Sounds straightforward.

[Image: the target layout – EventId, SequenceNo and Event, one row per event in the list]

I’m effectively looking for a quick way to generate this data.


T-SQL – Custom_Split

A custom split function was created, looking like this:

create function [dbo].[Split]
(
    @string nvarchar(4000),
    @delimiter nvarchar(10)
)
returns @table table
(
    [Value] nvarchar(4000)
)
begin
    declare @nextString nvarchar(4000)
    declare @pos int, @nextPos int

    set @nextString = ''
    -- Append a trailing delimiter so the final value is picked up too
    set @string = @string + @delimiter

    set @pos = charindex(@delimiter, @string)
    set @nextPos = 1
    while (@pos <> 0)
    begin
        -- Everything up to the next delimiter is the next value...
        set @nextString = substring(@string, 1, @pos - 1)

        insert into @table ([Value])
        values (@nextString)

        -- ...then remove it (and the delimiter) from the front of the string
        set @string = substring(@string, @pos + len(@delimiter), len(@string))
        set @nextPos = @pos
        set @pos = charindex(@delimiter, @string)
    end
    return
end
-- This code was taken from here: http://stackoverflow.com/questions/314824/t-sql-opposite-to-string-concatenation-how-to-split-string-into-multiple-recor/314833#314833
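
Called on its own, the function behaves like this (a quick illustrative check of mine, not from the original post):

select [Value] from dbo.Split('1,2,3', ',')
-- returns three rows: 1, 2 and 3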

To split the whole table, this was then executed in Management Studio with the following script:

SELECT EventId,
       IDENTITY(int, 1, 1) AS SequenceNo,
       Value AS Event
INTO dbo.SQL_Events
FROM dbo.LargeFile
CROSS APPLY dbo.Split(dbo.LargeFile.EventList, ',') AS t
ORDER BY dbo.LargeFile.EventId

Running this generated a table with 9.8 million rows, and took 6 minutes 8 seconds, so it handled approx. 26.6k records a second. (One caveat worth noting: IDENTITY(int, 1, 1) in a SELECT … INTO numbers rows across the whole result set, so SequenceNo here runs from 1 to 9.8 million, rather than restarting at 1 for each EventId as the CLR version below does.)
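
As a quick sanity check (my addition, not part of the original post), the expected row count can be worked out directly from the source strings, since each list has one more event than it has delimiters:

select sum(len(EventList) - len(replace(EventList, ',', '')) + 1) as ExpectedRows
from dbo.LargeFile
-- should match the row count of dbo.SQL_Events (approx. 9.8 million)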


CLR – SplitEvents

To split the EventList with a CLR object, I did the following in Visual Studio 2010.

1. Create a new project, choosing the target .NET Framework version and the CLR project type.

[Image: the New Project dialog]

2. Right-click the project when it’s created, and choose ‘Add -> Stored Procedure’.

In the body of the class, I used the following code:

using System;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures
{
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void CLR_Split()
    {
        // Values can be separated by commas or hyphens
        char[] delimiters = new char[] { ',', '-' };

        using (SqlConnection conn = new SqlConnection("Context Connection=true"))
        {
            using (SqlCommand cmd = new SqlCommand("SELECT EventId, EventList FROM LargeFile", conn))
            {
                SqlPipe pipe = SqlContext.Pipe;

                // Define the shape of the rows we'll stream back
                SqlMetaData[] cols = new SqlMetaData[3];
                cols[0] = new SqlMetaData("EventId", SqlDbType.Int);
                cols[1] = new SqlMetaData("Sequence_No", SqlDbType.Int);
                cols[2] = new SqlMetaData("Event", SqlDbType.Int);

                conn.Open();
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    int iRow = 0;
                    if (reader.HasRows)
                    {
                        SqlDataRecord rec = new SqlDataRecord(cols);
                        pipe.SendResultsStart(rec);
                        while (reader.Read())
                        {
                            int iCTIQueueId = Convert.ToInt32(reader[0]);
                            string sIncoming = reader[1].ToString();
                            if (sIncoming.Length > 0)
                            {
                                // Split the list and send one row per event,
                                // numbering each event within its list
                                string[] sSplit = sIncoming.Split(delimiters);
                                int iSplitCount = 1;
                                foreach (string s in sSplit)
                                {
                                    try
                                    {
                                        int iSplitVal = Convert.ToInt32(s);
                                        rec.SetSqlInt32(0, iCTIQueueId);
                                        rec.SetSqlInt32(1, iSplitCount);
                                        rec.SetSqlInt32(2, iSplitVal);
                                        pipe.SendResultsRow(rec);
                                    }
                                    catch (Exception)
                                    {
                                        // Skip any value that isn't a valid integer
                                    }
                                    iSplitCount++;
                                }
                            }
                            iRow++;
                        }
                        pipe.SendResultsEnd();
                    }
                    else
                    {
                        pipe.Send("");
                    }
                }
                conn.Close();
            }
        }
    }
}

Having compiled and deployed this to SQL Server, it could be run using this:

INSERT INTO CLR_Events (EventId, Sequence_No, Event)
EXEC CLR_Split
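
For completeness, here’s a rough sketch of what ‘deployed’ involves if you’re not using Visual Studio’s Deploy command; the assembly name, file path and target table definition are my assumptions (the table shape simply matches the INSERT above):

-- Enable CLR integration (server-level, one-off)
exec sp_configure 'clr enabled', 1
reconfigure
go

-- Register the compiled assembly (the path here is hypothetical)
create assembly SplitEvents from 'C:\Assemblies\SplitEvents.dll'
with permission_set = safe
go

-- Expose the static method as a stored procedure
create procedure dbo.CLR_Split
as external name SplitEvents.StoredProcedures.CLR_Split
go

-- Target table for the INSERT ... EXEC above
create table dbo.CLR_Events
(
    EventId     int not null,
    Sequence_No int not null,
    Event       int not null
)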

This process created the 9.8 million rows in approx. 2 minutes 59 seconds, so it handled approx. 54.7k records a second. Since the rows are streamed out through SendResultsRow as they’re produced, the procedure never has to buffer the full result set in memory.


Summary

So, having run these two processes, you can see that the CLR process is roughly twice as fast.

Process                  Time Taken          Rec / Sec
T-SQL Stored Procedure   6 minutes 8 sec     26.6k
CLR Procedure            2 minutes 59 sec    54.7k

Notes

When running these tests, I was running SQL Server 2008 R2 on an i5-430M laptop with 4GB RAM.

The database had plenty of space pre-allocated to it, so there was no performance hit from growing the database files.