Partitioning in SQL Server

Filed under: Optimization, SQL Server — Leave a comment

22/08/2011

During the my preparation for the 70-451 Certification, I did some work on Partitioning. I found this to be quite an interesting exercise, so wanted to share it.

There are a number of reasons that you could do Partitioning, though primarily they are related to performance, and easier maintenance. However, while you can get an increase in performance, it isn’t the answer to all your problems.

The performance gain is through the ability to have a database table spread over multiple I/O devices (through filegroups), though this also gives you the ability to do partial database restores.

Step 1 – Create a database

The First step, is obviously to create a database to play with.

CREATE DATABASE [sandpit] ON  PRIMARY
( NAME = N'sandpit', FILENAME = N'C:\temp\sandpit.mdf' , SIZE = 200mb ,
   MAXSIZE = UNLIMITED, FILEGROWTH = 256000KB )
 LOG ON
( NAME = N'sandpit_log', FILENAME = N'C:\temp\sandpit_log.ldf' ,
   SIZE = 50Mb , MAXSIZE = 2048GB , FILEGROWTH = 256000KB )
GO
use sandpit;
go

Step 2 – Create a Partition Function

The Partition function is used to determine where data appears in the partitions. You specify it as left or right, so the partition value is either to the Left or to the Right. The sample below is right, so 15/10/1992 (19921015) appears in the second partition.

Partition 1 being infinity to 17530101

Partition 2 being 17510101 to 19990101

Partition 3 being 20000101 to 20101231 etc

create partition function myPF (datetime)
as range right for values
('17530101','20000101','20100101','20110101','20120101','20130101','99990101')

Step 3 – Create a Partition Scheme

The Partition Scheme is used to determine which file group the data goes into. There needs to be as many entries in here, as there are in the Partition Function. You cannot specify less, and if you specify more, they will be used in the next partitions (so if you use the Split function, coming later on!). Also in here, we specify the Partition function that we defined previously, to link the two together.

CREATE PARTITION SCHEME [myPS] as Partition [myPF]
to ([PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY])

Step 4 – Create a table using the Partition Scheme

Next we need to create a table. Rather than specifying ‘on <filegroup>’ as normal, we specify the Partition Scheme, and the field used to partition the data. In this case, I’m using the date field for partitioning.

CREATE TABLE [dbo].[myPTable](
    [pKey] [bigint] IDENTITY(1,1) NOT NULL,
    [pDateTime] [datetime] NOT NULL,
    [uidGuid] [uniqueidentifier] NULL,
    [tDesc] [varchar](100) NULL,
 CONSTRAINT [PK_myPTable] PRIMARY KEY NONCLUSTERED
(
    [pKey] ASC,    [pDateTime] asc
))
on     myPS(pDateTime);

This will create our table, but we need data in it, so if you run this script for a while, it’ll populate the table with some random data.

while 1=1
begin
    insert into myPTable
        (pDateTime, uidGuid,tDesc)
    Values (
        dateadd(day, -1 * abs(convert(varbinary, newid()) % ((200*365))),
        dateadd(year,2,getdate())), NEWID(),GETDATE())
end

This script will continue until you run out of disk space, or until the end of time, so you can stop it. You can check the spread of data by running this script. This queries the location of data, gets the partition number, Row count, Min and Max values for the table.

select $partition.myPF(pDateTime) as PartitionNumber, COUNT(1) as RowsInPartition,
    MIN(pDateTime) as MinDateInPartition,MAX(pDateTime) as MaxDateInPartition from myPTable
group by $partition.myPF(pDateTime)

This gives me:

Step 5 – Splitting the Partition

As you should (hopefully) see from the query above, there will be significantly more data in the 2nd Partition, than in the others. To help with this, we can split this partition. This can be achieved in two steps: First add a filegroup to the Partition Scheme, then add a split to the Partition Function.

alter partition scheme myPS next used [Primary]
alter partition function myPF() split range ('19500101')

Running the distribution script above, now gives me:

There are still quite alot in the 2nd Partition, so lets Split again:

alter partition scheme myPS next used [Primary]
alter partition function myPF() split range ('19000101')

Now we get:

Extra Credit 1 – Data Compression by Partition

Partitioning is an Enterprise (and Developer!) edition feature, and so is Data Compression. Given this, we can use data compression on the Partitions, and also have different compression levels on each Partition. So, by using the script below, we can have Page compression on Partition 2 and Row compression on 3-5. (Creating Compressed Tables and Indexes – MSDN)

USE [sandpit]
ALTER TABLE [myPTable]
REBUILD PARTITION = ALL
WITH
(
DATA_COMPRESSION = PAGE ON PARTITIONS(1 to 2),
DATA_COMPRESSION = ROW ON PARTITIONS(3 TO 6)
) ;
GO

Also note, that if you split a compressed table, the new partition will keep the compression from the partition before it was split.

Extra Credit 2 – Data Compression by Index

Interestingly, you can also change the Partitioning on an index. This can be carried out in the following manner:

create clustered index IX_myPTablePKey
on myPTable(pKey,pDateTime)
with (data_compression = Page on Partitions(1 to 2),
data_compression = row on Partitions(3 to 7))

There is further information around Creating Compressed Tables and Indexes here.

Tags: Optimisation, SQL Server, T-SQL

Comments RSS feed

Dev Blog