T-SQL Tuesday again, and this month it’s hosted by Amit Banerjee at TroubleshootingSQL.
One of the things I’ve become more aware of, through preparing for the MCM certification and working on larger data warehousing projects, is that using multiple database files can consistently give you a performance improvement.
From my own testing, the effect is apparent even on small, local databases.
As part of a series of blog posts I’m writing on the TPC-H benchmarks, I’ve been loading and reloading a dataset of approximately 8.6 million records across 8 tables. This dataset is an example of the default 1GB set from the TPC-H benchmark (downloadable here (approx. 276MB), or you can read my previous blog article on creating it yourself).
To get some decent figures showing how much of an improvement you can get with multiple files, I’ve created a script which carries out the following steps:
- Creates the database (2GB per data file, and 512MB for the log file)
- Creates the tables
- Bulk loads data using a set of flat files
- Gives a count of each of the tables
A copy of the script is available here.
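The script itself isn’t reproduced in this post. As a minimal sketch of the first step only, the Python helper below builds a CREATE DATABASE statement with a chosen number of 2GB data files in the PRIMARY filegroup and a 512MB log file. The database name `TPCH`, the folder paths and the file count are hypothetical, because the post doesn’t say how many files were used. The optional `log_dir` parameter (also an assumption) lets you put the transaction log on different media.

```python
from typing import Optional


def build_create_database(db_name: str, data_dir: str, file_count: int,
                          data_size: str = "2GB", log_size: str = "512MB",
                          log_dir: Optional[str] = None) -> str:
    """Return a T-SQL CREATE DATABASE statement with `file_count` data files.

    Names, paths and the file count are illustrative assumptions only.
    """
    if file_count < 1:
        raise ValueError("need at least one data file")
    log_dir = log_dir or data_dir  # default: log alongside the data files

    data_files = []
    for i in range(1, file_count + 1):
        # The primary data file is conventionally .mdf; additional files use .ndf.
        ext = "mdf" if i == 1 else "ndf"
        data_files.append(
            f"    (NAME = N'{db_name}_{i}', "
            f"FILENAME = N'{data_dir}\\{db_name}_{i}.{ext}', SIZE = {data_size})"
        )

    log_file = (
        f"    (NAME = N'{db_name}_log', "
        f"FILENAME = N'{log_dir}\\{db_name}_log.ldf', SIZE = {log_size})"
    )

    # All data files go into the PRIMARY filegroup; the log is declared separately.
    return (
        f"CREATE DATABASE [{db_name}]\nON PRIMARY\n"
        + ",\n".join(data_files)
        + "\nLOG ON\n"
        + log_file
        + ";"
    )


if __name__ == "__main__":
    # Hypothetical example: four data files on C:, log on a separate drive.
    print(build_create_database("TPCH", r"C:\SQLData", 4, log_dir=r"E:\SQLLogs"))
```

Calling it with `file_count=1` gives the single-file baseline. Passing a different `log_dir` covers the “transaction log stored separately” case.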
I carried out a few different tests, based on:
- Single or multiple files
- Different media:
  - Running on the C drive (5400rpm SATA drive)
  - Running on USB pen drives
  - Running on an eSATA drive
- Splitting over multiple media
- Having the transaction log stored separately
The results I found are shown below, with times given as minutes:seconds:milliseconds (mm:ss:ms).
The baseline is the run using a single data file on my internal drive.
DB Build is the time taken to create the database. Note that I’m using Instant File Initialisation, and so should you (unless you have a very good reason not to!).
Data load is the time taken to create the tables and load the data.
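To compare a run against the single-file baseline, convert the mm:ss:ms timings to seconds and work out the fractional reduction. The sketch below assumes the last field is whole milliseconds, and the function names are hypothetical. The timings in the usage line are made up purely for illustration and are not results from these tests.

```python
def to_seconds(timing: str) -> float:
    """Convert an 'mm:ss:ms' timing (minutes:seconds:milliseconds) to seconds."""
    minutes, seconds, millis = (int(part) for part in timing.split(":"))
    return minutes * 60 + seconds + millis / 1000


def improvement(baseline: str, candidate: str) -> float:
    """Fractional reduction in elapsed time relative to the baseline run.

    0.0 means no change; positive values mean the candidate was faster.
    """
    base = to_seconds(baseline)
    return (base - to_seconds(candidate)) / base


if __name__ == "__main__":
    # Illustrative timings only, not measured results.
    print(f"{improvement('09:00:000', '06:00:000'):.1%}")
```

An improvement above 0.333 corresponds to cutting more than a third off the baseline time.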
In summary, the results show that:
- Regardless of media, multiple files always gave a performance improvement
- USB pen drives are rubbish. Don’t use them for databases
- A fast drive, separate from the O/S and from the system databases, gives a significant improvement
The best performance I managed to achieve was with multiple database files on the eSATA drive.
However, given that I work primarily on a laptop, what impresses me most is the huge improvement (over a third!) I get simply by using multiple database files on the internal drive.
I’d be interested to know how much of an improvement you get. How much does your mileage vary?
Thanks for reading, and thanks to Amit for hosting.