Kick Your Microsoft SQL Server BI/DSS Platform Into High Gear
Enterprise-grade SSDs have become a mainstay of IT’s data centers. Because they have no moving parts, they excel with highly random workloads. No moving parts mean that there is no delay when reading randomly distributed data – no waiting for the head to swing back and forth in an SSD. We see SSDs deployed very effectively with workloads like online transaction processing, application acceleration, and virtual desktops/servers, as well as many more of these highly random workloads. Enterprise SSDs are the king here.
But all IT workloads aren’t random. In fact, one key workload – Business Intelligence/Decision Support Systems (or BI/DSS) – tends to be very sequential. Not only that, but BI/DSS also tends to read data in very large chunks. A highly sequential, large-chunk workload sounds ideal for hard disk drives (HDDs).
SSDs can’t offer any advantage for this particular workload…or can they?
Head-to-Head: BI/DSS Performance Measurement
Like most would, we assumed that due to the nature of a BI/DSS workload, we would see HDDs do quite well and SSDs, at best, just keep up. To test that theory, we measured some real system performance – and we got a surprise.
We started with one of the most commonly deployed Relational Database Management Systems (RDBMS) platforms – Microsoft’s SQL Server. We took a standard server (a Dell 2P) and installed a dozen enterprise-grade HDDs. To keep it simple, we set up a RAID 5 array and placed a 442GB database on it. We used a large database for two reasons: (1) we wanted to make sure we were looking at a reasonably sized data set; and (2) we wanted to make sure that the data did not fit into memory because we wanted to make sure we saw disk I/O.
We then sent the 22 business-oriented queries to the database and timed how long they took to complete. We did it once for a single user and again for six users (to add more load to the system). Knowing that BI/DSS performance is closely tied to storage I/O performance, we also measured the throughput of the disk system for each test. We did the same tests on the same databases with the same user counts – once for HDDs and twice for SSDs (first with only 6 SSDs, then again with 12).
The first surprise we got was how much faster the queries completed on the SSD tests. And not by just a little bit! Given that this was a predominantly sequential workload, we were surprised. By moving from 12 enterprise HDDs to 6 M510DC SSDs the 6-user query completion time was reduced by 75% compared to the legacy disk array! When we moved to 12 M510DC SSDs, it was reduced by 86%! Remember, this was a highly sequential workload!
We knew that at the application level, BI/DSS performance would be very tightly coupled to storage performance, so we dug a bit deeper into the data to compare the raw storage performance. Surprise number two: The data showed us that the M510DC array was 2.6X to 5.6X the throughput of the HDD array. Since all other variables were held constant, this difference (like query completion time) was a direct result of the storage configuration.
Many of us tend to presume that certain storage devices fit best into certain workloads, but we do this without testing – without any real data to back up those assumptions. We assume that since a workload is highly sequential that HDDs will excel and SSDs will offer no particular value. We may completely ignore the possibilities that a different approach can bring.
We should stop doing that. We may just be surprised!