
VSAN Demo 2014: A How-To Guide

Now that the team is back from VMworld, we wanted to share the configuration for our VSAN demo that drew so much attention. This blog is intended to give you a high-level view of how we created the demo and provide some ideas for your own VSAN explorations. We documented other details, such as node configurations, VSAN Observer settings, and fio test workloads, but left them out here for brevity's sake. We'd love to hear comments and questions from anyone who wants to know more; post your comments below.

Our primary goal was to demonstrate best-in-class VSAN performance and show how that compared to a standard VSAN configured with SAS HDDs. One of the most interesting aspects of our configuration was that our M500 client SSDs were actually less expensive than the SAS 10K HDDs.

Equipment

Our cluster consisted of six nodes, each built from the following:

Quantity | Description | Notes
1 | Dell R620 (configuration in the table below) | 10 drives, dual 10 GbE, dual E5-2697 v2 (12-core, 2.7 GHz)
2 | Micron 1.4TB P420m PCIe SSD | Part# MTFDGAR1T4MAX-1AG1Z
10 | Micron 960GB M500 SSD | Part# MTFDDAK960MAV-1AE12ABYY
24 | Micron 32GB PC3-14900 LRDIMM | Part# MT72JSZS4G72LZ-1G9E2A7
1 | Lexar 8GB USB drive | For ESXi boot device

To compare our all-SSD configuration to a standard HDD configuration, we used the Seagate Enterprise Performance 1.2TB 10K HDD v7 (Part# ST1200MM0017).

The table below outlines the purchases we made from Dell for the PowerEdge R620 server configuration. Irrelevant or user-optional items (like bezel, power cords, and warranty options) have been omitted:

Quantity | Description | Product Code
1 | PowerEdge R620: PowerEdge R620, Intel® Xeon® E5-26XX Processors | R620IB
1 | Chassis Configuration: Chassis with up to 10 Hard Drives and 3 PCIe Slots | 10H3P
1 | Processor: Intel® Xeon® E5-2697 v2 2.70 GHz, 30M Cache, 8.0 GT/s QPI, Turbo, HT, 12C, 130W, Max Mem 1866 MHz | E52697V
1 | Additional Processor: Intel® Xeon® E5-2697 v2 2.70 GHz, 30M Cache, 8.0 GT/s QPI, Turbo, HT, 12C, 130W | 2E52697
1 | RAID Configuration: RAID 0 for H710P/H710/H310 (1-10 HDDs) | R0H7H3
1 | RAID Controller: PERC H710 Integrated RAID Controller, 512MB NV Cache | R0H7H3
1 | Select Network Adapter: Intel Ethernet X540 DP 10Gb BT + I350 1Gb BT DP Network Daughter Card | X540DC
1 | Power Supply: Dual, Hot-Plug, Redundant Power Supply (1+1), 1100W | RPS1100
1 | Power Supply: Single, Hot-Plug Power Supply (1+0), 750W | NPS750
1 | Power Management BIOS Settings: Power-Saving Dell Active Power Controller | DAPC

BIOS Configuration

The BIOS configuration we used for the VSAN hosts is no different from our standard BIOS settings for ESXi hosts. Because we were benchmarking and trying to measure optimal performance, we used the Performance profile along with the following settings:

Configuration Option | Setting
Memory Operating Mode | Optimizer
Node Interleaving | Disabled
Alternate RTID | Disabled
Logical Processor (Hyper-Threading) | Enabled
QPI Speed | Max
I/OAT DMA Engine | Enabled
SR-IOV Global Enable | Enabled
Memory Mapped I/O Above 4GB | Enabled
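
When configuring all six hosts, these settings can also be pushed out-of-band with iDRAC's racadm tool rather than through the BIOS setup screens. A minimal sketch follows; the attribute names are our best guess at Dell's 12th-generation BIOS attribute registry, so verify them against your iDRAC firmware with "racadm get BIOS" before relying on them:

# Attribute names assume the Dell 12G BIOS registry; confirm with "racadm get BIOS"
racadm set BIOS.SysProfileSettings.SysProfile PerfOptimized
racadm set BIOS.MemSettings.MemOpMode OptimizerMode
racadm set BIOS.ProcSettings.LogicalProc Enabled

# Queue the BIOS job; the settings apply on the next reboot
racadm jobqueue create BIOS.Setup.1-1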

USB Boot

To optimize storage, we installed ESXi on a USB drive, which freed up a drive bay on each server for a VSAN storage drive. Scratch data was placed on an NFS server, as recommended for hosts with more than 512GB of RAM.
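
If you take the same approach, the persistent scratch location can be redirected from the ESXi shell once the NFS datastore is mounted. A quick sketch; the datastore name scratch_nfs and the per-host directory name are placeholders:

# Create a per-host scratch directory on the NFS datastore (names are placeholders)
mkdir /vmfs/volumes/scratch_nfs/.locker-esx01

# Point the host at it; the change takes effect after a reboot
esxcli system settings advanced set -o /ScratchConfig/ConfiguredScratchLocation -s /vmfs/volumes/scratch_nfs/.locker-esx01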

Storage Controller

An array of 10 SAS HDDs is capable of less than 5K IOPS in a random workload. Ten of our M500 drives can sustain 100X that performance, so getting the controller configuration right is, if anything, even more important with an all-SSD configuration. The best controller option from Dell, in our opinion, is the H710. (The H310 is reported to perform poorly because of an extremely low queue depth.) The H710 is a RAID controller and lacks a pass-through mode, so we had to create individual RAID 0 volumes, one per physical M500 disk. The RAID 0 volumes should be set with the minimum stripe size (64KB in this case), no read-ahead, write-through cache, and the disk write buffer disabled.
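
Rather than building each volume by hand in the controller BIOS, the per-disk RAID 0 volumes can be scripted with LSI's MegaCLI (available as an ESXi VIB or from a management OS). A minimal sketch for one drive, assuming adapter 0 and a placeholder [enclosure:slot] ID of [32:0]; repeat for each M500:

# One RAID 0 volume per M500: write-through (WT), no read-ahead (NORA), 64KB stripe
MegaCli -CfgLdAdd -r0 [32:0] WT NORA Direct -strpsz64 -a0

# Disable the on-disk write buffer for all logical drives on adapter 0
MegaCli -LDSetProp -DisDskCache -LAll -a0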

An HBA or controller with pass-through mode requires one more step so that VSAN sees the M500 as an HDD rather than an SSD; the claim rule below accomplishes this:

esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -M Micron_M500_MTFD -o disable_ssd

reboot
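
After the reboot, you can confirm the claim rule took effect; "Is SSD: false" in the device details means VSAN will accept the M500 in the data tier (the naa ID below is a placeholder for your device):

# Check the device details; "Is SSD: false" confirms the rule was applied
esxcli storage core device list -d naa.xxxxxxxxxxxxxxxx | grep "Is SSD"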

This configuration uses the P420m, a PCIe SSD, as the SSD cache. Note that because it is a PCIe device, the P420m does not connect to the storage controller.

P420m Driver

The P420m has an "inbox" driver in ESXi 5.5, but we recommend updating both the driver and the firmware. You can find support releases for the P420m on our website; toward the bottom of the page is a link to the Linux/VMware driver support pack. Download the current version, which as of this writing is B144.04.00.

First, from a workstation:

unzip B144.04.00_Linux_VMware.zip
scp "B144.04.00_Linux_VMware/VMware Driver/mtip32xx-native-3.8.2-esxi55-cert.zip" root@esx_hostname:/
scp "B144.04.00_Linux_VMware/RealSSD Manager/VMWare/ESX5.5/rssdm" root@esx_hostname:/scratch/rssdm
scp "B144.04.00_Linux_VMware/Unified Image/B144.02.00.ubi" root@esx_hostname:/

Then, from the ESXi shell:

/scratch/rssdm -T /B144.02.00.ubi -n 0 -r
/scratch/rssdm -T /B144.02.00.ubi -n 1 -r
esxcli software vib install -d /mtip32xx-native-3.8.2-esxi55-cert.zip --no-sig-check
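
Before rebooting to load the new driver, it's worth confirming the VIB registered and, optionally, that both cards report the new firmware. The rssdm list option below is an assumption on our part; check rssdm -h for the exact flag in your release:

# Confirm the driver package is registered (it loads on the next reboot)
esxcli software vib list | grep mtip32xx

# Optionally list the P420m cards and their firmware; -L is an assumed list flag, verify with rssdm -h
/scratch/rssdm -L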

Networking

The following diagram outlines the host servers and the network interconnects between them. Note that while our VSAN operates on a network configuration with distributed switches, the infrastructure hosts do not. We chose this because our infrastructure needs to support more than any single network configuration; in particular, VSAN, Login VSI, and VMmark. We have found that VSAN works best with distributed switches, and we highly recommend them when using clusters of hosts. In this configuration we gave the infrastructure access to all network points, which is not required for normal use; we may need to capture packets or run test operations on each network, and infrastructure-wide access gives us flexibility in where we place the VMs that perform testing.

[Diagram: host servers and network interconnects]
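
Distributed switches are created and managed from vCenter, but each host can be sanity-checked from the ESXi shell to confirm that it participates in the distributed switch and has a vmkernel interface tagged for VSAN traffic:

# List the distributed switches this host participates in
esxcli network vswitch dvs vmware list

# Confirm a vmkernel interface is enabled for VSAN traffic
esxcli vsan network list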

Disk Group Details

The diagram below outlines the VSAN storage configuration from a single host's perspective. A disk group is the basic unit of storage configuration in VSAN and consists of one caching SSD and between one and seven data drives. Our demo configuration gives each host two 1.4TB P420m cache drives and ten 960GB M500 drives, organized into two disk groups of one P420m and five M500s each. The end result is 9.4TB of storage space and a 1.9TB read cache per host, roughly a 1:5 cache-to-data ratio. The remaining cache capacity is used as a write buffer, 840GB on each host (VSAN allocates 70% of each caching SSD to read cache and 30% to write buffering).

The illustration further shows how this disk group configuration is viewed by VSAN for a single host and illustrates the data flow into and out of a disk group. Notice that the block allocation unit on the data storage is 1MB, which means the I/O seen by the data storage is a 1MB random read/write workload. For an SSD, this is extremely good news for endurance. Most SSD endurance specifications are measured against 4K random workloads, and VMware's VSAN specifications are based on 8K random workloads; a 4K or 8K random workload is actually the hardest case for SSD endurance. A 1MB random read/write workload, by contrast, makes the drive's internal reclaim (garbage collection) operations far more efficient, which lowers overall write amplification, meaning fewer writes are going on in the background.

Note also that the write buffer is the storage entity most significantly impacted by writes that vary in I/O size and rate. Our understanding of the read-cache workload is incomplete, but we hypothesize that writes into the read cache (on cache fill) are relatively consistent at 1MB, the same I/O size as the data storage. We also expect the read cache to see varying read sizes: even though the allocation unit is 1MB, nothing stops the storage system from issuing smaller block reads and writes to the overall storage subsystem.

[Diagram: VSAN disk group configuration and data flow for a single host]
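
For completeness, the disk groups themselves can be created from the ESXi shell as well as from the Web Client. A sketch with placeholder device IDs, run once per disk group (one P420m as the caching SSD, five M500s as data disks):

# Create a disk group: one caching SSD plus five data drives (device IDs are placeholders)
esxcli vsan storage add -s naa.SSD0 -d naa.DATA1 -d naa.DATA2 -d naa.DATA3 -d naa.DATA4 -d naa.DATA5

# Verify that both disk groups are present
esxcli vsan storage list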

Wrap-Up

VMware's VSAN software, coupled with this configuration, shows it is possible to build a supercharged virtualization solution that is scalable, easy to deploy, power efficient, dense, and self-contained. The performance differences between this all-SSD solution and the usual hybrid solution are striking when measured in I/O latency and end-application responsiveness. We measured surprising differences in VDI performance using Login VSI to benchmark a Horizon View configuration, and we are also working through the process of vetting our results from synthetic benchmarks and sysbench MySQL testing. What we've seen looks pretty interesting, and we are excited to share what we've found. Let us know what workloads you're interested in seeing and any questions you have about what we learned.
