This blog post originally appeared on VirtuallyBrave on September 17, 2016. Andrew Braverman is a global pre-sales leader at Micron Technology. Andrew regularly blogs on VirtuallyBrave about the cloud, storage, transformation, software and other interesting topics.
The reviewers have been raving about the Micron 9100. I, for one, agree with their reviews, having tested a 9100PRO and witnessed such incredible performance that I was tempted to grab a fire extinguisher in case it started smoking. (Don’t worry, it didn’t catch fire. I do, however, have a great story, direct from the fire chief, about a HOT server self-igniting; feel free to ask me about that one next time you see me in person.) No matter how awesome the 9100 really is, though, we have seen first-hand that it doesn’t solve every problem. This episode discusses a few key areas where NVMe, and thus the 9100, fits really well. Look for these situations and go make them better with 9100s.
First, however, two minutes on NVMe. We already had PCIe SSDs, so why did we need a new thing called NVMe? It’s important to understand that NVMe is a protocol, not a connector or an interface. All of our NVMe drives connect to servers over PCIe. Previous-generation PCIe drives (the Micron P420m, for example) generally used a protocol called Advanced Host Controller Interface (AHCI). AHCI was designed for controllers with SATA disks behind them, which means it has two hops: commands go from the host (“computer” for the account managers out there) to the controller, and then the controller sends commands to the drive. Data flows in the opposite direction. These two hops, even when both happen on the same PCIe drive, add a good amount of latency. Remember that latency is time, time is money, and nobody wants to waste money on the two hops of AHCI. NVMe eliminates the extra hop: the storage device itself receives commands from the host (remember, host = computer) and therefore operates more quickly. Putting the drive directly on PCIe also eliminates the slow SATA (or SAS) connection between the controller and the drive, which removes still more latency and dramatically increases the bandwidth available between the host and the drive.
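To put rough numbers on the bandwidth point, here is a minimal sketch that computes the theoretical one-direction ceiling of a SATA 6Gb/s link versus a PCIe Gen3 x4 link (the four lanes a 9100 uses). It accounts only for line encoding (8b/10b for SATA, 128b/130b for PCIe Gen3), so these are upper bounds rather than measured drive figures; packet and protocol overhead pull real throughput lower. The helper name `link_ceiling_mb_s` is illustrative.

```python
# Theoretical one-direction payload ceilings after line encoding only.
# Real drives land below these numbers once protocol overhead is included.

def link_ceiling_mb_s(gt_per_s: float, lanes: int, payload_bits: int, total_bits: int) -> float:
    """Line rate (GT/s) x lanes x encoding efficiency, in MB/s (1 MB = 10**6 bytes)."""
    bits_per_s = gt_per_s * 1e9 * lanes * payload_bits / total_bits
    return bits_per_s / 8 / 1e6


sata3 = link_ceiling_mb_s(6.0, 1, 8, 10)        # SATA 6Gb/s, 8b/10b encoding
pcie3_x4 = link_ceiling_mb_s(8.0, 4, 128, 130)  # PCIe Gen3 x4, 128b/130b encoding

print(f"SATA 6Gb/s ceiling:   {sata3:,.0f} MB/s")     # 600 MB/s
print(f"PCIe Gen3 x4 ceiling: {pcie3_x4:,.0f} MB/s")  # 3,938 MB/s
print(f"ratio: {pcie3_x4 / sata3:.1f}x")              # 6.6x
```

Real drives fall short of both ceilings, but the gap between the links is the point: moving the drive onto PCIe removes a bottleneck more than six times narrower.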
So NVMe is fast. Really, really fast. Why shouldn’t we use it anywhere and everywhere? Some limitations of modern server architecture make that impossible. First, PCIe devices get their speed by being connected directly to the CPU, and each 9100 drive uses 4 PCIe Gen3 lanes. Consider the popular Intel E5-2600 v4 (Broadwell) family of CPUs. These max out at 40 PCIe lanes per CPU, so a dual-CPU server has no more than 80 PCIe lanes. After subtracting lanes for Ethernet and other attached peripherals, you are generally left with no more than 40 lanes for storage. At 4 lanes per drive, that means no more than 10 NVMe drives in any given server. The power consumption of modern NVMe drives (the low-power Micron 7100 series excepted) further restricts the number of devices per server. The bottom line is that, with one notable exception, no server vendor is shipping more than four NVMe sockets (U.2, 2.5” form factor in the front of the server) in any server. (One vendor DOES have very cool 24-drive systems for NVMe, but they use PCIe switches, which means far less bandwidth is available to each NVMe drive.)
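The lane arithmetic above can be written down as a quick budget calculation. This is a minimal sketch: the 40 lanes per CPU and x4 per drive come from the text, while the number of lanes reserved for NICs and other peripherals is an assumption you would replace with your own server’s configuration. The function name is illustrative.

```python
# Back-of-the-envelope PCIe lane budget for NVMe drives in a server.
# reserved_lanes is an assumption: it depends on the NICs, HBAs and other
# cards a given server actually carries.

LANES_PER_CPU = 40        # Intel E5-2600 v4 (Broadwell)
LANES_PER_NVME_DRIVE = 4  # each 9100 uses a PCIe Gen3 x4 link


def max_direct_attached_drives(cpus: int, reserved_lanes: int) -> int:
    """Drives that can each get a dedicated x4 link without a PCIe switch."""
    storage_lanes = cpus * LANES_PER_CPU - reserved_lanes
    return max(storage_lanes, 0) // LANES_PER_NVME_DRIVE


print(max_direct_attached_drives(cpus=2, reserved_lanes=40))  # 10, the upper bound above
print(max_direct_attached_drives(cpus=2, reserved_lanes=64))  # 4, hypothetical heavier peripheral load
```

A PCIe switch gets around the `// LANES_PER_NVME_DRIVE` limit by letting several drives share upstream lanes, which is exactly why each drive behind one sees less bandwidth.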
So we are limited, essentially, to four really fast drives per server. What can we do with these? Here are some use cases:
- Video transcoding. Think about all of the videos on the internet. Now that you have explored the “cats of YouTube,” think about how many people watch video on different devices. Today, players new and old in the video and content space are providing various formats for various devices. For example, watching a cat video on your tiny smartphone screen over a cellular network is very different from selecting that same incredible piece of cinematography from your cable company’s on-demand service to watch on your 65” 4K LCD TV. Creating all of those formats is very resource intensive: it requires lots of storage bandwidth, and I/O latency dramatically slows the process down. This is a perfect use case for NVMe drives.
- Real-time data capture. This comes up frequently in research and other fields; think about pharma or oil & gas exploration, where tools (like microscopes) or sensors are rapidly collecting data. Storing that information quickly while always being ready for the next data to arrive is important, as is very low-latency retrieval of reasonably large (1TB+) datasets for analysis. Lower-latency storage here can mean better utilization of extremely expensive equipment and dramatically quicker results from data processing.
- Caching or filesystem journaling. Many larger distributed systems that rely on the network use a local cache to provide rapid access to data and dramatically improve system performance. For example, Ceph systems use what is known as a journal (remember this, you will see it again): a high-speed storage location that absorbs writes so they can be acknowledged to the client before being de-staged to slower storage media, potentially across a network hop. VMware VSAN does this as well; it doesn’t guarantee that writes avoid a network hop, but it still benefits greatly from the high bandwidth and low latency of NVMe drives in the write-buffer tier. Other applications include Datagres for server-side data locality, the Microsoft Storage Spaces caching tier, and dm-cache on Linux, which lets NVMe act as a cache in front of any block storage device, local or remote over the SAN.
- Database journals. I said you would see journaling again, and here it is. In the (so-called legacy) database world, database servers absorb writes into a journal, and those writes are committed to the full database after the client has been acknowledged. Best practice is to use the fastest, lowest-latency storage available for journal volumes. While many databases continue to use SAN storage, local, single-instance databases have made a huge comeback; Microsoft SQL Server, for example, has focused on single-server databases for quite some time. This doesn’t apply to clustered setups (like Oracle RAC), but when databases are protected by log shipping between a primary and a standby host, putting journals on NVMe can have a massive positive impact on performance. Even in mirrored databases, where network latency is generally far greater than storage latency, reducing the time to commit to the log on the mirror server can significantly improve overall system performance.
- Databases. Didn’t we just talk about databases? We did, but that was about database journals: using NVMe as the journal volume in front of databases living on other storage. While there are many large databases out there, many others are of manageable size, comfortably below the maximum capacity of the 9100 (3.2TB today). In these cases, again where HA is handled via log shipping between servers, moving an entire database (likely without a separate log volume) to NVMe can provide massive performance benefits.
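The caching and journaling use cases above share one pattern: acknowledge a write as soon as it lands on fast storage, then move it to slower storage afterwards. The sketch below is a toy, in-memory model of that pattern, assuming a hypothetical `JournaledStore` class and made-up latency figures. It is not how Ceph, VSAN or any database actually implements its journal; real systems also flush the journal to stable media, replay it after a crash and bound its size.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JournaledStore:
    """Toy model of a write-back journal (hypothetical class, not any product's API).

    Writes are acknowledged as soon as they are appended to a fast journal and
    are copied ("de-staged") to slower backing storage later. Latencies are
    illustrative inputs, not measurements.
    """
    journal_latency_us: float                    # cost of one journal append
    backing_latency_us: float                    # cost of one backing-store write
    journal: list = field(default_factory=list)  # append-only log on the fast device
    backing: dict = field(default_factory=dict)  # slower capacity tier

    def write(self, key: str, value: bytes) -> float:
        """Append to the journal and return the latency the client waits for."""
        self.journal.append((key, value))
        return self.journal_latency_us

    def read(self, key: str) -> Optional[bytes]:
        """Newest journaled value wins; otherwise fall back to the backing store."""
        for k, v in reversed(self.journal):
            if k == key:
                return v
        return self.backing.get(key)

    def destage(self) -> float:
        """Replay the journal into the backing store in write order and
        return the background time spent doing it."""
        spent = 0.0
        for k, v in self.journal:
            self.backing[k] = v
            spent += self.backing_latency_us
        self.journal.clear()
        return spent


store = JournaledStore(journal_latency_us=30.0, backing_latency_us=500.0)
acked = sum(store.write(f"row{i}", b"data") for i in range(3))
print(acked)            # 90.0 (us of client-visible write latency)
print(store.destage())  # 1500.0 (us, paid later in the background)
```

The client waits only for the journal append, so the journal device’s latency sets the acknowledgment time. That is why the fastest, lowest-latency device belongs in that tier.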
There are many more use cases for NVMe, but these should get you started. Keep looking for opportunities where bringing data closer to the CPU and dramatically reducing the time it takes to read from and write to storage can produce huge application-level performance gains. Remember that we live in interesting times: 4TB of storage and ¾ of a million IOPS used to take multiple cabinets and was hindered by latency at every step. Today we can do all of that in a single device, with latencies on the order of 30µs.
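Those two closing numbers are linked by Little’s law: the number of I/Os in flight equals throughput multiplied by the time each I/O takes. A minimal sketch, assuming the ~30µs latency quoted above held even at full load (in practice latency rises with queue depth, so the result is a floor): a single thread issuing one I/O at a time tops out around 33,000 IOPS, while ¾ of a million IOPS needs at least 23 I/Os outstanding. The function name is illustrative.

```python
import math


def min_outstanding_ios(target_iops: float, latency_s: float) -> int:
    """Little's law: I/Os in flight = throughput x time per I/O.
    Latency only grows under load, so this is a lower bound."""
    return math.ceil(target_iops * latency_s)


latency = 30e-6  # ~30 µs, assumed constant regardless of load

print(round(1 / latency))                     # 33333 IOPS for one synchronous thread (QD1)
print(min_outstanding_ios(750_000, latency))  # 23 I/Os that must be in flight for 750K IOPS
```

In other words, applications only see device-level IOPS figures when they keep many requests in flight, which is what NVMe’s deep queues are built for.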