As all of us know, Big Data is driven by the V’s: Volume, Variety and Velocity. With the advent of persistent memory, leading users are borrowing a page from high-performance computing (HPC) systems to handle Velocity: burst buffers are back!
One of the major challenges within both Big Data and HPC systems is persisting data quickly. These systems spend a significant share of their time writing to disks or SSDs to make sure that data is retained.
Classically, HPC systems had large, parallel storage systems. While they could handle many reads or writes in parallel, any single write took a while to complete. Big Data systems achieve their speedup in much the same way, by dividing blocks of data across many disks, and they share the same drawback: writing any one piece of data takes a while.
A burst buffer is simply a low-latency, persistent storage layer inserted between the compute nodes and the large, parallel back-end storage system. How it works is simple: data is first persisted in the burst buffer, which acts as a staging area, and is then moved, transparently to the application, to the slower back-end storage subsystem.
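The staging behavior described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the names `BurstBuffer` and `make_slow_backend` are hypothetical, the "fast tier" is an in-memory queue rather than real persistent media, and the slow back end is simulated with a sleep. It shows the control flow (accept the write quickly, drain in the background), not the durability a real burst buffer provides.

```python
import queue
import threading
import time


class BurstBuffer:
    """Toy burst buffer: a fast staging tier in front of a slow back end.

    Assumption: the staging tier here is an in-memory queue. A real burst
    buffer would stage data on persistent media such as NVDIMMs or SSDs.
    """

    def __init__(self, backend_write):
        self._staged = queue.Queue()
        self._backend_write = backend_write
        # A background thread moves staged data to the back end,
        # transparently to the caller of write().
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, key, data):
        # Returns as soon as the data is staged, without waiting
        # for the slow back end.
        self._staged.put((key, data))

    def flush(self):
        # Block until everything staged so far has reached the back end.
        self._staged.join()

    def _drain(self):
        while True:
            key, data = self._staged.get()
            try:
                self._backend_write(key, data)
            finally:
                self._staged.task_done()


def make_slow_backend(store, delay_s=0.01):
    """Hypothetical stand-in for a parallel file system: each write is slow."""
    def backend_write(key, data):
        time.sleep(delay_s)
        store[key] = data
    return backend_write


if __name__ == "__main__":
    backend = {}
    bb = BurstBuffer(make_slow_backend(backend))
    start = time.perf_counter()
    for i in range(20):
        bb.write(f"block-{i}", b"x" * 1024)
    staged = time.perf_counter() - start
    bb.flush()
    drained = time.perf_counter() - start
    print(f"staged in {staged:.4f}s, fully drained in {drained:.4f}s")
```

Run as a script, the writes return almost immediately while the drain to the simulated back end takes noticeably longer, which is the gap a burst buffer is meant to hide from the application.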
The first burst buffers were built with SSDs. Now, with the announcement of NVDIMM-N, system designers are excited to give it a try, especially since it is a standardized part supported by major OEMs. One question that comes up is capacity. For this application, persisting data arriving at high velocity, the existing 8 and 16 GB sizes are all that is required; the burst buffer is just a staging area.
Sophisticated users might have heard of NVDIMMs before. Micron’s NVDIMMs are based on the -N standard, meaning they write to DRAM chips. Only in the event of a power failure does the controller copy that data into NAND chips (the kind used in SSDs).
Micron’s approach gives two benefits. First, performance is a hair away from DRAM speed, consistently. Second, because writes go to DRAM, reads and writes take the same time, a 1:1 ratio. Other approaches write to NAND chips, which have a write-to-read time ratio closer to 20:1. This matters in our application, where we want to minimize the time to write data.
NVDIMMs can be used either as storage or as memory. System designers can use them as block storage devices with no application-level changes. To gain the extra speedup from using them as a memory device, sophisticated users will need to modify their applications to use the pmem library.
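To give a feel for what "memory mode" means for application code, here is a minimal sketch that uses Python's standard `mmap` module as a stand-in. It is an assumption-laden illustration, not the pmem library's API: the helper names `store_record` and `load_record` are hypothetical, the file lives in a temporary directory rather than on an NVDIMM-backed file system, and `mmap.flush()` plays the role that a persistent-memory flush would play in a real application.

```python
import mmap
import os
import tempfile

REGION_SIZE = 4096  # assumed size of the mapped region, for illustration only


def store_record(path, offset, payload, region_size=REGION_SIZE):
    """Write payload with ordinary memory stores into a mapped file.

    Returns the offset just past the written bytes.
    """
    if offset < 0 or offset + len(payload) > region_size:
        raise ValueError("record does not fit in the mapped region")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Size the backing file so the whole region can be mapped.
        os.ftruncate(fd, region_size)
        with mmap.mmap(fd, region_size) as region:
            # The write is a slice assignment into memory, not a write() call.
            region[offset:offset + len(payload)] = payload
            # Ask the OS to push the modified pages to the backing file.
            region.flush()
    finally:
        os.close(fd)
    return offset + len(payload)


def load_record(path, offset, length):
    """Read bytes back through the regular file interface."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "region.bin")
    end = store_record(path, 0, b"sensor-reading-1")
    print(load_record(path, 0, end))
```

The point of the sketch is the programming model: in block mode the application keeps calling the usual file-write path, while in memory mode it stores into a mapped region and then explicitly flushes, which is why application changes are needed.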
I like to say that the proof is in the pudding. With that in mind, I would like to direct interested users to a video by Microsoft. They have recently enabled NVDIMMs in Windows for SQL Server 2016. This video shows writing to NVDIMMs in both block and memory mode.