Deep learning with Four (bits per cell)
When we designed our Quad Level Cell (QLC) SSD – the Micron 5210 ION SSD – we had a good idea we were breaking new ground. I’ve worked with SSDs for over a decade and have seen their adoption grow and usage change. A lot. And the Micron 5210 SSD isn’t ‘just the next SSD’ from Micron. It’s different. It opens new applications and workloads to the benefits of SSDs.
We’re seeing workloads change (more rapidly than ever). Some of the most important workloads - the ones that really push the limits of human design and machine ability - have changed radically. These promising workloads read data quickly (they don’t rewrite it repeatedly).
Emerging workloads and their adoption have driven changes in SSDs, and those changes have driven broader adoption into more workloads and applications. A virtuous cycle.
We designed our Micron 5210 SSD to meet the growing demand for cost-effective storage focused on read performance. The Micron 5210 SSD is the industry’s first SSD built on quad level cell (QLC) NAND technology, delivering fast capacity for less. It’s optimized for read-intensive and performance-sensitive workloads like deep learning (DL).
Until now, in the name of cost efficiency, these read-intensive workloads have often been shackled to the slow performance of hard drive technology developed 50 years ago. The Micron 5210 SSD’s more approachable price point is changing that.
Designing a new drive and running benchmarks in the lab is one thing, but putting it through a real-world test is a completely different ballgame. We decided to reach out to the experts.
Engaging with Experts: AMAX® Puts the Micron 5210 SSD to the Deep Learning Test
We were thrilled to work with the AMAX Total Computing Solutions (AMAX) team on a deep learning project to investigate what the Micron 5210 SSD can do. Headquartered in Fremont, California, AMAX is well-recognized in the deep learning community (see AMAX's website for more details). Questions we wanted to answer included:
- What are the exact requirements for deep learning-specific storage systems, and what are the shortcomings of existing solutions?
- What would a deep learning-optimized storage solution featuring the Micron 5210 SSD look like?
- How does a Micron 5210 SSD-based storage solution perform in a real-world deep learning environment against HDD- and NVMe-based systems?
AMAX engineers put the drives to the test - they plugged them into their deep learning storage solution, hooked it up to their internal GPU cluster, ran deep learning training jobs and benchmarks, and compared the results to local NVMe. The results were impressive. So impressive that AMAX engineers decided to equip their StorMax™ NFS solution with Micron SSDs. Below are the test results.
StorMax™ NFS Featuring the Micron 5210 SSD: Simplify and Accelerate Deep Learning
AMAX just published their findings in a new white paper, Deep Learning Performance and Cost Evaluation - Micron 5210 ION QLC SSDs vs 7200 HDDs, in which they focus on simplifying DL through centralized storage while accelerating the process (over typical legacy designs). In the paper, AMAX notes:
“Near-instant access to training data is critical for most deep learning (DL) workloads to ensure that training durations are not negatively affected by data transfer times…”
AMAX also focused on the price versus performance advantages of centralized Micron 5210 SSD-based storage, noting that:
“New QLC enterprise SSDs offer a compelling means to reduce costs, as QLC NAND stores 33% more bits per cell and delivers similar read-performance as traditional TLC-based SSDs. Because the Micron 5210 ION SATA SSD family is targeted as an HDD replacement option, this test compares the results of a 5210 ION deployment to a HDD deployment, while also putting the results in context with TLC-based NVMe all-flash configurations, which carry a significant cost premium.”
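The "33% more bits per cell" figure in the quote follows from simple arithmetic: QLC stores 4 bits per cell and TLC stores 3. The short sketch below works that out and also shows how many distinct charge levels each cell type has to distinguish. The function names are illustrative only and do not come from the AMAX paper.

```python
def charge_levels(bits_per_cell):
    """Number of distinct states a cell must hold to encode the given bits."""
    return 2 ** bits_per_cell


def density_gain(new_bits, old_bits):
    """Fractional increase in bits per cell when moving from old to new."""
    return (new_bits - old_bits) / old_bits


TLC_BITS = 3  # triple level cell
QLC_BITS = 4  # quad level cell

gain = density_gain(QLC_BITS, TLC_BITS)
print(f"QLC vs TLC bits per cell: +{gain:.0%}")
print(f"TLC levels: {charge_levels(TLC_BITS)}, QLC levels: {charge_levels(QLC_BITS)}")
```

One extra bit per cell raises density by a third, while the number of levels per cell doubles from 8 to 16.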
In their paper, AMAX compared three common DL configurations: local NVMe SSDs (installed inside the server), remote Micron 5210 SSDs in a centralized NAS, and HDDs in the same configuration. According to AMAX:
“Common practice is to use local NVMe SSDs as a data cache, while the storage back end is a traditional NAS solution using hard disk drives (HDDs). Data is streamed from the back end into the local NVMe for CPU-proximity processing. To update training data, only the back end is updated. Depending on the nature of the update, this process may be lengthy due to limited HDD ingest rate”
AMAX notes that centralized (‘remote’) storage (NAS) can offer several benefits (lower cost, less complexity, etc.). Local comparisons have each storage device installed server-local; the remote storage configurations in their paper place the drives in a centralized NAS that the GPU servers reach over the network.
Accelerate Deep Learning - Local Storage Results
AMAX first measured DL training time (in seconds/epoch) with each drive installed locally. They found that the training time for the NVMe and SATA SSDs was about equal. Using HDDs, they found that the same training took about 11x longer.
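For readers who want to run a similar comparison on their own hardware, here is a minimal sketch of how seconds/epoch and a slowdown ratio could be measured. It is not AMAX's test harness: `run_epoch` is a hypothetical callable standing in for one pass over your training data, and the numbers in the usage line are placeholders chosen only to illustrate an 11x ratio.

```python
import time


def seconds_per_epoch(run_epoch, epochs=3):
    """Mean wall-clock seconds per epoch for a user-supplied callable."""
    durations = []
    for _ in range(epochs):
        start = time.perf_counter()
        run_epoch()  # hypothetical: one full pass over the training data
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)


def slowdown(baseline_s, candidate_s):
    """How many times longer the candidate takes than the baseline."""
    return candidate_s / baseline_s


# Placeholder timings, not measured results.
print(slowdown(baseline_s=10.0, candidate_s=110.0))
```

Running `seconds_per_epoch` once per storage device, with the same model and dataset, gives the inputs for `slowdown`.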
Simplify Deep Learning – Remote Storage Results
In their results analysis, AMAX noted:
“In the studied configuration, the SSD based remote storage array exceeds the performance of a local SATA SSD and is on par with a local NVMe solution. Test results indicate commonly used 1GbE, 10GbE and even 25GbE storage network may not be sufficient to fully support the performance of the 5210 ION-based storage solution. Given the test results, an integration of the storage solution into an EDR Infiniband or 100GbE compute fabric or attachment via separate EDR Infiniband/100GbE storage fabric is recommended.”
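To see why slower networks can hold back an all-flash array, it helps to convert link speeds to bytes per second and compare them with the array's aggregate read rate. The sketch below does that with raw line rates and ignores protocol overhead. The drive count and per-drive read rate are hypothetical values chosen for illustration; they are not taken from the AMAX configuration or from Micron specifications.

```python
# Nominal Ethernet line rates in gigabits per second.
LINK_GBIT = {"1GbE": 1, "10GbE": 10, "25GbE": 25, "100GbE": 100}


def link_gbytes_per_s(gbit_per_s):
    """Convert a line rate in Gb/s to GB/s (8 bits per byte, no overhead)."""
    return gbit_per_s / 8


def bottlenecked_links(array_read_gbytes_per_s):
    """Links whose raw line rate is below the array's aggregate read rate."""
    return [
        name
        for name, gbit in LINK_GBIT.items()
        if link_gbytes_per_s(gbit) < array_read_gbytes_per_s
    ]


# Hypothetical array: 8 SATA SSDs at roughly 0.5 GB/s sequential read each.
HYPOTHETICAL_DRIVES = 8
HYPOTHETICAL_READ_GBYTES_PER_DRIVE = 0.5
array_rate = HYPOTHETICAL_DRIVES * HYPOTHETICAL_READ_GBYTES_PER_DRIVE

print(f"Array read rate: {array_rate} GB/s")
print("Links that would cap it:", bottlenecked_links(array_rate))
```

Under these assumed numbers, only the 100GbE link has headroom above the array's read rate. EDR InfiniBand has the same nominal 100 Gb/s rate.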
AMAX Findings on Simpler, Remote Storage - Micron 5210 SSD versus HDDs
Finally, AMAX found four major advantages the Micron 5210 SSD brings to their remote storage configuration, noting that:
“After comparing a 64TB all-flash NAS array built on new QLC SSDs (Micron 5210 ION) to that of a 64TB 7200 RPM HDD-based solution, we find:
- The Micron 5210 ION QLC-based storage array is well suited for DL workloads…the Micron 5210 use of QLC NAND lowers the cost per GB, making it a more attractive option for this use.
- The QLC SSD NAS solution performs significantly better during normal operation and in single-bit parity RAID mode during the rebuild of a degraded volume…the QLC SSD-based NAS array can reduce training times, increase GPU utilization, increase development efficiency and streamline the development processes through fast centralized shared storage.
- We recommend integrating SSD NAS storage solutions into a 100G high-speed compute fabric.
- The performance of the evaluated 64TB QLC SSD storage solution is on par with local NVMe storage.”
How Can the Micron 5210 ION SSD Help You?
We’re thrilled to see the amazing benefits AMAX found! Speed, economy, and simpler deployments all show the long-term value of the Micron 5210 SSD. Capacity, affordability, and AMAX’s measured results are a winning combination in deep learning! How can the Micron 5210 SSD help with your deep learning projects?
If you’d like to learn more:
Chat with me about deep learning on Twitter @GreyHairStorage.