
Inference = IOPS: Staying ahead of the curve with the Micron 9550 high performance SSD

Ryan Meredith | June 2025

Inference will become the most common data center workload, full stop. With the NVIDIA H100 becoming ubiquitous in the data center and the NVIDIA DGX B200 launching into non-NVL72 systems, we’re seeing an explosion of compute capability.

If you look at the bandwidth scaling of PCIe® generations versus the increase in compute, PCIe bandwidth has increased eight times from Gen3 to Gen6, while GPU FLOPS have increased 37.5 times in the same time frame.
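As a rough sanity check on that gap: the PCIe factor follows from per-generation bandwidth doubling, while the GPU figure is the data point quoted above. A minimal Python sketch of the arithmetic (the FLOPS ratio is taken from the text, not derived here):

```python
# Back-of-envelope check of the compute-vs-interconnect gap described above.
# PCIe bandwidth roughly doubles each generation, so Gen3 -> Gen6 spans
# three doublings. The 37.5x GPU FLOPS figure is the article's number.

pcie_doublings = 3                       # Gen3 -> Gen4 -> Gen5 -> Gen6
pcie_scaling = 2 ** pcie_doublings       # 8x bandwidth growth
gpu_flops_scaling = 37.5                 # from the text, same time frame

# How much faster compute grew than the bus feeding it:
gap = gpu_flops_scaling / pcie_scaling
print(f"PCIe bandwidth: {pcie_scaling}x, GPU FLOPS: {gpu_flops_scaling}x")
print(f"Compute outgrew PCIe by ~{gap:.1f}x")   # ~4.7x
```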

We’ve also seen the average FLOPS of training clusters increase by 905 times over the past four years, with the number of data points in training datasets increasing 2,500 times in the same time frame.

[Figure: GPU compute growth versus PCIe bandwidth scaling]

While inference has been and will continue to be a compute-intensive workload, its reliance on fast storage is emerging quickly. Reasoning models will drive a massive increase in LLM usefulness, accuracy and resource requirements. Longer sequence lengths are driving innovation in the design of LLM systems, where it is becoming more efficient to store the KV cache on disk than to flush and recompute it. This will push higher performance requirements into the GPU-local storage of the systems enterprises use for inference.
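To see why KV cache offload puts pressure on storage, consider a rough per-token sizing estimate. The model dimensions below are illustrative assumptions (a 70B-class, grouped-query-attention layout), not figures from this article:

```python
# Rough KV cache sizing for a long-context inference request.
# All model dimensions below are illustrative assumptions, not measured values.

num_layers   = 80      # decoder layers (assumed, 70B-class model)
num_kv_heads = 8       # grouped-query attention KV heads (assumed)
head_dim     = 128     # per-head dimension (assumed)
dtype_bytes  = 2       # fp16/bf16 cache

# Keys and values are both cached, hence the factor of 2.
bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

seq_len = 128 * 1024   # a long reasoning-style context
kv_cache_gib = bytes_per_token * seq_len / 2**30

print(f"KV cache: {bytes_per_token / 1024:.0f} KiB/token, "
      f"~{kv_cache_gib:.0f} GiB at {seq_len} tokens")
# A handful of concurrent requests at this size exceeds GPU memory,
# which is why spilling the cache to a fast local SSD can beat recomputing it.
```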

We’ve had our eye on this trend for some time and have developed an extremely high-performance SSD in the Micron 9550. Its high IOPS and power efficiency complement these emerging workloads.

As an example, we tested the Micron 9550 versus a leading competitor with Microsoft DeepSpeed ZeRO-Inference and found that reads are 15% faster with 27% lower average power, resulting in 37% less SSD energy used and 19% lower total system energy used.
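For context on what that workload looks like, ZeRO-Inference can offload model weights to NVMe and stream them to the GPU through DeepSpeed's async I/O path. The sketch below is a representative, assumed configuration; the paths and tuning values are placeholders, not the settings used in this test:

```python
# Representative (assumed) ZeRO-Inference setup with NVMe parameter offload.
# Paths and tuning values are placeholders, not the test configuration.
import deepspeed
import torch

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "nvme",
            "nvme_path": "/local_nvme",   # mount point of the SSD under test
            "pin_memory": True,
        },
    },
    "aio": {                              # DeepSpeed async I/O tuning knobs
        "block_size": 1048576,
        "queue_depth": 32,
        "thread_count": 8,
        "single_submit": False,
        "overlap_events": True,
    },
    "fp16": {"enabled": True},
}

def run_zero_inference(model: torch.nn.Module):
    # ZeRO-3 partitions the parameters and pages them in from NVMe as layers
    # execute, turning inference into a storage-read-heavy workload.
    engine, *_ = deepspeed.initialize(model=model, config=ds_config)
    engine.eval()
    return engine
```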

While they are a small component of inference workloads, writes show a stark difference between SSDs. The Micron 9550 is 78% faster while using 22% less average power. That means the Micron 9550 uses half of the energy to complete the inference job and the total system ends up using 43% less energy.
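The energy deltas follow directly from the speed and power deltas, since energy is average power multiplied by elapsed time. A quick check of both the read and write claims:

```python
# Energy = average power x elapsed time, so the energy ratio is the
# power ratio divided by the speedup. Inputs are the article's figures.

def energy_savings(speedup_pct, power_savings_pct):
    time_ratio = 1 / (1 + speedup_pct / 100)    # faster -> shorter runtime
    power_ratio = 1 - power_savings_pct / 100
    return 1 - time_ratio * power_ratio         # fraction of SSD energy saved

print(f"Reads:  ~{energy_savings(15, 27):.0%} less SSD energy")  # ~37%
print(f"Writes: ~{energy_savings(78, 22):.0%} less SSD energy")  # ~56%, about half
```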

With the explosion of compute and amazingly useful innovations in inference, storage will need to keep up. The development of data center SSDs takes a long time; NAND manufacturing, ASIC design, power, thermals, etc., are critical to the end performance of the storage in an AI system. Micron has been testing AI workloads for years now as part of our development of the Micron 9550 and the rest of our current generation data center SSDs. We know that making the right drives for the AI workloads of tomorrow requires us to be ahead of the curve today. 

[Figure: DeepSpeed ZeRO AIO read and write results]
Test details:

DeepSpeed ZeRO AIO reads — simulates a synthetic workload from within the DeepSpeed libraries, driven from the GPUs.
Test system: 2x Intel Xeon Platinum 8568Y+, 768GB DDR5 DRAM, 2x NVIDIA L40S GPUs
Competitor is a PCIe Gen5 high performance data center SSD, similar in spec and target use case to the Micron 9550.
Data was generated from 850 test runs taking 446 hours.
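For readers who want to run something similar, DeepSpeed exposes its async I/O engine directly. The sketch below is an assumed minimal read path based on DeepSpeed's async_io (DeepNVMe) op; the handle parameters and file path are placeholders, and it is not the actual test harness used for these results:

```python
# Minimal (assumed) DeepSpeed async I/O read sketch - not the actual harness.
import torch
from deepspeed.ops.op_builder import AsyncIOBuilder

aio_ops = AsyncIOBuilder().load()

# block_size, queue_depth, single_submit, overlap_events, thread_count
handle = aio_ops.aio_handle(1048576, 32, False, True, 8)

# Pinned CPU buffer sized to the file being read from the SSD under test.
num_bytes = 1 << 30                   # 1 GiB placeholder transfer size
buffer = torch.empty(num_bytes, dtype=torch.uint8).pin_memory()

handle.async_pread(buffer, "/local_nvme/zero_inference_shard.bin")
handle.wait()                         # block until the read completes
```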

Ryan Meredith

Director, Storage Solutions Architecture

Ryan Meredith is director of Data Center Workload Engineering for Micron's Core Data Center Business Unit, testing new technologies to help build Micron's thought leadership and awareness in fields like AI and NVMe-oF/TCP, along with all-flash software-defined storage technologies.