It is an exciting time to be working in storage. We are on the cusp of a disruptive change in the IT industry. It revolves around how artificial intelligence (AI) will change how we architect and build servers, and what we expect computers to do for us. There is tremendous buzz in the industry and among the public around generative AI. The emergence of ChatGPT™ earlier this year captured imaginations around how a computer could understand our natural language questions, carry on a conversation with us on any subject, and write poems and rhymes as if human. The same is true of the various image-generation AI models that can create stunning visual masterpieces from simple text prompts given by the user.
The rapid emergence of AI is creating considerable demand for high-bandwidth memory (HBM). HBM solutions have now become more desired than gold. Large language models (LLMs) are driving demand for a larger memory footprint on the CPU to support even bigger, more complex models. While the importance of more memory bandwidth and capacity is well understood, the role of storage in supporting the growth of AI is often forgotten.
What is the role or importance of storage in AI workloads?
Storage will play a vital role in two areas. One is the local, high-speed storage that acts as a cache for feeding training data into the HBM on the GPU. Because of the performance this requires, a high-performance SSD is used. The other key role of storage is to hold all the training datasets in large data lakes.
Local cache drive
LLMs are trained on human-generated information found on the web, in books and in related dictionaries. The I/O pattern to the training data on the local cache drive is structured and consists mainly of large-block reads that prefetch the next batch of data into memory. Hence, for traditional LLMs, the SSD's performance is not normally a bottleneck to GPU processing. Other AI/ML models, such as computer vision (CV) or mixed-mode LLM+CV, require higher bandwidth and challenge the local cache drive.
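The large-block prefetch pattern described above can be illustrated with a minimal sketch. It is not how any particular training framework implements its data loader. The 1 MiB block size, the queue depth of 2 and the names `prefetching_reader` and `demo` are all assumptions chosen for illustration.

```python
import os
import queue
import tempfile
import threading

BLOCK_SIZE = 1 << 20  # 1 MiB per read; an illustrative value, not from the article


def prefetching_reader(path, block_size=BLOCK_SIZE, depth=2):
    """Yield blocks of `path` in order while a background thread reads ahead."""
    blocks = queue.Queue(maxsize=depth)  # bounded: at most `depth` blocks wait here

    def producer():
        # Sequential, large-block reads: the pattern the text describes.
        with open(path, "rb") as f:
            while True:
                block = f.read(block_size)
                blocks.put(block)  # an empty bytes object marks end of file
                if not block:
                    break

    threading.Thread(target=producer, daemon=True).start()
    while True:
        block = blocks.get()  # the next block is usually already in memory
        if not block:
            return
        yield block


def demo():
    """Write a small hypothetical dataset file and stream it back block by block."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(3 * BLOCK_SIZE + 123))
        path = tmp.name
    try:
        return [len(block) for block in prefetching_reader(path)]
    finally:
        os.remove(path)
```

Because the queue is bounded, the reader stays only a couple of blocks ahead of the consumer. That is enough to overlap storage reads with computation without holding the whole dataset in memory.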
Graph neural networks (GNNs) are often used for product recommendation/deep learning recommendation models (DLRM), fraud detection and network intrusion detection. The DLRM is sometimes referred to as the largest revenue-generating algorithm on the internet. GNN training tends to access data more randomly and in smaller block sizes. That access pattern can truly challenge the performance of the local cache SSD and can leave expensive GPUs idle. New SSD features are required to ease this performance bottleneck. Micron is actively working on solutions with industry leaders and is presenting some of this work at SC23 in Denver, where we will demonstrate ways for the GPU and SSD to interact to speed up some I/O-intensive processing times by up to 100x.
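A rough piece of arithmetic shows why small random reads are so much harder on the drive than large sequential ones: for a fixed data rate, the number of I/O requests per second grows as the block size shrinks. The sketch below assumes a hypothetical target of 4 GiB/s and compares 1 MiB blocks with 4 KiB blocks. These figures are illustrative assumptions, not measurements from the work mentioned above.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024


def requests_per_second(target_bytes_per_s, block_bytes):
    """I/O requests per second needed to sustain a data rate at a given block size."""
    return target_bytes_per_s / block_bytes


def demo():
    target = 4 * GIB  # hypothetical data rate needed to keep a GPU busy
    return {
        "1 MiB blocks": requests_per_second(target, 1 * MIB),
        "4 KiB blocks": requests_per_second(target, 4 * KIB),
    }
```

Under these assumptions, the same data rate needs 256 times as many requests per second at 4 KiB as at 1 MiB, so the drive's random-read IOPS, rather than its sequential bandwidth, becomes the limit.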
AI data lakes
For large data lakes, large-capacity SSDs will become the storage media of choice. HDDs get cheaper ($/TB) as their capacity grows, but they also get slower relative to that capacity (MB/s per TB). HDD capacities larger than 20TB will truly challenge the ability of large data lakes to deliver, power-efficiently, the kind of bandwidth (TB/s) needed for large AI/ML GPU clusters. SSDs, on the other hand, have plenty of performance and, in purpose-built forms, can deliver the required capacities at lower power (8x lower W/TB) and even lower electrical energy (10x lower kWh/TB) than HDDs. Those savings leave more power in the data center to add more GPUs. Today, Micron is deploying its 32TB high-capacity data center SSD into numerous AI data lakes and object stores. Capacities for 15-watt SSDs that can individually deliver several GB/s of bandwidth will scale up to 250TB in the future.
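The "slower per terabyte" point can be made concrete with a short calculation. The sketch below holds a drive's throughput constant while its capacity grows, so bandwidth per terabyte falls. The 250 MB/s HDD throughput and the 5,000 MB/s SSD throughput are hypothetical round numbers chosen for illustration, not vendor specifications. Only the 20TB and 32TB capacities come from the text.

```python
import math


def bandwidth_density(throughput_mb_s, capacity_tb):
    """Bandwidth available per terabyte stored, in MB/s per TB."""
    return throughput_mb_s / capacity_tb


def drives_needed(target_tb_s, throughput_mb_s):
    """Minimum number of drives whose combined throughput reaches the target."""
    target_mb_s = target_tb_s * 1_000_000  # 1 TB/s = 1,000,000 MB/s
    return math.ceil(target_mb_s / throughput_mb_s)


def demo():
    hdd_mb_s = 250    # hypothetical sustained HDD throughput
    ssd_mb_s = 5000   # hypothetical SSD throughput ("several GB/s")
    return {
        "hdd_10tb_density": bandwidth_density(hdd_mb_s, 10),
        "hdd_20tb_density": bandwidth_density(hdd_mb_s, 20),
        "ssd_32tb_density": bandwidth_density(ssd_mb_s, 32),
        "hdds_for_1_tb_s": drives_needed(1, hdd_mb_s),
        "ssds_for_1_tb_s": drives_needed(1, ssd_mb_s),
    }
```

With these assumed figures, doubling HDD capacity from 10TB to 20TB halves the bandwidth per terabyte, and reaching 1 TB/s takes 20 times as many HDDs as SSDs. Real ratios depend on the actual drives deployed.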
How will AI affect NAND flash storage demand?
First, all training of new AI/ML models requires data from which to “learn.” IDC estimated that starting in 2005, the amount of data generated every year exceeded the amount of storage purchased each year. That means that some data must become ephemeral. The user must decide on its value, and whether the value of keeping the data exceeds the cost of buying more storage to retain it.
Machines (cameras, sensors, IoT, jet engine diagnostics, packet routing information, swipes and clicks) now generate several orders of magnitude more data in a day than humans can. Humans did not previously have the time or capacity to analyze this machine-generated data, but AI/ML routines can now mine it for valuable information. The emergence of AI/ML should make this data more valuable to retain and hence grow the demand for storage.
This training data is stored in AI data lakes. These data lakes exhibit higher-than-normal access density, because they must feed a growing number of GPUs per cluster while simultaneously supporting a heavy mix of ingestion and preprocessing. There is also a lot of re-training on the data, so there is often little “cold” data. That workload characteristic is much better suited to large-capacity, power-efficient SSDs than to traditional HDD-based object stores. These data lakes can be quite large, hundreds of petabytes, for computer vision applications such as autonomous driving, or for DLRM. As these data lakes grow in capacity and number, they will create a large growth opportunity for NAND flash SSDs.
As AI models evolve and expand, NAND flash storage will become increasingly critical to maintain their exponential growth in performance.