As I've been exploring infrastructure for AI and Machine Learning systems, I've found a strange lack of performance data that discusses the underlying storage or memory in any way. The conversations are instead dominated by the different compute resources available (GPUs, CPUs, FPGAs, TPUs, etc.).
At first this was concerning to me as I primarily support our storage products by providing relevant solution engineering validation and information. As I began running my experiments, however, I discovered that the lack of performance data discussing the non-compute infrastructure wasn't due to a lack of necessity, but to it just being ignored as a problem.
Here’s some data to provide a bit of context.
This chart shows the performance of different AI training systems for a specific model and dataset. (Specifically, this is the ResNet-50 image classification model being trained on the ImageNet dataset, a dataset of 1.2 million images that's around 145 GB in size.)
Going back to 2014, we see that the storage in an AI training system needed to provide only about 50 MB/second of disk throughput to feed the 8x top-of-the-line (for the time) GPUs. And, as much as I'd like you to use SSDs for every workload, it'd be a stretch to say flash drives were required to support this use case. 50 MB/second is pretty trivial.
Move forward a generation from the K80 GPUs to the P100s, and we see a significant increase in storage requirements, from 50 MB/second to 150 MB/second. While that increase was large, it still wasn't cause for concern - 150 MB/second may not be trivial for HDD-based systems, but it doesn't present any real architectural challenges.
However, the latest generation of GPUs (along with further software optimizations) has pushed things into new territory. That same model - ResNet-50 - processing the same dataset requires nearly a gigabyte per second of storage throughput to keep the GPUs running at maximum efficiency. An HDD-based system has a hard time meeting those requirements.
So, now it kind of makes sense why we haven't been talking about storage performance when we talk about AI systems - it hasn't been necessary to do so until recently. Additionally, if the trend continues (and we have no reason to think that it won't), the future is going to rely on our ability to architect storage systems that can manage the requirements of the next-generation GPUs.
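To put those throughput figures in terms of training speed, here is a rough back-of-the-envelope sketch. It uses only the dataset numbers quoted above and assumes decimal units, that every image is read from disk once per pass, and that nothing is served from cache. Under those assumptions, 50 MB/second works out to roughly 414 images per second and 1 GB/second to roughly 8,276 images per second. The function names are mine and purely illustrative.

```python
# Back-of-the-envelope link between training speed and storage throughput.
# Dataset figures come from the text; decimal units (1 GB = 1000 MB) and
# "every image is read from disk, nothing is cached" are assumptions.
DATASET_IMAGES = 1_200_000
DATASET_GB = 145


def avg_image_mb() -> float:
    """Average on-disk size of one image, in MB."""
    return DATASET_GB * 1000 / DATASET_IMAGES


def images_per_second(disk_mb_per_s: float) -> float:
    """Images per second that a given disk throughput can feed."""
    return disk_mb_per_s / avg_image_mb()


def required_mb_per_s(images_per_s: float) -> float:
    """Disk throughput needed to sustain a given training rate."""
    return images_per_s * avg_image_mb()


if __name__ == "__main__":
    for mb_per_s in (50, 150, 1000):
        rate = images_per_second(mb_per_s)
        print(f"{mb_per_s:>5} MB/s feeds about {rate:,.0f} images/s")
```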
Alright, we can agree that storage performance is important - but how important is it? What is the actual impact of improperly architecting storage (and memory) for our AI systems?
To answer those questions, I ran some additional experiments to try to shine some light on the issue. The following data was collected using the same model and dataset as above - ResNet-50 trained against the ImageNet dataset. The hardware was a dual Intel® Xeon 8180M server with 8x Nvidia® V100 GPUs. Each GPU had 32GB of memory, the system had 3TB of memory, and my storage was 8x 3.2TB Micron 9200 NVMe™ solid-state drives in RAID10.
I tested the impact of two variables: memory amount and disk throughput. Each of these variables was adjusted by changing the appropriate Docker container parameters (mem_limit and device_read_bps).
For memory, the container either had all memory available (3TB) or a smaller amount (128GB), which resulted in only half of the dataset fitting in the filesystem cache once the system reached steady state.
For storage, the container either had unlimited access to the NVMe storage, or it was limited to 500 MB/s of throughput. This number was selected as it is roughly half of the peak throughput observed (1.2GB/second) and corresponds to the sorts of disks available to GPU instances from the various cloud providers.
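As a rough illustration of that setup, the sketch below builds the two limits as keyword arguments for the Docker SDK for Python, where `mem_limit` and `device_read_bps` are arguments to `containers.run()`. The device path `/dev/md0`, the helper names, and the assumption that all four memory/storage combinations were run are mine, not details from the experiments. GPU passthrough flags are left out for brevity.

```python
"""Sketch: capping a training container's memory and disk read rate."""


def container_limits(mem_limit=None, read_mb_per_s=None, device="/dev/md0"):
    """Build keyword arguments for docker-py's containers.run().

    `device` is a hypothetical path for the RAID volume.
    """
    kwargs = {}
    if mem_limit is not None:
        kwargs["mem_limit"] = mem_limit  # e.g. "128g"
    if read_mb_per_s is not None:
        # docker-py expects a list of {"Path": ..., "Rate": bytes per second}
        kwargs["device_read_bps"] = [
            {"Path": device, "Rate": read_mb_per_s * 1000 * 1000}
        ]
    return kwargs


# The memory/storage combinations described above (assumed to be all four).
CONFIGS = {
    "full memory, unlimited disk": container_limits(),
    "full memory, 500 MB/s disk": container_limits(read_mb_per_s=500),
    "128 GB memory, unlimited disk": container_limits(mem_limit="128g"),
    "128 GB memory, 500 MB/s disk": container_limits(
        mem_limit="128g", read_mb_per_s=500
    ),
}


def launch(image, command, limits):
    """Start a detached container with the given limits.

    Requires the `docker` package and a running Docker daemon, so it is
    imported lazily and never called at import time.
    """
    import docker

    client = docker.from_env()
    return client.containers.run(image, command, detach=True, **limits)
```

A call such as `launch("my-training-image", "python train.py", CONFIGS["128 GB memory, 500 MB/s disk"])` would start the most constrained case; the image name and command here are placeholders.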
The results shouldn't be surprising. If the storage in the AI system wasn’t able to keep up with the GPUs and there wasn’t enough memory to cache the dataset, then the system performance was seriously degraded. Thankfully, this is a problem we can solve. Though you'll get maximum efficiency out of your AI system with loads of memory and very fast disks, just having fast disks or just having dense memory will get you much of the way there.
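One way to see why only the doubly constrained case suffers badly is a first-order model of disk demand. This is my own simplification, not the method behind the measurements: it assumes images are read uniformly at random, so the cache hit rate equals the fraction of the dataset that fits in memory, and it ignores cache thrashing and other second-order effects. It says only whether storage becomes the bottleneck, not by how much performance drops.

```python
# First-order model: whatever the cache cannot serve must come from disk.
# Assumes a uniform random access pattern, so hit rate == cached fraction.
PEAK_DEMAND_MB_PER_S = 1200  # peak throughput observed, from the text


def disk_demand_mb_per_s(gpu_demand_mb_per_s, cached_fraction):
    """Throughput the disks must supply after cache hits are removed."""
    return gpu_demand_mb_per_s * (1.0 - cached_fraction)


def is_storage_bound(gpu_demand_mb_per_s, cached_fraction, disk_limit_mb_per_s):
    """True if the disks cannot keep up; a limit of None means unthrottled."""
    if disk_limit_mb_per_s is None:
        return False
    needed = disk_demand_mb_per_s(gpu_demand_mb_per_s, cached_fraction)
    return needed > disk_limit_mb_per_s


if __name__ == "__main__":
    cases = {
        "3TB memory, unlimited disk": (1.0, None),
        "3TB memory, 500 MB/s disk": (1.0, 500),
        "128GB memory, unlimited disk": (0.5, None),
        "128GB memory, 500 MB/s disk": (0.5, 500),
    }
    for name, (cached, limit) in cases.items():
        bound = is_storage_bound(PEAK_DEMAND_MB_PER_S, cached, limit)
        print(f"{name}: storage-bound = {bound}")
```

With half the dataset cached, the disks still need to supply about 600 MB/second at peak, which is more than the 500 MB/second cap allows.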
The last set of experiments I'll discuss here examined the impact of GPU memory on training performance. These tests were run on the same hardware as above (8x V100 GPUs), but I scaled the batch size (number of images sent to a GPU at one time for processing) as well as the algorithm 'complexity' (number of layers in the ResNet model).
Each line represents the training throughput in images per second for a specific model. Once a batch size becomes too large, there just isn't enough memory and the application will crash (shown in the chart as the point where each line ends).
There are a couple of things we can take away from this chart. The first and most obvious is that throughput increases with batch size. The chart describes training specifically, but the same behavior is seen for inference. Bigger batches increase the throughput.
The next takeaway is that the maximum batch size is dependent on the model complexity. As a model gets larger and more complex, the model's weights take up more of the GPU memory space, leaving less space for the data. Depending on your specific use case, it's possible to push the model complexity far enough that the model can't be trained at all, even limiting the model to a batch size of 1. This will be especially apparent when deploying models to Smart Edge or IoT devices that generally have much lower memory capacities than the GPUs I used here.
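The trade-off between model size and batch size can be sketched with a simple memory budget. This is an illustrative estimate, not how any framework actually allocates memory: it assumes 4-byte parameters, three copies of each parameter (weights, gradients, and one optimizer buffer), and a fixed activation cost per image. The parameter count of about 25.6 million for ResNet-50 is approximate, and the 100 MB activation figure is a placeholder I made up for the example. In practice both terms tend to grow as layers are added.

```python
# Rough GPU memory budget: fixed cost for the model, per-image cost for data.
def max_batch_size(
    gpu_mem_bytes,
    n_params,
    activation_bytes_per_image,
    bytes_per_param=4,   # assumes 32-bit floats
    copies_per_param=3,  # assumes weights + gradients + one optimizer buffer
):
    """Largest batch that fits, or 0 if even a batch of 1 does not fit."""
    fixed = n_params * bytes_per_param * copies_per_param
    free = gpu_mem_bytes - fixed
    if free < activation_bytes_per_image:
        return 0
    return free // activation_bytes_per_image


if __name__ == "__main__":
    gpu_mem = 32 * 10**9            # 32GB per GPU, from the text
    resnet50_params = 25_600_000    # approximate parameter count
    activations = 100 * 10**6       # hypothetical per-image activation cost
    print(max_batch_size(gpu_mem, resnet50_params, activations))
```

A return value of 0 corresponds to the case described above, where the model cannot be trained even with a batch size of 1.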
To sum it all up, when designing AI systems there are three main components you'll want to account for:
- Storage performance
- System memory density
- GPU/Accelerator memory density
Storage performance and system memory density are critical for getting the most performance out of your system, and GPU/accelerator memory is important for performance and enabling future model development.
For more details: Learn more about aligning memory and storage with specific AI and machine learning models. Watch the webinar on-demand: AI Matters – Getting to the Heart of Data Intelligence with Memory and Storage, featuring me, Chris Gardner from Forrester®, and Eric Booth from Micron’s Compute and Networking unit.