The history of high-performance computing (HPC) can be viewed as a battle between compute and networking – that is, moving data between the untold thousands of processors in a modern supercomputer and the storage system. The goal is to have the whole HPC system in balance so processors are never waiting for the data they crave or for other parts of the network to complete their tasks. There’s no point in making more powerful processors or faster SSDs if the network can’t keep up. This is especially true when it comes to cutting-edge GPUs, which are indispensable these days for training AI models and running complex simulations.
My guests for this interview, Barton Fiske from NVIDIA, who looks after HPC alliances, and Wes Vaske, principal storage solutions engineer at Micron, have made it their mission to restore balance to the HPC force.
They’ve been working together on a new NVIDIA technology called Magnum IO™ GPUDirect Storage, a clever way to overcome the data movement bottlenecks increasingly found in traditional computing architectures. Today, data moving between storage and the GPU has to take a detour through a bounce buffer in the CPU’s system memory. Magnum IO™ GPUDirect Storage cuts out the middleman and provides a direct data path between storage and GPU memory.
The result is considerably higher performance with just a few lines of code. Wes was lucky enough to get hold of a mighty NVIDIA DGX system stuffed with GPUs, which he connected to a chassis loaded with Micron 7300 NVMe SSDs and fast Ethernet switches. You’ll have to listen to the interview to find out just how fast this setup is, but suffice it to say, it’s fairly awesome.
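To give a sense of what those few lines of code look like, here is a minimal sketch of reading a file straight into GPU memory with the cuFile API, the programming interface that GPUDirect Storage exposes. The file path, transfer size, and the lack of error checking are illustrative assumptions for this sketch, not details from the interview.

// Minimal sketch (assumed setup): read a file from an NVMe SSD directly into
// GPU memory using the cuFile API behind Magnum IO GPUDirect Storage.
// Compile with nvcc and link -lcufile -lcudart. Error handling trimmed for brevity.
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include <cufile.h>

int main() {
    const size_t size = 1 << 20;                                  // 1 MiB transfer, illustrative
    int fd = open("/mnt/nvme/sample.dat", O_RDONLY | O_DIRECT);   // hypothetical file path

    cuFileDriverOpen();                          // bring up the GPUDirect Storage driver

    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);       // register the open file with cuFile

    void *gpu_buf = nullptr;
    cudaMalloc(&gpu_buf, size);                  // destination buffer lives in GPU memory
    cuFileBufRegister(gpu_buf, size, 0);         // pin the GPU buffer for DMA

    // The read lands in GPU memory directly; no bounce buffer in host RAM.
    cuFileRead(handle, gpu_buf, size, 0 /*file offset*/, 0 /*buffer offset*/);

    cuFileBufDeregister(gpu_buf);
    cuFileHandleDeregister(handle);
    cudaFree(gpu_buf);
    close(fd);
    cuFileDriverClose();
    return 0;
}

Swap that cuFileRead call for a conventional read() followed by a cudaMemcpy() and you get the bounce-buffer path through CPU memory that GPUDirect Storage is designed to avoid.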
For a deeper dive, sign up to watch this webinar, which explores where these complementary technologies will have the greatest impact on emerging AI workloads.
To get started with Magnum IO™ GPUDirect, visit developer.nvidia.com/gpudirect.
Learn more about Micron’s SSDs at micron.com/data-center-ssd.