
AI and machine learning need high-performance storage

Micron Technology | February 2019

A mainstay of academia, geoscience and government for many years, artificial intelligence and machine learning (AI/ML) are fast becoming mainstream as pragmatic enablers for commercial business. Understanding how AI/ML works, and the demands this new compute function places on infrastructure, will determine who maximizes the value of their investment. In this blog and others to follow, I want to highlight what we think are the key hurdles to overcome and how Micron can help as you start your journey into the AI/ML world as a commercial enterprise.

As a quick overview, the AI/ML workflow consists of four major components (see figure): ingest, transform, train, and production/execution. In this blog, I will address the ingest and transformation phases of deploying an AI/ML-based solution.

Figure: The AI/ML workflow (ingest, transform, train, execute)

The ingest and transformation steps are among the most important in determining how quickly your AI system will be able to provide value. In many AI solutions, these steps can represent up to 80 percent of the entire AI execution process, and they have recently become a major focus of data science. To better understand why, it is important to understand what each of these steps entails.

Figure: The data ingest and transform process

Ingest

Ingest is exactly what it sounds like. Data must be collected from various sources, many of which are incompatible with each other in format, and stored so that the transformation process(es) that follow can convert it into a form usable for training the system. The training process is what makes AI "smart" and useful in the real world, and AI processes, and the answers those processes provide, depend on massive amounts of data. The storage solution must be fast; otherwise, transforming the data can take an exorbitant amount of time before it is usable.

During the ingest process, data movement relative to the repository is 100 percent writes, and it is typically write-once. The ingested data can vary in size and is typically in unstructured object or file forms such as videos, images, documents or conversation transcripts, among others, and is often located in disparate data lakes and other data sources. It is this variability in data format that requires the transformation process that follows. The ingestion process relies on two major components: high-speed (high-bandwidth) network connections and large, fast data repositories ... and when I say large, I mean LARGE! While we need large capacity to collect this data, it is even more important for the storage solution to be fast.
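As a minimal sketch of that write-once pattern (the source paths, repository mount point and worker count below are hypothetical, not a Micron tool), an ingest job typically streams each source object once, sequentially, into a single shared repository, with many streams running in parallel to take advantage of high-bandwidth networks and fast storage:

```python
# Minimal ingest sketch: copy objects from disparate sources into one repository.
# Paths and worker count are illustrative; assumes the repository is a
# POSIX-mounted, large and fast shared storage target.
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import shutil

SOURCES = [Path("/mnt/camera_feeds"), Path("/mnt/call_transcripts")]  # hypothetical data lakes
REPOSITORY = Path("/mnt/ingest_repo")                                 # hypothetical ingest repository

def ingest_one(src: Path) -> Path:
    dest = REPOSITORY / src.parent.name / src.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # one sequential read of the source, one sequential write to the repository
    return dest

if __name__ == "__main__":
    files = [f for root in SOURCES for f in root.rglob("*") if f.is_file()]
    # Run many write-once streams in parallel; storage bandwidth is the limiter.
    with ThreadPoolExecutor(max_workers=16) as pool:
        for dest in pool.map(ingest_one, files):
            print("ingested", dest)
```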

Transform

The transformation process is the first of three iterative processes that make up the AI solution, and it is very likely the one with the most impact on AI development. Because the ingested data most likely comes in a wide variety of sizes and formats, it is important that the data be normalized into a single format that is easily consumable by the training process that follows. In most AI solutions, the resulting format is the one supported by the training and production engines selected; today, that typically means the open-source TensorFlow™ framework or another AI framework.

Transforming data into this standard format is an iterative process broken into three major steps: preparing the data for conversion, converting the data to the target format (e.g., the TensorFlow data format), and evaluating the resulting formatted data to identify unusable records. These steps are repeated on each set of data until all data is properly written in the desired target format. A minimal sketch of the convert-and-evaluate loop follows.
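This sketch assumes TensorFlow's TFRecord format as the target; `load_raw_records` stands in for whatever parser reads your ingested objects and is hypothetical. The details will differ with the framework and data set you choose.

```python
# Sketch of the transform loop: prepare, convert to TFRecord, evaluate.
# Assumes TensorFlow is installed; load_raw_records() is a hypothetical helper
# that yields dicts parsed from the ingested objects.
import tensorflow as tf

def to_example(record: dict) -> tf.train.Example:
    # Convert one prepared record into the framework's standard format.
    return tf.train.Example(features=tf.train.Features(feature={
        "text": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[record["text"].encode("utf-8")])),
        "label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[record["label"]])),
    }))

def transform(records, out_path="train.tfrecord"):
    kept, dropped = 0, 0
    with tf.io.TFRecordWriter(out_path) as writer:
        for record in records:
            # Evaluate: identify unusable records and skip them instead of writing.
            if not record.get("text") or record.get("label") is None:
                dropped += 1
                continue
            writer.write(to_example(record).SerializeToString())
            kept += 1
    return kept, dropped
```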

The speed of this transformation will depend on the quantity and quality of the memory installed in each compute node and on the speed of the storage solution. The storage access pattern during this phase is varied, unlike the ingest process before it, requiring both sequential and random access to the ingested data. The read-to-write ratio will depend on the target AI framework and its requirements for the standard training data format. For most transformation processes, the worst case is roughly 50% reads and 50% writes, though this depends heavily on the data set being transformed. For example, when a data object is converted whole, every object is read and then written in the target format. If you are analyzing conversational data, keeping only the text and discarding all of the metadata, your read percentage will be closer to 80%.
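As a rough back-of-the-envelope check on that last figure (the object sizes below are illustrative assumptions, not measured data):

```python
# Illustrative read/write ratio estimate for a text-extraction transform.
bytes_read_per_object = 1_000_000   # full conversational object, metadata included
bytes_written_per_object = 250_000  # extracted text only

read_fraction = bytes_read_per_object / (bytes_read_per_object + bytes_written_per_object)
print(f"reads are ~{read_fraction:.0%} of storage traffic")  # ~80%
```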

Analysis and Conclusions

So, why is Micron discussing AI solutions?

First, Micron is a premier manufacturer and provider of advanced memory and storage products, and our SSDs are a current standard for fast, responsive storage of massive amounts of data. Micron offers a variety of high-capacity, high-performance SSDs well suited to AI use, from our best price-performance SSD for enterprise read-intensive use cases, the Micron 5210 ION SSD, the first quad-level-cell (QLC) SSD on the market, to our highest-performance, highest-capacity commercially available SSD, the Micron 9200 ECO SSD (11TB). These are often used together in hot and warm storage tiers. We also offer storage-class memory solutions that add a layer of non-volatile storage roughly 10 times faster than current SSD solutions.

In our testing with Red Hat® Ceph, a common Linux-based object storage solution for large data lakes and oceans, we are seeing capacity that scales to petabytes of data along with some of the fastest Ceph performance we have measured: write throughput of 23 GB/s using four dual-socket 2RU storage nodes.1

Unlike HDDs, solid state drives can support massive bandwidth. We’ve seen how adding a small amount of flash to an existing Hadoop cluster boosts performance by as much as 36%.

Second, Micron's advanced DRAM solutions offer high-performance memory that lets you scale each compute server in your solution to help increase overall system performance during the transformation process. Our innovations in low-power, high-capacity memory for edge devices enable AI/ML to be deployed out in the field. For example, Micron's latest GDDR graphics DRAM pushes memory data rates to 16 Gb/s per pin.

Flash memory and storage let you move more data closer to the processing engines for faster analytics. GPUs, a key enabler for faster processing, can handle millions of operations in parallel, whereas CPUs largely process operations sequentially. Together, these Micron products provide the broad spectrum of high-performance components that are critical for the advanced AI/ML and deep learning solutions now being deployed commercially. As noted above, the ingest and transformation steps can make up 80% of the overall AI solution design and deployment process. The faster your solution can produce usable training data sets for your AI engine, the faster you will be able to deploy and benefit from this new technology to build smarter edge functionality.

1As stated in Micron Red Hat Ceph Reference Architecture published in November 2018. Your experience may vary.