Historically, one of the major challenges for data scientists has been providing CPUs with data fast enough to reduce idle times and fully utilize these expensive resources. CPU idle time is not only inefficient, it's detrimental to getting real-time, actionable results. Attaining the benefits of real-time analytics requires the faster storage incorporated in the Micron Accelerated Hortonworks Data Platforms.
How It Was Built and Tested
The test environment used one node running KVM to virtualize the servers hosting the NameNode, Secondary NameNode, Resource Manager, ZooKeeper, Hive and the Ambari server. Four servers acted as the DataNodes. The network switch was a 48-port 10GbE switch running Cumulus Linux 3.4.2.
The Hadoop cluster software consisted of a Hortonworks HDP 3.0 Hive database on HDFS/YARN deployed on two separate four-node clusters. The two clusters differed only in storage: one used a group of 15K SAS HDDs, while the other used the same HDD configuration plus a single Micron 9200MAX NVMe SSD in each node, with the YARN cache redirected to the NVMe SSD.
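As a rough illustration of the YARN cache redirection, the sketch below generates a yarn-site.xml property fragment. It assumes that "YARN cache" refers to the NodeManager local directories, which Hadoop configures through the `yarn.nodemanager.local-dirs` property. The mount path is hypothetical, since the exact settings used in the test are not given here, and on an HDP cluster the value would normally be changed through Ambari rather than by editing the file by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical mount point for the NVMe SSD; the actual path is not stated in the text.
NVME_LOCAL_DIR = "/mnt/nvme0/yarn/local"


def yarn_local_dirs_property(path: str) -> str:
    """Build a yarn-site.xml <property> element that points the
    NodeManager local directories at the given path."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "yarn.nodemanager.local-dirs"
    ET.SubElement(prop, "value").text = path
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(prop, encoding="unicode")


fragment = yarn_local_dirs_property(NVME_LOCAL_DIR)
print(fragment)
```

The property accepts a comma-separated list of directories, so the same helper could be called with several paths joined by commas if more than one device were used.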
To ensure that the benchmark measured storage I/O rather than reads served from memory, the database size-to-memory ratio was targeted at about 2-to-1 (2TB of data with an aggregate cluster memory of 822GB available after operating system overhead).
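The sizing arithmetic can be checked with a short sketch. It uses the two figures quoted above and assumes decimal units (1TB = 1000GB); the function and variable names are illustrative only.

```python
def data_to_memory_ratio(dataset_gb: float, memory_gb: float) -> float:
    """Return how many times larger the dataset is than available memory."""
    if memory_gb <= 0:
        raise ValueError("memory_gb must be positive")
    return dataset_gb / memory_gb


DATASET_GB = 2 * 1000      # 2TB of data, assuming 1TB = 1000GB
CLUSTER_MEMORY_GB = 822    # aggregate memory after operating system overhead

ratio = data_to_memory_ratio(DATASET_GB, CLUSTER_MEMORY_GB)
in_memory_fraction = CLUSTER_MEMORY_GB / DATASET_GB

print(f"data-to-memory ratio: {ratio:.2f}")
print(f"largest share of data that could fit in memory: {in_memory_fraction:.0%}")
```

With these inputs the ratio comes out somewhat above the 2-to-1 target, so well over half of the dataset cannot be held in memory at once and must be read from storage.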
Why Hortonworks Data Platform 3.0
Hortonworks Data Platform (HDP) is an open source framework for distributed storage and processing of large, multi-source data sets. When integrated with Micron SSDs, HDP 3.0 delivers substantially faster database query performance, shortening time to insight at lower cost than traditional Hadoop infrastructures.
Key Benchmarks and Benefits
- 1.7x average improvement in TPC-DS benchmark query completion times
- Elimination of CPU I/O wait times during TPC-DS benchmark queries
Micron IT Hadoop Case Study
The strong results in our testing led Micron IT to deploy the configuration in our real-world cluster used for manufacturing efficiency analytics, resulting in substantially higher performance at minimal additional cost. Read the blog here.