Get 36% Faster Hadoop – Without Adding Servers

Add One 9100PRO to HDD-Based Nodes, See 36% Faster Average Runtime

When IT needs better Hadoop performance from their HDD-based nodes, they typically have two options:

They can add more nodes to the existing cluster. This may help meet performance goals, but the incremental cost of the added nodes may be prohibitive.

Alternatively, they can replace the current cluster nodes with new ones. Rebuilding with higher performance nodes may meet performance goals, but the cost is higher still.

Now there is a third choice that enables better performance and economics: add Micron’s 9100PRO NVMe SSD to existing HDD-based cluster nodes.

This blog post shows how adding a single Micron 9100PRO to each existing Hadoop node (10-node cluster) and making a slight change in YARN’s resource localization provides 36% average reduction in benchmark runtime (over 10 test runs) and is far more economical than adding more nodes to achieve a similar improvement.

Reduce Benchmark Runtime. Keep Your Cluster Investment.

Each Hadoop distribution comes with a set of standardized, built-in benchmarks. These benchmarks enable performance measurement across a broad range of technologies and deployments.

We ran these benchmarks on two configurations of the same 10-node cluster. The first configuration used standard HDDs for both the YARN cache and HDFS. The second used the same hardware plus one 9100PRO per node, with YARN caching configured to use the 9100PRO instead of the HDDs (HDFS stayed on the HDDs in the second configuration).
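This post does not list the exact setting behind the YARN cache change. As a minimal sketch, assuming the cache location is the NodeManager's yarn.nodemanager.local-dirs property in yarn-site.xml, the snippet below builds that property entry. The mount path is hypothetical and stands in for wherever the SSD is mounted on each node.

```python
import xml.etree.ElementTree as ET


def local_dirs_property(paths):
    """Build a yarn-site.xml <property> element for the NodeManager local dirs."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "yarn.nodemanager.local-dirs"
    # YARN accepts a comma-separated list of directories for this property.
    ET.SubElement(prop, "value").text = ",".join(paths)
    return prop


# Hypothetical mount point for the NVMe SSD (an assumption, not from the post).
NVME_MOUNT = "/mnt/nvme0/yarn/local"

fragment = ET.tostring(local_dirs_property([NVME_MOUNT]), encoding="unicode")
print(fragment)
```

The printed fragment would go inside the configuration element of yarn-site.xml on each node. HDFS data directories are left untouched, which matches the second configuration described above.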

The results are in Figure 1.

 

Figure 1: Benchmark Completion Time (Lower is Better) 

When we used the 9100PRO as the YARN cache, we saw a 36% reduction in benchmark completion time.

Add the 9100PRO for Better Economics (Than Expanding the Cluster)

After we saw the above improvement, we also wanted to understand if adding one 9100PRO to the existing cluster nodes was more economical than expanding the cluster (adding more all-HDD nodes) to achieve a similar runtime reduction.

Our goal was to see how far we had to expand the all-HDD cluster to approximate the benchmark completion time for our 9100PRO YARN cache cluster (3,518 seconds average across 10 runs), then analyze the cost.

We added 2 more nodes (12 total) and repeated the tests. The 10-run average was faster (4,353 seconds mean), but fell short. We added one more node (13 total) and again ran the same tests. This run was very close.
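To put the 12-node result in context, here is a rough worked calculation using only the figures quoted in this post. The 10-node all-HDD baseline is not stated here, so the sketch backs it out from the 36% figure. Treating 36% as exact is an assumption, and the derived numbers are approximate.

```python
# Figures quoted in the post (seconds, mean of 10 runs).
SSD_CACHE_10_NODES_S = 3518   # 10 nodes, 9100PRO as YARN cache
ALL_HDD_12_NODES_S = 4353     # 12 all-HDD nodes
REPORTED_REDUCTION = 0.36     # reduction vs. the 10-node all-HDD cluster

# Assumption: treat 36% as exact to estimate the 10-node all-HDD baseline.
implied_baseline_s = SSD_CACHE_10_NODES_S / (1 - REPORTED_REDUCTION)


def reduction(baseline_s, new_s):
    """Fractional runtime reduction of new_s relative to baseline_s."""
    return 1 - new_s / baseline_s


twelve_node_reduction = reduction(implied_baseline_s, ALL_HDD_12_NODES_S)
gap_s = ALL_HDD_12_NODES_S - SSD_CACHE_10_NODES_S

print(f"Implied 10-node all-HDD baseline: {implied_baseline_s:.0f} s")
print(f"12-node all-HDD reduction vs. baseline: {twelve_node_reduction:.1%}")
print(f"12-node all-HDD gap to 9100PRO cluster: {gap_s} s")
```

Under that assumption, two extra all-HDD nodes recover only part of the improvement that the 9100PRO cache delivers on the original 10 nodes.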

Table 1 summarizes the results:

 

Table 1 shows that it takes 13 all-HDD nodes to deliver performance similar to one 10-node 9100PRO + HDD cluster. Table 2 shows the cost savings from adding one 9100PRO to each node instead of expanding the cluster.

When we ran these tests, the manufacturer’s suggested retail price (MSRP) for the all-HDD cluster nodes (as configured in these tests) was just over $12,000 each. The price for one 9100PRO (2.5” form factor, 3.2TB capacity) was just under $2,500.

Adding the 9100PRO to our existing cluster is more economical than adding cluster nodes to reach similar performance.
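The comparison can be worked through with the rounded prices quoted above. This is a rough sketch: it uses $12,000 and $2,500 as stand-ins for "just over" and "just under" those amounts, and it ignores power, rack space, and deployment costs.

```python
# Rounded MSRP figures from the post (USD); the exact prices are not given.
NODE_MSRP_USD = 12_000   # "just over $12,000" per all-HDD node
SSD_MSRP_USD = 2_500     # "just under $2,500" per 9100PRO

EXISTING_NODES = 10      # original cluster size
EXPANDED_NODES = 13      # all-HDD size needed for similar runtime

# Option 1: grow the all-HDD cluster from 10 to 13 nodes.
expansion_cost = (EXPANDED_NODES - EXISTING_NODES) * NODE_MSRP_USD

# Option 2: add one 9100PRO to each of the 10 existing nodes.
ssd_cost = EXISTING_NODES * SSD_MSRP_USD

savings = expansion_cost - ssd_cost

print(f"Expand to 13 nodes: ${expansion_cost:,}")
print(f"Add 10 x 9100PRO:   ${ssd_cost:,}")
print(f"Difference:         ${savings:,}")
```

Because the real node price is slightly above the rounded figure and the real SSD price slightly below it, the difference computed here is a lower bound on the hardware savings.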

Summary

When IT needs more from their enterprise deployments, they typically weigh several options. They may consider acquisition and recycling costs, performance benefit, and deployment time, among a host of other factors.

Distributed systems like Hadoop typically offer two options for improving performance: Add more of what you already have (cluster expansion) or replace what you have (decommission and build new).

With Micron’s 9100PRO NVMe SSD there is another, more attractive option: add one 9100PRO to each existing cluster node and make a small change to YARN resource localization (to use the 9100PRO as the YARN cache).

The results are compelling.

Adding a single 9100PRO to each node of a 10-node, all-HDD cluster reduced standard benchmark runtime by 36% and cost less than expanding the existing cluster to reach a similar runtime reduction.

How We Measured These Results

You can see additional details about our results here. We used the standard benchmarks built into most Hadoop distributions, running the set 10 times and recording the mean completion time. Table 3 shows the benchmarks we used and their parameters.

About Our Blogger

Doug Rollins

Doug is a Senior Technical Marketing Engineer for Micron's Storage Business Unit, with a focus on enterprise solid state drives.