Big data has evolved rapidly. Not long ago we thought of ‘big data’ very differently. I’d venture a guess that many readers remember when big data was synonymous with thoughts like ‘really slow’, ‘eventually you might get something out of it’, or even ‘just don’t be in a hurry’. We used to build our big data platforms with those thoughts in mind: we’d use huge, rotating disk drives and we’d need patience, lots of patience. If our customers were fine with ‘slow’ and ‘eventual results’, then we didn’t usually put big data and performance in the same thought, and certainly not in the same sentence.
But in a short time we’ve seen a real transition – our customers aren’t OK with slow. Big data expectations are no longer about ‘eventual results’ – they are about immediate real-time value. That got us thinking about how flash can help big data – we found that using flash and virtualizing Hadoop made a big performance difference.
Virtualizing Hadoop Is The Way To Go
Prior work done by our internal engineering team (linked to above) shows that virtualizing Hadoop is an economical way to significantly boost cluster performance. When we virtualize Hadoop, we get a greater number of Hadoop instances per U of rack space, so we also get more processing tasks per U of rack space. In the prior study, we set up four KVM-based virtual machines per node and loaded a Hadoop distribution on each. This was the ideal balance of performance and density, and it made the arithmetic easy: four times as many instances per U and four times as many processes per U, all on the same hardware. We knew virtualized Hadoop was the right direction.
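To make that arithmetic concrete, here is a minimal sketch in Python. The function names and the `node_height_u` parameter are illustrative assumptions, not part of the original study; the only figure taken from the study is four virtual machines per node.

```python
def instances_per_u(vms_per_node: int, node_height_u: float = 1.0) -> float:
    """Hadoop instances per U of rack space.

    node_height_u is a hypothetical parameter (how many U one physical
    node occupies); the study does not state the server form factor.
    """
    if vms_per_node < 1 or node_height_u <= 0:
        raise ValueError("need at least one instance and a positive node height")
    return vms_per_node / node_height_u


def density_gain(vms_per_node: int, node_height_u: float = 1.0) -> float:
    """Ratio of virtualized density to one-instance-per-node density."""
    bare_metal = instances_per_u(1, node_height_u)
    virtualized = instances_per_u(vms_per_node, node_height_u)
    return virtualized / bare_metal


# Four VMs per node, as in the prior study.
print(density_gain(4))
```

The gain depends only on the number of instances per node, not on how tall the node is, which is why the same hardware yields four times the instances per U.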
What Can The Micron M510DC Bring?
That initial study was performed using the Micron M500DC, a moderate-capacity SATA SSD designed for more write-intensive use. Since then we’ve introduced a larger-capacity SATA SSD (the Micron M510DC) that is focused more on read traffic and comes at a lower price. That made us wonder: what if we compared two virtualized Hadoop clusters, one built on the SSD and one on hard disk drives, to determine the performance improvement the SSD would bring?
We wanted to make a fair comparison, so we used standard benchmarks built into Hadoop distributions. We used a shell script to run these in sequence, and we considered one complete sequence (all benchmarks executed once) to be a single benchmark run. To ensure prior runs did not affect later results, we deleted the data from each run and restored the systems to a fresh ‘out of the box’ state before the next run launched. We repeated this 10 times with a brief pause between runs. Below is a list of the benchmarks used and what each does:
The Results Are In
So, what happened? The Micron SSD (M510DC) based cluster was the hands-down winner, completing the standard benchmarks 2.6x to 15x faster than the 15K HDD-based cluster. In the chart below, each benchmark is shown on the left and the time to run it is represented by the horizontal bar. Shorter bars mean faster run times, so shorter bars are better.
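For readers who want to compute this kind of ‘Nx faster’ figure on their own clusters, here is a minimal sketch in Python. It assumes the speedup is the ratio of mean HDD run time to mean SSD run time across repeated runs; the post does not say how the 10 runs were aggregated, so treat that as an assumption. The timings below are made-up placeholders, not our measurements.

```python
from statistics import mean
from typing import Sequence


def speedup(hdd_seconds: float, ssd_seconds: float) -> float:
    """How many times faster the SSD run is than the HDD run."""
    if hdd_seconds <= 0 or ssd_seconds <= 0:
        raise ValueError("run times must be positive")
    return hdd_seconds / ssd_seconds


def mean_speedup(hdd_runs: Sequence[float], ssd_runs: Sequence[float]) -> float:
    """Speedup computed from the mean run time of each cluster.

    Averaging across runs is an assumption made for this sketch.
    """
    return speedup(mean(hdd_runs), mean(ssd_runs))


# Placeholder timings in seconds, for illustration only.
hdd_runs = [300.0, 310.0, 290.0]
ssd_runs = [100.0, 105.0, 95.0]
print(f"{mean_speedup(hdd_runs, ssd_runs):.1f}x faster")
```

A ratio above 1 means the SSD cluster finished sooner, which matches the shorter bars in a run-time chart.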
There are at least two things we learned, and I hope you find them useful as well. First, when we increase demands on existing processes, applications, or technologies, we have to throw out convention and think of new solutions. We all ‘know’ that virtualization can slow down performance, don’t we?
Except with Hadoop. Second, when we throw out that conventional wisdom (it may not be so ‘wise’ after all), we should also spend some time thinking about how newer technology may apply, even if that technology isn’t normally used there. Using SSDs with Hadoop isn’t as common as it could be, but we found that when we need performance from Hadoop, virtualization is a real enabler, and virtualized Hadoop relies on high-performance SSDs for great results.
Want to learn more? Visit the Micron big data and analytics webpage. If you have questions about the testing we ran, connect with me on Twitter @GreyHairStorage and connect with Micron storage on Twitter @MicronStorage and LinkedIn.
About Our Blogger
Doug is a Senior Technical Marketing Engineer for Micron's Storage Business Unit, with a focus on enterprise solid state drives.