My last tech tip, “Hadoop + SSDs Don’t Create Great Value—Yet,” generated quite a bit of discussion. Social media experts tell me that is a good thing, and I agree. I think it’s important that we have honest conversations about all use cases for SSDs and Flash in the enterprise. That way, we can decide where SSDs are actually adding value and where work still needs to be done before that value is fully realized. It is also why our enterprise customers see us as a trusted advisor: we give them a “voice of reason” about where and how Flash is best used in the enterprise.
Hadoop and SSDs Revisited
I’d like to clarify some key points from my previous post on Hadoop and SSDs:
- SSDs may not currently (emphasis on currently) add tremendous value to the Hadoop Distributed File System (HDFS), but HDFS is only one part of the big data ecosystem. HDFS I/O today tends to be very large and sequential, while SSDs excel at small, random I/O, a strength that can be tremendously valuable elsewhere in the ecosystem.
- We all have an exciting and important opportunity in front of us: how can we rethink some of our assumptions about elements of the big data ecosystem so they take better advantage of SSDs? Note that I’ve widened the discussion to cover the whole ecosystem, not just one part of it.
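To make the contrast between the two access patterns concrete, here is a small Python sketch that generates request offsets for a sequential scan and for random lookups over a single data block. The 128 MiB block size, the 4 MiB and 4 KiB request sizes, and the helper names are illustrative assumptions, not measured HDFS or SSD parameters.

```python
import random

KiB = 1024
MiB = 1024 * KiB

def sequential_offsets(total_bytes, request_size):
    # Scan a region front to back in fixed-size chunks (large, sequential I/O).
    return list(range(0, total_bytes, request_size))

def random_offsets(total_bytes, request_size, count, seed=0):
    # Pick `count` request-aligned chunks at random positions (small, random I/O).
    rng = random.Random(seed)
    slots = total_bytes // request_size
    return [rng.randrange(slots) * request_size for _ in range(count)]

def is_sequential(offsets, request_size):
    # True when every request starts exactly where the previous one ended.
    return all(b - a == request_size for a, b in zip(offsets, offsets[1:]))

block_size = 128 * MiB  # assumed block size, for illustration only

scan = sequential_offsets(block_size, 4 * MiB)
lookups = random_offsets(block_size, 4 * KiB, count=len(scan))

print(len(scan), is_sequential(scan, 4 * MiB))  # 32 True
print(is_sequential(lookups, 4 * KiB))
print(block_size // (4 * KiB))  # 32768
```

Covering the same 128 MiB in 4 KiB pieces would take 32,768 requests instead of 32, which is why small, random workloads are limited by per-request latency and IOPS rather than by raw bandwidth.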
What’s the real key driver here? I think it’s simple: SSDs are here, they are real, and they can offer incredible value, so we need to architect our applications to take advantage of their unique characteristics.
How SSDs Bring Value to Big Data Right Now
Here are two areas outside of HDFS where SSDs are already excelling in today’s big data ecosystem:
Query Results
To mine query results effectively, we need fast accesses that are often very small and random: exactly the workload where SSDs excel. Many big data users are placing query results on SSDs, exploiting their very fast small, random I/O performance to learn more from the data, and to learn it faster.
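As a rough illustration of that access pattern, the Python sketch below stores a set of query results as fixed-size binary records and then fetches a few rows at random. The file name, the 16-byte record layout, and the helper functions are hypothetical; the point is only that each lookup is one small read at a computed offset, the kind of request an SSD serves far faster than a spinning disk.

```python
import os
import random
import struct
import tempfile

# Hypothetical record layout: 8-byte integer key + 8-byte float score = 16 bytes.
RECORD = struct.Struct("<qd")

def write_results(path, rows):
    # Persist query results as fixed-size records so any row's offset is computable.
    with open(path, "wb") as f:
        for key, score in rows:
            f.write(RECORD.pack(key, score))

def read_row(f, index):
    # One small, random read: seek straight to the record and read 16 bytes.
    f.seek(index * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))

rows = [(i, i * 0.5) for i in range(10_000)]

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "results.bin")  # hypothetical results file
    write_results(path, rows)
    rng = random.Random(42)
    with open(path, "rb") as f:
        # Five point lookups at random row indices.
        sample = [read_row(f, rng.randrange(len(rows))) for _ in range(5)]

print(sample)
```

Fixed-size records keep the sketch simple; a real results store would more likely use an index or a key-value engine, but the resulting I/O is still dominated by small reads at scattered offsets.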
HDFS Node Startup
SSDs can also bring value as a node startup (boot) device. Consider how much denser we could make our HDFS clusters if the nodes booted without occupying a drive slot: every slot freed from boot duty can hold another data drive. Our mSATA and M.2 SSDs (with appropriate system designs) do exactly that.
Today’s big data ecosystem clearly benefits from SSDs in certain areas. Now imagine if we could start afresh and architect each element of that ecosystem to take full advantage of SSDs. That day may not be today, but I’m confident it will come soon. SSDs offer incredible value today and for the future; it may just take a bit more software development to fully exploit them.
How do you see SSDs benefiting the big data ecosystem of today and tomorrow? Please leave your comments below so we can continue this discussion and shape the future of SSDs.