If your data center applications are struggling to keep pace with user demand, you may be looking to SSDs to close the gap. While SSDs will generally offer a host of performance advantages over hard disk drives (‘HDDs’), not all SSDs are the same, and neither is the NAND flash inside them.
I recently did a webinar with Storage Switzerland’s Lead Analyst George Crump during which we discussed this very topic – how do we choose the ‘right’ solid state storage for data center applications?
“What is TBW and What is TLC?”
Doug: So I’ll answer them in order. TBW is Total Bytes Written, and I want to be really careful not to mix that up with Terabytes Written. It refers to how much data one can write through the SSD interface before the SSD becomes worn out. The actual behavior of a worn-out SSD is determined by the SSD vendor. Typically, when the drive is worn out, it will go into read-only mode.
The TBW value in the data sheet is a warranty figure. It doesn’t mean that all the SSDs of that family will go to read-only at TBW plus one bit. Far from it. What it means is that, from a warranty perspective, the drive is guaranteed to tolerate a certain amount of data written through the interface, or a certain number of years, whichever comes first. It’s much like a car warranty: your car is warrantied for 10 years or 100,000 miles, whichever comes first. The same model applies here, but it’s bytes through the interface.
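The "whichever comes first" warranty model can be sketched in a few lines. This is only an illustration: the function name, the 1,000 TB rating, the 5-year term and the daily write volumes are all hypothetical, and it assumes a constant write rate and 365-day years.

```python
TB = 10**12  # decimal terabyte, in bytes

def warranty_end(tbw_bytes, warranty_years, bytes_written_per_day):
    """Return (limit_reached_first, years_until_that_limit).

    Hypothetical helper: assumes a constant daily write volume.
    """
    if bytes_written_per_day <= 0:
        # Nothing is written, so only the time limit can be reached.
        return ("time", float(warranty_years))
    years_to_tbw = tbw_bytes / (bytes_written_per_day * 365)
    if years_to_tbw < warranty_years:
        return ("bytes", years_to_tbw)
    return ("time", float(warranty_years))

# Hypothetical drive: 1,000 TB rated TBW, 5-year warranty.
heavy = warranty_end(1000 * TB, 5, 1 * TB)    # 1 TB written per day
light = warranty_end(1000 * TB, 5, 0.2 * TB)  # 0.2 TB written per day
print(heavy)
print(light)
```

With the heavier workload the byte limit is reached first (after roughly 2.7 years); with the lighter one the time limit ends the warranty, just as a lightly driven car ages out of its warranty before reaching the mileage cap.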
Now the other one was TLC, that’s a relatively new one so it’s not really surprising that the question came through. If we look at SLC that’s Single Level Cell which means there’s one bit stored per cell. Then after SLC, we had MLC. The “M” means Multi-Level Cell, which specifically means two bits per cell. Arguably, we should have used a different acronym or some different letter, rather than “multi,” maybe. But we didn't, we’re stuck with MLC and it means two bits per cell.
So what’s TLC? TLC is three bits per cell. It’s a relatively new NAND technology, like I mentioned earlier. There are a few products out on the market, and one of them says it’s enterprise class, which might be an interesting discussion to have at some point, but this is a relatively new technology that stores three bits per cell. Why wouldn't everyone go running to TLC? Well, as you see with the difference between an SLC part and an MLC part, it takes quite a bit of development and engineering to get the drive-level specs of an MLC drive up to those of an SLC drive, because by its nature, MLC tolerates fewer program/erase cycles than SLC before it’s worn out. TLC typically tolerates even fewer. The engineering uplift to get a drive based on TLC into the enterprise may prove a bit more difficult and challenging than people think. But, to answer the question directly, TLC means Three Bits per Cell or Three Level Cell.
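As a quick reference, the bits-per-cell naming can be written down as a small lookup. The only thing added here beyond the discussion is the general fact that storing n bits means the cell has to distinguish 2^n states; the helper name is just for illustration.

```python
# Bits stored per cell for each NAND type named in the discussion.
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3}

def states_per_cell(cell_type):
    """Number of distinct states a cell must hold to store its bits (2^n)."""
    return 2 ** BITS_PER_CELL[cell_type]

for name, bits in BITS_PER_CELL.items():
    print(f"{name}: {bits} bit(s) per cell, {states_per_cell(name)} states")
```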
“I thought flash wore out before hard drives, isn't that true?”
George: I think that sort of gets into that variability that we were talking about. With Flash we can kind of predict when wear out is going to occur. A hard drive might last ten years, or it might last 10 days, we just don't know. Like you said, it’s a scatter-graph. Please elaborate on that.
Doug: It is, but is it possible to wear out an enterprise grade SSD before you wear out an enterprise grade HDD? The very simple answer to that question is: Yes, it is possible. The difference here is, what do we get out of that drive while it’s in active use? For example: Let’s take an enterprise grade two and a half inch 15,000 RPM HDD. If we look at a small random metric like an 8k [data block size] and we’re going to read two for every one [block] that we write, we do about 450 to 480 IOPS, somewhere in there.
Now if we take a look at an enterprise grade SSD, roughly the same capacity, what we see is performance that would be two orders of magnitude better. That SSD in steady state will support 20,000+ IOPS. So it makes you ask the question, what do I get for my money?
Well, what you get from the HDD is, over the course of a 5-year platform refresh cycle, the HDD at its 480 or so IOPS will generate about 15 billion IOs. The SSD, during the same 5-year life cycle, will reach about 800 billion IOs. The actual work performed can be measured by the number of IOs executed during the drive’s useful life. SSD: 800 billion. HDD: 15 billion.
Under that workload scenario, the SSD wears out more quickly. So, let’s look at it from another perspective. Let’s say that I didn’t use my SSD to its full performance capability, and let’s say I wanted to make sure it lasted 5 years. How fast could I run it? The answer, using the same two drives as a comparison, is about 8,000 IOPS. That is still about 20 times the tangible work output of the HDD, even if you force the SSD to last 5 years by slowing it down.
So you really get the best of both worlds here, George. You can run this thing full-bore and get almost 800 billion IOs before it’s worn out. And yes, that does happen before 5 years. Or, you can use the SSD a bit more conservatively, and if you meter your workload such that it’s going to last the full five years, you still get about 20 times the useful work output out of it [compared to an HDD]. So, is it possible to wear them out? Sure. But, you get a more useful life out of the product during the natural platform refresh cycle.
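For readers who want to plug in their own drives, the "work output" comparison can be sketched as below. The helper names are illustrative, and the model is deliberately simple: a constant rate, around-the-clock operation and 365-day years. Because of those assumptions, the totals it yields can differ from the figures quoted in the discussion.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000

def total_ios(iops, years):
    """Total I/Os completed at a constant rate over a number of years."""
    return iops * SECONDS_PER_YEAR * years

def work_ratio(iops_a, iops_b):
    """How many times more work drive A does than drive B over the same period."""
    return iops_a / iops_b

# Rates taken from the discussion: a throttled SSD versus a 15,000 RPM HDD.
ssd_throttled_iops = 8_000
hdd_iops = 480
ratio = work_ratio(ssd_throttled_iops, hdd_iops)
print(f"Throttled SSD does about {ratio:.1f}x the work of the HDD")
```

With these two rates the ratio comes out at roughly 17x, in the neighborhood of the "about 20 times" mentioned above.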
George: I think that goes to your point earlier: if you have a drive that's going to last you 25 years, and you refresh it in five, what difference does it make?
“How long do I have to use my SAS spin drive to get 800 billion IOs?”
Doug: The arithmetic is hard to do in your head, but the answer is 92 years. If you run that hard drive 24 hours a day, 7 days a week, 365 days a year, to get to 800 billion IOs, you’ve got to do it for almost a century. Now, I’m pretty sure if we ask folks what their refresh cycle is, it’s shorter than 92 years.
So, again, we need to look at the tangible work output of a given device during its expected life span. It’s kind of like a printer cartridge: If I have an ink jet printer and I put in a new cartridge, how many pages can I print? Well, it depends. If I print a really thin font, I can print a lot of pages. If I fill the page with ink, I’ll be able to print fewer pages, but it’s the same printer cartridge. It really depends on how the device is used. In the case of SSD vs. HDD, while the analogy still holds, we get far more useful work output from that SSD.
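The time-to-target arithmetic is easy to script. The sketch below uses illustrative helper names and assumes a constant rate, around-the-clock operation and 365-day years. The answer is very sensitive to the sustained rate you assume: under these assumptions, 800 billion IOs over 92 years corresponds to an average of roughly 276 IOs per second, so a different assumed rate gives a different number of years.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000

def years_to_reach(total_ios, iops):
    """Years of continuous operation needed to complete total_ios at a constant rate."""
    return total_ios / (iops * SECONDS_PER_YEAR)

def implied_iops(total_ios, years):
    """Average sustained rate implied by completing total_ios in the given years."""
    return total_ios / (years * SECONDS_PER_YEAR)

target = 800e9  # 800 billion IOs, the figure from the discussion
avg_rate = implied_iops(target, 92)
print(f"Average rate implied by 92 years: {avg_rate:.0f} IOs per second")
print(f"Years needed at that rate: {years_to_reach(target, avg_rate):.0f}")
```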
“What happened to EMLC?”
George: EMLC was essentially a cross between SLC and MLC, and actually what we did is, kind of what Doug talked about, we slowed down the writes a little bit. Is this a technology that we just don’t care about anymore?
Doug: Well, the problem is EMLC doesn't mean anything. Whereas SLC, MLC and TLC have come to a common meaning in the open market and amongst vendors and designers, EMLC never really had a concise and well-defined meaning. And from a customer perspective, you never knew what you were going to get. It could be just plain old, vanilla, client-class MLC slowed down a little bit. Or it could be a very special purpose-built die; it could still be MLC but with some additional benefits inside the NAND ware. So George, because it basically meant nothing, the market just dropped that term.
George: Okay, give me a quick summary of what Flash companies have done to make MLC last so long.
Doug: Several things. Without using too many buzzwords, Micron has a suite of features we call X.P.E.R.T., which means eXtended Performance Enhanced Reliability Technology. These are all firmware features or NAND features, with hardware acceleration in the controller, that extend the life of the drive. Things like background adjustments on the NAND or foreground adjustments on the NAND.
We even have a technology where, if we have an out-and-out failure on the NAND, we completely overcome that in our enterprise drive. So all of these technologies have been developed to bring the specs of an MLC-based enterprise SSD up to what we would expect only from an SLC, or the domain of an SLC.
George: Great, thanks again. You can listen to this webinar On-Demand, at your convenience, here.