Just back from SC12, the huge show highlighting huge computers capable of tackling huge problems while ringing up huge power bills. There, you have the highlights in a nutshell! :) Compared to shows like the Consumer Electronics Show, SC12 isn’t really that big, but the event filled up most of Salt Lake City’s “Salt Palace” convention center, which shows that “Big Iron” scientific computing is alive and well and still draws plenty of interest. The show featured numerous speakers (including our own Todd Farrell), discussion forums and a huge exhibition floor. I spent some time on the exhibition floor and came away with these observations:
Memory is still a hot topic for HPC (High Performance Computing). There was, as always, a lot of hardware on display at the show, and the importance of memory was easy to see in its makeup. On most systems the CPU was easily identified, and for each CPU chip on the board there was a corresponding set of DIMMs loaded up with memory. In most cases there were eight DIMMs per CPU, and in every case they were buffered modules. In the presentations and in conversations with engineers, memory density and energy remain the top concerns. Below are a couple of pictures of some of the systems.
GPUs are hot. A new Top500 list was released on Monday at SC12. The reigning #1 machine, Lawrence Livermore’s IBM BlueGene/Q “Sequoia,” was dethroned by Oak Ridge National Laboratory’s newly upgraded “Titan” supercomputer. Titan is a Cray XK7 that pairs AMD Opteron CPUs with 18,688 Nvidia Kepler GPU accelerators, for a Top500 count of more than 560,000 processor cores. Thank goodness they can draw power from the Tennessee Valley Authority! Titan’s new speed record was 17.6 Petaflops, which means nearly 18 million billion floating-point operations per second. A big number.
Intel is taking HPC seriously. Cray, a longtime AMD fan, was showing off a new machine based on Intel Xeon E5 processors. I queried them about deployment of the new hardware and they told me to wait and see. But besides having replaced AMD, there were other signs of Intel’s interest in this space. There are now two Intel Xeon E5-based systems on the Top500 list. The new number 7 is perhaps the most interesting, as it is made up of Dell PowerEdge servers with the hot new Intel Xeon Phi accelerator boards installed. Xeon Phi isn’t really a typical CPU in the Intel family of CPUs. This is the chip formerly known as MIC (Many Integrated Core), or alternatively as Knights Corner. The chip is a peripheral to a server CPU and incorporates 60 x86 processor cores plus support circuitry. Oh yes, these chips have their own set of memory controllers, too. You might view the Xeon Phi as Intel’s answer to the GPU: a peripheral to the main CPU that is good for crunching numbers. But perhaps Phi is good for other things as well. For this we’ll have to wait and see. Below is a picture of Cray’s new board with Intel Xeon E5 processors.
AMD is suffering in HPC (see Figure #3). AMD has been a longtime favorite of the supercomputing community. AMD was first to address the “memory wall” when it introduced processors with built-in memory controllers. With HyperTransport, AMD had a solution that allowed scaling to greater numbers of CPU chips in a system. And for supercomputers that needed epic amounts of address space, AMD was first to bring a 64-bit instruction set to x86. If “imitation is the sincerest form of flattery” (credit: Charles Colton), AMD must be truly flattered. Well, now it appears they are about to be flattened. With the Intel juggernaut focused on this space, AMD will find the going getting tougher. Hot off the press: AMD is apparently shopping around for a buyer. Not a good sign.
But watch this space: ARM is coming! Here’s the wild card. Could it be that Intel’s reign in supercomputers will ultimately be threatened by the lowly CPU core that has made smartphones smart? It’s no secret that some of the biggest names in servers are fielding “micro-servers” based on ARM CPU technology. So far, these systems have used 32-bit ARM cores, which severely limits the upward mobility of these servers. ARM has now introduced its 64-bit IP cores, which means 64-bit server-capable chips won’t be far behind. One of the major (but little) players in ARM-based server chips is Calxeda, one of the companies working fast and furiously to bring 64-bit ARM to the server masses. ARM is attractive in this space for the same reason it was a winner in cell phones: energy. Low energy is very important in the supercomputer and server space, because energy is a major component of the total cost of ownership for these machines. Consider that a large datacenter can draw upwards of 20MW continuously, and that each megawatt of sustained draw can cost over $1M per year, and you can see why energy matters.
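To put rough numbers behind that “each megawatt costs over $1M per year” rule of thumb, here’s a back-of-the-envelope sketch. The electricity rate is my own assumption for illustration, not a figure from the show:

```python
# Back-of-the-envelope datacenter energy-cost arithmetic (illustrative sketch).
# RATE_PER_KWH is an assumed industrial electricity rate, not a quoted figure.
HOURS_PER_YEAR = 24 * 365   # 8,760 hours
RATE_PER_KWH = 0.12         # assumed $/kWh

def annual_cost(megawatts: float) -> float:
    """Annual electricity cost in dollars for a constant draw, in MW."""
    kwh_per_year = megawatts * 1000 * HOURS_PER_YEAR
    return kwh_per_year * RATE_PER_KWH

print(f"1 MW  -> ${annual_cost(1):,.0f} per year")   # right around $1M
print(f"20 MW -> ${annual_cost(20):,.0f} per year")
```

At the assumed rate, one megawatt of constant draw works out to just over $1M per year, so a 20MW facility is paying on the order of $20M annually for power alone.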
Figure 1: AMD CPU with Micron memory
Figure 2: Fujitsu with Micron memory
Some final thoughts:
I like GPUs as much as anyone, but frankly, I’m concerned with the results I see. Sure, the GPU-equipped Titan offers impressive performance, but it does so with increased energy consumption. If we look at the gain over the previous #1 (Lawrence Livermore’s “Sequoia”), Titan delivers less than an 8% performance improvement at a 4% increase in power consumption. Clearly this is not a trajectory that’s going to get us to an Exascale-class supercomputer with a viable energy bill. Exascale machines are supposed to be capable of a billion billion floating-point operations per second at a power consumption of under 20MW. We need bigger improvements in both performance and power. One additional concern on GPUs: not a single GPU-based system showed up on the Graph500 list. What does this mean? Many experts consider the Graph500 benchmark to be more representative of most real-life applications, while the Top500 benchmark is Linpack, which is very floating-point intensive. What type of system do you want running your datacenter? Unless you’re computing fluid dynamics, running climate simulations or protecting a nuclear stockpile, you might want to go for the Graph machine. Of course, this does make me wonder about the pursuit of Exascale floating-point performance…
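Plugging in the published Top500 numbers makes the efficiency gap concrete. The Rmax and power figures below are from the November 2012 list (treat them as approximate):

```python
# Efficiency gap between 2012's leaders and the Exascale target.
# Rmax (PFLOPS) and power (MW) figures from the Nov 2012 Top500 list.

def gflops_per_watt(rmax_pflops: float, power_mw: float) -> float:
    """Energy efficiency in GFLOPS/W: (PFLOPS * 1e15 flops) / (MW * 1e6 W)."""
    return (rmax_pflops * 1e15) / (power_mw * 1e6) / 1e9

titan    = gflops_per_watt(17.59, 8.21)    # ~2.1 GFLOPS/W
sequoia  = gflops_per_watt(16.32, 7.89)    # ~2.1 GFLOPS/W
exascale = gflops_per_watt(1000.0, 20.0)   # 1 EFLOPS in 20 MW = 50 GFLOPS/W

print(f"Titan:    {titan:.2f} GFLOPS/W")
print(f"Sequoia:  {sequoia:.2f} GFLOPS/W")
print(f"Exascale target: {exascale:.0f} GFLOPS/W "
      f"({exascale / titan:.0f}x better than Titan)")
```

Adding GPUs barely moved the flops-per-watt needle over Sequoia, and the Exascale target sits more than 20x above where Titan landed. That's why incremental gains won't get us there.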
If you looked carefully you would see that many of the Intel-based supercomputers use Intel’s Xeon E5 instead of the Xeon E7 processors. The E7 is the higher-performance part, Intel’s flagship CPU, yet these machines are using E5 processors. I asked two different companies why their systems didn’t use the E7, and I received the same answer from both: price. So while performance is paramount in these systems, price matters even more. How much of a price delta are we talking about? Intel publishes single-unit list prices for its processors, and if I compare a Xeon E7-8860 to a Xeon E5-4610 I see list prices of $4016 and $1219, respectively. (Other part numbers can have larger or smaller deltas.) But maybe there’s something more subtle here than price alone. The lower-priced E5 actually has higher CPU-to-CPU bandwidth than the E7. Both CPUs can address over 1TB of memory, but the memory frequency (and thus bandwidth) is 25% higher on the E5. I’ll leave you with this closing thought: it’s about the memory. :)
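The list prices above make it easy to see why price wins at supercomputer scale. A quick sketch, using the two part numbers quoted; the socket count is a hypothetical cluster size I picked for illustration:

```python
# Price delta between the two Xeon parts compared above
# (Intel single-unit list prices quoted in the text).
e7_price = 4016   # Xeon E7-8860
e5_price = 1219   # Xeon E5-4610

ratio = e7_price / e5_price
print(f"The E7 lists at {ratio:.1f}x the E5's price")

# Across a hypothetical 10,000-socket machine (illustrative size,
# not from the article), the CPU bill alone differs by:
sockets = 10_000
delta = sockets * (e7_price - e5_price)
print(f"Delta across {sockets:,} sockets: ${delta:,}")
```

At roughly 3.3x the per-socket price, choosing the E7 would add nearly $28M to a 10,000-socket system's CPU cost, and with faster memory and CPU-to-CPU links on the E5, the flagship part is a hard sell.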