Micron’s Hybrid Memory Cube (HMC) controller implements the Hybrid Memory Cube Consortium’s Specification 1.1. This specification corresponds to second-generation HMC.
The HMC controller supports Intel® (formerly Altera®) Stratix® V and Arria® 10 FPGAs as well as Xilinx® Kintex® UltraScale™ and Virtex® UltraScale+™ devices.
The HMC controller provides either a multiport interface with five 128-bit ports, or a 512-bit AXI-4 interface with one additional 128-bit port used for host accesses.
The total combined latency of the HMC controller, covering both the RX and TX sides of a round-trip transaction, can range from 100ns to 700ns. The exact figure depends on how the controller is configured and which features are used. For example, with the multiport interface the controller creates well-formed packets according to the HMC protocol, which reduces latency. The 512-bit AXI interface has read data reordering built in, so read data is always returned to the user in the order requested; as a result, some packets see more latency.
The link retry feature can also contribute to the controller’s latency, bringing it up to ~300ns. This feature requires the controller to perform a complete cyclic redundancy check (CRC) on all incoming data before it is delivered. With this check disabled, controller latency drops to between ~100ns and ~140ns. Here are a few situations in which it makes sense to turn off the CRC checks on incoming data prior to delivery:
- Your application architecture (sitting on top of the controller) allows the error to be resolved downstream. In other words, the controller can run the CRC in parallel with data delivery and raise an error flag that is handled within the application architecture itself. In this case, the controller does not have to hold data back until it is known to have been received correctly.
- Your hardware is designed with enough margin that you can turn the retry features off entirely, or keep only the feature that raises an error flag without retraining the link.
NOTE: In the rare event of a retry, a long tail is added to the 300ns latency.
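The difference between gating data on the CRC and delivering it with a parallel error flag can be sketched as follows. This is a minimal illustration, not the controller's logic: the function names and the `request_retry` callback are hypothetical, and `zlib.crc32` stands in for the link CRC only to keep the example runnable.

```python
import zlib
from typing import Callable, List, Sequence, Tuple

Packet = Tuple[bytes, int]  # (payload, CRC value received with the packet)


def crc_ok(payload: bytes, received_crc: int) -> bool:
    # Stand-in check: zlib's CRC-32 is NOT claimed to be the HMC link CRC.
    return zlib.crc32(payload) == received_crc


def deliver_gated(packets: Sequence[Packet],
                  request_retry: Callable[[int], None]) -> List[bytes]:
    """Hold each packet until its CRC passes; failed packets are retried, not delivered."""
    delivered = []
    for index, (payload, crc) in enumerate(packets):
        if crc_ok(payload, crc):
            delivered.append(payload)
        else:
            request_retry(index)  # hypothetical hook into the retry logic
    return delivered


def deliver_flagged(packets: Sequence[Packet]) -> Tuple[List[bytes], List[int]]:
    """Forward every packet immediately; report CRC failures as flags for the application."""
    delivered, error_flags = [], []
    for index, (payload, crc) in enumerate(packets):
        delivered.append(payload)      # data is not held back
        if not crc_ok(payload, crc):
            error_flags.append(index)  # application resolves this downstream
    return delivered, error_flags
```

In the flagged mode the application receives possibly corrupt data together with the index of the affected packet, so it must be able to discard or re-request that data itself.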
The transceivers for Xilinx and Altera use slightly different gearboxes when they ingest 16 streams of data, turn them into 640 bits, and balance this width against clock speed. A narrower datapath is better, so 512 bits looks ideal because it is a power of two, but in that case the controller would have to run at almost 450 MHz, which is too fast. 640 bits, on the other hand, is as narrow as possible without pushing the clock rate too high. 1024 bits, which OpenSilicon ran for a while, is too wide and too slow, causing more problems than it solves. In addition, although 512 bits sounds ideal, it does not work well with the packet sizes. For example, the biggest packet carries 128 bytes of data, which is 8 flits; adding the header and tail makes 9 flits, which does not divide cleanly into 512-bit words.
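The width trade-off can be checked with a few lines of arithmetic. The sketch below assumes a 128-bit flit (as in the HMC specification) and that each packet starts on a fresh datapath word; the 15 Gb/s lane rate is an illustrative assumption only, since the text does not state the rate behind its clock figures, and the function names are hypothetical.

```python
import math

FLIT_BITS = 128   # one flit is 128 bits
LANES = 16        # full-width link: 16 serial lanes


def required_clock_mhz(width_bits: int, lane_gbps: float) -> float:
    """Datapath clock needed to absorb the raw link rate at a given word width."""
    return LANES * lane_gbps * 1000.0 / width_bits


def beats_and_idle_flits(packet_flits: int, width_bits: int):
    """Words (beats) needed for one packet, and flit slots left empty in the last word."""
    flits_per_beat = width_bits // FLIT_BITS
    beats = math.ceil(packet_flits / flits_per_beat)
    idle = beats * flits_per_beat - packet_flits
    return beats, idle


if __name__ == "__main__":
    for width in (512, 640, 1024):
        clock = required_clock_mhz(width, lane_gbps=15.0)  # assumed lane rate
        beats, idle = beats_and_idle_flits(9, width)       # 9-flit packet from the text
        print(f"{width:5d} bits: {clock:7.2f} MHz, {beats} beats, {idle} idle flit slots")
```

Under these assumptions a 9-flit packet occupies three 512-bit words with three empty flit slots, but only two 640-bit words with one empty slot, while the 640-bit datapath also runs at a lower clock than the 512-bit one.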
The HMC itself may reschedule requests; it has enough performance to multitask, so it can let requests pass each other. This means that responses can return to the controller out of order. Micron can add logic to the controller to reorder the data if your application requires it, taking into consideration your requirements for low latency versus in-order transactions.
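One common way to restore request order is a tag-indexed reorder buffer. The sketch below is a generic illustration of that idea, not a description of Micron's reorder logic; it assumes every outstanding request carries a unique tag that comes back with its response, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def reorder_responses(request_tags: List[int],
                      responses: Iterable[Tuple[int, bytes]]) -> Iterator[bytes]:
    """Yield response data in the order the requests were issued.

    request_tags: tags in issue order (assumed unique while outstanding).
    responses:    (tag, data) pairs in arrival order, possibly out of order.
    """
    pending: Dict[int, bytes] = {}
    next_index = 0  # position of the oldest request not yet released
    for tag, data in responses:
        pending[tag] = data
        # Release every response that is now contiguous with the oldest request.
        while next_index < len(request_tags) and request_tags[next_index] in pending:
            yield pending.pop(request_tags[next_index])
            next_index += 1


if __name__ == "__main__":
    arrivals = [(2, b"c"), (0, b"a"), (1, b"b"), (3, b"d")]
    print(list(reorder_responses([0, 1, 2, 3], arrivals)))
```

The response tagged 2 arrives first but is held until tags 0 and 1 have been released, which is the extra latency that in-order delivery costs.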
The controller uses approximately 32,000 ALMs/LUTs and 3Mb of memory in Altera® and Xilinx® FPGAs.
A GUPS (giga-updates per second) sample application has been implemented for all HMC modules and is included with your purchase of the board. An AXI HMC memory test sample application that uses the 512-bit AXI interface is also provided.
The HMC controller is a fully pipelined block designed to maximize throughput. While both read and write operations require multiple clock cycles to complete, the controller allows users to issue several read and/or write requests before the first response is returned by the HMC. This pipelining of read and write requests greatly improves the throughput of the memory for user applications.
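The benefit of keeping several requests in flight follows from Little's law: sustained throughput is roughly the amount of data outstanding divided by the round-trip latency. The sketch below is a back-of-the-envelope model, not a measurement; the 300ns round trip, the 128-byte payload, and the request counts are illustrative assumptions, and the function name is hypothetical.

```python
from typing import Optional


def sustained_bandwidth_gbytes(outstanding_requests: int,
                               payload_bytes: int,
                               round_trip_ns: float,
                               link_limit_gbytes: Optional[float] = None) -> float:
    """Little's law estimate: bytes in flight / latency, in GB/s (bytes per ns)."""
    if outstanding_requests < 1 or payload_bytes < 1 or round_trip_ns <= 0:
        raise ValueError("arguments must be positive")
    bandwidth = outstanding_requests * payload_bytes / round_trip_ns
    if link_limit_gbytes is not None:
        # Throughput cannot exceed what the link itself can carry.
        bandwidth = min(bandwidth, link_limit_gbytes)
    return bandwidth


if __name__ == "__main__":
    for n in (1, 8, 64):
        gbs = sustained_bandwidth_gbytes(n, payload_bytes=128, round_trip_ns=300.0)
        print(f"{n:3d} outstanding requests: {gbs:6.2f} GB/s")
```

With one request in flight at a time the model gives well under 1 GB/s; raising the number of outstanding requests scales throughput linearly until another limit, such as link bandwidth, takes over.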
Cyclic redundancy check (CRC) error detection is used on the serializer/deserializer (SerDes) links. In the HMC controller, the CRC is generated on TX packets and checked on RX packets. An error triggers a retry of the failed packet. Inside its memory arrays, the HMC uses error correction code (ECC) for error detection and correction.
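The generate, check, and retry sequence can be sketched in software as follows. This is a simplified single-direction model under stated assumptions: `zlib.crc32` is used as a stand-in rather than the link's actual CRC, the `channel` callable and `max_attempts` limit are hypothetical, and the real retry protocol involves more state than a simple resend loop.

```python
import zlib
from typing import Callable, Tuple


def add_crc(payload: bytes) -> bytes:
    """TX side: append a 4-byte CRC to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def check_crc(packet: bytes) -> bool:
    """RX side: recompute the CRC over the payload and compare with the trailer."""
    payload, trailer = packet[:-4], packet[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == trailer


def send_with_retry(payload: bytes,
                    channel: Callable[[bytes], bytes],
                    max_attempts: int = 3) -> Tuple[bytes, int]:
    """Resend the packet until it arrives with a valid CRC; return (payload, attempts)."""
    packet = add_crc(payload)
    for attempt in range(1, max_attempts + 1):
        received = channel(packet)  # hypothetical link model; may corrupt bits
        if check_crc(received):
            return received[:-4], attempt
    raise RuntimeError("retry limit reached without a valid packet")
```

A channel that flips a bit on the first transmission causes one retry, after which the payload is delivered intact on the second attempt.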