Micron Advanced Computing Solutions (ACS) FAQs

Advanced Computing Solutions(5)
General(2)
Is Pico Computing now part of Micron?
In 2015 Micron Technology acquired Pico Computing, an industry leader in FPGA solutions. Now known as Micron Advanced Computing Solutions (ACS), our modular, highly scalable FPGA-based HPC and embedded systems comprise the industry’s leading technology for high-performance computing.
Does Micron favor Xilinx® or Altera® (Intel®) FPGAs?
Micron does not favor one or the other. Our ACS modules are designed around the latest-generation FPGA components from either Xilinx or Intel, depending on design goals and specific customer requirements.
Getting Started(3)
How do I get started with your system?
All of our ACS hardware comes with an installer file. Simply print out the Getting Started file and follow the directions. The included C++ API source files define PicoDrv; each PicoDrv object represents one FPGA.
How do I interface with a host processor?
You interface like you would in any other system that utilizes PCIe® add-in cards.
How do I use more than one module?
Our PicoFramework provides access to all basic FPGA functionality regardless of the number of modules. The software API includes a source file called PicoDrv, which creates a PicoDrv object for each FPGA module in a system, making FPGA module communication simple.
Programming FPGAs(3)
How do I upload my bitfile to an FPGA in your system?
Our PicoFramework provides access to all of the basic FPGA functionality in your system. When you build a configuration file for an FPGA, the PicoFramework software will be the top level, and your module will be instantiated inside the framework. You create a PicoDrv object for each FPGA in the system.
What is the loading mechanism for backplane-mounted modules?
Programming an ACS module is accomplished via the PCIe® bus. Our EX-700 and EX-750 backplanes include a Spartan-6 FPGA that is used to load the ACS FPGA modules utilizing API calls. We also support and provide examples of DMA transfers through PCIe.
If I have a size-constrained application, do I need to use a backplane?
Our EX-700 and EX-750 backplanes are not technically required when using Micron’s ACS FPGA modules. Our modules can run stand-alone, with the bitfile programmed into the configuration flash, which then loads the FPGA.
Design Flows(6)
Do I need to migrate my entire application to a Micron ACS FPGA module to realize the performance advantage?
No. Simply move your application’s “hot spot” to the FPGA module and then execute a function call from the main application that remains on the traditional CPU-based system.
How do I recompile my legacy serial code to run on Micron’s ACS products?
Existing code written for serial processors should not be recompiled to run on highly parallel FPGA architectures because the many parallel benefits of the FPGA will not be realized. In fact, FPGAs are clocked much slower than CPUs (a significant power consumption benefit), so serial code would run even slower. Existing code should be analyzed to discern where the parallel nature of FPGAs offers the largest benefits, and only that part of the code should be rewritten to take advantage of the parallel nature of FPGAs. This way, the biggest benefit can be realized with the smallest effort.
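
The reasoning above (rewrite only the hot spot and leave the rest on the CPU) can be illustrated with Amdahl's law. The sketch below is a generic calculation, not part of the PicoFramework; the function name `overall_speedup` and all of the numbers are assumptions chosen for illustration, not measurements of any ACS module.

```python
def overall_speedup(hot_fraction: float, hot_speedup: float) -> float:
    """Amdahl's law: overall speedup when only part of the runtime is accelerated.

    hot_fraction: share of the original runtime spent in the hot spot (0..1).
    hot_speedup:  how much faster the hot spot runs once moved to the FPGA.
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    if hot_speedup <= 0:
        raise ValueError("hot_speedup must be positive")
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup)


# Illustrative numbers only: a hypothetical 20x gain on the offloaded part.
for fraction in (0.50, 0.90, 0.99):
    print(f"hot spot = {fraction:.0%} of runtime -> "
          f"{overall_speedup(fraction, 20.0):.2f}x overall")
```

The larger the share of runtime the hot spot covers, the closer the overall gain gets to the gain on the hot spot itself, which is why profiling the existing code comes first.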
Which tools do I need to use to utilize Micron’s ACS FPGA modules?
The PicoFramework doesn’t constrain your selection of FPGA design tools. Use whichever tools you are currently using for your FPGA development and whichever tools you are most comfortable with.
Does Micron ACS support OpenCL?
Yes. Both Intel’s OpenCL™ and Xilinx’s SDAccel can be used with PicoFramework. Use whichever tools you are currently using for your FPGA development and whichever tools you are most comfortable with.
Do I need to start from scratch?
No. To start your own project, simply find the sample that best matches your communication model and ACS module/board, and copy it to your work directory. The copy function will provide all source files for the PicoFramework; you will just need to add your own code.
What simulators does Micron’s ACS support?
We currently support both Xilinx® ISim and Altera® ModelSim (Mentor’s simulator).
HMC Controller(11)
What HMC specification does the Micron® HMC controller implement?

Micron’s Hybrid Memory Cube (HMC) controller implements the Hybrid Memory Cube Consortium’s Specification 1.1. This specification corresponds to second-generation HMC.

What FPGA devices are currently supported?

The HMC controller supports Intel® (formerly Altera®) Stratix® V and Arria® 10 FPGAs as well as Xilinx® Kintex® UltraScale™ and Virtex® UltraScale+™ devices.

What kind of interface does the HMC controller have?

The HMC controller has either an interface with five 128-bit ports, or a 512-bit AXI-4 interface plus one 128-bit port used for host accesses.

What clock speeds does the controller operate at?

Controller (link width)   Links (lane rate)   Clock Speed
x8                        15 Gb/s             187.5 MHz
x16                       15 Gb/s             375 MHz
x8                        12.5 Gb/s           156.25 MHz
x16                       12.5 Gb/s           312.5 MHz
x8                        10 Gb/s             125 MHz
x16                       10 Gb/s             250 MHz
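
The clock speeds listed above are consistent with a simple relationship: total link bandwidth (lanes times lane rate) divided by the 640-bit width mentioned elsewhere in this FAQ. That relationship is an inference from the numbers, not something the FAQ states, and the function name `controller_clock_mhz` is made up for this sketch.

```python
DATAPATH_BITS = 640  # width quoted in the 640-bit interface question (assumed to apply here)


def controller_clock_mhz(lanes: int, lane_rate_gbps: float,
                         datapath_bits: int = DATAPATH_BITS) -> float:
    """Clock needed to move lanes * lane_rate of data through a datapath_bits-wide path."""
    total_bits_per_second = lanes * lane_rate_gbps * 1e9
    return total_bits_per_second / datapath_bits / 1e6


# (lanes, lane rate in Gb/s) for every row of the clock speed list.
configs = [(8, 15.0), (16, 15.0), (8, 12.5), (16, 12.5), (8, 10.0), (16, 10.0)]
for lanes, rate in configs:
    print(f"x{lanes:<2} at {rate:>4} Gb/s -> {controller_clock_mhz(lanes, rate)} MHz")
```

Each computed value matches the corresponding listed clock speed, for both x8 and x16 links.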


What is the latency internal to the controller?

The total combined latency for the HMC controller can range from 100ns to 700ns for both the RX and TX sides in a round-trip transaction. The amount of latency depends on how the controller is configured and the features that are used. For example, if using the multiport interface, the controller creates well-formed packets according to the HMC protocol, reducing latency. The 512-bit AXI interface has read data reordering built in so read data is always returned to the user in the order requested, resulting in some packets having more latency.

The link retry feature can also contribute to the controller’s latency, bringing it up to ~300ns. This feature requires the controller to perform a complete cyclic redundancy check (CRC) on all incoming data before it is delivered. With this feature turned off, the controller latency drops to between ~100ns and ~140ns. Here are a few reasons to turn off the CRC checks on incoming data prior to delivery:

  • If you have an application architecture (sitting on top of the controller) that enables the error to be resolved downstream. In other words, the controller can run the CRC in parallel with data delivery and raise an error flag that is handled within the application architecture itself. In this case, the controller does not have to hold data back until it is certain the data was received correctly.
  • If you have hardware that is designed with enough margin that you can turn retry features off or only keep on the feature that activates an error flag but doesn’t retrain the link.

NOTE: In the rare event of a retry, a long tail is added to the 300ns latency.

Why is the interface 640 bits wide rather than a binary multiple like 256, 512 or 1024 bits?

The Xilinx and Altera transceivers use slightly different gearboxes when they ingest 16 streams of data, turn them into 640 bits, and balance this width against clock speed. Narrower is better, and 512 bits looks ideal because it is a binary multiple, but at that width the controller would have to run at almost 450 MHz, which is too fast. 640 bits, on the other hand, is as narrow as possible without pushing the clock rate too high. 1024 bits, which OpenSilicon ran for a while, is too wide and too slow, causing more problems than it solves. In addition, 512 bits does not fit the packet sizes. For example, the biggest packet carries 128 bytes, which is 8 flits; with the header and tail it becomes 9 flits, which does not divide cleanly into 512 bits.
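
The width trade-off can be checked with back-of-the-envelope arithmetic. The sketch below assumes a full-width link of 16 lanes at 15 Gb/s and a 128-bit flit (which follows from 128 bytes being 8 flits). It uses the raw line rate only and ignores any encoding or protocol overhead, so the 512-bit figure comes out somewhat above the approximate value quoted in the answer. All names are illustrative.

```python
LANES = 16             # full-width link (assumed for this example)
LANE_RATE_GBPS = 15.0  # fastest lane rate listed in this FAQ
FLIT_BITS = 128        # 128 bytes = 8 flits, so one flit is 16 bytes


def required_clock_mhz(width_bits: int) -> float:
    """Clock needed to carry the raw link bandwidth over a width_bits datapath."""
    return LANES * LANE_RATE_GBPS * 1e3 / width_bits


for width in (512, 640, 1024):
    print(f"{width:>4}-bit datapath: {required_clock_mhz(width):7.2f} MHz, "
          f"{width // FLIT_BITS} flits per clock")

# Largest packet: 8 data flits plus one flit for header and tail.
largest_packet_bits = 9 * FLIT_BITS
print(f"largest packet = {largest_packet_bits} bits, "
      f"{largest_packet_bits / 512} beats at 512 bits")
```

The narrower the datapath, the higher the clock it needs; 640 bits lands at 375 MHz, while 1024 bits drops the clock but widens every internal bus.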

Is there command scheduling in the controller or is it in order from the perspective of the user interface?

The HMC itself may reschedule; it has enough performance to multitask, so it can let requests pass each other. This means that responses could return to the controller out of order. Micron can add logic to the controller to reorder the data if your application requires it, taking into consideration your requirements for low latency versus in-order transactions.
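
To make the latency versus ordering trade-off concrete, here is a minimal software model of a reorder buffer. It is only a sketch of the general technique, not the controller's actual logic; the function name `reorder_by_tag` and the assumption that tags are issued as 0, 1, 2, ... with no gaps are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple


def reorder_by_tag(responses: Iterable[Tuple[int, bytes]]) -> Iterator[Tuple[int, bytes]]:
    """Yield (tag, data) in tag order, holding back responses that arrive early.

    Assumes requests were tagged 0, 1, 2, ... in issue order (hypothetical scheme).
    """
    pending: Dict[int, bytes] = {}
    next_tag = 0
    for tag, data in responses:
        pending[tag] = data
        # Release every response that is now next in line.
        while next_tag in pending:
            yield next_tag, pending.pop(next_tag)
            next_tag += 1


# Responses arrive out of order: tag 2 comes back before tags 0 and 1.
arrivals = [(2, b"C"), (0, b"A"), (1, b"B"), (3, b"D")]
in_order = list(reorder_by_tag(arrivals))
print(in_order)
```

In this model the response for tag 2 sits in the buffer until tags 0 and 1 have been delivered, which is exactly the extra latency an in-order interface pays.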

How much of the FPGA does the HMC controller use?

The controller uses approximately 32,000 ALMs/LUTs and 3Mb of memory in Altera® and Xilinx® FPGAs.

What are some design examples using the controller?

GUPS (giga-updates per second) sample designs have been implemented for all HMC modules and are included with your purchase of the board. Also, an AXI HMC memory test sample application is provided that utilizes the 512-bit AXI interface.

What does the controller do to maximize throughput?

The HMC controller is a fully pipelined block designed to maximize throughput. While both read and write operations require multiple clock cycles to complete, the controller allows users to issue several read and/or write requests before the first response is returned by the HMC. This pipelining of read and write requests greatly improves the throughput of the memory for user applications.
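
The benefit of issuing several requests before the first response returns can be seen in a toy cycle-count model. This is a simplified sketch: it assumes a fixed latency, at most one request issued per clock, and a cap on outstanding requests. The function name and all numbers are illustrative and do not describe the real controller.

```python
def cycles_to_complete(num_requests: int, latency_cycles: int, max_outstanding: int) -> int:
    """Cycle at which the last response arrives in a fixed-latency toy model."""
    in_flight = []  # completion cycle of each outstanding request
    issued = 0
    completed = 0
    cycle = 0
    while True:
        # Retire every response that has arrived by this cycle.
        completed += sum(1 for done_at in in_flight if done_at <= cycle)
        in_flight = [done_at for done_at in in_flight if done_at > cycle]
        if completed >= num_requests:
            return cycle
        # Issue at most one new request per cycle if a slot is free.
        if issued < num_requests and len(in_flight) < max_outstanding:
            in_flight.append(cycle + latency_cycles)
            issued += 1
        cycle += 1


# 100 requests, 10-cycle latency: one at a time versus pipelined.
print(cycles_to_complete(100, 10, 1))   # wait for each response before the next request
print(cycles_to_complete(100, 10, 16))  # enough outstanding requests to hide the latency
```

Once the number of outstanding requests covers the latency, the total time is dominated by the issue rate rather than by the round trip of each request.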

Is ECC performed within the HMC or within the controller?

Cyclic redundancy check (CRC) error detection is used on the serializer/deserializer (SerDes) links. The CRC is generated on TX packets and checked on RX packets in the HMC controller. An error will trigger a retry on the failed packet. The HMC memory itself uses error correction code (ECC) to detect and correct errors inside the memory arrays.
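
The generate, check, and retry flow on the links can be sketched in software. This is a simplified model under stated assumptions: Python's `zlib.crc32` stands in for the CRC that HMC packets actually use (the real polynomial and packet layout differ), and `tx_packet`, `rx_check`, `send_with_retry` and `flaky_link` are hypothetical names, not controller or PicoFramework APIs.

```python
import zlib


def tx_packet(payload: bytes) -> bytes:
    """Append a CRC to the payload (zlib's CRC-32 used as a stand-in)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def rx_check(packet: bytes) -> bool:
    """Recompute the CRC on the receive side and compare with the trailer."""
    payload, trailer = packet[:-4], packet[-4:]
    return zlib.crc32(payload) == int.from_bytes(trailer, "little")


def send_with_retry(payload: bytes, link, max_retries: int = 3):
    """Send over `link` (a callable modelling the SerDes link); retry on CRC failure."""
    for attempt in range(1, max_retries + 2):
        received = link(tx_packet(payload))
        if rx_check(received):
            return received[:-4], attempt
    raise RuntimeError("link retry limit exceeded")


def flaky_link(corrupt_first_n: int):
    """Model a link that flips one bit in the first N packets it carries."""
    remaining = corrupt_first_n

    def link(packet: bytes) -> bytes:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            damaged = bytearray(packet)
            damaged[0] ^= 0x01  # single-bit error on the wire
            return bytes(damaged)
        return packet

    return link


data, attempts = send_with_retry(b"hmc read response", flaky_link(1))
print(data, attempts)
```

The first transmission fails the CRC check and is resent, so the payload arrives intact on the second attempt; ECC inside the memory arrays is a separate mechanism and is not modelled here.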