Why agentic AI is rewriting the rules for CPU and memory infrastructure

Sudharshan Vazhkudai, Sujit Somandepalli and Raj Narasimhan

The rise of agentic AI and its hidden infrastructure cost

From chatbots to autonomous agents — the shift is real

Artificial Intelligence is no longer just about answering questions. Today's AI systems are increasingly agentic — they plan, act, observe results, and iterate autonomously until a goal is achieved. Whether it's a software engineering agent inspecting code and running tests, a supply chain agent coordinating procurement and logistics, or a financial research agent synthesizing SEC filings and market news, these systems are fundamentally different from the static inference models of yesterday.

But here's the infrastructure twist that most people miss: Agentic AI doesn't just demand more GPUs — it's quietly driving a massive resurgence in CPU and DRAM demand.

The GPU is no longer the whole story

For years, AI infrastructure conversations began and ended with GPUs. That's changing fast. In an agentic system, the GPU still handles the heavy lifting of LLM inference — the attention mechanisms, matrix multiplications, and token generation. But around the model, an entirely different compute layer comes alive:

  • Planning loops that decompose a high-level goal into subtasks
  • Routers and schedulers that decide which tool or API to call next
  • Tool execution engines that run code, query databases, or call external APIs
  • Retrieval and I/O pipelines that fetch and stage relevant context

This "work around the model" is inherently branchy, stateful, and often serial per request — characteristics that make it a natural fit for CPUs, not GPUs. Think of it this way: the GPU is the factory floor; the CPU is the management office.
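As a rough illustration of that "work around the model," the sketch below strings the four pieces together in plain Python. Everything in it is hypothetical: the `TOOLS` registry, the `plan` stand-in for a model call, and the `AgentState` container are assumptions made for illustration, not any framework's actual API. The point is only that each step is ordinary branchy, stateful, serial code.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool registry: these run on the CPU, outside the model.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query!r}",
    "run_tests": lambda target: f"tests passed for {target}",
}

@dataclass
class AgentState:
    """Per-agent state that has to stay resident in host memory."""
    goal: str
    history: list[str] = field(default_factory=list)

def plan(goal: str) -> list[tuple[str, str]]:
    """Stand-in for a model call that splits a goal into (tool, argument) subtasks."""
    return [("search", goal), ("run_tests", goal)]

def route(tool_name: str) -> Callable[[str], str]:
    """Router: choose which tool handles the next subtask."""
    return TOOLS[tool_name]

def run_agent(goal: str) -> AgentState:
    state = AgentState(goal)
    for tool_name, argument in plan(goal):   # planning loop
        tool = route(tool_name)              # routing and scheduling
        observation = tool(argument)         # tool execution
        state.history.append(observation)    # stage context for the next step
    return state

state = run_agent("fix bug")
print(state.history)
```

In a real deployment the planning step would call the model on the GPU, while the routing, tool execution, and state updates would stay on the CPU for every request.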

As agentic workloads scale, so does the need for more CPU cores and threads per GPU, shifting CPU-to-GPU ratios from the historical 1:4 to 1:2 or even 1:1. That is a 2–4x increase in CPU demand per GPU, and it is the first scaling curve.
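The 2–4x figure follows directly from the ratios. A minimal sketch of the arithmetic, reading each ratio as CPUs per GPU (the function name is chosen here for illustration):

```python
from fractions import Fraction

def cpu_demand_multiplier(old_ratio: Fraction, new_ratio: Fraction) -> Fraction:
    """Growth in CPUs per GPU when the CPU-to-GPU ratio changes."""
    return new_ratio / old_ratio

historical = Fraction(1, 4)  # one CPU for every four GPUs

for label, ratio in [("1:2", Fraction(1, 2)), ("1:1", Fraction(1, 1))]:
    print(f"1:4 -> {label}: {cpu_demand_multiplier(historical, ratio)}x CPUs per GPU")
```

Moving from 1:4 to 1:2 doubles the CPUs behind each GPU, and moving to 1:1 quadruples them.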

CPU is no longer the support act — emerging architectures speak volumes

The silicon industry has gotten the memo, loudly. Every major CPU enabler is now racing to build purpose-built silicon for agentic AI, and the architectural ambition on display is unlike anything we've seen in a generation [1]. These aren't incremental spec bumps. These are ground-up redesigns: chips purpose-engineered to deliver extreme rack-level density, ranging from 86 to 120 CPU cores per GPU, with the explicit goal of keeping accelerators saturated and agent pipelines flowing without interruption [2].

And the scale being demonstrated is staggering. A single modern agentic rack can now sustain tens of thousands of concurrent CPU container environments, each one independent, each one running at full performance [3]. That's not a benchmark. That's a production architecture.

This is no longer about giving the GPU a hand. The CPU is now a massively parallel fabric, spinning up thousands of agentic containers simultaneously, each hosting the tools, sandboxes, and orchestration frameworks that autonomous AI agents depend on to plan, act, and iterate. The support act just became a headliner.

The memory multiplier — the compounding force nobody saw coming

Every agent needs a home — and that home eats memory

If CPU demand is the headline, DRAM demand is the deeper, more consequential story. Every live agent instance must maintain:

  • State and KV/context staging — keeping track of where it is in its reasoning loop
  • Tool outputs and queues — buffering results from API calls and code execution
  • Container/sandbox memory — isolated runtime environments for safe execution
  • Vector/index data — for retrieval-augmented generation and semantic search
  • OS and runtime overhead — the base cost of keeping thousands of environments alive

Multiply this across thousands of concurrent agents in a single rack, and the numbers become staggering. Research shows that up to 90% of agent latency can be attributed to CPU-side tool processing — meaning memory bandwidth and capacity are directly on the critical path for agent performance.
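To see why a 90% CPU-side share puts the host on the critical path, it may help to apply Amdahl's law to the figure above. The sketch assumes the remaining 10% of latency is GPU-side and that each stage can be sped up independently. Both are simplifying assumptions, and the speedup factors are illustrative rather than measured.

```python
def overall_speedup(accelerated_fraction: float, factor: float) -> float:
    """Amdahl's law: end-to-end speedup when only part of the latency gets faster."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)

cpu_fraction = 0.90               # "up to 90%" CPU-side share quoted above
gpu_fraction = 1.0 - cpu_fraction  # assumption: the rest is GPU-side

# A 10x faster GPU stage barely moves end-to-end latency...
print(f"10x faster GPU stage: {overall_speedup(gpu_fraction, 10.0):.2f}x overall")
# ...while a 2x faster CPU-side stage helps far more.
print(f"2x faster CPU stage:  {overall_speedup(cpu_fraction, 2.0):.2f}x overall")
```

Under these assumptions, a tenfold GPU improvement yields only about a 1.1x end-to-end gain, while doubling CPU-side throughput yields about 1.8x.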

A second scaling curve is emerging

Here's the thing about agentic AI that doesn't get talked about enough: it doesn't scale linearly — it scales multiplicatively.

Every new CPU core you spin up for agent orchestration brings with it a live environment that has to live somewhere. That somewhere is DRAM. And when you start stacking containers by the thousands, the math gets uncomfortable fast. Table 1 makes it visual:

| CPU Container Environments Growth | Memory per Agent | Total Combined Growth |
| --- | --- | --- |
| 1x | 1x | 1x |
| 2x | 2x | 4x |
| 3x | 3x | 9x |
| 4x | 4x | 16x |

Table 1: Combined growth from CPU and memory. 1x = ~22K agentic containers at 16 GB each.

Double the containers, double the memory per agent — and you've just 4x'd your DRAM requirement. Triple both, and you're at 9x. This isn't a rounding error; it's a compounding structural force reshaping data center economics in real time.
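The multiplication can be made concrete with the Table 1 baseline of roughly 22K containers at 16 GB each. The sketch below is a back-of-the-envelope calculation: the absolute terabyte totals are derived arithmetic (in decimal units), not figures stated in this article.

```python
BASE_CONTAINERS = 22_000  # "~22K agentic containers" from the Table 1 caption
BASE_GB_PER_AGENT = 16    # "16GB each" from the Table 1 caption

def total_dram_tb(container_growth: int, memory_growth: int) -> float:
    """Total DRAM in decimal terabytes for the given growth multipliers."""
    containers = BASE_CONTAINERS * container_growth
    gb_per_agent = BASE_GB_PER_AGENT * memory_growth
    return containers * gb_per_agent / 1000

baseline = total_dram_tb(1, 1)
for k in (1, 2, 3, 4):
    total = total_dram_tb(k, k)
    print(f"{k}x containers, {k}x memory per agent: "
          f"{total / baseline:.0f}x combined, about {total:,.0f} TB")
```

Because container count and memory per agent multiply, growing each by a factor of k grows total DRAM by k squared.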

And it makes sense when you look at what each agent is actually carrying: state, context buffers, tool outputs, vector indexes, sandbox memory, runtime overhead. Memory per agent isn't staying flat either — as agents get smarter and tackle more complex tasks, their footprint grows too. The bottleneck is no longer the model math happening on the GPU. It's the explosion of agent environments that need to be alive, warm, and responsive — all at once.

This is the second scaling curve. It runs parallel to the compute curve, and in many ways, it's moving faster.

What this means for the industry

The implications ripple across the entire data center ecosystem:

  • Memory vendors face an accelerating demand curve with an emphasis on high-capacity, high-bandwidth DRAM — not just more memory, but smarter memory architectures.
  • CPU vendors are all pivoting to position their products as agentic orchestration engines, not just AI co-processors.
  • Infrastructure architects must rethink rack design: the era of GPU-dominant racks is giving way to balanced CPU-GPU-DRAM fabrics purpose-built for autonomous agent pipelines.

The bottom line

Agentic AI is not a software trend that lives on top of existing infrastructure — it is actively reshaping the hardware roadmap. The message for anyone building, buying, or investing in AI infrastructure is simple: plan for significantly more DRAM, and plan for it sooner than you think.

The agents are coming. Make sure your infrastructure is ready.

References

1. Arm AGI CPU | AMD on Agentic AI & CPUs

2. Intel: Agentic AI Requires More CPUs

3. NVIDIA Vera CPU Launch — 22,500 CPU container environments in a single rack of 256 CPUs, each with 88 cores.

Fellow of Systems Design Engineering

Sudharshan Vazhkudai

Dr. Sudharshan S. Vazhkudai is a fellow of systems design engineering at Micron Technology, where he established the Data Center & Client Workload Engineering team, which brings an end-to-end systems perspective to understanding how the deep memory hierarchy is used to create modern system architectures optimized for workloads. Before that, he spent more than two decades at Oak Ridge National Lab building data center solutions. Dr. Vazhkudai holds a Ph.D. in computer science from the University of Mississippi and has also served as joint faculty at the University of Tennessee.

Senior Engineering Manager, Data Center Workload Engineering

Sujit Somandepalli

Sujit Somandepalli is a Senior Engineering Manager in the Data Center Workload Engineering group at Micron Technology, Inc., where he leads performance characterization and workload-driven optimization of next-generation memory and storage solutions. His work focuses on bridging application behavior with system-level architecture to drive differentiated value in data center deployments. Sujit brings prior experience from Dell Inc. and Qualcomm Inc., with a strong background in systems design, performance modeling, and workload analysis. He is particularly interested in emerging memory technologies, memory hierarchy tuning, and application-aware system design.

Senior VP and GM, Compute & Networking Business Unit

Raj Narasimhan

Raj Narasimhan is senior vice president and general manager of Micron's Compute and Networking Business Unit. He is responsible for leading Micron’s largest business, driving advances in memory products focused on high-performance computing, artificial intelligence, and cloud and client computing.
