High Bandwidth Memory (HBM)

High Bandwidth Memory (HBM) is a 3D-stacked memory technology that provides the extreme bandwidth AI accelerators need to keep their compute cores supplied with data. It has become the critical memory technology for AI computing: the component that often determines which models can be trained, and at what speed.

The architecture is distinctive. Rather than placing memory chips beside the processor on a circuit board (as with traditional GDDR memory), HBM stacks multiple DRAM dies vertically — 4, 8, or 12 layers high — connected by thousands of through-silicon vias (TSVs). This stack is then mounted directly adjacent to (or on top of) the processor die using a silicon interposer, creating an extremely wide data bus with very short signal paths. The result is dramatically higher bandwidth in a smaller footprint and at lower power.
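The trade-off behind that design can be made concrete with back-of-the-envelope arithmetic: peak bandwidth is bus width times per-pin data rate, and HBM wins by making the bus extremely wide rather than the pins extremely fast. The sketch below uses illustrative, approximate figures (a 1,024-bit HBM3 stack at roughly 5.2 Gb/s per pin, five stacks per GPU, and a 384-bit GDDR6X bus at 21 Gb/s); these are rough public ballparks, not vendor specifications.

```python
# Sketch: how a wide, slow bus beats a narrow, fast one.
# All figures are illustrative approximations, not vendor specifications.

def bandwidth_gb_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth = bus width (bits) * per-pin data rate (Gb/s) / 8 bits-per-byte."""
    return bus_width_bits * pin_rate_gbps / 8

# One HBM3 stack: a 1,024-bit interface at a modest per-pin rate.
hbm3_stack = bandwidth_gb_s(1024, 5.2)   # ~665.6 GB/s per stack
# An H100-class GPU carries five such stacks.
h100_total = 5 * hbm3_stack              # ~3,328 GB/s, i.e. ~3.35 TB/s

# GDDR6X on a consumer card: far faster pins, but a much narrower bus.
gddr6x = bandwidth_gb_s(384, 21.0)       # ~1,008 GB/s

print(f"HBM3 per stack: {hbm3_stack:.0f} GB/s")
print(f"H100 total:     {h100_total:.0f} GB/s")
print(f"GDDR6X card:    {gddr6x:.0f} GB/s")
```

Note that each HBM pin runs slower than each GDDR6X pin; the stacked architecture compensates with roughly an order of magnitude more pins, which is exactly what the interposer's short signal paths make feasible.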

The numbers tell the story. NVIDIA's H100 GPU uses HBM3 providing 3.35 TB/s of memory bandwidth across 80 GB of capacity. The next-generation B200 uses HBM3e at 8 TB/s across 192 GB. For comparison, a high-end consumer GPU with GDDR6X provides roughly 1 TB/s. This bandwidth is essential because AI workloads — particularly transformer-based LLM inference — are memory-bandwidth-limited: the time to generate a token is dominated by reading model weights from memory, not by the arithmetic computation.
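The bandwidth-limited claim can be quantified with a simple lower bound: during autoregressive decoding, every generated token must stream the full set of model weights through the cores once, so per-token latency can never beat model size divided by memory bandwidth. The sketch below applies this bound to an assumed 70-billion-parameter model in FP16 on H100-class HBM3; the model choice is a hypothetical example, not something specified in the text.

```python
# Sketch: why LLM token generation is bandwidth-bound.
# Lower bound on per-token latency = bytes of weights read / memory bandwidth.
# The 70B FP16 model is an assumed example, not a specific deployed system.

def min_token_latency_ms(params_billions: float, bytes_per_param: int,
                         bandwidth_tb_s: float) -> float:
    """Time (ms) just to stream the weights once from memory."""
    model_bytes = params_billions * 1e9 * bytes_per_param
    return model_bytes / (bandwidth_tb_s * 1e12) * 1e3

# 70B parameters * 2 bytes (FP16) = 140 GB of weights per token,
# read at the H100's 3.35 TB/s:
latency = min_token_latency_ms(70, 2, 3.35)
print(f"lower bound: {latency:.1f} ms/token (~{1000 / latency:.0f} tokens/s)")
```

Running this gives roughly 42 ms per token, about 24 tokens per second per GPU at best, regardless of how much arithmetic throughput the chip has. This is why doubling bandwidth (or halving bytes per parameter via quantization) speeds up inference almost linearly.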

The supply chain implications are enormous. HBM is manufactured by just three companies: SK Hynix (market leader), Samsung, and Micron. The complex 3D stacking process has lower yields and longer production cycles than standard DRAM. As AI accelerator demand has surged, HBM has become the most constrained component in the AI supply chain — more so even than the GPUs themselves. SK Hynix's HBM revenue has grown by multiples year over year, and the company's market capitalization reflects its position as a critical AI infrastructure provider.

The technology continues to evolve. HBM4 (expected 2025-2026) will further increase bandwidth and capacity, widening the per-stack interface and potentially using copper-to-copper hybrid bonding in place of microbump attachment for denser die stacking. Processing-in-memory (PIM) architectures go further, placing compute logic within the HBM stack itself to perform simple operations (such as the vector operations common in AI inference) without moving data to the processor at all.

HBM's role in AI infrastructure connects directly to the cost curves that define AI accessibility. The cost of HBM per gigabyte is roughly 5-10x that of standard DRAM, contributing significantly to the $30,000-40,000+ cost of a single AI accelerator. As fabrication advances improve yields and scale, HBM cost reduction will be a key driver of the AI inference cost deflation that enables broader AI adoption.
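The cost contribution can be sketched from the figures above, under a loudly hypothetical baseline: assume commodity DRAM at roughly $3/GB and apply the 5-10x HBM premium to a B200-class 192 GB configuration. None of these dollar figures are quoted vendor prices; the point is only the order of magnitude.

```python
# Sketch of the memory cost share of an accelerator.
# DRAM_USD_PER_GB is an assumed commodity baseline, not a quoted price.

DRAM_USD_PER_GB = 3.0
CAPACITY_GB = 192  # B200-class HBM3e capacity

for premium in (5, 10):
    hbm_usd_per_gb = premium * DRAM_USD_PER_GB
    memory_cost = CAPACITY_GB * hbm_usd_per_gb
    print(f"{premium}x premium: ${hbm_usd_per_gb:.0f}/GB -> "
          f"${memory_cost:,.0f} of HBM per accelerator")
```

Even at the low end of the assumed range, memory alone runs to thousands of dollars per device, which is why HBM yield improvements flow so directly into accelerator, and therefore inference, pricing.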

Further Reading