Overview
HBM (high-bandwidth memory) is built by vertically stacking multiple DRAM dies. The stacked dies are connected to a base logic die at the bottom of the stack with through-silicon vias (TSVs), enabling 8-high or 12-high die stacks in a compact package. This combination of small footprint, high bandwidth, and high per-pin data rates has made HBM the mainstream memory solution for high-performance AI server GPUs.
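As a rough illustration of how stack height translates into capacity, the sketch below multiplies an assumed per-die density by the die count; the 16 Gbit die density is an illustrative assumption, not a figure from this report.

    # Rough per-stack capacity: die density (Gbit) x number of stacked dies, converted to GB.
    # The 16 Gbit per-die density is an assumed, illustrative value.
    def hbm_stack_capacity_gb(die_density_gbit: int, dies_per_stack: int) -> float:
        return die_density_gbit * dies_per_stack / 8  # 8 bits per byte

    print(hbm_stack_capacity_gb(16, 8))   # 8-high stack  -> 16.0 GB
    print(hbm_stack_capacity_gb(16, 12))  # 12-high stack -> 24.0 GB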
The HBM3 extension, HBM3E, offers up to 8 Gbps per-pin transfer speed and 24 GB of capacity per stack. It was first announced by SK Hynix, with volume production planned for 2024.
Market and Adoption
HBM is primarily used in AI servers. The latest HBM3E is deployed in NVIDIA's H200, announced in 2023. According to TrendForce, AI server shipments reached about 860,000 units in 2022 and are expected to exceed 2 million units by 2026, implying a compound annual growth rate of roughly 29%.
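As a quick arithmetic check of that projection (a sketch only, using the 860,000-unit base and the 29% growth rate quoted above):

    # Compound the 2022 base over the four years to 2026 at the stated 29% rate.
    base_2022 = 860_000
    cagr = 0.29
    print(f"{base_2022 * (1 + cagr) ** 4:,.0f}")  # ~2,380,000 units, i.e. above 2 million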
Growth in AI server shipments, together with increases in average HBM capacity per server, is driving a surge in HBM demand. Estimates put the HBM market at about $15 billion in 2025, with year-over-year growth exceeding 50%.
HBM suppliers are mainly concentrated among SK Hynix, Samsung, and Micron. According to TrendForce, SK Hynix's market share in 2023 is estimated at 53%, Samsung 38%, and Micron 9%. Major process developments for HBM focus on CoWoS and TSV technologies.
HBM Generations
HBM1 was introduced in 2014 by AMD and SK Hynix as a competitor to GDDR. It used a 4-high die stack and provided 128 GB/s of bandwidth per stack with 4 GB of total capacity in early products, outperforming GDDR5 at the time.
HBM2 was announced in 2016 and saw broad commercial adoption from 2018. Initially defined for 4-die stacks and later implemented more commonly as 8-die stacks, HBM2 offered a 2.0 Gbps per-pin data rate (256 GB/s per stack), later raised to 2.4 Gbps, with 8 GB of capacity per stack. HBM2E, proposed in 2018 and formalized in 2020, increased transfer speed and capacity to 3.6 Gbps and 16 GB per stack.
HBM3 was released in 2020 and commercialized in 2022. It increased the number of stacked dies and memory channels, supporting 6.4 Gbps per pin and per-stack bandwidth of up to 819.2 GB/s with 16 GB of capacity. HBM3E, an enhanced HBM3 first announced by SK Hynix, provides up to 8 Gbps per-pin transfer speed and 24 GB per stack, with volume production planned for 2024.
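For reference, the per-stack bandwidth figures quoted for each generation follow directly from the per-pin data rate and the stack's interface width. The sketch below assumes the standard 1024-bit interface per stack; the HBM1 and HBM2 per-pin rates used here are assumptions consistent with the 128 GB/s and 256 GB/s figures cited above.

    # Peak per-stack bandwidth = per-pin data rate (Gbps) x interface width (bits) / 8 bits-per-byte.
    def hbm_stack_bandwidth_gbs(pin_rate_gbps: float, bus_width_bits: int = 1024) -> float:
        return pin_rate_gbps * bus_width_bits / 8

    for name, rate in [("HBM1", 1.0), ("HBM2", 2.0), ("HBM2E", 3.6), ("HBM3", 6.4), ("HBM3E", 8.0)]:
        print(f"{name}: {hbm_stack_bandwidth_gbs(rate):.1f} GB/s per stack")
    # HBM3 at 6.4 Gbps -> 819.2 GB/s and HBM1 at 1.0 Gbps -> 128.0 GB/s, matching the figures above.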
Applications in GPUs and Servers
HBM's high bandwidth, low power consumption, and compact size make it well suited for AI server workloads. In NVIDIA's data-center GPUs, HBM2 first appeared in the 2016 P100, followed by the V100 (HBM2) in 2017, the A100 (HBM2) in 2020, the H100 (HBM2E/HBM3) in 2022, and the H200 (HBM3E) in 2023, each generation providing higher speeds and larger capacities for server deployments.
Suppliers and Competitive Landscape
HBM supply is dominated by SK Hynix, Samsung, and Micron. The three memory suppliers handle DRAM die production and stacking and compete on technology upgrades. SK Hynix collaborated with AMD to release the first HBM and has led early supply of HBM3E, securing first-mover advantages and major OEM relationships. Samsung primarily supplies other cloud service providers. TrendForce data for 2022 put SK Hynix at about 50% market share, Samsung at about 40%, and Micron at about 10%; 2023 estimates are SK Hynix 53%, Samsung 38%, and Micron 9%.
Packaging Technologies: CoWoS and TSV
HBM packaging advances primarily involve CoWoS and TSV.
1) CoWoS: CoWoS (Chip-on-Wafer-on-Substrate) mounts the GPU die and HBM stacks on a silicon interposer and then attaches that assembly to the package substrate. Chips are first bonded to the interposer wafer in the Chip-on-Wafer (CoW) step, and the resulting CoW assembly is then mounted on the substrate to form the complete CoWoS package. TSMC's CoWoS is the mainstream solution for integrating HBM with GPUs: by shortening interconnect lengths, it enables higher data transfer speeds and is widely used in accelerators such as NVIDIA's A100 and GH200.
2) TSV: Through-silicon vias are the core technology for scaling capacity and bandwidth. TSVs are formed by etching vertical holes through the full thickness of the silicon and filling them with conductive material, creating thousands of interconnects between the front and back of the die. In HBM, multiple DRAM dies are stacked and connected with TSVs and solder microbumps. Only the bottom die connects externally to the memory controller; the remaining dies are interconnected internally through their TSVs.