High Bandwidth Memory: What's it to you?

By Bryan Chan & Koh Wanzi - 28 Sep 2015

The Radeon R9 Fury X is one of the first cards from AMD to feature HBM. (Image Source: AMD)

If you’ve kept up with developments in PC hardware lately, you might have heard that AMD recently released two new flagship cards – the Radeon R9 Fury and Fury X – equipped with a new-fangled memory type called High Bandwidth Memory (HBM).

HBM turns much of the conventional wisdom about video memory on its head. The NVIDIA GeForce GTX Titan X came with 12GB of GDDR5 memory, which makes the 4GB of HBM on the Fury and Fury X look scarily inadequate for cards of their stature. However, less is actually more with HBM.

Stacked Memory Chips

Four DRAM stacks are arranged around the GPU die. (Image Source: AMD)

HBM involves stacking memory chips vertically like a skyscraper. In the case of the Radeon R9 Fury X and Fury, four such memory chip towers are arranged in close proximity around the GPU die. Each tower consists of four 256MB dies stacked on top of a logic die, which amounts to 1GB per tower and a total of 4GB per card.
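The capacity arithmetic above can be sketched in a few lines of Python. This is only an illustration of the figures quoted in the paragraph; the function names are made up for this example and do not come from AMD.

```python
# Capacity of a stacked-memory layout, using the figures quoted above.
# Function and constant names are illustrative, not from AMD.

DIE_CAPACITY_MB = 256   # capacity of one DRAM die
DIES_PER_STACK = 4      # DRAM dies in each tower (the logic die holds no data)
STACKS_PER_CARD = 4     # towers arranged around the GPU


def stack_capacity_mb(die_mb, dies_per_stack):
    """Capacity of a single tower in megabytes."""
    return die_mb * dies_per_stack


def card_capacity_gb(die_mb, dies_per_stack, stacks):
    """Total capacity of all towers on the card in gigabytes."""
    return stack_capacity_mb(die_mb, dies_per_stack) * stacks / 1024


per_stack = stack_capacity_mb(DIE_CAPACITY_MB, DIES_PER_STACK)
total = card_capacity_gb(DIE_CAPACITY_MB, DIES_PER_STACK, STACKS_PER_CARD)
print(f"Per tower: {per_stack}MB, per card: {total:.0f}GB")
```

Changing the die size or the number of towers in the call shows how the total would scale under either approach.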

Both the HBM towers and the GPU sit atop an ultra-fast silicon-based interconnect called an interposer, which connects the memory to the GPU. They are all linked to each other and the interposer via microscopic wires called Through-Silicon Vias (TSVs) and structures called microbumps.

Finally, the interposer itself is positioned on top of the package substrate. This on-package integration of memory and GPUs actually isn’t new – one of the solutions to increase the speed or bandwidth of a particular component has always been to integrate it onto the CPU or GPU die.

Four DRAM dies are stacked on top of one another in close proximity to the GPU die. (Image Source: AMD)

However, this has always proved costly, and attempts to integrate DRAM onto CPU dies have run into significant space constraints. HBM exploits the same principle of increasing bandwidth by bringing memory closer to the die, but it overcomes the space constraints by stacking the memory chips, and its on-package (as opposed to on-die) integration keeps costs down.

Ultra-wide Bandwidth and Power Efficiency

A quick look at the key differences between HBM and GDDR5. (Image Source: AMD)

While these HBM stacks are not physically integrated onto the GPU die, they are so closely connected that they are almost indistinguishable from on-die integrated RAM. This close proximity enables an extremely wide memory bus, which in turn allows much lower memory clock speeds to deliver the required performance.

With GDDR5 memory, individual DRAM chips are placed side by side on the PCB and connected to the GPU via long copper traces. HBM dies, by contrast, sit so close to the GPU that the traces connecting them are much shorter, which makes a far wider memory bus practical.

On the AMD Radeon R9 Fury X and Fury, each HBM tower has a 1,024-bit memory bus width, and the four towers flanking the GPU die combine for a 4,096-bit bus width, the highest on any GPU so far. From buses just hundreds of bits wide, we’ve rocketed to seeing figures in the thousands, and this is just a first-generation product.

Both of AMD’s HBM-equipped GPUs have 512GB/s of memory bandwidth, beating even the 336.5GB/s of the NVIDIA GeForce GTX Titan X.
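To see why a wider bus allows lower clock speeds, here is a rough sketch of the relationship between bandwidth, bus width and the data rate each pin must sustain. It uses only the 512GB/s and 4,096-bit figures quoted above. The 512-bit bus in the comparison is a hypothetical narrower bus chosen for illustration, not a figure from this article.

```python
# Bandwidth (GB/s) = bus width (bits) x per-pin data rate (Gbps) / 8.
# The helper names and the 512-bit comparison bus are illustrative assumptions.


def per_pin_rate_gbps(bandwidth_gb_s, bus_width_bits):
    """Data rate each pin must sustain (Gbps) to reach the given bandwidth."""
    return bandwidth_gb_s * 8 / bus_width_bits


def bandwidth_gb_s(bus_width_bits, pin_rate_gbps):
    """Total bandwidth (GB/s) for a bus of the given width and per-pin rate."""
    return bus_width_bits * pin_rate_gbps / 8


TARGET_BANDWIDTH_GB_S = 512   # Fury X and Fury, as quoted above
HBM_BUS_BITS = 4096           # four towers x 1,024 bits each
NARROW_BUS_BITS = 512         # hypothetical narrower bus, for comparison only

hbm_rate = per_pin_rate_gbps(TARGET_BANDWIDTH_GB_S, HBM_BUS_BITS)
narrow_rate = per_pin_rate_gbps(TARGET_BANDWIDTH_GB_S, NARROW_BUS_BITS)

print(f"4,096-bit bus: {hbm_rate:.1f}Gbps per pin")
print(f"512-bit bus:   {narrow_rate:.1f}Gbps per pin")
```

Under these assumptions, the 4,096-bit bus needs each pin to run at only one-eighth of the rate that a 512-bit bus would need for the same bandwidth.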

Power consumption is also much reduced because the closer proximity enables more efficient communication between the memory and GPU. Input voltage is down to 1.3V, from 1.5V on traditional GDDR5 memory.

Space Savings

The stacked structure of HBM enables significant space savings over traditional GDDR5 memory. (Image Source: AMD)

Just as skyscrapers help make the most of scarce land, HBM stacks enable significant space savings on PCBs. 1GB of GDDR5 may take up 672mm², but 1GB of HBM requires just 35mm², which translates into a space saving of over 90%.
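The saving follows directly from the two area figures quoted above. A minimal sketch of the arithmetic, with a helper name invented for this example:

```python
# Space saving implied by the per-gigabyte footprints quoted above.
# The helper name is illustrative.

GDDR5_AREA_MM2 = 672   # footprint of 1GB of GDDR5
HBM_AREA_MM2 = 35      # footprint of 1GB of HBM


def space_saving(old_area_mm2, new_area_mm2):
    """Fraction of board area saved by moving from the old to the new footprint."""
    return 1 - new_area_mm2 / old_area_mm2


saving = space_saving(GDDR5_AREA_MM2, HBM_AREA_MM2)
density_gain = GDDR5_AREA_MM2 / HBM_AREA_MM2

print(f"Space saved: {saving:.1%}")
print(f"GDDR5 needs {density_gain:.1f}x the area for the same capacity")
```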

As GDDR5 struggles to keep up with growing memory bandwidth demands, more chips are required, which in turn require more space. Clearly, that isn’t sustainable, and HBM helps reverse the burgeoning size of higher-end cards. The Radeon R9 Fury X’s PCB is just slightly over 19cm long, 30% shorter than that of the Radeon R9 290X, the previous generation’s flagship.

HBM will enable much more compact high-end cards, which will in turn allow powerful systems to be assembled in cases with far smaller space footprints.

Size matters…less

The Radeon R9 Fury X features just 4GB of HBM, but capacity matters less with HBM than it does with GDDR5. (Image Source: AMD)

At the start of this article we mentioned how 4GB of HBM may seem paltry compared to today’s much larger GDDR5 capacities, but AMD does not see its memory capacity as a potential performance limitation. Larger GDDR5 memory capacities have been an inevitable byproduct of the need to increase memory bandwidth by increasing memory bus width, which in turn required more GDDR5 chips. AMD says that this has resulted in more capacity than is needed, and that unused frame buffer capacity serves no functional purpose.

Now, because of the impressive 512GB/s of bandwidth made available by HBM, AMD no longer needs to increase memory capacity for the sake of more bandwidth. Instead, it all comes down to efficiency and how effective drivers and programs are at managing the 4GB frame buffer.

Nevertheless, we can still fully expect larger HBM capacities in the future, either by utilizing larger DRAM dies per stack or increasing the number of stacks.

Why HBM?

It is quickly becoming unsustainable for GDDR5 memory to keep up with growing bandwidth demands. (Image Source: AMD)

Put simply, GDDR5 is fast approaching the limits of its specification. Bandwidth demands continue to grow, and for GDDR5 to keep up, it has had to increase its power consumption accordingly. However, each additional GB/s of bandwidth it squeezes out is beginning to consume too much power to be a sustainable or cost-effective option.

HBM effectively provides more bandwidth for less power, allowing GPU manufacturers to continue to increase bandwidth without running into a “power wall”. In hard numbers, that’s over 35GB/s of bandwidth per watt, compared to GDDR5’s 10.66GB/s, which works out to more than a 3x improvement.
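For a rough sense of what these efficiency figures mean, the sketch below divides the two per-watt numbers and estimates the memory power needed to reach 512GB/s with each technology. It treats 35GB/s per watt as a lower bound for HBM, since the figure is quoted as "over 35". The resulting wattages are illustrative estimates derived from those figures, not measurements of any card.

```python
# Bandwidth-per-watt comparison using the figures quoted above.
# Helper names are illustrative; the wattages are derived estimates only.

HBM_GB_S_PER_WATT = 35.0      # quoted as "over 35", so treated as a lower bound
GDDR5_GB_S_PER_WATT = 10.66
TARGET_BANDWIDTH_GB_S = 512   # bandwidth of the Fury X and Fury


def memory_power_watts(bandwidth_gb_s, gb_s_per_watt):
    """Estimated power needed to sustain a bandwidth at a given efficiency."""
    return bandwidth_gb_s / gb_s_per_watt


efficiency_ratio = HBM_GB_S_PER_WATT / GDDR5_GB_S_PER_WATT
hbm_power = memory_power_watts(TARGET_BANDWIDTH_GB_S, HBM_GB_S_PER_WATT)
gddr5_power = memory_power_watts(TARGET_BANDWIDTH_GB_S, GDDR5_GB_S_PER_WATT)

print(f"Efficiency ratio: at least {efficiency_ratio:.2f}x")
print(f"Power for 512GB/s: HBM ~{hbm_power:.1f}W, GDDR5 ~{gddr5_power:.1f}W")
```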