Damn, this is moving fast. Didn't think 24 Gbps would happen until GDDR7, and that's probably for 2023-2024.
Dunno if RDNA3 will bother with it, though. AMD is trending toward lower main memory bandwidth offset by a lot of on-die Infinity Cache, which works out better for their design: more data sits closer to the chip in a (relatively large, for SRAM) pool of memory with higher bandwidth and lower latency than GDDR6/6X.
Or maybe they will use it, but as a way to get good VRAM bandwidth while simplifying the memory controller setup. For example: instead of needing a 256-bit interface with 8x 16 Gbps GDDR6 chips for 512 GB/s, you could do a 192-bit interface with 6x 24 Gbps chips and get 576 GB/s, or just clock the chips a bit lower and still hit 512 GB/s.
You lose some capacity (12 GB vs. 16 GB), but you get a simpler memory controller setup and a smaller card. It could work for a few different mid-range designs, maybe.
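Quick back-of-the-napkin math for those configs (hypothetical SKUs, just illustrating the arithmetic; assumes 32 data pins and 2 GB per GDDR6 chip):

```python
# GDDR bandwidth/capacity math for the hypothetical configs above.
# Per-chip: 32 data pins; bandwidth (GB/s) = bus width (bits) * pin rate (Gbps) / 8.

def gddr_config(chips, gbps_per_pin, gb_per_chip=2, pins_per_chip=32):
    bus_width = chips * pins_per_chip           # total bus width in bits
    bandwidth = bus_width * gbps_per_pin / 8    # GB/s
    capacity = chips * gb_per_chip              # GB
    return bus_width, bandwidth, capacity

print(gddr_config(8, 16))  # (256, 512.0, 16) -> 256-bit, 512 GB/s, 16 GB
print(gddr_config(6, 24))  # (192, 576.0, 12) -> 192-bit, 576 GB/s, 12 GB
```

So the narrower bus with faster chips trades 4 GB of capacity for a bandwidth bump and two fewer memory channels to route.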
Not for a future GPU multiple times as fast as the current top end, though.
There's only so much SRAM you can fit on a die, and as we go below 5nm costs will balloon further, meaning even less incentive to have large on-die caches.
IC is effectively a stop-gap solution, necessitated because memory bandwidth hasn't scaled nearly as well as transistor count on the processor cores. Once HBM becomes low enough cost in relative terms, IC will be quickly discarded for the TB/s worth of bandwidth and multiple tens of GB memory capacity that HBM can offer.
Possibly. But if IC is already doing well enough in rasterization workloads for RDNA 2 to keep up with (if not beat) Ampere, why get rid of it? They'll never fully discard IC IMO, and some of the patents and documentation for V-Cache (at least per the sources I've read discussing them) make it sound like they'll be bringing that to the GPU side as well, so they clearly see it as part of their long-term solution.
HBM prices will have to come down a bit more before we really see it back in mainstream consumer GPU designs, and even then it depends on which HBM we're talking about. HBM3 IIRC already has some samples testing at 6 or 7 Gbps, but you can bet those chips will go for a premium, too much of a premium for even high-end consumer GPUs for the next 3-4 years I'd say. Plus, while HBM has lower latency and higher bandwidth than GDDR, its latency still can't beat an SRAM cache's. Even supposing IC is the slowest cache on an AMD GPU, its latency will still beat HBM's, and its bandwidth will still win once costs are taken into account: you'd need a good few HBM3 8-Hi stacks to match the bandwidth of a couple hundred MB of IC by the RDNA 4 generation.
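For a rough sense of that stacks-vs-cache comparison: HBM3 uses a 1024-bit interface per stack, and AMD quoted roughly 1.94 TB/s of effective bandwidth for the 128 MB of IC on Navi 21. The 6.4 Gbps pin rate below is just an assumed early-HBM3 figure, and future IC bandwidth would only be higher, so this is a conservative sketch:

```python
# Rough comparison: HBM3 per-stack bandwidth vs. quoted Infinity Cache
# effective bandwidth. 1024 bits/stack is from the HBM3 spec; 6.4 Gbps/pin
# is an assumed early sample rate; 1940 GB/s is AMD's quoted Navi 21 figure.
import math

HBM3_PINS_PER_STACK = 1024
hbm3_gbps_per_pin = 6.4

per_stack_gbs = HBM3_PINS_PER_STACK * hbm3_gbps_per_pin / 8   # GB/s per stack
ic_effective_gbs = 1940                                       # Navi 21 IC figure

stacks_needed = math.ceil(ic_effective_gbs / per_stack_gbs)
print(per_stack_gbs, stacks_needed)  # 819.2 GB/s per stack -> 3 stacks to match
```

And that's just matching today's IC figure, before any generational bandwidth growth on the cache side.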
IMHO, IC isn't going anywhere; memory trends have been moving data closer to the chip anyway, so you'd end up with an arguably worse design by dropping IC and replacing both it and GDDR with just HBM3 or whatever. But if you balance the IC capacity against a reasonable amount of HBM3 and cut out GDDR altogether, you get a very capable design that isn't compromising out of a self-inflicted wound (or compromising at all, aside from potentially IC and VRAM capacity to stay within a certain budget).