
Wii U's eDRAM vs Xbone's eSRAM. Both have 32 MB. Which one's better?

That is not a given. It depends on the configuration used. Compared to the SRAM cache usually seen in GPUs and CPUs, the 32MB in the Xbox One is much, much slower. It is also slower than you might typically expect embedded DRAM to be. In fact it's quite a bit slower than the eDRAM used in the 360.

I don't know that we have bandwidth figures for the Wii U's eDRAM, but it could very well have more bandwidth than the Xbox One's. It also has a GPU something like 1/8th as powerful. The ratio of size and bandwidth to FLOPs in the Xbox One is actually pretty bad for an embedded memory design.
I may not know a lot about memory types and the expert technicalities of it, but this just seems really really inaccurate. If it's that bad why would MS have even gone with ESRAM?

Exactly.

Edit: I mean, it sounds like you're saying the Wii U is stronger than XBO, which is impossible. Microsoft couldn't make a system weaker than Nintendo's by accident, let alone purposefully. This just sounds like a bunch of FUD.
 

ShamePain

Banned
I may not know a lot about memory types and the expert technicalities of it, but this just seems really really inaccurate. If it's that bad why would MS have even gone with ESRAM?

Exactly.

Edit: I mean, it sounds like you're saying the Wii U is stronger than XBO, which is impossible. Microsoft couldn't make a system weaker than Nintendo's by accident, let alone purposefully. This just sounds like a bunch of FUD.

Because if it wasn't for eSRAM, the Xbone would have trouble outputting 540p, let alone 1080p or anything close to it.
 
Brad Grenz is right, cheezcake is deeply confused, Fourth Storm tells the story.

Short answer: MS wish they could use eDRAM but they could not.


I'm not so sure about this. Doesn't the PS4 have the best RAM setup and a 30-40% more powerful GPU? Yet the differences in multiplatform games lately are pretty small. If eSRAM were so bad, how is the XB1 keeping up with the PS4 in multiplatform games with a much weaker GPU? Yes, I'm aware it's a 900p vs 1080p difference, but the XB1 seems to have a slight frame rate advantage in some of those games.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
I'm not so sure about this. Doesn't the PS4 have the best RAM setup and a 30-40% more powerful GPU? Yet the differences in multiplatform games lately are pretty small. If eSRAM were so bad, how is the XB1 keeping up with the PS4 in multiplatform games with a much weaker GPU? Yes, I'm aware it's a 900p vs 1080p difference, but the XB1 seems to have a slight frame rate advantage in some of those games.
I was speaking from the perspective of MS wanting an embedded pool in the first place. Had they decided not to go with such a pool, the eDRAM vs eSRAM debate would be immaterial.
 
I was speaking from the perspective of MS wanting an embedded pool in the first place. Had they decided not to go with such a pool, the eDRAM vs eSRAM debate would be immaterial.

Oh OK, and of course eDRAM would be much cheaper as well.
 

twobear

sputum-flecked apoplexy
I was under the impression that MS chose eSRAM for cost purposes later in the generation. They could have had a bigger, faster pool of eDRAM but it would need a separate die and they'd be limited to contracts with the foundries that can actually produce eDRAM.
 

Overside

Banned
Whuh? The 360 and Wii U hardware architectures are nothing alike. They have the same instruction set architecture, but... that's not usually something people really care to compare:

'Oh yeah, look at the instruction set on this baby, reduced just the way daddy likes it!'

Xbone uses mainly SRAM, which is generally faster and has higher bandwidth, but that advantage is offset by the fact that those SRAM transistors take up so much space that you can fit a ton of DRAM into the same footprint you'd use for just a little SRAM. Perhaps MS should have considered a pseudo-static solution with just 1 transistor per cell?

I'm pretty sure the Xbone eSRAM outperforms the Wii U's eDRAM by a significant margin when compared directly. However, I don't feel the Xbone's SRAM is enough, or does enough, for the performance the system has and needs, while the Wii U's eDRAM is pretty plentiful, flexible and useful for the performance needs of the Wii U.
 
I am not an Xbone developer so I'm not sure why those chunks are split up, but if I remember right, the Xbox One's GPU, CPU and memory are all split into 2 modules. It seems very curious why they did this with the memory.

Ultra wide memory buses require a LOT of interconnects. Take a look at this picture of the GK110 die from the Titan:

[Image: GK110 die shot]

The entire outside of the die minus the bit in the top right is basically interconnects back to VRAM.

32MB is a lot of die area but not much edge length. If you split the RAM into two pieces, it gives you more edge length for the interconnects.

Look at a Core i7 die for instance:

[Image: Core i7-980X die shot]


You can see the L3 cache is split up into many small blocks to make sure there's enough room for the interconnects. You'll also notice the memory controller is a lot smaller because it's only 128-bit compared to 384-bit on the GK110 and because the die is much much longer as well.
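A rough back-of-the-envelope way to see the edge-length point (purely geometric, assuming idealised square SRAM blocks - not actual die measurements, and the function name is just mine):

Code:
import math

def total_edge_length(total_area, n_blocks):
    # Total perimeter if a memory array of `total_area` (arbitrary units)
    # is laid out as n equal square blocks.
    side = math.sqrt(total_area / n_blocks)
    return n_blocks * 4 * side

print(total_edge_length(1.0, 1))  # 4.0 edge units as a single block
print(total_edge_length(1.0, 2))  # ~5.66, i.e. ~41% more edge for interconnects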
 

Aroll

Member
I think the big difference in all of this is hardware design. I believe, technically speaking, the Xbox One's version COULD have been capable of more than the Wii U's - but due to the hardware it serves, its purpose is sort of left behind. However, since the Wii U's hardware is what it is, that eDRAM can make a HUGE difference for game performance. As in, Nintendo's inclusion is better utilized due to the hardware, meaning it makes a bigger difference, whereas the Xbox One's usage is a bit of a wash, making it not much of an advantage over choosing not to use it much at all.
 

cheezcake

Member
Brad Grenz is right, cheezcake is deeply confused, Fourth Storm tells the story.

Short answer: MS wish they could use eDRAM but they could not.

That's weird since fourth storm is also saying eSRAM is faster than eDRAM. Do you guys really believe that the memory architecture used on a console in 2005 offered more BW than both current gen consoles and no one in the meantime decided to replicate it?
 
That's weird since fourth storm is also saying eSRAM is faster than eDRAM. Do you guys really believe that the memory architecture used on a console in 2005 offered more BW than both current gen consoles and no one in the meantime decided to replicate it?

Perhaps it was underutilized and they went for a lower power consumption part, but then again I wonder how much it costs to make one versus the other?
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
That's weird since fourth storm is also saying eSRAM is faster than eDRAM. Do you guys really believe that the memory architecture used on a console in 2005 offered more BW than both current gen consoles and no one in the meantime decided to replicate it?
SRAM is faster than DRAM at random access - namely it's better at latencies*. Same goes for eSRAM vs eDRAM. This entire thread has mostly been about bandwidth, where SRAM has nothing on DRAM. Actually, BW is entirely a function of bus width and clocks. Xb360's huge ROP BW was due to the massive connectivity between ROP logic and actual memory macros - something which xbone does not have between its GPU and eSRAM pool. What it does have, though, is a wider/faster bus than the original Xenos <-> eDRAM bus, so the BW between xbone's GPU and the eSRAM is better for anything other than ROPs/ZOPs. Moreover, the full extent of the ROP advantage of Xenos was only applicable during MSAA - something which the amount of eDRAM in xb360 was a hindrance to.

* And that is true only up to certain sizes, above which eDRAM's inherently better densities start to yield better overall latencies.
 

HTupolev

Member
That's weird since fourth storm is also saying eSRAM is faster than eDRAM. Do you guys really believe that the memory architecture used on a console in 2005 offered more BW than both current gen consoles and no one in the meantime decided to replicate it?
SRAM is a "low overhead" memory with a simple interface, so small pools get minimal latency, giving good random-access performance. It doesn't really have a peak bandwidth advantage, however, and the physical size means that it can actually fall behind DRAM for latency with large pools.

The reason that the 360's eDRAM BW is so high is that the DRAM is in a pool that's coupled very tightly to the ROPs. The 256GB/s number is very real, it's just ROP-only and the 10MB size mitigates some of its usefulness (i.e. you can't use the "free" MSAA on a high-resolution render target unless you tile your buffer).
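To put rough numbers on the tiling point, here's a sketch assuming the usual 32-bit colour plus 32-bit depth/stencil per sample (the variable names are mine):

Code:
# 720p with 4xMSAA against the 360's 10MB of eDRAM (rough numbers)
width, height = 1280, 720
bytes_per_sample = 4 + 4              # 32bpp colour + 32bpp depth/stencil
samples = 4                           # 4xMSAA

buffer_mb = width * height * samples * bytes_per_sample / (1024 ** 2)
print(buffer_mb)                      # ~28.1 MB, far over the 10MB pool,
                                      # hence splitting the frame into ~3 tiles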

The reason console manufacturers don't always use pools like that is simply that they make the design more complex and, although they accelerate some tasks, they can have disadvantages in other tasks (sometimes the pool size can be an issue, sometimes it's simply a waste of die space when the BW or latency isn't a bottleneck, etc).

//========================================

In the case of XB1, it's likely that SRAM was used partly to guarantee future manufacturability. DRAM cells are a little bit odd, and not all fabrication processes can make them, whereas SRAM uses the same transistor logic as used in the processors. Any fab that can manufacture the CPU and GPU can also manufacture the SRAM.
 

Alchemy

Member
Even if eDRAM vs eSRAM ended in a draw, Xbone games are pushing much more information per frame (games have 5GB of RAM to play with), so memory bandwidth is significantly more important than it is for Wii U games; the Wii U has a total system memory of less than half that at 2GB, and I'm not sure how much is accessible to games.
 

Rolf NB

Member
I may not know a lot about memory types and the expert technicalities of it, but this just seems really really inaccurate. If it's that bad why would MS have even gone with ESRAM?
It's an engineering fuckup. It's what happens if you try to design things in a field you don't have expertise in. The moment they started using the term ESRAM it was clear they wouldn't be able to compete on price or performance.
 

ShapeGSX

Member
It's an engineering fuckup. It's what happens if you try to design things in a field you don't have expertise in. The moment they started using the term ESRAM it was clear they wouldn't be able to compete on price or performance.

Oh yeah, SRAM caches never increase performance. That's why you never see SRAM on modern processors.

Wait...what?
 

LordOfChaos

Member
We don't know the bandwidth of the Wii U's eDRAM. But based on its main memory being 12.8GB/s vs 68GB/s, and its GPU being a 160 (or 192, maybe, probably not) shader part, I'd guess it's a fair bit slower than the 'Bone's, because it doesn't have to be as fast.

I think the educated guesses for it were around 70GB/s?

I'm curious if the Wii U uses it as a cache rather than a memory pool, though. The important difference is that with the latter, like in the 'Bone, you have to micromanage what goes into it, while a cache would guess what it needs next and prefetch it before it's needed.

Intel's Iris Pro 5200 graphics does this with 128MB of eDRAM, and it works pretty well: at least double the performance of the next fastest part without that cache, even with similar shader counts and clock speeds.
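A toy sketch of that distinction in Python - nothing to do with any real console or Intel API, and the names (use_scratchpad, TinyCache, load_fn) are made up - just to illustrate "manually managed pool" vs "cache that fetches for you":

Code:
from collections import OrderedDict

# Scratchpad/pool: the programmer decides exactly what lives in fast memory.
scratchpad = {}
def use_scratchpad(name, load_fn):
    if name not in scratchpad:                  # dev must explicitly stage the data
        scratchpad[name] = load_fn(name)
    return scratchpad[name]

# Cache: recently used data is kept around automatically (LRU here).
class TinyCache:
    def __init__(self, capacity):
        self.capacity, self.data = capacity, OrderedDict()
    def get(self, name, load_fn):
        if name in self.data:
            self.data.move_to_end(name)         # hit
        else:
            self.data[name] = load_fn(name)     # miss: fetched transparently
            if len(self.data) > self.capacity:
                self.data.popitem(last=False)   # evict least recently used
        return self.data[name]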

therefore obviously we're not seeing problems with native 720p and even 1080p.

On the same games? I think not. The XBO has some issues because it's pushing way more operations per pixel, and often at 900p. Only a few Wii U games are 1080p, and those are simpler than what the XBO struggles with. All games at the same resolutions aren't the same.
 

LordOfChaos

Member
In fact it's quite a bit slower than the eDRAM used in the 360.




Only to the ROPs though, not everything else. The ROPs were right on the eDRAM daughter die on the 360.

[Image: Xbox 360 bandwidth diagram]


You got a crazy-for-the-time 200+ GB/s exclusively to the ROPs; the rest of the GPU accessed it at a relatively ho-hum 32GB/s.


Edit: Oh you already knew this and posted it later cheezcake...But to this part:
Dude just look at the diagram. How is the GPU going to feed the ROPs without first being BW limited by the actual eDRAM-GPU bus BW which is 32 GB/s.



Because the ROPs are ON the eDRAM daughter die. They get direct, very fast access to it, while the rest of the GPU goes through that 32GB/s bus. And Brad was correct in saying that the ROPs are a huge bandwidth hog on GPUs, so it does take a lot of strain off the GDDR bus on 360.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
We don't know the bandwidth of the Wii U's eDRAM. But based on its main memory being 12.8GB/s vs 68GB/s, and its GPU being a 160 (or 192, maybe, probably not) shader part, I'd guess it's a fair bit slower than the 'Bone's, because it doesn't have to be as fast.

I think the educated guesses for it were around 70GB/s?
Assuming a 1024bit bus, it'd be 70.4GB/s (please note the use of GB vs GiB, since we've had similar confusion in previous threads). Now, somebody did an educated macro/bus guesstimate long ago (in the Espresso thread, I think) based on Renesas' publicly available macro data, but I don't recall what it was. But 70GB/s does make the most sense design-wise, as the non-embedded ROPs mean that the Read-Modify-Write ROP cycle has to be met by the bus (vs Xenos' Write-only, due to embedding of ROPs).
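For reference, a minimal version of the width x clock arithmetic behind that figure; the 1024-bit bus is the assumption above, and 550MHz is the commonly cited Wii U GPU clock, not something established in this thread:

Code:
bus_width_bits = 1024                 # assumed bus width
clock_hz = 550e6                      # commonly cited Wii U GPU clock

bandwidth_gb_s = (bus_width_bits / 8) * clock_hz / 1e9
print(bandwidth_gb_s)                 # 70.4 GB/s (decimal GB, not GiB)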
 

LordOfChaos

Member
My point with Wii U's cache being better is that it is a luxury; considering the much lower performance of the Wii U's GPU, it is proportionally more of an edge for the Wii U.

That's one way to look at it, but on the other hand think about how the XBO has nearly 6x the main memory bandwidth, making the Wii U more reliant on that "luxury" of 32MB eDRAM.

Heck, the XBO's main bandwidth is nearly the speed of the Wii U's eDRAM as we've estimated above.

I do see your point though, not shooting it down, just a different perspective. Of course a GPU with maybe 5x the shaders will need nearly proportionally more bandwidth.

Hm, actually come to think about it, working out the rough numbers they seem pretty proportional on the main memory bandwidth at least.
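Roughly what that sanity check looks like with the figures quoted in this thread (160 shaders and 12.8GB/s for Wii U, 68GB/s for the XBO's DDR3; the 768 shader count for the XBO is the commonly cited figure, not something from this thread):

Code:
wii_u = {"shaders": 160, "main_bw_gb_s": 12.8}
xbo   = {"shaders": 768, "main_bw_gb_s": 68.0}

print(xbo["shaders"] / wii_u["shaders"])            # ~4.8x the shaders
print(xbo["main_bw_gb_s"] / wii_u["main_bw_gb_s"])  # ~5.3x the main memory bandwidth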

These serve identical functions; I wouldn't consider either to be an advantage over the other, but Wii U's extra 3MB on the GPU should help with some smaller effects, such as fur. Honestly though, it must feel like some sort of superpower on the Wii U, giving it an unbalanced edge considering its much lower graphical performance, so I'd give it to Wii U just based on that.

The 2MB part can be used as general-purpose RAM in Wii U mode; the 1MB part may not be.
https://twitter.com/marcan42/status/298922907420200961
 

AlStrong

Member
Only to the ROPs though, not everything else. The ROPs were right in the eDRAM on the 360.

Yeah, they are essentially designed to read or write exactly what the 8 ROPs are capable of @ full rate (32bpp).

i.e. 8 ROPs @ 500MHz (read + write) * 32bpp colour + depth pixels, be it 1 sample per pixel or 4 samples per pixel.

@ 1 sample per pixel, it's 32GB/s read + write (64GB/s)
@ 4 samples per pixel, it's 128GB/s read + write (256GB/s)

The bandwidth is simply what was needed and provided since that was the exact connection in HW.

However, in order to do any further ALU/tex on buffers (post-processing etc), you have to resolve to main memory, so MSAA'd targets get spit out at 1 sample per pixel anyway.
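The same arithmetic written out, just restating the figures above:

Code:
rops = 8
clock_hz = 500e6                      # Xenos daughter-die clock
bytes_per_sample = 4 + 4              # 32bpp colour + 32bpp depth

def rop_bw_gb_s(samples_per_pixel):
    one_way = rops * clock_hz * samples_per_pixel * bytes_per_sample / 1e9
    return one_way, 2 * one_way       # read (or write) rate, and read+write combined

print(rop_bw_gb_s(1))                 # (32.0, 64.0) GB/s
print(rop_bw_gb_s(4))                 # (128.0, 256.0) GB/s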
 

DopeyFish

Not bitter, just unsweetened
Brad Grenz is right, cheezcake is deeply confused, Fourth Storm tells the story.

Short answer: MS wish they could use eDRAM but they could not.

Er... MS actually wanted to use ESRAM and if the implementation wasn't possible, they'd just use EDRAM.

why? ESRAM is better than EDRAM... Problem is it costs a LOT. It takes up more space per bit... (They would have probably used 256 MB of EDRAM if ESRAM wasn't possible)

This stuff was in the original Yukon leak, FYI.
 

ShapeGSX

Member
It's not a cache.

True, it is manually managed, rather than automatically managed by hardware like a traditional cache is. Other than that, it is relatively similar in actual structure to an SRAM cache or register file. There are tons of SRAM structures all over modern CPUs.

But it definitely does increase the performance of the Xbox One.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Er... MS actually wanted to use ESRAM and if the implementation wasn't possible, they'd just use EDRAM.

why? ESRAM is better than EDRAM...
No, for the purposes of xbone it isn't. You got it in reverse - they used SRAM because they could not get EDRAM, which is the obvious choice for their purposes. They ended up with a huge die for no good reason but time-to-market.
 
Yes, eDRAM could have allowed them to add far more on-chip memory at a much higher bandwidth for the same die size. That's the whole reason companies developed eDRAM instead of just using more SRAM.
 

z0m3le

Banned
That's one way to look at it, but on the other hand think about how the XBO has nearly 6x the main memory bandwidth, making the Wii U more reliant on that "luxury" of 32MB eDRAM.

Heck, the XBO's main bandwidth is nearly the speed of the Wii U's eDRAM as we've estimated above.

I do see your point though, not shooting it down, just a different perspective. Of course a GPU with maybe 5x the shaders will need nearly proportionally more bandwidth.

Hm, actually come to think about it, working out the rough numbers they seem pretty proportional on the main memory bandwidth at least.



The 2MB part can be used as general-purpose RAM in Wii U mode; the 1MB part may not be.
https://twitter.com/marcan42/status/298922907420200961

The slower Wii U RAM is fast enough for assets, but you can't use the main memory for GPU operations to speed up performance, which is the main reason you'd want embedded memory on the chip rather than across a bandwidth-limited bridge anyway. The latency is too high to help with individual frames. You can brute force past this, which is what the PS4 does with over twice the XBone's main memory bandwidth, but if programmed without care it can lead to frame drops that the XBone's eSRAM will avoid by feeding the GPU (since the PS4's memory has much more latency than the embedded on-die memory of either other console).

Performance differences aside, the PS4's RAM bandwidth is a simple solution for developers: as long as they aren't reckless with their coding, it allows the data to feed the GPU in time for the next frame. The Wii U's is the next best setup IMO, because while the Xbone's RAM is similarly set up and its main memory is faster (leading to better load times thanks to assets being loaded faster), the Wii U's latency is probably lower, there is nearly 10% more embedded memory with even lower latency, and the CPU can read/write the eDRAM on the Wii U's GPU, which isn't possible with the XBone's eSRAM AFAIK. You also have a physically smaller space for the data to be in; I'm not sure what the latency is, again, but for what you would be using this for (individual frames) it should be more than enough speed.

There is nothing here in the way of which console is stronger; they are all too clearly bottlenecked by ALU count/clock speed for memory to ever be the deciding factor. All the consoles have different bottlenecks: the Wii U is limited by simply having weaker hardware than the XB1, and likewise the Xbone's hardware is weaker than the PS4's (though not as severely).

My perspective, though, is that for its performance the Wii U has the best memory setup, and it is similar to Intel's "Iris" configuration, with the difference being that the CPU is not on the same die (though again it can still read/write this memory, which is hugely important for some special tasks like GPGPU).

If Nintendo released a console with 10x the performance, I'd expect them to use 64MB minimum for this cache in order to take advantage of the extra ALUs more readily. Of course it would help to have faster main memory, but since you are mainly using that to load assets, I don't see it as a bottleneck, more a reason for slower loading times in the first year or two before developers really noticed it.
 
I see now. I was wondering why MS didn't go for eDRAM for the Xbone like they did with the 360 - wasn't it a separate chip next to the GPU? It might have freed some die space for more ROPs and whatnot. But I guess this would be a less effective solution than what we have now.

MS claims they couldn't fit that much eDRAM on a single die, and going with 2 dies again would mess up their design. (This time around they needed the whole GPU to access the embedded RAM, instead of just the ROPs like on the 360, which did have lots of bandwidth, but it was kinda wasted due to the design.)
 

z0m3le

Banned
MS claims they couldn't fit that much eDRAM on a single die, and going with 2 dies again would mess up their design. (This time around they needed the whole GPU to access the embedded RAM, instead of just the ROPs like on the 360, which did have lots of bandwidth, but it was kinda wasted due to the design.)

Wii U's single GPU die has more memory (35MB) on it, almost all of which is edram... so this is a false claim.
 
The on-chip bandwidth of the 360's 10MB eDRAM is 256GB/s. The eSRAM in the Xbox One has a typical bandwidth of only 109GB/s, with some potential for more than that in ideal read/write operations.

That's not due to the RAM per se, but rather the bus by which the devices are connected to the RAM.

On the 360, they had very simple ROPs with no compression, and the high BW was simply as fast as they could possibly read/write in the worst-case scenario (the heaviest data format it could write with 4xMSAA).

On the Xbone, the same rule applies: the eSRAM provides exactly as much bandwidth as the GPU bus can possibly use, but in this case you have the entire GPU addressing this RAM, not just a few fixed-function units (which is what lets you make the bus so much wider).
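For reference, the often-quoted 109GB/s for the XB1's eSRAM falls out of the same width x clock arithmetic; the 853MHz clock and the 1024-bit path per direction are commonly cited figures rather than something stated in this thread:

Code:
clock_hz = 853e6                      # commonly cited XB1 GPU clock
bus_width_bits = 1024                 # assumed width in each direction

one_way_gb_s = (bus_width_bits / 8) * clock_hz / 1e9
print(one_way_gb_s)                   # ~109.2 GB/s; overlapping reads and writes
                                      # is what pushes the combined figure higher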
 
Wii U's single GPU die has more memory (35MB) on it, almost all of which is edram... so this is a false claim.

The Xbone GPU is done on a smaller process node than the Wii U's GPU (which was an issue even on the 360: the main die shrank better than the eDRAM die), so they could either move the entire GPU to a bigger node, which would have made it way larger, or switch to eSRAM, which was available on the desired manufacturing process.

And actually, the Xbone SoC has more on-die memory. It has 48MB of eSRAM in total; 32MB is for the "special" buffer, but there are lots of caches there too.
 
No, for the purposes of xbone it isn't. You got it in reverse - they used SRAM because they could not get EDRAM, which is the obvious choice for their purposes. They ended up with a huge die for no good reason but time-to-market.

Yes, eDRAM could have allowed them to add far more on-chip memory at a much higher bandwidth for the same die size. That's the whole reason companies developed eDRAM instead of just using more SRAM.

It is known.
 

twobear

sputum-flecked apoplexy
So the issue to do with fabrication is just a red herring? I'd thought that at least part of the reason they went with eSRAM was so that they were free to shop around different foundries to produce the chip, rather than just be limited to the ones that can produce eDRAM.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
So the issue to do with fabrication is just a red herring? I'd thought that at least part of the reason they went with eSRAM was so that they were free to shop around different foundries to produce the chip, rather than just be limited to the ones that can produce eDRAM.
That is a good part of the issue. HTupolev actually addressed it in an earlier post - SRAM can be made on the exact same process as the rest of the chip. Not so with eDRAM, which might require a different process (read: multiple litho masks etc), and that is the prime reason the eDRAM in the xb360 was on a separate die from Xenos. So in order for MS to get their high-BW pool as eDRAM and also keep it on the same die as the GPU, they would have had to make serious adaptations of the eDRAM macros to the factory's 28nm process. They chose the faster-to-market but less efficient (WRT transistors/area) option - eSRAM.
 

z0m3le

Banned
The Xbone GPU is done on a smaller process node than the Wii U's GPU (which was an issue even on the 360: the main die shrank better than the eDRAM die), so they could either move the entire GPU to a bigger node, which would have made it way larger, or switch to eSRAM, which was available on the desired manufacturing process.

And actually, the Xbone SoC has more on-die memory. It has 48MB of eSRAM in total; 32MB is for the "special" buffer, but there are lots of caches there too.

47MB, it seems. I was unaware of this, thanks for the correction:
http://www.extremetech.com/gaming/1...ered-reveals-sram-as-the-reason-for-small-gpu

Also, in the die shot I posted earlier you can see the main 32MB of SRAM that the GPU has full access to. The other 14MB or so of this 47MB is less clear: 4MB comes from the CPU, and an assumed 10MB is the smaller block, with the rest being L2 cache for the GPU. This smaller 10MB might be a mirror cache for the CPU and GPU to share data, which is how it would have to do GPGPU calculations? Also, the 32MB is split into 4 separate blocks of 8MB... if that is addressed separately, I can see why it would be a hassle; managing so many different areas with slightly different latencies could cause a headache. This is not the way you make a memory system simple and user friendly.
 

LordOfChaos

Member
I can see why it would be a hassle; managing so many different areas with slightly different latencies could cause a headache. This is not the way you make a memory system simple and user friendly.

Nothing has suggested the other 10MB is dev-accessible; it possibly just reduces other memory accesses invisibly as a victim buffer, or has another purpose. Everything I've seen has indicated 32MB was under dev control.

That complication is also what that Wii U subsystem you were just praising does a lot of, with a 32MB chunk, a 2MB chunk with different latencies, and a further 1MB chunk with different latencies.

My perspective, though, is that for its performance the Wii U has the best memory setup, and it is similar to Intel's "Iris" configuration, with the difference being that the CPU is not on the same die (though again it can still read/write this memory, which is hugely important for some special tasks like GPGPU).


There's a huge distinction between Iris Pro and the consoles (actually two), the first being that the Iris has an automatically managed cache that greatly reduces main memory requirements without dev interaction. This is great for automatically increasing the performance in all games, but the consoles don't do this, presumably for performance consistency reasons, giving control over it to the devs with no automatic caching.

The second difference is that the Iris Pro Crystalwell memory does not actually store framebuffers at all, interestingly. Rather, it's focused on caching assets before they're needed.

Intel said you would need a 170GB/s GPU memory bandwidth to equal what it has with the eDRAM+DDR3.

Also, like you said, the CPU can use it, but it's used as a victim buffer for anything that spills out of L3 - again automatically, and again great for increasing the performance of everything - whereas the consoles (the Wii U alone in this case) want consistency and dev control, so it's manual if you want to use it as an in-between scratchpad.

The slower Wii U RAM is fast enough for assets, but you can't use the main memory for GPU operations to speed up performance, which is the main reason you'd want embedded memory on the chip rather than across a bandwidth-limited bridge anyway. The latency is too high to help with individual frames. You can brute force past this, which is what the PS4 does with over twice the XBone's main memory bandwidth, but if programmed without care it can lead to frame drops that the XBone's eSRAM will avoid by feeding the GPU (since the PS4's memory has much more latency than the embedded on-die memory of either other console).

Not sure where you're getting this. GPUs are by design highly latency-insensitive. And frames are 16-33 milliseconds, while memory access times are measured in single to low double-digit nanoseconds, even with GDDR5. You could absolutely fetch assets through main memory in time for a frame.
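A rough sense of scale for that point (ballpark figures only, not measured numbers; the 100ns round-trip guess is mine and deliberately pessimistic):

Code:
frame_ms = 16.7                       # one frame at 60fps
mem_latency_ns = 100                  # pessimistic round-trip guess; raw DRAM
                                      # timings are in the tens of nanoseconds

round_trips_per_frame = frame_ms * 1e6 / mem_latency_ns
print(round_trips_per_frame)          # ~167,000 sequential accesses per frame,
                                      # before counting any parallelism at all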
 