
About the Xbone's ESRAM size ..

Drek

Member
Well, isn't it the case that CU cores mostly wait for data? Assuming this is true, and the data to be read lies in the ESRAM and not the main RAM, you have barely any idle cycles, especially since the move engines can independently move data from the DDR RAM into the ESRAM in parallel while this data is being processed at the same time (compression/decompression of textures, tiling, untiling - which are normally also GPU tasks).

This assumes that the computing elements are being fed entirely by ESRAM, which means you're using the ESRAM as a fixed buffer with no workaround. Meanwhile the entire memory architecture of the XB1 is supposed to allow for simultaneous data flow from the ESRAM and the DDR3 main RAM to the computing elements.

Basically, you'd significantly cripple total system bandwidth to protect against a missed cache that costs you a few milliseconds of compute time, when all the operations that are likely to cause such a missed cache are things where you can live with the drop on the consumer side (because it's only really going to happen when the consumer has just suddenly changed apps/functions, so a brief hiccup/load interval won't seem out of place).

No one will do that, and that is only even a factor on non-gaming related operations. Anything gaming related will do just fine with GDDR5 latency levels.

The ESRAM could make for an incredibly snappy OS, but since MS wants to simultaneously run the game environment and the full OS I'd imagine they won't let the OS touch the ESRAM in most situations. It's a little too important to game performance to let the OS "own" any of it.
 
The ESRAM is in 8MB blocks, right? Not sure why they would need to double or quadruple it. That (128MB ESRAM) was not what I was insinuating at all ..

I figure 8MB more ESRAM would be ~30mm2 at most.

Oh wow .. first the "Poopstation" comment and this. Some people are seriously trying hard to derail this thread for obvious reasons.

I think Microsoft may have picked the perfect amount for eSRAM. It was really interesting when even Intel said they found that 32MB of EDRAM seemed very optimal, even though they went much further than that.

Yea, we won't focus on what people choose to refer to the system as anymore, since it would likely derail one of the more civil threads on this kind of thing so far.

This assumes that the computing elements are being fed entirely by ESRAM, which means you're using the ESRAM as a fixed buffer with no workaround. Meanwhile the entire memory architecture of the XB1 is supposed to allow for simultaneous data flow from the ESRAM and the DDR3 main RAM to the computing elements.

Basically, you'd significantly cripple total system bandwidth to protect against a missed cache that costs you a few milliseconds of compute time, when all the operations that are likely to cause such a missed cache are things where you can live with the drop on the consumer side (because it's only really going to happen when the consumer has just suddenly changed apps/functions, so a brief hiccup/load interval won't seem out of place).

No one will do that, and that is only even a factor on non-gaming related operations. Anything gaming related will do just fine with GDDR5 latency levels.

The ESRAM could make for an incredibly snappy OS, but since MS wants to simultaneously run the game environment and the full OS I'd imagine they won't let the OS touch the ESRAM in most situations. It's a little too important to game performance to let the OS "own" any of it.

It doesn't necessarily have to be something that devs do for every piece of data, but just certain pieces of data that they feel would benefit greatly from being resident inside eSRAM. The process of moving data from DDR3 to eSRAM, be it with a shader or one of the move engines, no doubt happens a lot faster than people may be appreciating. And in this particular case, the eSRAM being a rather small 32MB may largely work out to the system's benefit when you start thinking about passing very small pieces of data over to it using memory pathways that are, by any stretch of the imagination, more than up to the task.

There will never be too much data being passed at any one time, or even consistently enough, for there to be enough time wasted to cripple overall system bandwidth, unless you were just purposely being careless and ignoring how much data is being sent there. Accounting for the fact that some things will simply remain in eSRAM at all times, possibly never leaving, that leaves you with even less than 32MB worth of data to think about once data copying is considered.

Whether it be through the move engines or a shader, you're dealing with a bare minimum of 25.6GB/s of memory bandwidth for a single move engine transferring data to a reasonably small 32MB pool of memory. And if you are only ever copying less than 32MB of data at any one time, then 25.6GB/s of memory bandwidth starts to look quite massive for such a small task. Even if you had all 4 move engines working together, that would still give you a very favorable-looking 6.4GB/s per move engine for copying to a small 32MB pool. And things look just as favorable working in reverse, if you're moving something from eSRAM to DDR3.

The move engines only require a smaller portion of the system's memory bandwidth, and they can work simultaneously with GPU computation.
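As a rough sanity check on those figures (taking the 25.6GB/s single-engine and 6.4GB/s four-way-split rates quoted above at face value; this is just back-of-the-envelope arithmetic, not a claim about real hardware behaviour), moving the entire 32MB pool would only take a handful of milliseconds:

```python
# Back-of-the-envelope transfer times for the 32MB ESRAM pool.
# The 25.6 GB/s and 6.4 GB/s rates are the ones quoted in the post above,
# taken at face value; real behaviour will differ.
ESRAM_BYTES = 32 * 1024 * 1024

def copy_time_ms(num_bytes, rate_gb_per_s):
    """Time in milliseconds to move num_bytes at the given rate (GB/s)."""
    return num_bytes / (rate_gb_per_s * 1e9) * 1e3

print(copy_time_ms(ESRAM_BYTES, 25.6))  # one engine with the full budget: ~1.3 ms
print(copy_time_ms(ESRAM_BYTES, 6.4))   # one of four engines sharing it:  ~5.2 ms
```

Even the slower case is well under a 16.7ms (60fps) frame, and in practice you would rarely move anything close to the full 32MB at once, which is the point being made above.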

[quote]when all the operations that are likely to cause such a missed cache are things where you can live with the drop on the consumer side (because it's only really going to happen when the consumer has just suddenly changed apps/functions, so a brief hiccup/load interval won't seem out of place).[/quote]

Cache misses are definitely a risk in graphics operations outside of the scenario you described, but even with the explicit desire to avoid them, it's a positive that if and when a cache miss does occur, it doesn't cost you nearly as much when you're dealing with low-latency on-chip memory. It could really help to greatly increase the utilization of the GPU's ALUs in situations where they would otherwise be waiting on data to be accessed from memory, keeping them busy. There are scenarios where eSRAM looks like a smart move for MS, and I doubt they didn't consider these.
 
"Poopstation" is terrible because you need to change letters in the name to arrive at it. The beauty of Xbone is that it's a nickname Microsoft served to us on a silver platter wrapped in a golden bow.

XBONE is to Xbox One as XB360 was to Xbox 360... and XB360 was/is used everywhere.

Giftwrapped.
 
I view Xbone in the same vein as M$. Just seems a bit juvenile but whatever.



Doesn't the frame buffer take up the vast majority of the esram space?

You don't have to render the framebuffer in ESRAM.
It's just too soon to tell without devs showing benchmarks and ESRAM behavior characteristics.

For all I know they do render the render targets into ESRAM.
Just to say that the 68GB/s of the DDR3 is a bit below what a 7750 & 7770 have as their memory speed (72GB/s); those are the chips people used to compare the X1 with.
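For reference, the 68GB/s DDR3 figure falls out of the commonly cited 256-bit, 2133MT/s interface; a quick sketch (assuming those specs, and 128-bit GDDR5 at 4.5Gbps for the 7750/7770 comparison):

```python
# Peak bandwidth from bus width and transfer rate.
# Assumes a 256-bit DDR3 bus at 2133 MT/s for the XB1 (commonly cited figures)
# and a 128-bit GDDR5 bus at 4.5 Gbps for the 7750/7770.
def peak_gb_per_s(bus_width_bits, transfers_per_s):
    return bus_width_bits / 8 * transfers_per_s / 1e9

print(peak_gb_per_s(256, 2133e6))  # XB1 DDR3:        ~68.3 GB/s
print(peak_gb_per_s(128, 4500e6))  # 7750/7770 GDDR5: ~72.0 GB/s
```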
 

astraycat

Member
But presumably still quite a bit less than if that ESRAM were a pool of GDDR5 when a cache miss occurred, right?

Presumably, but the more I look into AMD memory latencies the more I begin to think that the latency is not mainly down to the memory controllers but rather an intentional design decision -- they chose to spend their transistors maximizing some other part of the card instead of trying to lower latency.
 
Presumably, but the more I look into AMD memory latencies the more I begin to think that the latency is not mainly down to the memory controllers but rather an intentional design decision -- they chose to spend their transistors maximizing some other part of the card instead of trying to lower latency.

Upping bandwidth is probably a shitload cheaper to accomplish than reducing latency.
And graphics in general is more bandwidth constrained, and because of all the jobs being worked on you can actually hide latency/access times pretty well: if this shader doesn't have its data, let's go to the shader that does.
 

ekim

Member
This assumes that the computing elements are being fed entirely by ESRAM, which means you're using the ESRAM as a fixed buffer with no workaround. Meanwhile the entire memory architecture of the XB1 is supposed to allow for simultaneous data flow from the ESRAM and the DDR3 main RAM to the computing elements.

Basically, you'd significantly cripple total system bandwidth to protect against a missed cache that costs you a few milliseconds of compute time, when all the operations that are likely to cause such a missed cache are things where you can live with the drop on the consumer side (because it's only really going to happen when the consumer has just suddenly changed apps/functions, so a brief hiccup/load interval won't seem out of place).

No one will do that, and that is only even a factor on non-gaming related operations. Anything gaming related will do just fine with GDDR5 latency levels.

The ESRAM could make for an incredibly snappy OS, but since MS wants to simultaneously run the game environment and the full OS I'd imagine they won't let the OS touch the ESRAM in most situations. It's a little too important to game performance to let the OS "own" any of it.

Hm? Why do only sudden changes cause a cache miss? I was always under the impression that you have to deal with those nearly every time when rendering an image (texturing geometry, applying shaders...)
But as I said - I'm not an expert on those things. So I'm genuinely asking. :)
 

BWJinxing

Member
My two cents:

ASSUMING that both consoles have at least 75% bandwidth efficiency in the real world (it would be a real shame if both designs were less than 75% efficient):

These are rough numbers; they mean nothing but are there to give an idea (nothing accounts for the move engines and other tricks):

PS4 = 176 * .75 = 132GB/sec

Xbox 1 = 204 * .75 = 153GB/sec, so roughly 100 - 153GB/sec in practice (pre-clock boost, mentions of at least 109GB/sec)

Since the ESRAM is roughly double that of the DDR3, 153 * .66 = 100.98GB/sec (assuming it's full duplex, else half that if half-duplex) and DDR3 = 50.49GB/sec. Then the move engines have to saturate the ESRAM, and there is a bandwidth penalty there, as surely it's not 100% efficient (20-something GB/s x4, so already that's a max of 80GB/s-ish).
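A minimal sketch just reproducing the rough arithmetic above (the 75% efficiency factor and the ESRAM/DDR3 split are this post's assumptions, not measured figures):

```python
# Reproduces the rough numbers above; the efficiency factor and the split
# between ESRAM and DDR3 are this post's assumptions, not measurements.
EFFICIENCY = 0.75

ps4_effective = 176 * EFFICIENCY      # 132 GB/s
xb1_effective = 204 * EFFICIENCY      # 153 GB/s
xb1_esram     = xb1_effective * 0.66  # ~101 GB/s attributed to ESRAM
xb1_ddr3      = xb1_esram / 2         # ~50.5 GB/s attributed to DDR3

print(ps4_effective, xb1_effective, xb1_esram, xb1_ddr3)
```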
 
It's only offensive or annoying if you let it be. Seriously, it's just "Xbone" or just "The 'bone" by this point. Unless the poster is explicitly making a boner comparison or some such, there's no harm in it in and of itself.
Well, it makes me take you less seriously; it makes whoever says it sound like a fanboy.

Is the setup similar to the 360's, just scaled up? Honest question, I have no idea.
 

Vizzeh

Banned
From my reading on the GDDR5 latency, obviously they went with bandwidth, but are the AMD CPU's tolerances within the GDDR5 timings? Or possibly it's more that cache misses are infrequent and the data is spread across a multicore CPU, putting less pressure on latency? Sony seem to have most other bases covered; I wonder, if they thought they had an issue, would they just include a flash cache?

Cerny, as quoted below, addressed GPU latency, so he's not ignorant of it, he just doesn't see it as an issue, but he completely sidestepped mentioning the CPU, so either the latency is within tolerance or they have customised the CPU cache/flash cache?

Mark Cerny "Latency in GDDR5 isn’t particularly higher than the latency in DDR3. On the GPU side… Of course, GPUs are designed to be extraordinarily latency tolerant so I can’t imagine that being much of a factor."
 
Everything Sony said can be proved mathematically. So it's not just a matter of believing them.


Mathematics to prove a theoretical peak, not practical bandwidth. We won't know what's the real world number until developers get a crack at it. I think Anandtech had something on desktop GPUs where they estimated the real world bandwidth.
 

Klocker

Member
"Poopstation" is terrible because you need to change letters in the name to arrive at it. The beauty of Xbone is that it's a nickname Microsoft served to us on a silver platter wrapped in a golden bow.

XBONE is to Xbox One as XB360 was to Xbox 360... and XB360 was/is used everywhere.

Giftwrapped.

technically I always said X360

so XONE is easier and more correct... politically and literally. ;)
 
Presumably, but the more I look into AMD memory latencies the more I begin to think that the latency is not mainly down to the memory controllers but rather an intentional design decision -- they chose to spend their transistors maximizing some other part of the card instead of trying to lower latency.

Very good point, and I've noticed this myself from some reading.
 
I kinda-sorta thought that these X1 tech threads were going to stop until a few weeks from now, when we get a more detailed look at it once MS and AMD start revealing things post-NDA expiration? Seems like it's the same old ground being tread over and over and over again.
 
so XONE is easier and more correct... politically and literally. ;)
Even still, that reads as ex-own, or ex-won, or ex + unnatural pause + one, none of which has the sublime phonetic satisfaction to be found in the word xbone. It's the cellar door of our time.
 

twobear

sputum-flecked apoplexy
Don't say "they managed this" and then post a fucking bullshot which looks 10times better than the actual game and doesn't factor in the bad framerate. From a technical standpoint, Last of Us is a severly flawed game. And it's quite bad for 2013 standards. If anything your pointing out quite the opposite of what you intended - consoles are too weak and it shows, making devs compromissing the quality of their titles on technical AND gameplay layers.

Calm down, jesus. My point was that they managed that with 256MB of VRAM. No matter how you cut it that's impressive.

[edit] xbone rolls off the tongue and keyboard. xbone it is.
 

artist

Banned
I think Microsoft may have picked the perfect amount for eSRAM. It was really interesting when even Intel said they found that 32MB of EDRAM seemed very optimal, even though they went much further than that.
I wouldn't say perfect - the reasoning behind this is future-proofing the design. Anand notes that while it's enough for current workloads, there isn't much headroom with 32MB.

If there is a gradual change in workloads in the future, then devs would have to optimize more and more. The split pool already requires some juggling around; more potential headaches in the future are something that 3rd party devs may not look forward to.
 
I wouldn't say perfect - the reasoning behind this is future-proofing the design. Anand notes that while it's enough for current workloads, there isn't much headroom with 32MB.

If there is a gradual change in workloads in the future, then devs would have to optimize more and more. The split pool already requires some juggling around; more potential headaches in the future are something that 3rd party devs may not look forward to.

Well, virtual texturing seems like it's going to be a much bigger part of game engines this gen, and the Xbox One looks pretty purpose built with that in mind. Deferred rendering will likely continue to be popular, and eSRAM likely benefits that the same way eDRAM on the 360 did.

And Microsoft has a track record of providing pretty solid to excellent developer support, so a lot of the puzzles that devs may have to solve may largely be things that Microsoft are already helping to make as stress free as possible. They did it with the Xbox 360 and edram, and I expect them to do the same with the Xbox One, which should definitely be much easier to develop for compared to the Xbox 360. There was that semi-accurate article, where it sounded like Microsoft's SDK automatically seems to put specific things into ESRAM for the developer, probably because they're best suited to being in there, but they also leave open the option if a dev decides they want to do something else.

Check this part here.

http://semiaccurate.com/2013/08/30/a-deep-dive-in-to-microsofts-xbox-one-gpu-and-on-die-memory/

This 32MB of embedded memory is not directly contiguous with the main DRAM but the MMUs have the ability to make it appear so in a transparent manner to the programmer. While it is multi-purpose and Microsoft said it was not restricted in any specific manner, there are some tasks like D3D surface creation that default to it. If a coder wants to do something different they are fully able to, why you would want to however is a different question entirely.

I'm extremely iffy about using anything from semi-accurate, since they literally don't appear to look over what they wrote sometimes, leading to glaring inaccuracies, but I think this bit might be safe to quote.
 

artist

Banned
Well, virtual texturing seems like it's going to be a much bigger part of game engines this gen, and the Xbox One looks pretty purpose built with that in mind. Deferred rendering will likely continue to be popular, and eSRAM likely benefits that the same way eDRAM on the 360 did.

And Microsoft has a track record of providing pretty solid to excellent developer support, so a lot of the puzzles that devs may have to solve may largely be things that Microsoft are already helping to make as stress free as possible. They did it with the Xbox 360 and edram, and I expect them to do the same with the Xbox One, which should definitely be much easier to develop for compared to the Xbox 360. There was that semi-accurate article, where it sounded like Microsoft's SDK automatically seems to put specific things into ESRAM for the developer, probably because they're best suited to being in there, but they also leave open the option if a dev decides they want to do something else.

Check this part here.

http://semiaccurate.com/2013/08/30/a-deep-dive-in-to-microsofts-xbox-one-gpu-and-on-die-memory/



I'm extremely iffy about using anything from semi-accurate, since they literally don't appear to look over what they wrote sometimes, leading to glaring inaccuracies, but I think this bit might be safe to quote.
I'm not doubting the devrel from MS for the Xbone, but the example you provided (a feature defaulting to ESRAM) is hardly indicative of there being no issues in case the ESRAM size begins to show its age.

http://beyond3d.com/showpost.php?p=1782653&postcount=6243

Thought this was interesting. He worked on the system's audio block.
That doesn't preclude the possibility that they had decided on DDR3 by then. (Size of RAM pool was not the reason for ESRAM, type of RAM pool(?))

I guess it would be more interesting if we found out what kind of ESRAM sizes were considered, at least internally.
 

TheD

The Detective
It does not matter that Intel thought that 32MB was enough for today, because they are talking about pairing it with their GPU, which is much slower than the one in the XB1, and even then they went with 128MB!
 
http://beyond3d.com/showpost.php?p=1782653&postcount=6243

Thought this was interesting. He worked on the system's audio block.

Total bullshit. Without the ESRAM, DDR3 would be so slow as to be basically unusable next to the competition. They would have gotten blown out of the water by Sony even at 4 gigs.

We also know that the long term plan for the Xbone was TV integration and Kinect functionality from day 1. Those NEED more than 4 gigs due to the massive amounts of RAM the OS requires, meaning it was definitely DDR3, not GDDR5 (which would have made ESRAM unnecessary). Maybe it wasn't always 8, maybe it could have been 6... but there was NO WAY MS was going with 4.
 

The Flash

Banned
I hope Albert can get one of MS's Technical Fellows to do an AMA. I don't understand most of this kind of stuff but it's fascinating to watch the back and forth discussion in a weird way. Hopefully the mystery behind the power of the XBO will be revealed as well. Anyways, carry on, tech-savvy GAF!
 

Respawn

Banned
Mathematics to prove a theoretical peak, not practical bandwidth. We won't know what's the real world number until developers get a crack at it. I think Anandtech had something on desktop GPUs where they estimated the real world bandwidth.

Well, they have been quite vocal about getting a crack at it.
 

RoboPlato

I'd be in the dick
Mathematics to prove a theoretical peak, not practical bandwidth. We won't know what's the real world number until developers get a crack at it. I think Anandtech had something on desktop GPUs where they estimated the real world bandwidth.

Didn't DF say that the XBO practical bandwidth is around 133GB/s? I also remember an indie dev saying they were hitting 172GB/s on PS4, which is pretty close to the theoretical max. Unfortunately that thread got locked since the title was wrong and I can't seem to find it now.
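For rough context (the 133GB/s and 172GB/s numbers are as remembered above, not verified, and it isn't clear which XB1 peak the 133GB/s was measured against), those figures would work out to roughly:

```python
# Rough ratios of the remembered figures to the theoretical peaks cited
# elsewhere in this thread; the inputs here are recollections, not data.
print(172 / 176)  # PS4: ~0.98 of its 176 GB/s GDDR5 peak
print(133 / 204)  # XB1: ~0.65 of the 204 GB/s combined figure cited above
```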
 

artist

Banned
It does not matter that Intel thought that 32MB was enough for today, because they are talking about pairing it with their GPU, which is much slower than the one in the XB1, and even then they went with 128MB!
Pretty close in terms of fillrate :p

Wonder what kind of workloads Intel saw in the future that they doubled it twice (32 -> 64 -> 128). And they are not the kind of people to just "throw" transistors at the problem.
 
I'm not doubting the devrel from MS for the Xbone, but the example you provided (a feature defaulting to ESRAM) is hardly indicative of there being no issues in case the ESRAM size begins to show its age.


That doesn't preclude the possibility that they had decided on DDR3 by then. (Size of RAM pool was not the reason for ESRAM, type of RAM pool(?))

I guess it would be more interesting if we found out what kind of ESRAM sizes were considered, at least internally.

Yea, learning what sizes they maybe considered would be interesting. Yea, perhaps ram type was a consideration, but I also doubt that for one reason. Microsoft showcased quite clearly with the Xbox 360 that they had no issue still going for a capable embedded memory solution even when they had 512MB of GDDR3, so it wouldn't surprise me at all if they were also considering eSRAM + GDDR5, however brief that may have been.

With regards to the size of the eSRAM showing its age, I think Microsoft effectively eliminated, or at least greatly staved off any such possibility when they introduced things like tiled resources. Virtual texturing is going to be a very big deal this upcoming gen, if not one of the most important trends in upcoming games, and Microsoft really did seem to go out of their way to ensure the Xbox One is very well equipped to handle it.

The decompressing hardware on one of the Move Engines seems practically tailor made for helping to improve the handling of virtual texturing. Beyond just that decompression hardware, the Move Engines themselves appear to obviously be a crucial part of protecting or ensuring that the ESRAM continues to prove invaluable for developers in a variety of ways over the life of the system.

It does not matter that Intel thought that 32MB was enough for today, because they are talking about pairing it with their GPU, which is much slower than the one in the XB1, and even then they went with 128MB!

But I think Microsoft are giving developers more freedom to determine how, and on what terms they take advantage of the 32MB of ESRAM, which more than likely is a good way to make sure it continues to be useful to devs over the lifetime of the system.

Total bullshit. Without the ESRAM, DDR3 would be so slow as to be basically unusable next to the competition. They would have gotten blown out of the water by Sony even at 4 gigs.

We also know that the long term plan for the Xbone was TV integration and Kinect functionality from day 1. Those NEED more than 4 gigs due to the massive amounts of RAM the OS requires, meaning it was definitely DDR3, not GDDR5 (which would have made ESRAM unnecessary). Maybe it wasn't always 8, maybe it could have been 6... but there was NO WAY MS was going with 4.

Well, the guy worked on the console personally, so I don't think he's wrong. Microsoft, after the Xbox 360, more than likely always intended to pursue another fast embedded memory solution, because it worked out quite well for them on the 360. Just because the ESRAM may be a very crucial cog in the overall memory bandwidth of the system in no way suggests that its inclusion wouldn't still have happened even if Microsoft had opted for GDDR5 memory; they already showed with the Xbox 360 that they were more than willing to go with embedded memory while using what was, at the time, the functional equivalent of GDDR5, namely GDDR3.

But, as you say, Kinect and TV integration were a major part of the plan, so it's entirely possible that this course of action meant they needed to find a way to keep the cost of the box down, which likely meant cheaper DDR3 memory and a more easily manufactured and managed pool of ESRAM. This in itself is still very little proof that embedded memory in some form wasn't always in the cards, regardless of Microsoft's Kinect and overall entertainment and multitasking ambitions.
 

lord pie

Member
My understanding is the 128MB memory in haswell is a system managed general cache - whereas the 32MB in xb1 is a user managed scratch pad.

If this is the case, the implication of this is subtle but important. For example, if you wanted to use the 32MB as a temporary location for render target data (pixel storage, effectively) then you are going to potentially run out of space - just like the 10MB edram buffer in the 360 limited resolution.

Hypothetical examples for a 1920x1080 frame buffer:

forward rendered FP16 HDR with 2x MSAA:
(8b colour + 4b depth/stencil) x 2 x 1920 x 1080 = 47MB

typical (eg frostbite) deferred renderer g-buffer, 4 MRTs each at 32bpp (no MSAA):
(4b mrt x 4 + 4b depth/stencil) x 1920 x 1080 = 39MB


This doesn't necessarily mean these cases are impossible - you could render the scene in tiles or leave some buffers in DDR - but it does add a significant layer of complexity (it won't 'just work' efficiently and automatically like the haswell cache).

The other concern I have is that it doesn't mitigate the need to copy data in/out of ESRAM - which still will be limited by DDR bandwidth. So using ESRAM will only make sense in cases where you are reading/writing the memory a large number of times within the frame - *and* those reads are often missing the on-chip caches (which in a well designed renderer isn't as common as you'd think).
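The two estimates in the post above can be reproduced with a quick bytes-per-pixel calculation (same hypothetical formats as given there):

```python
# Reproduces the hypothetical 1080p render-target sizes from the post above.
PIXELS = 1920 * 1080

def target_mb(bytes_per_pixel, samples=1):
    """Size in MB of a 1080p render target set."""
    return bytes_per_pixel * samples * PIXELS / (1024 * 1024)

# Forward FP16 HDR with 2x MSAA: 8-byte colour + 4-byte depth/stencil per sample
print(target_mb(8 + 4, samples=2))   # ~47.5 MB

# Deferred g-buffer: four 32bpp MRTs (4 bytes each) + 4-byte depth/stencil
print(target_mb(4 * 4 + 4))          # ~39.6 MB
```

Both comfortably exceed the 32MB pool, which is why the post suggests tiling the scene or leaving some buffers in DDR3.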
 

aronmayo2

Banned
"Poopstation" is terrible because you need to change letters in the name to arrive at it. The beauty of Xbone is that it's a nickname Microsoft served to us on a silver platter wrapped in a golden bow.

XBONE is to Xbox One as XB360 was to Xbox 360... and XB360 was/is used everywhere.

Giftwrapped.

Except nobody in the history of the internet calls the 360 the XB360 :p So no.
 

Melchiah

Member
I remember reading something about the Xbox 360's eDRAM, how it was only 10MB because they thought developers weren't going to use deferred renderers, and that if they had gone with 12MB it would have made a big difference? And that 10MB was only good for free AA, so the Xbox One's 32MB ESRAM, although it doesn't have as high a bandwidth as eDRAM, will be more than enough in this coming gen.

IIRC, someone over here said Killzone Shadow Fall uses 40MB+ for the framebuffer, and that 32MB wouldn't be enough for 1080p deferred rendering. If that's true, 32MB doesn't sound like much. I'm no expert in these matters though.
 
My understanding is the 128MB memory in haswell is a system managed general cache - whereas the 32MB in xb1 is a user managed scratch pad.

If this is the case, the implication of this is subtle but important. For example, if you wanted to use the 32MB as a temporary location for render target data (pixel storage, effectively) then you are going to potentially run out of space - just like the 10MB edram buffer in the 360 limited resolution.

Hypothetical examples for a 1920x1080 frame buffer:

forward rendered FP16 HDR with 2x MSAA:
(8b colour + 4b depth/stencil) x 2 x 1920 x 1080 = 47MB

typical (eg frostbite) deferred renderer g-buffer, 4 MRTs each at 32bpp (no MSAA):
(4b mrt x 4 + 4b depth/stencil) x 1920 x 1080 = 39MB


This doesn't necessarily mean these cases are impossible - you could render the scene in tiles or leave some buffers in DDR - but it does add a significant layer of complexity (it won't 'just work' efficiently and automatically like the haswell cache).

The other concern I have is that it doesn't mitigate the need to copy data in/out of ESRAM - which still will be limited by DDR bandwidth. So using ESRAM will only make sense in cases where you are reading/writing the memory a large number of times within the frame - *and* those reads are often missing the on-chip caches (which in a well designed renderer isn't as common as you'd think).

This post is a pretty great summation of my concerns with 32MB of ESRAM.

One thing that we do know currently... is that Ryse is a combined forward and deferred renderer and is also 1080p. If we knew more about its internals and how it uses ESRAM, we would know quite a bit about how games further into the generation will fare.
 

KidBeta

Junior Member
My understanding is the 128MB memory in haswell is a system managed general cache - whereas the 32MB in xb1 is a user managed scratch pad.

If this is the case, the implication of this is subtle but important. For example, if you wanted to use the 32MB as a temporary location for render target data (pixel storage, effectively) then you are going to potentially run out of space - just like the 10MB edram buffer in the 360 limited resolution.

Hypothetical examples for a 1920x1080 frame buffer:

forward rendered FP16 HDR with 2x MSAA:
(8b colour + 4b depth/stencil) x 2 x 1920 x 1080 = 47MB

typical (eg frostbite) deferred renderer g-buffer, 4 MRTs each at 32bpp (no MSAA):
(4b mrt x 4 + 4b depth/stencil) x 1920 x 1080 = 39MB


This doesn't necessarily mean these cases are impossible - you could render the scene in tiles or leave some buffers in DDR - but it does add a significant layer of complexity (it won't 'just work' efficiently and automatically like the haswell cache).

The other concern I have is that it doesn't mitigate the need to copy data in/out of ESRAM - which still will be limited by DDR bandwidth. So using ESRAM will only make sense in cases where you are reading/writing the memory a large number of times within the frame - *and* those reads are often missing the on-chip caches (which in a well designed renderer isn't as common as you'd think).

It's good to note that you do not want to be moving memory from eSRAM to DDR3 often; whilst it may not take up a significant amount of bandwidth, it does cost 3x its size if you then want to use it in the pool you're writing to.
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
The other concern I have is that it doesn't mitigate the need to copy data in/out of ESRAM - which still will be limited by DDR bandwidth. So using ESRAM will only make sense in cases where you are reading/writing the memory a large number of times within the frame - *and* those reads are often missing the on-chip caches (which in a well designed renderer isn't as common as you'd think).

We know that the data move engines will free the GPU from actually performing those copy operations. Nevertheless, these operations will be necessary if the ESRAM is also used as a texture cache. In this case, copy operations will consume bandwidth that would not have to be consumed with a single-pool setup. It would be interesting to know if those copy operations can be scheduled such that they only occupy DDR3 bandwidth when no other client is saturating it.
 

Drek

Member
Hm? Why do only sudden changes cause a cache miss? I was always under the impression that you have to deal with those nearly every time when rendering an image (texturing geometry, applying shaders...)
But as I said - I'm not an expert on those things. So I'm genuinely asking. :)

Generally, well programmed and uninterrupted software caches well enough for misses to be minimal and for the latency of the RAM to have limited impact. This is why no one has any concerns about GPUs all using GDDR5. In-game functionality is highly predictable.

The most likely event that will cause a full cache dump is user input, more specifically user task changes. This is where latency matters, on the OS side when users can multitask, because for all the OS and app programmers know a user might start up and close five different apps in the span of a minute before settling on what they want to do, and every one of those will involve some kind of significant re-caching.

The latency gap between memory types would need to be massive for it to matter in games, and even then it would need the latency of GDDR5 to be significant enough to be a hindrance at times. This isn't the case on either front.

I do think, however, that when you don't have a game buffered in the background on each respective OS, the XB1's OS will generally be a whole lot snappier to navigate. Hell, it'll be that even with a game in the background, but when there isn't one it should be the most responsive home OS we've seen to date, I'd think.
 

Applecot

Member
Where do you go to learn this stuff? Computer Engineering class? Cuz from what I've seen of comp sci and IT they don't go over hardware in this fashion.

It's a sub-field of electrical engineering. Ultimately any computational device is a circuit of some kind that has been fashioned onto a really tiny chip.

The second aspect is the computer science side, which is basically the architecture and logic aspects.
 

Applecot

Member
This post is a pretty great summation of my concerns with 32MB of ESRAM.

One thing that we do know currently... is that Ryse is a combined forward and deferred renderer and is also 1080p. If we knew more about its internals and how it uses ESRAM, we would know quite a bit about how games further into the generation will fare.

To be honest I'm not too familiar with how rendering works, but I would imagine anything that's polygon based involves the polygon calculations and rasterisation processes to get the output. Assuming this is the case, you could compartmentalise the rendering process to have eSRAM deal with the bandwidth-sensitive aspects while keeping the DDR3 for other background tasks which won't require huge simultaneous bandwidth or aren't latency sensitive.
 

artist

Banned
My understanding is the 128MB memory in haswell is a system managed general cache - whereas the 32MB in xb1 is a user managed scratch pad.

If this is the case, the implication of this is subtle but important. For example, if you wanted to use the 32MB as a temporary location for render target data (pixel storage, effectively) then you are going to potentially run out of space - just like the 10MB edram buffer in the 360 limited resolution.

Hypothetical examples for a 1920x1080 frame buffer:

forward rendered FP16 HDR with 2x MSAA:
(8b colour + 4b depth/stencil) x 2 x 1920 x 1080 = 47MB

typical (eg frostbite) deferred renderer g-buffer, 4 MRTs each at 32bpp (no MSAA):
(4b mrt x 4 + 4b depth/stencil) x 1920 x 1080 = 39MB


This doesn't necessarily mean these cases are impossible - you could render the scene in tiles or leave some buffers in DDR - but it does add a significant layer of complexity (it won't 'just work' efficiently and automatically like the haswell cache).

The other concern I have is that it doesn't mitigate the need to copy data in/out of ESRAM - which still will be limited by DDR bandwidth. So using ESRAM will only make sense in cases where you are reading/writing the memory a large number of times within the frame - *and* those reads are often missing the on-chip caches (which in a well designed renderer isn't as common as you'd think).
Interesting, thanks.

Didn't the SemiAccurate article mention that the performance of the ESRAM was in the range of 140~150 GB/s, from their own internal sources?
According to SemiAccurate's internal sources, the Xbone CPU was clocked at 1.9 GHz too.
 