The ESRAM is in 8MB blocks right? Not sure why they would need to double or quadruple it? That (128MB ESRAM) was not what I was insinuating at all ..
I figure 8MB more ESRAM would be ~30mm² at most.
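To put rough numbers on that estimate, here's a quick back-of-envelope in Python. The bit-cell size and array-overhead factor are my own assumptions (typical ballpark figures for a 28nm-class 6T SRAM), not anything official from MS:

```python
# Rough sanity check of the "~30 mm^2 for 8MB more ESRAM" estimate.
# Assumed numbers (not from the thread): a 6T SRAM bit cell of roughly
# 0.1 um^2 on a 28nm-class process, and a ~3x overhead multiplier for
# sense amps, decoders, redundancy and routing.

BITCELL_UM2 = 0.1   # assumed 6T bit-cell area, um^2
OVERHEAD = 3.0      # assumed array/periphery overhead multiplier

def esram_area_mm2(megabytes: float) -> float:
    bits = megabytes * 8 * 1024 * 1024          # MB -> bits
    cell_area_mm2 = bits * BITCELL_UM2 / 1e6    # um^2 -> mm^2
    return cell_area_mm2 * OVERHEAD

print(round(esram_area_mm2(8), 1))   # ~20 mm^2 under these assumptions
```

So under those assumptions an extra 8MB lands around 20mm², which makes "~30mm² at most" look like a reasonable ceiling.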
Oh wow .. first the "Poopstation" comment and this. Some people are seriously trying hard to derail this thread for obvious reasons.
I think Microsoft may have picked the perfect amount for eSRAM. It was really interesting when even Intel said they found that 32MB of EDRAM seemed very optimal, even though they went much further than that.
Yea, we won't focus on what people choose to refer to the system as anymore, since it would likely derail one of the more civil threads on this kind of thing so far.
This assumes that the computing elements are being fed entirely by ESRAM, which means you're using the ESRAM as a fixed buffer with zero work around. Meanwhile the entire memory architecture of the XB1 is supposed to allow for simultaneous data flow from the ESRAM and the DDR3 main RAM to the computing elements.
Basically, you'd significantly cripple total system bandwidth to protect against a cache miss that costs you a few milliseconds of compute time, when all the operations likely to cause such a miss are ones where you can live with the drop on the consumer side (it's only really going to happen when the consumer suddenly changes apps/functions, so a brief hiccup/load interval won't seem out of place).
No one will do that, and that is only even a factor on non-gaming related operations. Anything gaming related will do just fine with GDDR5 latency levels.
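Just to illustrate the point about simultaneous data flow: the figures below are the commonly quoted XB1 numbers (DDR3 at ~68 GB/s, ESRAM at ~102 GB/s), so treat them as assumptions rather than gospel, but they show what feeding the GPU from ESRAM alone would leave on the table:

```python
# Why treating ESRAM as the sole feed for the compute units wastes
# bandwidth: the XB1 design allows simultaneous flow from both pools.
# Bandwidth figures are the commonly quoted ones, used as assumptions.

DDR3_GBPS = 68.0
ESRAM_GBPS = 102.0

esram_only = ESRAM_GBPS
combined = ESRAM_GBPS + DDR3_GBPS   # simultaneous flow from both pools

print(f"ESRAM only: {esram_only} GB/s")
print(f"Combined:   {combined} GB/s")
print(f"Left on the table: {combined - esram_only:.0f} GB/s "
      f"({(combined - esram_only) / combined:.0%} of peak)")
```

Roughly 40% of the system's peak bandwidth would go unused, which is why nobody would actually design around a fixed ESRAM-only buffer.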
The ESRAM could make for an incredibly snappy OS, but since MS wants to simultaneously run the game environment and the full OS, I'd imagine they won't let the OS touch the ESRAM in most situations. It's a little too important to game performance to let the OS "own" any of it.
It doesn't necessarily have to be something that devs do for every piece of data, but just certain pieces of data that they feel would benefit greatly from being resident inside eSRAM. The process of moving data from DDR3 to eSRAM, be it with a shader or one of the move engines, no doubt happens a lot faster than people may be appreciating. And in this particular case, the eSRAM being a rather small 32MB may largely work out to the system's benefit when you start thinking about passing very small pieces of data over to it using memory pathways that are, by any stretch of the imagination, more than up to the task.

There will never be too much data being passed at any one time, or even consistently enough, for there to be enough time wasted to cripple overall system bandwidth, unless you were just being careless and ignoring how much data is being sent there. Accounting for the fact that some things will simply remain in eSRAM at all times, possibly never leaving, that leaves you with even less than 32MB worth of data to think about once data copying is considered.

Whether it be through the move engines or a shader, you're dealing with a bare minimum of 25.6GB/s worth of memory bandwidth for a single move engine transferring data to a reasonably small 32MB pool of memory. And if you are only ever copying less than 32MB of data at any one time, then 25.6GB/s of memory bandwidth starts to look quite massive for such a small task. Even if you had all 4 move engines working together, that would still give you a very favorable-looking 6.4GB/s per move engine for copying to the small 32MB pool. And things look just as favorable working in reverse, if you're moving something from eSRAM to DDR3.
The move engines require only a small portion of the system's memory bandwidth, and they can work simultaneously with GPU computation.
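The transfer times really are tiny. Here's a quick sketch using the 25.6GB/s figure from the post above (I'm using decimal GB here, which conveniently makes MB ÷ GB/s come out in milliseconds):

```python
# Back-of-envelope transfer times for the move engines, using the
# 25.6 GB/s figure quoted above as an assumption.

MOVE_ENGINE_GBPS = 25.6

def copy_time_ms(megabytes: float, gbps: float = MOVE_ENGINE_GBPS) -> float:
    # MB / (GB/s) works out to milliseconds in decimal units:
    # (MB * 1e6 bytes) / (GB/s * 1e9 bytes/s) = MB/GBps * 1e-3 s
    return megabytes / gbps

# Filling the entire 32MB pool in one go:
print(f"{copy_time_ms(32):.2f} ms")   # 1.25 ms
# A more realistic small buffer, say 4MB:
print(f"{copy_time_ms(4):.2f} ms")    # ~0.16 ms
```

Even a worst-case full 32MB fill takes barely over a millisecond, so the small copies being described here are a rounding error against a 16.6ms frame.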
[quote]when all the operations likely to cause such a miss are ones where you can live with the drop on the consumer side (it's only really going to happen when the consumer suddenly changes apps/functions, so a brief hiccup/load interval won't seem out of place).[/quote]
Cache misses are definitely a risk in graphics operations outside of the scenario you described, but even with the explicit desire to avoid them, it's actually a positive that when a cache miss does occur, it doesn't cost you nearly as much if you're dealing with low-latency on-chip memory. It could really help increase the utilization of the GPU's ALUs in situations where they're otherwise stalled waiting on data from memory. There are scenarios where eSRAM looks like a smart move for MS, and I doubt they didn't consider them.
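To make the stall cost concrete, here's a rough comparison in GPU clock cycles. The latency numbers are placeholders I picked to illustrate the scale (tens of ns for embedded SRAM, a couple hundred ns for a round trip to external DRAM), not measured XB1 figures:

```python
# Illustrative stall cost of a miss serviced from on-chip SRAM vs
# external DRAM. Latencies are assumed ballpark values, not XB1 specs.

GPU_CLOCK_MHZ = 853   # XB1 GPU clock

def stall_cycles(latency_ns: float) -> int:
    return round(latency_ns * GPU_CLOCK_MHZ / 1000)

print(stall_cycles(20))    # ~17 cycles if the data is in ESRAM
print(stall_cycles(200))   # ~171 cycles going all the way to DDR3
```

An order of magnitude fewer dead cycles per miss is exactly the kind of thing that keeps ALU utilization up without the scheduler having to hide as much latency.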