
AMD: PlayStation 4 supports hUMA, Xbox One does not

I would say that by AMD's definition the Xbox One is not hUMA. While I think it is probable both CPU and GPU have coherent access to the full DDR3 memory space, the ESRAM is exclusive to the GPU and that is enough to disqualify it. It seems clear the One's design is firmly rooted in GCN 1.0 and AMD's HSA roadmaps have always suggested that extending coherent access to discrete GPUs with their own local memory would be for 2014/2015 products.

The reason this matters is that many have pointed to the potential latency benefits of ESRAM in the Xbox One for GPGPU workloads. Unfortunately, if the CPU cannot easily access data in the ESRAM, those calculations cannot effectively feed back into anything that affects gameplay. Think of PhysX, where all you get is prettier debris. On the PS4, hUMA creates greater potential for complex simulations to be accelerated on the GPU and still impact gameplay.
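To make the copy-versus-coherent point concrete, here is a minimal sketch in CUDA terms. This is only an analogy: the consoles use AMD hardware and their own APIs, the kernel and numbers are invented, and cudaMallocManaged is just the closest widely known stand-in for a coherent shared allocation.

#include <cuda_runtime.h>
#include <cstdio>

// Toy "simulation" step; the work itself is irrelevant to the point.
__global__ void simulate(float* state, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) state[i] = i * 0.016f;
}

int main() {
    const int n = 1 << 20;

    // (a) Discrete-GPU style: the CPU only sees results after an explicit copy.
    float* d_state;
    float* h_state = new float[n]();
    cudaMalloc(&d_state, n * sizeof(float));
    simulate<<<(n + 255) / 256, 256>>>(d_state, n);
    cudaMemcpy(h_state, d_state, n * sizeof(float), cudaMemcpyDeviceToHost); // the latency hit

    // (b) Coherent/hUMA style (approximated here with managed memory): the CPU
    // reads the very allocation the GPU wrote, no bulk copy in between.
    float* u_state;
    cudaMallocManaged(&u_state, n * sizeof(float));
    simulate<<<(n + 255) / 256, 256>>>(u_state, n);
    cudaDeviceSynchronize();
    printf("%f %f\n", h_state[123], u_state[123]);

    cudaFree(d_state);
    cudaFree(u_state);
    delete[] h_state;
    return 0;
}

On a truly coherent system, the point is that case (b) needs no staging copy at all, which is what lets GPU results feed gameplay logic in the same frame.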
 
Fair enough.

What is your take on hUMA for Xbox One, guys?

I don't think it'll happen, and if it does, it will not be quite like the PS4's. The main reason for my theory is the Xbox One's eSRAM. Hear me out:

The eSRAM is for the GPU only, right? So assuming MS forces hUMA on the Xbox One, and the eSRAM therefore has to be shared between the CPU and the GPU, things will get bad when the CPU and GPU fight over that 32 MB of "last-level cache." Picture a CPU-intensive snapped app and the game contending for that eSRAM block: you get something akin to cache thrashing, and an unresponsive system.

hUMA is supposed to simplify things; the addition of eSRAM complicates them. You get low-latency graphics at the expense of having to manage the eSRAM at all costs. I presume MS foresaw this and probably decided (or will decide) against it.
 

KidBeta

Junior Member
Standard GCN is 2 ACEs with 1 queue for each ACE (2 ACEs/2 queues)

PS4 will have 8 ACEs with 8 queues for each ACE (8 ACEs/64 queues)

HD8000 or beyond (Kabini, Kyoto, for example) have 4 ACEs with 8 queues each (4 ACEs/32 queues)

This is a very interesting feature since it is not an AMD-only thing. Nvidia calls it "Hyper-Q" and claims that it greatly improves GPU utilization. Titan, for example, has 32 queues:





Yep, that's very true.

Well, I know the XBONE GPU has 8 queues for each command processor, so it's certainly not GCN 1.0; it's probably GCN 1.1.
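For what it's worth, the queue idea is easy to picture from the software side. Here's a rough sketch, in CUDA since Hyper-Q was named above (the kernels and names are invented): each stream maps onto a submission queue, and with enough hardware queues independent work can stay in flight concurrently instead of serializing behind one command processor.

#include <cuda_runtime.h>
#include <math.h>

// Stand-ins for two independent workloads; names are made up.
__global__ void graphicsLikeWork(float* buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = buf[i] * 0.5f + 1.0f;
}
__global__ void computeLikeWork(float* buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = sqrtf(fabsf(buf[i]) + 2.0f);
}

int main() {
    const int n = 1 << 20;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    // Two streams = two submission queues. With multiple hardware queues the
    // GPU can interleave these instead of finishing one before starting the other.
    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);
    graphicsLikeWork<<<(n + 255) / 256, 256, 0, s0>>>(a, n);
    computeLikeWork<<<(n + 255) / 256, 256, 0, s1>>>(b, n);
    cudaDeviceSynchronize();

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(a);
    cudaFree(b);
    return 0;
}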
 

benny_a

extra source of jiggaflops
So in conclusion, the GPGPU capabilities of the Xbox One (eSRAM not fully coherent, fewer command processors and compute queues) are lower than on PS4.

Which in turn means we won't be seeing that being pushed across the board for next-gen.

I'm quite bummed by this, as third-parties most likely won't take advantage of something only the PS4 really benefits from.
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
You get low-latency graphics at the expense of having to manage the eSRAM at all costs.

The latency won't really matter much in GPU workflows. The ESRAM is there to provide enough bandwidth for rendering into pixel buffers, while the DDR3 bandwidth is there for texture fetching, either directly or indirectly by copying textures into ESRAM first via the DMEs. Latency doesn't really matter here.
 

artist

Banned
So in conclusion, the GPGPU capabilities of the Xbox One (eSRAM not fully coherent, fewer command processors and compute queues) are lower than on PS4.

Which in turn means we won't be seeing that being pushed across the board for next-gen.

I'm quite bummed by this, as third-parties most likely won't take advantage of something only the PS4 really benefits from.
It will still get pushed on PC and PS4.
 

benny_a

extra source of jiggaflops
It will still get pushed on PC and PS4.
But on PC it's not the same kind. I keep being unspecific when I say GPGPU. I mean the particular brand that AMD is lobbying for with hUMA that allows for quick interplay of CPU and GPU without taking the huge latency hit of the copy process.
 

Perkel

Banned
So in conclusion, the GPGPU capabilities of the Xbox One (eSRAM not fully coherent, fewer command processors and compute queues) are lower than on PS4.

Which in turn means we won't be seeing that being pushed across the board for next-gen.

I'm quite bummed by this, as third-parties most likely won't take advantage of something only the PS4 really benefits from.

True and not true.

New effects? No, because no one will include new effects based on GPGPU if only one console has the power to do them.
Bigger scale or more precision? Yes, because scaling up scale or precision is easy.


BTW, how does this GPGPU stuff compare to the PC market? I don't know much about how things are done with GPGPU or what can be done (besides physics stuff or audio raytracing), but by the looks of it the PS4 does have a lot of GPGPU power.
 

BigDug13

Member
But on PC it's not the same kind. I keep being unspecific when I say GPGPU. I mean the particular brand that AMD is lobbying for with hUMA that allows for quick interplay of CPU and GPU without taking the huge latency hit of the copy process.

And it seems like since most multiplats hit PC as well, none of them will really take advantage of the GPGPU capabilities. Should make for some interesting uses on first party titles I guess.
 

mrklaw

MrArseFace
I don't think eSRAM matters that much. The developer would simply allocate RAM in main memory, and if that can be treated as fully coherent, then that is enough. They don't need the entire system to be compliant.

And both systems are bandwidth-limited here anyway: Xbox has 30 GB/s, and PS4 has 10 GB/s for the GPU alongside 20 GB/s for the CPU.

So both seem pretty evenly matched regarding this particular area.
 
Hotchips 25 - what we've learned:

Two compute command processors (ACEs) and most likely two compute queues for Xbox One. PS4 has eight ACEs and 64 compute queues. That means that Xbox One will probably suck at GPGPU compared to PS4 and will not be able to do asynchronous fine-grain compute (GPGPU without a penalty for rendering) efficiently.

That's not necessarily true. Multiple queue engines are used so you have many threads in flight that you can switch between to hide memory latency. With eSRAM that's potentially a much smaller problem on the Xbone than it is on PC or PS4.
 
The latency won't really matter much in GPU workflows. The ESRAM is there to provide enough bandwidth for rendering into pixel buffers, while the DDR3 bandwidth is there for texture fetching, either directly or indirectly by copying textures into ESRAM first via the DMEs. Latency doesn't really matter here.

GPUs are good at hiding latency in common graphical tasks, which follow predictable and linear memory access patterns. Common graphical tasks also have a predictable execution flow, for example where you read some data from a texture and multiply the read value with a bunch of normals or something like that.

Throw in more complex shaders and GPGPU computations, which break up the memory access and execution flow, and latency on the GPU becomes a real issue. That's pretty noticeable from the performance impact that even high-end PC parts take.

That's why the PS4 has so many more compute threads in flight than even the highest-end card currently available.

Edit: I'm not sure if the results apply directly to GPGPU in gaming, but if you look at academic papers on GPGPU research, it's pretty common to find that optimizing memory access patterns for the GPU can yield extremely high performance gains, much more so than brute-forcing the problem by adding more processing power. So having low-latency memory for that kind of computation might actually help achieve better performance faster.
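The access-pattern point is easy to demonstrate with a toy kernel pair (a sketch, not console code; the sizes and names are mine): the same arithmetic, read coalesced versus strided, usually differs in throughput by a large factor on current GPUs.

#include <cuda_runtime.h>

// Neighboring threads read neighboring addresses: the fetches coalesce.
__global__ void coalescedRead(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;
}

// Neighboring threads read far-apart addresses: each warp touches many
// memory segments, and latency/bandwidth costs pile up.
__global__ void stridedRead(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[(int)(((long long)i * stride) % n)] * 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    coalescedRead<<<(n + 255) / 256, 256>>>(in, out, n);
    stridedRead<<<(n + 255) / 256, 256>>>(in, out, n, 1024);
    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    return 0;
}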
 

chaosblade

Unconfirmed Member
8000 series is what AMD uses for their latest 28nm APUs. These GPUs have improved compute capabilities compared to the 7000 series.

How recent is this? Because all of the desktop 8000 parts are just rebadges for OEMs. What I saw is that Kaveri is going to use Volcanic Islands-based GPUs, and that would be 9000 series now.
 

astraycat

Member
GPUs are good at hiding latency in common graphical tasks, which follow predictable and linear memory access patterns. Common graphical tasks also have a predictable execution flow, for example where you read some data from a texture and multiply the read value with a bunch of normals or something like that.

Throw in more complex shaders and GPGPU computations, which break up the memory access and execution flow, and latency on the GPU becomes a real issue. That's pretty noticeable from the performance impact that even high-end PC parts take.

That's why the PS4 has so many more compute threads in flight than even the highest-end card currently available.

Edit: I'm not sure if the results apply directly to GPGPU in gaming, but if you look at academic papers on GPGPU research, it's pretty common to find that optimizing memory access patterns for the GPU can yield extremely high performance gains, much more so than brute-forcing the problem by adding more processing power. So having low-latency memory for that kind of computation might actually help achieve better performance faster.

We've had the ESRAM latency song and dance before, but no one has ever really mentioned plausible numbers on it. AMD cards tend to have pretty high latency costs everywhere -- according to the GCN presentation at GDC Europe latency to L1 was still on average 20x that of latency to LDS memory, and I doubt that ESRAM is closer than L1 and L2 in the memory hierarchy.
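For reference, LDS is what CUDA calls shared memory; the whole reason it exists is that data a work-group reuses is staged once into that low-latency on-chip pool instead of being re-fetched through L1/L2 every time. A generic sketch, not tied to either console:

#include <cuda_runtime.h>

// Per-block partial sum: each value is pulled from DRAM once, then every
// further access during the reduction hits the on-chip LDS/shared memory.
__global__ void blockSum(const float* in, float* out, int n) {
    __shared__ float tile[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) tile[threadIdx.x] += tile[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = tile[0];
}

int main() {
    const int n = 1 << 20;
    const int blocks = (n + 255) / 256;
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, blocks * sizeof(float));
    blockSum<<<blocks, 256>>>(in, out, n);
    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    return 0;
}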
 

artist

Banned
That's not necessarily true. Multiple queue engines are used so you have many threads in flight that you can switch between to hide memory latency. With eSRAM that's potentially a much smaller problem on the Xbone than it is on PC or PS4.
(reaction gif)
 

IN&OUT

Banned
German tech site Planet3DNow (specialized in AMD hardware) comes to the conclusion:

Xbox One does not have hUMA

Mark Diana and Heise were right. ESRAM is not coherent and Xbox One basically looks like an APU on Llano/Trinity level. They're not saying what this means for performance, but they say that coding for Xbox One will be a bigger effort than coding for a more advanced HSA stage.



You're confusing HD8000 with HD8000M. The former is what AMD uses for 28nm APUs (Kabini, Kyoto, etc.); the latter is a rebrand for discrete notebook GPUs.

AMD already said that, but people reacted as usual.
Well, wasn't it kinda obvious from the beginning that the X1 doesn't support hUMA?

hUMA is based on the concept of coherent interaction between the CPU/GPU and the RAM. The X1 breaks this coherency by adding ESRAM that only communicates with the GPU.
 
German tech site Planet3DNow (specialized in AMD hardware) comes to the conclusion:

Xbox One does not have hUMA

Mark Diana and Heise were right. ESRAM is not coherent and Xbox One basically looks like an APU on Llano/Trinity level. They're not saying what this means for performance, but they say that coding for Xbox One will be a bigger effort than coding for a more advanced HSA stage.



You're confusing HD8000 with HD8000M. The former is what AMD uses for 28nm APUs (Kabini, Kyoto, etc.); the latter is a rebrand for discrete notebook GPUs.

As I presumed. The addition of the GPU-only eSRAM only meant complications as it would be forced to be shared between the CPU and GPU. The drawbacks simply outweighed the benefits.
 

thuway

Member
What does this mean for the PS4 fidelity advantage? Is this just something to make coding easier, or is it something that will markedly improve performance?
 

RoboPlato

I'd be in the dick
German tech site Planet3DNow (specialized in AMD hardware) comes to the conclusion:

Xbox One does not have hUMA

Mark Diana and Heise were right. ESRAM is not coherent and Xbox One basically looks like an APU on Llano/Trinity level. They're not saying what this means for performance, but they say that coding for Xbox One will be a bigger effort than coding for a more advanced HSA stage.

Thanks. I was hoping someone could clarify this since my understanding from the Hot Chips conference was that XBO could support HSA but not hUMA. Seems I was correct.
 

Perkel

Banned
As I presumed. The addition of the GPU-only eSRAM only meant complications as it would be forced to be shared between the CPU and GPU. The drawbacks simply outweighed the benefits.


I would say that is completely wrong.

Without ESRAM, the Xbone would have only ~70 GB/s of bandwidth, which would be really bad for its APU.

Remember that they chose it long before 8 GB of GDDR5 was available, and with their app strategy they couldn't just use 4 GB of GDDR5.

ESRAM is a benefit for their hardware; it's only in comparison to the PS4 architecture that it is worse.
 

McHuj

Member
What does this mean for the PS4 fidelity advantage? Is this just something to make coding easier, or is it something that will markedly improve performance?

It does make the coding easier. For fidelity, probably nothing, but that depends on how you define fidelity. The purpose of GPGPU would be to effectively utilize the GPU for highly parallel non-graphics work. That could be animation, physics, maybe AI, maybe audio; who knows, that's for the developers to figure out. Do you consider animation/physics within the realm of fidelity, or just IQ/framerate? Because the latter are already done on the GPU and don't really need coherency with the CPU.
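As a hypothetical example of that kind of non-graphics work (everything here is invented for illustration), a particle/physics integration step is the sort of thing that maps well onto the GPU and, with a coherent allocation, could hand its results straight back to gameplay code:

#include <cuda_runtime.h>

struct Particle { float x, y, z, vx, vy, vz; };

// One fixed-timestep integration step per particle.
__global__ void stepParticles(Particle* p, int n, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    p[i].vy -= 9.81f * dt;   // gravity
    p[i].x  += p[i].vx * dt;
    p[i].y  += p[i].vy * dt;
    p[i].z  += p[i].vz * dt;
}

int main() {
    const int n = 100000;
    Particle* particles;
    cudaMallocManaged(&particles, n * sizeof(Particle));  // coherent-style allocation (an analogy, not console API)
    for (int i = 0; i < n; ++i) particles[i] = Particle{0.0f, 10.0f, 0.0f, 1.0f, 0.0f, 0.0f};
    stepParticles<<<(n + 255) / 256, 256>>>(particles, n, 1.0f / 60.0f);
    cudaDeviceSynchronize();  // gameplay/AI code on the CPU could now read the results directly
    cudaFree(particles);
    return 0;
}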
 

onQ123

Member
Hotchips 25 - what we've learned:

Xbox One is custom HD7000, PS4 is custom HD8000. You can see that on this pic:



Two compute command processors (ACEs) and most likely two compute queues for Xbox One. PS4 has eight ACEs and 64 compute queues. That means that Xbox One will probably suck at GPGPU compared to PS4 and will not be able to do asynchronous fine-grain compute (GPGPU without a penalty for rendering) efficiently. This is somewhat disappointing: even a Kabini notebook APU with 2 GCN CUs has four ACEs and 32 queues. To me it looks like Microsoft never intended to do GPGPU with this console. So why would they need hUMA at all?

I also noticed that the Xbox One GPU is DirectX 11.1+ and the PS4 GPU is DirectX 11.2+.

(image: DirectX 11.2 / PS4 comparison chart, via PCGH)
 

i-Lo

Member
Seems like the PS4 GPU will have longer legs for GPGPU functions and will be best exploited by first parties as time passes.
 
We've had the ESRAM latency song and dance before, but no one has ever really mentioned plausible numbers on it. AMD cards tend to have pretty high latency costs everywhere -- according to the GCN presentation at GDC Europe latency to L1 was still on average 20x that of latency to LDS memory, and I doubt that ESRAM is closer than L1 and L2 in the memory hierarchy.

It's difficult to know hard numbers without actually testing, but it's obvious that memory that sits physically close to the execution units, enclosed in the same package, will have much lower latency when compared to the main memory bus.

I doubt eSRAM would have much worse latency than an L2 cache. Why would it? Aren't caches usually made of SRAM too?
 

astraycat

Member
It's difficult to know hard numbers without actually testing, but it's obvious that memory that sits physically close to the execution units, enclosed in the same package, will have much lower latency when compared to the main memory bus.

I doubt eSRAM would have much worse latency than an L2 cache. Why would it? Aren't caches usually made of SRAM too?

The thing about the L2 cache is that it will have several times the latency of the L1 cache, which already has 20x the latency of LDS, which means it's also going to be in the hundreds of cycles.

If ESRAM has similar latency to L2, it's definitely not going to be in the tens of cycles.
 

Darklor01

Might need to stop sniffing glue
lol... you know DirectX is a Microsoft thing, right?

http://www.geek.com/games/sony-iimprove-directx-11-for-the-ps4-blu-ray-1544364/

"The PS4 sees Sony move to a 64-bit x86 chip architecture, which will be music to the ears of developers, especially those used to working on PC games. The good news doesn’t stop there, though. Developers will be able to take advantage of Microsoft’s latest industry standard DirectX API — DirectX 11.1, but Sony has taken the time to improve upon it, pushing the feature set beyond what is available for PC games development.
Those improvements include better shader pipeline access, improved debugging support features out the box, and much lower level access to the system hardware enabling developers to do “more cool things.” That’s achieved not only through a modified DirectX 11.1 API, but also a secondary low-level API specifically for the PS4 hardware.
As well as having more control over the hardware, Norden explained that developers get complete control over the CPU, GPU, and RAM. That means the GPU isn’t just limited to handling graphics on the system, it can run arbitrary code. Developers also get to decide how to use/split the 8GB RAM between different tasks. There’s no predefined limits meaning there’s more headroom to experiment.
One area of the console that is sure to please gamers is the Blu-ray drive the PS4 will ship with. Not only does that ensure the console can double as a Blu-ray player just like the PS3 did, it’s also a 3x faster drive than in the PS3, which should help with install and game load times."
 
Nothing really stops Sony from using DirectX; they just have to make their own implementation of it. The API is not IP-protected (at least that is how current court decisions would see it).
 
I'm guessing this could be pushed to the VGLeaks thread, since it clarifies the uses of hUMA on PS4... they (VGLeaks) already confirmed that it exists.
 