
Full details of the Scorpio engine in Xbox one X from Hotchips conference 2017

BigEmil

Junior Member
The lack of full SATA 3 support is unfortunate; there's a bottleneck somewhere.

Hopefully next gen it will.
 
I don't think that's true, and it doesn't make sense either (due to unified shaders). 3TF for vertex shaders alone sounds like overkill. Vertices don't need that much processing power.

There are 3TF available for all types of shaders. That's still plenty for old, unpatched games.

ps: You also forgot to mention geometry shaders.

I agree it doesn't make much sense, but that's what the documentation said.

I don't recall the thread, but it was something like this:

Xbox One BC titles on very old SDKs will have the full GPU exposed, but with that odd limit.
Xbox One BC titles on the (I think it was June 2017) SDK will have the full GPU exposed in a unified way.
Xbox One games built with the October SDK will be able to target the Xbox One X specifically.
 

onQ123

Member
Nah, it's ~52 NGCs glued together.

...There is zero reason to put two PS4 GPUs next to each other when you are building a single chip SoC system.
It's like the PS4 & Xbox One CPU being made up of 2 Jaguar clusters, but in this case the PS4 Pro GPU is made up of 2 PS4-GPU-sized clusters. This is pretty much fact. I'm not sure what the argument is at the moment, but when I said this a year ago & guessed that it would be 64 ROPs, people didn't think it would be, & right now everything points to it being 64 ROPs.

That's because they are counting the full render time, which includes vertex and other work that won't scale at all with resolution.

CB will be costlier than just rendering half the pixel load, because it has to do something extra (though even a simple software upscale would), but the extent of that is currently undisclosed.

Why wouldn't they use the full render time?

The "ROPs not telling the full story" comment wasn't aimed at you. I said it doesn't tell the full story because the post I quoted mentioned advertising it as an advantage, to which I responded that "it doesn't tell the whole story about how the GPU performs, as there are other variables at play depending on the software".

Oh
 
Why wouldn't they use the full render time?

They should use the total render time; I'm just saying that the total render time isn't useful for comparing how much CB saves on the work that is resolution dependent.
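A rough back-of-envelope sketch of that point (the fixed/per-pixel split and the reconstruction cost below are made-up illustrative numbers, not anything measured):

```python
# Toy frame-time model: total render time mixes work that scales with
# resolution (pixel shading, fill) and work that doesn't (vertex work,
# culling, command processing). Checkerboarding shades roughly half the
# samples but pays an extra reconstruction/resolve cost on top.

FIXED_MS = 6.0           # assumed resolution-independent work per frame
PIXEL_MS_NATIVE = 20.0   # assumed resolution-dependent work at native 4K
CB_RESOLVE_MS = 1.5      # assumed cost of the checkerboard resolve pass

native_4k = FIXED_MS + PIXEL_MS_NATIVE
checkerboard = FIXED_MS + PIXEL_MS_NATIVE / 2 + CB_RESOLVE_MS

print(f"native 4K frame:    {native_4k:.1f} ms")     # 26.0 ms
print(f"checkerboard frame: {checkerboard:.1f} ms")  # 17.5 ms

# Comparing totals (26.0 vs 17.5 ms) understates the saving on the
# resolution-dependent part alone (20.0 vs 11.5 ms), because the fixed
# cost is paid either way.
```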


Edit: And has it been confirmed that the Pro's GPU has 64 ROPs? Seems like an utter waste, considering how little the bandwidth increased.
 

rokkerkory

Member
I absolutely love threads like this even though I understand like 10% of it haha. But still learn a bit more each time. Thanks for the good dialogue folks.
 

dr_rus

Member
It's like the PS4 & Xbox One CPU being made up of 2 Jaguar clusters, but in this case the PS4 Pro GPU is made up of 2 PS4-GPU-sized clusters. This is pretty much fact. I'm not sure what the argument is at the moment, but when I said this a year ago & guessed that it would be 64 ROPs, people didn't think it would be, & right now everything points to it being 64 ROPs.

There is no "PS4 GPU size cluster" (contrary to a 4 core Jaguar CPU module which is actually a thing) and the sole fact that Neo's SIMDs are different (FP16x2 support for example) is enough to make Neo's GPU a completely new piece of silicon, in no way related to what's present in PS4. ROPs in all versions of GCN architecture are decoupled from shader core anyway so it doesn't matter if there's 64 or 32 or 48 - this doesn't mean that Neo's GPU is "2 modified PS4 GPUs next to each other" either.

Sony's inability to effectively abstract PS4's h/w to avoid running weird h/w configurations like half GPU reservation on Neo for titles running in legacy mode is nothing more than a sign of PS4's APIs weaknesses and this is done via the (OS/driver/firmware level) software, not by putting "2 modified PS4 GPUs next to each other".
 

onQ123

Member
There is no "PS4 GPU size cluster" (contrary to a 4 core Jaguar CPU module which is actually a thing) and the sole fact that Neo's SIMDs are different (FP16x2 support for example) is enough to make Neo's GPU a completely new piece of silicon, in no way related to what's present in PS4. ROPs in all versions of GCN architecture are decoupled from shader core anyway so it doesn't matter if there's 64 or 32 or 48 - this doesn't mean that Neo's GPU is "2 modified PS4 GPUs next to each other" either.

Sony's inability to effectively abstract PS4's h/w to avoid running weird h/w configurations like half GPU reservation on Neo for titles running in legacy mode is nothing more than a sign of PS4's APIs weaknesses and this is done via the (OS/driver/firmware level) software, not by putting "2 modified PS4 GPUs next to each other".

How is it not a PS4-GPU-sized cluster when Cerny said that it's a mirror of itself placed next to it? That's two 18-CU clusters next to each other. And why are you telling me about the differences in the GPU as if I'm saying that it's actually 2 PS4 GPUs in the PS4 Pro?
 

LordOfChaos

Member
There is no "PS4 GPU size cluster" (contrary to a 4 core Jaguar CPU module which is actually a thing) and the sole fact that Neo's SIMDs are different (FP16x2 support for example) is enough to make Neo's GPU a completely new piece of silicon, in no way related to what's present in PS4. ROPs in all versions of GCN architecture are decoupled from shader core anyway so it doesn't matter if there's 64 or 32 or 48 - this doesn't mean that Neo's GPU is "2 modified PS4 GPUs next to each other" either.

Sony's inability to effectively abstract PS4's h/w to avoid running weird h/w configurations like half GPU reservation on Neo for titles running in legacy mode is nothing more than a sign of PS4's APIs weaknesses and this is done via the (OS/driver/firmware level) software, not by putting "2 modified PS4 GPUs next to each other".


That was actually an early weakness of GCN, ROPs being tied to CU counts; it was addressed in later versions. It's possible the Pro plucked that from the then-near-future feature set and decoupled them, just like they borrowed 8 ACEs from the future.

In fact I'd bet on that. If the Pro had doubled ROP pixel throughput, I'd think they would have talked about it, just like they talked about shading performance, FP16, and memory bandwidth. The PS4 was already overkill for 1080p with 32. Perhaps, even likely, ROPs were never the limit to scaling up to double that.

32 being overkill for 1080p also fits with PSVR needing around double that throughput, and with what the PS4 Pro usually ends up rendering at before checkerboard reconstruction.
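A quick sanity check on that "around double" claim, assuming a 1080p/60 baseline, PSVR's 1920x1080 panel refreshed at 120 Hz, and a 2160p checkerboard target (ignoring reprojection and render-target margins):

```python
# Pixels per second as a rough proxy for fill-rate demand.
px_1080p = 1920 * 1080

base_1080p60 = px_1080p * 60          # standard PS4 target
psvr_1080p120 = px_1080p * 120        # PSVR panel refreshed at 120 Hz
cb_4k_per_frame = (3840 * 2160) // 2  # checkerboarded 4K shades half the samples

print(f"PSVR vs 1080p60:      {psvr_1080p120 / base_1080p60:.1f}x")  # ~2x
print(f"4K CB frame vs 1080p: {cb_4k_per_frame / px_1080p:.1f}x")    # ~2x
```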
 

dr_rus

Member
How is it not a PS4-GPU-sized cluster when Cerny said that it's a mirror of itself placed next to it? That's two 18-CU clusters next to each other. And why are you telling me about the differences in the GPU as if I'm saying that it's actually 2 PS4 GPUs in the PS4 Pro?
Every GCN GPU has half of its CUs placed as a mirror on the other side of the chip. The PS4 GPU is arranged in the exact same fashion, with 9+9 CUs on different sides. 18 CUs means nothing, as these are different CUs.

That was actually an early weakness of GCN, ROPs being tied to CU counts. This was addressed in later versions. It's possible the Pro plucked from the then-near-future feature release and decoupled them, just like they borrowed 8 ACEs from the future.
GCN never had ROPs tied to CU counts. Tahiti (the first GCN GPU) had 32 CUs and 32 ROPs while Pitcairn (the second GCN GPU) had 20 CUs and 32 ROPs. Hawaii had 44 CUs and 64 ROPs.

If you're thinking about the changes made to GCN in Vega/GCN5, then I'm quite sure that the ROPs/MCs used in Neo's GPU were in fact from Polaris and not GCN5, as the latter's were made to work with HBM2 and not the GDDR5 used in the PS4 Pro. It's also somewhat telling that they haven't mentioned ROV and CR support in the PS4 Pro - both are ROP features and both were added in GCN5. Both would be pretty useless for a system designed to run PS4 games at higher resolutions, though.

In fact I'd bet on that. If the Pro had doubled ROP pixel throughput, I'd think they would have talked about it, just like they talked about shading performance, FP16, and memory bandwidth. The PS4 was already overkill for 1080p with 32. Perhaps, even likely, ROPs were never the limit to scaling up to double that.

32 being overkill for 1080p also fits with PSVR needing around double that throughput, and with what the PS4 Pro usually ends up rendering at before checkerboard reconstruction.
It's rather unlikely that they're using more than 32 ROPs in Neo, for the simple reason that peak memory bandwidth isn't significantly higher than in the PS4. With just 217GB/s of bandwidth, putting more than 32 ROPs in the chip would likely be overkill, as the backend isn't in fact limited so much by pixel output as by other things, like memory bandwidth and the need to run long shaders over the course of several cycles.
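A rough illustration of that bandwidth argument (assuming the ~911 MHz Pro GPU clock and one 32-bit color write per ROP per clock, and ignoring delta color compression, depth/stencil traffic and everything else competing for the bus, so it's only an order-of-magnitude sketch):

```python
GPU_CLOCK_HZ = 911e6     # PS4 Pro GPU clock
BYTES_PER_PIXEL = 4      # RGBA8 color write
BANDWIDTH_GBPS = 217     # figure used earlier in the thread

for rops in (32, 64):
    fill_gbps = rops * GPU_CLOCK_HZ * BYTES_PER_PIXEL / 1e9
    print(f"{rops} ROPs: ~{fill_gbps:.0f} GB/s of peak color writes "
          f"vs {BANDWIDTH_GBPS} GB/s total")

# 32 ROPs: ~117 GB/s -- already a large slice of the bus.
# 64 ROPs: ~233 GB/s -- more than the entire bus for color writes alone,
# which is why the extra ROPs would mostly sit idle.
```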
 

onQ123

Member
10% boost from Out of Order Rasterization?


This is from a web post about OOO Rasterization, not about the Xbox One X, but it's a feature of the Xbox One X.




And another one


Out-of-Order Rasterization On RadeonSI Will Bring Better Performance In Some Games

AMD developer Nicolai Hähnle has published a set of patches today for adding out-of-order rasterization support to the RadeonSI Gallium3D driver. Long story short, this can boost the Linux gaming performance of GCN 1.2+ graphics cards when enabled.

Nicolai posted this patch series introducing the out-of-order rasterization support. This is being used right now for Volcanic Islands (GCN 1.2) and Vega (GFX9) discrete graphics cards (though support might be added to other GCN hardware too). It can be disabled via the R600_DEBUG=nooutoforder switch.

The out-of-order rasterization support is also wired up for per-game toggling of it and some related attributes via DRIRC, so it can be enabled or disabled depending on whether it helps a given Linux game or causes issues.

Nicolai has posted an explanation of out-of-order rasterization on his blog for those interested in the technical details: "Out-of-order rasterization can give a very minor boost on multi-shader engine VI+ GPUs (meaning dGPUs, basically) in many games by default. In most games, you should be able to set radeonsi_assume_no_z_fights=true and radeonsi_commutative_blend_add=true to get an additional very minor boost. Those options aren't enabled by default because they can lead to incorrect results."

Once the patches land in Mesa Git (or while still in patch form, if I magically find extra time before then), I intend to try out the support to see its impact on popular Linux games.


Saturday, September 09, 2017
radeonsi: out-of-order rasterization on VI+


Background: Out-of-order rasterization

Out-of-order rasterization is an optimization that can be enabled in some cases. Understanding it properly requires some background on how tasks are spread across shader engines (SEs) on Radeon GPUs.

The frontends (vertex processing, including tessellation and geometry shaders) and backends (fragment processing, including rasterization and depth and color buffers) are spread across SEs roughly like this:

[Diagram: shader engines (SEs), each with a frontend and a backend, connected by a crossbar]
(Not shown are the compute units (CUs) in each SE, which is where all shaders are actually executed.)

The input assembler distributes primitives (i.e., triangles) and their vertices across SEs in a mostly round-robin fashion for vertex processing. In the backend, work is distributed across SEs by on-screen location, because that improves cache locality.

This means that once the data of a triangle (vertex position and attributes) is complete, most likely the corresponding rasterization work needs to be distributed to other SEs. This is done by what I'm simplifying as the "crossbar" in the diagram.

OpenGL is very precise about the order in which the fixed-function parts of fragment processing should happen. If one triangle comes after another in a vertex buffer and they overlap, then the fragments of the second triangle better overwrite the corresponding fragments of the first triangle (if they weren't rejected by the depth test, of course). This means that the "crossbar" may have to delay forwarding primitives from a shader engine until all earlier primitives (which were processed in another shader engine) have been forwarded. This only happens rarely, but it's still sad when it does.

There are some cases in which the order of fragments doesn't matter. Depth pre-passes are a typical example: the order in which triangles are written to the depth buffer doesn't matter as long as the "front-most" fragments win in the end. Another example are some operations involved in stencil shadows.

Out-of-order rasterization simply means that the "crossbar" does not delay forwarding triangles. Triangles are instead forwarded immediately, which means that they can be rasterized out-of-order. With the in-progress patches, the driver recognizes cases where this optimization can be enabled safely.
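A toy model of that crossbar behaviour, just to make the ordering constraint concrete (the timings are random and purely illustrative; this is not how the hardware is actually scheduled):

```python
import random

random.seed(1)
prims = list(range(12))  # primitives in API submission order
# Time at which each primitive's vertex work finishes; primitives land on
# different shader engines, so later primitives can finish earlier.
finish = {p: random.uniform(0.0, 10.0) for p in prims}

# Out-of-order crossbar: forward the moment vertex work is done.
ooo_forward = dict(finish)

# Ordered crossbar: primitive p waits until every earlier primitive
# has been forwarded.
ordered_forward = {}
latest = 0.0
for p in prims:
    latest = max(latest, finish[p])
    ordered_forward[p] = latest

stall = sum(ordered_forward[p] - ooo_forward[p] for p in prims)
print(f"total forwarding delay removed by out-of-order: {stall:.1f} time units")
```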

By the way #1: From this explanation, you can immediately deduce that this feature only affects GPUs with multiple SEs. So integrated GPUs are not affected, for example.

By the way #2: Out-of-order rasterization is entirely disabled by setting R600_DEBUG=nooutoforder.


Why the configuration options?

There are some cases where the order of fragments almost doesn't matter. It turns out that the most common and basic type of rendering is one of these cases. This is when you're drawing triangles without blending and with a standard depth function like LEQUAL with depth writes enabled. Basically, this is what you learn to do in every first 3D programming tutorial.

In this case, the order of fragments is mostly irrelevant because of the depth test. However, it might happen that two triangles have the exact same depth value, and then the order matters. This is very unlikely in common scenes though. Setting the option radeonsi_assume_no_z_fights=true makes the driver assume that it indeed never happens, which means out-of-order rasterization can be enabled in the most common rendering mode!
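A minimal sketch of why the depth test makes that common case order-independent, and why an exact z-fight is the one place order sneaks back in (pure illustration, not driver code):

```python
def resolve(fragments):
    """Resolve one pixel with a LEQUAL depth test and depth writes enabled.

    fragments: list of (depth, color) in submission order."""
    depth, color = float("inf"), None
    for d, c in fragments:
        if d <= depth:          # LEQUAL: closer-or-equal fragment wins
            depth, color = d, c
    return color

a = [(0.3, "red"), (0.7, "blue")]
print(resolve(a), resolve(list(reversed(a))))  # red red  -- order irrelevant

b = [(0.5, "red"), (0.5, "blue")]              # exact z-fight
print(resolve(b), resolve(list(reversed(b))))  # blue red -- order matters
```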

Some other cases occur with blending. Some blending modes (though not the most common ones) are commutative in the sense that from a purely mathematical point of view, the end result of blending two triangles together is the same no matter which order they're blended in. Unfortunately, additive blending (which is one of those modes) involves floating point numbers in a way where changing the order of operations can lead to different rounding, which leads to subtly different results. Using out-of-order rasterization would break some of the guarantees the driver has to give for OpenGL conformance.
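The rounding issue with additive blending is just floating-point addition not being associative; a tiny example (values picked only to make the rounding visible):

```python
a, b, c = 1e8, -1e8, 0.1

print((a + b) + c)                 # 0.1
print(a + (b + c))                 # just under 0.1 (different rounding)
print((a + b) + c == a + (b + c))  # False: summation order changed the result
```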

The option radeonsi_commutative_blend_add=true tells the driver that you don't care about these subtle errors and will lead to out-of-order rasterization being used in some additional cases (though again, those cases are rarer, and many games probably don't encounter them at all).

tl;dr

Out-of-order rasterization can give a very minor boost on multi-shader engine VI+ GPUs (meaning dGPUs, basically) in many games by default. In most games, you should be able to set radeonsi_assume_no_z_fights=true and radeonsi_commutative_blend_add=true to get an additional very minor boost. Those options aren't enabled by default because they can lead to incorrect results.
 

onQ123

Member
nope. my bad.

OOO Rasterization is something that was enabled last year for AMD GPUs with GCN 1.2 & higher. That's after the PS4 & Xbox One were released, so this would be a new feature on console, & I'm guessing it's always enabled for Xbox One X, so that's an extra boost over Xbox One on top of the boost from more GPU flops.

[Image: Microsoft-Xbox-One-X-Scorpio-Engine-Hot-Chips-29-04.png]
 

Locuza

Member
That's a performance boost for GPUs with multiple Shader Engines in cases where you don't have to synchronise between them.
The PS4 and Xbox One only had 2 SEs, so the cost of waiting naturally occurs less often there than on 4 SEs.
The performance boost in comparison to the old consoles will likely be small, if present at all.
 