
XBOX Scorpio: DX12 Built Directly Into GPU

Tripolygon

Banned
Sony hasn't taken anything, whatever was added was added by AMD (possibly at Sony's request) and all of it came straight from the Vega feature set. I don't see why Scorpio's APU will be any different - if anything, I expect a lot more Vega features in it simply because it launches after Vega itself in PC space. I'd say that it's fair to expect basically all new Vega features in Scorpio but the new memory architecture (HBCC/HBM2/tiling).
I know you've been peddling this idea that Scorpio would be based on Vega for a while now, but all evidence currently available to us points to it being based on Polaris. This is going by all the information Microsoft themselves have provided via Digital Foundry, just like Sony did for the PS4 Pro announcement last year, with extensive information on the changes they made to the standard Polaris.

Scorpio feature set was decided probably sometime last year and just because Sony added features from Vega does not mean Microsoft must do the same, they have different design goals and philosophy. Same way Sony added features beyond what GCN 1.0 offered on regular PS4 when it launched in 2013.

Everything that goes into these consoles is decided by Sony and Microsoft, not probably. There are also custom features that are co-designed by the respective companies which will probably never appear on regular consumer CPU or GPU, although some have. You expecting more Vega features simply because it launches after Vega is illogical. Scorpio is also launching after Zen, doesn't mean we should expect Zen in Scorpio. Heck a few people were peddling HBM and features from AMD upcoming Navi architecture using the exact same "Sony did it so Microsoft must do it too" argument.

That being said, it could just be that Microsoft is yet to release more information or they aren't allowed to speak on Vega features in Scorpio because AMD hasn't launched it yet, but that is probably unlikely because Sony was able to speak about Vega features in Pro. Until Microsoft says otherwise, I am of the opinion that Scorpio GPU is based off Polaris architecture with no sign of Vega features.
 

jroc74

Phone reception is more important to me than human rights
^Good points too, lol.

Sony hasn't taken anything, whatever was added was added by AMD (possibly at Sony's request) and all of it came straight from the Vega feature set. I don't see why Scorpio's APU will be any different - if anything, I expect a lot more Vega features in it simply because it launches after Vega itself in PC space. I'd say that it's fair to expect basically all new Vega features in Scorpio but the new memory architecture (HBCC/HBM2/tiling).

Would Scorpio even need those Vega features?

I think these are some good questions.
 
I thought they said they saved 50% on CPU Rendering time not overall CPU usage.
Not even that. It was up to 50% saved on draw calls alone, which is a subset of rendering-related tasks done by the CPU.
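To put rough numbers on that distinction, here is a back-of-the-envelope sketch. The draw-call shares are made up for illustration (nothing quoted in this thread says how much CPU time draw calls actually take), and `overall_cpu_saving` is a hypothetical helper, not anything from Microsoft:

```python
def overall_cpu_saving(draw_call_share, draw_call_saving=0.5):
    """Fraction of total CPU time saved when only the draw-call part gets cheaper.

    draw_call_share: assumed fraction of CPU time spent on draw calls (hypothetical).
    draw_call_saving: fraction of that draw-call time removed ("up to 50%").
    """
    if not 0.0 <= draw_call_share <= 1.0:
        raise ValueError("draw_call_share must be between 0 and 1")
    return draw_call_share * draw_call_saving


# Illustrative shares only; real workloads will differ.
for share in (0.1, 0.2, 0.4):
    saved = overall_cpu_saving(share)
    print(f"draw calls = {share:.0%} of CPU time -> {saved:.0%} saved overall")
```

The point is just that a 50% cut to one slice of the work is a much smaller cut to the whole.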

Can someone explain baked in DirectX12 vs coding to the metal to me like I am five years old? I am trying to wrap my mind around on what the Xbox team tried to achieve here.
The "metal" in modern microchips is literally billions of tiny on/off switches. To perform computation these are flipped in precise patterns. Obviously, at the lowest level the instructions to control those huge patterns have to be extremely long and complicated. Setting them all by hand is prohibitively time-consuming.

So programmers use an API like DirectX 12, into which they feed more general instructions to do needed tasks like "test which poly covers the pixel center". (Actual instructions aren't in natural language like that, they're still in code--just not the long-winded and repetitive machine code.) The API translates those instructions into the much longer versions.

But because the translation is itself work, it requires computational power...which comes from the same pool of resources that will be used to draw graphics, etc. Using an API thus reduces a system's usable power for its intended purpose.

So engineers over time figure out how to make APIs "thinner", simplifying and speeding up the translation work. New versions of an API let programmers use a larger percentage of a system's resources, getting "close to the metal". (No one actually codes all the way "to the metal" level anymore; there's always an abstraction layer.)

By putting DX12 "in hardware", Microsoft are trying a different approach to reduce the API cost: rather than speed up translation, they make it so the GPU understands instructions with less translation.

This is good, but DirectX started from a worse place than other console APIs. Making it thinner is working toward the level of efficiency that Vulkan or GNM (Sony's main PS4 API) already have.
 

Locuza

Member
Would Scorpio even need those Vega features?
Depends on what you define as needed.

Vega brings major efficiency, performance and feature enhancements over the current Polaris architecture.
It would be sexy from a technical perspective if Scorpio would have at least some of them but it's not like 6TF, 326 GB/s, 12 GB and so on are not good enough, the competition doesn't offer more.

You could also ask does Scorpio need Zen?
Well better would be better but it's coming without Zen cores and will be still the strongest console.
 

onQ123

Member
Scorpio has 50% more bandwidth and compute. You expect fp16 to boost total gpu performance by over 50%?

If Scorpio doesn't feature RPM then PS4 Pro peak performance is 8.4TF FP16 / 4.2TF FP32 while Scorpio would be 6TF FP32 & FP16.


But we don't know the full specs of Scorpio it could be 12TF FP16.
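The arithmetic behind those figures, as a quick sketch. It assumes that double-rate FP16 (RPM) gives exactly 2x the FP32 peak and that without it FP16 runs at the FP32 rate; `peak_tflops` is just an illustrative helper, and whether Scorpio has RPM is unconfirmed:

```python
def peak_tflops(fp32_tflops, double_rate_fp16):
    """Return (FP32, FP16) peak throughput in TFLOPS.

    Assumption: with double-rate FP16 (RPM) the FP16 peak is 2x the FP32 peak;
    without it, FP16 runs at the FP32 rate.
    """
    fp16_tflops = fp32_tflops * 2 if double_rate_fp16 else fp32_tflops
    return fp32_tflops, fp16_tflops


scenarios = {
    "PS4 Pro (RPM)": peak_tflops(4.2, True),
    "Scorpio without RPM": peak_tflops(6.0, False),
    "Scorpio with RPM (unconfirmed)": peak_tflops(6.0, True),
}
for name, (fp32, fp16) in scenarios.items():
    print(f"{name}: {fp32:.1f} TF FP32 / {fp16:.1f} TF FP16")
```

These are peak numbers only; they say nothing about how much real shader code can actually run at FP16.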
 
I know signs point to it using the Polaris architecture, but would they really not implement any Vega features? Could that really be true? I'm not basing this on Vega being about to come out or anything, just on the fact that this thing has been brewing for so long that it seems foolish.
 

quest

Not Banned from OT
If Scorpio doesn't feature RPM then PS4 Pro peak performance is 8.4TF FP16 / 4.2TF FP32 while Scorpio would be 6TF FP32 & FP16.


But we don't know the full specs of Scorpio it could be 12TF FP16.

I would say any real Vega features look more grim by the day. It's frustrating, the Scorpio could have been a premium monster with Vega and its TBR. Shame Sony was 1 year too early, I am sure a 2017 Pro would have had the big-time Vega features.
 
If Scorpio doesn't feature RPM then PS4 Pro peak performance is 8.4TF FP16 / 4.2TF FP32 while Scorpio would be 6TF FP32 & FP16.


But we don't know the full specs of Scorpio it could be 12TF FP16.

I feel the RPM in PS4 is for clawing as much performance they can in higher resolutions. It's not a magic bullet that can be applied in all cases. Scorpio might not need it because they're already hitting 6TF.
 

quest

Not Banned from OT
I know signs point to it using the Polaris architecture, but would they really not implement any Vega features? Could that really be true? I'm not basing this on Vega being about to come out or anything, just on the fact that this thing has been brewing for so long that it seems foolish.

The design goal was 6TF and 12 gigs of RAM as cheaply as possible. They hit those but had to leave Vega features on the table. I wish they hadn't engineered around PR numbers and had instead done a 5.5TF chip with full Vega features minus the HBM.
 
Would the type of chip that they use even matter at this point? As long as they've shown that the console can run native 4K at 60fps with some room to spare, that's great.
 
Would the type of chip that they use even matter at this point? As long as they've shown that the console can run native 4K at 60fps with some room to spare, that's great.

Yeah, the extra 1.8 TF of GPU power, the 100+ GB/s of extra bandwidth and the 3 extra GB of RAM mean shit.

Nothing to see here. No FP16 or confirmed Vega benefits. Pack your bags, time to go.
 

timlot

Banned
Would the type of chip that they use even matter at this point? As long as they've shown that the console can run native 4K at 60fps with some room to spare, that's great.

Folks are trying to poke holes where there are none. Claiming Scorpio's chip is missing this feature or that feature like it's fact. It's a wrap, fellas. More productive time would be spent hypothesizing about PS5. lol
 

Pif

Banned
Not even that. It was up to 50% saved on draw calls alone, which is a subset of rendering-related tasks done by the CPU.


The "metal" in modern microchips is literally billions of tiny on/off switches. To perform computation these are flipped in precise patterns. Obviously, at the lowest level the instructions to control those huge patterns have to be extremely long and complicated. Setting them all by hand is prohibitively time-consuming.

So programmers use an API like DirectX 12, into which they feed more general instructions to do needed tasks like "test which poly covers the pixel center". (Actual instructions aren't in natural language like that, they're still in code--just not the long-winded and repetitive machine code.) The API translates those instructions into the much longer versions.

But because the translation is itself work, it requires computational power...which comes from the same pool of resources that will be used to draw graphics, etc. Using an API thus reduces a system's usable power for its intended purpose.

So engineers over time figure out how to make APIs "thinner", simplifying and speeding up the translation work. New versions of an API let programmers use a larger percentage of a system's resources, getting "close to the metal". (No one actually codes all the way "to the metal" level anymore; there's always an abstraction layer.)

By putting DX12 "in hardware", Microsoft are trying a different approach to reduce the API cost: rather than speed up translation, they make it so the GPU understands instructions with less translation.

This is good, but DirectX started from a worse place than other console APIs. Making it thinner is working toward the level of efficiency that Vulkan or GNM (Sony's main PS4 API) already have.

Thanks for the explanation.

So is this Xbox-optimized API still the same one used across desktop Windows?

Meaning, devs basically program once and it just works on both PC and Scorpio?
 

onQ123

Member
I feel the RPM in PS4 is for clawing as much performance they can in higher resolutions. It's not a magic bullet that can be applied in all cases. Scorpio might not need it because they're already hitting 6TF.

Not a magic bullet, no. Scorpio using raw power & more RAM is more of a magic bullet, while devs would have to optimize a lot to take advantage of the higher FP16 performance of PS4 Pro.
 

Locuza

Member
Thanks for the explanation.

So is this Xbox-optimized API still the same one used across desktop Windows?

Meaning, devs basically program once and it just works on both PC and Scorpio?
Currently no, DX12 (PC) and DX12.x (Xbox One) have differences.

On DX12.x (And even before with DX11.x) you can for example:

- Program the sample points for unique anti-aliasing implementations.
For example, Ubisoft used it for HRAA.
- Use cross-lane operations, where threads on the SIMDs can share data at the register level, which means higher performance.
- Expose barycentric coordinates, which can help in writing code in a different, more efficient way.
- Change the Pipeline State Objects from the GPU, which is what was referred to as "DX12 in Hardware".
- And many more

(some :p) Sources:
http://gpuopen.com/gcn-shader-extensions-for-direct3d-and-vulkan/
http://gpuopen.com/gaming-product/barycentrics12-dx12-gcnshader-ext-sample/
http://www.frostbite.com/2016/03/optimizing-the-graphics-pipeline-with-compute/

With the Windows 10 Creators Update some upcoming features for DX12 PC are currently in an experimental phase, like programmable sample points and cross-lane operations.
But with Scorpio the featureset will probably increase and it's likely that even with the updates coming to DX12 PC not everything can be covered.
So certain differences should remain.
 

Pop_smoke

Neo Member
Thanks for the explanation.

So is this Xbox-optimized API still the same one used across desktop Windows?

Meaning, devs basically program once and it just works on both PC and Scorpio?

They are not the same. The console DX and PC DX are close, but I mean, a console is a closed, locked system, soooo.... anyways!

The big deal here is not the API per se. Microsoft has always been touting DX12 and close-to-the-metal programming.

What I am seeing misunderstood a lot in this thread is the fact that the DX12 API is getting a dedicated processor in the Scorpio.

A processor dedicated purely to running the API, instead of having it run on the CPU like traditionally every console and desktop computer does.
This is actually extremely innovative for the consumer market. By making a dedicated chip for the API itself, it frees up lots of CPU power and processes for other tasks instead of API overhead. Doing this also makes the GPU less dependent on a weak CPU. They probably still use the potato Jaguar CPU like in the PS4 and Xbox One.

When devs say they program close to the metal, there is still an overhead. In this case, the API is the metal. It's crazy cool tech. Like how engineers have made chips for ray tracing or only for cell phone antennas, etc.
 
Not a magic bullet, no. Scorpio using raw power & more RAM is more of a magic bullet, while devs would have to optimize a lot to take advantage of the higher FP16 performance of PS4 Pro.

Outside of BF1 having one checkerboard shader that improved 30% due to FP16 what else is there? You have to remember that for every 2xFP16 calculation you are giving up a precious FP32 calculation. FP16 is limited case usage for now. I don't think devs will be having to optimize "a lot" to take advantage of the "possible" higher performance of a PS4 Pro. More than likely Scorpio will have the same features and maybe more of the Pro. We have one "developer" that claimed the Pro had more Vega features, at least as far I know.
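For anyone wondering what "2xFP16 in place of one FP32" looks like at the data level, here is a minimal sketch using Python's `struct` module (format code `e` is IEEE 754 half precision). It only illustrates the storage side, i.e. two halves sharing one 32-bit word and the precision you give up; it is not how the GPU ALUs are actually driven, and the helper names are made up for this example:

```python
import struct


def pack_two_halves(a, b):
    """Pack two floats as half-precision values into one 32-bit integer word."""
    return struct.unpack("<I", struct.pack("<2e", a, b))[0]


def unpack_two_halves(word):
    """Recover the two half-precision values stored in a 32-bit word."""
    return struct.unpack("<2e", struct.pack("<I", word))


# Values that half precision represents exactly survive the round trip.
word = pack_two_halves(1.5, -2.0)
print(hex(word), unpack_two_halves(word))

# Values it cannot represent exactly come back rounded.
print(unpack_two_halves(pack_two_halves(0.1, 1000.1)))
```

That rounding is why FP16 only suits some shader work: fine for colors and many intermediate values, risky for things like large world-space positions.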
 

onQ123

Member
Outside of BF1 having one checkerboard shader that improved 30% due to FP16 what else is there? You have to remember that for every 2xFP16 calculation you are giving up a precious FP32 calculation. FP16 is limited case usage for now. I don't think devs will be having to optimize "a lot" to take advantage of the "possible" higher performance of a PS4 Pro. More than likely Scorpio will have the same features and maybe more of the Pro. We have one "developer" that claimed the Pro had more Vega features, at least as far I know.

Actually RPM was the work of Mark Cerny & AMD I posted the patent last year so you can't be so sure of Scorpio having it just because it's in the PS4 Pro. But as far as the performance gains go we also have the demo shown by AMD.
 
I was under the impression that Win10 DX12 inherently has less CPU overhead, especially when it comes to draw calls. Is this bringing portions of the API one step further?

Would this be of less importance on a PC with a good strong CPU?
 

Locuza

Member
Actually RPM was the work of Mark Cerny & AMD I posted the patent last year so you can't be so sure of Scorpio having it just because it's in the PS4 Pro. But as far as the performance gains go we also have the demo shown by AMD.
FP16 is totally generic stuff, no matter in which way FP16 went into AMD's portfolio they have all IP rights to it.
 

Locuza

Member
It would explain why PS4 Pro is using RPM early even without a Vega GPU.
Because it was a roadmap feature anyway and it's not rocket science to include double-rate FP16 if you already have native FP16 ops.
There is nothing from Sony's side preventing MS from including RPM.
 

onQ123

Member
Because it was a roadmap feature anyway and it's not rocket science to include double-rate FP16 if you already have native FP16 ops.
There is nothing from Sony's side preventing MS from including RPM.

lol no one said that Sony was stopping MS from using it.

If Sony & AMD worked together on putting this feature in PS4 Pro & it's on the market in a console before AMD release it as part of their GPU & MS is using a GPU from AMD that comes from the generation before the feature was added you can't just say that Scorpio is going to have that feature just because it's coming out a year later.
 
Actually RPM was the work of Mark Cerny & AMD I posted the patent last year so you can't be so sure of Scorpio having it just because it's in the PS4 Pro. But as far as the performance gains go we also have the demo shown by AMD.

No, the 2xFP16 per core is a native ability of the cores, the same way that it's part of NVIDIA's Tesla cores.

It's basically a universal configuration unless excluded ala GP 104. NVIDIA purposely excluded it to save lost profit on their Pro cards.
 

Locuza

Member
lol no one said that Sony was stopping MS from using it.

If Sony & AMD worked together on putting this feature in PS4 Pro & it's on the market in a console before AMD release it as part of their GPU & MS is using a GPU from AMD that comes from the generation before the feature was added you can't just say that Scorpio is going to have that feature just because it's coming out a year later.
You talked about patents, which might lead to the belief that this would be stuff also locked by Sony.

MS had the same roadmap visibility as Sony and since they are coming one year later it's just very likely in my point of view that Scorpio will also feature it.
But a guarantee can't be made, the tech reveal was very sparse in regards to low level details.

No, the 2xFP16 per core is a native ability of the cores, the same way that it's part of NVIDIA's Tesla cores.

It's basically a universal configuration unless excluded ala GP 104. NVIDIA purposely excluded it to save lost profit on their Pro cards.
The GP100 is the only Pascal chip (and maybe the new Tegra) with shader clusters which can do double rate FP16.
All other Pascal chips are lacking this ability, it's not just a Tesla restriction.
 

onQ123

Member
You talked about patents, which might lead to the belief that this would be stuff also locked by Sony.

MS had the same roadmap visibility as Sony and since they are coming one year later it's just very likely in my point of view that Scorpio will also feature it.
But a guarantee can't be made, the tech reveal was very sparse in regards to low level details.

I'm saying the reason that PS4 Pro is using RPM even before Vega is because Sony & AMD worked together on how to take advantage of RPM for rendering . so just because Sony used RPM before Vega was released does not mean that Scorpio will also use RPM.
 

dr_rus

Member
The Pixel Engine is what AMD calls their ROP-Design.
It might be that Scorpio doesn't have the new Vega Pixel Engine.

Well, of course, since ROPs are a part of the Vega's new memory architecture and they should be different for GDDR5 anyway.

I know you've been peddling this idea that Scorpio would be based on Vega for a while now, but all evidence currently available to us points to it being based on Polaris. This is going by all the information Microsoft themselves have provided via Digital Foundry, just like Sony did for the PS4 Pro announcement last year, with extensive information on the changes they made to the standard Polaris.

Scorpio feature set was decided probably sometime last year and just because Sony added features from Vega does not mean Microsoft must do the same, they have different design goals and philosophy. Same way Sony added features beyond what GCN 1.0 offered on regular PS4 when it launched in 2013.

Everything that goes into these consoles is decided by Sony and Microsoft, not probably. There are also custom features that are co-designed by the respective companies which will probably never appear on regular consumer CPU or GPU, although some have. You expecting more Vega features simply because it launches after Vega is illogical. Scorpio is also launching after Zen, doesn't mean we should expect Zen in Scorpio. Heck a few people were peddling HBM and features from AMD upcoming Navi architecture using the exact same "Sony did it so Microsoft must do it too" argument.

That being said, it could just be that Microsoft is yet to release more information or they aren't allowed to speak on Vega features in Scorpio because AMD hasn't launched it yet, but that is probably unlikely because Sony was able to speak about Vega features in Pro. Until Microsoft says otherwise, I am of the opinion that Scorpio GPU is based off Polaris architecture with no sign of Vega features.

You have to remember that MS won't be able to tell you that they are using Vega tech before AMD will actually disclose the details on that tech. So they are somewhat tied to Vega launch here if they are using anything from it.

But consider that Vega's new CUs are supposedly what gives it a nice increase in working frequencies - and coincidentally this is also what sets Scorpio's GPU aside from Polaris. So there are some signs that it will actually be based on Vega more than Polaris (with the new frontend and multiprocessors the only "old" part would be the memory architecture - and I'm pretty sure it's actually "old" because otherwise they'd probably use GDDR5X on a narrow bus instead of GDDR5 on 384 bit one).

Zen argument is irrelevant because console h/w upgrades do not need Zen while they do in fact need all the possible increase in GPU power they can get.
 

Locuza

Member
Well, of course, since ROPs are a part of the Vega's new memory architecture and they should be different for GDDR5 anyway.
[...]
Well, not of course.
Vega is also coming to Raven Ridge as an APU.
AMD can and has to adapt the backend to different memory solutions.
They did it with Fiji and they will do it with Vega.

Now that with Vega the ROPs are a client of the L2$ slices which then connects to the new HBCC, you could speculate that the memory type is even less challenging from a design perspective than before.

Here is the oddity with Scorpio.
It has a 384-Bit wide interface with 32 ROPs which would indicate the old backend design if this would be the only measurement to go by but it also has just 2MB L2$ which AMD never did, distributing a different amount of L2$ slices across memory controllers.
It might be that with Vega such design would be possible without being too ugly because the ROPs are connected to the L2$ slices which are connected to the new HBCC which is connected to the Memory Controllers.
There could be a new, flexible way of designing the backend.

Or MS uses the old backend design and simply has an uneven distribution of L2$ slices and ROP partitions.
 

longdi

Banned
All the questions about whether Scorpio has Vega/RPM features tell me MS were more interested in PR than in a tech dive when they gave DF the unveil.
 

gamz

Member
All the questions about whether Scorpio has Vega/RPM features tell me MS were more interested in PR than in a tech dive when they gave DF the unveil.

Well, to be fair, DF is still releasing information. So perhaps this will be mentioned.
 

dr_rus

Member
Well, not of course.
Vega is also coming to Raven Ridge as an APU.
AMD can and has to adapt the backend to different memory solutions.
They did it with Fiji and they will do it with Vega.

Now that with Vega the ROPs are a client of the L2$ slices which then connects to the new HBCC, you could speculate that the memory type is even less challenging from a design perspective than before.

There is no guarantee that Vega in Raven Ridge will have the new memory architecture in the first place. ROPs being hardwired to L2 means that now to support a different memory type you need to redesign not only L2s and MCs but ROPs as well. So no, it's not less challenging. The only positive point here lies in the fact that AMD will need to design a new GDDR controller anyway, as they'll most likely want to support GDDR5X and GDDR6 on their future GPUs - but this is why I don't think that it will be the case for Scorpio since it's still using the old GDDR5 and...

It has a 384-Bit wide interface with 32 ROPs which would indicate the old backend design if this would be the only measurement to go by but it also has just 2MB L2$ which AMD never did, distributing a different amount of L2$ slices across memory controllers.

...Yes, the 32 ROPs point to them still using the old crossbar between ROPs and MCs.

Where does the GPU's L2 amount come from? The original quote is a bit unclear:
As you can see, we doubled the amount of shader engines. That has the effect of improvement of boosting our triangle and vertex rate by 2.7x when you include the clock boost as well. We doubled the number of render back-ends, which has the effect of increasing our fill-rate by 2.7x. We quadrupled the GPU L2 cache size, again for targeting the 4K performance.
Notice how he's talking about the whole chip changes. The "quadrupled the GPU L2 cache size" could mean that it's 4x from what was in XBO, not 4X per MC. Although that would be strange as it would produce a number which can't be shared between 6 MCs.
 

Locuza

Member
There is no guarantee that Vega in Raven Ridge will have the new memory architecture in the first place
There is also no guarantee that AMD will use bastard tech for every Vega based chip because it doesn't use HBM.

ROPs being hardwired to L2 means that now to support a different memory type you need to redesign not only L2s and MCs but ROPs as well. So no, it's not less challenging [...]
That depends on how the interconnection works in detail and what data paths need to be adjusted.
Vega also brings the new HBCC into play, where we are simply lacking information how the design is architected.

Notice how he's talking about the whole chip changes. The "quadrupled the GPU L2 cache size" could mean that it's 4x from what was in XBO, not 4X per MC. Although that would be strange as it would produce a number which can't be shared between 6 MCs.
That's my point, if you are quadrupling the L2$ size then you get 2048KB L2$ (Xbox One was 512KB).
You can't spread 2048KB evenly across 6 MCs.
MS has to do something funny.
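The divisibility problem as a quick sketch. The 512KB figure and the six memory controllers come from the posts above; the 64-bit controller width and the 256KB slice size (what Polaris uses) are assumptions, since Microsoft hasn't said how Scorpio's L2 is organized:

```python
def split_l2(total_kb, controllers):
    """Return (whole KB per controller, leftover KB) for an even split."""
    return divmod(total_kb, controllers)


XBOX_ONE_L2_KB = 512                  # figure quoted in the post above
SCORPIO_L2_KB = 4 * XBOX_ONE_L2_KB    # "quadrupled the GPU L2 cache size"
MEMORY_CONTROLLERS = 384 // 64        # assumes 64-bit controllers on the 384-bit bus

per_mc, leftover = split_l2(SCORPIO_L2_KB, MEMORY_CONTROLLERS)
print(f"{SCORPIO_L2_KB} KB over {MEMORY_CONTROLLERS} MCs: {per_mc} KB each, {leftover} KB left over")

SLICE_KB = 256                        # assumption: Polaris-sized L2 slices
slices = SCORPIO_L2_KB // SLICE_KB
print(f"{slices} slices of {SLICE_KB} KB, {slices % MEMORY_CONTROLLERS} more than an even split allows")
```

Either way you slice it, something has to be uneven or decoupled from the memory controllers.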
 

dr_rus

Member
That depends on how the interconnection works in detail and what data paths need to be adjusted.
Vega also brings the new HBCC into play, where we are simply lacking information how the design is architected.

Hardwired ROPs mean that there are no interconnects, as they are an integral part of the L2/MC unit now.

HBCC is most likely just a fancy name for system wide unified memory which every Pascal GPU supports (only in CUDA though, at least for now) for almost a year. I will be surprised if they are handling anything else but address translation in h/w here. It's not needed in a UMA system anyway.
 

Locuza

Member
There are still dataports connecting the units together, for example what adjustments would be needed for the ROPs if the MC type changes?

The main logic will stay the same, the number of ROPs and L2$ sizes are scalable.
AMD has all different kinds of IPs, GDDR5, HBM, the old and new pixel backend.

Just as a rough reference that's probably how Nvidia does it:
01.jpg

http://pc.watch.impress.co.jp/docs/column/kaigai/752331.html

Which doesn't show the whole picture because we also know from Nvidia that there is an interconnect between the L2$ <---> MC connections:
GM204_arch_0.jpg
 

dr_rus

Member
There are still dataports connecting the units together, for example what adjustments would be needed for the ROPs if the MC type changes?

The main logic will stay the same, the number of ROPs and L2$ sizes are scalable.
AMD has all different kinds of IPs, GDDR5, HBM, the old and new pixel backend.

Not really, unless you mean in the sense of taking out or adding in the whole unit consisting of 4 ROPs, 1 L2 partition and some memory channels. L2 size for example hasn't changed since GM107 in NV's GPUs, and that's hardly because it's easily scalable.
 

Locuza

Member
The size of the L2$ slices can vary.
On GCN Gen 1 AMD used 128-256 KB per MC (in the white paper they stated 64-128KB per MC), with Polaris it's 256KB.
Page 10:
https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf

Nvidia used big L2$ slices for the GM107, 1024KB per MC.
But went down to 512KB for every chip after (Don't know about GP100).

The number of ROP clients also changed.
On GM107 there were 8 ROPs connected to 1024KB L2$.
On a GP104 there are 16 ROPs (2x8) using 512KB L2$ per MC.
 