"We can have custom features and they can eventually end up on the [AMD] roadmap," Cerny says proudly. "So the ACEs... I was very passionate about asynchronous compute, so we did a lot of work there for the original PlayStation 4 and that ended up getting incorporated into subsequent AMD GPUs, which is nice because the PC development community gets very familiar with those techniques. It can help us when the parts of GPUs that we are passionate about are used in the PC space."
In actual fact, two new AMD roadmap features debut in the Pro, ahead of their release in upcoming Radeon PC products - presumably the Vega GPUs due either late this year or early next year.
"One of the features appearing for the first time is the handling of 16-bit variables - it's possible to perform two 16-bit operations at a time instead of one 32-bit operation," he says, confirming what we learned during our visit to VooFoo Studios to check out Mantis Burn Racing. "In other words, at full floats, we have 4.2 teraflops. With half-floats, it's now double that, which is to say, 8.4 teraflops in 16-bit computation. This has the potential to radically increase performance."
A work distributor has also been added to the GPU, intended to improve efficiency by distributing work more intelligently.
"Once a GPU gets to a certain size, it's important for the GPU to have a centralised brain that intelligently distributes and load-balances the geometry rendered. So it's something that's very focused on, say, geometry shading and tessellation, though there is some basic vertex work as well that it will distribute," Mark Cerny shares, before explaining how it improves on AMD's existing architecture.
"The work distributor in PS4 Pro is very advanced. Not only does it have the fairly dramatic tessellation improvements from Polaris, it also has some post-Polaris functionality that accelerates rendering in scenes with many small objects... So the improvement is that a single patch is intelligently distributed between a number of compute units, and that's trickier than it sounds because the process of sub-dividing and rendering a patch is quite complex."
Beyond that, we're moving into the juicy stuff - the custom hardware that Sony has introduced, elements of the 'secret sauce' that allow the Pro graphics core to punch so far above its weight. In creating 4K framebuffers, many of the technological underpinnings are based on advanced anti-aliasing work, with new buffers created that can be exploited in a number of ways.
Right now, post-process anti-aliasing techniques like FXAA or SMAA have their limits. Edge detection accuracy varies dramatically. Searches based on high contrast differentials, depth or normal maps - or a combination - all have limitations. Sony has fashioned its own, highly innovative solution.
"We'd really like to know where the object and triangle boundaries are when performing spatial anti-aliasing, but contrast, Z [depth] and normal are all imperfect solutions," Cerny says. "We'd also like to track the information from frame to frame because we're performing temporal anti-aliasing. It would be great to know the relationship between the previous frame and the current frame better. Our solution to this long-standing problem in computer graphics is the ID buffer. It's like a super-stencil. It's a separate buffer written by custom hardware that contains the object ID."