
GDC 2016 papers and presentations

dr_rus

Member

Rather disappointing gains from async compute again (2% PS4, 5% XBO, 12% Fury X if I'm not mistaken). But the gains from compute-based culling are extreme on the other hand. GCN's geometry pipeline needs to improve.

All wonderful presentations. Oxide game's render is pretty darn unique.

They're definitely unique in shading everything into a 4Kx4K buffer even when the user is running at 1080p. 512-bit GDDR5 and HBM cards will probably shine there 8)
 
Rather disappointing gains from async compute again (2% PS4, 5% XBO, 12% Fury X if I'm not mistaken). But the gains from compute-based culling are extreme on the other hand. GCN's geometry pipeline needs to improve.



They're definitely unique in shading everything into a 4Kx4K buffer even when the user is running at 1080p. 512-bit GDDR5 and HBM cards will probably shine there 8)

It's just one use case for async compute. It's probably much easier to saturate the smaller number of CUs on the consoles as opposed to the massive arrays on higher-end PC parts.
 

tuxfool

Banned
Looks like it. I had hoped that VR would benefit more from multi-GPU setups. 35% is disappointing.

This is just one implementation, but it is known that SFR actually is less scalable than AFR.

What it does allow is decent frame pacing which is something that is highly desirable.
 

dr_rus

Member
It's just one use case for async compute. It's probably much easier to saturate the smaller number of CUs on the consoles as opposed to the massive arrays on higher-end PC parts.

Yeah, well, it's one case, but it's not the first to give that ~10% performance gain on PC GCN cards.

It's also highly questionable whether such a compute culling optimization will actually help anything but GCN - it's pretty clear that consoles are the main target of these compute tricks, which stem from the issues the GCN geometry pipeline has. I also kinda wonder if the primitive discard accelerator shown for GCN4 will do something like this in h/w on Polaris.
 
Yeah, well, it's one case, but it's not the first to give that ~10% performance gain on PC GCN cards.

It's also highly questionable whether such a compute culling optimization will actually help anything but GCN - it's pretty clear that consoles are the main target of these compute tricks, which stem from the issues the GCN geometry pipeline has. I also kinda wonder if the primitive discard accelerator shown for GCN4 will do something like this in h/w on Polaris.

Most development optimizations are going to be for AMD going forward; it's going to be an uphill battle for NVIDIA. 10 to 15% here and there is nothing to scoff at, especially considering it's just extra performance without any cost to the user. Good developers have pretty much solved GCN's geometry weakness. It's not even an issue outside of NVIDIA's over-tessellated GameWorks libraries.
 

dr_rus

Member
https://www.youtube.com/watch?v=tVHH3-bP-fE
"THE GIFT" GDC2016 Trailer (created using "MARZA Movie Pipeline for Unity")

Most development optimizations are going to be for AMD going forward; it's going to be an uphill battle for NVIDIA.

This remains to be seen. There's only so much optimization you can perform on a fixed architecture before the gains become negligible, and when that happens you start researching other things on PC h/w. I'd say that by 2017 the industry will have squeezed all the juice out of the console APUs and moved on to future h/w.
 
Most development optimizations are going to be for AMD going forward; it's going to be an uphill battle for NVIDIA. 10 to 15% here and there is nothing to scoff at, especially considering it's just extra performance without any cost to the user. Good developers have pretty much solved GCN's geometry weakness. It's not even an issue outside of NVIDIA's over-tessellated GameWorks libraries.

Low utilization during depth-only passes (or from zero-pixel draws due to poor occlusion/HiZ) is definitely an issue independent of tessellation. The one-launch-per-SE (shader engine) limit referred to in the paper is something folks deal with in various ways (caching the results of shadow draws, pre-culling on both CPU and GPU, etc.).
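
Just to make the "pre-culling" part concrete, here's a minimal CPU-side sketch (not from any of the linked presentations) of the kind of per-triangle test involved, assuming a simple float3 vertex layout, counter-clockwise winding, and a made-up PreCullTriangles helper; the compute-shader versions the GDC talks describe run the same test per triangle on the GPU and compact the surviving indices:

```cpp
// Minimal sketch of a CPU-side triangle pre-cull pass. Back-facing and
// degenerate (zero-area) triangles are dropped so the geometry front-end
// never has to launch and then reject them.
#include <cstddef>
#include <cstdint>
#include <vector>

struct Float3 { float x, y, z; };

static Float3 sub(Float3 a, Float3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Float3 cross(Float3 a, Float3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static float dot(Float3 a, Float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns a compacted index buffer (assumes counter-clockwise front faces).
std::vector<uint32_t> PreCullTriangles(const std::vector<Float3>& verts,
                                       const std::vector<uint32_t>& indices,
                                       Float3 cameraPos)
{
    std::vector<uint32_t> out;
    out.reserve(indices.size());
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
        Float3 a = verts[indices[i]];
        Float3 b = verts[indices[i + 1]];
        Float3 c = verts[indices[i + 2]];
        Float3 n = cross(sub(b, a), sub(c, a)); // face normal (unnormalized)
        float area2 = dot(n, n);                // ~0 => degenerate triangle
        bool backFacing = dot(n, sub(cameraPos, a)) <= 0.0f;
        if (area2 > 1e-12f && !backFacing) {
            out.push_back(indices[i]);
            out.push_back(indices[i + 1]);
            out.push_back(indices[i + 2]);
        }
    }
    return out;
}
```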

Also, async workloads don't really have anything to do with CU count per se, unless your workloads are so tiny they can't fill the available CUs. That isn't usually the issue; the issue is poor wavefront utilization due to high resource usage (usually VGPRs) resulting in low SIMD occupancy. If you have low-resource workloads on compute, especially ones that are ALU-heavy and don't steal memory bandwidth, you can find wins. But the better you get at reducing the low-utilization parts of your frame, the fewer opportunities you have to leverage async.
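
For anyone wondering what "async" actually means at the API level, here's a minimal D3D12 sketch (again, not from any of the presentations): you create a compute-only queue next to the direct queue and synchronize them with fences; whether the compute work really overlaps the graphics work, and whether that wins anything, depends on the hardware and on how much idle occupancy the frame leaves.

```cpp
// Minimal D3D12 sketch: a direct (graphics) queue plus a separate
// compute-only queue. Work on the compute queue can be scheduled
// concurrently with graphics; cross-queue ordering is handled with
// ID3D12Fence Signal/Wait, which is omitted here.
#include <d3d12.h>
#include <wrl/client.h>
#pragma comment(lib, "d3d12.lib")

using Microsoft::WRL::ComPtr;

bool CreateQueues(ComPtr<ID3D12Device>& device,
                  ComPtr<ID3D12CommandQueue>& gfxQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    // Default adapter, minimum feature level 11_0.
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device))))
        return false;

    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;       // graphics + compute + copy
    if (FAILED(device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue))))
        return false;

    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // compute (and copy) only
    return SUCCEEDED(device->CreateCommandQueue(&computeDesc,
                                                IID_PPV_ARGS(&computeQueue)));
}
```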
 

Kezen

Banned
I have to say I was expecting much bigger gains from async compute, bearing in mind the hubbub about this feature.
 

c0de

Member
Rather disappointing gains from async compute again (2% PS4, 5% XBO, 12% Fury X if I'm not mistaken). But the gains from compute-based culling are extreme on the other hand. GCN's geometry pipeline needs to improve.

What I do find interesting is that the times for XBO and PS4 are so close together, although this data is coming from the rather slow DDR3 RAM.
 

dr_rus

Member
What I do find interesting is that the times for XBO and PS4 are so close together, although this data is coming from the rather slow DDR3 RAM.

Console engines are optimized for low memory bandwidth obviously. An engine which is optimized for XBO's DDR3 limitations won't benefit from PS4's 2x bandwidth as it simply won't need it. This is where the lowest common denominator kicks in and where PS4 exclusive titles have an opportunity to shine by comparison.
 

dr_rus

Member

[attached image: img_2050_resizegmpc6.jpg]


5-10% again.
 

BigTnaples

Todd Howard's Secret GAF Account
Was really looking forward to this one...

Global Illumination in 'Tom Clancy's The Division'
Nikolay Stefanov | Technical Lead, Ubisoft Massive

Location: Room 2016, West Hall
Date: Friday, March 18
Time: 1:30pm - 2:30pm
Format: Session
Track: Programming, Visual Arts
Pass Type: All Access Pass, Main Conference Pass

The session will describe the dynamic global illumination system that Ubisoft Massive created for "Tom Clancy's The Division". Our implementation is based on radiance transfer probes and allows real-time bounce lighting from completely dynamic light sources, both on consoles and PC. During production, the system gives our lighting artists instant feedback and makes quick iterations possible.
The talk will cover in-depth technical details of the system and how it integrates into our physically-based rendering pipeline. A number of solutions to common problems will be presented, such as how to handle probe bleeding in indoor areas. The session will also discuss performance and memory optimization for consoles.
Takeaway

Attendees will gain understanding of the rendering techniques behind precomputed radiance transfer. We will also share what production issues we encountered and how we solved them - for example, moving the offline calculations to the GPU and managing the precomputed data size.
Intended Audience

The session is aimed at intermediate to advanced graphics programmers and tech artists. It will also be of interest to lighting artists who are interested in improving their workflow. Knowledge of key rendering techniques such as deferred shading and 3D volume mapping will be required.


Hopefully someone here will post info or slides.
 

mckmas8808

Mckmaster uses MasterCard to buy Slave drives
Console engines are optimized for low memory bandwidth obviously. An engine which is optimized for XBO's DDR3 limitations won't benefit from PS4's 2x bandwidth as it simply won't need it. This is where the lowest common denominator kicks in and where PS4 exclusive titles have an opportunity to shine by comparison.

Interesting.
 

Bl@de

Member
Does anybody have a list of YouTube links to the talks? I really enjoyed Carmack's VR talk and the Vulkan presentation last year. Slides alone are not that useful when it comes to a presentation recap :(
 
His opinion on this? Sure.

Err, do you have anything to add to that rather than saying it's just his opinion? No offence, but it just comes across as pretty snarky and pretentious that you're trying to reduce his claim to just an 'opinion' without offering any insight at all into why it may be an opinion rather than a fact.

Not very productive to the conversation, no?
 
[attached image: img_2050_resizegmpc6.jpg]


5-10% again.

I never claimed it was going to bring huge increases in perf, although I can see it bringing maybe 15 to 20% in the best use cases as developers come to grips with it. And again, it's free perf just from installing a new version of Windows, and it just extends the lead AMD already has lately. AMD being ~30% faster in DX12 titles is pretty substantial.
 

c0de

Member
You have a different one?

"Console engines are optimized for low memory bandwidth obviously."
To me it is not obvious as they started their slides with the CU specs and not with memory specs. To me it is
a) not obvious that the engine is optimized for low bandwidth and
b) coming from the wrong hypothesis, this doesn't mean that first party would make better use of it.
Especially when you look at what PC is able to do. Wouldn't it also suffer from the XBO? Or do you think that they make a console version and a PC version?
The biggest difference as shown in their starting slides is GPU performance and then in the end results are shown. Drawing the conclusion that the difference isn't that big because they "gimped" the PS4 console engine version sounds strange and is not proved by the slides.
That's why I think it is not "obvious".
 
If there's one thing I've learned posting on GAF, it's that real technical knowledge is not particularly valued--these discussions are almost always proxies for validating some pre-existing thesis along a YCS axis. It is what it is.
 

dr_rus

Member
"Console engines are optimized for low memory bandwidth obviously."
To me it is not obvious as they started their slides with the CU specs and not with memory specs. To me it is
a) not obvious that the engine is optimized for low bandwidth and
b) coming from the wrong hypothesis, this doesn't mean that first party would make better use of it.
Especially when you look at what PC is able to do. Wouldn't it also suffer from the XBO? Or do you think that they make a console version and a PC version?
The biggest difference as shown in their starting slides is GPU performance and then in the end results are shown. Drawing the conclusion that the difference isn't that big because they "gimped" the PS4 console engine version sounds strange and is not proved by the slides.
That's why I think it is not "obvious".
You think PC doesn't suffer from these optimizations? Take a look at how Fury X is doing against the 980 Ti. Optimizing for low bandwidth is the first thing you should do on console h/w; CUs and everything else come after that, or as a result of that - allowing you to load the h/w with math while the memory fetch is running.
When your shaders are balanced in a way that lets them execute on XBO without stalling on memory latency, any additional bandwidth is essentially wasted on them because they can't saturate it unless the GPU is proportionally faster at math. This may be somewhat true for PC h/w, but on PS4 the GPU is just ~30% more powerful while the bandwidth is about 2.5 times higher. This will lead to bandwidth being underused on PS4 in multiplatform titles - which is actually illustrated quite well by most PS4 versions of multiplatform titles running at a higher resolution.
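
Back-of-the-envelope version of that balance argument, using the commonly quoted public specs (and ignoring XBO's 32 MB of ESRAM, which changes the picture for whatever fits in it):

```cpp
// Rough arithmetic-intensity comparison: FLOPs available per byte of
// memory traffic. A shader balanced around the higher ratio stays
// ALU-bound there and simply can't consume the extra bandwidth elsewhere.
#include <cstdio>

int main()
{
    struct Gpu { const char* name; double gflops; double gbps; };
    const Gpu gpus[] = {
        {"XBO (DDR3)",  1310.0,  68.0},
        {"PS4 (GDDR5)", 1840.0, 176.0},
    };
    for (const Gpu& g : gpus)
        std::printf("%-12s %6.0f GFLOPS / %5.0f GB/s = %5.1f FLOPs per byte\n",
                    g.name, g.gflops, g.gbps, g.gflops / g.gbps);
    return 0;
}
// Prints roughly 19 FLOPs/byte for XBO vs ~10 for PS4: code tuned to stay
// ALU-bound on the former leaves a big chunk of the latter's bandwidth idle.
```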
 
I have to say I was expecting much bigger gains from async compute, bearing in mind the hubbub about this feature.
I think if you are already pushing the GPU hard in a lot of places to produce a frame, it is hard to get a lot out of it as there is just too much "pressure".
This was also great. I like how he went into the sample pattern and even showed a way to deal with the resolve/dissolve artifact problem.

Cool!

edit:
Flying Wild Hog's Krzysztof Narkowicz put out a blog post collating all the GDC papers he could find atm.
 
I think if you are already pushing the GPU hard in a lot of places to produce a frame, it is hard to get a lot out of it as there is just too much "pressure".

Pretty sure the devs all profiled and looked at where the gaps were, and optimized their rendering pipeline/process to minimize those gaps.
 

dr_rus

Member

This post is updated with:

“Math for Game Programmers: Building A Better Jump” – Kyle Pittman (Minor Key Games)
“An Excursion in Temporal Supersampling” – Marco Salvi (NVIDIA)
“Improving Geometry Culling for ‘Deus Ex: Mankind Divided'” – Nicolas Trudel (Eidos-Montréal)
“Fast, Flexible, Physically-Based Volumetric Light Scattering” – Nathan Hoobler (NVIDIA) - this one is actually NV's Volumetric Lighting, which they open-sourced at GDC
“Building Paragon in Unreal Engine 4” – Benn Gallagher, Martin Mittring (Epic Games)

“Digital Humans: Crossing the Uncanny Valley in Unreal Engine 4” – (Epic Games)
“A Real-Time Rendered Future” – (Epic Games)
“Visual Effects Roundtable” – Drew Skillman (Google)

etc

Some slides from the “Building Paragon in Unreal Engine 4” talk:

[slide images: croppercapture1its1k.png, croppercapture2dqsy5.png, croppercapture37rsdc.png, croppercapture4ufs1f.png]
 
I'll never forget my shock when I first started working on a UE3 game many years ago and found they had no parallel command buffer generation, and only two SPU modules, both packed into the same ELF - one of which was just EDGE culling, the other fragment program patching, of course. We actually ran out of bits in a 64-bit flag mask on MW3 because we had so many modules (my port of some of the core physics modules pushed it over the limit). Good times.
 

bj00rn_

Banned
Fixed foveated rendering is just having the center of each eye higher quality, and doesn't change depending on where your eyes are actually looking, right?

Do you spend a majority of your time looking at the center of the screen, and using your head to look around? If so, that seems like a decent solution to reduce the resources required.

This is a really bad solution which will have negative impacts in experiences where you use your eyes to the same extent you do in real life - unless it's somewhat synchronized with the lens flaws, using resources which otherwise wouldn't be used.
 

MIMF

Member
Was anyone able to correctly download the Remedy paper from the OT? After downloading it I just end up with a broken file with all the slides empty.
 

dr_rus

Member
Was anyone able to correctly download the Remedy paper from the OT? After downloading it I just end up with a broken file with all the slides empty.

Works fine here. Something must be interfering with the download on your connection.
 

MIMF

Member
Works fine here. Something must be interfering with the download on your connection.

Thanks, I opened it with the PowerPoint online client from OneDrive and it finally opened fine; for some reason my installed PowerPoint does not like the file.

Regarding the article/presentation, I was expecting much more information, especially after all the recent news about the game's rendering internals; pretty disappointing, as it is just a small DX12 walkthrough.
 

dr_rus

Member
A new PDF: DirectX 12 Advancements / Max McMullen, Direct3D Development Lead; Chas. Boyd, DirectX PM; Microsoft Silicon, Graphics and Media (SigMA)

Dammit... =)

Thanks, I opened it with the PowerPoint online client from OneDrive and it finally opened fine; for some reason my installed PowerPoint does not like the file.

Regarding the article/presentation, I was expecting much more information, especially after all the recent news about the game's rendering internals; pretty disappointing, as it is just a small DX12 walkthrough.

Yeah, it's pretty basic.
 