Since this is the most visited tech-related thread on NX at the moment, a question: Nvidia flops aren't equal to AMD flops, but what's the (possible) exchange rate between them? A few days ago, I was hearing it was 1 Nvidia flop = ~ 1.15 AMD flops, while GhostTrick stated it was "more like 1.3". I'm pretty curious to hear the opinion of other posters with good tech knowledge on the matter.
Obviously the very concept of "exchange rates" between FLOPs is a bit silly, but I'll still give answering the question a go.
First of all, the basic idea behind all of this is getting from FLOPs to some estimate of relative in-game performance. As such, it's really not about converting FLOPs -- if you run a well-optimized version of a pure FLOPs benchmark like DGEMM on these GPUs, they'll get very close to their theoretical rates, without a significant difference in utilization.
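To make that concrete, here's a minimal sketch of the DGEMM-style accounting: count the floating-point operations a matrix multiply performs and divide by wall time. The snippet runs on the CPU via numpy/BLAS with an arbitrary matrix size, but the same accounting applies to a GPU implementation.

```python
import time
import numpy as np

# DGEMM-style accounting: an N x N matrix multiply performs roughly 2*N^3
# floating-point operations (each of the N^3 multiply-adds counts as 2 ops),
# so achieved FLOP/s is simply that count divided by wall time.
N = 2048  # arbitrary size, just large enough to keep the BLAS kernel busy
a = np.random.rand(N, N)
b = np.random.rand(N, N)

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start

print(f"{2 * N**3 / elapsed / 1e9:.1f} GFLOP/s achieved")
```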
When it comes to games, the situation is more complex. The reason rates like "1.15" or "1.30" come about is that people compare the performance of a set of PC games on GPUs by NV and AMD respectively. Now, there are two main issues when converting such an observation (which is valid for the PC side of things) to a situation such as comparing a custom console APU with a custom handheld SoC:
- A GPU is not just its shader units. The observed differences may come about partly from differences in the relative numbers of other units, e.g. TMUs and ROPs, which won't necessarily manifest in the same way when comparing these custom parts.
- The driver and optimization situations may be quite different. On PC, one reason you might get factors anywhere from 1.15 up to 1.3 is that AMD's DX11 (and, even more so, OpenGL) drivers are comparatively bad at actually extracting the real performance from a given card. On a dedicated console (or handheld) you'd hope that, at least for high-end games, better hardware utilization is achieved.
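Just to illustrate what such an "exchange rate" would even mean arithmetically, here's a throwaway sketch. The 1.15 and 1.3 factors are the ones quoted above, and the TFLOPS figure is a placeholder, not a claim about any particular chip.

```python
# Throwaway arithmetic: what the quoted "exchange rates" would mean if you
# took them at face value. The 1.6 TFLOPS figure below is a placeholder for
# some AMD-architecture part, not a real spec.
amd_tflops = 1.6

for rate in (1.15, 1.30):
    nv_equiv = amd_tflops / rate  # 1 NV flop == `rate` AMD flops, per the claim
    print(f"assuming 1 NV flop = {rate} AMD flops: "
          f"{amd_tflops} AMD TFLOPS ~ {nv_equiv:.2f} NV TFLOPS")
```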
Is FP16 useful in real-world scenarios?
The short answer is yes. The long answer is that programmers need to consider, for each shader calculation, whether it can get away with FP16 or needs FP32. That's a very platform-specific optimization which, out of all the mainstream gaming platforms, currently only makes sense on Tegra, so it's probably not something that will find wide adoption.
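For a feel of that trade-off, here's a minimal sketch using numpy's float16 on the CPU as a stand-in for GPU half precision: FP16 gives you roughly three decimal digits of precision, which is plenty for normalized colour/lighting-style math but falls apart once values get large.

```python
import numpy as np

# numpy's float16 as a stand-in for shader FP16: ~3 decimal digits of
# precision and a max value of 65504. Fine for normalized colour math,
# not fine once the dynamic range grows.
color = np.float16(0.7333) * np.float16(0.5)        # small range: result is ~0.3667
position = np.float16(4096.0) + np.float16(0.25)    # FP16 spacing at 4096 is 4.0, so...

print(color)      # close to the FP32 result
print(position)   # 4096.0 -- the 0.25 was rounded away entirely
```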